Operational Excellence

I had the opportunity and privilege to be part of AWS Elemental MediaStore, a top-notch AWS team building and operating a service that delivers high-performance storage for video workflows.

To share what I learned for the greater good, and perhaps for my own future reference, I am starting a series of posts that dives into some of the interrelated topics behind achieving Operational Excellence.

These notes are insights into development and operational best practices that are critical to large-scale software projects, especially those in which customers depend on your availability and performance 24/7.

Disclaimer: the contents of this and related posts are personal notes written by me and not endorsed by AWS or any other company.

Defining Operational Excellence

AWS Well-Architected is one official venue where AWS shares best-practice knowledge with the general public, allowing other companies to apply lessons learned through decades of solving hard problems.

Here is what it says about Operational Excellence:

The Operational Excellence pillar includes the ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures.

My take is that Operational Excellence is the air Amazon SDEs breathe day after day. It consists of practices and principles discovered and developed while building and operating one of Earth’s biggest websites and marketplaces, Amazon.com, as well as providing the world’s largest portfolio of infrastructure and software-as-a-service offerings, AWS.

Let’s begin

The ideas discussed in this series apply directly to SaaS products, and they may well apply to other contexts.

This series is by no means all-encompassing. It is, however, a personally organized collection that may help you improve your existing processes and procedures. Without further ado, let’s introduce the opening act: Automated Tests as the Basis for Operational Excellence.