AWS is a magic trick. There’s no “abracadabra,” but call an API and suddenly you have an EC2 instance, or 10, or hundreds. Call another and you have a database. Or a storage bucket with more capacity than you could realistically hope to fill. But behind the curtain, huge amounts of real-world labor, logistics, and problem solving churn constantly to make this magic possible.
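To make that opening concrete, here is a minimal sketch of what two of those calls look like from the customer’s side, using boto3, the AWS SDK for Python. It is only an illustration: the AMI ID, instance type, and bucket name are placeholder values, not anything from this article.

```python
# A minimal sketch of the "magic trick": two API calls, two resources.
import boto3

ec2 = boto3.client("ec2")
# One call, and an EC2 instance exists (or ten, or hundreds, via MaxCount).
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="t3.micro",          # placeholder instance type
    MinCount=1,
    MaxCount=1,
)

# Another call, and a storage bucket exists.
s3 = boto3.client("s3")
s3.create_bucket(Bucket="example-bucket-name")  # placeholder; bucket names are globally unique
```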
To give you an idea, the data centers that make up one AWS availability zone are each about the size of an IKEA store: multiple floors of massive halls, with access aisles and rows upon rows of racks alternating like tilled drills on a farm. These racks each house dozens of servers, home to anything from hundreds to millions of customers depending on the application. Standing among the aisles, you can sense the airflow and the calm rumbling of the liquid cooling pumps regulating the rooms.
These data centers are clinically pristine, their uniform walls cluttered only by the safety posters stuck haphazardly along each and every corridor, a testament to an obsession with security. Even if you wanted to make a mess, you couldn’t get anything in to make a mess with (you can’t get anything out either). I tell people that the security in our data centers isn’t like the security you see at the airport. It’s more like the security you’d see at a military base. Guards following rigorous procedures, pervasive cameras, air-locked double door entries and exits, metal detectors, on-site physical destruction of storage media, and much more that I can’t go into. It somehow feels like being on a sci-fi movie set, a light construction site, and a greenhouse all at the same time.
That’s just one data center—a large availability zone might have more than a dozen. To keep latency low, they’re physically close to each other. Usually it’s a short walk, though you might have to exit and enter through more security layers to get there. The data center clusters making up different availability zones, though, will be further apart, often on different sides of a metropolitan area. This distance minimizes the risk of a natural disaster, like a flood or a fire, affecting them at the same time. It can take tens of minutes to drive between them, even longer if you’re dealing with DC Beltway or Bay Area traffic. In Tokyo, one of our availability zones is even farther away than usual because of the unique risks of major earthquakes in the area.
Things are constantly changing within our data centers. Nothing is static. Every day, new chips, servers, and racks are being manufactured and installed. New racks roll in; old racks are securely erased or destroyed, rolled out, and dispatched for recycling. In our availability zones alone, there are hundreds of people working behind the scenes to manage all of this. There are thousands more in the manufacturing and transportation processes. To keep capacity agile, the whole supply chain is run just in time: the journey from raw materials to running systems is measured in days.
At the same time, the network is constantly being expanded to keep up. Every day, there are massive ships sailing across oceans laying the latest high-capacity links. Across land, newly upgraded fiber paths are being pulled through trenches. At our scale, this isn’t just for capacity, but for diversity and resilience. More links following more physical paths so that they don’t break at the same time.
Perhaps the most amazing part is that no one ever sat down and designed all of this, not exactly; there’s a touch of the organic to it. The real magic powering the scale of AWS is a dynamic collaboration of people and departments, each obsessing over details and interfaces to make sure that a machine of unprecedented scale keeps humming along. Humans in the loop: that’s what’s behind those API calls.