How to Pick Your Primary AWS Region?

Erik Osterman | Cloud Architecture & Platforms, DevOps

While your company might operate in multiple regions, one region should typically be selected as the primary region. Certain resources will not be geographically distributed, and these should be provisioned in this default region.

When building out your AWS infrastructure from scratch, it's a good time to revisit decisions that might have been made many years ago. Newer AWS regions might be better suited for the business.

Customer Proximity

One good option is picking a default region that is closest to where the majority of end-users reside.
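As a rough illustration of "closest to the majority of end-users", the choice can be sketched as picking the region with the lowest user-weighted latency. Everything here is an assumption for the sake of the example: the locations, the user shares, and the latency figures are made up, not measurements, and the function names are hypothetical.

```python
# Hypothetical inputs: share of end-users per location and a rough
# round-trip latency (ms) from each location to each candidate region.
# All numbers below are invented for illustration, not measurements.
USER_SHARE = {"new_york": 0.5, "london": 0.3, "sydney": 0.2}

LATENCY_MS = {
    "us-east-2": {"new_york": 20, "london": 90, "sydney": 200},
    "eu-west-1": {"new_york": 80, "london": 15, "sydney": 260},
    "ap-southeast-2": {"new_york": 210, "london": 270, "sydney": 10},
}


def weighted_latency(region_latency, user_share):
    """Average latency a user would see, weighted by where users are."""
    return sum(user_share[loc] * region_latency[loc] for loc in user_share)


def closest_region(latency_ms, user_share):
    """Return the region with the lowest user-weighted latency."""
    return min(
        latency_ms,
        key=lambda region: weighted_latency(latency_ms[region], user_share),
    )


if __name__ == "__main__":
    print(closest_region(LATENCY_MS, USER_SHARE))
```

In practice you would replace the made-up tables with your own traffic breakdown and measured latencies; the point is only that "closest" should be weighted by where the users actually are.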

Business Headquarters

Frequently, we see companies select the default region that is closest to where the majority of business operations take place. This is especially true if most of the services in the default region will be consumed by the business itself.


Stability

When operating on AWS, we advise selecting a region other than us-east-1, as this is (or used to be) the default region for most AWS users. It has historically had the most service interruptions, presumably because it is one of the most heavily used regions and operates at a scale much larger than other AWS regions. We therefore advise using us-east-2 over us-east-1; the latency between these two regions is minimal.

High Availability / Availability Zones

Not all AWS regions support the same number of availability zones. Some regions offer (or have offered) only two availability zones, while a minimum of three is recommended when operating Kubernetes to avoid “split-brain” problems.
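The reasoning behind "at least three" comes from majority-based consensus. As a minimal sketch, assuming one voting member (for example, one etcd member) per availability zone, the arithmetic looks like this:

```python
def quorum(members):
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can be lost while a majority still remains."""
    return members - quorum(members)


for zones in (2, 3):
    print(
        zones, "zones: quorum", quorum(zones),
        "tolerates", tolerated_failures(zones), "zone failure(s)",
    )
```

With two zones, losing either one leaves no majority, so the cluster either stalls or risks both halves acting independently. With three zones, one can fail and the remaining two still agree.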


Cost

Not all regions cost the same to operate, so a cheaper region can be attractive. On the other hand, if you have significant resources deployed in an existing region, migrating to a new region could be cost-prohibitive; data transfer costs are not cheap, and petabyte-scale S3 buckets would be costly to migrate.

Service Availability

Not all regions offer the full suite of AWS services or receive new services at the same rate as others. The newest regions frequently lack many of the newer services. Other times, certain regions receive platform infrastructure updates more slowly than others. Also, AWS now offers Local Zones (e.g. us-west-2-lax-1a), which operate a subset of AWS services.

Instance Types

Not all instance types are available in all regions.


Latency

The latency between infrastructure in different regions could also be a factor.


Should You Run Stateful Systems via Container Orchestration?

Erik Osterman | Cloud Architecture & Platforms, DevOps

Recently it was brought up that ThoughtWorks now says that:

We recommend caution in managing stateful systems via container orchestration platforms such as Kubernetes. Some databases are not built with native support for orchestration — they don’t expect a scheduler to kill and relocate them to a different host. Building a highly available service on top of such databases is not trivial, and we still recommend running them on bare metal hosts or a virtual machine (VM) rather than to force-fit them into a container orchestration platform

This is just more FUD that deserves to be cleared up. First, not all container management platforms are the same. I can only speak from experience about what it means for Kubernetes. Kubernetes is ideally suited to run these kinds of workloads when used properly.

NOTE: Just so we're clear, our recommendation for production-grade infrastructure is to always use a fully-managed service like RDS, Kinesis, MSK, ElastiCache, etc., rather than self-hosting it, whether it be on Kubernetes or bare-metal/VMs. Of course, that only works if these services meet your requirements.

To set the record straight, Kubernetes won't randomly kill Pods and relocate them to a different host if configured correctly. First, by setting the requested resources equal to the limits, the pods get the Guaranteed QoS (Quality of Service) class, which makes them the last ones evicted when a node runs short of resources. Then, by setting a PodDisruptionBudget, we can be very explicit about what sort of “SLA” we want on our pods during voluntary disruptions such as node drains.
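To make the "requests equal to limits" rule concrete, here is a simplified sketch of how a pod's QoS class follows from its containers' resources, plus an example PodDisruptionBudget expressed as a Python dict. The function name `qos_class`, the `db-pdb` name, and the `app: db` label are hypothetical; the classification is a simplification of the real rules (quantities are compared as plain strings).

```python
def qos_class(containers):
    """Classify a pod from its containers' resources (simplified).

    Each container is a dict like
    {"requests": {"cpu": "500m", "memory": "1Gi"}, "limits": {...}}.
    Quantities are compared as strings, so "1" and "1000m" are not
    treated as equal here even though Kubernetes considers them equal.
    """
    resources = ("cpu", "memory")
    has_any = any(c.get("requests") or c.get("limits") for c in containers)
    if not has_any:
        return "BestEffort"
    # Guaranteed: every container has a cpu and memory limit, and each
    # request (which defaults to the limit when omitted) equals it.
    guaranteed = all(
        r in c.get("limits", {})
        and c.get("requests", {}).get(r, c["limits"][r]) == c["limits"][r]
        for c in containers
        for r in resources
    )
    return "Guaranteed" if guaranteed else "Burstable"


# Example PodDisruptionBudget manifest as a dict. The name and label
# selector are hypothetical placeholders.
POD_DISRUPTION_BUDGET = {
    "apiVersion": "policy/v1",
    "kind": "PodDisruptionBudget",
    "metadata": {"name": "db-pdb"},
    "spec": {
        "minAvailable": 2,
        "selector": {"matchLabels": {"app": "db"}},
    },
}
```

A database container with `requests` and `limits` both set to, say, `cpu: "2"` and `memory: "8Gi"` would classify as Guaranteed, while the budget above asks that at least two matching pods stay up during voluntary disruptions. It does not protect against a node failing outright.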

The other recommendation is to use the appropriate workload controller for the Pods. For databases, it's typically recommended to use StatefulSets (formerly called PetSets, for a good reason!). With StatefulSets, we get the same kinds of lifecycle semantics as when working with discrete VMs. We get stable network identities, assurances that there won't ever be 2 concurrent pods (“Pets”) with the same name, etc. We've experienced firsthand how some applications like Kafka hate it when their IP changes. StatefulSets address that by giving each pod a fixed hostname.
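The stable identity a StatefulSet gives each pod can be sketched as follows. Pods are named with an ordinal suffix and are resolvable through the StatefulSet's governing headless Service; the helper name and the `kafka`, `kafka-headless`, and `data` values are hypothetical, and `cluster.local` is assumed as the cluster domain.

```python
def stateful_pod_dns_names(statefulset, service, namespace, replicas,
                           cluster_domain="cluster.local"):
    """List the stable DNS name of each pod in a StatefulSet.

    Pods are named <statefulset>-<ordinal>, with ordinals 0..replicas-1,
    and are resolvable through the governing headless Service.
    """
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
    ]


if __name__ == "__main__":
    for name in stateful_pod_dns_names("kafka", "kafka-headless", "data", 3):
        print(name)
```

Because a replacement pod comes back with the same name, peers and clients can be configured with these hostnames once and keep using them across restarts.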

If StatefulSets are not enough of a guarantee, we can provision dedicated node pools. These node pools can even run on bare-metal to assuage even the staunchest critics of virtualization. Using taints and tolerations, we can ensure that the databases run exactly where we want them. There's no risk that a “spot instance” will randomly nuke the pod. Then, using anti-affinity rules, we can ensure that the Kubernetes scheduler spreads the workloads across different physical nodes as best it can.
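The way taints and tolerations keep other workloads off a dedicated node pool can be sketched with a simplified matcher. This is an illustration of the matching rules, not the scheduler's actual code; the function names and the `dedicated=database` taint are hypothetical.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # An empty key with Exists matches every taint key.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (
        toleration.get("key") == taint["key"]
        and toleration.get("value", "") == taint.get("value", "")
    )


def can_schedule(pod_tolerations, node_taints):
    """A pod fits only if every NoSchedule/NoExecute taint is tolerated."""
    blocking = [
        t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")
    ]
    return all(
        any(tolerates(tol, taint) for tol in pod_tolerations)
        for taint in blocking
    )


# Hypothetical dedicated database node and two pods.
DB_NODE_TAINTS = [
    {"key": "dedicated", "value": "database", "effect": "NoSchedule"}
]
DB_POD_TOLERATIONS = [
    {"key": "dedicated", "operator": "Equal", "value": "database",
     "effect": "NoSchedule"}
]
WEB_POD_TOLERATIONS = []
```

Here the database pod is allowed onto the tainted node and the web pod is not. Note that a taint only repels other pods; to make sure the database lands on that pool rather than somewhere else, it is typically paired with a node selector or node affinity.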

Lastly, Kubernetes above all else is a framework for consistent cloud operations. It exposes all the primitives that developers need to codify the business logic required to operate even the most complex business applications. Contrast this with ThoughtWorks' recommendation of running applications on bare metal hosts or a virtual machine (VM) rather than “force-fitting” them into a container orchestration platform: when you “roll your own”, almost no organization possesses the in-house skillsets to orchestrate and automate the system effectively. In fact, this kind of skillset used to be possessed only by technology companies like Google and Netflix. Kubernetes has leveled the playing field.

Using Kubernetes Operators, the business logic of how to operate a highly available legacy application or cloud-native application can be captured and codified. There's an ever-growing list of operators. Companies have popped up whose whole business model is around building robust operators to manage databases in Kubernetes. Because this business logic is captured in code, it can be iterated and improved upon. As companies encounter new edge cases, those can be addressed by the operator, so that everyone benefits. With the traditional “snowflake” approach, where every company implements its own kind of Rube Goldberg apparatus, hard lessons learned are not shared and we're back in the dark ages of cloud computing.

As with any tool, it's the operator's responsibility to know how to operate it. There are a lot of ways to blow one's leg off using Kubernetes. Kubernetes is a tool that, when used the right way, will unlock the superpowers your organization needs.

Rock Solid WordPress

Erik Osterman | Cloud Architecture & Platforms

Learn how Cloud Posse recently architected and implemented WordPress for massive scale on Amazon EC2. We'll show you exactly the tools that we used and our recipe to both secure and power WordPress setups on AWS using Elastic Beanstalk, EFS, CodePipeline, Memcached, Aurora and Varnish.

Managing Secrets in Production

Erik Osterman | Cloud Architecture & Platforms

Secrets are any sensitive pieces of information (like a password, API token, or TLS private key) that must be kept safe. This presentation is a practical guide covering what we've done at Cloud Posse to lock down secrets in production. It includes our approach to avoiding the same pitfalls that ShapeShift encountered when they were hacked. The techniques presented are compatible with automated cloud environments and even legacy systems.

Tips on Writing Go Microservices for Kubernetes

Erik Osterman | Cloud Architecture & Platforms, Release Engineering & CI/CD

Kelsey Hightower, a Google Developer Advocate and Google Cloud Platform evangelist, recently gave a very helpful screencast demonstrating some of the tips & tricks he uses when developing Go microservices for Kubernetes & Docker. Among other things, he recommends being very verbose during the initialization of your app by outputting environment variables and port bindings. He also raises the important distinction between readiness probes and liveness probes and when to use them. Lastly, in the Q&A he explains why it's advantageous to use resource files instead of editing live configurations: the former fits better into the pull-request workflow that many companies already use as part of their CI/CD pipeline.