The 12-Factor Pattern Applied to Cloud Architecture

Erik Osterman

Heroku has probably deployed more services in cloud environments than any other company. They operate a massive “Platform-as-a-Service” that lets developers deploy most kinds of apps with a simple git push. Along the way, they developed a pattern for writing applications so that they can be deployed easily and consistently in cloud environments. Their platform abides by this pattern, but it can be implemented in many ways.

The 12-factor pattern can be summed up like this:

Treat all micro-services as disposable services that receive their configuration via environment variables and rely on backing services to provide durability. Any time you need to make a change, it should be scripted. Treat all environments (dev, prod, qa, etc.) as identical.

Of course, this only works if the cloud architecture plays along with the methodology. For a cloud architecture to be “12-factor app” compliant, here are some recommended criteria.

1. Codebase

  1. Applications can be pinned to a specific version or branch
  2. All deployments are versioned
  3. Multiple versions can be deployed concurrently (e.g. prod, dev, qa)

2. Dependencies

  1. Service dependencies are explicitly declared and loosely coupled
  2. Dependencies can be isolated between services
  3. Services can be logically grouped together

3. Config

  1. All configuration is passed via environment variables and not hardcoded.
  2. Services can announce availability and discover other services
  3. Services can be dynamically reconfigured (e.g. using feature flags or changing environment variables)
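
As a concrete illustration, every environment-specific value is read from the environment at startup, never hardcoded or checked into the codebase. Here is a minimal Python sketch; the variable names (DATABASE_URL, FEATURE_NEW_CHECKOUT, etc.) are illustrative, not prescribed:

    import os

    # Required settings: fail fast at startup if they are missing.
    DATABASE_URL = os.environ["DATABASE_URL"]

    # Optional settings with sensible defaults.
    PORT = int(os.environ.get("PORT", "8080"))
    LOG_LEVEL = os.environ.get("LOG_LEVEL", "info")

    # A simple feature flag, toggled per environment without a code change.
    FEATURE_NEW_CHECKOUT = os.environ.get("FEATURE_NEW_CHECKOUT", "false").lower() == "true"

Reconfiguring the service is then just a matter of changing its environment and redeploying, which is exactly what keeps dev, qa, and prod interchangeable.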

4. Backing Services

  1. Services depend on object stores to store assets (if applicable)
  2. Services use environment variables to find backing services
  3. Platform supports backing services like MySQL, Redis, or Memcached
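
In practice, a backing service is just an attached resource the app locates through its environment. A minimal Python sketch, assuming a DATABASE_URL variable in the common scheme://user:pass@host:port/name form:

    import os
    from urllib.parse import urlparse

    # The platform injects the location of the backing service; the code
    # never hardcodes hostnames or credentials.
    url = urlparse(os.environ["DATABASE_URL"])  # e.g. mysql://user:pass@db:3306/app

    db_settings = {
        "host": url.hostname,
        "port": url.port or 3306,
        "user": url.username,
        "password": url.password,
        "dbname": url.path.lstrip("/"),
    }

    # Swapping a local container for a managed database (or Redis for
    # Memcached) then only requires changing the URL, not the code.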

5. Build, release, run (PaaS)

  1. Automation of deployments (build, release, run)
  2. All builds produce immutable images
  3. Deployments should result in zero downtime

6. Processes

  1. Micro-services should consist of a single process
  2. Processes are stateless and share-nothing
  3. Ephemeral filesystem can be used for temporary storage
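
To make the “stateless, share-nothing” rule concrete: the local filesystem is only scratch space for a single request, and anything durable belongs in a backing service. A hypothetical Python sketch:

    import tempfile

    def transform(data: bytes) -> bytes:
        """Stand-in for real per-request work (resizing, transcoding, ...)."""
        return data.upper()

    def handle_request(data: bytes) -> bytes:
        # The ephemeral filesystem is fine as scratch space within one request;
        # it can vanish whenever the process is restarted or rescheduled.
        with tempfile.NamedTemporaryFile() as tmp:
            tmp.write(data)
            tmp.seek(0)
            result = transform(tmp.read())
        # Anything that must survive the request belongs in a backing service
        # (database, object store), never on local disk or in process memory.
        return result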

7. Port binding

  1. Services should be able to run on any port defined by the platform
  2. Service discovery should incorporate ports
  3. Some sort of routing layer handles requests and distributes them to port-bound processes
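
The canonical example is a self-contained service that exports HTTP by binding to whatever port the platform assigns, rather than relying on a web server injected at runtime. A minimal sketch using only the Python standard library:

    import os
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"hello\n")

    # The platform chooses the port and passes it in via the environment;
    # the routing layer maps public traffic to this port-bound process.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()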

8. Concurrency

  1. Concurrency is easily achieved by replicating the micro-service
  2. Services scale automatically without human intervention
  3. The platform only routes traffic to healthy instances
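
Because each replica is an identical, stateless process, scaling out is simply running more copies behind the routing layer. The router then needs a cheap way to decide which replicas should receive traffic; a common convention (not mandated by the 12-factor document) is a health endpoint like the hypothetical /healthz below, extending the port-binding sketch above:

    import os
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/healthz":
                # The load balancer probes this path and only routes
                # requests to replicas that answer 200.
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"ok\n")
            else:
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"hello\n")

    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()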

9. Disposability

  1. Services are entirely disposable (they maintain no local state)
  2. They can be easily created or destroyed
  3. They are not upgraded or patched (just redeploy!)
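
Disposability in practice means fast startup and graceful shutdown: when the platform wants to replace an instance it sends SIGTERM, and the process stops taking new work, finishes what it has, and exits. A minimal Python sketch (the worker loop is hypothetical):

    import signal
    import sys
    import time

    shutting_down = False

    def handle_sigterm(signum, frame):
        # The platform sends SIGTERM when this instance is being replaced;
        # flag the loop to wind down instead of dying mid-request.
        global shutting_down
        shutting_down = True

    signal.signal(signal.SIGTERM, handle_sigterm)

    while not shutting_down:
        # do_one_unit_of_work()  # hypothetical: pull a job, serve a request, ...
        time.sleep(1)

    # Release backing-service connections here, then exit cleanly.
    sys.exit(0)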

10. Dev/prod parity

  1. All environments function the same way
  2. Guarantees that all services in an environment are identical
  3. Code runs the same way in all places

11. Logs

  1. Logs are treated as structured event streams (e.g. JSON) that can be subscribed to by multiple consumers
  2. Logs are collected from all nodes in the cluster and centrally aggregated for auditing all activity
  3. Alerts can be generated from log events
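
Concretely, the app never manages log files or routes them itself; it writes one structured event per line to stdout and lets the platform collect and fan out the stream. A minimal sketch emitting JSON lines (the field names are illustrative):

    import json
    import sys
    import time

    def log(level, event, **fields):
        # One JSON object per line on stdout; the platform captures the
        # stream, aggregates it across nodes, and lets multiple consumers
        # (alerting, archival, analytics) subscribe to it.
        record = {"ts": time.time(), "level": level, "event": event, **fields}
        sys.stdout.write(json.dumps(record) + "\n")
        sys.stdout.flush()

    log("info", "request.handled", path="/checkout", status=200, duration_ms=42)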

12. Admin processes

  1. It should never be necessary to log in to servers to manually make changes
  2. APIs exist so that everything can be scripted
  3. Regular tasks can be scheduled to run automatically
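
Admin and maintenance tasks run as one-off processes from the same codebase and the same environment as the app, never as commands typed by hand on a server. A hypothetical management script (tasks.py and the cleanup task are made-up names):

    # tasks.py -- one-off admin process; ships with the app and reads the
    # same environment variables as the long-running service.
    import argparse
    import os

    def cleanup_expired_sessions(database_url: str) -> None:
        # Placeholder for the real task; it would connect to the backing
        # service named in the environment, never to a hardcoded host.
        print(f"cleaning expired sessions in {database_url}")

    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description="One-off admin tasks")
        parser.add_argument("task", choices=["cleanup-sessions"])
        args = parser.parse_args()
        if args.task == "cleanup-sessions":
            cleanup_expired_sessions(os.environ["DATABASE_URL"])

A scheduler (cron or the platform’s scheduled-job facility) can then run the same command automatically, and because it is scriptable it can also be driven through an API.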

Building a DevOps Culture

Erik Osterman

A question that often comes from well-established organizations with “mature” infrastructures is the following:

How can an organization instill a new engineering culture where developers and operations are working with each other and not against each other?

We affectionately call this the “DevOps” movement; it's a culture where developers and ops work together and not against each other. Often their roles become indistinguishable. Developers are as confident on the command line as ops are in the IDE.

The key to succeeding with DevOps is demonstrating to developers that it will actually make their jobs easier, for example when they need to debug production issues. Likewise, ops needs to see developers as a resource that can reduce the number of sleepless nights caused by failed deployments and buggy code. The role of ops in a DevOps culture is to enable devs to operate more efficiently. The role of devs in the organization is to build applications that are easily deployable.

Here’s what needs to happen. Ops takes the first step by standardizing the way software gets deployed: survey the open source tools currently available and choose one. Then take those recipes and build a local development environment using something like Docker Compose or Vagrant so that developers can start getting familiar with it. Next, developers need to embrace the local dev environment over “native” environments (e.g. those that take a day of configuration and package downloads). Through this process, they build up operational competency by debugging issues in environments that mirror production. After several months of operating like this, developers should shadow ops in certain roles, such as deploying software.

Once the above system is in place, the next step is to increase monitoring coverage and make it transparent to everyone in the org. In addition to your standard checks (like Nagios), you’re going to need to deploy something like DataDog or NewRelic. This gives your developers insight into how their apps are functioning in production and the information needed to diagnose bugs. Tasks get created for every warning or error in production that is not handled properly. Once these alerts are calibrated down to just a few critical incidents per week, they should get wired up to PagerDuty. This is where the most resistance is usually encountered, but it is what keeps everything going. Without skin in the game, developers have no incentive to make sure their code is highly reliable, and ops’ hands are tied when it comes to fixing problems. But with only a few critical incidents, devs will be quickly motivated to fix the bugs and silence the alarms.

The last triumph in this conversion is to achieve “Continuous Integration” (CI) coupled with “Continuous Deployment”. Before this can even be remotely considered, considerable unit test coverage is needed. Stay tuned for more posts on these topics.