AWS Cost Controls

AWS Cost Controls

We've all been there: that dreaded email notification from AWS with our new cloud bill. We resolve to reduce our costs next month, but another month rolls around and not much has changed.

Here are some of the tools and tricks at your disposal for controlling your AWS cloud spend and reducing your costs. There's an overwhelming number of resources out there to reduce costs. Knowing which ones to prioritize will largely depend on your unique situation and architecture.

“Free” Money

  • AWS Activate Credits
    • VC backed startups are eligible to receive up to $100,000 in usage credits (one time only)
  • Credits for Testing & POC Implementation

Tricks to Reduce Network Transit Costs

  • Move Data Transfer Costs to the Edge (e.g. CDN)
    • Use Cloudflare to radically reduce transfer costs (Cloudfront has zero bandwidth fees)
    • Use CloudFront
  • Configure an S3 Endpoint in your VPC
  • Optimize NAT Traffic
    • Use NAT instances rather than NAT gateway appliances for dev/test environments
  • Use internal private IPs rather than public IPs or Elastic IPs
    • Use CNAMEs to AWS hostnames for resources that need to be accessed internally and externally to take advantage of the split-horizon DNS
    • Use A records to Private IPs for hostnames that should be accessed strictly internally
  • Use a Docker Pull-Thru Registry Cache
    • The pull-thru cache will reduce the ingress traffic, but only really effective if hitting one per AZ
  • Reduce Availability Zones where suitable to reduce cross-zone traffic
    • For dev/test environments, use a single AZ to reduce cross-az transit costs.
    • Traffic within an AZ is free.
  • Reduce Inter-zone/region traffic

Shave Compute Costs

  • Invest in an AWS Savings Plan
  • Buy (convertible) Reserved EC2 Instances
    • Reserved Instances are complicated, but “convertible” RIs are a no-brainer.
    • Pro tip: Use spot.io to manage this process for you
  • Move to a cheaper AWS region
  • Upgrade to newer generations of EC2 instance families
  • Use EC2 AMD Instances
    • https://cloudonaut.io/6-new-ways-to-reduce-your-AWS-bill-with-little-effort/
  • Setup Spot Instances
    • Use SpotInstance.com to get almost immediate gratification
    • Use spot “fleets”
  • Leverage (or Tune) Auto-scaling Capabilities
    • Use the SpotInst Ocean controller for Kubernetes to “right-size” your EC2 instances
  • Right Size Resources
  • Optimize Kubernetes Workloads
    • Ensure proper memory & CPU requests and limits are set
    • Ensure pods can be rescheduled by setting proper annotations
  • Shutdown unused instances
    • Don’t just stop instances, but terminate instances to prevent racking up EBS costs

Reduce Database Costs

  • Use Aurora Serverless where appropriate
  • Use containerized databases for dev/test environments
  • “Right Size” Compute and Storage

Reduce Storage Costs

  • Backup your data in S3 rather than EBS or EFS to save big
  • Add Lifecycle Rules for Retention Policies on S3 buckets
    • Automatically rotate logs to Glacier or delete after N days
  • Use reduced redundancy for less mission-critical artifacts such as logs
  • Optimize EBS Volumes
    • Reduce oversized volumes
  • Optimize EBS Colume Snapshots

Use the various tools at your disposal

  • Komiser (Free/Open Source) [youtube https://www.youtube.com/watch?v=DDWf2KnvgE8&w=640&h=385]
  • KubeCost (Free/Open Source) – Tool to gain visibility into operating costs inside of your Kubernetes clusters
  • Goldilocks (Free/Open Source) – Tool to help “Right Size” your Kubernetes Pods

    Goldilocks Screenshot

  • SpotInst.com (SaaS)
  • CloudChecker (SaaS)
Author Details
CEO
Erik Osterman is a technical evangelist and insanely passionate DevOps guru with over a decade of hands-on experience architecting systems for AWS. After leading major cloud initiatives at CBS Interactive as the Director of Cloud Architecture, he founded Cloud Posse, a DevOps Accelerator that helps high-growth Startups and Fortune 500 Companies own their infrastructure in record time by building it together with customers and showing them the ropes.

Codefresh Variable Substitution

JeremyRelease Engineering & CI/CDLeave a Comment

It’s important to understand the behavior of variables like ${{CF_PULL_REQUEST_NUMBER}} in Codefresh pipelines and how it interacts with shell commands.

When you use ${{CF_PULL_REQUEST_NUMBER}} in a YAML file, Codefresh handles it like this:

  • If the variable is set, the whole thing is replaced with the value of the variable
  • If the variable is not set, it is left unchanged.

When you execute a command in a freestyle step, that command is passed to bash. So when CF_PULL_REQUEST_NUMBER is set to something like 34:

cf_export IMAGE_TAG=pr-${{CF_PULL_REQUEST_NUMBER}}

becomes: cf_export IMAGE_TAG=pr-34, but when the variable is not set, the command is:
cf_export IMAGE_TAG=pr-${{CF_PULL_REQUEST_NUMBER}}
which of course is bad bash variable interpolation syntax.

We (Cloud Posse) have created a tool to help with this called require_vars we ship in geodesic. If your pipeline should not run if certain variables are not supplied, you can create an initial step like this:

validate:
    title: "Validate"
    description: "Ensure build parameters are present"
    stage: Prepare
    image: cloudposse/geodesic:0.116.0
    entry_point: /etc/codefresh/require_vars
    cmd:
      - |-
        ${{GITHUB_REPO_STATUS_TOKEN}} Personal Access Token used to give scripts
        permission to update the "status check" status on GitHub pull requests
      - ${{SLACK_WEBHOOK_URL}} Secret URL used by scripts to send updates to a Slack channel
      - |-
        ${{AWS_DOCKER_REPO_HOST}} The host hame portion of the ECR Docker repo to use.
        Typically something like 123456789012.dkr.ecr.us-east-1.amazonaws.com
      - |-
        ${{CF_PULL_REQUEST_NUMBER}} The PR number from GitHub.
        The PR number is only set if this build was triggered in relation to a PR.
        Requiring this to be present means requiring this pipeline to only work with PRs.

The PR number is only set if this build was triggered with a PR.
Requiring this to be present means requiring this pipeline to only work with PRs.

This step will verify the referenced variables are set and output the text as an error if they are not, causing the build to fail at that step. We recommend using it whenever a pipeline would not function properly with missing variables.

The other thing is that if a variable is optional, you test whether it is set or not by checking to see if the variable's value contains the variable's name. So, if a step should only run if there is not a PR number assigned, you can add a test like this:

when: 
  condition: 
    all: 
      isNotPR: 'includes("${{CF_PULL_REQUEST_NUMBER}}", "CF_PULL_REQUEST_NUMBER") == true'

(It may seem, and really should be, redundant to have == true but it seems to be required.)

It is kind of weird and confusing but makes sense once you get the hang of it. Please feel free to reach out to me if you have further questions.

PodDisruptionBudget “Gotchas”

JeremyBest Practices, DevOpsLeave a Comment

To allow the Kubernetes Cluster Autoscaler to move pods around, pods (sometimes) need PodDistruptionBudgets which can specify either minAvailable or maxUnavailable, but not both. There are a bunch of “gotchas” to look out for.

You cannot set minAvailable: 0
It's not that you can't set minAvailable it to zero, but since you are not allowed to set both minAvailable and maxUnavailable, most Helm charts have code like:

 

{{- if .Values.podDisruptionBudget.minAvailable }}
minAvailable: {{ .Values.podDisruptionBudget.minAvailable }}
{{- end  }}
{{- if .Values.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ .Values.podDisruptionBudget.maxUnavailable }}
{{- end  }}

If you set minAvailable: 0 that is the same as not setting it. That results in neither value getting set, which is effectively the same as setting maxUnavailable: 0 (which, of course, you also cannot do).

The fix: set minAvailable: "0%". So you may wonder, “why to bother setting a PodDisruptionBudget at all, if you are going to allow all the pods to be deleted?” Well, the reason is that this gives the Autoscaler explicit permission to evict pods that it might otherwise be too cautious about evicting. For example, anything that uses emptyDir for anything. The contents of emptyDir will be lost when the pod is evicted, so the Autoscaler will not evict the pods without explicit permission to avoid deleting something important.

It is dangerous to set minAvailable: 1
It may seem innocuous to set minAvailable: 1, but frequently we scale deployments down to 1 instance, and if you combine that with minAvailable: 1 then you are stuck with a single pod that cannot be evicted, which in turn will prevent the instance from being taken out of service.

Not a fix: minAvailable: 25%. Kubernetes does not round these percentages to the nearest, it always rounds up. So with 1 replica, it will round 25% up to 1, so this is not a fix at all.

The fix: set maxUnavailable: 50%. That will avoid knocking out the service when there are replicas to spare but not prevent the service from being evicted.

Sometimes you have to set both minAvailable and maxUnavailable
While you are not allowed to set both minAvailable and maxUnavailable in the actual PodDisruptionBudget resource, if the Helm chart provides a default value for one of them and you want to use the other one, you have to set both in the helmfile. Set the one you do not want to "" to hint to readers that you are unsetting a default.

Jenkins Pros & Cons (2020)

Erik OstermanRelease Engineering & CI/CDLeave a Comment

I spent some time this weekend getting caught up on the state of Jenkins in 2020. This post will focus on the pros and cons of Jenkins (not Jenkins X – which is a complete rewrite). My objective was to set up Jenkins following “Infrastructure as Code” best practices on Kubernetes using Helm. As part of this, I wanted to see a modern & clean UI throughout and create a positive developer experience. Below is more or less a brain dump of this experiment.

Cons

  • Jenkins has a lot of redundant plugins. Knowing which one to use takes some experimentation and failed attempts. The most common example cited is “docker.” Personally, I don't mind the hunt – that's part of the fun.
  • Jenkins has many plugins that seem no longer maintained. It's important to make sure whatever plugins you chose are still receiving regular updates (as in something pushed within the last ~12 months).
  • Not all plugins are compatible with Declarative Pipelines. IMO using Declarative Pipelines is the current gold standard for Jenkins. Raw imperative groovy pipelines are notoriously complicated and unmanageable.
  • No less than a few dozen plugins are required to “modernize” Jenkins. The more plugins, the greater the chance of problems during upgrades. This can be somewhat mitigated by using command-line-driven tools run inside containers instead of installing some of the more exotic plugins (credit: Steve Boardwell).
  • There's no (maintained) YAML interface for Jenkins Pipelines (e.g. Jenkinsfile.yaml). Most modern CI/CD platforms today have adopted YAML for pipeline configuration. In fact, Jenkins X has also moved to YAML. The closest thing I could find was an alpha-grade prototype with no commits in 18 months.
  • The Kubernetes Plugin works well but complicates Docker Builds. Running the Jenkins Slaves on Kubernetes and then building containers requires some trickery. There are a few options, but the easiest one is to modify the PodTemplate to bind-mount /var/run/docker.sock. This is not a best-practice, however, because it exposes the host-OS to bad actors. Basically, if you have access to the docker socket, you can do anything you want on the host OS. The alternatives like running “PodMan”, “Buildah”, “Kaniko”, “Makisu”, or “Docker BuildKit” on Jenkins have virtually zero documentation, so I didn't try it.
  • The PodTemplate approach is problematic in a modern CI/CD environment. Basically, with a PodTemplate you have to define before the Jenkins slave starts the types of containers you're going to need as part of your pipeline. For example, you define one PodTemplate with docker, golang and terraform. When the Jenkins slave starts up, a Kubernetes Pod will be launched with 3 contains (docker, golang and terraform). One nice thing is that all those containers will be able to share a filesystem and talk over localhost since they are in the same Pod. Also, since it's a Pod, Kubernetes will be able to properly schedule where that Pod should start, and if you have autoscaling configured, new nodes will be spun up on-demand. The problem with this, however, is subtle. What if you want a 4th container to run that is a product of the “docker” container and share the same filesystem? there's no really easy way to do that. These days, we will frequently build a docker container in one step, then run that container and execute some tests in the next step. I'm sure this can be achieved, but nowhere near as easily as with codefresh.
  • It's still not practical today to run “multi-master” Jenkins for High Availability without using Jenkins Enterprise. That said, I think it's moot when operating Jenkins on Kubernetes with Helm. Kubernetes is constantly monitoring the Jenkins process and will restart it if unhealthy (and I tested this inadvertently!). Also, when using Helm if the rollout fails health checks, the previous generation will stay online allowing the bugs to be fixed.
  • Docker layer caching is non-trivial if running with Ephemeral Jenkins Slaves under kubernetes. If you have a large pool of nodes, chances are that every build will hit a new node, thus not taking advantage of layer caching. Alternatively, if using the “Docker in Docker” (dnd) build-strategy every build will necessarily pull down all the layers. This will both add considerably to transit costs and build times as docker images are easily 1G these days.
  • There's lots of stale/out-of-date documentation for Jenkins. I frequently stumbled on how to implement something that seemed pretty basic. Anyways, this is true of any mature ecosystem that has a tremendous amount of innovation, lots of open sources, and been around for 20 years.
  • The “yaml” escape hatch for defining pod specs is sinfully ugly. In fact, I think it's a horrible precedent that will turn people off from Jenkins. It's part of what gives it a bad wrap. The rest of the Jenkinsfile DSL is rather clean and readable, but embedding raw YAML into my declarative pipelines is not a practice I would encourage for any team. To be fair,  some of the ugliness could be eliminated by using readFile or readTrusted steps (credit: Steve Boardwell), but again it’s not that simple.

Pros

I would like to end this on a positive note. All in all, I was very pleasantly surprised by how far Jenkins has come in the past few years since we last evaluated it.

  • Helm chart makes it trivial to deploy Jenkins in a “GitOps” friendly way
  • Blue Ocean + Material Theme for Jenkins makes it look like any other modern CI/CD system
  • Rich ecosystem of Plugins enables the ultimate level customization, much more than any SaaS
  • Overall simple architecture to deploy (when compared to “modern” CI/CD systems). No need to run tons of backing services.
  • Easily extract metrics from your build system into a platform like Prometheus. Centralize your monitoring of things running inside of CI/CD infrastructure. This is very difficult (or not even possible) to do with many SaaS offerings.
  • Achieve greater economies of scale by leveraging your existing infrastructure to build your projects.  If you run Jenkins on Kubernetes, you immediately get all the other benefits. Spin up node pools powered by “SpotInst.com Ocean” and get cheap compute capacity with preemptible “spot” instances. If you're running Prometheus with Grafana, you can leverage that to monitor all your build infrastructure.
  • Integrate with Single Signon without paying the SSO tax levied by enterprise software.
  • Arguably the Jenkinsfile Declarative Pipelines DSL is very readable, in fact, it looks a lot like HCL (HashiCorp Configuration Language). To some, this will be a “Con” – especially if YAML is a requirement.
  • Jenkins “Configuration as Code” plugin supports nearly everything you'd need to run Jenkins itself in a “GitOps” compliant manner. And if it doesn’t there is always the configuration-as-code-groovy plugin which allows you to run arbitrary Groovy scripts for the bits you need (credit: Steve Boardwell).
  • Jenkins can be easily deployed for multiple teams. This is an easy way to mitigate one of the common complaints that Jenkins is unstable because lots of different teams have their hands in the cookie jar.
  • Jenkins can be used much like the modern CI/CD forms that use container steps rather than complicated groovy scripts. This is to say, yes, teams can do bad things with Jenkins but with the right “best practices” your pipelines should be about as manageable as any developed with CircleCI or Codefresh. Stick to using container steps to reduce the complexity in the pipelines themselves.
  • Jenkins Shared Libraries are also pretty awesome (and also one of the most polarizing features). What I like about the libraries is the ability for teams to define “Pipelines as Interfaces”. That is, applications or services in your organization should almost always be deployed in the same way. Using versioned libraries of pipelines helps to achieve this without necessarily introducing instability.
  • Just like with GitHub Actions, with Jenkins, it's possible to “auto-discover” new repositories and pipelines. This is sweet because it eliminates all the ClickOps associated with most other CI/CD systems including CircleCI, TravisCI, and Codefresh. I really like it when I can just create a new repository, stick in a Jenkinsfile, and it “just works”.
  • Jenkins supports what seems like an unlimited number of credential backends. This is a big drawback with most SaaS-based CI/CD platforms. With the Jenkins credential backends, it's possible to “plug and play” things like “AWS SSM Parameter Store”, “AWS Secrets Manager” or HashiCorp Vault. I like this more than trusting some smaller third-party to securely handle my AWS credentials!
  • Jenkins PodTemplates supports annotations, which means we can create specially crafted templates that will automatically assume AWS roles. This is rad because we don't even need to hardcode any AWS credentials as part of our CI/CD pipelines. For GitOps, this is a holy grail.
  • Jenkins is 100% Free and Open Source. You can upgrade and get commercial support from Cloud Bees which also includes a “tried and tested” version of Jenkins (albeit more limited in the selection of plugins).

To conclude, Jenkins is still one of the most powerful Swiss Army knifes to get the job done. I feel like with Jenkins anything is possible, albeit sometimes with more effort and 3 dozen plugins. As systems integrators, we're constantly faced with yet-unknown requirements that pop-up at the last minute. Adopting tools that provide “escape hatches” provide a kind of “peace of mind” knowing we can solve any problem.

Parts of it feel dated like the GUI Configurations, but that is mitigated by Configuration as Code and GitOps. I wish some things like building and running Docker containers inside of pipelines on Kubernetes was easier. Let's face it. Jenkins is not the cool kid on the block anymore and there are many great tools out there. But the truth is few will stand the test of time the way Jenkins has in the Open Source and Enterprise space.

Must-Have Plugins

  • kubernetes (we tested 1.21.2) is what enables Jenkins Slaves to be spun upon demand. It comes preconfigured when using the official Jenkins helm chart.
  • workflow-job (we tested 2.36)
  • credentials-binding (we tested 1.20)
  • git (we tested 4.0.0)
  • workflow-multibranch (we tested 2.21) is essential for keeping Jenkinsfiles in your repos. The multi-branch pipeline detects branches, tags, etc, from within a configured repository, so Jenkins works more like Circle, Codefresh or Travis.
  • github-branch-source (we tested 2.5.8) – once configured will scan your GitHub organizations for new repositories and automatically pick up new pipelines. I really wish more CI/CD platforms had this level of autoconfiguration.
  • workflow-aggregator (we tested 2.6)
  • configuration-as-code (we tested 1.35) allows nearly the entire Jenkins configuration to be defined as code
  • greenballs (we tested 1.15) because I've always green is the color of success =P
  • blueocean (we tested 1.21.0) gives Jenkins a modern look. It clearly depicts stages, progress, and most other systems we've seen like CircleCI, Travis or Codefresh.
  • pipeline-input-step(we tested 2.11)
  • simple-theme-plugin (we tested 0.5.1) allows all CSS to be extended. Combined with the “material” theme for Jenkins you get a complete facelift.
  • ansicolor (we tested 0.6.2) – because many tools these days have ANSI output like Terraform or NPM. It's easy to disable the color output, but as a developer, I like the colors as it helps me quickly parse the screen output.
  • slack (we tested 2.35)
  • saml (we tested 1.1.5)
  • timestamper (we tested 1.10) – because with long-running steps, it's helpful to know how much time elapsed between lines of output

Pro Tips

  • Add the following nginx-ingress annotation to make Blue Ocean the default”
    nginx.ingress.kubernetes.io/app-root: /blue/organizations/jenkins/pipelines
  • Use this Material Theme for Jenkins with the simple-theme-plugin to get a beautiful-looking Jenkins
  • Hide the [Pipeline] output with some simple CSS and the simple-theme-plugin:

    .pipeline-new-node {
    display: none;
    }

References

When researching this post and my “Proof of Concept”, I referenced some links and articles.

Effortless Helm Chart Deployments (Video & Slides)

adminDevOps, Meetup, Release Engineering & CI/CDLeave a Comment


Learn how to deploy complex service-oriented architectures easily using Helmfiles. Forget umbrella charts and manual helm deployments. Helmfile is the missing piece of the puzzle. Helmfiles are the declarative way to deploy Helm charts in a 12-factor compatible way. They're great for deploying all your kubernetes services and even for Codefresh continuous delivery to Kubernetes. We'll show you exactly how we do it with a live demo, including public repos for all our helmfiles.

Top 5 DevOps Bad Habbits

adminDevOpsLeave a Comment

Recently, we were asked to name our top (5) DevOps “Worst Practices” (or anti-patterns). Here's what we came up with…

#1. Not looking outside the organization to see how others are solving the problem—always building new things rather than looking for readymade solutions (open source, SaaS, or enterprise offerings) which leads to piles of technical debt & inevitable snowflake infrastructures.

#2. Not building easy tools that the rest of the company can use. We must never forget who we are serving. Developers are our customers too.

#3. Not treating DevOps as a shared responsibility. It needs to be embedded into the engineering organization, not relegated to a select few individuals. “DevOps” is more of a philosophy than a job title.

#4. Not treating Infrastructure as Code. We call this new paradigm GitOps, where Git system of record for all infrastructure and CI/CD is our delivery mechanism.

#5. Never commit to `main`. We don't do it in regular software projects, we shouldn't do it for ops. Everyone should follow the standard Git Workflow on their Infrastructure Code (Feature Branching, Pull Requests, Code Reviews, CI/CD). This increases transparency and helps the rest of the team stay up-to-date with everything going on.