PodDisruptionBudget “Gotchas”

JeremyBest Practices, DevOpsLeave a Comment

2 min read

To allow the Kubernetes Cluster Autoscaler to move pods around, pods (sometimes) need PodDistruptionBudgets which can specify either minAvailable or maxUnavailable, but not both. There are a bunch of “gotchas” to look out for.

You cannot set minAvailable: 0
It's not that you can't set minAvailable it to zero, but since you are not allowed to set both minAvailable and maxUnavailable, most Helm charts have code like:

 

{{- if .Values.podDisruptionBudget.minAvailable }}
minAvailable: {{ .Values.podDisruptionBudget.minAvailable }}
{{- end  }}
{{- if .Values.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ .Values.podDisruptionBudget.maxUnavailable }}
{{- end  }}

If you set minAvailable: 0 that is the same as not setting it. That results in neither value getting set, which is effectively the same as setting maxUnavailable: 0 (which, of course, you also cannot do).

The fix: set minAvailable: "0%". So you may wonder, “why to bother setting a PodDisruptionBudget at all, if you are going to allow all the pods to be deleted?” Well, the reason is that this gives the Autoscaler explicit permission to evict pods that it might otherwise be too cautious about evicting. For example, anything that uses emptyDir for anything. The contents of emptyDir will be lost when the pod is evicted, so the Autoscaler will not evict the pods without explicit permission to avoid deleting something important.

It is dangerous to set minAvailable: 1
It may seem innocuous to set minAvailable: 1, but frequently we scale deployments down to 1 instance, and if you combine that with minAvailable: 1 then you are stuck with a single pod that cannot be evicted, which in turn will prevent the instance from being taken out of service.

Not a fix: minAvailable: 25%. Kubernetes does not round these percentages to the nearest, it always rounds up. So with 1 replica, it will round 25% up to 1, so this is not a fix at all.

The fix: set maxUnavailable: 50%. That will avoid knocking out the service when there are replicas to spare but not prevent the service from being evicted.

Sometimes you have to set both minAvailable and maxUnavailable
While you are not allowed to set both minAvailable and maxUnavailable in the actual PodDisruptionBudget resource, if the Helm chart provides a default value for one of them and you want to use the other one, you have to set both in the helmfile. Set the one you do not want to "" to hint to readers that you are unsetting a default.

Jenkins Pros & Cons (2020)

Erik OstermanRelease Engineering & CI/CDLeave a Comment

9 min read

I spent some time this weekend getting caught up on the state of Jenkins in 2020. This post will focus on the pros and cons of Jenkins (not Jenkins X – which is a complete rewrite). My objective was to set up Jenkins following “Infrastructure as Code” best practices on Kubernetes using Helm. As part of this, I wanted to see a modern & clean UI throughout and create a positive developer experience. Below is more or less a brain dump of this experiment.

Cons

  • Jenkins has a lot of redundant plugins. Knowing which one to use takes some experimentation and failed attempts. The most common example cited is “docker.” Personally, I don't mind the hunt – that's part of the fun.
  • Jenkins has many plugins that seem no longer maintained. It's important to make sure whatever plugins you chose are still receiving regular updates (as in something pushed within the last ~12 months).
  • Not all plugins are compatible with Declarative Pipelines. IMO using Declarative Pipelines is the current gold standard for Jenkins. Raw imperative groovy pipelines are notoriously complicated and unmanageable.
  • No less than a few dozen plugins are required to “modernize” Jenkins. The more plugins, the greater the chance of problems during upgrades. This can be somewhat mitigated by using command-line-driven tools run inside containers instead of installing some of the more exotic plugins (credit: Steve Boardwell).
  • There's no (maintained) YAML interface for Jenkins Pipelines (e.g. Jenkinsfile.yaml). Most modern CI/CD platforms today have adopted YAML for pipeline configuration. In fact, Jenkins X has also moved to YAML. The closest thing I could find was an alpha-grade prototype with no commits in 18 months.
  • The Kubernetes Plugin works well but complicates Docker Builds. Running the Jenkins Slaves on Kubernetes and then building containers requires some trickery. There are a few options, but the easiest one is to modify the PodTemplate to bind-mount /var/run/docker.sock. This is not a best-practice, however, because it exposes the host-OS to bad actors. Basically, if you have access to the docker socket, you can do anything you want on the host OS. The alternatives like running “PodMan”, “Buildah”, “Kaniko”, “Makisu”, or “Docker BuildKit” on Jenkins have virtually zero documentation, so I didn't try it.
  • The PodTemplate approach is problematic in a modern CI/CD environment. Basically, with a PodTemplate you have to define before the Jenkins slave starts the types of containers you're going to need as part of your pipeline. For example, you define one PodTemplate with docker, golang and terraform. When the Jenkins slave starts up, a Kubernetes Pod will be launched with 3 contains (docker, golang and terraform). One nice thing is that all those containers will be able to share a filesystem and talk over localhost since they are in the same Pod. Also, since it's a Pod, Kubernetes will be able to properly schedule where that Pod should start, and if you have autoscaling configured, new nodes will be spun up on-demand. The problem with this, however, is subtle. What if you want a 4th container to run that is a product of the “docker” container and share the same filesystem? there's no really easy way to do that. These days, we will frequently build a docker container in one step, then run that container and execute some tests in the next step. I'm sure this can be achieved, but nowhere near as easily as with codefresh.
  • It's still not practical today to run “multi-master” Jenkins for High Availability without using Jenkins Enterprise. That said, I think it's moot when operating Jenkins on Kubernetes with Helm. Kubernetes is constantly monitoring the Jenkins process and will restart it if unhealthy (and I tested this inadvertently!). Also, when using Helm if the rollout fails health checks, the previous generation will stay online allowing the bugs to be fixed.
  • Docker layer caching is non-trivial if running with Ephemeral Jenkins Slaves under kubernetes. If you have a large pool of nodes, chances are that every build will hit a new node, thus not taking advantage of layer caching. Alternatively, if using the “Docker in Docker” (dnd) build-strategy every build will necessarily pull down all the layers. This will both add considerably to transit costs and build times as docker images are easily 1G these days.
  • There's lots of stale/out-of-date documentation for Jenkins. I frequently stumbled on how to implement something that seemed pretty basic. Anyways, this is true of any mature ecosystem that has a tremendous amount of innovation, lots of open sources, and been around for 20 years.
  • The “yaml” escape hatch for defining pod specs is sinfully ugly. In fact, I think it's a horrible precedent that will turn people off from Jenkins. It's part of what gives it a bad wrap. The rest of the Jenkinsfile DSL is rather clean and readable, but embedding raw YAML into my declarative pipelines is not a practice I would encourage for any team. To be fair,  some of the ugliness could be eliminated by using readFile or readTrusted steps (credit: Steve Boardwell), but again it’s not that simple.

Pros

I would like to end this on a positive note. All in all, I was very pleasantly surprised by how far Jenkins has come in the past few years since we last evaluated it.

  • Helm chart makes it trivial to deploy Jenkins in a “GitOps” friendly way
  • Blue Ocean + Material Theme for Jenkins makes it look like any other modern CI/CD system
  • Rich ecosystem of Plugins enables the ultimate level customization, much more than any SaaS
  • Overall simple architecture to deploy (when compared to “modern” CI/CD systems). No need to run tons of backing services.
  • Easily extract metrics from your build system into a platform like Prometheus. Centralize your monitoring of things running inside of CI/CD infrastructure. This is very difficult (or not even possible) to do with many SaaS offerings.
  • Achieve greater economies of scale by leveraging your existing infrastructure to build your projects.  If you run Jenkins on Kubernetes, you immediately get all the other benefits. Spin up node pools powered by “SpotInst.com Ocean” and get cheap compute capacity with preemptible “spot” instances. If you're running Prometheus with Grafana, you can leverage that to monitor all your build infrastructure.
  • Integrate with Single Signon without paying the SSO tax levied by enterprise software.
  • Arguably the Jenkinsfile Declarative Pipelines DSL is very readable, in fact, it looks a lot like HCL (HashiCorp Configuration Language). To some, this will be a “Con” – especially if YAML is a requirement.
  • Jenkins “Configuration as Code” plugin supports nearly everything you'd need to run Jenkins itself in a “GitOps” compliant manner. And if it doesn’t there is always the configuration-as-code-groovy plugin which allows you to run arbitrary Groovy scripts for the bits you need (credit: Steve Boardwell).
  • Jenkins can be easily deployed for multiple teams. This is an easy way to mitigate one of the common complaints that Jenkins is unstable because lots of different teams have their hands in the cookie jar.
  • Jenkins can be used much like the modern CI/CD forms that use container steps rather than complicated groovy scripts. This is to say, yes, teams can do bad things with Jenkins but with the right “best practices” your pipelines should be about as manageable as any developed with CircleCI or Codefresh. Stick to using container steps to reduce the complexity in the pipelines themselves.
  • Jenkins Shared Libraries are also pretty awesome (and also one of the most polarizing features). What I like about the libraries is the ability for teams to define “Pipelines as Interfaces”. That is, applications or services in your organization should almost always be deployed in the same way. Using versioned libraries of pipelines helps to achieve this without necessarily introducing instability.
  • Just like with GitHub Actions, with Jenkins, it's possible to “auto-discover” new repositories and pipelines. This is sweet because it eliminates all the ClickOps associated with most other CI/CD systems including CircleCI, TravisCI, and Codefresh. I really like it when I can just create a new repository, stick in a Jenkinsfile, and it “just works”.
  • Jenkins supports what seems like an unlimited number of credential backends. This is a big drawback with most SaaS-based CI/CD platforms. With the Jenkins credential backends, it's possible to “plug and play” things like “AWS SSM Parameter Store”, “AWS Secrets Manager” or HashiCorp Vault. I like this more than trusting some smaller third-party to securely handle my AWS credentials!
  • Jenkins PodTemplates supports annotations, which means we can create specially crafted templates that will automatically assume AWS roles. This is rad because we don't even need to hardcode any AWS credentials as part of our CI/CD pipelines. For GitOps, this is a holy grail.
  • Jenkins is 100% Free and Open Source. You can upgrade and get commercial support from Cloud Bees which also includes a “tried and tested” version of Jenkins (albeit more limited in the selection of plugins).

To conclude, Jenkins is still one of the most powerful Swiss Army knifes to get the job done. I feel like with Jenkins anything is possible, albeit sometimes with more effort and 3 dozen plugins. As systems integrators, we're constantly faced with yet-unknown requirements that pop-up at the last minute. Adopting tools that provide “escape hatches” provide a kind of “peace of mind” knowing we can solve any problem.

Parts of it feel dated like the GUI Configurations, but that is mitigated by Configuration as Code and GitOps. I wish some things like building and running Docker containers inside of pipelines on Kubernetes was easier. Let's face it. Jenkins is not the cool kid on the block anymore and there are many great tools out there. But the truth is few will stand the test of time the way Jenkins has in the Open Source and Enterprise space.

Must-Have Plugins

  • kubernetes (we tested 1.21.2) is what enables Jenkins Slaves to be spun upon demand. It comes preconfigured when using the official Jenkins helm chart.
  • workflow-job (we tested 2.36)
  • credentials-binding (we tested 1.20)
  • git (we tested 4.0.0)
  • workflow-multibranch (we tested 2.21) is essential for keeping Jenkinsfiles in your repos. The multi-branch pipeline detects branches, tags, etc, from within a configured repository, so Jenkins works more like Circle, Codefresh or Travis.
  • github-branch-source (we tested 2.5.8) – once configured will scan your GitHub organizations for new repositories and automatically pick up new pipelines. I really wish more CI/CD platforms had this level of autoconfiguration.
  • workflow-aggregator (we tested 2.6)
  • configuration-as-code (we tested 1.35) allows nearly the entire Jenkins configuration to be defined as code
  • greenballs (we tested 1.15) because I've always green is the color of success =P
  • blueocean (we tested 1.21.0) gives Jenkins a modern look. It clearly depicts stages, progress, and most other systems we've seen like CircleCI, Travis or Codefresh.
  • pipeline-input-step(we tested 2.11)
  • simple-theme-plugin (we tested 0.5.1) allows all CSS to be extended. Combined with the “material” theme for Jenkins you get a complete facelift.
  • ansicolor (we tested 0.6.2) – because many tools these days have ANSI output like Terraform or NPM. It's easy to disable the color output, but as a developer, I like the colors as it helps me quickly parse the screen output.
  • slack (we tested 2.35)
  • saml (we tested 1.1.5)
  • timestamper (we tested 1.10) – because with long-running steps, it's helpful to know how much time elapsed between lines of output

Pro Tips

  • Add the following nginx-ingress annotation to make Blue Ocean the default”
    nginx.ingress.kubernetes.io/app-root: /blue/organizations/jenkins/pipelines
  • Use this Material Theme for Jenkins with the simple-theme-plugin to get a beautiful-looking Jenkins
  • Hide the [Pipeline] output with some simple CSS and the simple-theme-plugin:

    .pipeline-new-node {
    display: none;
    }

References

When researching this post and my “Proof of Concept”, I referenced some links and articles.

SweetOps Newsletter – Issue #3

adminNewsletters

7 min read

Job Board

We've launched our jobs site to keep track of new opportunities posted by companies in our community. Have a job you want to post? Ping @erik on slack.

Check it out here: 👉https://sweetops.com/jobs/

“Office Hours” Pod Cast

We've launched a podcast, which is a syndicated version of our weekly “Office Hours”. Tune in on the train or in the car!

Subscribe here: 👉https://podcast.cloudposse.com

Slack Archive Search

You can now search

Slack Archive Search

We've added search functionality to our slack archives with Algolia. Now you can easily find solutions to common bugs or problems encountered by our community.

Kubernetes News

A lot of interesting links were shared related to the Kubernetes ecosystem. Here are some that stood out!

A Practical Guide to Setting Kubernetes Requests and Limits

Setting Kubernetes requests and limits effectively has a major impact on application performance, the stability of applications and the cluster, as well as cost. And yet working with many teams over the past year has shown us that determining the right values for these parameters is hard or worse yet teams don't even set them! Kubecost created this short guide to help teams more accurately set Kubernetes requests and limits for their applications.

Excellent Resource to Learn Kubernetes
Excellent Resource to Learn Kuberneteslearnk8s.io

If you want to learn more about Kubernetes, this is a phenomenal resource for learning the basics. For example, here's how to increase kubectl productivity when looking up resources, customizing the columns in the output format, switching between clusters and namespaces with ease, using auto-generated aliases, and extending kubectl with plugins!

CNCF Cloud Native Interactive Landscape
CNCF Cloud Native Interactive Landscapelandscape.cncf.io

There's an overwhelming number of cloud native technologies and open source tools. How do they all fit in? Here's a site that let's you filter and sort by GitHub stars, funding, commits, contributors, hq location, and tweets.

Terraform News

We've been hard at work upgrading all of our Terraform Modules to support Terraform 0.12 (HCL2). As part of this, we've added automated tests using a combination of bats and terratest. We publish all of our hundreds of Terraform modules in the official HashiCorp registry.

terraform-aws-eks-fargate-profile
terraform-aws-eks-fargate-profile github.com

Terraform module to provision an EKS Fargate Profile to spin up a node group that run pods entirely on Fargate

terraform-aws-eks-node-group
terraform-aws-eks-node-groupgithub.com

Terraform module to provision a fully managed AWS EKS Node Group using terraform. Plus it works with all of our other EKS node pool modules so it's possible to run multiple different types of node pools in the same EKS cluster.

terraform-aws-eks-cluster
terraform-aws-eks-clustergithub.com

We've updated our Terraform modules for provisioning an EKS cluster to support Terraform 0.12. As part of this, we've added terratest to run automated infrastructure tests. Check it out here!

Want more news? Check out our Slack archives to learn what our community is all about.

Jobs

SpotOn Senior DevOps Engineer
SpotOn Senior DevOps Engineersweetops.com

SpotOn is a cutting-edge software company dedicated to redefining the merchant services industry. SpotOn, is hiring in either Krakow, Poland or Mexico City, MX for a seasoned DevOps engineer. BTW, they use TONS of Cloud Posse Terraform modules and Codefresh! =)

SettleMint Experienced SRE
SettleMint Experienced SREsweetops.com

SettleMint makes blockchain technology deployment, development and integration practical for the enterprise. Settlement is looking for someone in Belgium for a full on helm, kubernetes, helmfile gig (fulltime) in the Blockchain space.

What is a DevOps Accelerator?

Curious about what well built infrastructure looks like? Check out what we do at Cloud Posse.

Find out if we can help you by taking our quiz.

Free Weekly “Office Hours” with Cloud Posse

You are invited to our weekly Zoom meetings every Wednesday at 11:30 am PST (GMT-8). Join us to talk shop! This is an informal gathering where you get to ask questions and watch demos.

Register here: 👉https://cloudposse.com/office-hours/

After registering, you will receive a confirmation email containing information about joining the meeting.

SweetOps Newsletter – Issue #2

adminNewsletters

8 min read

This past week we crossed 1,600 members! That's means we've grown by over 60% since July. We now span 57 timezones with over 600 DAU. An enormous amount of insightful information has been shared during this time. Thank you everyone for your contributions and generous support! Please keep them coming.

If you haven't yet signed up for our Slack team, join us!

Kubernetes News

Easily Import Secrets to Kubernetes
Easily Import Secrets to Kubernetesgithub.com

Easily populate Kubernetes secrets from 1Password (and others). This operator fetches secrets from cloud services and injects them in Kubernetes. ContainerSolutions/externalsecret-operator

Kubernetes Development Environments
Kubernetes Development Environmentsgarden.io

Garden looks interesting! It automates the repetitive parts of your workflow to make developing for Kubernetes and cloud faster & easier.

Ship Kubernetes Event Stream to Sentry!
Ship Kubernetes Event Stream to Sentry!github.com

Let's be honest. Errors and warnings in Kubernetes often go unnoticed by operators. Even when they are noticed, we might not realize with what frequency they occur and we lose the context of what else is going on in the cluster. With this tiny service deployed in your cluster, you'll get all errors and warnings loaded into Sentry where they will be cleanly presented and intelligently grouped. Plus, you can leverage all the typical Sentry features such as notifications and comments which can then be used to help operations and give developers additional visibility.

Terraform News

HashiCorp Forums are Live!
HashiCorp Forums are Live!discuss.hashicorp.com

HashiCorp has finally launched their public support forums using Discourse. This is awesome stuff! Get help from the community for all major products like Terraform, Vault and Consul.

Export ClickOps to Terraform
Export ClickOps to Terraformgithub.com
CLI tool to generate terraform files from existing infrastructure (reverse Terraform). Infrastructure to Code – GoogleCloudPlatform/terraformer

Cloud Posse ECS Terraform Modules
Cloud Posse ECS Terraform Modulesgithub.com

We've upgraded all of our ECS terraform modules to support Terraform 0.12 (HCL2). As part of this, we've implemented terratest to all the ECS modules so we can review your contributions quicker and provide greater stability!

Security News

Google Warns LastPass Users Were Exposed To ‘Last Password’ Credential Leak

Google Project Zero security researcher reveals that the LastPass password manager could, somewhat ironically, leak the last password you used to any website you visited. Ouch.

Sudo Flaw Lets Linux Users Run Commands As Root Even When They're Restricted
Sudo Flaw Lets Linux Users Run Commands As Root Even When They're Restrictedthehackernews.com

A vulnerability in Sudo, tracked as CVE-2019-14287, could allow Linux users to run commands as root user even when they're restricted. How can we still be finding bugs in sudo decades later?

If you’re not using SSH certificates you’re doing SSH wrong
If you’re not using SSH certificates you’re doing SSH wrongsmallstep.com

SSH has some pretty gnarly issues when it comes to usability, operability, and security. The good news is this is all easy to fix. SSH is ubiquitous. It’s the de-facto solution for remote administration of *nix systems. SSH certificate authentication makes SSH easier to use, easier to operate, and more secure.

(Pro tip: use teleport by Gravitational)

Kubernetes 'Billion Laughs' Vulnerability Is No Laughing Matter
Kubernetes ‘Billion Laughs' Vulnerability Is No Laughing Matterthenewstack.io
A new vulnerability has been discovered within the Kubernetes API. This flaw is centered around the parsing of YAML manifests by the Kubernetes API server. During this process the API server is open to potential Denial of Service (DoS) attacks. The issue (CVE-2019-11253 — which has yet to have any details fleshed out on the page) has been labeled a ‘Billion Laughs' attack because it targets the parsers to carry out the attack.

Want more? Check out our Slack archives to learn what our community is all about.

Jobs

Are you looking for your next gig? Check out our #jobs channel in SweetOps for recent postings. Here are some recent ones that have been posted.

Brian Tai writes “AuditBoard is hiring a DevOps Engineer! AuditBoard is a fast-growing startup located in the Greater Los Angeles area. Our offices are located in El Segundo and Cerritos. Our SaaS product consists of a suite of solutions for internal auditors to improve and streamline their day-to-day work. (imagine a GitHub/Trello hybrid for auditors) We have signed and continue to sign many new customers including Walmart, Snap, Toyota, and many others in the Fortune 500. “

DevOps Engineer - AuditBoard
DevOps Engineer – AuditBoardsoxhub.recruitee.com
DevOps Engineer – AuditBoard

Amanda Heironimus posted, “PlayQ is looking for a Senior Cloud Services Engineer to join our team in Santa Monica, CA. As a foundational member of our DevOps team, you’d receive the perfect amount of support from our global team while enjoying plenty of room to grow and contribute to new and exciting projects.We empower our teams to produce meaningful and impactful work, so you’ll also have the unique opportunity to take the lead in shaping and informing our infrastructure, managing deployments, and ensuring that mission-critical systems are functioning effectively and consistently.”

Job Application for Senior Cloud Services Engineer at PlayQ
Job Application for Senior Cloud Services Engineer at PlayQboards.greenhouse.io

Free Weekly “Office Hours” with Cloud Posse

You are invited to our weekly “Lunch & Learn” meetings via Zoom every Wednesday at 11:30 am PST (GMT-8). Join us to talk shop! This is an informal gathering of 10-15 people, where you get to ask questions and watch demos.

Register here:

https://zoom.us/meeting/register/dd2072a53834b30a7c24e00bf0acd2b8

After registering, you will receive a confirmation email and invite containing information about joining the meeting.

West LA DevOps Presents: Geodesic – Cloud Automation Shell (Video & Slides)

Erik OstermanSlidesLeave a Comment

2 min read


🚀 Geodesic is a cloud automation shell. It's the easy way to automate everything. Think of it as the superset of all other tools including (terraform, terragrunt, chamber, aws-vault, aws-okta, kops, gomplate, helm, helmfile, aws cli, variant, etc) that we use to automate workflows. It's a bit like a swiss army knife for creating and building consistent platforms to be shared across team environments. It easily versions staging/production/dev environments in a repeatable manner that can be followed by any team member with only a single dependency: docker. Because of this, it works with Mac OSX, Linux, and Windows 10. Learn how you can use the geodesic shell to improve your DevOps workflows! These are the slides from the live demo at the West Los Angeles DevOps Meetup.

Erik Osterman is the founder of Cloud Posse, a DevOps professional services company that specializes in cloud migrations and release engineering. Previously he was the Director of Cloud Architecture, for CBS Interactive where he led cloud strategy across the organization.

 


GitOps with Terraform on Codefresh (Webinar Video & Slides)

Erik OstermanSlides2 Comments

2 min read

🚀 Infrastructure as code, pipelines as code, and now we even have code as code! =P In this talk, we show you how we build and deploy applications with Terraform using GitOps with Codefresh. Cloud Posse is a power user of Terraform and have written over 140 Terraform modules. We’ll share how we handle automation with security while making the process easy for engineers.

Erik Osterman is the founder of Cloud Posse, a DevOps professional services company that specializes in cloud migrations and release engineering. Previously he was the Director of Cloud Architecture, for CBS Interactive where he led cloud strategy across the organization.