Billing
We charge a fraction of what it costs to do it in-house and deliver it in half the time or faster. Check out our calculator just to see how much it can cost to build your own cloud infrastructure. Make sure you review some of the risks of doing it yourself.
We practice “agile” development. We charge a flat fee per sprint but allow for scope changes (which are billed separately) at customer request. A typical engagement consists of 8-10 sprints that are 2 weeks (80 hours) in duration.
We believe in total transparency.
For this reason, you can expect no hidden fees from us.
IMPORTANT: Depending on the features you want to be implemented, certain third-party software subscriptions may be required (SaaS).
We do not include these costs in our contract because they are negotiated between your company and the vendor. Sometimes you may qualify for “startup” pricing.
Examples include:
- AWS
- Datadog
- NewRelic
- Sumologic
- Splunk
- Codefresh
- Teleport
- Kubecost
- Mailgun
- PagerDuty
- Pingdom
We'll sign a few contracts before we get started:
- Mutual NDA that describes how we'll govern sensitive information
- Master Services Agreement that describes the terms for all engagements we'll do down the road
- Statement of Work that describes at length exactly what you'll receive.
Our standard payment terms are Net 15.
Payment for each Sprint is due before work commences.
Our Sprints are billed at a weekly rate: we take our industry-standard hourly rate and multiply it by the number of man-hours in a week (40). If we exceed the allotted hours in a Sprint, we will invoice for the overage. This allows the Client to realize a final product that meets or exceeds its expectations, and prevents the Client from being held financially responsible for software that does not help it reach its objectives.
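As a rough illustration of the arithmetic above, here is a minimal sketch of how a Sprint fee follows from an hourly rate. The function name and the rate of 100 per hour are placeholders for illustration only, not our actual rate.

```python
HOURS_PER_WEEK = 40  # man-hours billed per week, as described above


def sprint_fee(hourly_rate: float, weeks: int = 2) -> float:
    """Fee for one Sprint: hourly rate x 40 hours x number of weeks."""
    if hourly_rate < 0 or weeks < 1:
        raise ValueError("hourly_rate must be >= 0 and weeks must be >= 1")
    return hourly_rate * HOURS_PER_WEEK * weeks


# Placeholder rate of 100/hour; a 2-week Sprint is 80 hours.
print(sprint_fee(100.0))  # 8000.0
```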
Upon execution of the agreement, we set up a project for each sprint in our Harvest Time Management System. This allows us to parallelize work while attributing it to a specific deliverable. Sprints represent a number of man-hours, but multiple individuals might be involved in the time & effort.
Upon commencement of a Sprint, we send an invoice for the stated amount. That's a retainer that we draw against to deliver the integration of our solution. If there's an overage, we have the option of rolling the remaining work into the next Sprint or consuming time against the “General Support” retainer. In all cases, we'll prepare a detailed, itemized invoice (called a “True-Up”) against that Sprint and bill for the excess hours in an entirely transparent manner. This is all handled automatically by our bookkeepers.
When you hire Cloud Posse, you're buying an outcome that few others can provide. What a company is really buying from Cloud Posse is an end-to-end solution that includes time for implementation and integration. This is a solution that has cost our customers millions of dollars to implement and we are selling for a tiny fraction of the cost to implement it in-house.
We are not a traditional “DevOps as a Service” company that only does the grunt work; we provide thought leadership combined with expert execution and implementation. We have chosen to use an “Open Source” licensing model to simplify the software distribution because we provide 10x the value in our implementation.
During the course of our engagement, our customers have direct access to our team, which has tremendous experience in cloud architecture & implementation. Companies hire us to implement in only 3-4 months what would take even the most senior, experienced team of DevOps engineers years to develop, which makes our offering insanely affordable by comparison. By partnering with Cloud Posse, you're sparing yourself all the hard “lessons learned” and achieving a greater outcome in a shorter amount of time with less risk.
You will find the industry-standard rate for experienced independent contractors/freelancers is around $150-250/hr. Note, when you hire freelancers they don't bring to the table the unparalleled library of code and experience that you get when you partner with Cloud Posse. We put our best foot forward on GitHub so you see exactly what you’re getting. Plus, freelancers and employees cannot offer business continuity, which leaves your company with no one to turn to when/if they leave or go on vacation. While a company might shave off a little bit on the hourly rate by going with an independent contractor, it's several orders of magnitude more expensive to implement a custom solution that is remotely comparable to what we offer; that solution will have greater uncertainty and result in greater risk for your business.
Time & Materials (T&M) projects align our goals with yours. Our company will work with you until your needs are met irrespective of the actual deliverables, providing the greatest likelihood of a successful outcome—everyone's end goal. This will allow you to realize a final product that meets or exceeds your expectations and prevents you from being held financially responsible for software that does not help you reach your objectives. It rewards the company for working harder to satisfy those needs or saves you money if the needs are met with less effort. More importantly, it gives you the maximum agility to decide what you need as the project progresses and to pivot at any time. Is there some new feature you just thought of and want right away? We'll get right on that. Something you thought you were going to need but now realize can wait until next year? We're happy to skip that and move on to what you care about most, even if that is different from what it was last week.
Fixed-fee (or Fixed-bid) projects are based on the outdated “waterfall” model where you define everything you are going to need to finish the project before work on the deliverables even starts. On top of locking you into a rigid set of deliverables before you are even sure that is what you want, they require significant extra time and effort (as much as 50% of the total project time) to define “acceptance criteria”, which is a mutually agreed set of tests that, when passed, define the project as finished. In addition, fixed-price bids transfer completion risk to the company, so a company is wise to double the estimated T&M and add 20% (the extra 20% is for all the time spent negotiating the acceptance criteria.) 🙂 Fixed-price bids incentivize the company to ignore your actual needs and focus on delivering the bare minimum to satisfy the acceptance criteria. Was there something important to you that you forgot to capture in the acceptance criteria? Sorry, that is “out of scope”; we will get to it on the next project. This is how the big consulting companies got so big. They know your “waterfall” project will fail, leading to a follow-on waterfall project to fix it, except that, too, will fail for the same reasons, leading to a never-ending stream of work for the consultants. They fatten their profits by charging for all the extra work which fixed-bid contracts require, and keep you on the hook by taking advantage of the “sunk cost fallacy.”
Time and Materials Not To Exceed (T&M NTE) has a couple of ways of working. At its worst, it has all the problems of a Fixed-fee project but it takes away any incentive for the company to give you a good rate. It caps the company’s profits but not their losses. No company can agree to this model and stay in business.
There is a second form of T&M NTE, that larger companies with more complex governance, budget, and financial control systems might prefer. It’s mostly the same, except with built-in circuit breakers ensuring that budget and finance departments will retain oversight so that if a project balloons in scope, appropriate people will be called in to review and triage features before the project becomes an unexpected drain on resources. It is important for all concerned to understand that there is no commitment to “finishing” a T&M NTE project because such a project does not have a defined, agreed-upon endpoint. Still, the company is incentivized to meet expectations with the given budget and to help identify cost savings where appropriate, in the hope of securing additional work on the ongoing project. Larger companies with more bureaucratic management and more sophisticated budgetary and financial controls may prefer this, while smaller, more nimble companies know that a standard T&M contract is tacitly a Not To Exceed in that it can be canceled at any time, for exceeding the budget or any other reason.
We prefer T&M projects because they eliminate the need for either side to argue about ambiguities in the project definitions. Nobody needs to be convinced that the deliverables are acceptable or unacceptable. Instead, we deliver to the best of our ability what we understood our customers wanted. If the customer wants something else, whether it is because their needs changed in the interim or they asked for the wrong thing in the first place, we can just accept that they want something different and get right to work delivering that. Your acceptance criteria can be whatever you want, from fully-automated tests to feedback surveys, and we are happy to do anything you want done differently. When working with T&M, we are truly on your side.
We use Harvest to track all our time by client, project, sprint, and developer. We then import these hours into QuickBooks for invoicing. We accept payments via ACH, Bill.com, and check. Customers always have access to real-time reports of hours accrued.
A typical “Statement of Work” includes a set number of Sprints. We try to keep a narrow scope for each Sprint so that we can tightly control how hours get spent to avoid overruns. We typically avoid adding tasks to a running Sprint so that the scope does not grow. That's also why we have an allocation for “General Support”, which is work that falls outside of the current Sprint. This is for special requests, meetings, pair programming sessions, extra documentation, etc.
All of our engagements are on a Time & Materials basis. We charge a 20% upfront deposit which is billed as a retainer for “General Support” and invoiced in advance of services (Due on Receipt) in order to commence an engagement. Sprints are billed as 80-hour retainers and invoiced in advance of services, typically on Net-30 terms. All unused prepaid balances are refundable or can be applied to outstanding invoices. As soon as we exhaust the 80 hours in a Sprint, we move all outstanding work to the next Sprint. Every time we finish a Sprint, we prepare a “True-Up” invoice: we pull all hours worked in the Sprint into an invoice, then apply a credit for the amount already paid towards that particular Sprint. The remaining balance is what is owed.
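To make the True-Up arithmetic concrete, here is a minimal sketch under stated assumptions: the function name and the numbers are hypothetical, and the real invoices are itemized from Harvest rather than computed this way. A positive result is the balance owed; a negative result is an unused prepaid balance that is refundable or can be applied to another invoice.

```python
def true_up_balance(hours_worked: float, hourly_rate: float, prepaid: float) -> float:
    """Hours worked at the hourly rate, minus the credit for what was prepaid."""
    total = hours_worked * hourly_rate
    return total - prepaid


# Placeholder figures: an 80-hour retainer prepaid at 100/hour.
prepaid = 80 * 100.0

print(true_up_balance(86, 100.0, prepaid))  # 600.0  (6 hours over: balance owed)
print(true_up_balance(74, 100.0, prepaid))  # -600.0 (6 hours unused: credit)
```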
If at any time you have questions about an invoice you've received, please do not hesitate to reach out to our accounting department.
Unfortunately, we're not able to take on small engagements. You can, however, join us every single week for 100% free “Office Hours”—where we seek to answer your questions. Just register for an invitation.
We hold our “Office Hours” every Wednesday at 11:30 am PT via Zoom. We're typically 30+ people on the call and all skill levels are welcome.
Check out our past recordings on YouTube or subscribe to our Podcast.
We accept all major forms of payment, including:
- ACH (preferred)
- Bill.com (preferred)
- US Check
- Credit Card (Visa, AMEX, MasterCard) – additional service charges may apply
Community
SweetOps is a community that is run by Cloud Posse. It exists as a place for our users to collaborate and ask questions related to our large collection of open source projects on GitHub, but also to talk shop and get feedback on anything DevOps related.
Cloud Posse is not affiliated with the github.com/terraform-aws-modules repository, but many users of this organization participate in our SweetOps Slack team in the #terraform-aws-modules channel.
Our community of more than 6,000 people is open to everyone. You do not need to be a customer to benefit from our community. We have a VERY active and helpful public Slack community that welcomes all skill levels and backgrounds.
- Go to slack.cloudposse.com to register.
- Sign up for our weekly “office hours”
p.s. if you're interested in a career at Cloud Posse, this is a great first step.
We recommend you first sign up for our Slack team. Then join the #office-hours Slack channel and ask your question there. We'll make sure to address it on our next call. Of course, you can always just join us this week and ask us there.
Your other option is to register for our FREE weekly office hours. We host these calls every Wednesday at 11:30 am PT.
No, it's absolutely FREE for anyone to attend.
Cloud Posse holds public “Office Hours” every Wednesday at 11:30 am PT to answer questions on all things related to DevOps, Terraform, Kubernetes, CI/CD. Basically, it's like an interactive “Lunch & Learn” session where we get together for about an hour and talk shop with our community. These are totally free and just an opportunity to ask us (or our community of experts) any questions you may have.
You can register here: https://cloudposse.com/office-hours
Join the conversation in our SweetOps Slack #office-hours channel: https://slack.cloudposse.com/
What to expect…
- Live Q&A. Ask questions and get answers. We're usually about 30+ people on the call.
- News & Announcements. We'll share cool things we're working on at Cloud Posse along with any cool announcements or projects we come across.
- Live Demos. Watch live demos of some of the things we're building here at Cloud Posse. Also, tune in if you want to share any open source projects you're working on.
- Special Guest Speakers. From time to time we'll bring on special guests who will talk at length on something relevant for our community.
Cloud Posse operates an inclusive, welcoming community of 6000+ people (as of 2022). These are our true fans and valuable contributors making what we do possible.
Early on we recognized the value of community, so we started SweetOps.
You can join us by signing up on our Slack invitation page.
Engagements
- Take our quiz to find out if we are a good fit!
- Book a discovery call to go over your exact challenges.
- If we can help, we'll execute a Mutual NDA (ours or yours), then collaborate with you on our Engagement Workbook using Google Docs.
- Once we agree on the general scope, we'll prepare a comprehensive Statement of Work (SOW) detailing the entire project.
- Once the Master Services Agreement (MSA) and SOW are executed, we'll send an invoice for the deposit and first Sprint.
- Work will commence shortly thereafter.
We work with companies anywhere in the world.
While most of our customers are based in the United States, we've worked with companies in the United Kingdom, Germany, Australia, Hong Kong, India, Argentina, etc. Our team is distributed across the US and Eastern Europe.
We can start as soon as you sign our Statement of Work. Typically we see this process take 2-3 weeks from the first introductory call to the start of our engagement.
Here's our checklist we'll need to complete before we can start.
- Execute Mutual NDA (ours or yours)
- Collaborate on Engagement Workbook via Google Docs
- Execute Statement of Work, and Master Services Agreement
- Deposit Payment
- Kick-off!
We can kick off the initial introductory call immediately, so please make sure that you schedule it today.
After talking with you and assessing if we're a proper fit, we'll execute a Mutual NDA and then send over an Engagement Workbook so we can gather all the requirements for your project and estimate the cost.
We can easily add additional Sprints to a Scope of Work. We just need to agree on what goes into each Sprint, which will determine the number of Sprints required.
Our typical engagement model begins with a complete platform rollout. This includes roughly 6-8 sprints, each one 1-2 weeks in duration. During this time we set up all AWS accounts with IAM federation, CloudTrail audit logs, a comprehensive release engineering process, total observability with our Site Reliability Engineering (SRE) sprint, Remote Access Management (Teleport and Keycloak), and GitOps operations by pull request.
The first engagement takes roughly 3-4 months to complete. These engagements have extremely well-defined project plans. Ask us and we can show you what that looks like.
Customers most often decide to keep us on after the initial engagement for follow-up work.
We provide entirely optional ongoing support for customers who've gone through our DevOps Accelerator.
By and large, most of our customers take over the day-to-day management of their infrastructure.
We're here though to help out anywhere you need it.
We do not provide 24×7 “on-call” (aka PagerDuty) support.
We offer all of our customers ongoing support for as long as they need it. Choose what's right for you.
- We provide free weekly support via our “Office Hours” webinars every Wednesday at 11:30 am PST. These calls last one hour and we'll answer as many of your questions as we can.
- We also provide optional support retainers which include a fixed block of hours that go towards maintenance and support. You'll have direct access to our team via a shared Slack channel in addition to the ability to schedule one-on-one calls via Zoom.
Can you help me understand where the boundaries of Cloud Posse's responsibilities end, and where ours would start?
Cloud Posse's mission is to help companies own their infrastructure. We accelerate this journey by architecting your 4 layers with you and by taking the lead on the implementation. Since we have an opinionated framework, customers will need to learn how to leverage everything for their use cases. This will sometimes mean altering how you build and deploy your services.
Getting Started With Us
We always prefer to start with a green-field approach, where we build your infrastructure from the ground up together with your team. As part of our process, we'll walk you through all of the required design decisions, ensuring you have sufficient context to make informed decisions. This is why we expect our customers to have someone on their engineering team invested in the outcome. This part is absolutely critical, as it ensures what we deliver suits your business needs. Everything we do is delivered by pull request for your review and we will happily provide documentation on anything you want. Along the way, we'll assign homework exercises and provide ample documentation. This approach provides the best opportunity to gain a deep hands-on understanding of our solution.
We encourage you to ask as many questions as you want and challenge our assumptions. You also can volunteer for any task you want to take on as “homework” and we'll help you out as needed.
When You Own It
Once our job is done, you take the driver's seat. We'll help you get everything set up for a smooth transition from your heritage environment to your shiny new infrastructure. Rest assured that we'll stick around until your team is confident and has the know-how to operate these platforms in production. We don't expect teams to pick this up overnight, which is why we'll stay engaged for as long as you need. We're happy to answer questions and jump on Zoom for pair programming sessions.
Day-2 Operations
After our engagement, you will have a solid foundation powering your apps, and all the tools you need for infrastructure operations. This means your team is responsible for the ongoing maintenance, including upgrades (e.g. EKS clusters and all open-source software), patching systems, incident response, triaging, SRE (e.g. adding monitors and alerts), as well as security operations (responding to incidents, staying on top of vulnerabilities/CVEs). Cloud Posse is continuously updating its Open Source module ecosystem, but it's your responsibility to regularly update your infrastructure. Staying on top of these things is critical for a successful long-term outcome, with minimal technical debt.
For companies that want to focus more on their business and less on maintenance, we provide ongoing support engagements exclusively for customers that have completed our accelerator.
Check out our approach to learn more!
Can you walk through the typical lifecycle of a small change that you might help us with, specifically with how it relates to coordinating changes between your team and ours?
Every change in your environment starts with submitting a pull request, as our solution is built with a fully GitOps-driven approach. Depending on the CODEOWNERS configuration for the repository, branch protections will require all pull requests to have approvals by specific stakeholders, in addition to requiring all checks to pass. We also try to automate as much of the review process as possible. For example, when the pull request is opened, it automatically kicks off a job to validate the change against your environment so you can see the impact of any change.
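The gating described above can be pictured with a toy model. This is only a sketch of the idea, not how GitHub implements CODEOWNERS or branch protections: the rules, team names, and function names are hypothetical, and Python's fnmatch is a simplification of the real pattern syntax. As in a CODEOWNERS file, the last matching pattern wins.

```python
from fnmatch import fnmatch
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical ownership rules, in file order (later patterns take precedence).
RULES: List[Tuple[str, Set[str]]] = [
    ("*", {"@platform-team"}),
    ("modules/*", {"@infra-leads"}),
]


def owners_for(path: str, rules: List[Tuple[str, Set[str]]]) -> Set[str]:
    """Return the owners from the last rule whose pattern matches the path."""
    owners: Set[str] = set()
    for pattern, rule_owners in rules:
        if fnmatch(path, pattern):
            owners = rule_owners
    return owners


def can_merge(
    changed_files: Iterable[str],
    approvals: Set[str],
    checks: Dict[str, bool],
    rules: List[Tuple[str, Set[str]]] = RULES,
) -> bool:
    """Mergeable only if every check passed and each file has an owner's approval."""
    if not all(checks.values()):
        return False
    for path in changed_files:
        owners = owners_for(path, rules)
        if owners and not (owners & approvals):
            return False
    return True


print(can_merge(["modules/vpc/main.tf"], {"@infra-leads"}, {"plan": True}))    # True
print(can_merge(["modules/vpc/main.tf"], {"@platform-team"}, {"plan": True}))  # False
```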
The coordination needed is simply about figuring out who will be responsible for each part of the release process. The tooling handles the rest and we have a policy-driven approach (Open Policy Agent) to enforce it.
This includes:
- Who will submit the pull request, which is entirely dependent on your comfort level with the change, or if you prefer us to take the lead.
- Reviewing the pull request and applying changes to it as needed.
- Approving and merging the pull request.
- Validating and confirming the changes.
The toolchain in your CI/CD process provides Slack notifications and full audit history of everything that happens to give you optimal visibility and traceability.
Lastly, where applicable we implement blue/green rollout strategies for releases, but there are edge cases where a change could be disruptive to active development or live services. In such cases, these would be carefully coordinated to be released at an approved time.
For Developers
Anyone is free to fork our repositories and try it themselves, but our support eliminates the guesswork and shortens the time it takes to implement correctly.
Think of it like this: anyone can walk into a hardware store and pick up the materials to build a house. Very few people can build a house that won't fall down if they don't have the experience of using all the tools and hardware correctly. We fill the gap by providing the knowledge and experience to get you where you want to be faster than doing it yourself.
- We start with a baseline using modules from our repo that are ready to use for our customers. This is simply a remote reference to our modules. These modules are well maintained and all changes upstream are managed using releases, so you can do version pinning.
- We find as we work with customers over time that as their confidence increases, they begin modifying these modules to fit their own needs, which is expected.
Cloud Posse does offer documentation as part of the engagements, but it is written for experienced developers. If different documentation is required, it can be created upon request.
- Gruntwork doesn't provide open access to all their modules; they are a subscription service. Cloud Posse open sources everything.
- All of our code is in GitHub and can be forked and used with no concerns about licensing issues (Apache 2.0).
- Gruntwork's Reference Architecture requires Terragrunt.
- Gruntwork is not a consulting company. They do not help with hands-on implementation. That's left up to you.
- We provide a comprehensive project plan consisting of hundreds of implementation tasks and design decisions that we execute together with your team.
- Our Slack community is free for anyone to join, not just paying customers.
- Because our work is Open Source, there's a lower barrier to getting started. That's why it's in use by thousands and thousands of companies. We receive dozens of Pull Requests every week enhancing our modules and fixing bugs.
That's a great question! Here's our philosophy:
- Learn by doing, not just by reading. First identify what you want to achieve (because you need a goal), then read and research enough to get started and go from there.
- Study our terraform modules. Every single one of our modules is a reference example for how to design and implement composable, re-usable, testable modules.
- Get started early writing tests. It's a hard habit to introduce later. We use Terratest, and each of our modules has a simple example of that.
- HashiCorp has invested heavily in their online curriculum and even offers certifications now. Their docs are free; check them out!
- Check out our weekly #office-hours at cloudposse.com/office-hours (past sessions are on podcast.cloudposse.com and youtube.com/c/cloudposse). They are free, and you can ask questions and get answers from our community of experts.
- Hang out in watering holes like this one. You'll learn a lot in a short amount of time.
We'll answer this based on our experience.
For Terraform Continuous Integration (CI), we use GitHub Actions with all of our modules. This works very well for us since we rely on GitHub. Then on a nightly basis, we run aws-nuke to clean up our environments, since failing tests frequently orphan resources that cost money and can conflict with other tests.
For a proper Terraform Continuous Delivery (CD) workflow, we think your best bet is to start with a SaaS solution and learn from that. Your options include Terraform Cloud, Scalr, and Spacelift. Terraform CD is non-trivial to do well. You can easily stick Terraform into any pipeline, but a well-built Terraform CD pipeline will have a terraform plan → planfile → approval → apply workflow. You'll need to stash the planfile somewhere and the planfile may contain secrets.
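The plan → planfile → approval → apply workflow above can be sketched as a small driver. This is a minimal illustration, not a production pipeline: the function names and the approval callback are hypothetical, and it assumes the terraform CLI is on the PATH when the default runner is used. The runner is injectable so the sequence can be dry-run without Terraform installed.

```python
import subprocess
from typing import Callable, List


def plan_cmd(planfile: str) -> List[str]:
    # Write the plan to a file so that exactly this plan is what gets applied.
    return ["terraform", "plan", "-input=false", f"-out={planfile}"]


def show_cmd(planfile: str) -> List[str]:
    # Render the saved plan for the reviewer.
    return ["terraform", "show", "-no-color", planfile]


def apply_cmd(planfile: str) -> List[str]:
    # Apply the saved planfile rather than re-planning.
    return ["terraform", "apply", "-input=false", planfile]


def default_runner(cmd: List[str]) -> None:
    subprocess.run(cmd, check=True)


def run_workflow(
    planfile: str,
    approve: Callable[[str], bool],
    run: Callable[[List[str]], None] = default_runner,
) -> bool:
    """Plan, show the plan, and apply only if the approver says yes."""
    run(plan_cmd(planfile))
    run(show_cmd(planfile))
    if not approve(planfile):
        return False
    run(apply_cmd(planfile))
    return True


# Dry run: record the commands instead of executing them.
recorded: List[List[str]] = []
applied = run_workflow("tfplan", approve=lambda planfile: False, run=recorded.append)
print(applied, len(recorded))  # False 2
```

Because the planfile may contain secrets, wherever it is stashed between the plan and apply steps should be treated as sensitive storage.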
Cloud Posse is based in Los Angeles (PT), which is ~GMT-8 depending on the time of year. =) However, our community of more than 3,200 members is truly global. There's always someone online at any given time of the day.
We hold these sessions every Wednesday from 11:30am – 12:30pm PST (GMT-8).
Make sure to register to receive a calendar invitation.
Foundational Infrastructure
It really depends on when a contract begins and who on our team is on the bench. Generally, we like to put (2) engineers on a project so we have cross-training and continuity in the event a member needs to take time off. Our team is geographically distributed across the continental US as well as Eastern Europe. Throughout the course of a project, we may move team members between projects depending on their subject matter expertise.
- Slack. You will have direct access to the team via a shared Slack channel between our respective teams.
- Zoom. We'll have weekly scheduled cadence calls via Zoom to review current progress, discuss blockers, and give product demos in your environment. These calls can be recorded and shared with your team.
- Google Drive. We also recommend creating a shared Team Drive folder via Google Docs for the sharing of relevant design docs, agendas or other materials.
- Trello. We manage the project via a Trello Team created specifically for each engagement. We invite both your team and ours to it and create (1) board per sprint. This allows us to standardize our process while providing transparency along the way.
- Office Hours. Most engagements include a “Documentation & Training” sprint, during which we arrange weekly “Office Hours” via Zoom (recorded) to answer any questions your team may have as they begin to kick the tires.
We'll deliver the end-to-end solution you've seen in all of our demos. It will be preconfigured for your environments under your AWS accounts. We'll create new GitHub repos that will contain all the infrastructure code you need.
Along the way, we'll show you the ropes and how to operate it. In the long run, you'll be responsible for operating it but we'll stick around for as long as you need our help.
- Gruntwork doesn't provide open access to all of its modules; it is a subscription service. Cloud Posse open sources everything.
- All of our code is on GitHub and can be forked and used with no concerns about licensing issues (Apache 2.0).
- Gruntwork's Reference Architecture requires Terragrunt.
- Gruntwork is not a consulting company. They do not help with hands-on implementation. That's left up to you.
- We provide a comprehensive project plan consisting of hundreds of implementation tasks and design decisions that we execute together with your team.
- Our Slack community is free for anyone to join, not just paying customers.
- Because our work is Open Source, there's a lower barrier to getting started. That's why it's in use by thousands and thousands of companies. We receive dozens of Pull Requests every week enhancing our modules and fixing bugs.
We work with companies who need to own their infrastructure as their competitive advantage.
Our customers are typically post-Series A technology startups who are seeing success in the market and need to accelerate their DevOps adoption in order to take their company to the next level.
They are backed by some of the biggest names in the industry and are solving really difficult problems with technology.
No, it's absolutely FREE for anyone to attend.
Can you help me understand where the boundaries of Cloud Posse's responsibilities end, and where ours would start?
Cloud Posse's mission is to help companies own their infrastructure. We accelerate this journey by architecting your 4 layers with you and by taking the lead on the implementation. Since we have an opinionated framework, customers will need to learn how to leverage everything for their use cases. This will sometimes mean altering how you build and deploy your services.
Getting Started With Us
We always prefer to start with a green-field approach, where we build your infrastructure from the ground up together with your team. As part of our process, we'll walk you through all of the required design decisions, ensuring you have sufficient context to make informed decisions. This is why we expect our customers to have someone on their engineering team invested in the outcome. This part is absolutely critical, as it ensures what we deliver suits your business needs. Everything we do is delivered by pull request for your review and we will happily provide documentation on anything you want. Along the way, we'll assign homework exercises and provide ample documentation. This approach provides the best opportunity to gain a deep hands-on understanding of our solution.
We encourage you to ask as many questions as you want and challenge our assumptions. You also can volunteer for any task you want to take on as “homework” and we'll help you out as needed.
When You Own It
Once our job is done, this is where you take the driver's seat. We'll help you get everything set up for a smooth transition from your heritage environment to your shiny new infrastructure. Rest assured that we'll stick around until your team is confident and has the know-how to operate these platforms in production. We don't expect teams to pick this up overnight; that's why we'll stay engaged for as long as you need. We're happy to answer questions and jump on Zoom for pair programming sessions.
Day-2 Operations
After our engagement, you will have a solid foundation powering your apps, and all the tools you need for infrastructure operations. This means your team is responsible for the ongoing maintenance, including upgrades (e.g. EKS clusters, and all open-source software), patching systems, incident response, triaging, SRE (e.g. adding monitors and alerts), as well as security operations (responding to incidents, staying on top of vulnerabilities/ CVEs). Cloud Posse is continuously updating its Open Source module ecosystem, but it's your responsibility to regularly update your infrastructure. Staying on top of these things is critical for a successful long-term outcome, with minimal technical debt.
For companies that want to focus more on their business and less on maintenance, we provide ongoing support engagements exclusively for customers that have completed our accelerator.
Check out our approach to learn more!
Can you walk through the typical lifecycle of a small change that you might help us with, specifically with how it relates to coordinating changes between your team and ours?
Every change in your environment starts with submitting a pull request, as our solution is built with a fully GitOps-driven approach. Depending on the CODEOWNERS configuration for the repository, branch protections will require all pull requests to have approvals by specific stakeholders, in addition to requiring all checks to pass. We also try to automate as much of the review process as possible. For example, when the pull request is opened, it automatically kicks off a job to validate the change against your environment so you can see the impact of any change.
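As a rough illustration of how CODEOWNERS-driven approvals gate a merge, the sketch below resolves which owners must approve a set of changed files. It is a simplification under stated assumptions: real CODEOWNERS files use gitignore-style patterns, while this uses Python's `fnmatch`, and the rules, team handles, and paths are all hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical rules in file order. As in a real CODEOWNERS file,
# the last rule that matches a path decides its owners.
RULES = [
    ("*", ["@acme/engineering"]),
    ("components/terraform/*", ["@acme/platform"]),
    ("components/terraform/iam/*", ["@acme/security"]),
]


def owners_for(path, rules=RULES):
    """Return the owners of the last rule whose pattern matches path."""
    matched = []
    for pattern, owners in rules:
        if fnmatch(path, pattern):
            matched = owners
    return matched


def required_reviewers(changed_files, rules=RULES):
    """Collect every owner group touched by a pull request."""
    reviewers = set()
    for path in changed_files:
        reviewers.update(owners_for(path, rules))
    return sorted(reviewers)


def can_merge(changed_files, approvals, checks_passed, rules=RULES):
    """Mergeable only when checks pass and each file has an approving owner."""
    if not checks_passed:
        return False
    return all(
        any(owner in approvals for owner in owners_for(path, rules))
        for path in changed_files
    )
```

In this sketch, a pull request touching `components/terraform/iam/main.tf` stays blocked until `@acme/security` approves, even if every automated check is green.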
The coordination needed is simply about figuring out who will be responsible for each part of the release process. The tooling handles the rest and we have a policy-driven approach (Open Policy Agent) to enforce it.
This includes:
- Who will submit the pull request, which is entirely dependent on your comfort level with the change, or if you prefer us to take the lead.
- Reviewing the pull request and applying changes to it as needed.
- Approving and merging the pull request.
- Validating and confirming the changes.
The toolchain in your CI/CD process provides Slack notifications and full audit history of everything that happens to give you optimal visibility and traceability.
Lastly, where applicable we implement blue/green rollout strategies for releases, but there are edge cases where a change could be disruptive to active development or live services. In such cases, these would be carefully coordinated to be released at an approved time.
Spacelift checks off all the boxes for managing extremely large environments with a lot of state management. Since Cloud Posse's focus is on deploying large-scale loosely coupled infrastructure components with Terraform, it's common to have several hundred terraform states under management.
Every successful business in existence uses accounting software to manage its finances and understand the health of its business. The sheer number of transactions makes it infeasible to reconcile the books by hand. The same is true of modern infrastructure. With hundreds of states managed programmatically with terraform, and modified constantly by different teams or individuals, the same kind of state reconciliation is required to know the health of your infrastructure. This need goes far beyond continuous delivery and few companies have solved it. With Spacelift, you have an up-to-date view of your assets, liabilities & tech debt across all environments.
Major benefits
- Drift Detection runs on a customizable schedule and surfaces inconsistencies between what's deployed and what's in git.
- Reconciliation helps you know what's deployed, what's failing, and what's queued.
- Plan Approvals ensures changes are released when you expect them
- Policy Driven Framework based on OPA (open source standard) is used to trigger runs and enforce permissions. This is like IAM for GitOps.
- Terraform Graph Visualization makes it easier to visualize the entire state across components
- Audit Logs of every change traced back to the commit and filterable by time
- Affordable alternative to other commercial offerings
- Works with more than Terraform (e.g. Pulumi)
- Pull Request Previews show what the proposed changes are before committing them
- Decoupling of Deploy from Release ensures we can merge to trunk and still control when those changes are propagated to environments
- Ephemeral Environments (Auto Deployment, Auto Destruction) enables us to bring up infrastructure with terraform and destroy it when it's no longer needed
- Self-hosted Runners ensure we're in full control over what is executed in our own VPC, with no public endpoints
What level of access do the Spacelift worker pools have?
Spacelift Workers are deployed in your environment with the level of permission that we grant them via IAM instance profiles. When provisioning any infrastructure that requires modifying IAM, the minimum permission is administrative. Thus, workers are provisioned with administrative permissions in all accounts that we grant access to since the terraform we provision requires creating IAM roles and policies. Note, this is not a constraint of Spacelift; this is required regardless of the platform that performs the automation.
What happens if Spacelift as a product goes away?
First off, while Spacelift might be a newer brand in the infrastructure space, it's used by publicly traded companies, healthcare companies, banks, institutions, Fortune 500 companies, etc. So, Spacelift is not going away.
But just to entertain the hypothetical, let's consider what would happen. Since we manage all terraform states in S3, we have the “break glass” capability to leave the platform at any time and can always run terraform manually. Of course, we would lose all the benefits.
How tough would it be to move everything to a different platform?
Fortunately, with Spacelift, we can still use S3 as our standard state backend. So if at any time we need to move off of the platform, it's easy. Of course, we'd give up all the benefits but the key here is we're not locked into it.
Why not just use Atlantis?
We used to predominantly recommend Atlantis but stopped doing so a number of years ago. The project was more or less dormant for 2-3 years, and only recently started accepting any Pull Requests. Atlantis was the first project to define a GitOps workflow for Terraform, but it's been left in the dust compared to newer alternatives.
- With Atlantis, there is no regular reconciliation of what Terraform state has or has not been applied, so we really have no idea of the actual state of anything. With a recent customer, we helped migrate them from Atlantis to Spacelift and it took 2 months to reconcile all the infrastructure that had drifted.
- With Atlantis, there's no drift detection, but with Spacelift, we detect it nightly (or as frequently as we want).
- With Atlantis, there's no way to manage dependencies between components so that when one component changes, the components that depend on it are updated too.
- With Atlantis, there's no way to set up OPA policies to trigger runs. The OPA support in Atlantis is very basic.
- With Atlantis, anyone who can run a plan can exfiltrate your root credentials. This has been discussed by others and was recently highlighted at the Defcon 2021 conference.
- With Atlantis, there's no way to limit who can run terraform plan or apply. If you have access to the repo, you can run a terraform plan. If your plan is approved, you can run terraform apply. Cloud Posse even tried to fix it (and maintained our own fork for some time), but the discussion went nowhere and we moved on.
- With Atlantis, there's no way to restrict who has access to unlock workspaces via the web GUI. The only way is to install your own authentication proxy in front of it or restrict it in your load balancer.
- With Atlantis, you have to expose the webhook endpoint publicly to GitHub.
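To make the drift-detection point concrete, here is a minimal sketch of a scheduled check, not a description of how Spacelift implements it. It relies on `terraform plan -detailed-exitcode`, which exits with 0 when there is nothing to change, 2 when the plan contains changes, and 1 on error. The component directory names are hypothetical, and a non-empty plan can mean either real drift or merged code that was never applied.

```python
import subprocess

# -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present.
PLAN_CMD = ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"]


def classify(exit_code):
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    if exit_code == 0:
        return "in-sync"
    if exit_code == 2:
        return "drifted"
    return "error"


def check_drift(component_dirs, run=subprocess.run):
    """Plan each component directory and report whether it has pending changes.

    `run` is injectable so the loop can be exercised without Terraform installed.
    """
    report = {}
    for directory in component_dirs:
        result = run(PLAN_CMD, cwd=directory, capture_output=True, text=True)
        report[directory] = classify(result.returncode)
    return report
```

Run nightly from a scheduler, the report could feed a Slack notification for every component that is not `in-sync`.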
What about using GitHub Actions?
We provide a suitable alternative to Spacelift using GitHub Actions for companies looking to unify their deployments under one common platform.
It's an entirely free and open-source alternative that uses atmos and works with self-hosted GitHub Runners as well as GitHub Cloud.
What about using GitLab/Jenkins/etc?
There are plenty of examples of using other tools to implement continuous delivery for Terraform. However, solving for all the edge cases is what makes this so complicated, and those edge cases are seldom, if ever, handled by these approaches.
- Where will you store the plan files which are required for approvals? (plan → approve → apply workflow) Note, these planfiles may contain root-level credentials to things like RDS databases, which cannot be avoided.
- How will you clean up those planfiles? Should they persist after a terraform apply succeeds or crashes?
- How will you implement approval steps? If the approval is denied, how will you clean up the terraform planfile?
- If you have multiple open PRs (e.g. many plans) for one workspace, after applying one, all other plans need to be invalidated. How will you implement that invalidation?
- Git is only one source of truth for infrastructure as code. Data sources are another (e.g. terraform remote state). How will you reconcile that your state is current and update it when it drifts? When it drifts, how will you be notified?
- How will you know that your infrastructure changes are applied everywhere? If a build fails, but the code is already merged, how do you escalate and ensure it's resolved?
- If you need to lock an environment from being updated, how will you do it?
- How will you suggest the changes? If the plan is to comment on the PR, that gets VERY noisy, and everyone subscribed will receive the notification. Runs may also accidentally leak secrets in the output. GitHub comments are limited to 65K bytes, which means large plans will need to be split across multiple comments.
- What happens if multiple PRs are merged that want to modify the same environment? How will you enforce ordered consistency?
- How will you restrict who can run terraform plans and applies? Furthermore, how will you restrict it to specific environments?
- How will you provide the short-lived IAM credentials to the terraform processes? Any hardcoded credentials that are exposed will be a major liability.
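One of the questions above, invalidating every other open plan once one has been applied, can be sketched as simple bookkeeping. This is an illustrative model only: each plan remembers the state serial it was computed against, and applying any plan bumps the serial so the remaining plans become stale. The class names, PR numbers, and workspace name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    pr: int
    state_serial: int  # serial of the state this plan was computed against
    approved: bool = False


@dataclass
class Workspace:
    name: str
    state_serial: int = 0
    plans: dict = field(default_factory=dict)  # PR number -> Plan

    def plan(self, pr):
        """Record a fresh plan for a PR against the current state."""
        self.plans[pr] = Plan(pr, self.state_serial)
        return self.plans[pr]

    def is_stale(self, pr):
        """A plan is stale once the state has changed since it was made."""
        return self.plans[pr].state_serial != self.state_serial

    def apply(self, pr):
        """Apply an approved, current plan; return the PRs that must re-plan."""
        plan = self.plans[pr]
        if not plan.approved:
            raise PermissionError(f"plan for PR {pr} is not approved")
        if self.is_stale(pr):
            raise RuntimeError(f"plan for PR {pr} is stale; re-plan required")
        self.state_serial += 1
        del self.plans[pr]
        return [other for other in self.plans if self.is_stale(other)]
```

For example, if PRs 101 and 102 both hold approved plans for the same workspace, `apply(101)` returns `[102]`, and `apply(102)` raises until that PR is planned and approved again.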
Why not use Terraform Cloud?
Terraform Cloud is prohibitively expensive for most non-enterprise customers we work with, and possibly 10x the cost of Spacelift. Terraform Cloud for Teams doesn't permit self-hosted runners and requires hardcoded IAM credentials in each workspace. That's insane and we cannot recommend it. Terraform Cloud for Business (and higher) supports self-hosted runners, which can leverage AWS IAM Instance profiles, but the number of runners is a significant factor in the cost. When leveraging several hundred loosely-coupled terraform workspaces, there is a significant need for a lot of workers for short periods of time. Unfortunately, even if those are only online for a short period of time, you need to commit to paying for them for the full month on an annualized basis. Terraform Cloud also requires that you use their state backend, which means there's no way to “break glass” and run Terraform if they are down. If you want to migrate off of Terraform Cloud, you need to migrate the state of hundreds of workspaces out of the platform and into another state backend.
Foundational Platform
We work with companies anywhere in the world.
While most of our customers are based in the United States, we've worked with companies in the United Kingdom, Germany, Australia, Hong Kong, India, Argentina, etc. Our team is distributed across the US and Eastern Europe.
We can start as soon as you sign our Statement of Work. Typically we see this process take 2-3 weeks from the first introductory call to the start of our engagement.
We'll answer this based on our experience.
For Terraform Continuous Integration (CI), we use GitHub Actions with all of our modules. This works very well for us since we rely on GitHub. Then on a nightly basis, we run aws-nuke to clean up our environments, since failing tests frequently orphan resources that cost money and can conflict with other tests.
For a proper Terraform Continuous Delivery (CD) workflow, we think your best bet is to start with a SaaS solution and learn from that. Your options are Terraform Cloud, Scalr, and Spacelift. Terraform CD is non-trivial to do well. You can easily stick Terraform into any pipeline, but a well-built terraform CD pipeline will have a terraform plan → planfile → approval → apply workflow. You'll need to stash the planfile somewhere and the planfile may contain secrets.
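A minimal sketch of that plan → planfile → approval → apply flow is shown below, assuming the `terraform` CLI is on the PATH. The `approve` callback stands in for whatever approval mechanism you use and is hypothetical, as are the function names. The planfile is removed in every outcome because it may contain secrets.

```python
import os
import subprocess


def plan_cmd(planfile):
    """Write the plan to a file so the exact same changes can be applied later."""
    return ["terraform", "plan", "-input=false", f"-out={planfile}"]


def show_cmd(planfile):
    """Render the saved plan as text for a reviewer."""
    return ["terraform", "show", "-no-color", planfile]


def apply_cmd(planfile):
    """Apply exactly what was planned and reviewed."""
    return ["terraform", "apply", "-input=false", planfile]


def deliver(workdir, approve, planfile="tfplan", run=subprocess.run):
    """Run plan -> planfile -> approval -> apply, then delete the planfile.

    `approve` receives the rendered plan text and returns True or False.
    `run` is injectable so the flow can be exercised without Terraform.
    """
    path = os.path.join(workdir, planfile)
    try:
        run(plan_cmd(planfile), cwd=workdir, check=True)
        shown = run(show_cmd(planfile), cwd=workdir, check=True,
                    capture_output=True, text=True)
        if not approve(shown.stdout):
            return "rejected"
        run(apply_cmd(planfile), cwd=workdir, check=True)
        return "applied"
    finally:
        # Clean up whether the apply succeeded, was rejected, or crashed.
        if os.path.exists(path):
            os.remove(path)
```

Even this small version leaves the harder questions open: where the planfile lives between the plan and the approval, and who is allowed to approve.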
We work with companies who need to own their infrastructure as their competitive advantage.
Our customers are typically post-Series A technology startups who are seeing success in the market and need to accelerate their DevOps adoption in order to take their company to the next level.
They are backed by some of the biggest names in the industry and are solving really difficult problems with technology.
No, it's absolutely FREE for anyone to attend.
Can you help me understand where the boundaries of CloudPosse's responsibilities end, and where ours would start?
Cloud Posse's mission is to help companies own their infrastructure. We accelerate this journey by architecting your 4 layers with you and by taking the lead on the implementation. Since we have an opinionated framework, customers will need to learn how to leverage everything for their use cases. This will sometimes mean altering how you build and deploy your services.
Getting Started With Us
We always prefer to start with a green-field approach, where we build your infrastructure from the ground up together with your team. As part of our process, we'll walk you through all of the required design decisions, ensuring you have sufficient context to make informed decisions. This is why we expect our customers to have someone on their engineering team invested in the outcome. This part is absolutely critical, as it ensures what we deliver suits your business needs. Everything we do is delivered by pull request for your review and we will happily provide documentation on anything you want. Along the way, we'll assign homework exercises and provide ample documentation. This approach provides the best opportunity to gain a deep hands-on understanding of our solution.
We encourage you to ask as many questions as you want and challenge our assumptions. You also can volunteer for any task you want to take on as “homework” and we'll help you out as needed.
When You Own It
Once our job is done, this is where you take the driver's seat. We'll help you get everything set up for a smooth transition from your heritage environment to your shiny new infrastructure. Rest assured that we'll stick around until your team is confident and has the know-how to operate these platforms in production. We don't expect teams to pick this up overnight, that's why we'll stay engaged for as long as you need. We're happy to answer questions and jump on Zoom for pair programming sessions.
Day-2 Operations
After our engagement, you will have a solid foundation powering your apps, and all the tools you need for infrastructure operations. This means your team is responsible for the ongoing maintenance, including upgrades (e.g. EKS clusters, and all open-source software), patching systems, incident response, triaging, SRE (e.g. adding monitors and alerts), as well as security operations (responding to incidents, staying on top of vulnerabilities/ CVEs). Cloud Posse is continuously updating its Open Source module ecosystem, but it's your responsibility to regularly update your infrastructure. Staying on top of these things is critical for a successful long-term outcome, with minimal technical debt.
For companies that want to focus more on their business and less on maintenance, we provide ongoing support engagements exclusively for customers that have completed our accelerator.
Check out our approach to learn more!
Can you walk through the typical lifecycle of a small change that you might help us with, specifically with how it relates to coordinating changes between your team and ours?
Every change in your environment starts with submitting a pull request as our solution is built with a fully GitOps driven approach. Depending on the CODEOWNERS
configuration for the repository, branch protections will require all pull requests to have approvals by specific stakeholders, in addition to requiring all checks to pass. We also try to automate as much of the review process as possible. For example, when the pull request is opened, it automatically kicks off a job to validate the change against your environment so you can see the impact of any change.
The coordination needed is simply about figuring out who will be responsible for each part of the release process. The tooling handles the rest and we have a policy-driven approach (Open Policy Agent) to enforce it.
This includes:
- Who will submit the pull request, which is entirely dependent on your comfort level with the change, or if you prefer us to take the lead.
- Reviewing the pull request and applying changes to it as needed.
- Approving and merging the pull request.
- Validating and confirming the changes.
The toolchain in your CI/CD process provides Slack notifications and full audit history of everything that happens to give you optimal visibility and traceability.
Lastly, where applicable we implement blue/green rollout strategies for releases, but there are edge cases where a change could be disruptive to active development or live services. In such cases, these would be carefully coordinated to be released at an approved time.
Spacelift checks off all the boxes for managing extremely large environments with a lot of state management. Since Cloud Posse's focus is on deploying large-scale loosely coupled infrastructure components with Terraform, it's common to have several hundred terraform states under management.
Every successful business in existence uses accounting software to manage its finances and understand the health of its business. The sheer number of transactions makes it infeasible to reconcile the books by hand. The same is true of modern infrastructure. With hundreds of states managed programmatically with terraform, and modified constantly by different teams or individuals, the same kind of state reconciliation is required to know the health of its infrastructure. This need goes far beyond continuous delivery and few companies have solved it. With Spacelift, you have an up-to-date view of your assets, liabilities & tech debt across all environments.
Major benefits
- Drift Detection runs on a customizable schedule surfaces inconsistencies with what's deployed and what's in git.
- Reconciliation helps you know what's deployed, what's failing, and what's queued.
- Plan Approvals ensures changes are released when you expect them
- Policy Driven Framework based on OPA (open source standard) is used to trigger runs and enforce permissions. This is like IAM for GitOps.
- Terraform Graph Visualization makes it easier to visualize the entire state across components
- Audit Logs of every change traced back to the commit and filterable by time
- Affordable alternative to other commercial offerings
- Works with more than Terraform (e.g. Pulumi)
- Pull Request Previews show what the proposed changes are before committing them
- Decoupling of Deploy from Release ensures we can merge to trunk and still control when those changes are propogated to environments
- Ephemeral Environments (Auto Deployment, Auto Destruction) enables us to bring up infrastructure with terraform and destroy it when it's no longer needed
- Self-hosted Runners ensure we're in full control over what is executed in our own VPC, with no public endpoints
What level of access do the Spacelift worker pools have?
Spacelift Workers are deployed in your environment with the level of permission that we grant them via IAM instance profiles. When provisioning any infrastructure that requires modifying IAM, the minimum permission is administrative. Thus, workers are provisioned with administrative permissions in all accounts that we grant access to since the terraform we provision requires creating IAM roles and policies. Note, this is not a constraint of Spacelift; this is required regardless of the platform that performs the automation.
What happens if Spacelift as a product goes away?
First off, while Spacelift might be a newer brand in the infrastructure space, it's used by publicly traded companies, Healthcare companies, banks, institutions, Fortune 500 companies, etc. So, Spacelift is not going away.
But just to entertain the hypothetical, let's consider what would happen. Since we manage all terraform states in S3, we have the “break glass” capability to leave the platform at any time and can always run terraform manually. Of course, we would lose all the benefits.
How tough would it be to move everything to a different platform?
Fortunately, with Spacelift, we can still use S3 as our standard state backend. So if at any time we need to move off of the platform, it's easy. Of course, we'd give up all the benefits but the key here is we're not locked into it.
Why not just use Atlantis?
We used to predominately recommend Atlantis but stopped doing so a number of years ago. The project was more or less dormant for 2-3 years, and only recently started accepting any Pull Requests. Atlantis was the first project to define a GitOps workflow for Terraform, but it's been left in the dust compared to newer alternatives.
- With Atlantis, there is no regular reconciliation of what terraform state has been applied or not applied, so we really have no idea of the actual state of anything. With a recent customer, we helped migrate them from Atlantis to Spacelift and it took 2 months to reconcile all the infrastructure that had drifted.
- With Atlantis, there's no drift detection, but with Spacelift, we detect it nightly (or as frequently as we want).
- With Atlantis, there's no way to manage dependencies between components, so when one component changes, the components that depend on it are not automatically updated.
- With Atlantis, there's no way to set up OPA policies to trigger runs. The OPA support in Atlantis is very basic.
- With Atlantis, anyone who can run a plan can exfiltrate your root credentials. This has been discussed by others and was recently highlighted at the Defcon 2021 conference.
- With Atlantis, there's no way to limit who can run terraform plan or apply. If you have access to the repo, you can run a terraform plan. If your plan is approved, you can run terraform apply. Cloud Posse even tried to fix it (and maintained our own fork for some time), but the discussion went nowhere and we moved on.
- With Atlantis, there's no way to restrict who has access to unlock workspaces via the web GUI. The only way is to install your own authentication proxy in front of it or restrict it in your load balancer.
- With Atlantis, you have to expose the webhook endpoint publicly to GitHub.
What about using GitHub Actions?
We provide a suitable alternative to Spacelift using GitHub Actions for companies looking to unify their deployments under one common platform.
It's an entirely free and open-source alternative that uses atmos and works with self-hosted GitHub Runners as well as GitHub Cloud.
What about using GitLab/Jenkins/etc?
There are plenty of examples of using other tools to implement continuous delivery for Terraform. However, solving for all the edge cases is what makes it so complicated, and these approaches seldom, if ever, handle them.
- Where will you store the plan files which are required for approvals? (plan → approve → apply workflow) Note, these planfiles may contain root-level credentials to things like RDS databases, which cannot be avoided.
- How will you clean up those planfiles? Should they persist after a terraform apply succeeds or crashes?
- How will you implement approval steps? If the approval is denied, how will you clean up the terraform planfile?
- If you have multiple open PRs (e.g. many plans) for one workspace, after applying one, all other plans need to be invalidated. How will you implement that invalidation?
- Git is only one source of truth for infrastructure as code. Data sources are another (e.g. terraform remote state). How will you reconcile that your state is current and update it when it drifts? When it drifts, how will you be notified?
- How will you know that your infrastructure changes are applied everywhere? If a build fails, but the code is already merged, how do you escalate and ensure it's resolved?
- If you need to lock an environment from being updated, how will you do it?
- How will you suggest the changes? If the plan is to comment on the PR, that gets VERY noisy, and everyone subscribed will receive the notification. Runs may also accidentally leak secrets in the output. GitHub comments are limited to 65K bytes, which means large plans will need to be split across multiple comments.
- What happens if multiple PRs are merged that want to modify the same environment? How will you enforce ordered consistency?
- How will you restrict who can run terraform plans and applies? Furthermore, how will you restrict it to specific environments?
- How will you provide the short-lived IAM credentials to the terraform processes? Any hardcoded credentials that are exposed will be a major liability.
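To make one of the edge cases above concrete, here is a minimal sketch of splitting a large plan across multiple PR comments. It assumes the 65K-byte comment limit mentioned above and measures size in UTF-8 bytes; the function name split_for_comments and the default limit are illustrative choices, not part of any tool's API.

```python
from typing import List


def split_for_comments(plan_text: str, limit_bytes: int = 65_000) -> List[str]:
    """Split plan output into chunks that each fit within limit_bytes.

    Hypothetical helper: splits only on line boundaries so that no
    line of the plan is cut in half between two comments.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_size = 0
    for line in plan_text.splitlines(keepends=True):
        size = len(line.encode("utf-8"))
        if size > limit_bytes:
            # A single oversized line cannot be posted without cutting it.
            raise ValueError("a single line exceeds the comment limit")
        if current and current_size + size > limit_bytes:
            # Adding this line would overflow the chunk, so start a new one.
            chunks.append("".join(current))
            current, current_size = [], 0
        current.append(line)
        current_size += size
    if current:
        chunks.append("".join(current))
    return chunks
```

Even with chunking in place, the noise and secret-leak concerns raised above still apply, since every chunk becomes another notification and another place where sensitive output can land.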
Why not use Terraform Cloud?
Terraform Cloud is prohibitively expensive for most non-enterprise customers we work with, and possibly 10x the cost of Spacelift. Terraform Cloud for Teams doesn't permit self-hosted runners and requires hardcoded IAM credentials in each workspace. That's insane and we cannot recommend it. Terraform Cloud for Business (and higher) supports self-hosted runners, which can leverage AWS IAM instance profiles, but the number of runners is a significant factor in the cost. When leveraging several hundred loosely coupled terraform workspaces, there is a significant need for a lot of workers for short periods of time. Unfortunately, even if those are only online for a short period of time, you need to commit to paying for them for the full month on an annualized basis. Terraform Cloud also requires that you use their state backend, which means there's no way to “break glass” and run Terraform if they are down. If you want to migrate off of Terraform Cloud, you need to migrate the state of hundreds of workspaces out of the platform and into another state backend.
Foundational Release Engineering
We work with companies anywhere in the world.
While most of our customers are based in the United States, we've worked with companies in the United Kingdom, Germany, Australia, Hong Kong, India, Argentina, etc. Our team is distributed across the US and Eastern Europe.
Anyone is free to fork our repositories and try it themselves, but our support eliminates the guesswork and shortens the time it takes to implement correctly.
Think of it like this: anyone can walk into a hardware store and pick up the materials to build a house. Very few people can build a house that won't fall down if they don't have the experience of using all the tools and hardware correctly. We fill the gap by providing the knowledge and experience to get you where you want to be faster than doing it yourself.
Cloud Posse does offer documentation as part of its engagements, but it is written for experienced developers. If different documentation is required, it can be created upon request.
It really depends on when a contract begins and who on our team is on the bench. Generally, we like to put (2) engineers on a project so we have cross-training and continuity in the event a member needs to take time off. Our team is geographically distributed across the continental US as well as Eastern Europe. Throughout the course of a project, we may move team members between projects depending on their subject matter expertise.
We provide entirely optional ongoing support for customers who've gone through our DevOps Accelerator.
By and large, most of our customers take over the day-to-day management of their infrastructure.
We're here though to help out anywhere you need it.
We do not provide 24×7 “on-call” (aka PagerDuty) support.
We'll deliver the end-to-end solution you've seen in all of our demos. It will be preconfigured for your environments under your AWS accounts. We'll create new GitHub repos that will contain all the infrastructure code you need.
Along the way, we'll show you the ropes and how to operate it. In the long run, you'll be responsible for operating it but we'll stick around for as long as you need our help.
- Gruntwork doesn't provide open access to all their modules; they are a subscription service. Cloud Posse open sources everything.
- All of our code is in GitHub and can be forked and used with no concerns about licensing issues (Apache 2.0).
- Gruntwork's Reference Architecture requires Terragrunt.
- Gruntwork is not a consulting company. They do not help with hands-on implementation. That's left up to you.
- We provide a comprehensive project plan consisting of hundreds of implementation tasks and design decisions that we execute together with your team.
- Our Slack community is free for anyone to join, not just paying customers.
- Because our work is Open Source, there's a lower barrier to getting started. That's why it's in use by thousands and thousands of companies. We receive dozens of Pull Requests every week enhancing our modules and fixing bugs.
We'll answer this based on our experience.
For Terraform Continuous Integration (CI), we use GitHub Actions with all of our modules. This works very well for us since we rely on GitHub. Then on a nightly basis, we run aws-nuke to clean up our environments, since failing tests frequently orphan resources that cost money and can conflict with other tests.
For a proper Terraform Continuous Delivery (CD) workflow, we think your best bet is to start with a SaaS solution and learn from that. Your options are Terraform Cloud, Scalr, Spacelift. Terraform CD is non-trivial to do well. You can easily stick Terraform into any pipeline, but a well-built terraform CD pipeline will have a terraform plan → planfile → approval → apply workflow. You'll need to stash the planfile somewhere and the planfile may contain secrets.
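As a rough illustration of the plan → planfile → approval → apply workflow described above, here is a minimal sketch in Python. It uses the standard terraform plan -out, terraform show, and terraform apply commands; the function names, the approve callback, and the injectable run parameter are assumptions made for illustration, and a real pipeline would also need locking, secure planfile storage, and credential handling.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def plan_command(planfile: str) -> List[str]:
    # Write the plan to a file so the exact same plan can be applied later.
    return ["terraform", "plan", "-input=false", f"-out={planfile}"]


def apply_command(planfile: str) -> List[str]:
    # Applying a saved planfile applies exactly what was reviewed.
    return ["terraform", "apply", "-input=false", planfile]


def run_workflow(
    planfile: str,
    approve: Callable[[str], bool],
    run=subprocess.run,
) -> bool:
    """Plan, show the plan to an approver, and apply only if approved.

    `approve` is a hypothetical callback (e.g. a chat prompt or PR review
    check) that receives the rendered plan and returns True or False.
    """
    try:
        run(plan_command(planfile), check=True)
        shown = run(
            ["terraform", "show", "-no-color", planfile],
            check=True,
            capture_output=True,
            text=True,
        )
        if not approve(shown.stdout):
            return False
        run(apply_command(planfile), check=True)
        return True
    finally:
        # The planfile may contain secrets, so remove it whether the
        # run was approved, denied, or crashed.
        Path(planfile).unlink(missing_ok=True)
```

Deleting the planfile in the finally block is one simple answer to the cleanup question; it does not address where the planfile lives while it waits for approval.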
We work with companies who need to own their infrastructure as their competitive advantage.
Our customers are typically post-Series A technology startups who are seeing success in the market and need to accelerate their DevOps adoption in order to take their company to the next level.
They are backed by some of the biggest names in the industry and are solving really difficult problems with technology.
No, it's absolutely FREE for anyone to attend.
Can you help me understand where the boundaries of Cloud Posse's responsibilities end, and where ours would start?
Cloud Posse's mission is to help companies own their infrastructure. We accelerate this journey by architecting your 4 layers with you and by taking the lead on the implementation. Since we have an opinionated framework, customers will need to learn how to leverage everything for their use cases. This will sometimes mean altering how you build and deploy your services.
Getting Started With Us
We always prefer to start with a green-field approach, where we build your infrastructure from the ground up together with your team. As part of our process, we'll walk you through all of the required design decisions, ensuring you have sufficient context to make informed decisions. This is why we expect our customers to have someone on their engineering team invested in the outcome. This part is absolutely critical, as it ensures what we deliver suits your business needs. Everything we do is delivered by pull request for your review and we will happily provide documentation on anything you want. Along the way, we'll assign homework exercises and provide ample documentation. This approach provides the best opportunity to gain a deep hands-on understanding of our solution.
We encourage you to ask as many questions as you want and challenge our assumptions. You also can volunteer for any task you want to take on as “homework” and we'll help you out as needed.
When You Own It
Once our job is done, this is where you take the driver's seat. We'll help you get everything set up for a smooth transition from your heritage environment to your shiny new infrastructure. Rest assured that we'll stick around until your team is confident and has the know-how to operate these platforms in production. We don't expect teams to pick this up overnight, which is why we'll stay engaged for as long as you need. We're happy to answer questions and jump on Zoom for pair programming sessions.
Day-2 Operations
After our engagement, you will have a solid foundation powering your apps, and all the tools you need for infrastructure operations. This means your team is responsible for the ongoing maintenance, including upgrades (e.g. EKS clusters and all open-source software), patching systems, incident response, triaging, SRE (e.g. adding monitors and alerts), as well as security operations (responding to incidents, staying on top of vulnerabilities/CVEs). Cloud Posse is continuously updating its Open Source module ecosystem, but it's your responsibility to regularly update your infrastructure. Staying on top of these things is critical for a successful long-term outcome, with minimal technical debt.
For companies that want to focus more on their business and less on maintenance, we provide ongoing support engagements exclusively for customers that have completed our accelerator.
Check out our approach to learn more!
Foundational Security & Compliance
We work with companies anywhere in the world.
While most of our customers are based in the United States, we've worked with companies in the United Kingdom, Germany, Australia, Hong Kong, India, Argentina, etc. Our team is distributed across the US and Eastern Europe.
Anyone is free to fork our repositories and try it themselves, but our support eliminates the guesswork and shortens the time it takes to implement correctly.
Think of it like this: anyone can walk into a hardware store and pick up the materials to build a house. Very few people can build a house that won't fall down if they don't have the experience of using all the tools and hardware correctly. We fill the gap by providing the knowledge and experience to get you where you want to be faster than doing it yourself.
Cloud Posse does offer documentation as part of its engagements, but it is written for experienced developers. If different documentation is required, it can be created upon request.
It really depends on when a contract begins and who on our team is on the bench. Generally, we like to put (2) engineers on a project so we have cross-training and continuity in the event a member needs to take time off. Our team is geographically distributed across the continental US as well as Eastern Europe. Throughout the course of a project, we may move team members between projects depending on their subject matter expertise.
We provide entirely optional ongoing support for customers who've gone through our DevOps Accelerator.
By and large, most of our customers take over the day-to-day management of their infrastructure.
We're here though to help out anywhere you need it.
We do not provide 24×7 “on-call” (aka PagerDuty) support.
We'll deliver the end-to-end solution you've seen in all of our demos. It will be preconfigured for your environments under your AWS accounts. We'll create new GitHub repos that will contain all the infrastructure code you need.
Along the way, we'll show you the ropes and how to operate it. In the long run, you'll be responsible for operating it but we'll stick around for as long as you need our help.
- Gruntwork doesn't provide open access to all their modules; they are a subscription service. Cloud Posse open sources everything.
- All of our code is in GitHub and can be forked and used with no concerns about licensing issues (Apache 2.0).
- Gruntwork's Reference Architecture requires Terragrunt.
- Gruntwork is not a consulting company. They do not help with hands-on implementation. That's left up to you.
- We provide a comprehensive project plan consisting of hundreds of implementation tasks and design decisions that we execute together with your team.
- Our Slack community is free for anyone to join, not just paying customers.
- Because our work is Open Source, there's a lower barrier to getting started. That's why it's in use by thousands and thousands of companies. We receive dozens of Pull Requests every week enhancing our modules and fixing bugs.
We work with companies who need to own their infrastructure as their competitive advantage.
Our customers are typically post-Series A technology startups who are seeing success in the market and need to accelerate their DevOps adoption in order to take their company to the next level.
They are backed by some of the biggest names in the industry and are solving really difficult problems with technology.
No, it's absolutely FREE for anyone to attend.
Can you help me understand where the boundaries of Cloud Posse's responsibilities end, and where ours would start?
Cloud Posse's mission is to help companies own their infrastructure. We accelerate this journey by architecting your 4 layers with you and by taking the lead on the implementation. Since we have an opinionated framework, customers will need to learn how to leverage everything for their use cases. This will sometimes mean altering how you build and deploy your services.
Getting Started With Us
We always prefer to start with a green-field approach, where we build your infrastructure from the ground up together with your team. As part of our process, we'll walk you through all of the required design decisions, ensuring you have sufficient context to make informed decisions. This is why we expect our customers to have someone on their engineering team invested in the outcome. This part is absolutely critical, as it ensures what we deliver suits your business needs. Everything we do is delivered by pull request for your review and we will happily provide documentation on anything you want. Along the way, we'll assign homework exercises and provide ample documentation. This approach provides the best opportunity to gain a deep hands-on understanding of our solution.
We encourage you to ask as many questions as you want and challenge our assumptions. You also can volunteer for any task you want to take on as “homework” and we'll help you out as needed.
When You Own It
Once our job is done, this is where you take the driver's seat. We'll help you get everything set up for a smooth transition from your heritage environment to your shiny new infrastructure. Rest assured that we'll stick around until your team is confident and has the know-how to operate these platforms in production. We don't expect teams to pick this up overnight, which is why we'll stay engaged for as long as you need. We're happy to answer questions and jump on Zoom for pair programming sessions.
Day-2 Operations
After our engagement, you will have a solid foundation powering your apps, and all the tools you need for infrastructure operations. This means your team is responsible for the ongoing maintenance, including upgrades (e.g. EKS clusters and all open-source software), patching systems, incident response, triaging, SRE (e.g. adding monitors and alerts), as well as security operations (responding to incidents, staying on top of vulnerabilities/CVEs). Cloud Posse is continuously updating its Open Source module ecosystem, but it's your responsibility to regularly update your infrastructure. Staying on top of these things is critical for a successful long-term outcome, with minimal technical debt.
For companies that want to focus more on their business and less on maintenance, we provide ongoing support engagements exclusively for customers that have completed our accelerator.
Check out our approach to learn more!
Spacelift checks off all the boxes for managing extremely large environments with a lot of state management. Since Cloud Posse's focus is on deploying large-scale loosely coupled infrastructure components with Terraform, it's common to have several hundred terraform states under management.
Every successful business in existence uses accounting software to manage its finances and understand the health of its business. The sheer number of transactions makes it infeasible to reconcile the books by hand. The same is true of modern infrastructure. With hundreds of states managed programmatically with terraform, and modified constantly by different teams or individuals, the same kind of state reconciliation is required to know the health of its infrastructure. This need goes far beyond continuous delivery and few companies have solved it. With Spacelift, you have an up-to-date view of your assets, liabilities & tech debt across all environments.
Major benefits
- Drift Detection runs on a customizable schedule and surfaces inconsistencies between what's deployed and what's in git.
- Reconciliation helps you know what's deployed, what's failing, and what's queued.
- Plan Approvals ensure changes are released when you expect them.
- Policy Driven Framework based on OPA (open source standard) is used to trigger runs and enforce permissions. This is like IAM for GitOps.
- Terraform Graph Visualization makes it easier to visualize the entire state across components
- Audit Logs of every change traced back to the commit and filterable by time
- Affordable alternative to other commercial offerings
- Works with more than Terraform (e.g. Pulumi)
- Pull Request Previews show what the proposed changes are before committing them
- Decoupling of Deploy from Release ensures we can merge to trunk and still control when those changes are propagated to environments
- Ephemeral Environments (Auto Deployment, Auto Destruction) enables us to bring up infrastructure with terraform and destroy it when it's no longer needed
- Self-hosted Runners ensure we're in full control over what is executed in our own VPC, with no public endpoints
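To illustrate what drift detection means in practice, here is a minimal sketch built on terraform plan -detailed-exitcode, which exits with 0 when there are no changes, 1 on error, and 2 when the plan contains changes. This is not how Spacelift implements the feature; the function names and the status labels are assumptions chosen for illustration.

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` exit status to a label."""
    if code == 0:
        return "in-sync"  # plan succeeded with an empty diff
    if code == 2:
        return "drifted"  # plan succeeded and found changes to make
    return "error"  # exit code 1 (or anything unexpected)


def detect_drift(workdir: str, run=subprocess.run) -> str:
    """Run a plan in workdir and report whether the component has drifted.

    Hypothetical helper meant to be called on a schedule (e.g. nightly)
    once per component.
    """
    result = run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
    )
    return classify_plan_exit(result.returncode)
```

A scheduler would call detect_drift for each of the several hundred states and alert on anything that is not "in-sync", which is the reconciliation work the platform takes off your hands.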
What level of access do the Spacelift worker pools have?
Spacelift Workers are deployed in your environment with the level of permission that we grant them via IAM instance profiles. When provisioning any infrastructure that requires modifying IAM, the minimum permission is administrative. Thus, workers are provisioned with administrative permissions in all accounts that we grant access to since the terraform we provision requires creating IAM roles and policies. Note, this is not a constraint of Spacelift; this is required regardless of the platform that performs the automation.
What happens if Spacelift as a product goes away?
First off, while Spacelift might be a newer brand in the infrastructure space, it's used by publicly traded companies, Healthcare companies, banks, institutions, Fortune 500 companies, etc. So, Spacelift is not going away.
But just to entertain the hypothetical, let's consider what would happen. Since we manage all terraform states in S3, we have the “break glass” capability to leave the platform at any time and can always run terraform manually. Of course, we would lose all the benefits.
How tough would it be to move everything to a different platform?
Fortunately, with Spacelift, we can still use S3 as our standard state backend. So if at any time we need to move off of the platform, it's easy. Of course, we'd give up all the benefits but the key here is we're not locked into it.
Why not just use Atlantis?
We used to predominantly recommend Atlantis but stopped doing so a number of years ago. The project was more or less dormant for 2-3 years, and only recently started accepting any Pull Requests. Atlantis was the first project to define a GitOps workflow for Terraform, but it's been left in the dust compared to newer alternatives.
- With Atlantis, there is no regular reconciliation of what terraform state has been applied or not applied, so we really have no idea of the actual state of anything. With a recent customer, we helped migrate them from Atlantis to Spacelift and it took 2 months to reconcile all the infrastructure that had drifted.
- With Atlantis, there's no drift detection, but with Spacelift, we detect it nightly (or as frequently as we want).
- With Atlantis, there's no way to manage dependencies between components, so when one component changes, the components that depend on it are not automatically updated.
- With Atlantis, there's no way to set up OPA policies to trigger runs. The OPA support in Atlantis is very basic.
- With Atlantis, anyone who can run a plan can exfiltrate your root credentials. This has been discussed by others and was recently highlighted at the Defcon 2021 conference.
- With Atlantis, there's no way to limit who can run terraform plan or apply. If you have access to the repo, you can run a terraform plan. If your plan is approved, you can run terraform apply. Cloud Posse even tried to fix it (and maintained our own fork for some time), but the discussion went nowhere and we moved on.
- With Atlantis, there's no way to restrict who has access to unlock workspaces via the web GUI. The only way is to install your own authentication proxy in front of it or restrict it in your load balancer.
- With Atlantis, you have to expose the webhook endpoint publicly to GitHub.
What about using GitHub Actions?
We provide a suitable alternative to Spacelift using GitHub Actions for companies looking to unify their deployments under one common platform.
It's an entirely free and open-source alternative that uses atmos and works with self-hosted GitHub Runners as well as GitHub Cloud.
What about using GitLab/Jenkins/etc?
There are plenty of examples of using other tools to implement continuous delivery for Terraform. However, solving for all the edge cases is what makes it so complicated, and these approaches seldom, if ever, handle them.
- Where will you store the plan files which are required for approvals? (plan → approve → apply workflow) Note, these planfiles may contain root-level credentials to things like RDS databases, which cannot be avoided.
- How will you clean up those planfiles? Should they persist after a terraform apply succeeds or crashes?
- How will you implement approval steps? If the approval is denied, how will you clean up the terraform planfile?
- If you have multiple open PRs (e.g. many plans) for one workspace, after applying one, all other plans need to be invalidated. How will you implement that invalidation?
- Git is only one source of truth for infrastructure as code. Data sources are another (e.g. terraform remote state). How will you reconcile that your state is current and update it when it drifts? When it drifts, how will you be notified?
- How will you know that your infrastructure changes are applied everywhere? If a build fails, but the code is already merged, how do you escalate and ensure it's resolved?
- If you need to lock an environment from being updated, how will you do it?
- How will you suggest the changes? If the plan is to comment on the PR, that gets VERY noisy, and everyone subscribed will receive the notification. Runs may also accidentally leak secrets in the output. GitHub comments are limited to 65K bytes, which means large plans will need to be split across multiple comments.
- What happens if multiple PRs are merged that want to modify the same environment? How will you enforce ordered consistency?
- How will you restrict who can run terraform plans and applies? Furthermore, how will you restrict it to specific environments?
- How will you provide the short-lived IAM credentials to the terraform processes? Any hardcoded credentials that are exposed will be a major liability.
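The plan invalidation question above can be sketched with a small in-memory model. It assumes each plan records the state serial it was computed against and that every successful apply bumps the workspace's serial, so any plan built on an older serial is stale. The Plan and PlanRegistry names are hypothetical, and a real pipeline would persist this data and read the serial from the actual state.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Plan:
    pr_number: int
    workspace: str
    base_serial: int  # state serial the plan was computed against


@dataclass
class PlanRegistry:
    """Hypothetical tracker for open plans per workspace."""

    serials: Dict[str, int] = field(default_factory=dict)
    plans: List[Plan] = field(default_factory=list)

    def record_plan(self, pr_number: int, workspace: str) -> Plan:
        plan = Plan(pr_number, workspace, self.serials.get(workspace, 0))
        self.plans.append(plan)
        return plan

    def is_stale(self, plan: Plan) -> bool:
        # A plan is stale once the workspace state has moved past it.
        return plan.base_serial != self.serials.get(plan.workspace, 0)

    def apply(self, plan: Plan) -> None:
        if self.is_stale(plan):
            raise RuntimeError(
                f"plan for PR #{plan.pr_number} is stale; re-plan first"
            )
        # Applying changes the state, which invalidates every other
        # open plan that targets the same workspace.
        self.serials[plan.workspace] = plan.base_serial + 1
        self.plans.remove(plan)
```

For example, if PR 1 and PR 2 both plan against the same workspace and PR 1 is applied first, is_stale returns True for PR 2's plan and apply refuses it until a fresh plan is recorded, while plans for other workspaces are unaffected.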
Why not use Terraform Cloud?
Terraform Cloud is prohibitively expensive for most non-enterprise customers we work with, and possibly 10x the cost of Spacelift. Terraform Cloud for Teams doesn't permit self-hosted runners and requires hardcoded IAM credentials in each workspace. That's insane and we cannot recommend it. Terraform Cloud for Business (and higher) supports self-hosted runners, which can leverage AWS IAM instance profiles, but the number of runners is a significant factor in the cost. When leveraging several hundred loosely coupled terraform workspaces, there is a significant need for a lot of workers for short periods of time. Unfortunately, even if those are only online for a short period of time, you need to commit to paying for them for the full month on an annualized basis. Terraform Cloud also requires that you use their state backend, which means there's no way to “break glass” and run Terraform if they are down. If you want to migrate off of Terraform Cloud, you need to migrate the state of hundreds of workspaces out of the platform and into another state backend.
Foundational SRE
We work with companies anywhere in the world.
While most of our customers are based in the United States, we've worked with companies in the United Kingdom, Germany, Australia, Hong Kong, India, Argentina, etc. Our team is distributed across the US and Eastern Europe.
Anyone is free to fork our repositories and try it themselves, but our support eliminates the guesswork and shortens the time it takes to implement correctly.
Think of it like this: anyone can walk into a hardware store and pick up the materials to build a house. Very few people can build a house that won't fall down if they don't have the experience of using all the tools and hardware correctly. We fill the gap by providing the knowledge and experience to get you where you want to be faster than doing it yourself.
Cloud Posse does offer documentation as part of its engagements, but it is written for experienced developers. If different documentation is required, it can be created upon request.
It really depends on when a contract begins and who on our team is on the bench. Generally, we like to put (2) engineers on a project so we have cross-training and continuity in the event a member needs to take time off. Our team is geographically distributed across the continental US as well as Eastern Europe. Throughout the course of a project, we may move team members between projects depending on their subject matter expertise.
We provide entirely optional ongoing support for customers who've gone through our DevOps Accelerator.
By and large, most of our customers take over the day-to-day management of their infrastructure.
We're here though to help out anywhere you need it.
We do not provide 24×7 “on-call” (aka PagerDuty) support.
We'll deliver the end-to-end solution you've seen in all of our demos. It will be preconfigured for your environments under your AWS accounts. We'll create new GitHub repos that will contain all the infrastructure code you need.
Along the way, we'll show you the ropes and how to operate it. In the long run, you'll be responsible for operating it but we'll stick around for as long as you need our help.
- Gruntwork doesn't provide open access to all their modules; they are a subscription service. Cloud Posse open sources everything.
- All of our code is in GitHub and can be forked and used with no concerns about licensing issues (Apache 2.0).
- Gruntwork's Reference Architecture requires Terragrunt.
- Gruntwork is not a consulting company. They do not help with hands-on implementation. That's left up to you.
- We provide a comprehensive project plan consisting of hundreds of implementation tasks and design decisions that we execute together with your team.
- Our Slack community is free for anyone to join, not just paying customers.
- Because our work is Open Source, there's a lower barrier to getting started. That's why it's in use by thousands and thousands of companies. We receive dozens of Pull Requests every week enhancing our modules and fixing bugs.
We work with companies who need to own their infrastructure as their competitive advantage.
Our customers are typically post-Series A technology startups who are seeing success in the market and need to accelerate their DevOps adoption in order to take their company to the next level.
They are backed by some of the biggest names in the industry and are solving really difficult problems with technology.
No, it's absolutely FREE for anyone to attend.
Can you help me understand where the boundaries of Cloud Posse's responsibilities end, and where ours would start?
Cloud Posse's mission is to help companies own their infrastructure. We accelerate this journey by architecting your 4 layers with you and by taking the lead on the implementation. Since we have an opinionated framework, customers will need to learn how to leverage everything for their use cases. This will sometimes mean altering how you build and deploy your services.
Getting Started With Us
We always prefer to start with a green-field approach, where we build your infrastructure from the ground up together with your team. As part of our process, we'll walk you through all of the required design decisions, ensuring you have sufficient context to make informed decisions. This is why we expect our customers to have someone on their engineering team invested in the outcome. This part is absolutely critical, as it ensures what we deliver suits your business needs. Everything we do is delivered by pull request for your review and we will happily provide documentation on anything you want. Along the way, we'll assign homework exercises and provide ample documentation. This approach provides the best opportunity to gain a deep hands-on understanding of our solution.
We encourage you to ask as many questions as you want and challenge our assumptions. You also can volunteer for any task you want to take on as “homework” and we'll help you out as needed.
When You Own It
Once our job is done, this is where you take the driver's seat. We'll help you get everything set up for a smooth transition from your heritage environment to your shiny new infrastructure. Rest assured that we'll stick around until your team is confident and has the know-how to operate these platforms in production. We don't expect teams to pick this up overnight, which is why we'll stay engaged for as long as you need. We're happy to answer questions and jump on Zoom for pair programming sessions.
Day-2 Operations
After our engagement, you will have a solid foundation powering your apps, and all the tools you need for infrastructure operations. This means your team is responsible for the ongoing maintenance, including upgrades (e.g. EKS clusters and all open-source software), patching systems, incident response, triaging, SRE (e.g. adding monitors and alerts), as well as security operations (responding to incidents, staying on top of vulnerabilities/CVEs). Cloud Posse is continuously updating its Open Source module ecosystem, but it's your responsibility to regularly update your infrastructure. Staying on top of these things is critical for a successful long-term outcome, with minimal technical debt.
For companies that want to focus more on their business and less on maintenance, we provide ongoing support engagements exclusively for customers that have completed our accelerator.
Check out our approach to learn more!
General
Cloud Posse's typical engagement is for greenfield projects.
The typical duration of our initial rollout is 3-4 months, broken down into 2-week sprints. Each sprint is focused on specific deliverables that are summarized in this list: https://cloudposse.com/what-we-do/
We recommend the whole package, but not every item on this list has to be delivered in every engagement; the scope depends on each customer's requirements. We work with your team to help them own the solution we build together once the engagement winds down, but we're always here to help!
Community support is available through our internal and public Slack communities (slack.sweetops.com). Our public Office Hours are held every Wednesday at 11:30 AM Pacific Time, and you can listen to previous sessions on our podcast or on our YouTube channel.
After an engagement ends, we offer optional ongoing support, and starting new projects is always an option as well.
Our community of over 6,000 people is open to everyone. You do not need to be a customer to benefit from our community. We have a VERY active and helpful public Slack community that welcomes all skill levels and backgrounds.
- Go to slack.cloudposse.com to register.
- Sign up for our weekly “office hours”
p.s. if you're interested in a career at Cloud Posse, this is a great first step.
Cloud Posse is based in Los Angeles (PT), which is ~GMT-8 depending on the time of year. =) However, our community of more than 3,200 members is truly global. There's always someone online at any given time of the day.
Jobs
Cloud Posse, LLC is based in Los Angeles, CA (USA). We are a 100% remote company. As such, our team is distributed across the continental United States as well as Eastern Europe.
Yes! We're a 100% remote company.
Please submit your resume as part of the application process by going to our jobs portal.
Someone from our team will contact you if your skills and experience meet the requirements of an open position. Make sure you've applied only to positions that match your background, credentials and interests. We receive applicants every day, so please bear with us as we work through the backlog.
No, we do not sponsor H-1B visas. Please let our recruiter know upfront that you require H-1B sponsorship so that they can determine what, if any, alternative positions are available for remote, off-shore work.
From time to time we offer entry-level positions, but most positions are for senior-level DevOps engineers. Please check our jobs portal to find out if we have any entry-level positions available.
Search and apply for positions on our jobs portal. You can easily apply to available positions simply by using your LinkedIn profile.
No, positions are for full-time subcontracting.
We offer a generous amount of paid time off (PTO), including observance of 10 national holidays.
Legal
We can easily add additional sprints to a Scope of Work. We just need to agree on what goes into each Sprint, which will determine the number of Sprints required.
Time & Materials (T&M) projects align our goals with yours. Our company will work with you until your needs are met irrespective of the actual deliverables, providing the greatest likelihood of a successful outcome—everyone's end goal. This will allow you to realize a final product that meets or exceeds your expectations and prevents you from being held financially responsible for software that does not help you reach your objectives. It rewards the company for working harder to satisfy those needs or saves you money if the needs are met with less effort. More importantly, it gives you the maximum agility to decide what you need as the project progresses and to pivot at any time. Is there some new feature you just thought of and want right away? We'll get right on that. Something you thought you were going to need but now realize can wait until next year? We're happy to skip that and move on to what you care about most, even if that is different from what it was last week.
Fixed-fee (or Fixed-bid) projects are based on the outdated “waterfall” model where you define everything you are going to need to finish the project before work on the deliverables even starts. On top of locking you into a rigid set of deliverables before you are even sure that is what you want, they require significant extra time and effort (as much as 50% of the total project time) to define “acceptance criteria”, which is a mutually agreed set of tests that, when passed, define the project as finished. In addition, fixed-price bids transfer completion risk to the company, so a company is wise to double the estimated T&M and add 20% (the extra 20% is for all the time spent negotiating the acceptance criteria.) 🙂 Fixed-price bids incentivize the company to ignore your actual needs and focus on delivering the bare minimum to satisfy the acceptance criteria. Was there something important to you that you forgot to capture in the acceptance criteria? Sorry, that is “out of scope”; we will get to it on the next project. This is how the big consulting companies got so big. They know your “waterfall” project will fail, leading to a follow-on waterfall project to fix it, except that, too, will fail for the same reasons, leading to a never-ending stream of work for the consultants. They fatten their profits by charging for all the extra work which fixed-bid contracts require, and keep you on the hook by taking advantage of the “sunk cost fallacy.”
Time and Materials Not To Exceed (T&M NTE) has a couple of ways of working. At its worst, it has all the problems of a Fixed-fee project but it takes away any incentive for the company to give you a good rate. It caps the company’s profits but not their losses. No company can agree to this model and stay in business.
There is a second form of T&M NTE, that larger companies with more complex governance, budget, and financial control systems might prefer. It’s mostly the same, except with built-in circuit breakers ensuring that budget and finance departments will retain oversight so that if a project balloons in scope, appropriate people will be called in to review and triage features before the project becomes an unexpected drain on resources. It is important for all concerned to understand that there is no commitment to “finishing” a T&M NTE project because such a project does not have a defined, agreed-upon endpoint. Still, the company is incentivized to meet expectations with the given budget and to help identify cost savings where appropriate, in the hope of securing additional work on the ongoing project. Larger companies with more bureaucratic management and more sophisticated budgetary and financial controls may prefer this, while smaller, more nimble companies know that a standard T&M contract is tacitly a Not To Exceed in that it can be canceled at any time, for exceeding the budget or any other reason.
We prefer T&M projects because they eliminate the need for either side to argue about ambiguities in the project definitions. Nobody needs to be convinced that the deliverables are acceptable or unacceptable. Instead, we deliver to the best of our ability what we understood our customers wanted. If the customer wants something else, whether it is because their needs changed in the interim or they asked for the wrong thing in the first place, we can just accept that they want something different and get right to work delivering that. Your acceptance criteria can be whatever you want, from fully-automated tests to feedback surveys, and we are happy to do anything you want done differently. When working with T&M, we are truly on your side.
TL;DR: Unfortunately, we can’t grant ownership of our work product because it is built mostly on open source or other pre-existing materials derived from other engagements.
Even if we scope everything to work committed in your GitHub repositories, it's not cut-and-dried. That's because almost nothing is new or original in our Scope of Work, even that which we commit to your repo, and anything that is new will most likely be a derivative of other pre-existing materials.
Since we run multiple concurrent engagements and because two customers could easily require or acquire the same architecture despite totally different applications and industries, we need to have provisions that protect past, present, and future customers. This is why we own it, and give our customers irrevocable perpetual use of it. Obviously, we can’t include our customers' proprietary materials in any open-source project.
So, this is why the best we can do for you and all of our other clients is to grant an irrevocable perpetual license to use our work product but we can’t grant ownership. We need to be very clear that it's fundamental to our entire business model as an open-source consultancy specializing in DevOps Acceleration, that we maintain ownership in order to protect all of our customers from infringement.
Office Hours
We recommend you first sign up for our Slack team. Then join the #office-hours Slack channel and ask your question there. We'll make sure to address it on our next call. Of course, you can always just join us this week and ask us there.
Your other option is to register for our FREE weekly office hours. We host these calls every Wednesday at 11:30am PT.
We hold these sessions every Wednesday from 11:30am – 12:30pm PST (GMT-8).
Make sure to register to receive a calendar invitation.
You can find all of our past recordings on YouTube. Subscribe to our YouTube Channel so you can stay current on the latest discussions.
Alternatively, tune in to our Podcast, which also has all episodes.
Ask anything related or adjacent to DevOps, Release Engineering, CI/CD, Cloud Automation, Terraform, Kubernetes, Helm, etc. These are all topics in our domain of expertise.
Yes, we record all sessions and post them within 24 hours to our YouTube channel and Podcast.
No, it's absolutely FREE for anyone to attend.
Cloud Posse holds public “Office Hours” every Wednesday at 11:30 am PT to answer questions on all things related to DevOps, Terraform, Kubernetes, CI/CD. Basically, it's like an interactive “Lunch & Learn” session where we get together for about an hour and talk shop with our community. These are totally free and just an opportunity to ask us (or our community of experts) any questions you may have.
You can register here: https://cloudposse.com/office-hours
Join the conversation in our SweetOps Slack #office-hours channel: https://slack.cloudposse.com/
What to expect…
- Live Q&A. Ask questions and get answers. There are usually 30+ people on the call.
- News & Announcements. We'll share cool things we're working on at Cloud Posse along with any cool announcements or projects we come across.
- Live Demos. Watch live demos of some of the things we're building here at Cloud Posse. Also, tune in if you want to share any open source projects you're working on.
- Special Guest Speakers. From time to time we'll bring on special guests who will talk at length on something relevant for our community.
Presales
We can start as soon as you sign our Statement of Work. Typically we see this process take 2-3 weeks from the first introductory call to the start of our engagement.
Here's the checklist we'll need to complete before we can start:
- Execute Mutual NDA (ours or yours)
- Collaborate on Engagement Workbook via Google Docs
- Execute Statement of Work, and Master Services Agreement
- Deposit Payment
- Kick-off!
We can kick off the initial introductory call immediately, so please make sure that you schedule it today.
After talking with you and assessing if we're a proper fit, we'll execute a Mutual NDA and then send over an Engagement Workbook so we can gather all the requirements for your project and estimate the cost.
Anyone is free to fork our repositories and try themselves, but our support eliminates the guesswork and shortens the time it takes to implement correctly.
Think of it like this: anyone can walk into a hardware store and pick up the materials to build a house. Very few people can build a house that won't fall down if they don't have the experience of using all the tools and hardware correctly. We fill the gap by providing the knowledge and experience to get you where you want to be faster than doing it yourself.
Our experience ensures you reach your goals in record time. Time is money. Salaries are by far the biggest cost for most startups. Think about how much you would pay to do this in-house and combine that with how long it will take you. During that time, your teams will be blocked or at the very least slowed down. Plus, you don't even have a predictable outcome. You can easily quantify this as your opportunity cost.
Our solution will pay for itself. You get a predictable solution delivered in record time for a fair price. Your engineers will be unblocked sooner and you'll be able to move faster.
Make sure you include all costs associated with your project.
- What is the cost of recruiting your team?
- What is your team’s fully-loaded cost?
- How long will it take to build and train the team?
- Will they stick around long enough to see the project through?
- What happens when everyone goes on holiday or takes a vacation?
- Will you have enough work for them when the project is over?
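One rough way to put numbers on the questions above is to add up recruiting costs and the salaries paid during ramp-up and build time. The sketch below is only an illustration: every figure and field name in it is a hypothetical placeholder, not Cloud Posse pricing or an industry benchmark, so substitute your own estimates.

```python
from dataclasses import dataclass


@dataclass
class InHouseEstimate:
    """Hypothetical inputs for a do-it-yourself cost estimate."""
    engineers: int                # size of the team you would hire
    fully_loaded_annual: float    # salary + benefits + overhead, per engineer
    recruiting_cost_each: float   # recruiter fees, interview time, etc.
    ramp_up_months: float         # time to hire and train the team
    build_months: float           # time to actually build the platform

    def total_cost(self) -> float:
        """Recruiting plus salaries paid over ramp-up and build time."""
        months = self.ramp_up_months + self.build_months
        salary = self.engineers * self.fully_loaded_annual * months / 12
        recruiting = self.engineers * self.recruiting_cost_each
        return salary + recruiting


# Placeholder numbers for illustration only.
estimate = InHouseEstimate(
    engineers=3,
    fully_loaded_annual=240_000,
    recruiting_cost_each=30_000,
    ramp_up_months=3,
    build_months=12,
)
print(f"Estimated in-house cost: ${estimate.total_cost():,.0f}")
```

This deliberately leaves out the harder-to-quantify items from the list, such as attrition risk and the cost of other teams being blocked while the platform is built.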
Our total project costs are predictable. You'll know upfront what to expect and there are no surprises.
When you hire Cloud Posse, you're buying an outcome that few others can provide. What a company is really buying from Cloud Posse is an end-to-end solution that includes time for implementation and integration. This is a solution that has cost our customers millions of dollars to implement and we are selling for a tiny fraction of the cost to implement it in-house.
We are not a traditional “DevOps as a Service” company that only does the grunt work; we provide thought leadership combined with expert execution and implementation. We have chosen to use an “Open Source” licensing model to simplify the software distribution because we provide 10x the value in our implementation.
During the course of our engagement, our customers have direct access to our team, which has tremendous experience in cloud architecture & implementation. Companies hire us to implement in a span of only 3-4 months what would take even the most senior, experienced team of DevOps engineers years to develop, which makes our offering insanely affordable by comparison. By partnering with Cloud Posse, you're spared all the hard “lessons learned” and achieve a greater outcome in a shorter amount of time with less risk.
You will find the industry-standard rate for experienced independent contractors/freelancers is around $150-250/hr. Note, when you hire freelancers they don't bring to the table the unparalleled library of code and experience that you get when you partner with Cloud Posse. We put our best foot forward on GitHub so you see exactly what you’re getting. Plus, freelancers and employees cannot offer business continuity, which leaves your company with no one to turn to when/if they leave or go on vacation. While a company might shave off a little bit on the hourly rate by going with an independent contractor, it's several orders of magnitude more expensive to implement a custom solution that is remotely comparable to what we offer; that solution will have greater uncertainty and result in greater risk for your business.
- Gruntwork doesn't provide open access to all their modules; they are a subscription service. Cloud Posse open sources everything.
- All of our code is on GitHub and can be forked and used with no concerns about licensing issues (Apache 2.0).
- Gruntwork's Reference Architecture requires Terragrunt
- Gruntwork is not a consulting company. They do not help with hands-on implementation. That's left up to you.
- We provide a comprehensive project plan consisting of hundreds of implementation tasks and design decisions that we execute together with your team.
- Our Slack community is free for anyone to join, not just paying customers.
- Because our work is Open Source, there's a lower barrier to getting started. That's why it's in use by thousands and thousands of companies. We receive dozens of Pull Requests every week enhancing our modules and fixing bugs.
Our goal is not to sell you on a solution that you don’t need or one that will frankly be overkill for you. We've worked with several customers that were pre-product and helped them launch successfully. What's important is that owning your infrastructure needs to be a competitive advantage. We work best with companies that have some experience running their apps in containers, use AWS in some capacity, and are flexible in adopting the open-source tools we deliver as part of our solution.
Your best bet is to schedule a discovery call and we'll quickly assess if we're a good fit for your company.
Unfortunately, we're not able to take on small engagements. You can, however, join us every single week for 100% free “Office Hours”—where we seek to answer your questions. Just register for an invitation.
We hold our “Office Hours” every Wednesday at 11:30 am PT via Zoom. We're typically 30+ people on the call and all skill levels are welcome.
Check out our past recordings on YouTube or subscribe to our Podcast.
We work with companies who need to own their infrastructure as their competitive advantage.
Our customers are typically post-Series A technology startups who are seeing success in the market and need to accelerate their DevOps adoption in order to take their company to the next level.
They are backed by some of the biggest names in the industry and are solving really difficult problems with technology.
Can you help me understand where the boundaries of Cloud Posse's responsibilities end, and where ours would start?
Cloud Posse's mission is to help companies own their infrastructure. We accelerate this journey by architecting your 4 layers with you and by taking the lead on the implementation. Since we have an opinionated framework, customers will need to learn how to leverage everything for their use cases. This will sometimes mean altering how you build and deploy your services.
Services
If you're interested in keeping us around after you're finished with our DevOps Accelerator program, we suggest a quarterly retainer that covers 3 months (120+ hours) that will enable us to continue to consult and support you.
This would include:
- Slack support via shared channels
- Zoom pair programming sessions
- Project management with direct Jira access
- Weekly status check-ins (for 120+ hour retainers)
Typical tasks include:
- Patch and update services (e.g. Kubernetes and associated services)
- Keep infrastructure code current (Terraform modules, Helm charts)
- Support major version upgrades of Helm and Terraform
- Implement new infrastructure components, monitors, or environments
- Assist with triaging incidents and remediations
- Optimize performance and cloud spend
Delivery
How we use the retainer is entirely up to you. We'll suggest tasks as they come up and add them to the backlog. We'll prioritize the work together with you on our check-in calls.
Tasks (and projects) are typically assessed by how much time we want to invest in them. We are happy to collaborate with you to help figure out the best use of our time, but we generally don't guarantee estimates and deadlines as part of ongoing support. This is why we recommend timeboxing requests instead; that way, you stay informed if something takes longer than you had expected. It also gives the engineer(s) the ability to quickly communicate if the requested task is going to take less or more time than expected.
Billing
Our standard quarterly retainer size is 120 hours. You can expect to be provided with detailed billing reports and have direct communication with us every step of the way. We invoice retainers in advance of services under Net-30 terms. Additional retainers can be purchased at any time with written approval. In other words, any time you want to guarantee more bandwidth with us, all we need is an email approval.
Products & Services
Cloud Posse provides documentation as part of its engagements, but it is written for experienced developers. If you need documentation for a different audience, we can create it upon request.
We provide entirely optional ongoing support for customers who've gone through our DevOps Accelerator.
By and large, most of our customers take over the day-to-day management of their infrastructure.
We're here, though, to help out anywhere you need it.
We do not provide 24×7 “on-call” (aka PagerDuty) support.
We'll deliver the end-to-end solution you've seen in all of our demos. It will be preconfigured for your environments under your AWS accounts. We'll create new GitHub repos that will contain all the infrastructure code you need.
Along the way, we'll show you the ropes and how to operate it. In the long run, you'll be responsible for operating it but we'll stick around for as long as you need our help.
We offer all of our customers ongoing support for as long as they need it. Choose what's right for you.
- We provide free weekly support via our “Office Hours” webinars every Wednesday at 11:30 am PST. These calls last one hour and we'll answer as many of your questions as we can.
- We also provide optional support retainers which include a fixed block of hours that go towards maintenance and support. You'll have direct access to our team via a shared Slack channel in addition to the ability to schedule one-on-one calls via Zoom.
Project Management
It really depends on when a contract begins and who on our team is on the bench. Generally, we like to put (2) engineers on a project so we have cross-training and continuity in the event a member needs to take time off. Our team is geographically distributed across the continental US as well as Eastern Europe. Throughout the course of a project, we may move team members between projects depending on their subject matter expertise.
- Slack. You will have direct access to the team via a shared Slack channel between our respective teams.
- Zoom. We'll have weekly scheduled cadence calls via Zoom to review the current progress, blockers and give product demos in your environment. These calls can be recorded and shared with your team.
- Google Drive. We also recommend creating a shared Team Drive folder via Google Docs for the sharing of relevant design docs, agendas or other materials.
- Trello. We manage the project via a Trello Team created specifically for each engagement. We invite your team and our team to this team and create (1) board per sprint. This allows us to standardize our process while providing transparency along the way.
- Office Hours. Most engagements include a “Documentation & Training” sprint, during which we arrange weekly “Office Hours” via Zoom (recorded) to answer any questions your team may have as they begin to kick the tires.
We’ve designed our own flavor of the “agile” development process which enables us to better parallelize work to achieve aggressive timelines.
The traditional agile sprint methodology doesn't work for the velocity at which we move. This is a situation that’s somewhat unique to a consultancy like ours, which implements repeatable solutions. Our team members come on and off projects, or get blocked on tasks they are working on for other customers. When this happens, our methodology lets them easily pick up the slack in other areas. In certain situations, we may tap members with specific specializations to work on a Sprint.
We define sprints in terms of effort (man-hours) of either 1 or 2 weeks (40 – 80 hours). Sprints do not necessarily correspond to a calendar week, but they do have a calendar due date. In a given week, one sprint is the priority which is the one with the nearest delivery date.
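The prioritization rule described above (the sprint with the nearest due date comes first) can be sketched in a few lines. This is a minimal illustration, not our actual tooling; the `Sprint` class, its fields, and the sample sprints are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Sprint:
    name: str    # thematic goal of the sprint
    hours: int   # effort budget: 40 (1 week) or 80 (2 weeks)
    due: date    # calendar due date


def priority_sprint(sprints: list[Sprint]) -> Sprint:
    """Return the open sprint with the nearest due date."""
    if not sprints:
        raise ValueError("no open sprints")
    return min(sprints, key=lambda s: s.due)


# Hypothetical open sprints, for illustration only.
open_sprints = [
    Sprint("Monitoring", 80, date(2024, 3, 29)),
    Sprint("CI/CD", 80, date(2024, 3, 15)),
    Sprint("Documentation & Training", 40, date(2024, 4, 12)),
]
print(priority_sprint(open_sprints).name)  # prints "CI/CD"
```

Note that the effort budget plays no part in the ordering: a sprint is measured in hours, but it is prioritized by its calendar due date.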
As part of our process, when we write a Statement of Work, we decide upfront the deliverables for a given sprint. When a sprint grows beyond what we think we can achieve in a fixed amount of time, we split it off into a new Sprint.
Every sprint is thematic; that is, it speaks to some greater, overarching goal or milestone with a clear set of deliverables. At any given time, we may have multiple people working on different parts of a project across multiple different sprints. However, since we sometimes have extra resources on hand, we can frontload work. This is especially valuable on sprints which may require more research or involve more interrupts.
We log detailed time entries against our time accounting software (Harvest) which has a project allocated for each Sprint that corresponds to a Kanban board. This ensures that we have proper time accounting and everything lines up business-wise.
We generally prefer to work on (2) week sprints because we acknowledge that most things take more than one week to accomplish. Since our engagements operate on a Time & Materials basis, we define a Sprint as an 80-hour retainer that does not necessarily correspond to calendar weeks.
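To show how time logged per sprint compares against an 80-hour allotment, here is a small sketch. It does not use the Harvest API; the time entries, names, and helper functions are hypothetical stand-ins for whatever your time-tracking export looks like.

```python
from collections import defaultdict

SPRINT_HOURS = 80  # a 2-week sprint, per the definition above

# Hypothetical time entries: (sprint name, engineer, hours logged).
entries = [
    ("Sprint 1", "alice", 30.0),
    ("Sprint 1", "bob", 42.5),
    ("Sprint 2", "alice", 50.0),
    ("Sprint 2", "carol", 36.0),
]


def hours_by_sprint(rows):
    """Total the logged hours for each sprint, across all engineers."""
    totals = defaultdict(float)
    for sprint, _engineer, hours in rows:
        totals[sprint] += hours
    return dict(totals)


def overage(total_hours, allotted=SPRINT_HOURS):
    """Hours logged beyond the sprint's allotment (never negative)."""
    return max(0.0, total_hours - allotted)


for sprint, total in hours_by_sprint(entries).items():
    print(f"{sprint}: {total:.1f}h logged, {overage(total):.1f}h over")
```

Because several people can log time against the same sprint, the totals are summed per sprint rather than per engineer.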