Local Development Environments
Rapidly onboard new developers. Efficiently iterate on features.
Rapid Software Development...
Reproducible 100% of the time. Spin up as many environments as needed and treat them all the same.
Treat everything as "Infrastructure as Code" for more manageable environments. Reduce the human element wherever possible.
EASY TO USE
Developers should be able to get up and running and contribute their first pull request on day one.
Simple enough that anyone on your team should be able to use it.
You need a process to reliably release software at any time and without downtime.
Confidence that it works...
Leverage Codefresh, GitHub Actions, or other systems like AWS CodeBuild and Jenkins to build and test every commit. Know exactly which commit broke the build every time.
Deploy exactly what was tested to any cluster using immutable containers. Identify problems before they get into production. Run identical environments to eliminate headaches.
Preview Environments enable any branch or Pull Request to be deployed as a short-lived ephemeral environment. Unlimited environments ensure developers are unblocked to test their changes.
Zero-downtime rolling deployments are accomplished automatically using Kubernetes with Helm. Need a service mesh like Istio? No problem.
Continually test every change made to your infrastructure and ensure all systems are go. "Operations by Pull Request" ensures that anyone who can open a Pull Request is capable of contributing.
Infrastructure as code means it can be tested as code.
Easy rollbacks when things don't work as expected. Just revert to the previous deployment without bending over backward.
Zero-downtime rolling deployments are accomplished automatically by Kubernetes.
Improve overall stability by catching problems early. Treat every problem as an opportunity to eliminate future headaches.
Site Reliability Engineering
Monitor everything that your organization depends on to meet SLAs, which means keeping an eye on both internal and external services.
Dashboards provide an overview of everything at a glance and provide the necessary transparency across departments. Get everyone on the same page and working towards the same goals by giving them the insights they need to do it.
KEY PERFORMANCE INDICATORS
KPIs provide the benchmarks for success. They give a concrete indicator when things are working or broken. Alert based on thresholds instead of discrete events. Generate actionable notifications that escalate only when it matters to On-Call Engineers.
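As a rough illustration of alerting on thresholds rather than on discrete events, the sketch below only signals once the error rate over a recent window of requests crosses a limit. The class name, the 5% threshold, and the 100-request window are all hypothetical values chosen for the example, not part of any particular monitoring product.

```python
from collections import deque


class ErrorRateAlert:
    """Signals when the error rate over the last `window` requests exceeds `threshold`.

    Hypothetical helper for illustration; real monitoring systems express
    the same idea as a monitor or alert rule.
    """

    def __init__(self, threshold: float = 0.05, window: int = 100):
        self.threshold = threshold
        # A bounded deque drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def record(self, is_error: bool) -> bool:
        """Record one request outcome and return True if the alert should fire."""
        self.samples.append(is_error)
        # Wait for a full window so one early failure cannot page anyone.
        if len(self.samples) < self.samples.maxlen:
            return False
        error_rate = sum(self.samples) / len(self.samples)
        return error_rate > self.threshold


alert = ErrorRateAlert(threshold=0.05, window=100)
# A single failure among 100 requests stays below the threshold.
fired = [alert.record(i == 0) for i in range(100)]
print(any(fired))
```

A single failed request never pages the on-call engineer here; only a sustained error rate above the threshold does.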
Monitor internal services for both availability and correctness. Aggregate and report on logs collected from all services across all machines.
MONITORING AS CODE
External services are just as integral to the performance of your product as internal ones. Monitor all dependencies as if they were your own. Escalate before their problems become yours.
Know your limits...
Collect and ship logs somewhere for easy reporting.
Reporting on logs requires visualization of events because that's the only way to make sense of mounds of data.
OPTIMIZE & REPEAT
Integrate with monitoring and alerting so that critical events are not lost.
Training and Support
Foster an engineering culture that fuses ops and dev by cross-training engineers to achieve maximum productivity and complete business continuity.
Foster a DevOps culture...
DevOps involves constant cross-training of engineers to achieve business continuity at the human level.
Live pairing with your team via Zoom helps them pick things up quickly. Schedule time easily with any member of our team.
Best Practices exist to teach hard lessons more easily.
Cloud Technologies are evolving at an astonishing rate. Get help staying on top of the latest & greatest tech without getting overwhelmed.
Security & Compliance
Implement a strategy that bakes security into the DNA of the organization and addresses both technological attack vectors and social engineering.
Protect your business...
Cloud security involves hardening all components, restricting access with SSO/MFA, and having a bird's-eye view of everything going on to quickly remediate any incident.
On-prem security is just as important as cloud security. Protect your intellectual property (IP) from being compromised. Lock down laptops, Wi-Fi, and physical access. A company is only as secure as its weakest link.
Auditing is the ongoing process of surfacing anomalous events happening across all systems by combing through logs aggregated centrally in tools like Splunk, Sumo Logic, or Kibana/Elasticsearch.
Secrets management ensures there's a formal process for storing, securing, and rotating passwords and keys. Well-designed solutions help ensure your company will not be tomorrow's headline news.
Our "Best Practices" exist to teach hard lessons more easily.
Gain the upper hand...
We'll perform a comprehensive 12-factor assessment of your code base.
We'll review your GitHub organization to make sure you're taking maximum advantage of the platform, including a comprehensive security assessment.
We'll review your Dockerfiles and Docker Compose files to make sure you're making the best use of the tools.
We'll review your usage of Kubernetes and make recommendations on how to better leverage the platform to your advantage.
Hundreds of Terraform Modules
We are the largest provider of high-quality, well-maintained, 100% Open Source (Apache 2.0) Terraform Modules. All modules are tested with Terratest. Pull Requests welcome! View our Terraform Modules
Dozens of Helm Charts
What makes them special is that we've developed these charts to integrate with third-party services like GitHub for authentication (OAuth2) and Duo for MFA. View our Helm Charts
Helpful Slack Community
Join our community. It's free! This is the best place to talk shop, ask questions, solicit feedback, and work together as a community to build sweet infrastructure. Join our Slack Community
Here you'll find comprehensive guides and documentation to help you start working with the Cloud Posse technology stack as quickly as possible, as well as support if you get stuck. Read our Docs
Free Weekly "Office Hours"
Every week we hold a conference call via Zoom for our community members to share what they are working on and ask questions. Join our next call
Frequently Asked Questions
- Take our quiz to find out if we are a good fit!
- Book a discovery call to go over your exact challenges.
- If we can help, we'll execute a Mutual NDA (ours or yours), then collaborate with you on our Engagement Workbook using Google Docs.
- Once we agree on the general scope, we'll prepare a comprehensive Statement of Work (SOW) detailing the entire project.
- Once the Master Services Agreement (MSA) and SOW are executed, we'll send an invoice for the deposit and first Sprint.
- Work will commence shortly thereafter.
We work with companies anywhere in the world.
While most of our customers are based in the United States, we've worked with companies in the United Kingdom, Germany, Australia, Hong Kong, India, Argentina, etc. Our team is distributed across the US and Eastern Europe.
We can start as soon as you sign our Statement of Work. Typically we see this process take 2-3 weeks from the first introductory call to the start of our engagement.
Here's the checklist we'll need to complete before we can start:
- Execute Mutual NDA (ours or yours)
- Collaborate on Engagement Workbook via Google Docs
- Execute Statement of Work and Master Services Agreement
- Deposit Payment
We can kick off the initial introductory call immediately, so please make sure that you schedule it today.
After talking with you and assessing if we're a proper fit, we'll execute a Mutual NDA and then send over an Engagement Workbook so we can gather all the requirements for your project and estimate the cost.
We can easily add additional sprints to a Statement of Work. We just need to agree on what goes into a Sprint, which will determine the number of Sprints required.
Our typical engagement model begins with a complete platform rollout. This includes roughly 6-8 sprints, each 1-2 weeks in duration. During this time we set up all AWS accounts with IAM federation, CloudTrail audit logs, a comprehensive release engineering process, total observability with our Site Reliability Engineering (SRE) sprint, Remote Access Management (Teleport and Keycloak), and GitOps Operations by Pull Request.
The first engagement takes roughly 3-4 months to complete. These engagements have extremely well-defined project plans. Ask us and we can show you what that looks like.
Customers most often decide to keep us on after the initial engagement for follow-up work.
We provide entirely optional ongoing support for customers who've gone through our DevOps Accelerator.
By and large, most of our customers take over the day-to-day management of their infrastructure.
We're here, though, to help out wherever you need it.
We do not provide 24×7 “on-call” (aka PagerDuty) support.
We offer all of our customers ongoing support for as long as they need it. Choose what's right for you.
- We provide free weekly support via our “Office Hours” webinars every Wednesday at 11:30 am PST. These calls last one hour and we'll answer as many of your questions as we can.
- We also provide optional support retainers which include a fixed block of hours that go towards maintenance and support. You'll have direct access to our team via a shared Slack channel in addition to the ability to schedule one-on-one calls via Zoom.
Can you help me understand where the boundaries of Cloud Posse's responsibilities end, and where ours would start?
Cloud Posse's mission is to help companies own their infrastructure. We accelerate this journey by architecting your 4 layers with you and by taking the lead on the implementation.
Getting Started With Us
We always start with a green-field approach, where we build your infrastructure from the ground up together with your team. As part of our process, we'll walk you through all of the required design decisions, ensuring you have sufficient context to make informed decisions. This is why we expect our customers to have someone on their engineering team invested in the outcome. This part is absolutely critical, as it ensures what we deliver suits your business needs. Everything we do is delivered by pull request for your review and we can augment the documentation on anything you want. Along the way, we'll assign homework exercises and provide ample documentation. This approach provides the best opportunity to gain a deep hands-on understanding of our solution.
We encourage you to ask as many questions as you want and challenge our assumptions. You also can volunteer for any task you want to take on and we'll help you out as needed.
When You Own It
Once our job is done, this is where you take the driver's seat. We'll help you get everything set up for a smooth transition from your heritage environment to your shiny new infrastructure. Rest assured that we'll stick around until your team is confident and has the know-how to operate these platforms in production. We don't expect teams to pick this up overnight; that's why we'll stay engaged for as long as you need. We're happy to answer questions and jump on Zoom for pair programming sessions.
After our engagement, you will have a solid foundation powering your apps, and all the tools you need for infrastructure operations. This means your team is responsible for the ongoing maintenance, including upgrades (e.g. EKS clusters and all open-source software), patching systems, incident response, triaging, SRE (e.g. adding monitors and alerts), as well as security operations (responding to incidents, staying on top of vulnerabilities/CVEs). Cloud Posse is continuously updating its Open Source module ecosystem, but it's your responsibility to regularly update your infrastructure. Staying on top of these things is critical for a successful long-term outcome with minimal technical debt.
For companies that want to focus more on their business and less on maintenance, we provide ongoing support engagements exclusively for customers that have completed our accelerator.
Check out our approach to learn more!
Can you walk through the typical lifecycle of a small change that you might help us with, specifically with how it relates to coordinating changes between your team and ours?
Every change in your environment starts with submitting a pull request, as our solution is built with a fully GitOps-driven approach. Depending on the CODEOWNERS configuration for the repository, branch protections will require all pull requests to have approvals by specific stakeholders, in addition to requiring all checks to pass. We also try to automate as much of the review process as possible. For example, when the pull request is opened, it automatically kicks off a job to validate the change against your environment so you can see the impact of any change.
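To make the merge gate concrete, here is a minimal sketch of the decision it implies: a pull request is mergeable only when an owner of every changed path has approved and every check has passed. In practice the Git hosting platform's branch protections enforce this, not custom code. The `PullRequest` type, the ownership table, and the simple longest-prefix matching are assumptions made for illustration and do not reproduce the real CODEOWNERS pattern syntax.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    changed_paths: list[str]
    approvals: set[str] = field(default_factory=set)       # users who approved
    checks: dict[str, bool] = field(default_factory=dict)  # check name -> passed


# Hypothetical ownership rules: path prefix -> users allowed to approve.
OWNERS = {
    "stacks/prod/": {"alice", "bob"},
    "modules/": {"carol"},
}


def required_owner_groups(pr: PullRequest, owners: dict[str, set[str]]) -> list[set[str]]:
    """Return one owner group per changed path that matches a rule.

    Simplification: the longest matching prefix wins.
    """
    groups = []
    for path in pr.changed_paths:
        matches = [prefix for prefix in owners if path.startswith(prefix)]
        if matches:
            groups.append(owners[max(matches, key=len)])
    return groups


def can_merge(pr: PullRequest, owners: dict[str, set[str]] = OWNERS) -> bool:
    """Mergeable only if every owner group has an approver and all checks passed."""
    owners_ok = all(pr.approvals & group for group in required_owner_groups(pr, owners))
    checks_ok = all(pr.checks.values())
    return owners_ok and checks_ok


pr = PullRequest(["stacks/prod/vpc.yaml"], approvals={"alice"}, checks={"plan": True})
print(can_merge(pr))
```

An approval from someone who does not own the changed path, or a single failing check, is enough to block the merge in this sketch.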
Coordination is simply a matter of deciding who is responsible for each part of the release process, listed below. The tooling handles the rest, and we use a policy-driven approach (Open Policy Agent) to enforce it.
- Submitting the pull request. Who does this depends entirely on your comfort level with the change, or whether you prefer us to take the lead.
- Reviewing the pull request and applying changes to it as needed.
- Approving and merging the pull request.
- Validating and confirming the changes.
Lastly, where applicable we implement blue/green rollout strategies for releases, but there are edge cases where a change could be disruptive to active development or live services. In such cases, we carefully coordinate the release for an approved time.
From time to time we get asked if it's possible to use something other than Slack (e.g. MS Teams, Discord, etc.) to collaborate between our teams.
TL;DR: Unfortunately, we're not able to join other teams. Here's why…
On the backend, we have dozens of engineers (and growing) whom we can pull into any channel at a moment's notice. While there is always a lead assigned to your project performing most of the work, at any given time we'll pull in different people to help with different parts. Think of them as specialists. Also, because we're engaged with a dozen or more customers at any given moment, the logistics mean we cannot manage conversations across multiple chat platforms.
When you work with Cloud Posse, you're not getting “a DevOps engineer” (that's Staff Augmentation); you're getting access to our entire pool of resources, which includes expertise in DevOps, Release Management/Engineering, SRE, Security & Compliance, etc. One or two engineers cannot possess all that expertise and still be specialists (the proverbial “Jack of all trades, master of none”). This is also why hiring Cloud Posse is so much more valuable: for barely the cost of a fully loaded DevOps engineer, you're buying access to a full team and all the pre-existing materials that have cost millions of dollars to produce.
The great thing about our Engagement Workbook process is that it helps us identify, upfront, customers who are a good fit for how we operate. We're successful because we have standardized our engagement model. The more variables we introduce, the greater the risks and the more we diverge from what we know works.
You might be wondering if you can expect to come out the other end of our accelerator with a team ready and able to take over day-to-day operations and migrate additional products into this stack using Cloud Posse's modules.
TL;DR: Yes! But there's homework involved.
When you work with Cloud Posse, it's more of a “delivery” model of engagement in the sense that we're doing 95% of the work, in your repo, from day one – one pull request at a time. Our handoff strategy is to help your team learn the ropes by assisting them with self-prescribed homework assignments. We do not at this time have any formal curriculum for training, since every team has different needs. What we provide is a standard set of documentation, architectural diagrams, and office hours. We will also document any requested processes or systems as general support. Cloud Posse does not provide Staff Augmentation or Training arrangements.
Think of it more like this… while we're engaged and building out your platform, your team has full access to ask us anything. They can follow along in GitHub, review pull requests, ask for demos, etc. We'll jump on the phone anytime to help triage, pair program, research, or prototype anything else they want. The most successful teams take advantage of this opportunity early on in the engagement. Those are the teams that are ready to migrate additional products.
Case in point: we have a customer that, after 3 weeks of working with us, took the initiative to use our Datadog component and migrated all their existing legacy Datadog monitors into Terraform. The way we found out was they tagged us on the pull request. That's rad. After multiple reviews and comments, the PR got merged and they're well on their way.
When we're done building everything out, we'll stick around for as long as you need our help, but that's optional. Most customers keep us around for some time afterward until their team feels fully confident operating everything. Also, what we frequently see happen is that teams decide to expand the scope and tack on additional services in their catalog (e.g. EMR, Redshift, StrongDM).