The foundation of Site Reliability Engineering (SRE) is implementing an architecture that supports observability, alerting, and incident management.
It doesn't matter how much IaC, automation, resiliency or fault tolerance is in place; infrastructure is similar to a living and evolving organism. It's a complex structure with various abstractions masking complexity under the hood. Without observability, there's no way to operate, optimize the platform or safely onboard new engineers; it would be like racing a Ferrari on a track with no tachometer, speedometer, or temperature/gas gauge.
That's why we help lay a solid foundation to build upon. We support the most popular tools in the industry, including Datadog and OpsGenie, which we manage entirely with Terraform.
Most companies we see fail to implement Monitoring as Code. When that happens, it's difficult and error-prone to manage any sort of complex monitoring infrastructure.
There's an overwhelming amount of theory on what needs to happen, but almost no Open Source implementations of how to actually implement it. We provide the concrete implementation, entirely provisioned with Terraform. We support all the core capabilities of Datadog, including Logs, Synthetics, Private Locations, Dashboards, Monitors, Roles, etc. We manage a large catalog of pre-baked monitors, defined in YAML, that are fully parameterized.
Our reference architecture includes native terraform support for the following related services.
Frequently Asked Questions
- Based Open Source. Everything we do is available for free today on our GitHub. This is our proof we know what we're talking about. “What You See is What You Get” – no other company can provide such a comprehensive solution based on Open Source.
- Free Weekly Office Hours Our commitment to helping others is in our DNA. We want to make sure you get the maximum value out of your investment.
- Massive Community Adoption ensures our projects get regular updates and bug fixes.
If everything is open sourced, why don't teams just do it themselves instead of work with Cloud Posse?
Anyone is free to fork our repositories and try themselves, but our support eliminates the guesswork and shortens the time it takes to implement correctly.
Think of it like this: anyone can walk into a hardware store and pick up the materials to build a house. Very few people can build a house that won't fall down if they don't have the experience of using all the tools and hardware correctly. We fill the gap by providing the knowledge and experience to get you where you want to be faster than doing it yourself.
Cloud Posse does offer documentation as part of the engagements but the audience is for experienced developers, so if different documentation is required, these can be created upon request.
It really depends on when a contract begins and who on our team is on the bench. Generally, we like to put (2) engineers on a project so we have cross-training and continuity in the event a member needs to take time off. Our team is geographically distributed across the continental US as well as Eastern Europe. Throughout the course of a project, we may move team members between projects depending on their subject matter expertise.
We provide entirely optional ongoing support for customers who've gone through our DevOps Accelerator.
By in large, most of our customers take over the day to day management of their infrastructure.
We're here though to help out anywhere you need it.
We do not provide 24×7 “on-call” (aka PagerDuty) support.
We'll deliver the end-to-end solution you've seen in all of our demos. It will be preconfigured for your environments under your AWS accounts. We'll create new GitHub repos that will contain all the infrastructure code you need.
Along the way, we'll show you the ropes and how to operate it. In the long run, you'll be responsible for operating it but we'll stick around for as long as you need our help.
It depends. Your best bet is to schedule a discovery call with us so we can go over your specific concerns. Assuming your software runs on Linux and that you're able to make any necessary code changes to ensure your applications are “12 Factor App” compliant, there's a very high likelihood we'll be able to help you out.
What’s it costing your business if you wait?
The longer you wait, the more time & effort you'll waste on maintenance rather than innovation. The more tech debt you'll amass. The more opportunities you'll miss.
Your developers will be less productive, which means you'll be paying more while getting less done in return.
The sooner you streamline your operations the faster you will move:
- Reduce your opportunity cost and capitalize on the investment sooner
- Release more features to customers faster
- Control operating costs to do more for less
Not to mention, your developers will love you for making their lives easier. The last thing developers want is to do things by hand.
- Gruntwork doesn't provide open access to all their modules, they are a subscription service. Cloud Posse open sources everything.
- All of our code is in GitHub and can be forked and used with no concerns about licensing issues (APACHE2).
- Gruntwork's Reference Architecture requires Terragrunt
- Gruntwork is not a consulting company. They do not help with hands-on implementation. That's left up to you.
- We provide a comprehensive project plan consisting of hundreds of implementation tasks and design decisions that we execute together with your team.
- Our Slack community is free for anyone to join, not just paying customers.
- Because our work is Open Source, there's a lower barrier to getting started. That's why it's in use by thousands and thousands of companies. We receive dozens of Pull Requests every week enhancing our modules and fixing bugs.
We work with companies who need to own their infrastructure as their competitive advantage.
Our customers are typically post-Series A technology startups who are seeing success in the market and need to accelerate their DevOps adoption in order to take their company to the next level.
They are backed by some of the biggest names in the industry and are solving really difficult problems with technology.
We help companies own their infrastructure in record time by building it together with your team and then showing them the ropes. We stick around for as long as it takes for you to become successful.
Our SweetOps™ process eliminates the guesswork so you get everything you need for a successful cloud migration from the bottom up.
- We Build Your Infrastructure. We implement everything you need from your cloud platform using Infrastructure as Code.
- You Own It. You achieve it in just a few months. We show you how to ride it along the way.
- You Drive It. Customize everything or anything you want. It's your infrastructure.
You get a predictable outcome that is delivered on time and within budget.
There are no long term commitments. No license fees. No strings attached.
Plus, we stick around for as long as you need our help.
Sounds like pretty good deal, right?
No, it's absolutely FREE for anyone to attend.
Can you help me understand where the boundaries of CloudPosse's responsibilities end, and where ours would start?
Cloud Posse's mission is to help companies own their infrastructure. We accelerate this journey by architecting your 4 layers with you and by taking the lead on the implementation. Since we have an opinionated framework, customers will need to learn how to leverage everything for their use cases. This will sometimes mean altering how you build and deploy your services.
Getting Started With Us
We always prefer to start with a green-field approach, where we build your infrastructure from the ground up together with your team. As part of our process, we'll walk you through all of the required design decisions, ensuring you have sufficient context to make informed decisions. This is why we expect our customers to have someone on their engineering team invested in the outcome. This part is absolutely critical, as it ensures what we deliver suits your business needs. Everything we do is delivered by pull request for your review and we will happily provide documentation on anything you want. Along the way, we'll assign homework exercises and provide ample documentation. This approach provides the best opportunity to gain a deep hands-on understanding of our solution.
We encourage you to ask as many questions as you want and challenge our assumptions. You also can volunteer for any task you want to take on as “homework” and we'll help you out as needed.
When You Own It
Once our job is done, this is where you take the driver's seat. We'll help you get everything set up for a smooth transition from your heritage environment to your shiny new infrastructure. Rest assured that we'll stick around until your team is confident and has the know-how to operate these platforms in production. We don't expect teams to pick this up overnight, that's why we'll stay engaged for as long as you need. We're happy to answer questions and jump on Zoom for pair programming sessions.
After our engagement, you will have a solid foundation powering your apps, and all the tools you need for infrastructure operations. This means your team is responsible for the ongoing maintenance, including upgrades (e.g. EKS clusters, and all open-source software), patching systems, incident response, triaging, SRE (e.g. adding monitors and alerts), as well as security operations (responding to incidents, staying on top of vulnerabilities/ CVEs). Cloud Posse is continuously updating its Open Source module ecosystem, but it's your responsibility to regularly update your infrastructure. Staying on top of these things is critical for a successful long-term outcome, with minimal technical debt.
For companies that want to focus more on their business and less on maintenance, we provide ongoing support engagements exclusively for customers that have completed our accelerator.
Check out our approach to learn more!
Who will be the Tech Lead/Architect for our project and assurance that the lead will be fully allocated throughout the project (for continuity)?
We'll embed 1-2 engineers to work with your team through your project. Our preferred approach is to have multiple leads working in parallel so that we can ensure continuity throughout the engagement. Working with Cloud Posse, it is our responsibility to ensure continuity throughout this engagement. We have various subject matter experts that we'll swap in and out of the project and I'll be directly involved through the entire process. The way we achieve greater continuity is by ensuring everything is well documented as we go, opening pull requests for all work, synchronizing branches regularly, and the tasks are all well-defined in Jira. Every single call is recorded and shared with our team (via Gong), in addition to this, all design decisions are recorded in Jira issues and referenced throughout the project for context. Typically we have one engineer allocated to each Sprint and parallelize work by commencing multiple concurrent sprints. You can expect 4-6 Cloud Posse engineers to be involved and contributing.