Our Expertise – Cloud Posse

Cloud Migrations

Owning your infrastructure needs to be your competitive advantage. We'll get you there faster.

What it looks like...

FULLY AUTOMATED

Our GitOps process enables all engineers to participate without risking instability. By using 100% Infrastructure as Code, developers perform "Operations by Pull Request" so that every change goes through Code Review and a CI/CD workflow.
REPEATABLE

100% Infrastructure-as-Code eliminates error-prone manual operations. Reproducible 100% of the time. Spin up as many environments as needed and treat them all the same. Lifecycle Management strategies ensure easy upgrades of all components without major disruptions.
BEST PRACTICES

We deliver a flexible solution capable of supporting any class of application that you might need to run now or in the future. We adhere to the best practices of the AWS Foundations Framework and the CIS Benchmarks for security.
HIGHLY AVAILABLE

Fault-tolerant so that services won't fall over and die if a component fails. Resilient by design, so that services self-heal without human intervention. Scalable to grow with demand both instantaneously and over time as the business grows. Capable of being both scaled-up and scaled-out.

Local Development Environments

Rapidly onboard new developers. Efficiently iterate on features.

Rapid Software Development...

REPEATABLE

Reproducible 100% of the time. Spin up as many environments as needed and treat them all the same.
SQUASH BUGS

Treat everything as "Infrastructure as Code" for more manageable environments. Reduce the human element wherever possible.
EASY TO USE

Developers should be able to get up and running and contribute their first Pull Request on day one.
FAST ONBOARDING

Simple enough that anyone on your team should be able to use it.

Release Engineering

You need a process to reliably release software at any time and without downtime.

Confidence that it works...

INTEGRATION TESTING

Leverage Codefresh, GitHub Actions, or other systems like AWS CodeBuild and Jenkins to build and test every commit. Know exactly which commit broke the build every time.
CONT. DELIVERY

Deploy exactly what was tested to any cluster using immutable containers. Identify problems before they get into production. Run identical environments to eliminate headaches.
PREVIEW ENVIRONMENTS

Preview Environments enable any branch or Pull Request to be deployed as a short-lived ephemeral environment. Unlimited environments ensure developers are unblocked to test their changes.
FULLY AUTOMATED

Zero-downtime rolling deployments are accomplished automatically using Kubernetes with Helm. Need a service mesh like Istio? No problem.

Automated Deployments

Continually test every change made to your infrastructure and ensure all systems go. "Operations by Pull Request" ensures anyone who can open a Pull Request is capable of contributing.

Easy deployments...

EASY ROLLOUTS

Infrastructure as code means it can be tested as code.
QUICK ROLLBACKS

Easy rollbacks when things don't work as expected. Just revert to the previous deployment without bending over backward.
ZERO-DOWNTIME

Zero-downtime rolling deployments are accomplished automatically by Kubernetes.
RELIABLE

Improve overall stability by catching problems early. Treat every problem as an opportunity to eliminate future headaches.

Site Reliability Engineering

Monitor everything that your organization depends on to meet SLAs, which means keeping an eye on both internal and external services.

System-wide overview...

DASHBOARDS

Dashboards provide an overview of everything at a glance and provide the necessary transparency across departments. Get everyone on the same page and working towards the same goals by giving them the insights they need to do it.
KEY PERFORMANCE INDICATORS

KPIs provide the benchmarks for success. They give a concrete indicator of when things are working or broken. Alert based on thresholds instead of discrete events. Generate actionable notifications that escalate to On-Call Engineers only when it matters.
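To make "thresholds instead of discrete events" concrete, here is a minimal sketch in Python. The `ThresholdAlert` class, its window size, and its threshold are hypothetical names and values chosen for illustration. They are not part of any Cloud Posse tooling or monitoring product. The idea is that a single failed request never pages anyone; an alert fires only once the error rate over a recent window of requests reaches the threshold.

```python
from collections import deque


class ThresholdAlert:
    """Hypothetical helper: alert on a windowed error rate, not on single events."""

    def __init__(self, threshold: float, window: int) -> None:
        self.threshold = threshold           # e.g. 0.5 means 50% of the window failed
        self.samples = deque(maxlen=window)  # keeps only the most recent `window` results

    def observe(self, is_error: bool) -> bool:
        """Record one request outcome; return True if the alert should fire."""
        self.samples.append(1 if is_error else 0)
        # Stay quiet until the window is full, so one early failure cannot page anyone.
        if len(self.samples) < self.samples.maxlen:
            return False
        error_rate = sum(self.samples) / len(self.samples)
        return error_rate >= self.threshold
```

In practice this logic would normally live in the monitoring system's own alert rules rather than in application code; the sketch only shows the shape of the decision.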
LOG AGGREGATION

Monitor internal services for both availability and correctness. Aggregate and report on logs collected from all services across all machines.
MONITORING AS CODE

External services are just as integral to the performance of your product as internal ones. Monitor all dependencies as if they were your own. Escalate before their problems become yours.

Scale Testing

Know your limits...

TEST PLAN

Collect and ship logs somewhere for easy reporting.
ESTABLISH BASELINE

Reporting on logs requires visualization of events because that's the only way to make sense of mounds of data.
SIMULATE TRAFFIC
OPTIMIZE & REPEAT

Integrate with monitoring and alerting so that critical events are not lost.

Training and Support

Foster an engineering culture that fuses ops and dev by cross-training engineers to achieve maximum productivity and complete business continuity.

Foster a DevOps culture...

CODE REVIEWS

DevOps involves constant cross-training of engineers to achieve business continuity at the human-level.
SCREEN SHARING

Live pairing with your team via Zoom helps them pick things up quickly. Schedule time easily with any member of our team.
SLACK CHANNEL

Best Practices exist to teach hard lessons more easily.
DOCUMENTATION

Cloud Technologies are evolving at an astonishing rate. Get help staying on top of the latest & greatest tech without getting overwhelmed.

Security & Compliance

Implement a strategy that bakes security into the DNA of the organization and addresses both technological attack vectors and social engineering.

Protect your business...

SINGLE SIGN-ON

Cloud security involves hardening all components, restricting access with SSO/MFA, and having a bird's eye view of everything going on to quickly remediate any incident.
PHYSICAL SECURITY

On-prem security is just as important as cloud security. Protect your intellectual property (IP) from being compromised. Lock down laptops, Wi-Fi, and physical access. A company is only as secure as its weakest link.
AUDIT TRAILS

Auditing is the ongoing process of surfacing anomalous events happening across all systems by combing through logs centrally aggregated in tools like Splunk, Sumo Logic, or Kibana/Elasticsearch.
SECRETS MANAGEMENT

Secrets management ensures there's a formal process for storing, securing, and rotating passwords and keys. Well-designed solutions help ensure your company will not be tomorrow's headline news.

Gap Assessments

Our "Best Practices" exist to teach hard lessons more easily.

Gain the upper hand...

CLOUD ARCHITECTURE

We'll perform a comprehensive 12-factor assessment of your code base.
GITHUB

We'll review your GitHub organization to make sure you're taking maximum advantage of the platform, including a comprehensive security assessment.
DOCKER/COMPOSE

We'll review your Dockerfiles and Docker Compose files to make sure you're making the best use of the tools.
KUBERNETES

We'll review your usage of Kubernetes and make recommendations on how to better leverage the platform to your advantage.

Other Resources

Hundreds of Terraform Modules
We are the largest provider of high-quality, well-maintained, 100% Open Source (Apache 2.0) Terraform Modules. All modules are tested with Terratest. Pull Requests welcome! View our Terraform Modules
Dozens of Helm Charts
What makes them special is we've developed these charts to integrate with third-party services like GitHub for authentication (OAuth2) and Duo for MFA. View our Helm Charts
Dozens of Helmfiles
Preconfigured release configurations for all essential services for Kubernetes including Prometheus, Grafana, Nginx Ingress, Kube Dashboard, Cloudflare Argo, Fluentd, and much more. View our Helmfiles

Helpful Slack Community
Join our community. It's FREE! This is the best place to talk shop, ask questions, solicit feedback, and work together as a community to build sweet infrastructure. Join our Slack Community
Badass Documentation
Here you'll find comprehensive guides and documentation to help you start working with the Cloud Posse technology stack as quickly as possible, as well as support if you get stuck. Read our Docs
Free Weekly "Office Hours"
Every week we hold a conference call via Zoom for our community members to share what they are working on and ask questions. Join our next call

Frequently Asked Questions

Engagements

What are the next steps?

Take our quiz to find out if we are a good fit!
Book a discovery call to go over your exact challenges.
If we can help, we'll execute a Mutual NDA (ours or yours), then collaborate with you on our Engagement Workbook using Google Docs.
Once we agree on the general scope, we'll prepare a comprehensive Statement of Work (SOW) detailing the entire project.
Once the Master Services Agreement (MSA) and SOW are executed, we'll send an invoice for the deposit and first Sprint.
Work will commence shortly thereafter.

Do you only work with US-based companies?

We work with companies anywhere in the world.

While most of our customers are based in the United States, we've worked with companies in the United Kingdom, Germany, Australia, Hong Kong, India, Argentina, etc. Our team is distributed across the US and Eastern Europe.

When can we get started?

We can start as soon as you sign our Statement of Work. Typically we see this process take 2-3 weeks from the first introductory call to the start of our engagement.

Here's the checklist we'll need to complete before we can start:

Execute Mutual NDA (ours or yours)
Collaborate on Engagement Workbook via Google Docs
Execute Statement of Work and Master Services Agreement
Deposit Payment
Kick-off!

We can kick off the initial introductory call immediately, so please make sure that you schedule it today.

After talking with you and assessing if we're a proper fit, we'll execute a Mutual NDA and then send over an Engagement Workbook so we can gather all the requirements for your project and estimate the cost.

What if we wanted to expand the scope of work?

We can easily add additional sprints to a Scope of Work. We just need to agree on what goes into a Sprint, which will determine the number of Sprints required.

How long does an engagement last?

Our typical engagement model begins with a complete platform rollout. This includes roughly 6-8 sprints, each one 1-2 weeks in duration. During this time we set up all AWS accounts with IAM federation, CloudTrail audit logs, a comprehensive release engineering process, total observability with our Site Reliability Engineering (SRE) sprint, Remote Access Management (Teleport and Keycloak), and GitOps Operations by Pull Request.

The first engagement takes roughly 3-4 months to complete. These engagements have extremely well-defined project plans. Ask us and we can show you what that looks like.

Customers most often decide to keep us on after the initial engagement for follow-up work.

Are there any long-term commitments?

Absolutely not. You can cancel anytime.

Do you provide ongoing support?

We provide entirely optional ongoing support for customers who've gone through our DevOps Accelerator.

By and large, most of our customers take over the day-to-day management of their infrastructure.

We're here, though, to help out anywhere you need it.

We do not provide 24×7 “on-call” (aka PagerDuty) support.

What type of support do you offer?

We offer all of our customers ongoing support for as long as they need it. Choose what's right for you.

We provide free weekly support via our “Office Hours” webinars every Wednesday at 11:30 am PST. These calls last one hour and we'll answer as many of your questions as we can.
We also provide optional support retainers which include a fixed block of hours that go towards maintenance and support. You'll have direct access to our team via a shared Slack channel in addition to the ability to schedule one-on-one calls via Zoom.

What is our responsibility?

Can you help me understand where the boundaries of Cloud Posse's responsibilities end, and where ours would start?

Cloud Posse's mission is to help companies own their infrastructure. We accelerate this journey by architecting your 4 layers with you and by taking the lead on the implementation. Since we have an opinionated framework, customers will need to learn how to leverage everything for their use cases. This will sometimes mean altering how you build and deploy your services.

Getting Started With Us

We always prefer to start with a green-field approach, where we build your infrastructure from the ground up together with your team. As part of our process, we'll walk you through all of the required design decisions, ensuring you have sufficient context to make informed decisions. This is why we expect our customers to have someone on their engineering team invested in the outcome. This part is absolutely critical, as it ensures what we deliver suits your business needs. Everything we do is delivered by pull request for your review and we will happily provide documentation on anything you want. Along the way, we'll assign homework exercises and provide ample documentation. This approach provides the best opportunity to gain a deep hands-on understanding of our solution.

We encourage you to ask as many questions as you want and challenge our assumptions. You also can volunteer for any task you want to take on as “homework” and we'll help you out as needed.

When You Own It

Once our job is done, this is where you take the driver's seat. We'll help you get everything set up for a smooth transition from your heritage environment to your shiny new infrastructure. Rest assured that we'll stick around until your team is confident and has the know-how to operate these platforms in production. We don't expect teams to pick this up overnight; that's why we'll stay engaged for as long as you need. We're happy to answer questions and jump on Zoom for pair programming sessions.

Day-2 Operations

After our engagement, you will have a solid foundation powering your apps, and all the tools you need for infrastructure operations. This means your team is responsible for the ongoing maintenance, including upgrades (e.g. EKS clusters and all open-source software), patching systems, incident response, triaging, SRE (e.g. adding monitors and alerts), as well as security operations (responding to incidents, staying on top of vulnerabilities/CVEs). Cloud Posse is continuously updating its Open Source module ecosystem, but it's your responsibility to regularly update your infrastructure. Staying on top of these things is critical for a successful long-term outcome with minimal technical debt.

For companies that want to focus more on their business and less on maintenance, we provide ongoing support engagements exclusively for customers that have completed our accelerator.

Check out our approach to learn more!

What is the typical lifecycle of a small change?

Can you walk through the typical lifecycle of a small change that you might help us with, specifically with how it relates to coordinating changes between your team and ours?

Every change in your environment starts with submitting a pull request, as our solution is built with a fully GitOps-driven approach. Depending on the CODEOWNERS configuration for the repository, branch protections will require all pull requests to have approvals by specific stakeholders, in addition to requiring all checks to pass. We also try to automate as much of the review process as possible. For example, when the pull request is opened, it automatically kicks off a job to validate the change against your environment so you can see the impact of any change.
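As a rough illustration of the merge gate described above, the following Python sketch models the two conditions: an approval from a required stakeholder and all required checks passing. The `PullRequest` class, the `can_merge` function, and the reviewer and check names are hypothetical and exist only for this example. Real enforcement happens in the Git hosting platform's branch protection settings, not in code like this, and this sketch assumes one approval from any listed owner is sufficient.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical, simplified view of a pull request's review state."""
    approvals: set[str] = field(default_factory=set)        # reviewers who approved
    checks: dict[str, bool] = field(default_factory=dict)   # check name -> passed?


def can_merge(pr: PullRequest, required_owners: set[str], required_checks: set[str]) -> bool:
    """Allow a merge only with an owner approval and all required checks green."""
    # Assumption: approval from at least one listed owner is enough.
    has_owner_approval = bool(pr.approvals & required_owners)
    # A check that never reported is treated as failing.
    checks_pass = all(pr.checks.get(name, False) for name in required_checks)
    return has_owner_approval and checks_pass


# Usage: approved by an owner, with both required checks passing.
pr = PullRequest(approvals={"alice"}, checks={"plan": True, "lint": True})
print(can_merge(pr, required_owners={"alice", "bob"}, required_checks={"plan", "lint"}))
```

Treating a missing check as a failure is a deliberate choice in this sketch: a validation job that never ran should block the merge just as a failed one would.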

The coordination needed is simply about figuring out who will be responsible for each part of the release process. The tooling handles the rest and we have a policy-driven approach (Open Policy Agent) to enforce it.

This includes:

Who will submit the pull request, which is entirely dependent on your comfort level with the change, or if you prefer us to take the lead.
Reviewing the pull request and applying changes to it as needed.
Approving and merging the pull request.
Validating and confirming the changes.

The toolchain in your CI/CD process provides Slack notifications and full audit history of everything that happens to give you optimal visibility and traceability.

Lastly, where applicable we implement blue/green rollout strategies for releases, but there are edge cases where a change could be disruptive to active development or live services. In such cases, these would be carefully coordinated to be released at an approved time.

Who will be the Tech Lead/Architect for our project, and what assurance is there that the lead will be fully allocated throughout the project (for continuity)?

We'll embed 1-2 engineers to work with your team throughout your project. Our preferred approach is to have multiple leads working in parallel so that we can ensure continuity throughout the engagement; ensuring that continuity is our responsibility. We have various subject matter experts that we'll swap in and out of the project, and I'll be directly involved through the entire process. We achieve greater continuity by documenting everything as we go, opening pull requests for all work, synchronizing branches regularly, and keeping all tasks well-defined in Jira. Every single call is recorded and shared with our team (via Gong). In addition, all design decisions are recorded in Jira issues and referenced throughout the project for context. Typically we have one engineer allocated to each Sprint and parallelize work by commencing multiple concurrent sprints. You can expect 4-6 Cloud Posse engineers to be involved and contributing.

What if we don't use Slack?

From time to time we get asked if it's possible to use something other than Slack (e.g. MS Teams, Discord, etc.) to collaborate between our teams.

TL;DR: Unfortunately, we're not able to join other teams. Here's why…

On the backend, we have dozens of engineers (and growing) who we can pull into any channel at a moment's notice. While there is always a lead assigned to your project performing most of the work, at any given time, we'll pull in different people to help with different parts. Think of them as specialists. Also, because we're engaged with a dozen or more customers at any given moment, the logistics of that means we cannot manage conversations across multiple chat platforms.

When you work with Cloud Posse, you're not getting “a DevOps engineer” (that's Staff Augmentation); you're getting access to our entire pool of resources, which includes expertise in DevOps, Release Management/Engineering, SRE, Security & Compliance, etc. One or two engineers cannot possess all that expertise and still be specialists (“Jack of all trades, master of none”). This is also why hiring Cloud Posse is so much more valuable: for barely the cost of a fully-loaded DevOps engineer, you're buying access to a full team and all the pre-existing materials that have cost millions of dollars to produce.

The great thing about our Engagement Workbook process is that it helps us identify upfront the customers who are a good fit for how we operate. We're so successful because we have standardized our engagement model. The more variables we introduce, the greater the risks and the more we diverge from what we know works.

Will we be able to take over when it's done?

You might be wondering if you can expect to come out the other end of our accelerator with a team ready and able to take over day-to-day operations and migrate additional products into this stack using Cloud Posse's modules.

TL;DR: Yes! But there's homework involved.

When you work with Cloud Posse, it's more of a “delivery” model of engagement in the sense that we're doing 95% of the work, in your repo, from day one, one pull request at a time. Our handoff strategy is to help your team learn the ropes by assisting them with self-prescribed homework assignments. We do not at this time have any formal curriculum for training, since every team has different needs. What we provide is a standard set of documentation, architectural diagrams, and office hours. We will also document any requested processes or systems as general support. Cloud Posse does not provide Staff Augmentation or Training arrangements.

Think of it more like this… while we're engaged and building out your platform, your team has full access to ask us anything. They can follow along in GitHub, review pull requests, ask for demos, etc. We'll jump on the phone anytime to help triage, pair program, research, or prototype anything else they want. The most successful teams take advantage of this opportunity early on in the engagement. Those are the teams that are ready to migrate additional products.

Case in point: we have a customer that, after 3 weeks of working with us, took the initiative to use our Datadog component and migrate all their existing legacy Datadog monitors into Terraform. The way we found out was they tagged us on the pull request. That's rad. After multiple reviews and comments, the PR got merged and they're well on their way.

When we're done building everything out, we'll stick around for as long as you need our help, but that's optional. Most customers keep us around for some time afterward until their team feels fully confident operating everything. Also, what we frequently see happen is that teams decide to expand the scope and tack on additional services in their catalog (e.g. EMR, Redshift, StrongDM, etc.).
