In this presentation, I'm going to cover how startups like Gladly.com, a billion-dollar customer service platform, as well as a few other fintech startups like EBITDA com and PeerStreet, went from having near-zero DevOps competency to owning their infrastructure with full-blown automation using our DevOps accelerator program.
Not only that they did this while being capital efficient.
So they didn't break the bank.
Just so you know I'm not making this stuff up.
I'll show you a few case studies.
For example, we're going to cover how we helped Gladly.com build out their internal DevOps and SRE teams despite having no previous experience.
They now run everything on their own.
They've gone on to raise a cool 113 million from top-tier VCs.
Then there's PeerStreet.
We're going to show you how they managed to successfully ditch Heroku for AWS using Kubernetes.
We helped them rapidly build out their AWS environments with Kubernetes clusters and a release engineering process that is similar to Heroku Review Apps.
It was really sweet stuff.
They've gone on to raise $50 million and we're super excited about their potential.
There's also even NBA.com.
We'll give you a sneak peek at how they accelerated their journey into AWS using our vast open source library of infrastructure as code for Terraform.
They picked up from where we left off to build out their own internal DevOps team that continues to iterate on the platform we built.
So we're going to circle back to each of these case studies and show you what we learned from the experiences.
But first, I want to talk about who this is for.
I want to take a moment to show who can benefit from this because we've been around the block a few times and seen what works and what doesn't.
I just want to make sure we can help you.
If I were to guess, you're probably a startup that has raised a lot of capital.
You've had your Series A and then some.
You probably started with Heroku or ClickOps, and that was smart. It's what got you to where you are today.
It was the best decision and probably your only option at the time because investing in infrastructure when you had started would have been the wrong move.
You didn't have the time and you wouldn't have come as far as you have.
Instead, you made the wise decision to hold off and maximize your current situation.
So now you've started to find product market fit and everyone sees the billion dollar potential your company now has.
That's why you need to double down and go faster because the number of things you want to achieve in a short amount of time is not even funny.
That's why you planned out your roadmap for at least the next year.
And it's freaking awesome.
The problem is you want to enable your software engineers and QA teams to pump out features, but they can't ship fast enough.
Your sales team has been pressing you for more demo environments so they can close more deals.
And the deals you are getting are hung up on security and compliance requirements you can't meet.
You can't fix any of this because of your infrastructure.
You are now the blocker, and it's the last thing you ever wanted.
So you're getting pressed on delivering a solution.
You know you need to act, but it was fast thinking and acting that got you here.
People within your company keep telling you it's easy, but your experience tells you otherwise.
As someone with an engineering background, you have a hard time believing you can have your infrastructure delivered in about a day.
And you're right.
You would love to get a handle on your ops situation and come up with a game plan.
But you can't spend forever researching it.
Look you've got a lot on your plate already.
So your team has been floating ideas around about moving to Kubernetes, and using CI/CD but the amount of information and conflicting opinions is frankly overwhelming.
On top of that maybe you already tried and this has also gone nowhere because your team is slammed and you don't even have the right skill sets in-house.
So that's why now you're trying to hire the team to do it.
But you discover it's so freaking competitive out there that you can't even make an offer.
You know you need to take the next step.
That's why you're here. If any of this sounds like you, the first thing you should know is that you are not alone.
We've helped companies just like yours get to where you want to be with your infrastructure in record time, because that's what we do.
So just keep listening to find out how, by partnering with Cloud Posse, you will achieve what we like to call SweetOps.
But before we get deep into how we're going to do this, our goal is to make sure we're a good fit. We're the DevOps accelerator that partners with you to build awesome infrastructure that you own.
We think it's important to be upfront about what that entails and what Cloud Posse is not.
We don't do one off bespoke solutions.
We implement our solution every time.
We don't do staff augmentation.
That's a totally different kind of company.
We're not your typical DevOps-as-a-service company, and we certainly won't build your mobile site or do your SEO.
We have a very specific mission.
So we fully recognize that what we do is not for every company.
That's why we have such a high success rate.
So here are some of the things you want to consider before making a decision to own your infrastructure.
First, could your apps easily run on Heroku or serverless instead? The kinds of apps that we deploy for our customers typically can't run on platforms like that because they are too complex or have security requirements that prevent it.
Next, are you looking primarily for someone to help run things for you?
One of our goals is to empower you, our customers, to take over everything once we're done.
If you lack the in-house resources to do that, this could be a problem.
You own it after all.
That's why we want to teach you how to fish.
Can the number of developers in your company be counted on one hand? The reason we ask is that our release engineering process has been optimized for companies that can benefit from being able to deploy as many short-lived environments as they want and have a runway that allows multiple teams to collaborate and deploy software at the same time without stepping all over each other.
If these don't sound like problems that you have right now, then our process might actually slow you down, and you may want to wait until you grow a little bit more.
Is using AWS a deal breaker for your company? As an AWS partner, we specialize in the services they offer and the benefits that we can provide as a result.
There are a lot of awesome competing cloud offerings out there.
We just wouldn't be the best to support them.
And finally, are you against using tools like Docker, Terraform, or Kubernetes? These are the cornerstones of our solution because they allow for maximum automation and repeatability across organizations.
If any of the answers to these questions are yes, then maybe we are not a good fit for you at this time.
And we might just be overkill.
Our goal is not to sell you on a solution that you don't need.
We work best with companies that have some experience running their apps in containers and using AWS a little bit, and with companies that are flexible about adopting the new and open source tools that we deliver as part of our solution.
We have a proven and repeatable strategy for companies that fit our profile.
That's why we kick ass at what we do.
So let's get right into it.
Here is the core concept.
If you can understand this, everything else we talk about today is going to make sense. In order for you to build a successful startup or SaaS business and keep up with how fast you're moving, you need to own your infrastructure and build it to serve you.
Most teams don't have sufficient in-house experience to build it and need outside help.
No one is helping companies implement this stuff from beginning to end so that they can own it. Except us.
Our mission is to help the best and brightest startups and SaaS businesses build and operate their own secure cloud platforms from the ground up to serve their business, so that they are never blocked by infrastructure again.
This is something that all successful startups struggle with once they reach a certain size.
That's why we're here: to eliminate the mystery and the wheel spinning, and to cut out the repetitiveness of doing it.
We do what no one else out there is doing but what desperately needs to be done.
That's why we've now helped dozens of startups from around the world who've been backed by some of the biggest names in venture capital achieve greater success by owning their platform.
So that's all great for us.
But what does this mean for you? Well, over the years, our process has become so well refined that we've been able to codify it and make it extremely clear.
That's what we're going to show you today.
But before we get into what we can do for you, I want to share my story so you can understand where I'm coming from.
My name is Erik Osterman, and I'm the CEO and founder of Cloud Posse.
When I was in my early 20s, I started my career as a back end software developer and cut my teeth working at countless startups.
I've been practicing DevOps since before there was a term for it and built my first startup back in 2006 when AWS was just in a private beta.
So if you're a technical founder, I speak your language.
Before starting Cloud Posse, I had a few of my own companies.
They taught me what lean startups really need to succeed.
That's why, when I joined Clicker, a startup backed by Redpoint Ventures, I was put in charge of their cloud operations.
After only two years, we were acquired by CBS Interactive, the broadcast company.
Let's be honest here.
It was an acqui-hire. Our CEO became the CEO of CBS Interactive, our CTO the same, and, well, I became the first director of cloud architecture.
It was a pretty sweet opportunity.
But what really stood out from the experience was how teams could accomplish amazing things when they weren't blocked by old school centralized operations.
Basically, when business units can act like startups within the organization but have at their disposal all of the resources they need to get the job done, beautiful things happen very fast.
That's why we are here today to give startups all the resources they need to succeed.
But first, we need to pause here and define what we're talking about.
What the heck is DevOps? DevOps, as we all know, is a trendy buzzword that has been overloaded and misused.
It can mean almost anything these days, along with words like transformative, cloud, and email. We get it.
So what does it actually mean.
Here's what it actually means to us.
In the preceding decades, infrastructure was strictly physical hardware.
It required humans to rack and stack machines in data centers and configure network switches.
These days, on the other hand, all of that has been virtualized, and what I mean by that is it's software defined.
Under the hood, it's all the same stuff.
It's just that the common element, the human operator, your team, now deals with software rather than hardware for configuration. Since it's all software, it can be automated, and that which can be automated can be continuously improved and iterated on.
For example, we can do automated testing, continuous delivery, and code reviews, just like with your applications.
So DevOps is really the fusion of these disciplines.
It's taking the best of software development and applying it to operations, and taking the battle-tested experience of operations and incorporating it into the software automation process, because what we need is more collaboration.
This movement is what makes DevOps possible today.
The walls have been broken down.
So that developers can now write infrastructure as code and work with operations teams to build more reliable software.
Plus, this frees up your operations teams to provide even more stable platforms and tools to unblock your developers, so that they are self-sufficient and write better software.
But you may be surprised to find out just how much the benefits of DevOps extend into your business.
So let's start with a quick introduction just so we're all on the same page.
I want to give you an introduction into our world of DevOps.
So you'll be better equipped to understand what it will take to run a successful DevOps organization.
This is a hot topic and everyone has an opinion on it.
And for some strange reason they're all blogging about it on medium.
So I'll spare you from that.
Instead I'm going to talk about how to get stuff done because that's what we're all about at cloud posse.
We live and breathe this stuff.
It's easy to forget or take for granted that not everyone comes at this with the same level of experience or background.
I'm going to cover why DevOps is a business enabler why you need to own your infrastructure.
And what it's going to take to be successful when you practice DevOps in your organization.
So, rhetorical question: how is software impacting your business?
It's likely your number one asset and differentiator.
If it's not, just stop the video right now, because everything else I'm going to talk about is probably irrelevant to you.
So then it goes without saying that everything that makes your software awesome is ultimately at the mercy of your infrastructure. Your infrastructure and its capabilities are then what limit your ability to grow your company, based on what you can or cannot do and how fast or how long it will take.
Automation is what allows you to create and replicate your systems at a low cost and avoid the expense of manual effort and human error.
Automation is therefore a force multiplier.
It's how you continue to scale your business as you succeed.
The faster your developers can move with the least impedance the more agile your business and the faster you can run circles around the competition.
They'll literally be scratching their heads trying to figure out how you pulled it off.
Infrastructure automation, therefore, is not an afterthought.
It's because of it that you have a product your customers can come to use reliably.
It's because of it that you have the confidence to scale your business and meet your customers' requirements for security and compliance.
Now, unlike your product, your customers probably don't care what your infrastructure looks like as long as their data is safe and online.
However, your developers do.
It's what makes their job easier when done right.
With the right infrastructure, your developers are literally untethered.
They can use all the APIs of the underlying cloud services provided by providers like Amazon.
Infrastructure that is well designed ensures you capture the entirety of what makes your business successful, down to the nuts and bolts.
When you have it all in code, you can point to it.
You can tune it.
You can extend it.
You gain immediate visibility into what's working and what's not.
Plus you can fix what's broken faster than before because you are in control.
Furthermore, because you have all the toggles in your hands, you can control the costs and optimize your performance at runtime.
This is a competitive advantage because your business will be doing more while spending less.
Again, you will be doing more while spending less.
You just need a plan for how to do that.
Your business is growing.
If you don't own the infrastructure behind it, you won't be able to hold on.
If you wait until the last minute, it's too late.
And remember, when the unicorn falls, you, the rider, go down with it.
You need to be able to respond to all the unknowns and curveballs that are thrown your way. Period.
To be able to get to the next level, to where your business is headed, you need to step up your game.
When you can rapidly respond to all the unknown requirements your business faces, that, my friend, is a competitive advantage.
Anything short of this will ultimately be stifling for your startup. It's just a matter of time.
But whatever you do, don't make the mistake of building the wrong infrastructure.
This can have the complete opposite effect.
It can slow you down and cause you to crash and burn hard.
Good requirements don't always translate to good solutions.
If one thing is true in software engineering, it's that mistakes are costly and somewhat unavoidable.
The only way to reduce them is to have more experience.
Shortcuts taken today will be more expensive for you to fix tomorrow.
They'll just pile up as tech debt that no one will want to touch with a six foot pole.
So a little story here.
When Clicker was sold to CBS Interactive, they subjected us to extensive due diligence, looking over everything we did.
They wanted to know exactly what they were buying.
They were impressed by our adoption of AWS using end-to-end automation and that we used CI/CD consistently to deploy everything.
That's why Clicker's technology ultimately ended up powering TV to this day.
And I was put in charge of overseeing the migration of over eight properties to AWS on my watch.
So I hope you see that DevOps needs to become the foundation of everything your business will do or be capable of doing.
I'll tell you what.
Once you own it your developers will love you for it.
And your customers will appreciate the outcome.
So let's take a moment and look at some of the major trends that we see in how companies manage their infrastructure.
This is how they release their software and run their teams.
A lot of this is the result of how DevOps is breaking down the traditional barriers between developers and operations.
This collaboration is what your company needs to capitalize on for maximum ROI.
Here are the five trends that I want to cover so we're on the same page.
First, let's cover infrastructure as code.
Why do we need to care?
Well, the problem is that in the beginning, companies practiced a lot of ClickOps, basically manually provisioning resources as needed.
It was the fastest way to get up and running, but with a huge caveat.
No one knew what was done and why, except for the person who did it.
Plus, no two environments would be identical because they were always done by hand.
It wasn't repeatable. At best, there would be some out-of-date documentation and a whole lot of manual, error-prone commands that only a select few were confidently able to run.
This was a massive liability because if we needed to rebuild for any reason or spin up additional environments, it would take days or weeks.
Worse yet, if the business lost that key person, their hands were tied.
Few businesses, if any, can survive that, especially if there's a catastrophic incident behind it.
So the solution is what we practice today.
Developers can use tools that let them describe the desired state of their infrastructure as code.
But this code can get pretty complex due to all the edge cases and error handling.
That's why there are domain-specific languages like Terraform. These DSLs are what allow us to define what our infrastructure should look like in a declarative fashion.
The benefits of this are that, since it's all declarative, developers can just describe the intended goal rather than having to know all the steps to get there.
And since it's all code, it can be peer reviewed, tested, and even automatically deployed upon approval.
And since it was done via code and stored in your Git repository, you have a change log of everything that goes on.
When you go to write it, though, don't reinvent the wheel.
There are good patterns out there that exist for a reason.
Companies like Google, Uber, and Netflix open source so much of their software.
We don't need to start from scratch anymore, and we can instead build on the shoulders of giants.
There are literally thousands of Terraform modules out there that implement the best practices and best patterns of infrastructure.
The more of these patterns you adopt the higher your chance of success and the easier it will be to support in the long run.
Plus, as a bonus, it may help your company attract and retain the top talent who desperately want to work with this stuff and know what well-built infrastructure looks like.
Look, no one wants to come in and inherit a hornet's nest of tech debt and other people's dirty laundry.
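To make the declarative idea from this trend concrete, here is a minimal sketch in Python. It is not how Terraform itself works internally. The resource names, attributes, and the `plan` and `apply` helpers are all made up for illustration. The point is only that you describe the end state and a tool works out the steps.

```python
# Hypothetical resource names and attributes, for illustration only.
desired_state = {
    "web-server": {"size": "small", "count": 3},
    "database": {"size": "large", "count": 1},
}
actual_state = {
    "web-server": {"size": "small", "count": 2},
    "old-cache": {"size": "small", "count": 1},
}


def plan(desired, actual):
    """Compare desired and actual state and list the actions needed."""
    actions = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply(desired, actual):
    """Carry out the plan and return the resulting state."""
    new_state = dict(actual)
    for action, name in plan(desired, actual):
        if action == "delete":
            del new_state[name]
        else:
            # Both create and update converge on the desired spec.
            new_state[name] = dict(desired[name])
    return new_state


if __name__ == "__main__":
    for action, name in plan(desired_state, actual_state):
        print(action, name)
```

Because the plan is just data, it can be printed in a pull request and reviewed before anything changes, which is the peer review benefit described above. Running `apply` a second time against its own result produces an empty plan.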
So the next thing we see is that companies treat infrastructure like cattle.
The problem is that for decades companies have been terrified of losing their beloved servers because it was painful to heal them, like our pets.
So servers were fortified with RAID 5 and backup power supplies.
They were given the powerful-sounding names of Greek gods, like Zeus.
But that was all a false sense of security.
We were never really any better off, because if the water main broke and flooded the data center, we would still lose everything.
So the solution is instead to practice what we do today.
We treat everything as disposable and dispose of it all the time.
We build things that are easy to throw away and bring back up. When we run active-active across multiple data centers in different availability zones, we achieve high availability.
We always keep rolling forward and only look back so that we learn from our failures and what we can do better.
So pardon the crass analogy here, but the reality is we should act a lot more like ranchers.
If livestock is sick, it's best to put them down before they infect the herd.
The same is true with servers: terminate them and move on. Then track whatever led to that failure in a postmortem and add it to the backlog so it gets fixed and won't happen again.
When you do this, you get a near-immediate recovery.
You spend less time manually triaging and more time innovating.
Plus, you're constantly testing your ability to fail over, and that gives you the peace of mind of knowing you can recover from scratch. It takes regular practice, as even fighter pilots know, and your company can achieve this too.
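As a rough illustration of the cattle approach just described, here is a small Python sketch. Everything in it is hypothetical: the server names, the `make_launcher` helper, and the health check stand in for whatever your cloud provider and monitoring actually give you. It replaces unhealthy servers instead of repairing them and keeps a list of what was terminated for the postmortem.

```python
def make_launcher(prefix, start):
    """Return a function that hands out new server names (hypothetical)."""
    counter = [start]

    def launch():
        name = f"{prefix}-{counter[0]}"
        counter[0] += 1
        return name

    return launch


def replace_unhealthy(fleet, is_healthy, launch):
    """Swap every unhealthy server for a fresh one.

    Returns the new fleet and the list of terminated servers,
    which would feed the postmortem and the backlog.
    """
    new_fleet = []
    terminated = []
    for server in fleet:
        if is_healthy(server):
            new_fleet.append(server)
        else:
            terminated.append(server)
            new_fleet.append(launch())
    return new_fleet, terminated


if __name__ == "__main__":
    sick = {"web-2"}  # assumed result of a health check
    fleet, terminated = replace_unhealthy(
        ["web-1", "web-2", "web-3"],
        is_healthy=lambda server: server not in sick,
        launch=make_launcher("web", 4),
    )
    print(fleet, terminated)
```

Nobody logs in to nurse `web-2` back to health here. The fleet stays at the same size, and the failure becomes a backlog item rather than a late-night repair job.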
The problem is that in the previous era of software engineering, there would be these massive quarterly or yearly monolithic releases.
These were scary and stressful events because so many changes would be rolled out at the same time, not to mention usually manually, so that when things broke it was anyone's guess as to what went wrong.
The thinking was that the only way to have stability was to have less change.
In other words, fewer new features.
But this was all the wrong way of thinking about it.
So the solution is what we practice today: continuous integration testing with continuous delivery. This is where software gets deployed several times a day or week in small changes that are tested and released.
They get pushed through your entire release pipeline, getting automatically tested at each stage. Once rolled out to production, feature flags are used to toggle what users see, making rollbacks as easy as the flip of a switch.
Then, using the combination of service level objectives and service level indicators with error budgets, a company knows when they need to play it safe or when they can be more aggressive, basically when they have too many mistakes or when they're just humming along.
In this new paradigm, the benefits are that when something breaks, not only do you find out sooner, you'll know exactly which change it was that broke it.
Plus, the change sets are smaller, so the cost of a disruption is lower because the resolution times are faster, and your team members don't need to wait on others to troubleshoot the problems. They can fix them themselves.
So when you can rapidly roll back and disable the functionality using feature flags on your own, that is power in your hands.
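Here is one way the error budget and feature flag ideas above could be sketched in Python. Treat it as an assumption-laden illustration, not a description of any particular tool. The 50% threshold, the posture names, and the `new-checkout` flag are all invented for the example.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent (negative if overspent).

    slo_target is the objective, e.g. 0.999 for "99.9% of requests succeed".
    The measured failures play the role of the service level indicator.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / allowed_failures


def release_posture(budget_remaining):
    """Map the remaining budget to a posture (thresholds are made up)."""
    if budget_remaining >= 0.5:
        return "aggressive"
    if budget_remaining > 0:
        return "cautious"
    return "freeze"


class FeatureFlags:
    """A tiny in-memory flag store; real systems persist and audit these."""

    def __init__(self):
        self._flags = {}

    def enable(self, name):
        self._flags[name] = True

    def disable(self, name):
        self._flags[name] = False

    def is_enabled(self, name):
        return self._flags.get(name, False)


def checkout_page(flags):
    """Pick a code path based on a hypothetical 'new-checkout' flag."""
    return "new checkout" if flags.is_enabled("new-checkout") else "old checkout"


if __name__ == "__main__":
    remaining = error_budget_remaining(0.999, 1_000_000, 250)
    print(release_posture(remaining))
    flags = FeatureFlags()
    flags.enable("new-checkout")
    print(checkout_page(flags))
    flags.disable("new-checkout")  # the rollback is just a flag flip
    print(checkout_page(flags))
```

With a 99.9% objective over a million requests, roughly a thousand failures are allowed, so 250 failures leaves about three quarters of the budget and the team can keep shipping. Turning the flag off restores the old behavior without a redeploy.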
The fourth trend that we see, and this is a biggie: startups today are more distributed than ever.
Part of the reason for that is that talent is so hard to come by.
You might not be able to hire the developers in your area because they're in such high demand or simply too expensive for a startup to afford.
Other times it might be that one of your star players needs to move for family reasons. The list just goes on and on.
The reality is today developers can work from anywhere.
And they know this.
So the solution is to make sure that you have the tools to make remote work possible without sacrificing productivity, so that you can get the job done.
However, it's only possible if your company embraces the movement.
All your portals and all the monitoring dashboards that you need to run your business must be remotely available, for example, using a combination of TLS with single sign-on and multi-factor authentication to securely open up individual and specific services as needed.
Once you do this, the benefits are that you can now hire from anywhere in the world and stop competing in a competitive job market.
You can focus more on improving the quality of life for your team.
Allow them to work from home every now and then, avoid stressful traffic, have a better life, and work more efficiently across multiple time zones.
In fact, some companies even run 24/7 global operations that never sleep.
What we're leading up to here today is that we want companies to have autonomous cross-functional teams.
In the past, there would typically be some form of centralized operations team whose sole job was overseeing the infrastructure and handling deployments.
They were often small teams who were siloed away in the dark corners of your office, always on call and seemingly always the scapegoat for failures.
As the organization grew, these gatekeepers became the bottleneck. Software and product teams were frustrated because releases took so long.
Deploying new features was never fast enough, and procuring new hardware resources was always too slow.
At the same time, the ops teams always felt overworked and underappreciated for their sacrifices, and those were big sacrifices.
So the solution is instead to turn ops teams into enablers.
Their role should be redefined as one that teaches, educates, and writes the tools to support your product teams and your software teams, so that they own the software that they write from top to bottom.
This helps those teams run their own deployments and puts the teams on the hook for their own services' uptime and performance.
If those teams get stuck, they can always escalate.
This is where your SREs will come in and help get teams back on their feet.
The benefit is that when a company practices this, it aligns the incentives. Developers write better code that is more easily deployed. When they get paged, they feel the pain, and they take proactive steps so those problems don't happen again in the future.
The ops teams, on the other hand, get to focus on what they do best: building tools for the company and ensuring that a stable platform exists for the rest of the company to operate on and get work done.
In the end, your company will be more resilient to both technological failures and the loss of key players, because their knowledge of intricate systems has been captured in code, processes, and documentation from the beginning.
So now that you understand some of the common trends that we see and the overwhelming benefits this can have on your business, I hope you're even more excited to get started building your infrastructure.
But first, I want to make sure that you know what this is going to take and ask the right questions before you start breaking ground.
If things like the AWS Well-Architected Framework, the BeyondCorp security model, or the CIS Foundations Benchmarks are unfamiliar, then there may be a very big gap from where you are today to where you want to be.
The reason I'm telling you this is that we've seen a lot of companies who've tried and failed to reach their goal, and that's ultimately why they reach out to us.
If you haven't already done this before, it could take a long time to get it right.
I want to give you more insight into what it will take to operate and own your infrastructure by breaking it down into the five phases that will help get you there.
This is what we follow, and it is based on our experience of what works. As a first step, you need to establish your platform and infrastructure.
You need to architect what everything will look like and decide how to get it done.
This is where you'll be laying the foundation that will determine what you are capable of doing for years to come.
Mistakes made early can be expensive to fix down the road.
You want to make sure whatever you build today isn't something that will be impossible to manage or unable to meet your security requirements.
So let's take a look at Gladly. Gladly was founded by a few heavy hitters who really know what they're doing.
Of their previous ventures, one went to a full-on IPO and Vontu was acquired by Symantec for a cool 350 million.
This time they've built a customer service platform that lets customers get help using any channel, like Twitter, Facebook, email, SMS, or phone.
In fact, if you want to experience it for yourself, just head over to JetBlue, one of their Fortune 500 customers, and see for yourself.
When we started with Gladly, they had almost nothing in place on AWS.
They had a tad of Docker and a splash of ECS with a whole lot of manual deployments.
But that was it.
There was no monitoring and definitely no standardized way to release apps, but they are a wickedly smart team who knew they desperately wanted to practice DevOps.
They just didn't know where to begin.
So we worked with them to build out all the tooling and establish the platform on top of which their teams could operate efficiently and securely.
We used a combination of tools like Terraform with kops and Helm to build a platform that would run their services.
We used Helm to provide their developers with a standardized, declarative way to build and deploy their applications and even easily roll back when necessary.
We used multiple AWS accounts to logically segment environments like production and staging.
This was essential for security.
We set up CloudTrail audit logs to capture events happening in their system, which would come into play later when they went after PCI.
We used Datadog to monitor everything from top to bottom and set up alerts for serious events so that they escalated to on-call engineers using the PagerDuty service.
Then there were dashboards displayed prominently throughout the office on large big-screen TVs for everyone to see, which kept teams aligned and moving forward towards the same goals.
We used Sumo Logic to ingest the logs from servers and containers so we could report on them.
This made it so much easier for developers to understand what was going on when things broke because they could see what was happening from their applications without needing to SSH into servers.
We created libraries of reusable Terraform code called modules.
These enable developers to manage individual pieces of code without needing to be subject matter experts on what they were doing.
Basically, the organization can provide the modules that form the foundation of the infrastructure and reuse them time and time again.
That way, more developers can collaborate.
As a result of all this work, the number of services they've deployed on the platform has grown tremendously.
And now they run everything on their own. Better yet, teams build and operate their own services and respond to incidents related to them.
Their rapid progress allowed them to sign major Fortune 500 companies like JetBlue as customers and was a huge factor in their ability to go on and raise over 113 million in venture capital.
When you own your infrastructure and your platform, you will be able to meet any complex requirements thrown your way because everything is fully under your control.
It becomes putty in your hands.
This is what will enable you to meet those requirements thrown at you from tough Fortune 500 companies, and what will set you apart from the competition who just can't keep up.
It's the kind of thing that becomes your competitive differentiator that you need in order to catch the whales who will need the assurances about how you manage their data.
Don't let security events derail your business and damage the reputation that you've worked so hard for.
Not only that, with the right platform you will be able to gain the insights you need to operate more cost effectively while doing more at the same time.
So here are some of the common questions that we get asked related to this.
By far the most frequently asked question is this: what are some of the tips for getting started?
First, before you lift a finger, I want you to get familiar with the AWS Well-Architected Framework.
This is going to help you form your opinions on how to build your systems.
Now it's a lot to read and pretty dense, but doing so early will help you avoid or at least be aware of some of the most common pitfalls that we see.
The thing is it's just a documented framework.
It's not a runbook or anything.
It doesn't break it down like that.
Your team still needs to figure out how to implement it and what tools to choose like Terraform to get it right.
And that is no small feat.
The next question we get is, how do we choose the right platform to run our apps?
Well, we get it.
There are a lot of them today.
And you're going to want to select the right platform for your company.
When you select the right platform, that's what's going to make it easier for you to manage the runtime and scheduling of your applications running in containers.
You've probably heard of a few of them.
Platforms like Kubernetes from Google, or OpenShift by Red Hat (incidentally built on Kubernetes now), Docker Swarm, ECS, Nomad, Mesos, and a dozen more, which is what makes it all so overwhelming.
Which one do you pick?
What you pick will ultimately determine the kind of talent your company attracts and the kinds of things that will be either easier or harder for you to do.
We're not ashamed to say we prefer Kubernetes over all the others.
It's pretty hard to argue that Google doesn't know what they're doing when it comes to operating this stuff.
So the platform is one thing, but how will you provide the services that your apps will need?
Unfortunately, and I hate to break it to you, whatever platform you pick will not be enough on its own.
The reality is you'll need to use multiple services in concert to deliver your end solution.
That may mean using Kubernetes along with the AWS Relational Database Service, RDS, or maybe with the Elastic File System, EFS, for durable storage.
Maybe you're going to use Cloudflare or Fastly as a CDN, or Sumo Logic or Splunk.
If you're using some tool that doesn't support the automation of all these things in a reasonably straightforward way, it's going to be a problem.
So in order for you to realize all the benefits we've been talking about, you need to pick a tool that's reasonably straightforward and works with AWS and with your platform, like Kubernetes.
So how will you automate your infrastructure in a repeatable way?
If you're already familiar with infrastructure automation, then you're probably writing some infrastructure as code already, using something like Terraform or maybe CloudFormation for AWS.
If you're not there yet, that's OK.
These are pretty easy languages for any developer to pick up after they're shown the ropes.
Either way, if you're not doing it today, then you'll have just a little bit more work before you realize the benefits.
There's essentially this dip that must occur where you're slower before you're faster.
If you haven't started already, well, then that dip is just going to be a little bit greater.
Also, if you're using CloudFormation without using Terraform, then you're very limited in what you can automate.
It's as simple as that: CloudFormation is just for AWS; Terraform is for everything with an API.
Even Domino's Pizza.
And I'm not joking.
The best way for you to shortcut that dip, however, is to leverage some of the thousands of open source libraries out there for Terraform.
That's why Cloud Posse maintains hundreds of modules on our GitHub.
Similar things exist for CloudFormation, but it's just nowhere near as mature an ecosystem, where companies are sharing composable bits of infrastructure the way they are with Terraform.
Finally, everyone always asks us this.
How long is it going to take?
Well, so let's get real here.
If you haven't done it before, or haven't done it recently, it's probably going to take you a while before you get it right.
There's no other way around it.
For example, we have spent at least the last four years working on this full time, integrating open source software, just to get where we are today.
And that's with a full-time team of subject matter experts.
Now, because of that investment, we are able to roll everything out for you in about 90 days from beginning to end.
And if your experience is any less than ours, it's probably going to take you a whole lot longer, especially if you have a lot of spinning plates and no dedicated team like we do to work on it.
As a startup, that might be a tough sell for your product teams, who are frustrated that you're not moving faster, or your CFO, who just might not give you enough budget or runway to get it done.
So I'd like to take a moment before we move on to just call out why Terraform is such a game changer.
Basically, Terraform is this DSL, a domain-specific language, for defining the desired state of your infrastructure.
It's what enables you to safely and predictably create, change, and update your infrastructure.
It's an open source tool that lets you codify infrastructure with declarative configuration files that can be shared amongst your team members using your source control.
Because it's code, it can be edited, peer reviewed, and versioned just like everything else that you develop.
It supports plugins for what feels like every cloud and API service out there, including, like I said earlier, Domino's Pizza.
But the real power comes from defining modules.
Now, modules are what capture some piece of functionality you want to standardize.
Basically, the business logic for how to get stuff done in your specific company.
For example, how should your database be provisioned?
And what are the best practices that you want everyone in your company to follow?
You stick that all into a module.
Now when a developer needs a new database for, like, a microservice they're developing, they just need to invoke that module.
And they don't need to be domain experts to do a good job.
Now, as your company evolves, it needs its library of modules to grow alongside it.
That's how you continue to move faster while adhering to your stated best practices.
It's going to make it easier for you to review changes and have more stable infrastructure, because things are done in a consistent, repeatable fashion that's in line with the governance model you've established.
Better yet, companies like ours open source their library of modules so that everyone can benefit.
That's why we now maintain literally hundreds of peer-reviewed modules on our GitHub that can save you time, just like the 10,000 other people that we see visiting our projects every single day.
All right.
Moving on now.
Now that you have your foundational infrastructure and your platform up and running, we want you to rapidly enable your engineers and developers to start using it.
This is what will help you achieve your first big win.
Developers will move faster than ever before.
They'll be ecstatic with the automated testing and the deployments that let them spot check their changes on the fly, and you will be able to open doors to running a smoother QA process that's not hindered by a single staging environment, like so many other companies suffer from.
This is where your release engineering process comes in.
This is how your developers will continually and rapidly deploy their software to any environment.
This is how they will be testing their changes and reducing their bugs.
All right, so let's take a look at another case study here.
This time it's PeerStreet, a startup in the financial services industry facilitating real estate lending to institutional investors.
When we started with them, the first thing we did was a comprehensive infrastructure audit and code review.
We wanted to make sure that we could truly help them achieve success in a short amount of time.
From that review we discovered a lot of things they did well, in addition to some improvements we recommended.
One of the things they did really well that we liked was the way they leveraged Heroku Review Apps.
This let them bring up short-lived environments for every pull request.
It's basically a stable QA environment for every change.
The problem was it wasn't easy to bring up all the backing services they required on demand.
So we fixed this using a combination of Helm with Helmfile to deploy their services on Kubernetes.
But this introduced another problem: databases in staging environments that were brought up on demand needed to have a fresh data set.
And they were quite large; loading the fixtures at runtime just took too long.
So instead we implemented a nightly build process that would produce a Docker image of their complete Postgres data set after it had been scrubbed and normalized of sensitive information.
Now when a PR environment came online (that's what we're calling these short-lived environments) it only took a matter of seconds.
And better yet, they could test using more realistic data.
You see, in our experience, a lot of problems that affect production are a result of bugs that involve value errors.
These aren't easily caught in simple smoke tests with fake data, especially if the scale of the data is any different.
The benefits of a well-defined and well-designed release process are boundless.
But the minimum of what you should come to expect today is automated zero-downtime deployments for all your apps, across the board, to any environment.
This is what's going to contribute to your overall product stability and enable you to release changes quickly without fear.
Basically your software never goes dark because you only cut over when all the health checks pass and you know things are good to go.
When you do this, you'll be able to have faster release cycles that enable you to push regular updates out to your customers more frequently, because smaller incremental changes are safer.
The velocity that you move at will increase as people are less afraid of breaking things.
Basically your confidence goes up.
Plus it gives the power back to the product teams to ship features on their timeline and not be blocked by large and dangerous monolithic releases that previously required a whole war room to roll out.
Part of the reason that you'll be able to achieve this is the ability to run automated tests that validate code quality on every single commit.
This way, you can catch any regression faster and spend less time squashing bugs later.
In turn, this will improve the overall quality of your product, providing even greater ROI to you in the long run.
Plus you can even scan for security vulnerabilities to meet your security and compliance requirements.
How sweet is that.
You're literally automating your compliance at the source.
On the other hand, you'll have more stable demo environments that are useful for customer sales demos, so you can close more deals.
They are not just helpful for testing.
Like we said with Gladly, they use these for enterprise customer sales demos.
Scaling your customer sales demos for many high-ticket SaaS companies can be just as important as scaling the technology itself.
We've seen this from our customers.
Also, no matter how many automated tests you have, you still want to have some level of user acceptance testing done by humans to validate that the outcome is what you had set out to achieve to begin with.
This is why you need staging environments that showcase feature work in a stable QA setting, and you need those staging environments to not be destabilized when you have dozens of open pull requests.
So here are some good questions that we get asked all the time.
What should our release engineering process look like?
Well, the answer really depends on the tools you pick.
There are countless CI/CD platforms out there and no one way to skin the cat.
Frankly, knowing which one to go with often comes down to the tool your team has the most experience with or is the most familiar with.
Sometimes that's Jenkins; other times that's CircleCI.
The risk with that, however, is that you get stuck doing things the old way, which is suboptimal.
If you're not happy with your current process, perhaps your tools are part of the problem.
We really love a tool called Codefresh.
It works great with Kubernetes clusters out of the box because it was designed that way from the beginning.
It natively supports Helm, which is what we use for packaging our apps.
It looks freaking great.
And it supports totally declarative pipelines that make governance easier with approval steps.
So how can you deploy features faster without breaking things?
This is another question that is frequently asked, and it isn't well understood if companies haven't been practicing it before.
Basically, the goal here is zero correlation between the stability of your site and the release of new features.
To do that, first and foremost, you need to eliminate all manual and error-prone processes by using automation.
This means adding comprehensive automated testing using whatever test frameworks are available for your language, like pytest for Python.
Then you'll need to make sure that everything is well tested before getting released.
In order to do that, though, you'll want to run multiple staging environments, so that you know exactly what you're testing and not something else.
We like using Docker to package and ship applications because it reduces the potential for environmental drift when you're releasing your software.
You get a guarantee that you basically have this golden image that is the same in every environment.
Fortunately, all this has been made so much easier today when you're deploying to Kubernetes, which is what makes it easier to spin up short-lived environments for every pull request.
So once you have your testing and your QA process, and that's working, the trick is to make smaller incremental changes.
That way, if unexpected problems arise, and sometimes things slip through, it's easier and faster to track down the change that led to it.
But you want to know how the pros do it?
They use something called feature flags rather than using deployments.
The truth is, no matter how fast your deployment process is, it will always be slower than the flip of a switch.
Plus it's a lot harder to roll out a feature to a subset of your users with a deployment; feature flags are what solve that.
There are many open source options out there, like Flagr or Unleash, as well as some paid solutions like LaunchDarkly.
So just make sure you use one.
It's going to simplify your rollbacks.
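To make the feature flag idea concrete, here is a minimal sketch in Python. It is not how Flagr, Unleash, or LaunchDarkly work internally; the in-memory FLAGS store, the flag name, and the function names are all hypothetical, and a real flag service would serve these values over an API. It shows the two things described above: rolling a feature out to a subset of users, and rolling back by flipping a switch.

```python
import hashlib

# Hypothetical in-memory flag store; a real flag service would serve
# these values over an API instead of a module-level dict.
FLAGS = {
    "new-checkout": {"enabled": True, "rollout_percent": 20},
}


def bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket from 0 to 99 for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str) -> bool:
    """Return True if the flag is on for this user; unknown flags are off."""
    flag = FLAGS.get(flag_name)
    if flag is None or not flag["enabled"]:
        return False
    return bucket(flag_name, user_id) < flag["rollout_percent"]


def kill(flag_name: str) -> None:
    """Roll back by flipping the switch; no deployment is involved."""
    FLAGS[flag_name]["enabled"] = False
```

Because the bucket is derived from a hash of the flag name and user ID, the same user gets the same answer on every request, which keeps a partial rollout stable.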
So the other question we get is, how are we going to achieve zero-downtime deployments?
Look, this is something that used to be very hard.
But you get this out of the box by using Kubernetes.
It's just that you need to make sure you're doing a few things right in order to capitalize on it.
So this is going to be a little more technical, but here are some of the things you're going to want to think about if you're using Kubernetes.
First, your applications should try to be as stateless as possible.
Basically, you want to depend on durable backing services instead.
These are backing services that are ideally fully managed, like the services Amazon provides.
Two, all applications need to have health check endpoints that return successfully when everything's OK.
That way the platform knows when things are wrong.
Similarly, they should have a readiness endpoint, especially applications that have a slow startup time.
This way, the scheduler in Kubernetes knows when your application is actually ready to handle real traffic.
This will help you ensure that you will be more stable.
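As a rough illustration of health check and readiness endpoints, here is a sketch using only the Python standard library. The paths /healthz and /readyz, the STATE dict, and port 8080 are assumptions chosen for the example, not something the talk specifies; your platform's probes would be pointed at whatever paths you choose.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Flipped to True once slow startup work (cache warmup, connections) finishes.
STATE = {"ready": False}


def respond(path: str, ready: bool) -> tuple[int, str]:
    """Decide the status code and body for a probe request."""
    if path == "/healthz":  # health check: the process is up and serving
        return 200, "ok"
    if path == "/readyz":  # readiness: safe to receive real traffic
        return (200, "ready") if ready else (503, "starting")
    return 404, "not found"


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = respond(self.path, STATE["ready"])
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())


def serve(port: int = 8080) -> None:
    """Start serving probes; call this from your application's entry point."""
    STATE["ready"] = True  # in a real app, set this only after startup completes
    HTTPServer(("0.0.0.0", port), ProbeHandler).serve_forever()
```

Keeping the decision in a plain function like respond makes the probe behavior easy to unit test without opening a socket.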
So the next thing you're going to want to do is ensure that you have reasonable thresholds set for how many of your services can be offline at any given time.
So basically, when you're doing a rolling update, you want to have a certain tolerance for how many instances of your app you can take down.
So if you're running like 100 instances of your app, you might have a tolerance of 20 of them offline at any given time.
And this is what's going to enable you to roll quickly.
But if you don't have a tolerance set, it's going to roll out one at a time, and it's going to take you until tomorrow to do a deployment.
So when it's set too low, you'll go nowhere on large clusters.
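The arithmetic behind that tolerance can be sketched in a few lines of Python. This is a simplification that assumes every batch takes the same time and ignores any extra surge capacity; the function names and the two-minute figure are made up for illustration.

```python
import math


def rollout_waves(replicas: int, max_unavailable: int) -> int:
    """Number of batches needed to replace every replica once."""
    if replicas < 0 or max_unavailable < 1:
        raise ValueError("replicas must be >= 0 and max_unavailable >= 1")
    return math.ceil(replicas / max_unavailable)


def rollout_minutes(replicas: int, max_unavailable: int, minutes_per_wave: float) -> float:
    """Rough duration, assuming every wave takes the same time to become ready."""
    return rollout_waves(replicas, max_unavailable) * minutes_per_wave


# The example from the talk: 100 instances with a tolerance of 20 offline.
print(rollout_waves(100, 20))  # 5
# With a tolerance of one, the same rollout needs 100 waves.
print(rollout_waves(100, 1))  # 100
# At a hypothetical two minutes per wave: 10 minutes versus 200 minutes.
print(rollout_minutes(100, 20, 2.0), rollout_minutes(100, 1, 2.0))
```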
And lastly, you're going to want to have some disruption budgets set up on your Kubernetes clusters.
And this is what's going to help you stick within your error budget for your company, basically making sure that services are at least at a minimum level of stability during your deployments.
And if a deployment is going poorly, then the deployment stops, and that's solved by error budgets in Kubernetes.
So the next common question we get all the time on customer calls is, how do we handle automatic database migrations?
Well, we get this a lot.
Like I said, it's on top of every customer's mind because it's a pretty friggin common obstacle.
Let's face it.
Database migrations suck.
They easily overload the database with I/O, causing slow queries, and the kingdom comes crashing down as all the queries dogpile on the servers.
But let's talk about some ways you can try and address that.
Here are five considerations that we tell our customers.
First, basically, do fewer of them, and make them smaller.
I know this sounds stupidly obvious, but there's no silver bullet here.
So just reduce the frequency with which you perform migrations, have databases that have less painful migrations, and definitely test them before pushing them out to production.
The next thing: make sure that all of your migrations are idempotent.
You must be able to run them as many times as necessary and expect a predictable outcome.
If they've already been applied once, they shouldn't error when running them again.
You don't want to rely on your deployment strategy to try and fix this; it's just going to complicate things for you.
That's why we always run them as part of our deployment process.
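Here is one way idempotent migrations could look, sketched in Python with the standard library's sqlite3 so it runs anywhere. The table names, the version strings, and the schema_migrations bookkeeping table are assumptions for the example; most migration frameworks keep a similar record of what has been applied, but this is not any particular tool's implementation.

```python
import sqlite3

# Hypothetical migrations, keyed by a unique version string, in the order to apply.
MIGRATIONS = [
    ("001_create_users", "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)"),
    ("002_add_name", "ALTER TABLE users ADD COLUMN name TEXT"),
]


def migrate(conn: sqlite3.Connection) -> list[str]:
    """Apply pending migrations and return the versions applied in this run."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)")
    done = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    applied = []
    for version, statement in MIGRATIONS:
        if version in done:
            continue  # already applied: skip instead of raising an error
        conn.execute(statement)
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        conn.commit()
        applied.append(version)
    return applied


conn = sqlite3.connect(":memory:")
first = migrate(conn)   # applies both migrations
second = migrate(conn)  # applies nothing and does not fail
```

Running migrate on every deployment is then safe, because a second run against an already migrated database is a no-op.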
Then the next thing, in order to be able to support database migrations, is to make sure that adjacent releases of your application are compatible with the schema updates.
This is critical.
Don't assume that you can get atomic updates in a loosely coupled, highly available distributed system.
It's just not practical.
So you're going to be better off not depending on them from the beginning.
Now, once you have these things in place, the way you're going to want to run these migrations is as part of your deployment process.
If you're using Kubernetes what we found is that Kubernetes jobs are ideal for this.
And that's how we do it.
Finally, we recommend that you leverage feature flags to reduce possible disruptions.
Remember what I said earlier about atomic updates?
Well, this is how you work around that: only enable features that depend on the new schema after the new schema has been deployed and after the Kubernetes jobs are successful.
This way it can be automatically done as part of your migration process.
Moving on to the next question.
I realize I forgot some of the questions here.
There we go.
So the next question that we always get asked, and that's on everyone's mind, and very reasonable to be wondering at this point given everything else we've talked about, is, will you need to rewrite your apps?
And the short answer is probably not, actually.
So maybe just a few tweaks.
Now look, Kubernetes can literally deploy everything.
That's why we like it.
However, just because it can doesn't make it right.
The trick is to make as few changes to your apps as possible, so that they will work better together in a standardized way, kind of like an interface.
This is how you'll be able to move quickly and empower teams to deploy their apps without needing interventions.
To accomplish this, you'll want to configure applications in the same way every time, for example, using environment variables for settings.
The next thing is you'll want to expose all of your logs as well.
And to do that, emit them to standard output.
That way you never need to worry about filling up the disk again.
And Kubernetes will automatically capture them for you.
Plus, if you're emitting logs, make sure that they're structured, because that way they're easily filtered later on.
And now, you know, I'm cheating a little bit here.
Everything I'm saying is basically captured in something called the 12 factor pattern.
And I'll talk more about that in a second here.
So if you're planning on deploying applications that weren't originally written for the modern cloud, then there will likely be a few modifications so that everything goes smoother for you.
But don't worry: if you're using a modern programming language or framework like, you know, Rails, or anything in the npm ecosystem, et cetera, these changes shouldn't be hard for you to make, because the constructs are already there and you might get some of them for free out of the box.
That's because the 12 factor pattern is nothing new.
It's been around since at least 2011.
Here are some of the things that we like to recommend to our customers that are based on the 12-factor app pattern.
First, don't depend on local storage; use backing services instead, like a database or object storage buckets.
You want your services to be as stateless as possible, because then you reduce the operational concerns by making scale and disposability easier.
If you depend on SQL, then make sure that you can run automated database migrations, because you'll need to run those at deployment time.
Also consider using feature flags to turn functionality on and off rather than depending on deployments to do the same thing.
Basically, in the old-school model, deployments themselves were the feature flag.
That's what we've got to stop doing.
Then make sure that all your settings can be defined as environment variables.
That's the most universal interface; every language supports it.
And it's the preferred way of configuring apps under Kubernetes and Docker.
But most importantly, never ever hard code any secrets or host names in your code base, because that's going to cause big problems when you try to deploy to other environments or achieve any kind of compliance.
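A minimal sketch of settings defined as environment variables, in Python. The variable names DATABASE_URL, LOG_LEVEL, and WORKER_COUNT and the defaults are assumptions for the example; the point is only that the same code runs unchanged in every environment and no secret or host name lives in the code base.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str
    worker_count: int


def load_settings(env=os.environ) -> Settings:
    """Build settings from environment variables; nothing is hard-coded."""
    try:
        database_url = env["DATABASE_URL"]  # required; may embed a secret
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set in the environment") from None
    return Settings(
        database_url=database_url,
        log_level=env.get("LOG_LEVEL", "INFO"),
        worker_count=int(env.get("WORKER_COUNT", "4")),
    )
```

Failing fast on a missing required variable surfaces a misconfigured environment at startup rather than on the first request.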
On the logging front, your apps should use a structured log format like JSON, because that's easier to index and report on with tools like Fluentd and Kibana.
Also send those logs to standard output rather than to a file.
That way they'll be easily ingested and you don't have to worry about log rotation anymore with platforms like Kubernetes.
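For illustration, structured JSON logs on standard output could be wired up like this with Python's standard logging module. The field names and the convention of passing extra fields under a "fields" key are assumptions made for this sketch, not a standard.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers attach fields via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(name: str, stream=sys.stdout) -> logging.Logger:
    handler = logging.StreamHandler(stream)  # standard output, not a file
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


log = build_logger("orders")
log.info("order created", extra={"fields": {"order_id": 42, "env": "staging"}})
```

Each line is then a self-contained JSON object that a log collector can index by field without custom parsing rules.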
Finally, make sure that you have standard health check endpoints; that'll make your automated recoveries faster.
Now this isn't 12-factor to a T.
This is a practical application of 12-factor that we tell our customers.
All right.
So now, while your developers are starting to kick the tires on their new platform and are busy porting their apps over, you need to start locking things down fast, because things can quickly get out of hand.
You don't want to be that company on the front page.
This is where your platform comes in and where you're going to start focusing on your security and compliance.
Let's briefly take a look at another one of our customer case studies.
Even is a financial services company that allows workers to get paid on demand, budget instantly, and save automatically.
I really dig it.
Basically they're making it easier for people to be wiser about their money.
So that they can stop living paycheck to paycheck.
They're fixing it by building a new kind of financial services company that makes it easier to make ends meet so that people can pay down debt and save money.
Even was a referral to us by one of our partners, Latacora, a hardcore SecOps-for-hire team that works a lot like Cloud Posse but specializes in security.
They referred us to Even because running on Heroku was just not going to cut it anymore for the kind of product they are building.
They're basically building a bank, and a bank doesn't run on Heroku.
So we talked with Evan, their CTO, who reached out to us about getting off Heroku.
We knew we could help.
We worked closely with their CISO, their chief information security officer, who had previously dealt with banking-grade security at Simple Bank.
So let's just say these guys really know what they're doing.
The first thing we did was lay out a secure multi-account AWS foundation that would let them govern their new infrastructure in a more secure way.
This enabled their product teams to provision their software in hardened environments that were strictly isolated from production while having the peace of mind knowing that all the accounts were aligned with centrally established company wide policies.
That is, we used the same Terraform modules to provision all the accounts, but systematically applied the changes to only one account at a time.
This is key to maintain stability.
You'd never want to be able to accidentally apply your changes to all the environments at the same time because that is a bad thing.
That is how you blow your leg off.
Basically it's a little bit more tedious, but it's going to save you a lot in the long run.
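The one-account-at-a-time idea can be sketched as a simple loop in Python. The account names and the apply callback are hypothetical stand-ins for whatever actually applies the change (for example, a Terraform run scoped to a single account); the point is only that a failure stops the rollout before later accounts are touched.

```python
from typing import Callable

# Hypothetical account names, ordered from least to most critical.
ACCOUNTS = ["dev", "staging", "prod"]


def roll_out(accounts: list[str], apply: Callable[[str], bool]) -> list[str]:
    """Apply a change to one account at a time, stopping at the first failure."""
    completed = []
    for account in accounts:
        if not apply(account):
            break  # later accounts are left untouched
        completed.append(account)
    return completed
```

If the change fails in staging, production never sees it, which is the stability benefit described above.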
We followed this by rolling out a well-established platform built on best practices that enabled the automation of governance rules for security, compliance, and operations.
For example, by using Atlantis for continuous delivery of infrastructure code, we ensured that every single change required code reviews with enforced approvals by key stakeholders before getting rolled out.
Since every change was made using this process, there was total transparency and a change log for everything that transpired in their system, which is essential for compliance.
But the best part of all was that almost no one needed direct access to AWS.
This is the key.
Every change went through this Git workflow-driven process instead of giving developers direct access to the low-level AWS API.
So when you do this, you don't have to come up with complex, error-prone IAM policies for every user that are a nightmare to validate and at best a false sense of security.
Instead, you need to get out of developers' way by providing them a battle-tested library of modules, so that they can easily implement everything without being experts, and provide a secure means to review every change they make without standing in their way.
Reviewers see the proposed change sets posted as GitHub comments before they apply anything.
The next thing we did was to roll out the full suite of AWS security products.
Mind you, there are a ton of them today.
First, we deployed the AWS CIS Foundations Benchmark, so that we could easily establish a baseline of how we were doing and report automatically on the levels of compliance.
These benchmarks use a combination of AWS Config rules provided by the experts in the AWS community.
They are deployed in every environment, and then AWS Config continuously monitors them to detect nonconformance.
Then we used AWS Security Hub, which is a dashboard to identify gaps where we could improve.
We also deployed AWS GuardDuty, which performs advanced threat intelligence to identify and prioritize potential threats in near real time.
It works by using machine learning to perform anomaly detection.
It can literally analyze all the millions of events taking place across multiple AWS accounts, then point out things that stand out from the audit logs, like CloudTrail, VPC Flow Logs, and DNS logs.
It can detect unusual API activity, port scanning, and failed login requests, and it can pick up activity from known bad IPs, or activity that originates from unusual locations or anonymizing proxies.
Pretty cool stuff.
The list goes on.
And this is just a small piece of what it can do.
But you can see why this is desperately needed.
I don't think I need to spend too much more time to convince you of the benefits of good security.
They speak for themselves, right?
But here are some of the ones that stand out for startups.
For one, if you're an enterprise SaaS company, by locking things down you can be more confident being transparent with your customers about your security procedures, so that you'll be able to close bigger deals with customers who take that stuff seriously.
In fact, not doing this is a deal breaker for many companies.
For example, many Fortune 500 companies will subject you to some pretty extreme scrutiny by their SecOps teams.
We've been through it.
That's what happened with Gladly, and how we were able to land their first Fortune 500 companies.
Also, you'll gain the peace of mind of knowing that you are in charge of your brand's reputation and have at your fingertips all the controls that you need.
Because when you are outsourcing the sanctity of your brand to some third party you don't know, anything can happen.
And it all too often does.
And finally, you'll know that when it's time for your company to seek PCI and SOC compliance, you will be better positioned to achieve it.
So more than anything else you undertake, security can never be an afterthought.
There are just too many things that can go wrong.
It needs to be baked into the fabric of everything you are doing.
Now I want to cover some of the considerations for establishing your security baseline; basically, how are you going to reduce the attack surface?
One of the best ways is by controlling the blast radius using multiple layers of security.
So we frequently get asked, how do we get started when we don't even know what we don't know?
Look, there's an overwhelming industry built around securing cloud infrastructure.
While the cloud providers themselves go to extreme lengths to secure the physical infrastructure like Fort Knox, it does you no good if what's running on top of it isn't secure.
This is why Amazon emphasizes the shared responsibility model.
Basically, Amazon promises that everything up to the point that you touch it is secure.
You are responsible for the operating system, including all the network security and the configurations.
So first, if you haven't yet reviewed the AWS CIS Foundations Benchmarks, that is a great place to start.
When you follow the CIS benchmarks, it will make compliance down the road more straightforward and less of a mystery.
This is because the benchmarks were established by the Center for Internet Security in collaboration with AWS solution architects and the compliance experts over at Accenture.
They help organizations assess and improve their security in an objective way that's based entirely on industry-accepted best practices.
Look, following the CIS Foundations won't guarantee you things like PCI compliance.
I don't want to misrepresent it.
But they will definitely accelerate your journey to getting there.
The next question that we commonly get asked is, how can companies secure their systems when they have distributed teams?
Well, this is a great question because the trend today is for companies to have more and more remote teams, right?
So even Google had this problem.
They spent years developing what they now call the BeyondCorp security model.
It's a security approach that lets employees work from anywhere quickly and easily.
It ensures that access is only given to individual services as users need them, based on who they are, and what and where they are connecting from.
Now, contrast this with your traditional VPN, which has a binary access model: you're either on the inside of the whole corporate network, with all the access that allows, or on the outside and completely locked out of all the applications.
That sucks.
We're going to circle back to this a couple more times throughout this presentation.
So stay tuned.
So OK, moving on.
Once you get to the point that you have lots of teams using AWS, you'll inevitably encounter new growing pains.
That's why we get asked about how companies can maintain compliance while enabling teams to work autonomously.
We shared a bit of this in the EBITDA com case study earlier.
So how should companies govern their cloud to achieve compliance and not block teams at the same time?
This relates to your organizational complexity.
As your infrastructure scales, it requires more and more teams to maintain it.
The more teams that use it, the faster everything evolves and the more it will go out of control.
For effective collaboration, it's important to delegate the ownership of infrastructure across these teams and empower them to work in parallel, but without conflict.
For this to work, however, we need to have checks and balances in place: basically, trust but verify, to enforce our security baseline.
If we take what we learned from the AWS Well-Architected Framework, it says we need to automate our security.
That means we'll need to have automated mechanisms in place to ensure that we don't deviate from the established policies.
The best approach we've come across is to implement the set of security configuration best practices from the CIS Foundations Benchmark for hardening your AWS accounts.
That way you'll know the moment your infrastructure falls out of compliance, so you can take immediate action.
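As a rough illustration of "trust but verify", here is a minimal Python sketch that compares a snapshot of account settings against a small security baseline and reports every deviation. The rule names and snapshot fields are hypothetical and only loosely inspired by the kinds of checks found in the CIS benchmark; a real setup would pull these values from the cloud provider's APIs on a schedule.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    """One baseline check: a name plus a predicate over the account snapshot."""
    name: str
    check: Callable[[Dict[str, Any]], bool]


# Hypothetical baseline; the snapshot keys are illustrative, not a real API schema.
BASELINE: List[Rule] = [
    Rule("root-account-has-mfa", lambda s: s.get("root_mfa_enabled") is True),
    Rule("no-root-access-keys", lambda s: s.get("root_access_keys", 0) == 0),
    Rule("audit-logging-enabled", lambda s: s.get("audit_logging") is True),
    Rule("password-min-length-14", lambda s: s.get("password_min_length", 0) >= 14),
]


def find_violations(snapshot: Dict[str, Any], rules: List[Rule] = BASELINE) -> List[str]:
    """Return the names of all rules the snapshot fails, in baseline order."""
    return [rule.name for rule in rules if not rule.check(snapshot)]


if __name__ == "__main__":
    snapshot = {
        "root_mfa_enabled": True,
        "root_access_keys": 1,
        "audit_logging": True,
        "password_min_length": 8,
    }
    for name in find_violations(snapshot):
        print(f"out of compliance: {name}")
```

Running a check like this continuously, and alerting on any non-empty result, is one way to find out the moment an account drifts from the baseline.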
So a little earlier we were talking about the BeyondCorp security model, and I want to spend a little bit more time on it so you'll see why it's such an important thing to be aware of.
Look, if you're using a VPN today, that's OK.
It was probably the best decision at the time, but it's scary.
And you need to stop.
The worst part of VPNs is that they generally expose way too much of your corporate network or cloud infrastructure.
Worse, if users are bringing their own devices like phones or laptops, or even taking your standard work-issued laptops home with them, you have very little assurance or peace of mind that malware isn't infecting their devices and isn't going to spread.
Keep in mind, these days these things can spread so fast you can't even respond to them in time.
On top of that, very few VPNs actually use short-lived signed certificates.
That's how you ensure that credentials are frequently rotated, which is an essential best practice.
If you're using long-lived keys for your users, that's a big problem.
The other problem is that even fewer VPNs enforce multi-factor authentication.
You're in for a world of hurt if anyone's credentials are compromised.
By the way, for example, you might have heard that Google just disclosed that any site could exfiltrate cached secrets from the LastPass password manager.
Not good at all.
That's why you need MFA.
Fortunately, over the past few years, the BeyondCorp security model has pioneered a better way that you can use today.
It's one that addresses the weaknesses we talked about in the classic VPN, and it does it like this.
Basically, your connections are still encrypted with TLS and signed by a real certificate authority.
Then every request goes through an authentication proxy, which sends you through to your single sign-on provider, which lets you enforce things like MFA and geofencing.
But most importantly, you only expose one service sitting behind the proxy at a time, not your entire corporate network.
So stay tuned to see how we implement this using open source tools in a little bit.
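To make the contrast with the all-or-nothing VPN concrete, here is a minimal Python sketch of the kind of per-service decision an authentication proxy makes once single sign-on has identified the caller. The service names, groups, and country codes are made up for illustration, and this is not how any particular proxy is implemented.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Request:
    """What the proxy knows about a caller after single sign-on."""
    user: str
    groups: FrozenSet[str]
    mfa_passed: bool
    country: str
    service: str


@dataclass(frozen=True)
class ServicePolicy:
    """Per-service rule: who may reach it, and from where."""
    allowed_groups: FrozenSet[str]
    allowed_countries: FrozenSet[str]


# Hypothetical policies keyed by service name.
POLICIES = {
    "grafana": ServicePolicy(frozenset({"engineering"}), frozenset({"US", "CA"})),
    "billing-admin": ServicePolicy(frozenset({"finance"}), frozenset({"US"})),
}


def is_allowed(req: Request) -> bool:
    """Allow only if the service is known, MFA passed, and group and location match."""
    policy = POLICIES.get(req.service)
    if policy is None:
        return False  # default deny: unknown services are never exposed
    if not req.mfa_passed:
        return False
    if req.country not in policy.allowed_countries:
        return False
    return bool(req.groups & policy.allowed_groups)
```

Note that access to one service says nothing about access to any other: an engineer who can reach the hypothetical `grafana` service is still denied `billing-admin`.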
So, step four: your site reliability engineering process.
This is what's going to enable your team to sleep well at night and kick ass during the day.
Basically, we want your team to operate at maximum efficiency and keep your sites highly available at the same time.
The objective here is that you need to be able to see everything that's going on in the system.
Otherwise, it's like you're driving blind.
With all these layers of abstraction and indirection that we get these days, it gets harder and harder to track down problems.
The good thing is that the tools to do this have caught up.
So you're not in trouble.
Here are the clear benefits.
Basically, when you operate this way, you will be able to move faster by reducing the cost of failures.
Because you're doing more and more things in an automated fashion, you'll be able to release much faster.
You'll be able to gain insights much faster and resolve problems sooner, so you can spend a little bit more time shipping features.
This way, you'll meet your SLAs with your customers.
You'll align the objectives of the business with engineering.
And you'll know exactly how much your applications cost, so that you can optimize them.
You'll also have more stable operating environments with increased visibility.
So, top questions that we get asked: what should our release engineering process look like?
Well, that's easy.
The Google SRE handbook is the holy grail for SREs; it's published by O'Reilly and available to download for free right now as a PDF.
So go do it.
There's no excuse for not checking it out.
One of the key takeaways is that your job is to provide product development teams with a platform of SRE-validated infrastructure upon which they can build their systems.
You need to lead the system architecture and show them how inter-service dependencies will work.
You must provide the guidelines for how to do instrumentation, metrics, and monitoring.
These are ultimately what enable us to operate a stable product.
You need to be able to demonstrably meet your SLAs for customers, because your developers are committed to your service level objectives.
It's unfair to hold your developers or your teams responsible for software if there's no KPI for how to operate it.
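As a small worked example of what a measurable KPI can look like, the Python sketch below computes availability from request counts and compares it to a service level objective. The request-based definition of availability and the 99.9% target are assumptions chosen for illustration; your own SLOs may be defined differently.

```python
def availability(total_requests: int, failed_requests: int) -> float:
    """Fraction of requests that succeeded over the measurement window."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")
    return (total_requests - failed_requests) / total_requests


def meets_slo(total_requests: int, failed_requests: int, slo_target: float) -> bool:
    """True when measured availability is at or above the SLO target."""
    return availability(total_requests, failed_requests) >= slo_target


if __name__ == "__main__":
    # One million requests with 1,000 failures is exactly 99.9% available.
    print(meets_slo(1_000_000, 1_000, slo_target=0.999))
    # One more failure and the same target is missed.
    print(meets_slo(1_000_000, 1_001, slo_target=0.999))
```

With a number like this in hand, a team knows what it is being held responsible for and can tell at a glance whether it is on track.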
This is what's going to enable your teams to prioritize and respond to the emergencies that matter.
In a large system, there's always going to be some little cancers developing somewhere.
The key is you've got to knock out the ones that are most problematic for your systems.
It's also how you're ultimately going to be handling your capacity planning, so you can fully forecast your cloud spend and then optimize your performance and efficiency.
So you gain greater availability.
So a couple of pro tips here: what are some of the secrets for writing alerts?
The key thing you've got to know is that alerts are for humans.
That means that we need to make them easily understood and actionable.
The way you best achieve this is using Prometheus Operator, and that's what we use.
The other thing is that your alerts must never spam, because if your alerts are filling up inboxes, the developers or teams responsible for acting on them are just going to tune them out.
We all do this.
I mean, we're guilty of this as well.
The only way to avoid it is to keep the number of alerts down and address them as soon as they come up, so that they don't happen again.
The next thing is that your alerts should be declarative.
That is, you describe what you want to monitor, without needing to know every step required to get those monitors working.
This is what Prometheus Operator does exceptionally well.
And the last thing is that you're going to want to make sure that your alerts are deployed with your applications.
In the old days, and we're talking about the really old days here, like Nagios, your developers would build the app and your ops team would deploy it.
Then somebody else would set up the Nagios alerts: once the application was deployed, you had to go into the Nagios server and configure them.
No, you don't do that anymore.
Instead, your alerts are all configured with your applications themselves, using alert rules that get deployed along with them.
Not only that, you can deploy your dashboards alongside your apps at the same time.
This makes sure that your monitoring is defined as code as well.
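Once alerts live alongside the application as code, you can lint them like any other code. The Python sketch below checks that an alert rule is actionable for the human who receives it. The rule is represented as a plain dictionary using the standard Prometheus alerting rule fields (`alert`, `expr`, `for`, `labels`, `annotations`). Requiring a `severity` label plus `summary` and `runbook_url` annotations is an assumed team convention, not something Prometheus enforces, and the metric names are made up.

```python
from typing import Any, Dict, List

# Assumed team convention: every alert says how urgent it is and where to look.
REQUIRED_LABELS = ("severity",)
REQUIRED_ANNOTATIONS = ("summary", "runbook_url")


def lint_alert(rule: Dict[str, Any]) -> List[str]:
    """Return a list of problems that would make this alert hard to act on."""
    problems: List[str] = []
    name = rule.get("alert", "<unnamed>")
    if not rule.get("expr"):
        problems.append(f"{name}: missing expr")
    for key in REQUIRED_LABELS:
        if not rule.get("labels", {}).get(key):
            problems.append(f"{name}: missing label {key}")
    for key in REQUIRED_ANNOTATIONS:
        if not rule.get("annotations", {}).get(key):
            problems.append(f"{name}: missing annotation {key}")
    return problems


if __name__ == "__main__":
    rule = {
        "alert": "DiskAlmostFull",
        "expr": "disk_free_bytes < 1e9",  # hypothetical metric name
        "for": "10m",
        "labels": {},
        "annotations": {"summary": "Disk almost full"},
    }
    for problem in lint_alert(rule):
        print(problem)
```

A check like this can run in the same pull request that changes the application, so an alert without a runbook never reaches the on-call engineer.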
So now we're going to cover how you're going to support this system.
I mean, at this point, you have a solid foundation with all the tools and the processes your organization needs to grow.
Now we want to help make sure that your teams can run it.
It helps if you have a dedicated DevOps team today, but it's far from a requirement.
In fact, you'll still want to work closely with and involve all of your development teams in this process, so that you can move faster.
The first step is to document everything and support the teams who are going to be operating the platform.
To ensure that your team is prepared, your SRE team needs to take ownership of training the team, which includes supplying all the documentation to support the platform services.
When you do this, you'll get the knowledge transfer baked into the fabric of your operations.
You'll have the ability to cross-train your team and ensure a human level of business continuity.
Look, in this job market, people come and go, or people need to move for family reasons.
Your business needs to be able to support that.
Plus, it's going to lead to happier, more engaged teams who get what's going on and why it matters.
With this, you also get improved communication and collaboration between ops and developers.
No more silos.
This will also lead to greater professional development for both teams, or all your teams.
Basically, your operations teams get more involved in the development process, and your developers get a greater appreciation for the operations process.
And that turns into goals for your business.
So once you've built everything, it's important that you consider how you can support this system.
We need to make sure that your team will be able to handle and respond to all the unknown circumstances that will inevitably come up.
Our experience is that if it's not written down, you don't own it.
You will lose it: all that tacit knowledge disappears in an instant when people come and go.
In this competitive job market, that can happen before you know it, and it sucks.
On top of that, almost nobody likes to write documentation, and it's always out of date, yet everyone complains when it doesn't exist.
Let's face it.
It's not surprising that companies struggle to invest in it when it takes two times longer to write the explanation of how something works, because good explaining is just hard to do.
So first, let's discuss the things that you need to be concerned about when writing your documentation.
Well, we found it is pretty difficult to come up with an information architecture when there are so many moving pieces.
In an ideal world, users can easily navigate your documentation to find what they're looking for.
But what they are looking for often depends on what they're trying to solve: basically, how are they approaching the problem?
So good documentation must be properly tagged, must have a hierarchy, and absolutely must be searchable.
The other thing that we noticed is how the architecture of your documentation will naturally need to evolve as you're doing more things.
It basically becomes a full-time job if you want to become really good at it.
We noticed this in our own documentation.
What we also noticed is that our links were always getting broken, because you're linking out to information across the web, and the web is constantly evolving and outside of your control or ours.
So you need some kind of linting process on your documentation, so you quickly detect when links go bad, because the sooner you detect that, the faster you'll be able to find the replacement documentation.
Nothing is worse than being on call, getting a page in the middle of the night, and not being able to click on the link, see your documentation, and find out how to fix the problem.
You don't want that.
So make sure you document everything by design, but also make sure that you document your design decisions.
Basically, six months from now, you're going to be scratching your head wondering, wait, why did we do this again, when this other way seems so much more logical?
Well, chances are you had a really good reason for doing it the way you did it in the beginning.
It's good to keep a paper trail for that.
Plus, it's going to help other people who join your team from other companies, with other experiences, understand the fabric of where you're coming from.
Also, it's critical to keep a log of your postmortems: basically, the root cause analysis of everything that goes wrong.
Ask the five whys when something goes wrong in order to get to the bottom of it.
Look, there are always these cascading reasons why things happen.
Sometimes the why can just be that, well, we hadn't considered this possibility.
It can be human error, which it almost always is.
Other times the why is problems in the software product or bugs that come out.
So make sure your postmortems are very verbose.
Take screenshots of everything that went wrong and publish those in a centralized place.
This becomes your library of playbooks to consult when things go wrong.
Similarly, you're going to want to keep track of all the remediations in the form of runbooks.
Your future self is going to thank you, because what was crystal clear today after spending five hours trying to figure it out won't be crystal clear six months from now.
So document all your remediations and put those in the same place you have your documentation.
So how should we manage our documentation and keep it up to date?
And the other question is, how are you best going to support your teams and the platform that you built?
Well, the way you're going to end up training your team is by baking this whole process into your development process.
Basically, your team won't have time to train everyone all the time if they're responsible for everything.
That's why you distribute this responsibility over time across your teams.
So basically, your ops teams, your SRE teams, are very much responsible for writing good documentation and providing the tools and the platform for you.
But those tools and the platform need to be well documented.
Then what you're going to be doing is spending a lot of time pairing with developers as you onboard them.
If you have a rotation of on-call engineers, you're always going to want to pair another developer with the person on call, and then over time grow their confidence in addressing these things.
The thing is that if you don't have the foundation of all the infrastructure, if you don't have good diagrams, if you're not documenting your design decisions, all of this stuff is going to feel much, much more complicated than it actually needs to be.
So, tip here: Markdown documentation for the win.
There are a lot of ways you can write documentation.
But please, just don't use Google Drive.
We've seen that before.
Look, Markdown affords so many great opportunities.
For one, you can use it with the GitHub that you're already using.
Your documentation gets peer reviewed, goes through your code review process, and involves others in the process, which makes sure that you catch things that are unclear sooner, because the documentation needs to be understood by others.
This is the coder's equivalent of having an editor review their work.
You get better content.
Now, since Markdown is structured, it can be treated as code as well.
And this is really cool because it affords your documentation the same level of validation as your software.
As your documentation grows, it will become unwieldy; automating it is just the sane way to handle it.
What we use is a bunch of linting to make sure that our links are valid.
We also publish it in a static form to an S3 bucket for absolute reliability, because the last thing you want is your documentation deployed inside of your cluster when your cluster is having trouble.
These things happen, no matter how well built things are.
These are living systems.
So make sure the documentation is highly available, maybe even on a different cloud.
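To give a feel for what treating documentation as code can mean in practice, here is a minimal Python sketch of a link linter for Markdown files. It is not the tooling described in this talk, just an illustration using the standard library. It only verifies relative links to other files in the same repository; checking external URLs would need network requests and is deliberately left out.

```python
import re
from pathlib import Path
from typing import List

# Matches inline Markdown links of the form [text](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets that start with a URL scheme such as https: or mailto:.
SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def extract_links(markdown: str) -> List[str]:
    """Return every inline link target found in the Markdown text."""
    return LINK_RE.findall(markdown)


def broken_relative_links(doc_path: Path) -> List[str]:
    """Return relative link targets in doc_path that point at missing files."""
    text = doc_path.read_text(encoding="utf-8")
    broken: List[str] = []
    for target in extract_links(text):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # external URLs and in-page anchors are out of scope here
        path_part = target.split("#", 1)[0]
        if not (doc_path.parent / path_part).exists():
            broken.append(target)
    return broken


if __name__ == "__main__":
    for md_file in sorted(Path(".").rglob("*.md")):
        for target in broken_relative_links(md_file):
            print(f"{md_file}: broken link -> {target}")
```

Run as a step in code review, a check like this catches a renamed or deleted runbook before anyone gets paged and clicks a dead link.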
So I want you to see now that if you introduce DevOps into your organization, it's going to enable your business to move faster with greater confidence.
This is what's going to allow you to build a company that keeps up with how fast you are moving.
When you own your infrastructure and build it to serve you, that's how you achieve scale.
So I hope after all of this that you can see that spinning up a small little POC is a far cry from operationalizing it all the way into production.
There's a lot that can go wrong.
And when you haven't done it before, there's a lot that needs to happen in order to get it right.
So don't push it off until next quarter.
You're just going to be that much farther behind than you want to be.
We don't want that for you.
So hire someone instead that has a track record of success in delivering the results that you care about.
This is what's going to de-risk this capital investment and minimize the opportunity cost to your business.
Your CFO is going to dig it.
So how confident do you feel right now about achieving your goal after seeing all this?
I want you right now to promise me you're going to say no to duct tape.
You have two options.
You can do this yourself.
However, it could take you an awfully long time to figure out what you need to build, which could also make this very expensive, since we can both appreciate that the cost of human labor and missed opportunities is often the highest.
You'll need to get buy-in from the business to get the budget.
This is going to take a seriously convincing argument, one that drives home with confidence that you will be able to implement the solution you propose, and that by implementing it you will provide an even greater opportunity to the business than is available today.
You could struggle figuring out all the tools and how they fit together.
Documentation of open source tools is poor, and depending on your team's level of experience, they could be spinning their wheels on it for quite some time without much progress.
Then what happens is all this stuff ends up feeling a lot harder than it needs to be.
And we all know that perception then becomes reality.
You could then struggle with the release engineering process.
For example, you not only need to get all your apps updated, but you also need to define what that release process is going to look like, and the wrong steps combined with the wrong tools will set you even further back.
The right strategy, on the other hand, the one that we'll show you, is deceptively simple.
And that's what we want you to experience.
Now, we don't want you to struggle trying to figure out what combination of tools will work together.
This is what will lead to a few false starts, and each false start is going to set you back weeks or even months further.
It's extremely challenging to line up all the pieces, especially when you have lots of plates spinning and have never done this before.
There's a good chance you will either fail or run out of money or budget without much to show for it.
Many companies have succeeded.
Don't get me wrong.
However, do not underestimate the scope of work.
Well-intentioned but less experienced developers or freelancers might say they can do it faster.
But what they probably mean is that they can quickly implement some small piece of what you actually want, with a blatant disregard for the bigger picture and the long-term maintenance implications.
We've spent years developing our project plan: hundreds of hours just writing stupid little Trello tasks with well-documented steps and screenshots, and thousands of hours on documentation.
Look, no one likes to write it.
But we have to do it.
It's your call.
So just to recap these things, right?
You may not have what it takes.
You're going to encounter lots of false starts.
It's just the nature of engineering.
You may lack the in-house expertise or skill sets to do it right.
And you may entirely fail to launch and go over budget.
Plus, if you don't show early success, you're just going to get derailed and more frequently interrupted.
So your alternative here is to partner with us here at Cloud Posse.
We invest in your success.
This is how it works.
When you hire Cloud Posse, we become your rocket boosters.
We propel you into the cloud with all the right tools and the highest rate of success out there.
When you reach space, we detach and let you go on your way.
Our job is done.
But of course, since we're invested in your success, we don't expect you to pick it up overnight.
We show you the ropes and stay invested for as long as you need our help.
Remember, when we got started with Gladly, they had almost nothing in place on AWS.
They had a touch of Docker and a splash of ECS, but that was it.
They were wickedly smart and knew they desperately needed to practice DevOps, but just didn't know where to begin.
They had no Terraform experience, no Kubernetes experience, and certainly no Helm experience.
So we worked with them to build out the platform and the tooling, and established the system upon which their teams could operate efficiently and securely.
We used a combination of tools like kops, Helm, and Terraform to build a platform to run their services, and Datadog to monitor everything and set up alerts so that serious events were escalated to the on-call engineers.
We used Sumo Logic to ingest the logs from all the servers and the containers, so we could report on them, and that ultimately made things easier for the developers.
And then we created a library of reusable Terraform code that instilled the best practices for deploying infrastructure.
As a result of all of this, the number of services they deployed has grown tremendously.
And as a result of this, they were able to sign the major Fortune 500 companies that they needed in order to prove their business.
So today, teams build and operate their own services and respond to incidents related to them.
They can function as a lean, mean operating machine.
On-call duties rotate across teams, so no one is on call all the time, which leads to better quality of life.
And then remember, they've gone on to raise a cool 113 million from top-tier VCs, so that's pretty sweet.
Then recall how we helped PeerStreet, who were big-time users of Heroku, get onto AWS.
They were constrained by operating on Heroku and had pushed it to the limit.
They just couldn't continue to grow their company on a platform like that, with the number of services they were developing, as the number of employees grew and the scale of the data they had to crunch grew as well.
At the same time, there were a lot of things they loved about Heroku.
Don't get me wrong.
They didn't want to give those up.
So what we did instead was take the most important features of Heroku and replicate those on Kubernetes, namely Heroku Review Apps, which basically let them bring up short-lived environments for any branch or pull request.
Now, this was cool because instead of one solitary staging environment that's frequently unstable, you now have essentially unlimited staging environments spun up on demand and disposed of when done.
So it's very cost effective and efficient for QE.
That's why we delivered the ability to do the same on Kubernetes.
We used a combination of Helm and Helmfile to deploy the services along with their backing services on demand.
We used Terraform to provision all the foundational infrastructure, like VPCs and RDS databases, across multiple AWS accounts that were segmented by stage, like staging and production.
Then we used kops to spin up the Kubernetes clusters on AWS.
Look, it was all really sweet stuff.
They've gone on to raise $50 million and we couldn't be more excited about their potential.
And look, all of those tools, all the Terraform modules, are all on our GitHub.
So you don't have to invest in it.
Then there was even NBA.com.
They too were operating on Heroku.
They were running behind on their migration to AWS after having trouble in-house and lacking the skill sets.
So they hired us to pick up the speed and get them back on track.
We used our vast open source library of Terraform modules to provision everything that they needed with infrastructure as code.
But since they were also building banking-grade financial services, they needed to know that their platform would be rock solid from the beginning.
This is why we worked closely with our chief information security officer to roll out the full suite of AWS security offerings.
In the process, we established the baseline of security, which conformed to the AWS CIS benchmarks.
When we finished, they picked up from where we left off to build out their own DevOps team that continues to iterate on the platform we built.
Again, we're talking to you, the serious startup with a smart team who knows how to execute.
You just need some help getting off the ground.
We know, we get that you like to build things.
We're developers too.
But you are too busy solving the business problems that only you can solve.
We want to make sure that you can focus on your product roadmap, because we know you have a lot riding on it.
That's why we'll help you by building out your infrastructure and leveling up your team in the process, so they can operate it.
You can always staff up later.
But do it for the right reasons, on your time.
It's a big win for your business and an even bigger win for your team.
They will love you for it.
So in almost no time, we want you to achieve these benefits.
We want you to cut months or even years off of your DevOps journey.
Reduce the time spent on R&D and the wheel spinning you'll face.
We want you to eliminate all that guesswork and instead have you start benefiting today, not next year.
We're talking about near-immediate gratification to get the benefits and the outcomes that you want.
You'll quickly be able to validate the impact and ROI to your business.
You'll achieve your goals by moving faster and owning your infrastructure, usually after only a few months.
You'll remain extremely capital efficient, because we're easily cheaper than doing it any other way, especially the wrong way.
Then hire the right people at the right time.
Build an incredible platform that increases the value of your business and delights your developers.
Build an incredible software business on top of the platform that you built.
All I can say is, sounds good to me.
My guess is if you found us it's probably by way of our GitHub, which has hundreds of original open source projects geared towards DevOps and infrastructure automation.
Or maybe you're part of our slack community, which has grown by thousands of members in just the past year.
Now we're about to show you what we're all about.
First, as a DevOps accelerator, we've covered who this is for, what we do, and what we don't do.
So let's take a moment and look at exactly how you can benefit.
I want to give you a taste of what our solution packs, with real screenshots, so you know what to expect.
One thing to note everything we're going to show you here today is available today for free on our GitHub.
So make sure you check it out because we stand by the quality of our work.
At the end, we'll wrap up with exactly how you can get started.
If you want to learn more.
So we've done a lot of talking so far on everything that you'll need and want.
But what does this actually look like when you get it all working.
If you've come this far, I'm going to assume you're probably pretty technical or have an engineering background.
That's why you're going to appreciate what I'm about to show you.
No more handwaving.
We're going to pull back the curtains and check out exactly what we can give you today.
All right, here's our platform.
My guess is it's exactly what you've been looking for.
Let's just say this is what will give your business the superpowers it desperately needs to move at the speed that you're going.
Our platform packs a big punch.
We've done the hard part of integrating all the open source components so that they run on your Kubernetes platform and work with your services.
It was not easy stuff to figure out, and it took us a long time to get it right.
Here are some of the highlights that I want to point out.
Basically unlimited staging environments for every pull request or branch.
A stable release process for production with rollbacks.
Best-in-class monitoring with Prometheus Operator, Grafana, Sentry, and Elasticsearch.
Easy application discovery using our portal.
Full single sign-on integration using Keycloak and G Suite.
On top of all that, automatic TLS and DNS everywhere.
Fully audited SSH sessions.
Support for the Istio service mesh with Jaeger and telemetry.
And total visibility into your services, what they consume in Kubernetes, and what that's going to cost you.
That's a mouthful, right?
First, we lay out your account architecture and get it tightly integrated with your single sign-on provider, like Okta or G Suite.
We find this architecture works really well for most startups.
It consists of seven or more AWS accounts with various purposes.
Staging is for your dev or QE processes; this is where we run our unlimited staging environments.
Dev is an experimental sandbox where your developers can do anything they want.
Basically, you give them admin there, and they can work without fear of repercussions.
Then corp is where you run all your internal shared services.
These are things that help you operate your business, like CI/CD servers, where you'd maybe deploy your Jenkins.
Then you have your audit account for aggregating all of your CloudTrail audit logs, and your production account, where all your production infrastructure runs, like customer-facing stuff.
And last but not least is the root-level account, which is the top-level account in your organization that delegates access to all the rest.
Now, we divide it up this way because it ensures that changes to one account can never really affect any other account.
And we do this with a strict enforcement policy.
It's also great from a PCI perspective because certain environments can be left out of scope.
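The account layout just described can be written down as data, which makes questions like "which accounts could sit outside PCI scope?" easy to answer. The Python sketch below lists only the six accounts named here. The short account names and the `pci_in_scope` flags are placeholder assumptions for illustration, not a statement about how any real assessment would scope them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Account:
    name: str
    purpose: str
    pci_in_scope: bool  # placeholder flag, assumed for illustration only


ACCOUNTS: List[Account] = [
    Account("root", "top-level account that delegates access to the rest", False),
    Account("prod", "production, customer-facing infrastructure", True),
    Account("staging", "dev/QE processes and on-demand staging environments", False),
    Account("dev", "experimental sandbox with admin access for developers", False),
    Account("corp", "internal shared services such as CI/CD", False),
    Account("audit", "aggregated audit logs", False),
]


def out_of_scope(accounts: List[Account]) -> List[str]:
    """Names of accounts flagged as outside PCI scope, in declaration order."""
    return [account.name for account in accounts if not account.pci_in_scope]


if __name__ == "__main__":
    print(out_of_scope(ACCOUNTS))
```

Because each account is a separate boundary, a list like this doubles as a map of blast radius: a mistake in `dev` stays in `dev`.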
We're able to rapidly rule all of this out to you by tapping into our massive library of battle tested infrastructure code available today on our GitHub totally for free.
We are unlike other companies out there because for us, it's more than a job.
It's our calling.
This is why we built this DevOps accelerator program, and why we did it the only way we know how: open source everything we do and iterate on it to perfection.
It's the only way to move faster and make fewer mistakes.
Question any other company that doesn't do the same.
How can they ensure that you know what you're getting?
And it's also the only way we're able to maintain over 300 projects on our GitHub that are actively used by hundreds if not thousands, of people.
We are passionate about the problems we solve and it shows in our work.
Go check it out.
Part of our secret sauce is our not-so-secret cloud automation shell that we call Geodesic.
It's available today on our GitHub along with everything else we talk about.
We've mentioned earlier about how implementing DevOps requires some new tools.
Well, we believe these tools belong in a container and not on your laptop.
This is what lets us version every tool and maintain stable operations.
We stick all of that inside of this container and it lets us then rapidly onboard new people in minutes rather than hours or days.
And we can take this container and stick it in something like Atlantis, which is the CI/CD or continuous delivery tool for Terraform and then deploy infrastructure with the same set of tools that we could run locally inside of that shell.
Now this container will run all those commands for us.
And it will run anywhere containers run.
The secret to making infrastructure automation easier and more accessible to everyone on your team is to introduce GitOps.
With GitOps, the only barrier to entry is opening a pull request, which every member of your software development team, I hope, does with regularity.
Plus, it virtually eliminates the need to give anyone direct access to AWS, except for maybe your dev account, since all operations are performed using pull requests going into GitHub.
GitHub access is all that they need.
This means you have a solid change control process, whereby Git is your system of record, and built-in accountability, because every change is peer reviewed and only applied once that PR is approved.
We use a tool called Atlantis that we deploy using Terraform as an ECS task on Fargate.
Here's what this looks like in practice.
First, the developer opens up a pull request.
This kicks off an auto plan, which basically figures out what's going to change.
The plan is then posted back to the pull request as a comment that we show here.
This thing gives the team visibility into the proposed changes and a chance to do a code review.
It gives the assurance that the PR does what it actually says it does right.
Next, the PR is approved by members of the team.
Now here's a quick tip: using the GitHub CODEOWNERS file, we can require that specific members of your team, like subject matter experts, review changes.
Then, upon approval, those changes are applied, and the outcome of that terraform apply is posted back to the pull request as a comment, as you see here in the slides.
Finally, we merge that pull request once it completes.
The beauty here is that we avoid the risk of configuration drift.
We don't merge the PR until it represents the desired state of the infrastructure.
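The pull request flow just described (plan, approve, apply, then merge) can be sketched as a small state machine. This is only an illustration of the ordering in Python; the state and action names are made up for the example and are not Atlantis's API.

```python
from enum import Enum, auto


class State(Enum):
    OPENED = auto()
    PLANNED = auto()
    APPROVED = auto()
    APPLIED = auto()
    MERGED = auto()


# Allowed transitions: each action is only valid from one state,
# so a pull request cannot be merged before its plan is applied.
TRANSITIONS = {
    (State.OPENED, "plan"): State.PLANNED,
    (State.PLANNED, "approve"): State.APPROVED,
    (State.APPROVED, "apply"): State.APPLIED,
    (State.APPLIED, "merge"): State.MERGED,
}


def advance(state, action):
    """Return the next state, or raise if the action is out of order."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"cannot {action!r} a pull request in state {state.name}"
        ) from None


def run(actions, state=State.OPENED):
    """Apply a sequence of actions to a freshly opened pull request."""
    for action in actions:
        state = advance(state, action)
    return state
```

Calling `run(["plan", "approve", "apply", "merge"])` walks the whole flow, while `run(["plan", "merge"])` raises a `ValueError`, which mirrors the rule that nothing is merged until it has been applied.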
And that's actually enforced using GitHub branch protections.
Our release engineering process is probably one of the sweetest out there.
Yes, there are a lot of demos out there of how others do it.
But no one packs it all in just like we do.
We're talking about parallelized steps, declarative pipelines as code, approval steps with access control, Slack notifications, deployment histories, automatic rollbacks, and unlimited staging or demo environments.
Wait until you see all of this in action.
You'll realize how much you've been missing out.
That's at least what all of our customers tell us once they see it in action.
Our release engineering process covers both the QA side of automation and the production deployments.
On the QA side, we enable our customers to deploy an unlimited number of environments across any number of clusters.
No more stepping on each other's toes.
These environments can then depend on multiple backing services that are brought up on the fly.
Of course, we support automatic database migrations which are critical to any modern application framework.
We can run all of your Docker Compose integration tests in parallel to speed up your builds, and those short-lived environments we were talking about are automatically destroyed when the pull requests are closed or merged, to save you money and free up resources.
Now, over the past few years, we've developed a beautiful release engineering process using Codefresh as our CI/CD platform.
Codefresh is a great option for Kubernetes because it was built from the ground up to support Kubernetes.
We leverage a combination of parallel pipeline steps for performance and approval steps that require certain stakeholders to approve a release, which puts the control back in your hands.
There are Slack notifications to cut down on the email spam, and easy one-click rollbacks, so you have recourse if things go wrong.
There's GitHub status API integration, so basically pull requests communicate their status through GitHub, and GitHub comments create deployment histories.
So you always know where a specific commit has been deployed.
Remember, you're running multiple clusters multiple environments.
You need a way to track all these changes, and we prefer using Git as the ideal system of record.
So for example, here's what happens with a pull request.
We rapidly bring up a disposable staging environment for any pull request.
And we can do this conditionally, only for specific pull requests that have a label.
So if you don't want to do it for everything that's OK too.
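As a rough sketch of that decision, assuming a hypothetical label called "deploy" and made-up state names, the logic for what to do with a pull request's staging environment might look like this in Python. It also folds in the earlier point that environments are destroyed when pull requests are closed or merged.

```python
def environment_action(pr_state, labels, required_label="deploy"):
    """Decide what to do with a pull request's staging environment.

    pr_state: "open", "closed", or "merged" (illustrative values).
    labels: the labels currently on the pull request.
    required_label: hypothetical opt-in label; pass None to deploy
        an environment for every open pull request.
    """
    if pr_state in ("closed", "merged"):
        # Short-lived environments are torn down to free up resources.
        return "destroy"
    if required_label is None or required_label in labels:
        return "deploy"
    return "skip"
```

With the default label, `environment_action("open", ["bug"])` skips the environment, and passing `required_label=None` turns it on for everything.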
Then we update the pull request with a status link to the environment, which is what you see here on the left.
That's then followed by notifications to your Slack team to go review it, and then GitHub comments on the commit hash to show when and where that was deployed.
This is friggin' solid.
I haven't seen all of this in any other release engineering process.
We use approval steps to control the flow of deployments.
For example, you might let anyone kick off a deployment.
But require special approvals before running database migrations in order to avoid costly disruptions during peak hours.
Approval steps work well with this, especially when you're also using feature flags.
OK, so once we have all of this stuff running on the platform, under your accounts and in your GitHub repositories, we start locking it down.
Aside from all the obvious things like security groups, VPCs, private subnets, and IAM roles, there are a lot of less obvious things, and we're going to show you those right now.
So we take security seriously, because we know you do too.
That's why we've implemented so many layers of security.
Just to give you a taste, here are a few highlights.
We use temporary credentials everywhere possible.
We provide remote access using tightly controlled single sign-on, which enforces multi-factor authentication and even supports geofencing, depending on your single sign-on provider.
Our SSH solution uses best-of-breed, enterprise-grade SSH with full session transcripts and YouTube-style playback.
This is only possible with Teleport.
There is nothing else like it.
Our infrastructure changes are then stored automatically in GitHub for your auditing.
So here is how your users will securely access internal services when they are on the road, like your sales teams, or working remotely.
BeyondCorp is the security model we've been talking a lot about, and it was pioneered by Google.
One of the core concepts is to kill the VPN and instead use identity-aware proxies.
This is because the best policy is always zero trust.
All that really means is using a single sign-on proxy combined with HTTPS, so that you instead create what are essentially point-to-point tunnels between your applications and your users.
This makes it easier to restrict access to the apps.
And it enforces whatever single sign-on policy you're already using for your organization, including multi-factor if you have that in place, and I really hope you do.
Everything is still encrypted.
But you still give users the least level of access required.
Now, contrast that with VPNs, which give away way too much access to internal systems, and many don't even support MFA or have the capability of doing so.
If your users are bringing their own devices, you're exposing your networks to whatever malware they have running.
And that's a pretty scary thought.
So we implement our identity-aware proxies using Keycloak by Red Hat.
Keycloak works great with Okta, G Suite, Duo, and any other IdP that you're using.
So you can provide a formidable gateway to your apps that lets users access them remotely.
This is key for running remote, distributed teams.
This is also great for sales teams who are on the road doing demos with customers and need remote access without VPN.
Many corporate networks block VPN access but not HTTPS.
I want to point that out again: many corporate networks that your sales teams might be visiting block VPN access, because that's like rogue access.
Instead, use HTTPS and you're going to get better quality of service.
We deploy an application portal that showcases all the apps deployed inside of the cluster or clusters.
And this makes application discovery easier for your users.
And that's what you see here.
Then, for SSH, we only deploy Teleport.
Teleport is the best enterprise-grade SSH solution out there.
There is nothing else that compares.
Netflix's BLESS is great, but it's still child's play by comparison to the Teleport architecture.
Don't let anyone fool you by telling you otherwise.
When you get to see Teleport in action, you'll immediately see why it's better and what the differentiators are.
Everything else is child's play by comparison.
The fact is, most companies do SSH wrong.
They may implement a bastion which is good.
But then they use shared accounts.
They may use key-based authentication, which makes key revocation and rotation difficult.
Plus it encourages the bad practice of sharing or reusing keys.
They may not even integrate with SAML or single sign-on, complicating account management.
You always want to make sure you centralize your account management.
They may not require MFA.
This is a gaping vulnerability for something as powerful as SSH.
Just Google the ShapeShift hack and you'll see what I mean.
There's a great talk on it by Netflix.
So this is all why we use Teleport.
This way, we have fully encrypted session logs of everything that transpires on the terminal.
We get YouTube-style playback of those sessions.
The playback you see here is actually a recording of the playback you would see inside of their UI.
This is not made up.
This is the real deal.
It literally looks like this recording.
Team members can join other sessions.
This is great for pairing or triage, and of course it all works with single sign-on like G Suite or Okta or any other SAML provider that can authenticate your users.
It well beats out Netflix's BLESS and any other homegrown solution you can come up with.
Oh, and Teleport is also open source.
Plus, they offer paid support plans and startup pricing.
When it comes to this, we don't mess around, and you shouldn't either.
So with all of this stuff up and running on your platform, you need to now know what's going on.
Our SRE process is what gives you that: all the essential tools to provide the visibility into your systems that you desperately need.
This is central to how you build stable systems.
So your team sleeps well at night and kicks ass during the day.
So look, I admit SRE is as much cultural as it is technical.
Our focus is on the technical side, the technical enablers that you are going to need in order to succeed.
We provide everything from capacity planning dashboards to the insights on how your apps are performing in real time.
So you can make informed decisions.
Here's how we do it.
We deploy the Prometheus operator, which lays the groundwork for the monitoring infrastructure.
Then we use Alertmanager, which aggregates the alerts and escalates the ones you care about to services like PagerDuty when certain thresholds are met.
Grafana is what lets us visualize everything, and we make sure you get all the dozens of essential dashboards, which don't necessarily come out of the box.
Distributed tracing is then accomplished using Jaeger, for monitoring and troubleshooting complex microservices.
We track service level objectives, used for your internal business SLAs, with the service level operator.
This is basically how you're able to provide actionable SLAs to your customers and to the teams responsible for meeting them.
We use Kubecost to track the spend inside of your cluster and outside of your cluster.
It's similar to Teleport in that it's open source, well, portions of it are open source, and they offer paid support to get more accurate data.
And this is the only way you can truly get cost visibility into how much your services are costing inside of your cluster by labels.
You'll be able to easily search your logs across your services in a heartbeat using Elasticsearch with Kibana and Fluentd.
And we send the most common application exceptions from Kubernetes into Sentry, so you can easily escalate them into Jira or Trello.
You can receive all your notifications via PagerDuty and Slack.
This means you can say goodbye to all those annoying alert emails filling up your inbox and the SMS messages blowing up your phone.
This is the real deal.
Now let's take a peek at what it actually looks like.
We strongly believe in dashboards as the best way to visualize what's going on.
That's why we ship dozens of dashboards and have the ability to import any one of hundreds of dashboards provided by the Grafana community.
This is an example of one I particularly like, which shows the cluster's utilization for the purpose of capacity planning.
Just as important is the ability to see how your applications are doing.
You want to know if there are any memory leaks or unexplainable CPU bursts.
Also interesting here is to see how fast your applications are responding to requests.
You get that with our dashboards.
We believe it's critical that you know at a glance how well your services are doing.
When you have dozens or more services with thousands of metrics, this can be hard.
One good way, though, is with KPIs.
These are what you'll want to put on the office big screen.
The Google SRE handbook suggests using service level indicators combined with an error budget.
When you do this, you know exactly how well, you're doing and you can make informed decisions.
For example, if your error budget runs low, it's a good idea to maybe hold back on more aggressive changes and be more conservative instead.
Or when it's high you can be more aggressive and move faster.
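To put numbers on that, here is a minimal Python sketch of an error budget calculation. The function names, the request-based SLI, and the 25% threshold are assumptions chosen for illustration, not values from the talk or from the service level operator.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent (negative if overspent).

    slo_target: e.g. 0.999 means 99.9% of requests should succeed.
    The budget is the number of failures the SLO tolerates:
    (1 - slo_target) * total_requests.
    """
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures == 0:
        # A 100% target (or no traffic) leaves no budget to spend.
        return 0.0
    return 1 - failed_requests / allowed_failures


def release_posture(budget_remaining, low_water_mark=0.25):
    """Map remaining budget to a posture; the threshold is hypothetical."""
    if budget_remaining < low_water_mark:
        return "conservative"
    return "aggressive"
```

With a 99.9% target over 1,000,000 requests, the budget is about 1,000 failed requests. After 250 failures roughly three quarters of the budget remains, so the posture is "aggressive". After 900 failures only about a tenth remains, and it flips to "conservative".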
But most importantly, it provides transparency and accountability in a way that most companies lack.
With SLIs and SLOs, you provide a number that makes sense to everyone, regardless of their experience.
It makes sense to people on the engineering side and it makes sense to people on the business side, so it becomes everyone's job to reach it, and everyone has the right to push back if a change means breaking the contract with your objectives.
When we follow this, we have a quantified metric of how well we're doing, just like sales teams have with deal flow or deal size.
Now, the way we implement this is using the service level operator for Kubernetes, which provides a declarative way for developers to define the KPIs of their services in code and to be notified when those KPIs are not met.
When your apps have problems, you want the fastest way to know what might be wrong.
Usually this starts by looking for what kinds of errors are happening.
One of the biggest challenges for developers when adopting a new platform with tons of abstractions is getting to the bottom of these errors.
You need to quickly dive into them and fix the worst offenders.
For that reason, we deploy Sentry, which is a service specifically for tracking application errors.
With Sentry, your developers can see the errors sorted by the frequency and recency with which they occur, complete with full stack traces and links to your source control.
Plus, they can even forward them to your ticketing system so they won't be forgotten.
But not everything may be that obvious.
Sometimes you need to dig deeper into the raw logs.
That's why it's useful to be able to see all the events happening across your application servers and services through a single pane of glass.
We deploy Fluentd to aggregate logs across all servers and ship them to Elasticsearch.
We deploy Elasticsearch with Kibana so your developers can quickly search through the logs and triage issues.
They can even create dashboards to surface what is most important to them.
All of what we've shown so far is what makes it easier to deploy services on the platform.
In fact, you'll probably see an explosion of services pop up as a result of this.
But with that comes increased costs.
In order to control them, you need to see what's going on inside of your cluster.
Our cost visibility gives you those insights into what it actually costs to operate your applications in Kubernetes.
This visibility is what enables your developers to optimize and reduce what it costs to deliver the services to your customers, which is a competitive advantage.
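As an illustration of what cost visibility by label means, here is a minimal Python sketch that rolls up per-workload costs by a Kubernetes-style label. The data shape, field names, and numbers are invented for the example; this is not how Kubecost is implemented or queried.

```python
from collections import defaultdict


def cost_by_label(workloads, label):
    """Sum monthly cost per value of the given label.

    workloads: list of dicts with "labels" (a dict) and "monthly_cost".
    Workloads missing the label are grouped under "unlabeled".
    """
    totals = defaultdict(float)
    for workload in workloads:
        key = workload["labels"].get(label, "unlabeled")
        totals[key] += workload["monthly_cost"]
    return dict(totals)


# Hypothetical sample data.
WORKLOADS = [
    {"labels": {"team": "payments", "app": "api"}, "monthly_cost": 120.0},
    {"labels": {"team": "payments", "app": "worker"}, "monthly_cost": 30.0},
    {"labels": {"team": "search", "app": "indexer"}, "monthly_cost": 75.5},
    {"labels": {"app": "legacy"}, "monthly_cost": 10.0},
]
```

Grouping the sample by `"team"` gives one total per team plus an `"unlabeled"` bucket, which is often the first thing worth chasing down when costs look off.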
The trick, though, becomes knowing what to monitor and what to escalate on.
There are literally tens of thousands of metrics at your fingertips with Prometheus.
Most of them, though, you don't need today, but you may need tomorrow.
We use something called Alertmanager with Prometheus.
Because we want to leverage all the insights from the community, we use the community-supported Prometheus Operator for Kubernetes and kube-prometheus, which in turn provide all the essential alerts for your platform.
Each one of these alerts has links to actual runbooks that instruct the human operator what they should do if they don't know the next step.
These alerts are then sent back to a Slack channel for the specific environment that they refer to.
That way, teams are not overloaded with alerts that are irrelevant to them, and critical alerts are always escalated to PagerDuty.
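That routing rule can be sketched in a few lines of Python. The channel naming scheme and the alert fields below are assumptions for illustration only; in practice this kind of routing would live in the Alertmanager configuration rather than in application code.

```python
def route_alert(alert):
    """Return the destinations for an alert.

    alert: dict with an "environment" (e.g. "staging", "prod") and an
    optional "severity". Every alert goes to the Slack channel for its
    environment; critical alerts are additionally sent to PagerDuty.
    """
    # Hypothetical channel naming scheme: one channel per environment.
    destinations = [f"slack:#alerts-{alert['environment']}"]
    if alert.get("severity") == "critical":
        destinations.append("pagerduty")
    return destinations
```

A warning in staging only reaches the staging channel, while a critical alert in production reaches both the production channel and PagerDuty.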
So now that you have an insanely powerful platform.
We want to help make sure that you can run it.
This is where our support comes in.
Here's how we help.
Here's how we show you how you can achieve your own operational excellence.
First, we have our documentation portal.
Here's where we keep an updated view of how we do things.
You can fork it and make it your own.
We use Hugo, the static site generator, to build, lint, and validate our documentation.
And you know what?
That's totally rad, because it's all deployed to S3, highly available, and distributed with a CDN.
Our documentation alone though, isn't what gets you over the line.
What's critical for us is that you enjoy working with the platform that we built for you.
Therefore, we leave an open line of communication for as long as you need.
Literally every single Wednesday at 11:30 AM Pacific time (that's GMT minus 8), we hold office hours.
This is a chance for you to ask questions and get immediate answers from us as your trusted confidants.
Plus, you'll be able to get demos every now and then of new features that we add to the platform, as well as hear from others in the community as to what they're doing.
On the other side, we have our 24/7 community that spans literally 57 time zones around the world.
This is where you can get a sense of what others are doing.
It's the best place to talk shop and ask questions to a broader audience.
Most often you'll get answers to problems you're currently struggling with in just a matter of minutes.
Our community has been growing very fast.
And we keep archives available at archive.sweetops.com.
So nothing gets lost.
So at this point, you've seen what it can do for you.
You know a little bit more about what it's going to take when you work with us.
You get a solid foundation with all the tools and processes your organization needs to grow.
Plus we do this all in record time.
So here's how it works.
So first, we'll begin by rapidly rolling out a secure platform to run all of your applications.
We'll immediately transition to work that unblocks your developers, with an awesome release engineering process that uses CI/CD to automatically deploy and test any change to your software without downtime.
We'll give you all the tools so that you see what's going on and are not operating blindly.
Plus we'll make sure that you'll be able to access it all remotely while not sacrificing security, because the modern day team is distributed, and we know that.
Plus, no solution is complete if you don't know how to operate it.
That's why we teach you how to fish.
So do you see how, if you just did these five things, you too could build and own an incredibly successful business on a platform that you own?
So our DevOps accelerator will help you build and own the infrastructure that we've just illustrated.
Everything is defined as code, ready today, not tomorrow.
We're your best chance for achieving rapid success with minimal opportunity cost.
This is not outsourcing.
This is buying an outcome.
It's like hiring an electrician to do an electrical job.
You can operate the switch.
But you don't need to install the wiring and run the load tests.
When you hire someone who can rapidly implement everything we've just described, you get to focus your attention instead on your business and do the things that no one else can do but you, while at the same time having the peace of mind of knowing everything was built to code by following industry-standard best practices.
Look you'll still have all the pros.
You own it.
You drive it, you soup it up.
But very few of the cons: none of the same fear, uncertainty, and doubt, and none of the long-term commitments.
You'll know what you're getting before you invest a single dollar because you've seen our demo of it in action.
And it's been done before.
So you can spare that sunk cost.
Plus, top that off with no long-term commitments, no mandatory license fees, and no strings attached.
And we're always there when you need us.
Sounds like a pretty good deal, right.
Because it is.
So I hope you see that we are not like any other company out there doing what we do.
We are as passionate as they come.
We are exclusively focused on tech savvy venture backed startups just like yours.
More than 90% of our customers are startups, and combined they have raised over 500 million from the biggest names in VC.
We prefer to work with startups because we speak the same language and have the same aspirations as you do.
Our solution is 100% open source.
You could do this in-house given a solid team with experience.
But really consider if this is a core competency you need to own at this stage of the game.
We also specialize in integrating paid solutions like Sumo Logic, Teleport, Datadog, Splunk, PagerDuty, Codefresh, and the list goes on and on, all into your stack.
But guess what: our configurations for all of them are still open source, which saves you time and money on the implementation, since we've done it before and already figured it out.
We offer total transparency.
As you realize by now we are a different kind of company.
We are doing everything.
It's in our DNA.
You will not be able to find a single other company who has put more proof of work doing what we do on their GitHub than us period.
We are so confident about our quality of work we put our money where our mouth is.
Literally everything you need today is on our GitHub.
So go check it out if you don't believe me.
You get to see exactly the quality of our work right now.
No questions asked.
I can confidently say that, because we make our work public it undergoes greater scrutiny because individual and company reputations are at stake.
Plus everything we do goes through extensive code reviews.
Just check out our pull requests and you'll see what I mean.
Also, because we open source our work.
The licensing is straightforward.
You know what you're getting is licensed under Apache 2.0.
This is a popular and well-respected permissive license that lets your business do pretty much anything you want.
Just give us some credit.
We have massive community adoption.
Our repositories see over 11,000 unique visitors every single day across all of our projects.
Our Terraform modules have been forked hundreds of thousands of times.
Our modules receive dozens of pull requests every single week.
This is validation that what we do is providing immense value to others.
And they see it.
As a result of this, we're easily cheaper than hiring freelancers and consultants or going about this in-house to build the same thing.
I can show you the math, just ask.
Consultants and freelancers build most things from scratch.
The cost comes out of your pocket.
Putting together a comprehensive plan is hard.
We've spent years refining our plan all the way down to the cards on our boards.
Few others, if any, have it down to that level of specificity.
Knowing what combination of tools will work together is non-trivial and requires some degree of trial and error.
That's why we've done it for you.
Remember, this is our core competency.
It's all we do as a business.
That's why I can provide these levels of assurance to you where others cannot.
Here are the facts.
The fact is, by working with us, you have the peace of mind of knowing that we've done this many times before.
When we're done, you are not on the hook for expensive retainers or license fees.
We can implement the project faster than most people could do it in-house.
And unlike any other alternative, we provide you a predictable outcome that raises the bar higher than most could dream.
But be advised, we are very selective about the clients we take on.
We have to be.
We want to work with companies who are fun, who love what they do and what we can provide them.
We really want to work with companies who want to own their infrastructure and are desperate to do so.
Our mission is to empower you to not only own but to operate your infrastructure.
That's why we'll show you how to do it.
But just in case that's not enough.
I want to call out again, that we provide weekly office hours.
These are free one-on-one support calls held every week for one hour over Zoom.
You'll get to directly ask us questions and hear from others in our community as well.
We do this because we want to make sure that you get the maximum value out of your investment.
And this is how we ensure it.
Now I can't promise we'll keep doing this forever without charging for it.
But if you sign up soon, I'll make sure you can get it too.
My goal is that when you work with us, it's going to be a breath of fresh air.
You might have had bad experiences with outsourcing.
Remember, we are not that.
We have a sweet process that we'd love to show you.
Here's what it feels like.
Within the very first two weeks of working with us, you'll have all of your foundational infrastructure in place, including your Kubernetes clusters.
You'll get a very quick sense of what it's like working with us.
Everyone we've worked with is blown away by our process and professionalism.
We'll make sure you get the same level of treatment.
And that's why you'll always have direct access and a direct line to our team and to me.
It works like this.
First, we'll send you our statement of work that describes what we'll be giving you.
You'll get to work on getting that executed.
Once we have that and receive payment for the first sprint, we break ground.
We'll start with a kickoff call, which is where we go over the onboarding items so we can get started, and so you know what you need to get busy on and the things that we'll need your help with.
Then every single week we'll have a recurring status call where we'll showcase our current progress, give you live demos, and answer your questions.
We'll only move cards to the accepted column once you're really happy with the outcome.
When we're all done, you can continue to join us every single week on our public office hours Zoom.
But hey, I get it.
If you don't dig our process, for whatever reason, you can cancel anytime.
While this has never happened before, we don't want to put you in a pickle, and that's our promise.
So seriously, at this point, what is stopping you?
Go to Cloud Posse slash booking right now and schedule your first call.
I'll answer any questions you may have and give you a demo of anything else you want to see.
When you're ready to move forward, we'll talk numbers.
Remember, we'll deliver everything we've just shown to you in less than about three months.
That's everything you need to operate efficiently in AWS and we'll do this for less than the cost of a full time engineer.
How does that sound?
So go to Cloud Posse slash booking and let's talk.
My name is Erik Osterman.
It's been an absolute pleasure sharing what we do here at Cloud Posse.
If you made it this far, I can't wait to work with you.
And there are those bullet points I forgot to show.
Whoops.
So, frequently asked questions.
Look, you don't need to stick around for this.
I can't cover everything on the call.
So I just want to make sure that we answer some other common questions that come up.
So, who else are you working with?
Well, we recently wrapped up an engagement with a startup making it easier for individuals to find health insurance.
We're currently rolling out platforms for SpotOn, a startup building out a payment and HR platform.
They've raised over 60 million.
We're working with another company called check trade out of the UK.
They're more of a traditional enterprise, taking an infrastructure-first approach to building an awesome new product inside.
Will this product work for you and your business?
Look, we've not come across the technology stack yet that this will not work for.
That said, we typically do this for larger startups with serious technology requirements.
If you're on the smaller side our solution might be overkill for you and your needs at this time.
So it's better that you maybe come back to us after you grow a little bit.
We covered earlier in the presentation who this is for and who this is not for.
Make sure that message resonates with you.
What if we want to use something else instead?
We are here today exactly because customers like you continually ask us to extend the platform to add support for new features.
For example, we're currently implementing an end-to-end data lake on AWS using tools like EMR, Hive, Airflow, and Superset.
These are available for free today on our GitHub because other customers have supported their development.
We added support for Splunk, Sumo Logic, and Datadog.
All of these are examples where customers asked for the features and we delivered.
That said, while anything is possible, it may increase the scope of work and therefore the cost.
I can't promise we'll deliver it in 90 days if we're throwing in new tools.
So what is the success rate?
How can I, Erik, guarantee that you are going to achieve your goals?
We guarantee you'll receive exactly what we've shown you.
As soon as the second week of working with us, you will already be able to kick the tires while we continue to build out the rest of your platform and infrastructure.
In fact, we'll keep you involved throughout the entire process, because the best way for you to know what you're getting is to experience it.
Every week, you'll get a demo of what we've done, and we'll only move things to the accepted column when you give us the all clear to do so.
If you don't like what you see, you can cancel at any time, for any reason.
Although we've never had that happen.
So what will the composition of the team be?
Well, we always assign one to two dedicated engineers per project.
Now who they are will depend on who's on the bench at any given time.
But we always need to make sure that we have continuity on our side, so we keep the projects moving at a fast clip.
So how soon can we get started?
Look, our process is very fast.
It really just depends on how fast your company can move.
So it works like this.
Basically, three steps.
One, we'll have our demo call and you decide if you want to move forward.
Two, you'll send us the SOW fully executed.
Three, we'll send you an invoice for the first sprint.
And as soon as we receive payment for that, we're going to start.
So that's pretty fast.
So do we provide ongoing support?
Yes, we do.
We don't expect that you're going to pick this up overnight.
That would be unreasonable, but we do want to teach you how to do it.
We don't offer support if you didn't go through our program.
But we will offer you support if you have gone through our startups accelerator.
So how are you going to interact with the Cloud Posse team?
You're going to interact with us using direct access via Zoom, Slack, and email.
We generally don't do on-site.
However, if you're based in California, we can arrange paid trips if you want to sponsor those.
How much does it cost?
Well, it really depends.
If you have AWS credits already, AWS Activate credits, that largely offsets our costs.
And we generally cost less than the full-time cost of a senior DevOps engineer in the US.
On the other hand, we get it finished in just a few months, which would take a senior team of several people much longer to achieve, probably two to three times longer at least.
Remember, we've been working on this for the past four years with a full-time team of subject matter experts.
Plus, we have influence in the community.
We move pull requests forward.
We are where we are.
We're friends with many of the open source maintainers of the tools that we use.
And that's critical for getting bugs squashed quickly.
So how do the office hours work?
This is a common question.
So first of all, maybe you ask what office hours means.
I'm taking this from the academic setting.
In courses, the professor will usually offer some time during the week where you can come to his office and ask him questions and get help.
We're talking about that.
Right now, Cloud Posse offers office hours to all of its customers and our community once a week, every Wednesday at 11:30 AM Pacific time.
That's GMT minus 8.
These calls are a chance to ask questions, get help, and basically use us as a sounding board to run ideas by.
We can even start doing screen shares and pairing if you need help as well.
Now, every now and then we're also going to throw in a demo of cool new things that we're working on because we want you to know the other awesome things available to you.
Also, since all of this stuff that we show you in our demos is open source, chances are you can implement it on your own without engaging us.
You just need to engage us if you want to move faster.
That's the message.
So how often are our training materials updated?
So our training materials are getting updated regularly with each new engagement.
Now documentation can always be better.
We're always investing in it.
And the more customers we have, the better our documentation gets.
Rising tide floats all boats.
So how much work is this?
I don't got time for pish posh, right?
So when you hire us you're hiring us to do all the heavy lifting here.
We're going to help you get things off the ground and eliminate that guesswork for you.
Now, in the beginning, we'll mostly need your help in one area, which is acceptance testing.
That's where you get to say if you understand what we've delivered and if it meets your expectations.
How will you get maximum value out of our engagement?
The sooner you get your hands dirty and start using what we've delivered, and using us as a resource, the better.
Basically, the more questions you ask and the sooner you ask them, that's how you will be successful.
OK, now at the end of the engagement, you'll be expected to start taking over the day-to-day operations.
Now we're going to stay engaged for as long as you need us.
And that's why every contract includes a bucket of hours for support.
Those hours last for a number of months, depending on your contract.
Now, when you exceed that number of hours, we can always re-up with more hours.
So you can get more help if you need it.
So what do you need to get started with Cloud Posse?
All right, to get started, all you need to do is get a few things signed.
Basically, we need to do the statement of work; the mutual NDA, which is how we cover the secrets; and the master services agreement, which basically describes what it's like to work with Cloud Posse, so there are no misunderstandings.
And as soon as we get those executed and we receive a check, or a wire transfer, or ACH, or Bill.com payment for the first sprint, we are good to go and we hit the ground running.
Now, to get started with the process I've just described, the first thing you're going to want to do is go to cloudposse.com/quiz.
Again, go to cloudposse.com/quiz and take the quiz.
At the end, there's going to be a booking form where you can schedule time so we can jump on a call.
But those questions are essential for us to understand if we can work with you.
That's the end of the Q&A for right now. We will be updating this section continually.
So check back, and thank you again for sticking through this incredibly long webinar. I hope you learned a lot.
My name is Erik Osterman and you can reach out to me anytime.