Erik Osterman, Author at Cloud Posse

Here's the recording from our DevOps “Office Hours” session on 2020-03-25.

We hold public “Office Hours” every Wednesday at 11:30am PST to answer questions on all things DevOps/Terraform/Kubernetes/CICD related.

These “lunch & learn” style sessions are totally free and really just an opportunity to talk shop, ask questions and get answers.

Register here: cloudposse.com/office-hours

Basically, these sessions are an opportunity to get a free weekly consultation with Cloud Posse where you can literally “ask me anything” (AMA). Since we're all engineers, this also helps us better understand the challenges our users have so we can better focus on solving the real problems you have and address the problems/gaps in our tools.

Machine Generated Transcript

Let's get the show started.

Welcome to Office hours.

It's March 25th, 2020.

My name is Erik Osterman and I'll be leading the conversation.

I'm the CEO and founder of Cloud Posse.

We are a DevOps accelerator.

We help companies own their infrastructure in record time by building it for you.

And then showing you the ropes.

For those of you new to the call the format is very informal.

My goal is to get your questions answered.

If you want to jump in and participate, feel free.

If you're tuning in from our podcast or YouTube channel, you can register for these live and interactive sessions by going to cloudposse.com/office-hours.

We host these calls every week and will automatically post a video recording of this session to the office hours channel, as well as follow up with an email.

So you can share with your team.

If you want to share something in private.

Just ask and we can temporarily suspend the recording.

That's it.

Let's kick things off.

There are a bunch of questions brought up today in the office hours channel.

I thought we could use those as a starting point for some talking points.

And first, I want to, of course, start with getting your questions answered.

So any questions from the community.

Good evening.

I'm Scott Frost.

I'd like to understand if your Jenkins suite of Terraform tools that builds a full end-to-end pipeline is complete.

Or is there additional work that we would need to do?

All right.

So if I'm getting your drift, you're talking about this Terraform module.

That's correct.

Gotcha So this.

I'll say that we don't typically use this module regularly because we're mostly using Codefresh.

We have kept it up to date as recently as five months ago.

And not only that, we have added Terratest tests to this module to validate that the server comes online and is functional.

That said then there's two pieces of this right.

So there's this Terraform module here.

And what's unique about our Terraform module for Jenkins is that it does CI/CD of Jenkins itself using CodeBuild and CodePipeline.

So it's a little bit better.

And then there's the Jenkins Docker image.

Now, as you can tell by the age of that, we're not actively investing in it, because none of our current customers are actively using it.

Also, we're moving more in a different direction. We're actually doing a project right now with Jenkins, but we're going to be deploying Jenkins with Helm on Kubernetes instead.

So this approach,

I think, is valuable for those who don't want to run Kubernetes, but it's not the one we're investing in.

Thank you for that.

Looking at the architectural diagram apologies.

I'm no expert in this particular area.

I've gone through the Terraform code.

And I can understand most of this.

But the one bit that seems to be missing is the slaves configuration thing or is that something that we need to build ourselves.

So Yeah a good question.

So as you see in the diagram, and it's aging, this was designed so that you would then point Jenkins at a Kubernetes cluster.

So you might wonder why.

OK, so first of all, if you're not using Kubernetes, this is kind of moot.

You'd have to do something else.

Our thinking on why we created Jenkins as a standalone instance on Beanstalk, running outside of Kubernetes, was that we could run the GitOps commands on it to administer Kubernetes itself.

So that basically Jenkins could rebuild the Kubernetes cluster.

If we wanted to.

So one way is to have the slaves running as EC2 instances which there's just a plug-in for that.

And the other is running those slaves as pods under Kubernetes.

So if we go to cloud posse and search for Jenkins.

I just want to point out one thing.

So there are newer, better ways of doing this.

There's like Jenkins configuration as code.

We are.

This is not using that plug-in.

This is using just straight up groovy.

I wasn't the one to implement it, so

I'm not as fresh on it.

But the idea is, with this Groovy,

here's how you can initialize it, I believe, with EC2 instances as the workers.

And then I think there's I think I'm maybe mistaken.

Anyways look over this Jenkins container because the Jenkins container is the one that sets up everything for that.

And if you look at the plugins, these are the plugins that it installs out of the box.

So if one adds the, there's an AWS plugin for running slaves on it.

Yes, there's also a plugin for, like, running using CodeBuild, I think.

So there's a few different options. Does anybody else have more to add to this?

Anybody else maintaining Terraform modules for Jenkins?

So thank you very much for the explanation.

If I may put you on the spot: whilst you guys are doing limited maintenance on this, as a company that might be looking at this approach, probably for the first time, to get to roughly what your network diagram displays and the flexibility that we believe it brings to the table for us, should we start now?

Traditionally, my company has pretty much had EC2 machines that have just been Terraformed with standalone Jenkins.

This is much more of an improvement.

Would you advise not going down this road, looking at some of the other Jenkins modules you've mentioned, and steering away from this because it's old hat and you've put a freeze on it?

So two things.

I mean, if this is something that's going to be on the critical path for your business and something you're going to want to extend or customize,

I would probably start by forking the module.

And then customizing it to your needs there.

Also, I just wanted to I was just joking here.

We've also updated this.

So the Data Pipeline to do backups is simplified now, because AWS Backup came out since we did this diagram.

So now the backups here are done with AWS Backup.

So to answer your question, though like would I recommend this or something else.

I'm not sure.

I haven't done any research on Jenkins Terraform modules.

When we implemented this, it was like back in 2016 or something, at least two years ago.

When we started.

I think earlier than that, though.

And it was very bespoke to a particular you know you guys decided that you wanted to build Docker images using Jenkins save it to and deploy it to Elastic Beanstalk and if and if someone else wants to do something different for any of those, then this doesn't work for them right.

No, it's opinionated as it is a lot of the modules.

But here's the thing: you want to deploy Jenkins to AWS.

You have, I guess, three options at a high level, right?

You could do Fargate, you could do ECS, or you could do Beanstalk.

And then lastly, you could just roll your own EC2 instances. If we go the roll-your-own EC2 instances route, then we have to invent the whole deployment strategy for that.

So the reason why we went with Beanstalk is that Beanstalk has the concept of rolling updates and the concept of integrating with CodeBuild and CodePipeline to do the deployment.

The other thing is if you like roll your own.

You have to come up with the strategy of how do you update Jenkins itself.

I like the idea of having a Jenkins container or running that.

And then you can actually develop and work on that Jenkins container locally if you want.

So, I mean, sometimes, to be frank, I use this as an example of why you shouldn't use Jenkins.

Not

why this is bad architecture.

It's just saying like if you take this to the extreme.

This is like a best-practices implementation of the open source, or community, edition of Jenkins on AWS.

And let me explain why that is. So, you know, you're deploying, in this case, an ELB for the traffic, and you need to have TLS terminated at that.

So we set that up and then you want your access logs.

So we rotate, we have those going to the bucket, but then you know you don't want to have those access logs persisted forever.

So, you know, we lifecycle those off into Glacier and then ultimately delete them.
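As a rough illustration of the access-log lifecycle just described (archive to Glacier, then delete), here is a minimal Python sketch that only builds the rule document. The prefix `elb-logs/` and the 30/365 day thresholds are assumptions for the example, not values from the module; the dictionary shape is intended to match what boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`.

```python
import json


def access_log_lifecycle(prefix: str, glacier_after_days: int, delete_after_days: int) -> dict:
    """Build an S3 lifecycle configuration: move logs to Glacier, then expire them."""
    if delete_after_days <= glacier_after_days:
        raise ValueError("objects must expire after they transition to Glacier")
    return {
        "Rules": [
            {
                "ID": "access-logs-archive-then-delete",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": delete_after_days},
            }
        ]
    }


if __name__ == "__main__":
    # Assumed values: archive after 30 days, delete after a year.
    print(json.dumps(access_log_lifecycle("elb-logs/", 30, 365), indent=2))
```

In practice the same rule would more likely be declared in Terraform alongside the bucket; the sketch is only meant to make the two-stage lifecycle concrete.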

And I mean, that's a lot to do just right here.

And we haven't even come to the point Jenkins yet.

And then we deploy Jenkins, in this case with Beanstalk. Look, it's legacy technology, but hey, it does the job and automates a classic application architecture where you have multiple nodes.

And then it's fronted by a load balancer. In this case, with Jenkins master instances, you can't have more than one; you can only have one online at the same time, or you risk corruption of the file system.

But if you want to be able to do rolling updates, even if that means going down to 0.

And then back up again.

We need to share the file system.

So the easiest way of doing that is using EFS as that file system.

And then we also need to worry about the file system running out of storage.

But OK.

Now we deploy an EFS.

We need to have backups of that.

So how do we do backups that historically was a real pain in the butt.

Where you'd provision a Data Pipeline; the Data Pipeline has, like, some Lambda that runs an aws s3 sync and syncs those to an S3 bucket, and then all the lifecycle rules go on those backups.

Now that's simplified using AWS's first-class service called AWS Backup. But then we've got:

how do we update Jenkins? So one thing is getting Jenkins deployed.

How do you keep Jenkins up to date infrastructure as code wise, and that's what this is solving.

So the idea is that you have the code pipeline and code build and you start to do that again, staying within the Amazon ecosystem.

Now, if you want to use GitHub Actions or CircleCI, that would be a totally different solution.

So what we tried to do here was implement it using almost strictly AWS services.

This implementation.

Thank you for that.

I wanted to ask, what did you use to build your diagrams right there?

Oh, so this is Lucidchart.

Yeah, they've got good.

They've got good AWS and Azure and GCP, you know, graphics and everything.

I would think Kubernetes makes this easier nowadays.

Totally Yeah.

So I mean Kubernetes I mean, once you get over the hurdle of deploying and setting up Kubernetes, which is its own roadblock.

But once you've got that, the whole TLS termination thing is, you know, that's incredible.

Yeah and an adaptation.

Yeah So Yeah, this gets radically simplified if you use Kubernetes to run and manage it.

But it introduces new complexities. Like, if you're running Jenkins under Kubernetes and you're building Docker images, that suddenly is a lot more complicated, and you need to use one of the alternative Docker building tools.

Thank you for that.

So a lawsuit or a folder.

Yeah, a lot of room for that.

Any other questions.

No, not yet.

It doesn't have to be related to this just in general.

My impression was actually, you might.

So I'm working with Kubernetes and trying to figure out how to manage certificates.

And I came across a project called cert manager.

Yeah and started using that.

Is that pretty much the way to do it.

It's the first one I just happened to come across.

Yeah, that's the canonical like TLS certificate management software.

It's basically.

So I believe it's Jetstack that manages that one. They built the first one, which was called kube-lego.

kube-lego is now end-of-life and has been superseded by cert-manager, and cert-manager does a lot more than just

Let's Encrypt certificates. It can also be your CA, your certificate authority, and help you with PKI infrastructure

if you need to do self-signed certificates for services inside of the cluster.

OK. What about the new certificate feature I saw that's coming in K8s?

I think they announced something with certificates. Did anyone happen to see that?

Yeah, this for.

Is this for though secret encryption or this.

I think Kubernetes has an internal service for provisioning certificates for, you know, LED and for whatever, you know, internal services for Kubernetes, and people were using it, even though it was like, hey, this is really not meant to be used.

But people were using it anyway.

And so they saw that people were using it.

And, well, you know, let's go ahead and make this a public API and kind of bless this.

That's cool.

Yeah, it was.

To be honest.

I was not aware of that.

So yeah, I saw the same thing from Sysdig that you're looking at now.

And at the same time that I happened to be working on cert manager and then I'm like, oh, what's this coming down the pike.

Is this going to change how things are potentially done in the future.

It sounds like maybe, I guess, if you wanted to use the root CA that's in the cluster.

Yeah Yeah.

Or I see it maybe replacing the PKI stuff.

But I don't see it replacing Let's Encrypt.

So what are you using cert-manager for now?

Just actual, like, certs on the ingress controller.

Yeah. So you're going to want to use, like, Let's Encrypt.

Yeah to do that.

So that, you know, when someone's hitting your ingress with the browser, they don't get "this cert is not trusted" or whatever.

The only way to get that green lock is to use a trusted root CA, which is what Let's Encrypt gets you. One thing I really wish is that AWS would add the ACME protocol.

I remember it being really nice.

Yeah, I was really doing some research for this.

And I don't remember exactly what I was searching for, but basically, you know, I love

Let's Encrypt and I want to continue using Let's Encrypt, but if I could pay to raise those rate limits,

I would do that in a heart beat.

But they don't even provide a way to pay for better service if you need it.

So that's why I really wish there was any other, you know, official CA that provided the ACME protocol without limits.

So cert-manager's doing DNS validation right now with Route 53. Actually, I've got cert-manager creating the records necessary in the Route 53 zone for validation, because all of this is on a private network. Using external-dns for that.
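To make the DNS validation setup concrete, here is a hedged Python sketch that emits a cert-manager `ClusterIssuer` using a DNS-01 solver backed by Route 53. The issuer name, email, region, and hosted zone ID are placeholders, the Let's Encrypt staging endpoint is used as an assumption, and the `cert-manager.io/v1` API version reflects current releases rather than whatever version was in use on the call.

```python
import json

# Let's Encrypt staging directory; swap for production once validation works.
STAGING = "https://acme-staging-v02.api.letsencrypt.org/directory"


def route53_cluster_issuer(name: str, email: str, region: str,
                           hosted_zone_id: str, acme_server: str = STAGING) -> dict:
    """Build a ClusterIssuer that proves domain ownership via Route 53 TXT records."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "ClusterIssuer",
        "metadata": {"name": name},
        "spec": {
            "acme": {
                "server": acme_server,
                "email": email,
                # Secret where cert-manager stores the ACME account key.
                "privateKeySecretRef": {"name": f"{name}-account-key"},
                "solvers": [
                    {
                        "dns01": {
                            "route53": {
                                "region": region,
                                "hostedZoneID": hosted_zone_id,
                            }
                        }
                    }
                ],
            }
        },
    }


if __name__ == "__main__":
    # All argument values below are placeholders.
    issuer = route53_cluster_issuer("letsencrypt-dns", "ops@example.com",
                                    "eu-central-1", "Z0000000EXAMPLE")
    print(json.dumps(issuer, indent=2))  # kubectl apply -f accepts JSON as well as YAML
```

DNS-01 is the relevant choice here because, as discussed, the HTTP-01 challenge needs a publicly reachable endpoint, which a private network does not have.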

Yeah Yeah Yeah.

We can't use the HTTP endpoint because it's all private.

Just to add something briefly, Andrew: cert-manager is not the only option. If you're using, like, Istio, Istio will do it too.

There's, like, Gloo.

I think you know that'll do it.

You know most of the more the full featured ingress options.

will do certificate, you know, do TLS termination too, which has certificate management. Istio's is called, right now, Citadel.

I think if you search "Istio Citadel" you're going to find it.

But it's getting rolled into another service. The Istio

guys are simplifying their architecture a little bit.

Yeah, that makes sense.

Lots of outsource.

Take a look at later.

We're not using Istio right now. Istio is really awesome.

And it does a lot of really cool stuff.

But it's definitely you pay and you know you pay the price with complexity because it definitely adds complexity.

Yeah, supposedly they were going to make it simpler by just throwing everything into a single binary.

And as opposed to having all the parts spread out into just like SVD or something.

Yeah but like the perfect.

There's a couple of really good reasons to use Istio, so that, you know, if someone tells you Istio is not worth it,

you don't need it,

you can tell them to shove it. And, like, one of the perfect reasons is the mutual TLS that you get.

So, like, we recently had to set up Keycloak with OpenLDAP as the backing for the user database.

And the guys were digging down this rabbit hole trying to figure out how to get LDAPS to work.

So that you could securely do, you know, through Keycloak:

if I'm a user and I go to my Keycloak and I want to change my password,

it makes a secure connection to LDAP to change the password using TLS, using LDAPS. And they're going, oh my god, you know, we're having such a hard time getting this to work, we've got to figure out how to get OpenLDAP and Keycloak to share certificates and manage those certificates. And one hour in we're all like, hey, stop, stop, wait, just turn on Istio mutual TLS and you've got a frickin' TLS tunnel all the way through.

That is solid.

Don't worry about all that mess at all. Have an unencrypted LDAP connection that is encrypted by mutual TLS through Istio. Bam.

So that's why it is magical when it comes together like that cool, cool.

Thanks Any other questions related to anything else.

I posed the question.

So the last two days.

I had a couple of issues with provisioning EC2 instances, in particular C5 instances in eu-central-1a. I posted this thread in Slack as well.

I posted it with an image of the error message.

I'm just wondering if you know where I can find out about something like this (there's also an image in the thread, if you click it), how I can basically find out whether instances of a given type are available.

Yes I was Ezra all the way before.

So that's interesting. you know I don't.

I mean, it's a legitimate request and you should be able to just have an alert for that.

I know that not answering your question.

But are you able to spin up a different type of instance, if that one is not available.

Yeah, I was able to adjust like it's all around us.

And we didn't care about multi-AZ in this case.

So, like, the same instances were available in eu-central-1b and c. But in general, like, I don't know how to do it.

And I'm currently wondering if I should, like, introduce an Auto Scaling group in AWS with a mixed instances policy.

So yeah.

So if you do, that's what I was going to kind of get at.

If you do a, what do they call it, a template, node template or whatever.

A launch template.

Yeah, you can.

You can have a priority list of instance types and it'll try your first priority first.

And if it can't because they're out.

It'll do your second priority right.
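The priority-list behavior described here can be sketched in a few lines of Python. This is only a client-side illustration of the fallback idea, not how Auto Scaling implements it: `launch_with_fallback`, `fake_launch`, and the `InsufficientCapacity` exception are hypothetical names, and the instance types are example values.

```python
from typing import Callable, Optional, Sequence


class InsufficientCapacity(Exception):
    """Stand-in for the provider reporting no capacity for an instance type."""


def launch_with_fallback(instance_types: Sequence[str],
                         launch: Callable[[str], str]) -> Optional[str]:
    """Try each instance type in priority order; return the first successful launch."""
    for instance_type in instance_types:
        try:
            return launch(instance_type)
        except InsufficientCapacity:
            continue  # this type is out of capacity, fall through to the next priority
    return None  # every type in the list was unavailable


def fake_launch(instance_type: str) -> str:
    """Fake launcher: pretend the first-choice type has no capacity in this zone."""
    if instance_type == "c5.4xlarge":
        raise InsufficientCapacity(instance_type)
    return f"launched {instance_type}"


if __name__ == "__main__":
    print(launch_with_fallback(["c5.4xlarge", "c5.2xlarge", "m5.2xlarge"], fake_launch))
```

Returning `None` when the whole list is exhausted gives the caller one place to raise an alert, which addresses the "you don't get a notification" complaint raised in this thread.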

So yeah, that's just what.

I'm like, because this is what I'm currently evaluating, if it's worth it, because all three zones being out of instances seems unlikely to me.

But in the case, it happens.

I like to instance, a state doesn't change if you try to open up new instances, you don't get a notification like, I don't know what's going on there.

Like I actually like I don't have any insights.

We're just kind of weird.

We also just note that notice because well apartments rental broking.

So the only reason why for us.

We actually noticed it.

Yeah, I don't even know if there's an API to be able to check if that exists without trying to launch it itself right.

This is the thing I was thinking of.

I just want to see if they meant it had any mention of the problem that you're talking about.

So this is a set of Lambdas you can deploy to monitor your AWS limits.

Like when you're approaching them get alerted on that which is also valuable.

But I don't know if this is going to tell you well.

Like you can't even Launch Instance to begin with, even though you haven't reached your limit on it.

Maybe Yeah.

I mean, it wasn't my limit.

It didn't vote on any of our accounts.

So it was just a hobby.

This particular zone didn't have any instances of this type available. And you've previously been able to launch them?

Yeah, it also worked in a certain sense and type and adopt a.

I switch some back to a smaller 1 again.

And that also.

I just could not open up.

See if I think slouch like for me.

I'm neutral.

I mean, they only have a certain capacity of bare metal servers, you know what I mean? That's where spot instances come from: you're buying extra capacity for cheap, right?

So there could be something in spot instance availability that you could look into.

You know, if spot instances aren't available,

That means they're running out of that type of resource.

And the other thing to consider are spot fleets which then can be composed of multiple instance types.

So don't know.

It's almost there.

It's almost like treating instance types as cattle as well.

Don't rely on just one type of cow.

Right now right.

Yeah So that's what I'm currently thinking about if I should do it If it's worth it because, well, for one particular, obviously spin up between 20 and 100 instances throughout today and well if we cannot spin up one sentence.

We're kind of in a better shape in terms of performance and stuff.

So yeah but Yeah.

Cool. Checking spot might be a good indicator.

You can see here.

So Jeffrey and then we have Jeffrey you've been on the call.

Yeah I'm here.

Hey Jeffrey give me that.

Well, we get to your question here.

Jeffrey, you asked about the best way to do multi-region with Terraform on AWS.

Let me bring up your thread that you had brought posted.

So everyone can get the context here.

Yeah, I could also just repeat as well.

So basically what I'm looking at is to have a disaster recovery for Terraform essentially just a duplicate set of resources.

We have a Kubernetes cluster.

And we use all of S3 and then so we're primarily in the US east region.

We want to have a failover into, my example is just us-west-2.

But then.

So what I was worried about is managing that Terraform remote state in S3, because AWS's S3 buckets are region specific.

My initial attempt was just to have everything in us-east-1 managed by a bucket that was also in us-east-1, and then basically split it out that way.

So a bucket in us-west-2 would also manage all of the us-west-2 resources.

But then if you actually have an entire region going down.

That's right.

And also, in between, we also have to read from a Terraform remote state as a data source.

I mean, for example, if you want to replicate a database from the source, then you also need to know the Amazon Resource Name (ARN) from the other region.

But then if an entire region goes down, then I imagine that S3

bucket would be inaccessible.

So my next thought would be just to have a single S3 bucket that manages all your regions.

And I mean, basically whatever regions that you're they also have resources in.

But then you would have kind of like a cross region replication of that bucket into somewhere else.

So in the case that a region actually goes down, you can basically just point it to the bucket in the failover region.

Well, yeah, that's a great summary of the problem.

And two thoughts come to mind. First, I think thinking of it in terms of failover is a fragile approach; you should think of it as active-active as much as possible.

Now that might mean you run smaller sizes of your workloads in those two regions, but then you are actively testing them all the time.

Because what happens.

So frequently is you fail over.

But if you're not running real workloads on there, you don't really know that it's viable.

Yeah So that's something that you actually do want to do.

But under the direction of our management,

they want the capability to have a failover.

You know that's just based off some regulations with some things that we're working with.

But then to actually just kind of just turn off that capability until further down the line.

So if at the moment you can't have an active-active setup, then I'd like to propose another kind of, let's say, wild or wacky idea, a non-technical solution really, to your problem.

The idea is to phase shift.

And you what you want to do is ensure that no failure of one system is coupled in some way to the failure of another system.

And, well, what about just not using Amazon as your state storage backend for Terraform, then?

Yeah, that's possible.

I mean, this is just not like I know, for example GCP.

I like their data storage is like multi region.

I mean, I guess that's the advantage there.

I mean, like, well other, I guess you have any suggestions.

Well, so that was the one suggestion.

The other is though and I I mean, I still as much as possible.

I think you should still keep the state storage bucket totally separate and isolated per region, so that regions more or less share nothing, with the exception of some things that you mentioned, like VPC peering connections; you know, where does the state belong for that?

And then, like, setting up database replication. On the other side, you have the failover capabilities of the product, like the front end, and ideally you'll be able to address that just based on the health check capabilities of Route 53 or whatever system you have sitting in front of that.

So that traffic will automatically fail over to the available region as necessary.

Now, if you need to do some operation like promoting one database to be primary.

I think that's going to depend on the technology.

And that's where you're going to have to be most careful about where that state is stored that you're not in a pickle on that.

Yeah And that's kind of exactly the.

It was during the process of kind of restoring, or I guess promoting the read replica to a master database and eventually reverting back, where we ran into that. Even for the resource name, we still have to be reading from the remote state.

And then at some point when we want to start.

Let's say you remove all these internally consistent.

So remote state in us-east-2 should always be reading other remote states in us-east-2, never remote states in us-east-1.
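One way to picture the share-nothing rule is a small check that flags any remote-state read that crosses a region boundary. The sketch below is an illustration only: the `acme` namespace, the bucket naming scheme, and the helper names are assumptions, though `bucket`, `key`, and `region` mirror the attribute names of Terraform's S3 backend.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class StateLocation:
    bucket: str
    key: str
    region: str


def state_location(namespace: str, region: str, component: str) -> StateLocation:
    """One state bucket per region (hypothetical naming scheme), one key per component."""
    return StateLocation(
        bucket=f"{namespace}-{region}-terraform-state",
        key=f"{component}/terraform.tfstate",
        region=region,
    )


def cross_region_reads(stack_region: str, reads: Sequence[StateLocation]) -> List[StateLocation]:
    """Return the remote-state reads that would couple this stack to another region."""
    return [read for read in reads if read.region != stack_region]


if __name__ == "__main__":
    reads = [
        state_location("acme", "us-east-2", "vpc"),
        state_location("acme", "us-east-1", "rds"),  # violates the rule
    ]
    for bad in cross_region_reads("us-east-2", reads):
        print(f"cross-region state dependency: {bad.bucket}/{bad.key}")
```

A check like this could run in CI so that a us-east-1 outage can never block a plan or apply in the other region.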

I see.

OK Yeah.

OK So.

So that means that to bring up this infrastructure in us-east-2, you have a parallel set of resources in that region and parallel state files and all that.

OK So.

So essentially just like a cross region replication.

And replication has kind of some loaded concepts with it that things are getting like there's some process replicating it.

That's not what I'm saying or suggesting.

I'm saying that you actually have the ability to spin up a parallel stack in the other environment, of which most may be turned off for cost reasons.

And then you have your replica database that's turned down.

Now, there's also Aurora, which has their global database.

Now, I don't have firsthand experience with it.

But this seems to address part of that use case that you have.

And they support Postgres and MySQL too now.

My 6 4 5 7 I guess.

OK I think right.

But some.

Yeah So you might be paying a price for running that.

But, I mean, it's probably going to be less than the engineering cost of implementing your own system.

Yeah Right that makes sense.

Thanks a lot.

So what do you look like.

What are you proposing with Aurora?

Well, so just like Microsoft offered with their SQL database, they have now a global database.

So it should support regional failover or the situation where a region is offline.

And then how were you applying that to Terraform? So that I wouldn't have to have a source and a failover replica where one gets promoted in case the other goes down.

So it basically just handles all of that failover.

OK, and Route 53 is global.

So I presume the endpoint for

the Aurora global database, that endpoint would work

and route to the appropriate region. OK.

Thank you.

So we covered that one.

All right.

So interstitial topic, I thought this was pretty rad.

Slightly disappointed.

But I get why they did it the way they did it.

So one of the nice things with Kubernetes is, like, this API where you can just throw documents at it and it creates things for you, kind of like CloudFormation but genericized.

And then there are operators that handle the business logic to do whatever you want.

Well Terraform therefore would let you know.

There is this gap: it would be awesome to be able to just deploy a document to Kubernetes

that describes my RDS instance or something that I wanted to deploy, and then Kubernetes handles the scheduling, provisioning, and invocation of Terraform to do that.

So there's been a few attempts at doing this by other companies and other people. Basically, Rancher had one.

Lauren brought this question up again recently.

I'm going to put you up.

Pull this up.

Or not Lauren.

I mean, Ryan Ryan Smith.

So AWS has, like, their service operator, which is kind of a wrapper for calling CloudFormation. And let's see, yeah, this was the link to the one that Rancher had. These all allow you to just use open source Terraform to provision Terraform code on Kubernetes. But now HashiCorp is coming out with a first-class provider for this. Not a provider,

I mean operator for this.

And it works in tandem with Terraform Cloud.

So basically, it triggers the runs to happen on Terraform Cloud for you.

And then Terraform Cloud can handle the approval workflows and exception handling and triggers, like "if this project runs, then trigger this other project," which would be really difficult if they had to reimplement all that inside of Kubernetes.

So that's why I get why they're making this integrate tightly with Terraform Cloud. Is it clear what the value is of what this operator is providing for Kubernetes? I can go into more detail.

All right.

So, you know, it's not clear, from our understanding of Kubernetes to begin with.

So I like it.

Yeah, I like to think of Kubernetes as two things.

It's a framework for cloud automation that's genericized across providers.

As you know spring is to Java as rails is to Ruby as Kubernetes is to cloud almost.

And then it's another thing.

It's a new generation of operating system.

So what are the characteristics of an operating system? An operating system has a kernel that does the scheduling; Kubernetes has the scheduler.

That's kind of like that concept, but it treats nodes as the resources across which it can distribute those workloads.

Another characteristic of an operating system is that it has a file system, and Kubernetes has a primitive for file systems. Operating systems have primitives for configuration, like /etc.

That's where we typically have all the configuration; Kubernetes has ConfigMaps. The analogs go on and on and on.

So if you work within Kubernetes, within the bumpers of Kubernetes, then you can achieve automation much more easily.

So the problem with Terraform that we have today is how do you orchestrate the actual business logic of deploying a module. Like, if you wanted to do blue-green with Terraform, where does that logic of blue-greenness go in Terraform?

There is no place. Or you could write your own Terraform provider to implement blue-green, or you could do WikiOps.

Basically, you describe in a wiki page how to do blue-green with your Terraform, or you can script that with Ansible, or you can script it with some Bash script, or maybe, you know, the future is actually orchestrating that stuff more with

That's right.

Kubernetes and Kubernetes operators. I totally missed the punch line.

So, to make sure I understand what the value add for this HashiCorp offering is, through this operator for Kubernetes.

Now, I don't know what the difference is between the operator and a custom resource definition.

The operator implements what the custom resource definition describes.
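
The relationship described here (a custom resource declares what you want, and the operator is the code that makes it so) can be sketched as a tiny reconcile loop. This is an illustration only, not the HashiCorp operator: `CustomResource`, `ToyOperator`, and `reconcile` are hypothetical names, and "provisioning" is just a dictionary update.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CustomResource:
    """A declared object: the desired state, shaped by some resource definition."""
    name: str
    spec: dict


@dataclass
class ToyOperator:
    """Hypothetical controller that makes the 'world' match what is declared."""
    provisioned: Dict[str, dict] = field(default_factory=dict)

    def reconcile(self, desired: List[CustomResource]) -> List[str]:
        actions = []
        wanted = {resource.name: resource.spec for resource in desired}
        # Create or update anything declared that is missing or has drifted.
        for name, spec in wanted.items():
            if self.provisioned.get(name) != spec:
                verb = "update" if name in self.provisioned else "create"
                self.provisioned[name] = dict(spec)
                actions.append(f"{verb} {name}")
        # Remove anything that is no longer declared.
        for name in list(self.provisioned):
            if name not in wanted:
                del self.provisioned[name]
                actions.append(f"delete {name}")
        return actions
```

Running `reconcile` twice with the same declarations yields no actions the second time, which is the property that lets an operator run continuously in the background.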

OK, thank you.

So I guess what I'm saying, though, is that you could take this operator, apply it to your Kubernetes stack, and then give it whatever credentials it needs.

And then you supply a manifest saying, like, hey, besides this app,

I also want to run this, you know, I don't know, Amazon Aurora global database instance.

And so when I spin up my app, it says, hey, by the way, before you do this, I need this DB. Then Kubernetes reaches out to AWS through Terraform, creates that Aurora instance, comes back with the endpoint, feeds it into my app, and I have a running app with Aurora, I think, in the picture.

Yeah, thanks, Jim, for bringing that up.

That's a really good point.

So when you deploy your apps, here's an example app we have under Cloud Posse, and we're deploying this with Helmfile. Whether you're using Helm or Helmfile is not as relevant here.

But I wanted to point out some things that are happening here.

One is that we're deploying here a CRD for a gateway.

This is just a generic gateway that happens to be for Istio.

Here we have a virtual service.

And, well, what I could add below here would be a CRD for Terraform.

So here: Terraform, provision this module for me for RDS.

So now we have one document language that handles the configuration and deployment for everything in Kubernetes and everything outside of Kubernetes.

And we can package all of that up in a Helm package. We can actually then have a Helm package for our application that spins up our services in Kubernetes, our backing services like RDS with Terraform, provisions the IAM roles with Terraform, and deploys this CloudFormation from our third-party vendor, all in one thing.
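
As a rough picture of "one document language for everything," the sketch below builds the three kinds of manifest mentioned above as plain Python dictionaries. The Istio `Gateway` and `VirtualService` kinds are real; the `TerraformModule` kind, its `terraform.example.com` API group, and its `spec` fields are made up for illustration and are not the schema of the HashiCorp operator. Specs for the Istio objects are left out on purpose.

```python
import json

# Hypothetical API group and kind; a real Terraform operator defines its own schema.
HYPOTHETICAL_TERRAFORM_API = "terraform.example.com/v1alpha1"


def app_bundle(app_name: str, db_module_source: str) -> list:
    """Return the manifests one chart could ship together for a single app."""
    gateway = {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "Gateway",
        "metadata": {"name": f"{app_name}-gateway"},
        # spec omitted in this sketch
    }
    virtual_service = {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": app_name},
        # spec omitted in this sketch
    }
    database = {
        "apiVersion": HYPOTHETICAL_TERRAFORM_API,
        "kind": "TerraformModule",  # hypothetical kind
        "metadata": {"name": f"{app_name}-db"},
        "spec": {"source": db_module_source, "variables": {"name": app_name}},
    }
    return [gateway, virtual_service, database]


if __name__ == "__main__":
    # "example/rds/aws" is a placeholder module source, not a real module.
    for manifest in app_bundle("shop", "example/rds/aws"):
        print(json.dumps(manifest, indent=2))
```

The point of the sketch is only that in-cluster objects and out-of-cluster infrastructure end up in the same list, so one packaging and review workflow covers both.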

What are the implications of this being Terraform Cloud as opposed to, say, open source Terraform?

So no first hand account.

Of course, this just happened.

Looking over the announcement and the video, the main thing seems to be that it requires two things.

One: state storage.

It works with the Terraform Cloud state storage.

So if you're not using that, that's going to be a deal breaker.

And then two, the next big deal breaker is that it works on the concept of remote runs.

So with Terraform cloud.

It's really cool.

You use the client to trigger runs in the cloud.

So even if your client disconnects, or you close your laptop, or whatever, everything keeps running, and you just reattach to it in the cloud.

And that removes the need for managing the state at all, whether it be in S3 or through your own backend. Yeah, Terraform Cloud is doing that for you. It makes it a black box, more or less, to you, and you don't have to worry about corruption of that, securing of that, and so on.

Quick question, as you've touched on Terraform Cloud: when you're using something like local-exec in Terraform, you'd normally give it something like sh, or whatever your command might be.

Yeah. My team operates on multiple OSes, Unix-based and Windows-based.

That's always been a challenge for us with that type of thing, ensuring we get the right type of commands. How do you handle that in Terraform Cloud?

I can answer it as much as I know from our experiments.

Also, let's see here if John is online. No, John's not online.

All right.

So to date my understanding is that Terraform Cloud only supports Linux-like runners, and those are hosted.

So the Terraform local-exec command runs in that Linux container.

The other problem is that they don't allow you to bring your own container.

So you're restricted to the commands that exist in that container.

So that's why what some people do is have a local-exec that goes and downloads the AWS CLI.

You have a local-exec, excuse me, that goes and downloads some other binary you depend on.

Then the other way it works is that, as the convention, you have your terraform.d folder, and you add that to your source code.

And in there you can put any providers that your modules depend on.

OK So that partially answers your question.

That's how it works for Linux-based plan and apply and stuff like that.

I don't know if Terraform cloud has a solution that works on Windows yet, but I'm guessing it.

So I just.

Just as an observer of this,

I'm impressed and surprised by how much Terraform is happening on Windows, on Windows workstations and in Windows environments, because it's something I haven't touched in 15 years.

Yeah, basically surprised.

And I look at my colleagues who use Windows daily and say, how? But, I mean, I have nothing but respect for Microsoft these days. They've really turned a corner in my book, both supporting open source and providing awesome tools.

I mean, Visual Studio Code is arguably one of the best IDEs out there, and then there's the Windows Subsystem for Linux.

That's pretty cool.

And they strapped that on.

And I think they've made that more first class and improved it.

So yeah, I get how it's possible. It's cool.

The Nadellaissance.

Yeah, or what are they calling it?

The Nadellaissance? Unpack that one for me.

I don't get that.

Satya Nadella is the new CEO, of course.

OK, got it.

He's turned the company around.

Yeah, that makes sense.

All right.

Well, then a personal shout-out here: if you guys wouldn't mind nominating me for HashiCorp Ambassador, that'd be pretty awesome.

I know I reached out to some of you directly, but I'll share this in the Slack channel.

I think that would help us reach more people.

So your nomination would be appreciated.

I just sent the link to the office hours channel.

Right, then another one.

This is one I don't know if I'm going to be able to answer in a way that's going to be satisfactory, because I don't think we've solved it, or that anybody has entirely solved it.

But Ben was asking in one of the channels.

And I cross-posted that link to his question in office hours here.

This one.

So hey folks.

I'm looking for some advice on how people are tackling the chicken and the egg problem with secrets management.

I had the idea to use Terraform to provision Vault, but with this comes the question: where do I get the secrets needed within Terraform scripts?

Of course, I'd love to use Vault for that. One solution I've heard is to place the scripts in a super secret repository along with the secrets, with access restricted to a select few.

And while I guess this works, something about it feels dodgy, but I guess these initial secrets have to be stored somewhere.

So there's a few questions here.

There's a few topics here.

So one is what to do about the Vault master unlocking secrets specifically, and there's actually a great answer for that.

But then there's more, like what to do generally about this cold start secret, because it's going to vary case by case on the type of secrets and the type of applications you're working with.

And we can't just generalize all that.

So with Vault, one of the nice enterprise features that they released last year at some point into the Community Edition is automatic unsealing with KMS keys.

So I believe the Vault Terraform module that's GA on the HashiCorp website supports that automatic unsealing with KMS keys, and then it's just a challenge of IAM.

So you've got to set up your IAM policies correctly to guard control.

But then let's talk about the other things that he talks about like you know one.

One thing is to have your secrets actually in a Git repository in plain text, with access limited to a select few.

I think we can all categorically say no to that.

We've all learned that this is not a good thing to do.

There's an augmentation of this that we see.

It's kind of like this: you can encrypt those secrets using the public key, the cluster then has the private key, and the cluster is able to decrypt these at deployment time, or you have that baked into the process and the CI/CD pipelines are able to decrypt them.

I still don't really like that solution too much.

The reason why is that it's opaque.

Not that secrets can't be opaque, but I mean here you're committing encrypted binary data.

If somebody inadvertently (and I'm not saying by force, I'm not saying as a bad actor) just accidentally messes up another secret,

I can't tell when I approve that pull request.

I approve this broken deployment of secrets going out there.

I don't think that is that great either.

So ASM, Amazon Secrets Manager,

I think they have obviously thought a lot about this when they implemented that and how it works.

So I want to talk about another key problem here with secrets and that's rotation right.

So you can't just rotate the secret and expect your cluster not to fail if you don't support concurrent secrets being live at the same time,

and then lifecycling out the old ones.

So you almost need to have blue-green secrets, kind of, or you need to have, for any given period of time, multiple secrets online and working.

And this is how Amazon's STS API works: when it issues the temporary credentials, your old credentials will eventually expire.

They don't expire at that moment.

And you have some overlap where both of them are valid.

And that's to have graceful rollovers.
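
The overlap idea can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not how STS or any real service is implemented: `RotatingCredentials`, `rotate`, and `is_valid` are hypothetical names, and time is passed in explicitly so the behavior is easy to follow.

```python
class RotatingCredentials:
    """Toy credential set where a rotated-out secret stays valid for a grace period."""

    def __init__(self, overlap_seconds: float) -> None:
        self.overlap_seconds = overlap_seconds
        self._expires_at = {}  # secret -> timestamp after which it is rejected

    def rotate(self, new_secret: str, now: float) -> None:
        # Older secrets are not revoked immediately; they get (at most) the
        # overlap window so running workloads can pick up the new value.
        for secret in list(self._expires_at):
            self._expires_at[secret] = min(
                self._expires_at[secret], now + self.overlap_seconds
            )
        self._expires_at[new_secret] = float("inf")

    def is_valid(self, secret: str, now: float) -> bool:
        return now < self._expires_at.get(secret, float("-inf"))
```

With a 60-second overlap, a secret rotated out at t=100 is still accepted at t=130 alongside its replacement and rejected after t=160, which is the "both of them are valid" window described above.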

So in whatever solution we come up with here we need to take that into account, the other is we need to have some certainty of what is deployed and being able to point to a change that triggered it possibly.

And this is where, like, an idea:

Let's just say, security aside, man,

everything would be just so much easier if we didn't have to encrypt these,

if we could just have these secrets be in plain text in Git.

It would just flow with the rest of our CI/CD pipelines, man.

And if everybody was just honest and nobody did bad things, that would simplify everything.

The alternative to this that Amazon Secrets Manager offers, which makes it so easy, is that you get a version ID that refers to that secret.

And that version ID is just an identifier with nothing sensitive in it, and you commit the version ID to your source control.

And that's how you roll out the changes.

So that gives you a systematic way of doing that.
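
A minimal sketch of that workflow, assuming an in-memory stand-in for the secrets service: `VersionedSecretStore` and the `deploy_config` dictionary are hypothetical names used only for illustration. With the real service, the lookup would go through its GetSecretValue call, which accepts a secret identifier and a version ID.

```python
import uuid


class VersionedSecretStore:
    """Toy stand-in for a secrets service that keeps every version of a secret."""

    def __init__(self) -> None:
        self._values = {}  # (name, version_id) -> secret value

    def put(self, name: str, value: str) -> str:
        # The version ID is random, so it reveals nothing about the value.
        version_id = str(uuid.uuid4())
        self._values[(name, version_id)] = value
        return version_id

    def get(self, name: str, version_id: str) -> str:
        return self._values[(name, version_id)]


store = VersionedSecretStore()
v1 = store.put("db-password", "first-password")
v2 = store.put("db-password", "second-password")

# This is what would be committed to source control: a pointer, not the secret.
deploy_config = {"secret_name": "db-password", "secret_version_id": v2}

# At deploy time, the pipeline resolves the pointer to the actual value.
resolved = store.get(
    deploy_config["secret_name"], deploy_config["secret_version_id"]
)
```

Rolling back is then a one-line change in Git (pointing `secret_version_id` back at `v1`), which also gives the audit trail of which change triggered which deployment.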

Now how do we get the secrets into ASM in the first place?

I don't have the answer,

ideally, for that.

I think some kind of a front end for that, integrated with, you know, single sign-on, would be nice.

Obviously, you can still use the Amazon web console for that.

I think something more purpose-built for that would be nice.

And then lastly, I'd like to introduce another concept that I'm finding pretty awesome: the external secrets operator.

So the external secrets operator for Kubernetes, and this is now, again, more of a Kubernetes-centric solution, supports multiple backends, of which 1Password is one.

And I find it kind of cool, because this allows you to present a really easy UI for the company

and the developers to update their secrets.

Is this going to work in a mega corporation with 2000 engineers.

Yeah, it's probably not the right solution.

But you know, for your smaller startup with 15 to 25 engineers this is probably going to save a world of hurt and complexity.

Just by using 1Password for a lot of those application-specific secrets.

These are the secrets I'm talking about, like integration secrets,

third-party APIs, and stuff like that.

Another option is to just get out of the secrets management business altogether and use something like VGS.

So this is Very Good Security, and they become the third party that you entrust to manage secrets, and then you just have tokenized values everywhere you need a secret.

You can commit those tokenized values to your source control and deploy your code here.

But the secrets themselves are with VGS, and the way it works is basically you export a proxy setting.

And that makes your requests go through

VGS. Now, would you do this for internal services and their ingress and egress? Again,

no, that probably doesn't make sense.

But where it makes more sense is if you have third-party integrations that you need to handle securely.

This might be an option.
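
The tokenize-then-proxy pattern can be sketched without any network code. This is a toy illustration, not the VGS API: `ToyTokenVault`, `tokenize`, and `reveal_in` are hypothetical names, and the "proxy" is just a string substitution applied to an outbound request body.

```python
import secrets


class ToyTokenVault:
    """Toy third party: holds the real secrets and hands out opaque tokens."""

    def __init__(self) -> None:
        self._by_token = {}  # token -> real secret value

    def tokenize(self, secret_value: str) -> str:
        # The token is random, so it is safe to commit or log.
        token = "tok_" + secrets.token_hex(8)
        self._by_token[token] = secret_value
        return token

    def reveal_in(self, outbound_body: str) -> str:
        # What an egress proxy would do: swap tokens for the real values
        # just before the request leaves for the third-party API.
        for token, value in self._by_token.items():
            outbound_body = outbound_body.replace(token, value)
        return outbound_body


vault = ToyTokenVault()
api_token = vault.tokenize("s3cr3t-api-key")

# The application and its config only ever see the token.
request_body = f"Authorization: Bearer {api_token}"
sent_upstream = vault.reveal_in(request_body)
```

The application code never holds the real value; only the party running the proxy does, which is the trade-off being described: less exposure in your own systems in exchange for trusting that third party.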

All right.

So yeah.

Any other tips on secrets management things that you're doing.

Andrew Roth,

do you have some ideas or suggestions?

I posted a link in office hours to GitHub: bitnami-labs/sealed-secrets.

Gotcha. Yeah, that's what we would be using if we had this problem at the moment.

Yeah, I remember looking at this.

Yeah, this is that same concept, basically, where you encrypt this data. They provide a CLI tool, kubeseal, to make it very easy to encrypt the secrets with the public key of the cluster.

So you can just commit the sealed secret to source control

and be done with it.

It's pretty hard to decrypt it.

And it creates a regular Secret inside the cluster.

So that's a good tip.

What's the use case for this?

Well, the same one.

Basically you need,

let's say you need the database credentials for RDS or something, and you're not using RDS IAM authentication.

Well, how do those credentials get to the cluster?

So this allows you to encrypt them.

This is a custom resource of type SealedSecret, where you have this encrypted data that looks like this.

So here are the credentials for my service.

And then you deploy this: you deploy the operator inside of Kubernetes, and that operator looks for this custom resource that looks like this.

And when it sees it, it decrypts it

and writes a Secret called mysecret into that namespace here.

So here's the output of that operator.

It created a new resource of type Secret, where

here's the actual password, decrypted from here.

So this is secure to put in Git.

This is not secure to put in Git.

This is what you're asking about.

This is what your applications need to consume.

Now, this would only be able to be leveraged with Kubernetes.

Yeah, this is going to be a Kubernetes-specific solution.

But this is, again, another reason why Kubernetes makes so many things easier at this point, if you're not already seriously thinking about, you know, starting to move towards Kubernetes as a holistic approach to your entire business.

Thank you again.

Yeah It's like imagine that you could script this stuff.

You could orchestrate something like this, you could build something with Ansible or whatever, yada, yada, yada.

But whatever you build is going to be a snowflake that won't really work in anybody else's environment.

But when you deploy an operator like this, the operator is designed to work with Kubernetes.

So if you have Kubernetes and you like this strategy,

OK, let's just install that operator.

And now you get that functionality.

So ultimately extensible.

So the DOJ is using this, like, full scale.

50,000 users.

Well, that's well that's a vote of confidence right there.

Maybe. But Very Good Security, then:

in other words, is the distinction that VGS is external to the cluster, in other words, the key?

It is. It's a third party.

It's a SaaS service.

OK. Yeah.

And the idea is that you basically get turnkey compliance.

You could say this is like Stripe; all Stripe does is credit cards.

Well, what if you wanted a service like Stripe that works with arbitrary data and any API endpoint?

That's VGS.

So you don't have to worry about any PHI, PII, PCI, any of that stuff,

depending on how you implement VGS.

Wow, nice.

So you basically get a token, and they support having your customers submit the data through a form, where the POST goes directly to

VGS. So you never even touch it.

You never even see it.

And then you get back a token that represents that bucket of data.

You can now operate with that in your own systems.

So what's the story behind this Terraform operator from HashiCorp?

Yeah, so we briefly just covered it.

Oh, did you.

I'm sorry.

I wasn't thinking about that.

Yeah wait a minute.

You missed a great show: epic, full-on demo, all of the bells and whistles.

You're joking.

No, it was just really brief.

Honorable mention here that it only works with Terraform Cloud. Honestly, with the edge cases you've got to handle with Terraform, I get

why they made it that way, even though I personally wish it just worked with open source Terraform.

There are three or four other operators for Terraform on Kubernetes that I've seen.

But with this coming from HashiCorp, I think this is going to be the one that ultimately wins. Unfortunately, it requires HashiCorp for state storage.

And it actually uses HashiCorp's remote run capabilities for all the plan and apply.

And that's also how it works with approval steps and triggers and all the other functionality that Terraform Cloud brings.

Gotcha So it's a kind of a simplified one stop shop as long as you sell your soul to us.

Yep. And as a company that's recently implemented Terraform Cloud-like functionality by using a generalized CI/CD platform, in our case Codefresh,

it was, yeah, it was hard.

And I can't say that we solved it, especially as well as Terraform Cloud.

But what we did was get consistency.

So that it works like the rest of our pipelines and everything's under one system.

And we have full control over what that workflow looks like, versus what you get with Terraform Cloud. One thing I saw that came up:

There's a guy, Sebastian, who is a friend of mine.

His company, called Scalr, has an

early product here in which they reverse engineered the Terraform Cloud API, basically.

So you can use Scalr as an alternative to Terraform Cloud.

I think what's interesting about that is the day that Atlantis, or some project like Atlantis, comes out as a close-to-Terraform-Cloud alternative. Why did they do this?

So you don't have to make a contract with HashiCorp? Or is it like you pay for it hosted on-prem, or as a SaaS, or, sorry,

pay Scalr?

So they have their own user base.

I can't speak to the business objectives per se, but what they're trying to do is build an alternative to Terraform Cloud based more on open source solutions. Like, I believe it works with,

OK, Open Policy Agent. I believe it allows bring-your-own-container.

So that makes it compelling,

since with Terraform Cloud you can't do that.

Also, let's see here.

I think it's a SaaS,

whereas I think this is actually entirely self-hosted.

So I don't think this is this.

I haven't gone into pricing, at least.

OK. So yeah, I just note it as an honorable mention; that's what somebody else is doing out there.

There's also Spacelift, spacelift.io, but I don't think

Spacelift implements the Terraform Cloud API.

OK. Whew, all right.

Well, we are overtime here.

So I think it's time we got to wrap up for today.

I appreciate your time.

I think we covered all the talking points.

So here's some next actions for you if you haven't joined the call before.

Make sure you take some of these, including registering for our Slack team. Go to Cloud Posse slack.

We also have a newsletter. I haven't been able to send it out as regularly as I'd like, but that's Cloud Posse newsletter.

So you can block off the time.

I think that's the best way to be a regular on this. Go to cloudposse.com/office-hours for that.

If you ever miss an episode,

you can always go to Cloud Posse podcast.

That'll redirect you to podcast.cloudposse.com, which is where we syndicate all of these office hours recordings.

So if you want to catch up on past episodes and what we've talked about, that's a great place to go.

Everyone is more than welcome to add me on LinkedIn so we can stay in touch.

I share a lot of interesting information on LinkedIn.

So it's a great way to stay in touch professionally speaking and if you're interested.

What was that.

Oh, sorry.

Just a question.

You post all these office hours.

Do they all have, like, summaries or descriptions?

So when you see like 30, 40 videos, you're like, oh, yeah.

Yeah, that's a good point.

It's a labor of love.

I would love to summarize each of these videos and add that there.

I will say that what we are doing is

we do have the automated machine transcription.

So if there's a topic you're interested in looking for,

you can probably go to Google and do, like, site:cloudposse.com and search for, like, Prometheus.

And I think, let's see if this works. Well, it failed here on this one anyway.

Well, they are searchable from the transcripts.

But it's not ideal.

I know it's not the answer.

You're looking for.

So thanks for that feedback, though.

I'll take that to heart and see if we can't start adding some show notes for each one of these sessions,

so they're more discoverable.

And then, yeah, go to see accelerate or slash quiz if you want to find out what it's like working with Cloud Posse.

Thanks for your time, guys.

See you guys.

Same time, same place next week.

Take care. What was the first thing besides Scalr, or the last thing after Scalr?

Oh, that was Spacelift, spacelift.io. OK.

Thank you.

I'll post that in the office hours channel.

Adios. Thanks very much for answering my questions this evening.

Goodbye. Yeah.

Come again.
