Public “Office Hours” (2020-02-21)

Erik Osterman | Office Hours

1 min read

Here's the recording from our DevOps “Office Hours” session on 2020-02-21.

We hold public “Office Hours” every Wednesday at 11:30am PST to answer questions on all things DevOps/Terraform/Kubernetes/CICD related.

These “lunch & learn” style sessions are totally free and really just an opportunity to talk shop, ask questions and get answers.

Register here: cloudposse.com/office-hours

Basically, these sessions are an opportunity to get a free weekly consultation with Cloud Posse where you can literally “ask me anything” (AMA). Since we're all engineers, this also helps us better understand the challenges our users have so we can better focus on solving the real problems you have and address the problems/gaps in our tools.

Machine Generated Transcript

Let's get the show started.

Welcome to Office hours.

It's February 19, 2020.

My name is Erik Osterman and I'll be leading the conversation.

I'm the CEO and founder of Cloud Posse.

We are a DevOps accelerator.

We help startups own their infrastructure in record time by building it for them and then showing them the ropes.

For those of you new to the call the format is very informal.

My goal is to get your questions answered.

So feel free to unmute yourself at any time if you want to jump in and participate.

If you're tuning in from our podcast or YouTube channel, you can register for these live and interactive sessions by going to cloudposse.com/office-hours.

We host these calls every week, and we automatically post the recording of each session to the office-hours channel, as well as follow up with an email.

So you can share it with your team.

If you want to share something in private, just ask, and we can temporarily suspend the recording.

With that said, let's kick it off.

So I got a couple of talking points lined up here today.

If there aren't any other questions.

First, some practical recommendations on what to migrate to Kubernetes: a checklist I put together, because it comes up time and time again when we're working with customers.

So I want to share that with you guys.

Get your feedback: did I miss anything? Is anything unclear? That would be helpful.

Also John on the call has a demo.

A nice little productivity hack for multi-line editing in VS Code.

I've already seen it; it's pretty sweet.

So I want to make sure you all have that in your back pocket.

Also Zach shared an interesting thing today.

I hadn't seen before, but it's been around for a while.

It's called Goldilocks, and it's a clever way of figuring out how to right-size your Kubernetes pods.

So we'll talk about that a little bit.

Always on the list, but we never seem to get to them, are our DevOps interview questions.

So yeah, conversation around that would be valuable as well.

All right, so before we get into any of these talking points.

I just want to go over, or cover, any questions that you might have using Cloud Posse technology, or just generally related to DevOps. I got a question.

All right.

Fire away.

In a scenario where you have multiple clusters, and you may not be using the same versions across these things, let's say you're actually managing four clients, how do you deal with the versions of kubectl, being able to switch between cluster versions, and so on?

Are there any strategies out there that you can actually utilize?

Yeah, I'll be happy to help answer that. Before I do, does anybody have any tips on how they do it?

All right.

So what we've been doing, because this has come up recently specifically for that use case that you have.

So I know, Dale, you are already very familiar with geodesic and what we do with that.

And that solves one kind of problem.

But it doesn't solve all the problems.

So for example, like you said, you need different versions of kubectl.

Well, OK.

So kubectl: I want to talk about one specific nuance about that.

And that is that kubectl is designed to be backwards compatible with, I believe, the last three minor releases or something.

So there's generally no harm in at least keeping it updated, but that still might not be the answer you're looking for.

Now, for that, I know we had to address the situation separately.

So that's why, what we do is under Cloud Posse packages.

So this is github.com/cloudposse/packages: we distribute a few versions of kubectl pinned at a few minor releases.

Right now I think we only have the latest three versions, because we only started doing this relatively recently.

But the nuance of what we're doing with our packages is using the alternatives system that Debian originally came up with.

And it's also supported on Alpine.

So basically what you can do is you can install all three versions of the package.

And then you point the alternative to the version that you want to use to run those commands.

So that's one option, and that requires a little bit of familiarity with the alternatives system and how to switch between versions.
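To show the mechanism being described, here is a minimal sketch of what the alternatives system does under the hood: it manages a symlink on the PATH pointing at one of several pinned binaries. The paths and version numbers below are illustrative stand-ins, not the real Cloud Posse package layout; on Debian-family systems the real commands are `update-alternatives --install` and `update-alternatives --config`.

```shell
# Stand-ins for pinned kubectl packages (a real setup installs actual releases)
mkdir -p /tmp/alt-demo/bin
printf '#!/bin/sh\necho "kubectl v1.14"\n' > /tmp/alt-demo/kubectl-1.14
printf '#!/bin/sh\necho "kubectl v1.15"\n' > /tmp/alt-demo/kubectl-1.15
chmod +x /tmp/alt-demo/kubectl-1.14 /tmp/alt-demo/kubectl-1.15

# "Selecting an alternative" just repoints the symlink; callers keep running `kubectl`
ln -sf /tmp/alt-demo/kubectl-1.15 /tmp/alt-demo/bin/kubectl
PATH="/tmp/alt-demo/bin:$PATH" kubectl
```

Switching versions later is a one-line relink, which is why no Makefile or wrapper script has to change.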

I think there's actually a menu now where you can choose which one you want.

But the other option, which we use for Terraform, is for when you have a bunch of 0.11-compatible projects and some 0.12-compatible projects, and you don't want to be switching between alternatives to run them.

So what we did there was we actually installed them into two different paths on the system.

So like /usr/local/terraform/0.11/bin, and that's where we stick the Terraform binary for that version.

And then /usr/local/terraform/0.12/bin/terraform would be the binary there.

So just by changing the search path based on your working directory, or wherever you're at, you can use the version of Terraform you want without changing any Makefiles or any tooling like that which expects it to just be terraform in your search path.
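Here is a hedged sketch of that per-version path layout. The directory names are illustrative, not the exact Cloud Posse layout; the point is that each pinned release lives in its own bin directory and PATH order picks the one you want.

```shell
# Stand-in binaries in per-version directories (a real setup would place the
# actual Terraform release binaries here)
mkdir -p /tmp/tf-demo/terraform-0.11/bin /tmp/tf-demo/terraform-0.12/bin
printf '#!/bin/sh\necho "Terraform v0.11.14"\n' > /tmp/tf-demo/terraform-0.11/bin/terraform
printf '#!/bin/sh\necho "Terraform v0.12.21"\n' > /tmp/tf-demo/terraform-0.12/bin/terraform
chmod +x /tmp/tf-demo/terraform-0.11/bin/terraform /tmp/tf-demo/terraform-0.12/bin/terraform

# Prepending the per-version bin dir selects the release; tooling that just
# calls `terraform` keeps working unchanged.
PATH="/tmp/tf-demo/terraform-0.11/bin:$PATH" terraform --version
PATH="/tmp/tf-demo/terraform-0.12/bin:$PATH" terraform --version
```

In practice the PATH prefix would be set per project, for example by a direnv `.envrc` or the shell profile inside the container.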

I can dive into either one of those explanations more if you want to see more details on that.

Is that along the lines of what you're looking for, or are you looking for something else?

Yeah, well, so there is a tool I came across before I really got into geodesic, called tfenv, which handles Terraform so you can actually switch between the versions.

Yeah, I was hoping that there was something that I hadn't found, similar to that, where I could do a switch.

Gotcha, yeah.

There could be.

Like, is it just a thing for Terraform?

Yeah, I know your question, Dale, was related to kubectl.

But I mean, I'm just speaking for me personally.

I don't like the env switchers, the rbenvs of the world, like the tfenv version managers and stuff like that, for us.

But obviously, it's a very popular pattern.

So somebody else probably has a better answer on that one.

Yeah OK.

So wait, there's one other idea on that.

Or let me just clarify the reason why I haven't been fond of the purpose-built version managers for software: you need an rbenv for Ruby or whatever, and then you need a tfenv for Terraform, and then you need one for kubectl, and all these things.

What I don't like about it is where does the madness stop.

Like we technically need to be versioning all of this stuff, and that's why I don't like those solutions.

That's why I like the alternatives system: because it's built into how the OS package management system works.

Do you foresee taking a similar approach for Terraform and version switching? Something you could do in the future?

Well, so, yeah.

So sorry, I kind of glossed over it, or was hand-waving, so let me go to the GitHub Cloud Posse packages repo.

Now, I don't know if you're using Alpine; Alpine has become a little bit less popular lately with all the bad press about performance.

So let's see.

So if we go into the vendor folder here.

So we literally have a package for kubectl 1.13, 1.14, 1.15.

So we will continue doing that for other versions, and the way it works with alternatives is basically this.

And you don't have to use Alpine to do this.

This is supported on the other OS distributions too.

But you do have to install the alternatives system.

And then when you have that, you can switch.

So OK.

So this here shows how to set up the alternatives that you have.

Then there's another command you use to select between the versions available.

And once you've installed all the versions it's very easy to select between those.

I just don't remember what it is off the top of my head.

It feels like a fuzzy-finder kind of thing.

Yeah, I think it uses fzf, or whatever that is.

Yeah, you mean share your screen?

Oh, yeah.

It was.

Thank you.

Thank you guys.

It's a public service announcement: I always forget to share my screen, and I talk and wave and point at things.

Yeah, so here in Cloud Posse packages, under vendor, we have all our packages.

This is where we distribute the minor pinned versions of kubectl, and then we also do the same thing for Terraform.

All right.

Good question.

Anybody else have any questions? Dale, did you have one more?

If no one else is going to ask.

Also, this just reminded me, but it doesn't look like Andrew Roth is on the call.

I mention this because Andrew has a similar tool to geodesic that he's calling Dad's Garage, and there was a conversation about it.

If you look in, I forget what channel it was, release-engineering: if you look in the release-engineering channel, you'll find those conversations.

Yeah, next question, Dale.

So has anyone actually successfully done any multi-architecture builds for Docker?

Let's say that one more time: multi-architecture builds?

Oh, interesting.

So you're saying like Docker for ARM and Docker for amd64, something like that, correct?

Yeah Yeah.

I'd be curious about that.

No personal experience.

I've been trying to do that kind of build for a while, and it's been a frustrating process.

Let me guess: is this for your Raspberry Pi cluster?

Yeah, so I figured... well, this is part of it.

But I just came across it where I'm really running into issues with my Raspberry Pi 3 B boards, where they're using ARMv7, and a lot of the packages are built to support ARMv8, which is 64-bit; ARMv7 is 32-bit.

And I've run into the issue where it just would not outright launch.

I figured out it was the Docker architecture that was the issue, where some images will support it, some will not.

Unless you're planning to do your own build and all that, you know.

So I started to dig into it a little bit more, and it's like a rabbit hole.

Yes, it was.

It certainly was.

Yeah, all this stuff is pretty easy.

So long as you stay mainstream on your architecture, what everyone else is doing.

But as soon as you want to do a slight minor variation.

The scope explodes.

Yeah, I'm at a point where I might as well upgrade to the Raspberry Pi 4s.

Yeah, and if not, better to just forget about it.

No, that's right.

No, but I did learn something new.

So that helps.

Yeah Yeah.

Cool, cool.

Any other questions before we get to a demo.

John, do you want to get your demo setup.

And let me know when you're ready to do that.

The VS Code demo; I'm ready to roll.

I'm ready to roll.

You ready to roll.

OK, let's do it.

I'm going to hand over controls.

Go ahead.

Oh, OK.

So just a quick introduction to everyone here.

So John's been a longtime member of SweetOps.

He's hardcore on Terraform and developer hacks and productivity.

He gave a demo last week that was pretty interesting.

Yeah, check it out.

That was on using Terraform Cloud and the new run triggers that are available in Terraform Cloud.

So having him back on today: he's going to show a cool little hack for how you can do multi-line editing.

I've seen it before, and I kind of forgot about it.

And it's something where you almost think, "I don't need that."

But when you see it, you'll see that you need it and want it.

And as a quick follow-up to the HashiCorp demo last week on run triggers: I did talk with them today, and they are looking at doing a few UI enhancements; as noted, it is a beta right now.

So it's not meant for production use.

But we did talk about some use cases and some potential UI improvements and things like that.

But they are definitely excited about the feature.

So let's go over.

We see some stuff going on there, but as far as multi-line editing goes:

So I'm pretty big on efficiency.

And I hate wasting time.

I was managing a group of developers, and I would always be frustrated watching them edit something.

Because I'd tell them: hey guys, if we're doing Terraform, let's say, to keep it relevant, go ahead and put in these outputs; let's do an output for, let's say, all of the available options here.

Let's do an output for each of those.

And it would basically go something like this.

All right: output something; output... if I could type.

Typing is hard.

Something else.

And so you're just going through and slowly typing all of these outputs, which are all, basically, let's say, outputs from the Terraform documentation.

And so, you know, I started showing them these little performance tweaks, performance hacks, to kind of improve on that speed.

And so what I'll do here is actually take all of these arguments for EC2, let's say, if we were wrapping an EC2 instance.

And we wanted to provide all of these outputs; and this is, I mean, right now, in the case of Terraform, but it could be anything.

So we have a couple sections here.

The "Note" sections we don't really care about.

So I'm just going to highlight the "Note", the colon, and the space, and use commands to actually get rid of those.

So now we're basically dealing with just outputs.

Now, the key to multi-line editing is to find something that's common.

So the shortcut by default in Visual Studio Code (your shortcuts may be different) to do a single multi-line selection is Command+D.

And so if I use Command+D on an "a", it's going to go through and select the next "a"; if I use Command+Shift+L, it's going to select every "a" in the document that I'm currently editing.

So it's very important to find something that's unique: if I just do "a", you can see here, like "EBS-optimized", "single tenant", "EBS-backed", that's not really going to give me what I'm looking for.

What I'm really looking for is to take the left side of this as the label and the right side as the description.

And so I can get a little closer by selecting the space around the dashes, but then I still end up one off, because normally they do just "Optional", but in this case they did "Optional" with a dash.

So it kind of catches this one as a little bit off.

So when you can't find something that's unique across all of them.

One thing that every document has is line breaks.

And so if you have basically everything on a single line you can find all the line breaks and just use that.

And in this case, I'm going to actually get rid of these empty line breaks here.

So we can just have a clean structure.

And so if I basically go to the end of the line, Shift+Right Arrow, Command+Shift+L, and then go back with my left arrow, I'm basically editing every single line.

Now I'm just at the end of all the lines.

So now I can move around and operate as if I'm on a single line, but I'm editing every line.

So if I'm doing an input here, I can just type as I normally would.

And at this point I'm inputting all... I don't remember how many variables it was, but all 30 or 40 variables here, and now I have all of my descriptions filled out.

So the next thing we can do is: we know that for, let's say, all of these "(Optional)" markers, we don't want to have that in the documentation, because we're going to automatically output that; we can just highlight all of the "(Optional)"s, Command+Shift+L, and delete them.

And then we can give it some default value as well.

And so the key is actually just moving around: Command to go to the front or back of the line, Option to go by word, as opposed to going by individual characters.

Because if you see line six here: I'm here on "start".

But if I hit Option+Right Arrow the same number of times, here I'm in a different position, because the length of the description changes.

So if I need to get to the beginning or to the end, I need to use my Command keys in order to more quickly get to that spot.

But this is a very quick way; as you can see, even with all the talking, I've been able to add in all of these values pretty quickly, edit them, and get them into a massaged place.

And now I can actually go in and tweak anything that I need to tweak.

There are times like this where you're not going to have as clean of a set of data, I guess you could say, so you may have to come in and make some manual adjustments or something like that, just to get this cleaned up.

But that's it.

Any questions around that.

I'm happy to answer.

But I just found this to be something very important when I started creating modules, especially when I needed to output a whole bunch of outputs or have a whole bunch of inputs.
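To make the before-and-after concrete, here is a hypothetical sketch of the transformation (the attribute name and resource address are illustrative, not from John's actual screen): a line pasted from the provider docs becomes an output block.

```hcl
# Before the edit, each pasted docs line looks like:
#   ami - The AMI to use for the instance.
# After the multi-cursor edit, every such line is an output block:
output "ami" {
  description = "The AMI to use for the instance."
  value       = aws_instance.this.ami # resource address is illustrative
}
```

The same trick works in reverse for variable blocks when wrapping inputs.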

Yeah, that was really helpful.

I've had that problem with outputs, especially when you're writing modules and you want to basically proxy all the outputs from the upstream resource back out of your module.

I think this would save a lot of time.

Any questions, guys.

So, that was good.

One thing I've seen as a pretty common trend: if you want to output multiple attributes of a single resource, people define them individually.

Something I started doing was just outputting the entire resource itself.

Letting the end user choose from that map what they want to interact with.

So is that a bad practice.

I don't know that I've really seen it in public modules, but with my VPC module I pretty much just output the entire VPC resource.

And then you can access it with dot notation just like you would.

The actual resource, yeah.

So you're on Terraform 0.12, correct?

So this was not something we could do in 0.11.

And so you had to output every single one of these.

I also switched to doing a little bit more of just returning the resource.

It's a lot better.

It's a lot easier to manage and maintain on that front.
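A minimal sketch of the whole-resource output pattern being discussed, possible in Terraform 0.12 and later (the resource and module names here are illustrative):

```hcl
# Export the entire resource once instead of proxying every attribute:
output "vpc" {
  value = aws_vpc.this
}

# Consumers then use dot notation on the module output, e.g.:
#   module.network.vpc.id
#   module.network.vpc.cidr_block
```

The trade-off is that the module's interface becomes the whole resource schema, so any attribute rename in the provider is visible to consumers.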

But I guess there are some cases where you may want to adjust those things like if you have a case where you need to do.

Let's say, in our module... let's say you have an AWS instance, and you have multiple of those, something like that.

You may not want to output each of those as an ID, and you may be using count just to hide it.

So this is where you kind of get into that place of doing these sorts of things to bring them into one output.

So I think there are some cases where you may want to do some of these custom outputs.

But yes, in general, I do like that you can just return the resource.

I've got a question along those lines.

So does anyone know how we can do count.index when we actually create resources with for_each?

There are actually a few ways with for_each; it kind of depends on your scenario.

But there's a few different approaches.

I think actually the documentation is generally pretty good here on that.

So if I can find it real quick; it's always in a few different spots here.

It might be under Expressions.

Finding it is always difficult for me.

Anybody remember where it is? It's under "for" expressions, I think; that's how you find it.

Yeah, so my question here is, you know, why would I want to use for_each.

You know, when you are using count and then you move things around, maybe reorder the index in a list, or maybe remove some items, count actually messes up the whole sequence.

So if you're using for_each, it creates, you know, results for each item separately.

So it's kind of a set of things, or maybe a map of things, now.

But I still want to utilize the index. Say, for example, if there is an array, a list which has five items in it, I want to iterate on each of the indexes, like 0, 1, 2, 3, 4.

So I know we were able to do it with count; I'm trying to find a way to do something similar with for_each.

So, I mean, for_each returns a map.

So what you could do is, I think, use the values function to access it.

Basically, that will return a list of the values and strip out the keys, and then from there you can call by index.
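As a sketch of what that looks like (the names here are hypothetical): for_each keys resources by a stable name, and values() restores positional access when an index is still needed.

```hcl
locals {
  subnets = {
    a = "10.0.1.0/24"
    b = "10.0.2.0/24"
  }
}

resource "aws_subnet" "this" {
  for_each   = local.subnets # keyed by name, so removing "a" never renumbers "b"
  vpc_id     = var.vpc_id
  cidr_block = each.value
}

output "first_subnet_id" {
  # values() returns the instances ordered by key, giving list-style indexing
  value = values(aws_subnet.this)[0].id
}
```

Relying on the key order of values() is fragile if keys are renamed, so the map keys themselves are usually the better handle.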

Oh, OK.

I tested that out. This page here actually goes through... there was a place where they gave a pretty solid breakdown.

Yeah, right here.

I linked to this in the chat; it's a pretty solid breakdown of when and why you would want to use one over the other.

And it kind of shows you when to use count and why you would want to use count.

In this case, you know it's something that is kind of case by case.

Right you can use count for certain things.

But then there's certain things you may want to use a for each loop for.

So I would say it's case by case.

It really depends.

But I do agree that, in essence, I think everybody's used to count from 0.11.

So a lot of the time, when you're transitioning and you want to use for_each, but then another resource references the original one, you kind of get this mismatch.

So where possible, I've been trying to convert from count to for_each and use the keys instead of the indices.

So I think the problem occurs when you have previous code from 0.11, where you already have all the logic with count, and then switching it over is just an extra pain, right?

Yeah, so for that you need to do a lot of terraform state mv to move those resources and make the state file all graceful.

Another question, and maybe you can answer since you've done a lot of modules.

So I was working on one.

One project for you know setting of the priorities of the rules and then is not to prioritise that dynamic solely.

If, for example, if we have a beautiful which deletes or maybe destroys and recreated the next room takes a bright step and then actually messes up the whole thing.

So where do you go to find a workaround for that kind of scenario.

I would have to defer that question to Andriy, who's been doing more of the low-level Terraform stuff, like the modules and things like that.

Also, I mean, 99 percent of what we do as a company is on Kubernetes, and therefore the ALB rules are not something we deal with; we just use ingresses.

I do know that some of this stuff got a lot easier with 0.12, passing maps around.

But I can't give you a helpful answer right now; if you ping Andriy on the Terraform channel, that's @aknysh, he'll be able to answer that.

All right.

All right, thanks.

Sorry, I missed the question; what was the question about?

Yeah, John can probably answer.

Yep, so the question is about, you know, setting up the priorities of the rules in the application load balancer.

So when you have defined priorities for, you know, the listener rule Terraform resource, and then if there is, you know, an instance where that rule gets force-destroyed, if you're changing something, the next rule takes the priority of it, and then the whole sequence gets messed up.

So, you know, I was just looking for a way to freeze things so they remain as they are; if at all there is a requirement of recreating that listener rule, the next rule doesn't get the priority.

Yes, regarding that: you actually can control that priority in the configuration.

So there's a priority argument.

Yeah, so I am using that.

However, if there is an instance where, you know, by chance, that listener rule requires getting recreated, by changing some settings which require it to get force-destroyed, there's the default behavior:

you know, if that listener rule is deleted, it actually changes the priority of all the rules which come after it.

So if, say, for example, a listener rule with a priority of number five gets recreated, all the rules after number five will shift up one step: six would become five, seven would become six, and eight would become seven.

I actually haven't seen that.

But that may be because I always use factors of 10.

So I don't put them one by one; I'll do like 1, 10, 20, up to whatever.

Yeah but.

But that may be why I haven't seen that, actually.

In all the priorities we set, I've never seen them reset once Terraform applied again; they were good to go.

Yeah, apply is good.

But if at all there is an instance where the rule gets recreated, that's where it starts freaking out.

So, you know, there is, if you're not aware, a limit on the number of rules that you can have on an ALB.

And we have this really legacy application, the initial application, which required us to use a lot of the rules.

And this is where I actually was able to replicate this issue.

Interesting.

First of all, you wouldn't happen to be using the Cloud Posse AWS ALB ingress module, are you?

No, this is just simple; it works on ECS, it's not on Kubernetes.

Well, yeah.

So this does a little bit of a confusing thing.

So what we did when we were designing our AWS ECS modules and our modules for ALBs: we had the opinion of making them feel like the Kubernetes interface for ingress.

So we created a module called terraform-aws-alb-ingress, which defines a rule.

So it works similar to defining an ingress in Kubernetes, even though you're defining a rule for an ALB and using, say, ECS, for example.

So when you do it this way, you're defining a module explicitly for each rule.

And it's been working for us.

But I can't say, just like what John said, whether we've encountered this thing where the priorities get renumbered; something like that hasn't been reported to us yet.

But the other part of what I wanted to make sure was that you weren't using something like count together with defining your rules.

So, you know, we do have some dynamic parts, which also involves doing this; again, a reference to the question that I asked prior to that, by using for_each.

Yeah Yeah, that makes sense.

So usually when you're modifying the list, that's when it starts freaking out.

I think the problem, therefore, that you have is probably more related to the module, or the way you're doing the Terraform, maybe, than some fundamental limitation of the ALB resources themselves.

Yeah Yeah.

Because we actually hit an issue where sometimes, if we have to recreate a target group, like if you change the name (switching from one of the Cloud Posse modules, there was a fundamental name change of target groups), the deletes sometimes don't go through properly; they kind of hold on there, and so you can't delete it.

So we did a lot of manual deleting; Warren and I did, just this last week actually.

So we did a lot of manual deleting, and I never saw priorities change; all the priorities stayed in line.

Yeah, so the target group is key.

But priorities only change if you change the path, or the condition, or something where the listener rule gets recreated.

Yeah.

But I still want to fix the priorities, because those rules are dynamic.

So if you had a path which is coming from a variable, which is a list, then, you know, when you add another, it gets bumped up and the new listener takes the next priority.

And then it becomes harder to maintain.

Yeah, so maybe look at adjusting your list and utilize a map or something, where you can define a set priority for each of those.

That way, when you loop over it, it's a set priority every single time, no matter if you move it up or down in your list or something else; make it explicit.
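A hedged sketch of that suggestion, with hypothetical variable names and values: drive the listener rules from a map that carries an explicit, spaced-out priority, so deleting or recreating one entry never shifts the others.

```hcl
variable "listener_rules" {
  type = map(object({
    priority = number
    path     = string
  }))
  default = {
    api    = { priority = 10, path = "/api/*" }    # spaced by tens, so new rules
    static = { priority = 20, path = "/static/*" } # can slot in between later
  }
}

resource "aws_lb_listener_rule" "this" {
  for_each     = var.listener_rules
  listener_arn = var.listener_arn
  priority     = each.value.priority # explicit, so recreation cannot renumber it

  action {
    type             = "forward"
    target_group_arn = var.target_group_arn
  }

  condition {
    path_pattern {
      values = [each.value.path]
    }
  }
}
```

Because the rules are keyed by name rather than by position, reordering the map is a no-op in the plan.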

Right OK.

All right.

Let's see, are there any other questions?

So, hi, I have a question.

For my use case, I kind of want to generate a report based on, like, underutilized EC2 instances and overutilized instances, and automate sending the reports to an email.

Like, if the instances are running at less than 10 percent, I would like to see the instance list.

And if it's more than 80 percent, I'd like to see the list of instances.

So I'm just looking for ideas to achieve this.

Gotcha.

So the idea is that somebody here might have a suggestion for that; we work with mostly just infrastructure managed under Kubernetes, and so we're not really concerned about any one instance in and of itself, from a reporting perspective.

Yeah, anybody familiar with some tools to generate the reports that he's looking for? AWS has the Cost and Usage Report; that's kind of built in.

That's kind of built in.

I don't know about the emailing request, but I know that's in there.

Mm-hmm, and if you tied it into SES you could probably do that for sure.

To put this on track: it is probably the utilization for some period of time, right?

Yeah, some period of time, like a week, a few days, like the last month.

Yeah, if something else comes to mind, just feel free to share it either in this chat here or in the office-hours channel.

There are also, I mean, yeah, on that topic: there's the Trusted Advisor stuff, which will make similar kinds of recommendations.

And then there are, I think, SaaS services as well.

But I don't think that was necessarily what you're looking for.

There's a SaaS for everything, right? Totally outsourced.

Yeah, any other questions?

All right.

Well, let's cover a couple of the talking points here that came up.

Sure.

I definitely recognize a few people on the call here using Kubernetes a lot.

So I think this Goldilocks will be a welcome utility; it was shared earlier today, and I only learned of it today, but what I thought was interesting was how they went about implementing it.

So behind the scenes, it's using the Vertical Pod Autoscaler in just recommendation mode, or like dry-run mode.

So that it's not actually autoscaling your pods, but it's making the recommendations for what your pod settings should be.

And then they slap a UI on that.

And here's what you get.

So here's why I think this is really valuable.

Well, if you saw the viral video that went around this past week, the one where they dubbed that famous video of Hitler reprimanding his troops: they took that and parodied the situation of running Kubernetes in production without setting your limits, and how asinine that is.

Well, unfortunately, it's a pretty common thing.

So helping your developers and your teams know what the appropriate limits for your pods should be is an important thing.

So this here is a tool to make that easier.

Just building on that: like, when I joined this team, that was an issue as well.

I guess at the same time, the company was at an early stage, and all the experience was not there.

And we would find, over time, that we had a lot of evictions going on within the cluster, because we didn't have any resource limits set for these things.

So we actually had to go through the process of evaluating, which is why this kind of tool, which I didn't know existed until after the fact, would really help; because we actually implemented Datadog to help give us some color back in terms of what was going on.

And then we started to send more traffic to see where the choke points are, to determine that.

So I figured that everyone had their own methodology to determine what those resource limits were going to be as well.

And I saw a question where someone posted an equation for how to determine the ratio of limits — it was this whole thing — so I'm much looking forward to using this tool to make that determination as well.

Yeah, I think that's basically what everyone's been doing — looking at their Prometheus or their Grafana or Datadog and determining those limits, which is fine.

What I like about this is that it just dumbs it down to the point where you literally copy and paste this and stick it in your Helm values — or, if you're doing raw resources, stick it in there — and you're good to go.
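For example, the copy-and-paste lands in a resources block like this (a hypothetical Helm values fragment; the numbers are placeholders, not recommendations):

```yaml
# values.yaml fragment -- paste the recommended figures in here.
resources:
  requests:
    cpu: 100m        # what the scheduler reserves for the pod
    memory: 128Mi
  limits:
    cpu: 250m        # throttle point
    memory: 256Mi    # OOM-kill point
```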

There are a couple of other tools I've seen for this specifically.

So the concept here is right sizing, so right sizing your pods.

So if you're using Ocean by Spotinst — Ocean is a controller for Kubernetes to right-size your nodes — Ocean also provides a dashboard to help you right-size the workloads, the pods in there.

So that's one option.

The other option is if you have Kubecost.

Kubecost is an open-source or open-core kind of product that will make recommendations — similar to Trusted Advisor, but for Kubernetes — and it also gives suggestions on how to right-size your pods.

All right, then, let's see — what was the other thing I had here?

So I asked for some feedback and got a lot of great conversations going in the general channel yesterday related to this.

We're working on an engagement right now, and the customer asked what kind of apps we should tackle first for migration to Kubernetes.

And it's a common question that we cover in all our engagements.

So I thought I'd just whip it up and distill it into the characteristics of the ideal app.

I like to say that pretty much anything can run on Kubernetes — the constructs are there, the primitives are there to, so to speak, lift and shift traditional workloads.

But traditional workloads aren't going to get the maximum benefit inside of Kubernetes, like the ability to work with horizontal pod autoscalers and use Deployments and stuff like that.

So here are some considerations that I jotted down using twelve-factor as the pattern.

So, the twelve-factor app.

I'm sure you've been working with this stuff for some time.

You've seen those recommendations before.

It looks something like this.

In my mind they're slightly dated in terminology as it relates to Kubernetes, and a little bit academic.

So if we look at the explanations for how these things can work.

So OK.

So first of all, this is very opinionated — talking specifically about Ruby and using Gemfiles.

But let's generalize that right.

So the concept here, if we're talking about Kubernetes, is that we still want to pin our software releases.

But we can also generalize that and say: distribute a Docker image — a Dockerfile with all your dependencies therein — and pin to releases.

So that's kind of what I've done here in this link here.

And it's a check list.

So if you can go through and check off all of these, it's a perfect candidate to tackle first in terms of migration.

I like to say move your easiest apps first.

Don't move your hardest apps until you build all that operational competence.

So working down on the list here.

Let's see here.

So this is a little bit opinionated.

But we really feel like polyrepos are so much easier to work with and deploy with pipelines to Kubernetes.

So that's why we recommend working that way — obviously using Git-based workflows.

This is so that you have a pull-request review and approval process, and so that you're not editing things live by hand.

Believe it or not, some companies do that.

We don't want that.

And then automated tests.

So if we want any type of continuous delivery,

We need to make sure we have foundational tests.

Moving on then to dependencies: the things that your services depend on should be explicit.

One thing that we see far too often are hard coded host names in source code.

That's really bad.

We've got to get those extracted either into a configuration file or, ideally, environment variables — we like environment variables because those are a first-class citizen and are very easy to change later.
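As an illustrative sketch (the variable name and default here are made up, not from the call), pulling an endpoint from the environment instead of hardcoding it looks like this:

```python
import os

# Read the database host from the environment instead of hardcoding it.
# The fallback is a local-development convenience, never a production value.
def get_db_host() -> str:
    return os.environ.get("DATABASE_HOST", "localhost")
```

In Kubernetes, the value would then be injected via the pod spec's `env` or a ConfigMap, so the same image runs unchanged in every environment.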

Next: services being loosely coupled.

This is that your services can start in any order.

Your API service must be able to start even if your database isn't online yet.

There's frequently an older pattern where, if the API can't connect to the database, the API just exits and crashes.

And this can create a startup storm where your processes are constantly in a crash loop, and things only start to settle once the backoffs kick in and your applications stop thrashing so much.
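A loosely coupled service retries instead of exiting. A minimal sketch of the idea (names and delays are illustrative):

```python
import time

def connect_with_backoff(connect, max_attempts=5, base_delay=0.05):
    """Retry a failing dependency with exponential backoff instead of
    crashing on startup. `connect` is any zero-argument callable that
    raises ConnectionError until the dependency is up."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            # Back off exponentially so restarts don't stampede the database.
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("dependency never became available")
```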

So would that then be more on the development team to ensure that kind of control?

Yeah — the idea here is that this is not a hard requirements list.

But if one of these jogs a memory —

Yeah, that's how our application works, then what we ask is that they change the way their application works.

So that it doesn't have these limitations.

I'm actually working with a client where it's like they got wind of your list, but took your list the opposite way.

Hopefully not on all the points, but most of them.

Oh no.

In fact, they're still using Subversion.

Oh stop right there.

Take me back there.

Yeah Yeah.

Yeah Yeah.

So it sounds like you have a big project ahead of you, which might be changing some of the engineering norms they've adopted over the years. We've been pretty lucky that most of our customers find a lot of this stuff intuitive — it's stuff they've already been practicing.

But then there will be one or two things here and there. Like, the one thing that we get bit by all the time is their application might be talking to S3, and their application has some custom configuration for getting the access key and the secret access key.

So that precludes us from using all the awesomeness of the AWS SDK for automatically discovering these things if you set up the environment correctly.

So having them undo those kinds of things would be another use case.

My main point in going over these things is to jog any reactions — if anybody is totally against some of these recommendations, or if any of these are controversial, or if there's anything I've missed.

So please chime in if that's the case. Backing services: ideal services are totally stateless — that is, you can offload the durability to some other service, hopefully one that's not running inside of Kubernetes.

Of course, that's not saying Kubernetes is not good for running stateful services — it is. It's just a lot larger scope, right?

Managing a database inside of Kubernetes versus using RDS. Let's see here.

Anything else to call out.

This is a common thing.

Applications don't properly handle SIGTERM.

Your apps want to exit gracefully when they receive it.

Some apps just ignore it.

And then what happens is Kubernetes waits until the grace period expires and then just SIGKILLs your process the hard way, which we want to avoid.

Sticky sessions.

Let's get rid of those.

Oh, yes.

And this one here — this is surprisingly common, actually.

So we're all for feature flags.

We recommend feature flags all the time.

They can be implemented in different ways using environment variables is an awesome way of doing it.

But making your feature flags based on the environment or the stage you're operating in is a shortcut, and not really an effective use of feature flags, because you can't test that functionality.

You shouldn't be changing the environment in staging to test a feature.

You should be enabling or disabling the feature itself.
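In other words, the flag should be its own switch, not an alias for the stage. A hypothetical sketch (the `FEATURE_` naming convention here is an assumption, not a standard):

```python
import os

def feature_enabled(name, environ=None):
    # The flag is toggled independently of the deploy stage, so the same
    # feature can be turned on or off in dev, staging, or production.
    environ = os.environ if environ is None else environ
    value = environ.get(f"FEATURE_{name.upper()}", "")
    return value.lower() in ("1", "true", "on")
```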

So that's a pet peeve of mine. And related to that: not using hardcoded hostnames — or even configurable hostnames — in your applications.

If you're running Kubernetes, that's really the job of your ingress; your application should not be aware, or even care, where it's running.

Exactly Yeah.

Alex Siegmund posts in chat here.

You mentioned, on port binding, that you should listen on non-privileged ports.

But what's the harm of having your application bind port 80, for example, if it's a website?

So the harm is really that the only way you can listen on port 80 inside your app is if your app starts as root.

And that's contradictory to saying that you should run non-privileged processes — the point being that you should not be running your processes as root.

Yes, your container can run as root; your application can start up as root, bind to the port as root, and then drop permissions.

But so often things do not drop permissions, and then you'll have all these services unnecessarily running as root.

And just for the vanity of the service running on a classical port like 80 — is it really required? That's not the case.

So, exactly — Alex says, aha, so that's why you should run as non-root inside of the container.

Exactly So that's our point there.
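In a Dockerfile, that might look like the following sketch (base image, user name, and command are placeholders):

```dockerfile
FROM python:3.12-slim

# Create and switch to an unprivileged user instead of staying root.
RUN useradd --create-home app
USER app

# Listen on a non-privileged port; map it to 80/443 at the load balancer,
# Service, or Ingress -- not inside the container.
EXPOSE 8080
CMD ["python", "-m", "http.server", "8080"]
```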

I think a lot of times what I see is: the Dockerfile that you created works as root, and no one ever goes back and says, all right, let me have it run as an unprivileged user so it actually works that way.

And they may hit roadblocks and not get it to work — probably merely because of a lack of knowledge as to how to do that.

And then say, hey, you know, it works.

Let's just leave it.

And then it just goes out there.

And then something happens, and that pattern gets replicated, right? Because somebody says, oh, how do I deploy a new service? Oh, just go look at this other repo.

And then they copy paste that stuff over and then it replicates and quickly becomes the norm in the organization.

Yeah Yeah.

So logs.

This is nice to have.

I mean, it's not a hard requirement.

But it really does make a difference.

Once you have all your logs centralized in a system like Kibana, or any of the modern log systems like Sumo Logic or Splunk, if your logs are structured you're going to be able to develop much more powerful reports.

And if you're using any modern language or framework, a lot of them support changing the way your events are emitted these days.

So no reason not to.

The most important thing is: just don't write to disk, because we don't want to have to start doing log rotation — and if quotas aren't set up correctly, which they usually aren't, you can fill up the disk on the host OS.

So let's not do that.
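A sketch of the idea — one JSON object per line, written to stdout (the field names here are arbitrary):

```python
import json
import sys
from datetime import datetime, timezone

def log_event(level, message, **fields):
    """Emit one structured JSON log line to stdout. Writing to stdout
    (never to local files) lets the container runtime collect logs with
    no log rotation and no risk of filling the node's disk."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "msg": message,
        **fields,
    }
    line = json.dumps(record)
    sys.stdout.write(line + "\n")
    return line
```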

Lastly this is another common problem.

I see is migrations run as part of the startup process when the app starts up.

So if you have a distributed environment — you're running 25, 50, 100 pods or whatever — you don't want all of them trying to run a migration. You want that to be an explicit, separate step, possibly run as an upgrade step in your Helm release or some process in your continuous delivery pipeline.

So making sure you can run your database migrations as a separate container or as a separate entry point when you start that container is important.
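One common shape for that explicit step is a Kubernetes Job that reuses the app image with a different command (the image and command here are hypothetical):

```yaml
# Run migrations once, as their own step, before rolling out new pods.
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: example/app:1.2.3            # same image as the app
          command: ["./manage.py", "migrate"] # placeholder entrypoint
```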

Same thing with Cron jobs.

We know those can be run in a specific way under Kubernetes.

So it'd be nice if those are a separate container, and any sort of batch processing as well.

So the basics here: these can run as Jobs, whereas everything else should be either Deployments or StatefulSets or other primitives.
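For the cron case, that means a CronJob resource rather than running crond inside an app container — a sketch with placeholder names (older clusters use `batch/v1beta1` for this resource):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"        # standard cron syntax, here 02:00 daily
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: report
              image: example/app:1.2.3
              command: ["./run-report.sh"]
```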

Put this under oh the register to the.

Exactly I do that.

Yeah, I should do that.

It's a good list.

So if you have any other recommendations.

Feel free to slack those to me.

I'll update this list.

What's your thought on the read only file system.

You know, that's good — it's probably a good default. It's an optimization.

I make a point here that file system access is OK — it should just be used for buffering or caching or things like that, but it shouldn't be used to persist data.

So perhaps, yeah — to your point, your root file system should probably be configured at deployment to be read-only, but then provide a scratch space that is writable.
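That combination can be expressed directly in the pod spec — a fragment sketch with placeholder names:

```yaml
# Read-only root filesystem plus an emptyDir scratch volume for
# buffering/caching; nothing written there survives the pod.
containers:
  - name: app
    image: example/app:1.0
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
      - name: scratch
        mountPath: /tmp
volumes:
  - name: scratch
    emptyDir: {}
```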

I think I'll take this list and send it to my team, so I can follow up on this.

There you go.

Let me follow up.

Let me know how that conversation goes.

So we've got five minutes left.

That's not enough time to really cover too much else.

Did this jog any memories, or are there any other questions that you guys have?

Yes, I have one for Kubernetes.

What's your experience with Istio?

I see a lot of hype around it.

Oh, so I was hypothetically thinking of a solution where you can look up the geographic location and then do regional releases — like release this particular piece to India, and then to the rest of the world.

So, by any means — how does Istio fit into this?

Yeah And I'm sure there were others here will have some insights on that as well.

I'll share kind of my two cents on it.

My biggest regret with Istio is that service mesh functionality isn't a first-class thing inside of Kubernetes — that we have to be deploying this seemingly high-overhead sidecar automatically to all our containers when they go out.

That said, the pattern is really required to do some of the more advanced things when you're running microservices — for example, the releases that you're talking about here.

So the thing is, with Kubernetes by itself, the primitives with ingresses and services are perfect for deploying one app.

But then how do you create this abstraction for routing traffic between two apps that provide the same functionality, and do traffic shaping between them? Or when you run a really complex microservices architecture,

How do you get all that tracing between your apps.

And these are the stories that are lacking in Kubernetes out of the box.

So I think service meshes are more and more an eventual requirement as a company reaches sophistication in its utilization of Kubernetes.

But I would not recommend using a service mesh out of the box until you can appreciate the primitives that Kubernetes offers.

It's kind of one of these things — like when people start on ECS. I think that's great.

I mean, I personally don't like ECS — we spent a lot of time with ECS — and I think ECS kind of shows the possibilities of what a container management platform can do.

And then once you've been using ECS with Terraform or CloudFormation for six months or a year, you start to realize some things that you want to do are really hard.

And then you look over at Kubernetes and you get those things out of the box.

So this is how I kind of look at service meshes.

I think you should start with the primitives that you get on Kubernetes.

And then once you realize the things that you're trying to do that are really hard to do.

And that the service mesh will solve.

That's the right time to start.

It's better to grow into that need than, as Erik said, to just have a tool and then try to use it for a problem that you don't currently have.

Yeah, I agree on that.

So GKE provides support for Istio out of the box, and EKS was becoming a big supporter of Istio. And looking at some stats I've seen: out of all the traffic that goes to Kubernetes, 19% is being served by service meshes.

I don't know what the sample size was, but that's what I've seen — random numbers here and there.

So I agree on that.

I have done it just for a deployment on Kubernetes, and I used Traefik as the ingress controller — it just wasn't a good fit for that particular deployment. But I don't think you'll find Istio a big requirement unless you want to deploy a really complex solution, like for canary releases — that's mostly where I see Istio coming in.

So it can — it's one way of solving that, right?

The other one is to use rich feature flags via something like LaunchDarkly or Flagr.

There are a couple of open-source alternatives. In that model it requires a bit of a change in your development model, and it requires knowledge of how to effectively use feature flags in a safe way.

But I think what I like about it is it puts the control back in your court versus offloading it to the service mesh.

It's kind of like the service mesh is fixing a software problem with infrastructure, while the feature flags are fixing it with software.

Yeah — we had quite a similar scenario, and what we did was fix it with feature flags.

So we actually took the model of shaping the development to suit that scenario, where we could roll out features to selective users within our testing group.

And then they would actually do what you need to do from there.

And then we could actually ramp it up over the next couple of weeks.

And so on.

Yep — plus you get the benefit of immediate rollbacks: you can just disable it immediately by flipping the flag off.

Exactly. So I don't think there's a black-and-white answer on it.

I think some change to the organization, some change to the release process — making smaller, more frequent changes — all these things should be adopted, including trunk-based development.

Yeah, I agree.

Because I think introducing Istio to deliver the idea behind feature flags puts a bit more overhead on the infrastructure and operations team, taking it away from the developers themselves.

So unless you're going to train your developers to use Istio to actually integrate those feature flags into the deployment, I think you have a better chance of using a tool like LaunchDarkly.

I'm starting to do that.

Yeah Yeah.

Good luck setting up the app with all this stuff in Minikube and doing the appropriate tests and all that for your development.

I want to know how it goes.

I still want to find some time to get my hands dirty beyond that.

But I may be doing it in a month or something.

It does seem that long-term upkeep for that could be a headache.

All right, guys — we've reached the end of the hour here, so we've got to wrap things up.

So thank you so much for attending.

Remember, these office hours are hosted weekly.

You can register for the next one.

By going to cloudposse.com/office-hours.

Thanks again for sharing everyone.

Thank you, John, for the live demo there — VS Code was really awesome.

A recording of this call is going to be posted to the office hours channel and syndicated to our podcast.

See you guys next week same time, same place.

Hey Eric can I ask you a quick question.

Yeah, sure.

So I just ran into an issue like 10 minutes ago and thought, hey, office hours is happening right about now.

All right — I do have to run to a pediatrician appointment right away, so let's see.

Spot instances: have you managed to deploy a Fargate Spot task with Terraform?

Yes — so we use that; we have an example.

I'm not going to say that it's a good example for you to start out with doing it.

But we have a Terraform ECS Atlantis module in there.

We're deploying Atlantis as an ECS Fargate task using our library of Terraform modules. On the plus side, it's a great example of showing you how to compose a lot of modules.

It's a great example of showing you that modules are composable.

And it's an advanced example.

This example is exactly why I don't like ECS — ECS and Terraform don't go together, if you ask me. And we think we have a pretty advanced infrastructure.

It's already in Fargate. We're just going to go to Fargate Spot.

So what did you say that module is called?

Oh stop.

Sorry Yeah.

No, no — we don't have Fargate Spot in there.

No worries, no worries.

We're going to experiment with the capacity provider strategy and see if we can make it do something.
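For anyone curious, the knob being referred to is ECS's capacity provider strategy; a hypothetical Terraform fragment might look like this (weights and names are illustrative):

```hcl
resource "aws_ecs_service" "app" {
  # ... cluster, task definition, networking omitted ...

  # Prefer Fargate Spot, but keep at least one task on on-demand Fargate.
  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 3
  }
  capacity_provider_strategy {
    capacity_provider = "FARGATE"
    weight            = 1
    base              = 1
  }
}
```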

But no worries.

Figured I'd just check.

Yeah, thanks for bringing it up, though.

So misunderstood.

Absolutely All right.

Take care.

Bye.

Public “Office Hours” (2020-02-12)

Erik OstermanOffice Hours

1 min read

Here's the recording from our DevOps “Office Hours” session on 2020-02-12.


Machine Generated Transcript

Let's get the show started.

Welcome to office hours. It's February 12th, 2020.

My name is Erik Osterman and I'll be leading the conversation.

I'm the CEO and founder of Cloud Posse.

We are a DevOps accelerator.

We help startups own their infrastructure in record time by building it for you and then showing you the ropes.

For those of you new to the call the format is very informal.

My goal is to get your questions answered.

So feel free to unmute yourself at any time if you want to jump in and participate. If you're tuning in from our podcast or YouTube channel, you can register for these live and interactive sessions just by going to cloudposse.com/office-hours.

We host these calls every week. We'll automatically post a recording of this session to the office hours channel as well as follow up with an email.

So you can share with your team.

If you want to share something in private, just ask and we can temporarily suspend the recording.

With that said, let's kick things off.

So here are some talking points that we can cover to get the conversation going.

Obviously first, I want to first cover any of your questions.

So some things that came across or came up in the past week since we had our last call.

Terraform cloud.

Now supports triggers across workspaces.

John just shared that this morning.

I'll talk about that a little bit.

The new AWS CLI v2 is available with no more Python dependencies.

However, I'm still not celebrating it entirely based on my initial review.

Also, here's a really wise quote that was shared in our community yesterday — something like: if you can't commit to the overhead required to run something, you're introducing a weakness into the system rather than a strength, as it'll quickly end up in the critical path.

That was the way Chris Childs put it, and I want to talk about that some more.

See what reactions we get.

But before we go into those things.

Let's see what questions you guys have.

I have one thing — when you're going through Terraform Cloud, can you also go through just your general experiences with it? We were looking at using it earlier this week and just having a little bit of pain doing so.

Yeah, just some general experience that would be useful.

I can give you some kind of like armchair review of Terraform cloud.

We are not using it in production as part of any customer engagements we've done our own prototypes and pieces.

So I think the best thing would be when we get to that point if the other people on the call that are actually doing it day to day.

I know John Bland has been doing a lot with Terraform Cloud. Let me ping him on SweetOps.

Let's see if he can join and share some of his experiences.

Do you guys know if you can continue using remote state in S3 with Terraform Cloud?

I couldn't figure out how to do that.

Well, you should be able to — let me explain.

Mm-hmm. It's a good question, but I'm not 100 percent sure.

So yeah.

Yeah, I cannot speak from experience trying to do it.

What were your problems when you tried.

I mean, I assume you had the AWS credentials and everything hardcoded.

And if you had that provider block or that backend set up, it was erroring — it validates that you have a remote workspace backend.

I personally couldn't find the place in Terraform Cloud where you can even put the creds.

It would be in the environment settings — so there's the option of setting it as an environment variable.

Yeah, exactly — you have to do that for every single workspace, as tedious as it is.

Exactly we don't like that either.
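For context, pointing the CLI at Terraform Cloud's own state storage is a backend block like the sketch below (organization and workspace names are placeholders) — and it replaces, rather than coexists with, an S3 backend block:

```hcl
terraform {
  backend "remote" {
    hostname     = "app.terraform.io"
    organization = "example-org"      # placeholder

    workspaces {
      name = "prod"                   # a single Terraform Cloud workspace
    }
  }
}
```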

Oh, awesome.

John's joining us right now.

So John has spent a lot of time with Terraform cloud.

So he can probably answer some of these questions for you and Mark.

Welcome — howdy!

Mark, have you gotten to play with Terraform Cloud at all yet?

No, I haven't even browsed the docs.

OK Just curious.

But Brian — you've been dabbling with Terraform Cloud a little bit?

Yeah, just trying it out. It was because I was working on Terraform provisioning of my EFS, so I figured I'd kill two birds with one stone.

Yeah — dabbled with it, didn't love it. So I'm probably just going to do my own provisioning in Codefresh.

Yeah, it's a little bit more intuitive for me, especially because I use Terraform CLI workspaces.

It'll be a lot easier for me to implement something that's truly reusable if I just do it in Codefresh, which already runs the right workspace commands for me.

Yeah, I'm hoping that maybe in a few weeks we can do a revised Codefresh + Terraform demo on this.

We did one about a year ago or more.

But this time Jeremy on my team is working on it.

And we want to kind of recreate some of the constructs of Terraform cloud.

But inside of Codefresh.

So that it integrates more with all the other pipelines and workflows we have there.

On the topic of Terraform: I go back and forth on my decision to use Terraform workspaces.

I love the fact that it was so easy to use the same configuration code for so many different environments and I've been able to take advantage of that.

What I didn't love was having to kind of hack together a way to get all the backends to point to different S3 buckets and different AWS accounts.

I'm curious if anybody's ever worked with that.

Plus, if they ever switched off of it to go the other route, where you have configuration per AWS account — that might be less DRY.

Such as using Terragrunt.

OK — yeah, let's table that temporarily.

I see John just joined us here.

Let's start with the first question there on first hand accounts and experiences using Terraform cloud.

I know John has spent a lot of time with it.

So I'd love to for you guys to hear from him.

He's also a great speaker.

Well, thank you.

I've seen the check in the mail.

I've actually done a lot with Terraform Cloud as the primary CI for all of our Terraform, and generally I like it — mainly because it's malleable. You can use it for the whole CI aspect, or you can run your Terraform anywhere and it's just your backend.

And that's it.

Instead of having S3 buckets everywhere, all your state is just stored there. Remote state lookups across workspaces are really easy to do.

They have a provider that gives you full access.

And I did see on the agenda there, in the talking points, the workspace run triggers — I actually did a video on that beta already.

Well, yeah, except now.

Yeah, I've wanted to play with it since it was released, and it's decent.

But — I forget who it was speaking; Brian, I think — we do utilize multiple AWS environments from our Terraform scripts.

Where we set it up, we actually tell each workspace which environment it's going to use.

Now, this is using an access key and secret key; preferably we'd have something a little cleaner and a lot more secure than having stale access keys sitting around.

So that's one gripe I do have with it.

But in general, we've had a lot of success running it in Terraform Cloud.

OK, so you're leveraging — I don't want to say it's a feature of Terraform Cloud, but the best practice of Terraform Cloud — using lots of workspaces in Terraform. How has that been?

Because while workspaces has existed for some time it wasn't previously recommended as a pattern for separating you know multiple stages like production versus dev.

How's that working out.

It actually worked out really well, because you can set it up locally as well.

Sure Yeah.

So I mean, the reflection here once again.

There you go.

There is a difference between Terraform Cloud workspaces and Terraform CLI workspaces.

Yeah, sure.

So this is just my little sample account that I play around with for the tutorials.

But if this was set up locally in the CLI, because the prefix is the same, I can set my prefix locally and my local workspaces would be called "integer" and so on, separately.

And so locally, it maps directly to the CLI.

So I can say: terraform workspace select integer.

And now I'm on that workspace, and I can run a plan and it'll run that plan right there on Terraform Cloud.

I don't have to have variables or anything locally.

It'll run everything in there unless the workspace is set up for local execution — you have both settings here.
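The prefix setup John is describing looks roughly like this (organization and prefix are placeholders); with it, `terraform workspace select integer` locally maps to the Terraform Cloud workspace `example-integer`:

```hcl
terraform {
  backend "remote" {
    organization = "example-org"     # placeholder

    workspaces {
      # Local CLI workspaces map onto TFC workspaces named "<prefix><name>".
      prefix = "example-"
    }
  }
}
```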

Would you be able to do that right now.

Sorry to put you on the spot.

This is exactly what we're trying to do.

And if you're saying that it's actually much easier than I initially thought then I might reinvestigate this.

Let's try it.

But thanks for rolling with it.

For everyone else — maybe you just joined:

John has been put on the spot here to do an unplanned demo of Terraform Cloud and workspaces.

Possibly even the new beta triggers function.

That is, assuming it's set up there and working.

I can definitely walk through the triggers.

The beta would be especially useful for us as well, because there are scenarios where we run multiple Terraform projects that depend on each other.

OK — let's give him like five minutes, and then we can talk about some other things and come right back to this.

Yeah, let me get a few things set up — the connection and all that sort of stuff.

Yeah Cool.

Let's do that.

And we'll just keep the conversation going on, other things.

See what we can talk about there.

All right.

Any other questions.

All right.

So I guess I'm going to skip the Terraform cloud talking point about triggers across workspaces.

I think that's going to be really awesome to get a demo.

Basically, as you decompose your Terralith into multiple projects, how do you still get that same experience where applying changes in one project can trigger runs in another?

And that's what these run triggers are for. Moving on: AWS announced this week that there's a new CLI available.

I'm not sure how new it is per se, but they are providing a binary release of the CLI.

I suppose it's still probably in Python; they're just bundling it up, I'd guess.

The downside, from when I was looking at it, is it's not just a single binary you can go download somewhere; there's still an AWS CLI installer.

So they're following the Java pattern, right, where you still have to download zips and install stuff.

Personally, I've gotten so spoiled by Go projects, which distribute a single self-contained binary.

And I just download that from the GitHub releases page.

And I'm set to go.

So has anybody given this new CLI a shot.

Now, Mark, I'm calling you out.

All right.

Don't buy it yet.

Cool. And then there was one other thing that came up this week: somebody was asking, I think the background question was, about alternatives to running Vault, and if it's worth it to run Vault, and Chris Fowles responded quite succinctly, so thank you.

We've heard this said before, but I thought this is a really succinct way of putting it.

And that's: if you can't commit to the overhead required to run some new technology like Kubernetes, Vault, or Consul, you're introducing a weakness into the system rather than a strength, as they quickly end up in the critical path of your operations.

And I think this really resonated with me, especially since we run this DevOps accelerator and our whole premise is that we want our customers to run in and take ownership of their infrastructure.

But if they don't understand what they have, and what they're running, then it is a massive liability at the same time, which is why we only work with customers ultimately that have some in-house expertise to take over this stuff.

And also, yeah, getting some thumbs up here from Alex Eagleman and Brian, both agreeing with this statement actually.

Yeah, that was actually the response to something I mentioned.

So the original person was asking a question, and that's how it came up.

From what I understand.

And I just thought that maybe I'd remind them, you know, maybe you want to just give centralized secrets management a shot; Vault really is good, not that I want to sound like I'm pushing it that much.

But the reality is, if you look at HashiCorp, that created Terraform and created Vault, they make most of their money from Vault.

They really do put a lot of product hours into widening the feature set of that product.

So really, it's a mature solution.

Yeah, 100 percent when it comes to that response.

It's very true.

What happens a lot of the time, actually, is if you don't commit, a lot of people just take the root token and distribute it to everybody, and it becomes more of a security hole than a security feature really.

Yeah. And really, it's reminiscent of Kubernetes as well.

In my opinion, we really need like a large team of people putting energy into that to actually make full use of it.

So it's not a burden anymore.

It's actually something that can help you pick up velocity. Exactly; you want to take on these things when it gives you an advantage, a competitive advantage for your business or the problems you're trying to solve.

Not just because it's a cool toy or sounds interesting. But yeah, that was a really good summary.

Thank you for it.

For pure, just secrets management, I would go with AWS Secrets Manager.

It probably doesn't have all the bells and whistles of a Vault, obviously, but I went with it to be a secrets manager.

Or you can use Parameter Store; it's much easier to maintain.

Yeah. Are you making copious use of Lambdas as well with Secrets Manager, to do the automatic rotations.

Not yet, but definitely something that I wish I had time for.

Yeah, because it also requires application changes, right. All right, John, are you ready.

You need some more time.

No really.

All right.

Awesome. Let's get the show started.

So this is going to be a Terraform Cloud demo, and possibly a demo of triggers across workspaces, which is a beta feature in Terraform Cloud.

So this isn't going to actually give me a plan, because I don't have the actual code for these repos locally on this computer.

But this is the time to show how the workspaces actually work.

So essentially, you set up your backend as remote, with a hostname.

You don't really have to have it; that's the default. Then the organization, and in this case, I'm saying prefix.
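For anyone following along, the backend configuration being described looks roughly like this; the organization name is hypothetical:

```hcl
# Remote backend with a workspace prefix, so one configuration maps to
# several Terraform Cloud workspaces (random-integer, random-separator, ...).
terraform {
  backend "remote" {
    hostname     = "app.terraform.io" # the default, shown for clarity
    organization = "example-org"      # hypothetical organization name

    workspaces {
      prefix = "random-" # or `name = "random"` to pin a single workspace
    }
  }
}
```

With a prefix, `terraform workspace list` and `terraform workspace select` operate on every Terraform Cloud workspace whose name starts with `random-`.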

So if I actually change this and, say, set the name to random.

And if I did an init on this.

Oh no, yes, I need to re-init there, because I already initialized.

That's why so by setting the name essentially, it's supposed to.

What did I miss.

They were working just a minute ago, I promise.

It's always this way.

Let me clean up this dirty one.

But by setting a name, I kind of found this by accident.

I didn't mean for it to do it.

But they go it'll actually create the workspace for you.

So you technically don't have to do anything to create a workspace.

It'll do it for you.

In that case, it doesn't give all the configuration there.

But if you utilize prefix here instead of name just wipe it.

What it does is basically tell Terraform Cloud, hey, everything with this as a prefix is going to be my workspaces.

So in this case, I can say, let me select the integer workspace.

That's awesome.

And so if I do a workspace list, you'll see that same list there.

And then you can do a select on separator or any one of those.

You can also.

And I have an alias for Terraform, by the way; that's why I'm just typing tf.

You can also get your state.

Of course, if you have access to that.

So we can say show; we can pull that state locally.

And so it'll output the actual state here.

And then my favorite part is actually planning.

So I don't have any variables or anything.

Mind you this workspace doesn't have a lot anyway.

But it's actually running this plan on Terraform cloud.

It's piping the same output.

It's coming in just like you would normally expect.

So it's piping everything to my local CLI here, to my console.

But it's basically this.

So you can see the output matches.

But the beauty of this part is I can have all of my variables in here completely hidden.

Any secrets that I want and none of my developers ever see them.

They never know that they exist or anything locally, but they can plan all day and do whatever they want.

And so this is destroying because I don't have the code.

So it's like, well, it's gone.

This random integer resource. But that's the quick run-through of utilizing those workspaces locally.

I mean, it's really just this.

And I have an environment variable set with my token locally.

So that's how I have that working. To also go back to the Terraform Cloud UI, we had the settings, the variables.

Just because it came up a little while ago: Brian was asking about environment variables; you see them there at the bottom.

Brian Yeah.

If you need AWS credentials, you can stick them in here.

OK. You're not using the AWS provider right now.

No, I'm not; I left them off.

Yeah, no.

OK. And nothing precludes you from using the AWS provider.

So long as you still provide the credentials.

Yeah Yeah.

So, you know, your random workspace that got created.

Do you have to go, do you manually go in and add the AWS creds for those.

I actually Terraform the entire thing.

So that's all done through Terraform, so basically terraforming Terraform Cloud.

So essentially I'll generate a workspace, and that'll have my general settings there, and then I'll just add the last two or three variables.

And you can do environment variables this way too.

So you can kind of tie this in too with like your token refreshes and things of that sort.

Especially, as I mentioned before, there's the Vault provider, and you can actually tap into Vault here; actually get a token or key from your AWS, or however you want to do your authentication there, get your token from AWS, your access key, et cetera.
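As a rough sketch of what's being described, assuming Vault's AWS secrets engine is mounted at `aws` with a `deploy` role (both hypothetical names), you could read short-lived credentials from Vault and push them into a workspace with the `tfe` provider:

```hcl
provider "vault" {} # address/token come from VAULT_ADDR / VAULT_TOKEN

# Ask Vault's AWS secrets engine for short-lived credentials.
data "vault_aws_access_credentials" "creds" {
  backend = "aws"    # hypothetical mount path
  role    = "deploy" # hypothetical role
}

provider "tfe" {} # token comes from TFE_TOKEN

# Inject the credentials into a workspace as a sensitive env var.
resource "tfe_variable" "aws_access_key_id" {
  workspace_id = tfe_workspace.app.id # assumes a tfe_workspace named "app"
  key          = "AWS_ACCESS_KEY_ID"
  value        = data.vault_aws_access_credentials.creds.access_key
  category     = "env"
  sensitive    = true
}
```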

And then plug that into Terraform cloud.

So that way it's all automated.

And you're not just pasting variables in there.

Do you have to run your own Vault, or do they run that for you? You could, if you're running your own.

So assume someone doesn't have Vault.

How do you plumb AWS credentials in, or, say, token generation?

So we use KMS.

We just utilize KMS, because you don't want to put that stuff in code, right.

And so we use KMS to encrypt the values manually.

We built a little internal tool to do it.

But we encrypt those values, put them in code, and then once the workspace actually runs, it'll create, manage, and update all the other workspaces.

So in essence, you have one workspace that has all the references to all the other workspaces it's supposed to create, and it'll configure everything.

And so in that one, it'll decrypt with KMS and then add it to the project, or the specific workspace, as an environment variable.
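A sketch of that pattern with Terraform's `aws_kms_secrets` data source; the ciphertext placeholder and secret name are hypothetical:

```hcl
# Decrypt a KMS-encrypted value that is safe to commit, then pass the
# plaintext to the workspace that needs it.
data "aws_kms_secrets" "workspace" {
  secret {
    name    = "aws_secret_access_key"
    payload = "AQICAH...=" # base64 ciphertext produced out-of-band with KMS
  }
}

# The decrypted value is then available as:
#   data.aws_kms_secrets.workspace.plaintext["aws_secret_access_key"]
# and can be set on another workspace as a sensitive environment variable.
```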

And so, when you say KMS, are you using SSM to store the ciphertext blob?

No. We just encrypt the value with KMS.

But we actually do use SSM for Fargate secrets.

But this repo here is where I actually have a video of it, where I kind of walk through how to do the full thing: Terraform your own workspace and then use the remote data as well to pull from.

And so the pipeline feature that I was playing with earlier, essentially this repo.

I mean, this workspace is going to trigger this one, and it's going to trigger this one.

And so the way it's set up and they definitely say do not use this in production yet.

But these run triggers here.

So you can specify all your workspaces that you want to actually trigger something here.

And so anytime they trigger, oh, it's a loop because I already have that one set, anytime they trigger, they will actually trigger the next one.
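The run-trigger chain being demoed can itself be terraformed with the `tfe` provider; a minimal sketch, with a hypothetical organization name:

```hcl
resource "tfe_workspace" "integer" {
  name         = "random-integer"
  organization = "example-org" # hypothetical
}

resource "tfe_workspace" "separator" {
  name         = "random-separator"
  organization = "example-org"
}

# When a run in "random-integer" applies, queue a run in "random-separator".
resource "tfe_run_trigger" "separator_after_integer" {
  workspace_id  = tfe_workspace.separator.id
  sourceable_id = tfe_workspace.integer.id
}
```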

Let me delete these real quick, and I'll just show a quick run.

And so if I queue this one, there it finally kicks off.

There we go.

So that's going to go through the plan.

And this is just going to generate a random integer.

It has an output and all it does is output the results of random integer.

And so once it finishes, the plan is actually going to show me which, if any, workspaces it will trigger next.

In this case, random separator. So if I go ahead and confirm, this one is applying; and if I come over to random separator, nothing's happening here.

I haven't hit anything, haven't pressed any buttons.

This apply just finished, and there's random separator; that was triggered automatically.

And so you can see like it's essentially going to go down the line there.

The good thing is it will tell you here that the run was triggered from this workspace.

And it was triggered from that run.

So you can kind of rabbit hole your way backwards into finding where and what actually triggered that one.

And so if I confirm and apply on this one, this one is actually going to trigger the last one, which is random pet.

Now, let me pull up the code real quick as well. So that one finished, and random pet is here.

And there's random pet running.

Quick, quick question.

Do you guys ever use, because I know with the VCS integrations, you can actually kick off a plan and apply from GitHub, for example. Do you use that?

Yes exclusively.

Yeah. And so these repos are actually tied to GitHub as well.

You do the confirms through the UI here; or you could tie it in, there's a CLI, so you can tie it in and do it through any CI, really.

And so there's the end of the pipeline.

But as you can kind of tell, you will rabbit hole, right. Like, you're here.

And then it's like, well, it was generated from here, and you go back there, and it's like, well, that one was generated from another one.

And so then you end up having to go back there.

So a good visual tool would be really useful.

Jenkins Blue Ocean or CircleCI, something that kind of shows you the path, would be really useful. But it's kind of interesting.

I'm sure that's coming.

Yeah. So this code, just to show this real quick, is using the remote state.

And so I set up variables manually for this demo, but utilizing these variables.

And it just uses the remote state data to get the value from the integer workspace.

And then the pet basically uses two remote state data sources to get the workspace state for both the separator and the integer workspace, and then it just uses them down here in the random pet.
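The remote state wiring being described looks roughly like this; the organization and output names are assumed from the demo:

```hcl
# Read outputs from the upstream workspaces.
data "terraform_remote_state" "integer" {
  backend = "remote"
  config = {
    organization = "example-org" # hypothetical
    workspaces = {
      name = "random-integer"
    }
  }
}

data "terraform_remote_state" "separator" {
  backend = "remote"
  config = {
    organization = "example-org"
    workspaces = {
      name = "random-separator"
    }
  }
}

# Use them downstream in the pet workspace.
resource "random_pet" "this" {
  length    = data.terraform_remote_state.integer.outputs.integer
  separator = data.terraform_remote_state.separator.outputs.separator
}
```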

So it's decent.

I'm liking where they're going.

Yeah, I think this has some potential, especially to minimize the configuration drift and simplify the effort of ensuring that change is promoted through multiple workspaces.

And the good thing is, like, if you saw on separator, I actually had to confirm; and of course, that depends on your settings.

Of course.

Because you can tell it if you want to auto-apply or manual apply; in this case, I'm set to manual, which makes sense.

But if it was set to auto, it would have just gone all the way down the line without me.

And so the practicality of it is, if you separate your networking stack from your application, and you update your networking stack, and for whatever reason it needs to run the application Terraform as well.

You can kind of automate that now, as opposed to: you ran that one, and now the one person in the company that knows the ordering has to go in and manually queue something else.

So yeah.

So I think there was some questions in the chat here.

Let's see.

Alex Sieckmann asks: how do you handle the chicken-and-egg problem with bootstrapping, say, an AWS account, and then Terraform Enterprise needing to have creds and such.

Actually a good question.

So it would have to come from somewhere, right.

So like especially if you set up like AWS Organizations.

And you had, like, your root account that you were set up with, you can utilize that root account.

And you can actually Terraform AWS Organizations, and then once that new account is set up, you can assume role and those sorts of things in order to access that other account.

But there is still some manual aspects of that right.

Like, you have to set up your email address, and then that email address is your root account.

And then you want to kind of lock that down.

So you can do use like some service control policies and things of that sort.

But there's still a little bit of a manual piece to bootstrap a full account.

That's the part that really sucks and we go through this in every engagement right.

Because if you don't reset the password and have MFA added, anybody with access to the email of that root account, of that sub-account for that matter, can do a password reset.

And take over the account.

Yeah, exactly.

So there was a question about automated destruction.

So Terraform Cloud actually requires you to set CONFIRM_DESTROY.

So if you do automate it, with CONFIRM_DESTROY set to 1, then yes, you can destroy from Terraform Cloud.

But you can't queue a destroy unless you actually have that environment variable set. So you can set it and have it as a part of your workspace, and then a destroy will actually destroy it.
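In other words, a workspace will only accept a queued destroy if it carries this environment variable; for example, via the `tfe` provider (the workspace reference is hypothetical):

```hcl
# Allow `terraform destroy` runs to be queued for this workspace.
resource "tfe_variable" "confirm_destroy" {
  workspace_id = tfe_workspace.app.id # assumes a tfe_workspace named "app"
  key          = "CONFIRM_DESTROY"
  value        = "1"
  category     = "env"
}
```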

Nice cool.

But yes, that aspect of the chicken and egg is definitely something that could be cleaned up on either side, just to help the bootstrapping, especially for the clients that have like 78 AWS accounts.

Yeah, which isn't as abnormal as it sounds at one enterprise.

Yeah. Any other questions related to Terraform Cloud? Put them in the queue.

Not sure of the cost versus value. Opinions?

Now it's way better than before.

It was rough like multiple Tens of thousands of dollars for the Enterprise version.

And so now it's actually to where you can basically sign up and utilize it now for free.

You have to keep in mind that it is still a subset of different features.

But it is really good.

And it is, obviously, if you're a small team and you don't have $100,000 to spend on that yet.

But keep in mind that it is a subset.

Yep. And so, on the main things that you do miss: cost estimation is actually pretty cool.

It will tell you, if you're starting up like a t3.micro, how much that's going to cost, or an m5.large; it'll kind of give you those pieces as an estimate.

But it kind of helps.

And you can utilize Sentinel, which is basically policy as code; utilize Sentinel and say no one can create a project if the cost is over $1,000, or whatever.

If the cost is over $1,000 or whatever.

Or you can say, hey, notify somebody, or it requires approval.

And so Sentinel is actually pretty useful.
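Sentinel policies are written in their own policy language; a rough sketch of the cost guardrail described above, assuming the documented `tfrun` import, might look like:

```sentinel
import "tfrun"
import "decimal"

# Block any run whose estimated monthly cost delta exceeds $1,000.
limit = decimal.new(1000)

main = rule {
    decimal.new(tfrun.cost_estimate.delta_monthly_cost).less_than(limit)
}
```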

And then, of course, you get the normal SAML.

And there's the private install, SAML, clustering as you go up.

But you can.

But funny how it goes from unlimited: everything else is unlimited, unlimited workspaces, but the enterprise tier is not unlimited anymore, just 100 plus.

But I mean, really, this free tier up to five users is pretty much all you really need, unless you are on a larger team that needs roles; the roles are basically plan, read, write, and admin.

But the private registry is actually pretty cool too.

I think, as I said, we as a profession need to push back on enterprises that try to make you pay for security; here SAML SSO is pretty much the only thing controlling the keys to your castle, and I don't think it's right for people to hold security hostage as a tool for making money.

Yeah, that's like always there 1, 2 right.

Yeah, but I hope we get the industry aligned with security as a first principle, like first class in all products, not just if you're willing to pay.

Yeah, there were other.

We've shared this before: the SSO tax website. Go to sso.tax, and it says it all.

Yet it's funny.

It's the wall of shame, and the price increases, percentage-wise.

There are things I need to add; Terraform Cloud, you know.

Exactly: base price versus SSO price.

It's just insane.

The gouging that goes on.

Look at HubSpot my Jesus.

63 percent increase; 586 percent; "call us."

Well, you want two-factor? That's going to cost you.

Yeah, yeah.

Two-factor is another one.

Yeah. Well, I mean, that usually comes with whatever you picked as your SSO.

Right, but doesn't Terraform Cloud offer two-factor.

It does.

Yeah. So I have one set up here, just normal.

I use it, but then again, I also have my day job's account on there.

And it's paid.

So maybe that's good.

Yeah, maybe that's where it comes from; maybe that's not available for free.

Any other questions on Terraform Cloud?

For a small team, do you guys think $7 a month per user is worth it?

It depends. What kind of rules do you have in place for, like, your infrastructure?

My team is a team of one right now.

So there are no, like, actual rules automated, but obviously I'm being proactive about it, just speaking of infrastructure.

But as the team grows.

And I think we're growing our security function here too.

I think a lot of security engineers I'm talking to, they're doing it manually, where they go into your AWS console and, like, check, you know, if your S3 buckets are public.

I was saying, like, we could automate this with Sentinel.

I was curious, for, say, a team of five, is $7 a month worth it, if you have those sorts of rules in place.

Yes Yeah.

This is basically like a preemptive check; you can choose to block, or you can just warn.

And so in this case, it was essentially a function, and they're adding resources to this, and they end up pulling it back.

Right And so you can basically take those things.

And as you validate them you can give specific messages that you want.

And basically say yay or nay if it's approved and it'll basically block the run.

So it is a good way to catch it ahead of time.

And you can catch some of those things.

Another thing that you can do as a team for security: we've talked about Open Policy Agent integration with Terraform, which can also do some of this stuff. And also, someone else recommended Conftest, which is built on top of OPA, and it has support for HCL and Terraform plans as well.

Yeah, there's a little library that's like a linter as well, tfsec, this one.

And it's pretty decent too.

And it can catch, like, S3 buckets open to the world, and HTTP where it should be HTTPS.

Yes. And it also provides a way to where you can take these rules and actually ignore them for, like, a specific line. Like, if you want an ingress here and you don't care about this rule here, it's just a requirement, you have to have it, then you can ignore it.
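For example, tfsec lets you suppress a specific check on a specific line with an inline comment; the resource values here are hypothetical:

```hcl
resource "aws_security_group_rule" "ingress" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  security_group_id = "sg-0123456789abcdef0" # hypothetical

  # This rule intentionally allows public HTTPS, so suppress tfsec's
  # open-ingress check (AWS006) on this line only.
  cidr_blocks = ["0.0.0.0/0"] #tfsec:ignore:AWS006
}
```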

But you can tie this directly in with CI, or you can just run it locally with Docker. And so I would probably start there as opposed to going to Sentinel, because then you do have to write the Sentinel policy, and you need to manage that.

And then you assume a lot of that risk at that point too; you know, you have to develop all those opinions on what you need.

Right, and that makes a lot of sense when you have, like, SecOps teams that focus on that; if you're a team of one, it suddenly just adds to your plate.

Can you share the link to that GitHub repo in the office hours channel after the episode?

And let's see.

So we've got 15 minutes left or so.

There were some other questions here unrelated to Terraform Cloud; I want to see if we can get to them.

Alex, do you still want to talk about your Prometheus question.

See I can't.

He is chatting in the Zoom chat.

Looks like Zac helped you out with the answer a little bit.

I guess I'll just read it for everyone else's benefit.

How do you know.

Let's see.

Assuming you have Prometheus already running with the Prometheus Operator, and you run kubectl get prometheus across all namespaces, you'd set up a ServiceMonitor.

Oh, this is from Zach.

I did not.

Yeah, I gave him, like, a thousand-foot view of an answer for how to set up a ServiceMonitor for Prometheus for a custom service running in a cluster.
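For reference, a minimal ServiceMonitor for a custom service might look like this; the app name and label selector are hypothetical, and the `release` label must match whatever your Prometheus Operator's `serviceMonitorSelector` expects:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app # hypothetical
  labels:
    release: prometheus-operator # must match the operator's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-app # matches the Service's labels, not the Pods'
  endpoints:
    - port: metrics # named port on the Service
      interval: 30s
```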

I wouldn't have answered so quickly if I weren't working on the exact same thing at that moment.

That's cool.

Yeah, maybe we can.

In essence, you don't have to do it today.

Let's punt on the question to next week, if you're able to join us, and we can talk more about ServiceMonitor stuff.

I'm curious if anyone else is using anything other than custom rules; you know, if there's any other tooling out there for service monitoring, where people have multiple teams, multiple microservices, and whether there are any organizational strategies around tooling.

Doing this in a declarative manner. I can answer how we do it.

But I'm interested also first before I talk about what other people are doing.

We've talked about just monitoring the individual services. Oh, yeah.

Just Prometheus right.

Having multiple services, and, you know, some of them come and go; ensuring that generic monitoring gets put in place, and that teams that want to put extra and additional monitoring in, you know, for their items, that those are also able to be deployed.

Yeah, I'm just really struggling with getting a good template going, I thought.

Yeah. Are you using Helm?

I am.

Yeah. Then, sorry, I missed that; my computer's not responsive here.

So then, yeah, I can kind of show you, because this came up recently, for example, with Sentry.

That's a good example; my screen share is not going to be the best one, but I'll show it.

So we've talked about in the past that we use this chart called the monochart that we developed.

Zach, are you familiar with the monochart.

Dude, I am so familiar with the monochart.

OK. I created my own version of it.

So yeah.

Well, thank you.

Yeah So the pattern there that we have.

And then are you familiar with the ServiceMonitors, like the Prometheus bindings that we have in the monochart? You know, I probably should go revisit that.

Honestly, I haven't looked at it.

OK. So I will give an example of that here; I'm getting it queued up in my browser.

So let me rephrase the question or let me rephrase.

But let me restate the question and add some additional context.

So in my own words, I think what you're describing is: how do you offload the burden of how an application is monitored to the application developers themselves, or the teams at least responsible for that service.

In the old-school model, it would kind of be like, you deploy your services, and then you throw it over the fence to ops and say, hey, it was deployed.

Update Nagios or some archaic system like that for monitoring, and that never worked well.

And it's very much this data-center mentality, static infrastructure.

And then you have a different model, which is kind of like Datadog, where it will maybe auto-discover some of the things running there and figure out a strategy for monitoring them, which is magical, but it isn't very scalable, right.

Magic doesn't scale.

So you want something that allows configuration, but also doesn't bottleneck on some team to roll that stuff out.

So this is why I think the Prometheus Operator is pretty rad.

Because you can deploy your monitoring configuration alongside your apps themselves.

So we had this; it just came up, kind of like what you said, Zack, about how you were just actively working on this other problem that Alex has.

That's why I was fresh in your memory.

So this is something that we did yesterday, actually.

So we run Sentry on-prem.

We've had some issues lately with Sentry stopping ingesting new events while everything seems totally normal.

So it's passing health checks.

Everything's running everything's green and hunky Dory.

But we wanted to catch the situation where it stopped processing events.

So at the bottom here, we've added, using the monochart, for example; we don't have to create a separate release for this technically, but we're doing that here using the monochart.

What we do is then we add the Prometheus rules.

So we can monitor that the rate, or the delta here, of jobs started over a five-minute period is not zero; or in this case, alert when it is zero.

So that's when we alert.
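Expressed as a PrometheusRule, the alert described is roughly the following; the metric name is hypothetical, since Sentry's actual exported metric may differ:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: sentry-ingestion
spec:
  groups:
    - name: sentry
      rules:
        - alert: SentryStoppedIngesting
          # Jobs started now minus jobs started five minutes ago == 0
          expr: delta(sentry_jobs_started_total[5m]) == 0 # hypothetical metric
          for: 5m
          labels:
            severity: critical
```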

My point here.

So let's see, are we using the monochart to deploy this?

Do you have something that keeps a baseline level of jobs starting, in a busy cluster, a busy environment?

So, like, is this generalizable? No.

But in this environment.

So here's the thing.

Oh, I think it's generalizable, because you could make that a cron job, you know.

So in our case, we have sentry-kubernetes deployed.

So we have a pod inside the cluster that is ingesting all of the events from the Kubernetes API and sending those to Sentry.

So you could say that we get this just by having that installed.

We have our own event generator because Kubernetes is always doing something right.

So when we ran this query, we identified the two times over the past month that had the outage.

So we deployed it, and went live with that.

But I just want it.

So this monochart, though, is this pattern where you can define one chart that describes the interface of a microservice in your organization.

This happens to be ours that we use in customer engagements.

But you can fork it, or you can create your own that does the same kind of thing.

And let me go over to our charts here and see if I can find a different example that we have.

So here's a simple example of deploying an nginx container using our monochart.

And the idea is: what does everything you deploy to Kubernetes need?

Well, OK: if you're pulling private images, you're possibly going to need a secret.

So we define a way of doing pull secrets.

Everything is going to need an image.

So we define a way of specifying the image.

Most things are going to need config maps.

We define a simple way of having consistent config maps and all of these things are a lot more terse than writing the raw Kubernetes resources.

But you can also then start adding other things in here, like we provide a simple way of defining affinity rules.

So you can specify an inline affinity rule, which is very verbose like this, or you can just use one of the macros, the placeholder ones that we define here, like "should be on different node."

And this is an example of how you can kind of create a library of common kinds of alerts that we deploy.

Now I'm talking.

I'm conflating two things: affinity rules with alerts.

I just happen to have an example here of affinity in helmfiles. Oh, share my screen?

Oh my god, I did that again.

I'm just always used to having my screen shared.

So, yeah, sorry.

OK. So this makes it a little less handwavy then, seeing my screen here.

Here's what I had open, which was just an example of using our monochart to define the Prometheus rules to alert on Sentry.

So here it's saying Sentry jobs started, minus Sentry jobs started five minutes ago.

And if that's zero, we alert on it.

So, the monochart itself: here we're using the monochart just to define some rules.

But the monochart allows you to define your deployment.

So here, where we're deploying nginx, we're setting some config map values, we're setting some secrets, some environment variables.

Here's the definition of the deployment.

But also, unfortunately, we don't have a JSON schema spec for this yet.

So you've kind of got to look at our examples of how we use the monochart.

And that's a drawback. If I search for this here, we'll find a better example.

So, for example, where we use the monochart frequently is that a lot of upstream charts that we depend on don't always provide the resources we need.

So then we can use the monochart, much like we used the raw chart, to define the rules.

So here is where we're deploying an exporter for kiam.

This is a controller that pulls metrics out of kiam and sends them to Prometheus.

So somebody provided a container for us.

But no chart was a part of it.

So we just used our monochart instead.

So here we define a bunch of ServiceMonitor rules to monitor, in this case, kiam.

So this is complicated, like using the raw expressions for Prometheus, but I don't want to say to do that in your case.

Zack, what I would do is define canned policies like this that you can enable in your chart for typical types of services.

OK So that is the route I'm going.

And so this sanity check means I'm not going the wrong way.

I know it's just it seems like a lot of work.

It is.

But the thing is like so.

But nothing else does this; no one else is doing this.

So, like I say, I haven't seen any other option out there, aside from magical auto-discovery of things running, for monitoring; this thing where applications deploy their own configuration for monitoring.

I don't know of any SaaS product that does that.

And it's very specific to the team and organization and the labeling that you have in place.

Yeah So.

All right.

Well, I mean, the monochart is the right route in my mind as well.

I've been going that route.

I call it chart architecture.

Yeah, but I'm using that to do a bunch of other deployments, and not yet for the microservices.

So this will be rolled into it.

So thank you for the answer.

Yeah, no, I just want to add one other thing that came up, just to help contrast the significance of what we're showing here. Yes, this stuff is a bit messy.

I wish this could be cleaned up.

And it wasn't as dense, but when you compare this to like let's say, Datadog and Datadog has an API.

There's a Terraform provider for Datadog.

But I would say that's the classic way of setting up monitoring.

It's a tad better than using Nagios, because there's an API and you can use Terraform, but it's not much better, because there's still this thing where you deploy your app, and then this other thing has to run to configure monitoring for it; versus what we're saying here is, we deploy the app and the monitoring of the app as one package to Kubernetes.

Well, we're almost out of time today.

Are there any last thoughts or questions related to, perhaps, the Prometheus stuff?

I didn't check if you posted anything else here.

Alex and Chad, thank you for the Terraform Cloud demo.

Thank you.

That was a great demo.

Thanks, man.

No problem.


It relies on generalizations, which can be hard.

I suppose for most HTTP REST APIs you could do some kind of anomaly detection or basic five-minute alerts, but yeah, there are no general metrics across all kinds of services.

So Yeah, that's right, Alex.

So that's what all these other services, like Datadog, do: they'll provide you some good general kinds of alerts, but nothing purpose-built for your app.

All right then let's see.

I'm just going to bring up the closing slide here.

Well well there you go.

There's my secret sauce.

That's what we're doing here.

We're at the end of the hour.

There are some links for you guys to check out.

Hope you enjoyed office hours today.

Go ahead and join our Slack team if you haven't already; you can go to cloudposse.com slash slack.

You can sign up for the newsletter by going to cloudposse.com slash newsletter.

If you haven't registered for office hours, definitely go there and sign up.

So you get the calendar invite for future sessions.

We post these every single Wednesday.

We syndicate these to our podcasts.

If you go to cloudposse.com slash podcast.

You can find out the links where you can subscribe to this.

Like iTunes or whatever podcast player you use. Connect with me on LinkedIn.

And thanks again for all your input and participation.

This is awesome.

It's what makes meetups like this possible.

Thank you again for that presentation.

And I'll see you all next week.

Take care.

Thank you guys.

Thank you.

Public “Office Hours” (2020-02-05)

Erik OstermanOffice Hours

1 min read

Here's the recording from our DevOps “Office Hours” session on 2020-02-05.

We hold public “Office Hours” every Wednesday at 11:30am PST to answer questions on all things DevOps/Terraform/Kubernetes/CICD related.

These “lunch & learn” style sessions are totally free and really just an opportunity to talk shop, ask questions and get answers.

Register here: cloudposse.com/office-hours

Basically, these sessions are an opportunity to get a free weekly consultation with Cloud Posse where you can literally “ask me anything” (AMA). Since we're all engineers, this also helps us better understand the challenges our users have so we can better focus on solving the real problems you have and address the problems/gaps in our tools.

Machine Generated Transcript

Let's get the show started.

Welcome to Office hours.

It's February 5th 2020.

My name is Eric Osterman and I'll be leading the conversation.

I'm the CEO and founder of cloud posse.

We are a DevOps accelerator.

We help startups own their infrastructure by building it for you and then showing you the ropes.

For those of you new to the call the format is very informal.

My goal is to get your questions answered.

So feel free to unmute yourself at any time if you want to jump in and participate.

If you're tuning in from our podcast or our YouTube channel, you can register for these live and interactive sessions by going to cloudposse.com slash office hours.

We host these calls every week and will automatically post a video recording of this session to the office hours channel, as well as follow up with an email.

So you can share it with your team.

If you want to share something in private.

Just ask.

And we can temporarily suspend the recording.

And with that said, let's kick it off.

So I don't really have any demos prepared for today, but there are some interesting talking points.

One of the things that we didn't get to cover on the last call was interesting DevOps interview questions: what are some of your favorites?

Interestingly enough.

This has come up a couple of times lately in the jobs channel either people going in for hiring or people using some of these questions in their recent interviews.

So I'd like to hear that from Brian, who's a regular here.

He hasn't joined yet, so we'll probably cover that as soon as he joins.

He has been working on adopting Prometheus with Grafana on EFS, but he has a very unique use case compared to others.

He does ephemeral blue-green clusters for their environments, and he's had a challenge using the EFS provisioner on ephemeral clusters.

So he needs a different strategy, and we'll talk about what that strategy could look like.

But I'm going to wait until he joins.

I work with Brian; I'll go grab him.

OK, awesome. Thanks, Andrea.

All right.

In the meantime, let's turn the mic over.

Anybody else have some questions, something interesting they're working on, something they want to share?

It's really open ended.

Well, I've been doing this as a skunkworks project at the office for the last couple of months, but I'm putting together like a showcase of open source DevOps tools.

They don't have to be open source, but they're DevOps tools.

So if anybody has something they want to contribute, or any experiments they want to run or anything like that, they're welcome to do that.

Cool, can you just mention that in the office hours channel so that they know how to reach you? So Mike, going to put you on the spot here.

What are you working on.

How'd you end up here?

Yeah, so my company recently started using Terraform, and we found the Cloud Posse templates to be very helpful, especially when learning it all.

And so I just came here to kind of find out some practices beyond what's available online, and a bunch of us on our team actually read Terraform: Up & Running, second edition.

Yes, that's a good one.

Yeah Yeah.

So we're just trying, more specifically, to find best practices.

And I guess one of my questions was going to be: how does everybody kind of lay out their Terraform? Like, I understand the concept of modules being reusable, but the next step is defining different environments.

So like we're going to be using separate AWS accounts for different environments.

And so just wanted to get to more expert advice from you guys are just also just learn more about DevOps in general.

Yeah, it's a broad term; nobody has an exact definition.

Everybody has their own definition of DevOps.

So Yeah, it's loaded.

Really, we could talk about that all day; there are whole blog posts about it.

So Yeah, it's a hot topic.

I'm sure a lot of people here you can share kind of what their structure is.

There's no canonical source of truth, but I would say there has been a pretty standardized convention for how to lay out projects for Terraform, and I think it's being challenged with the release of Terraform Cloud.

And what I want to point out here is that.

So the actual docs have gotten a lot better, especially becoming a little bit more opinionated or idiomatic on how to arrange things.

And that's a good thing, because they're the maintainers of Terraform.

So one of the things I came across a few months ago, when I was looking to explain this to a customer, was what the different strategies are for organizing Terraform projects.

And here they kind of lay out, for example, the common repo architectures, as my mouse hangs and I can't scroll down.

That's the problem with screen sharing while doing it.

All right.

I'll wait for that to wake up and continue talking.

What it lays out here is basically three strategies.

One is the strategy that HashiCorp is now recommending, which you can briefly see at the top of my screen here, which is using mono repos, or kind of loosely scoped mono repos.

What I mean by that is maybe breaking it out by purpose.

So maybe you have one repository for networking, one repository for some team's project, or something like that.

But what you do is you don't have environment folders.

Let me see if I can... everything's waking up, my mouse is going crazy. There you go.

So, multiple workspaces per repo.

What HashiCorp started doing was saying, hey, let's use the workspaces concept for environments.

Even though originally they said in their documentation, don't use workspaces this way.

So I don't know if you saw it anywhere, but now they've done a mea culpa on that, an about-face, and we've been taking a second look at this.

And I think that it's making projects a lot easier actually for developers to understand when you just have the code and you just have the configuration and you don't conflate the two.
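The workspace-per-environment flow being described could look roughly like this; the workspace and tfvars names are illustrative, and this assumes an already-initialized Terraform project:

```shell
# One codebase, one workspace (and therefore one state) per environment.
terraform workspace new staging      # create the workspace once
terraform workspace select staging   # switch to it
terraform plan -var-file=staging.tfvars

terraform workspace select prod
terraform plan -var-file=prod.tfvars
```

The code stays identical across environments; only the selected workspace and the variable file change.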

So the other approach is more traditional in development: you have maybe a branch for each environment.

That's a controversial thing as well.

I don't like it, because you have long-lived branches, and keeping them in sync, handling merge conflicts, and making sure they don't diverge still takes extra effort.

Also, if you're an organization that believes in trunk-based development, then a branch model with long-lived branches like this isn't ideal.

And then there's the directory structure.

This is what I was going to get to.

This has been the canonical way of organizing Terraform projects, maybe in large part because, you know, Gruntwork has a lot of influence in this area, with the usage of Terragrunt and tools like that.

This has been a good way of doing it, but there's some problems with this.

So what I like about it: first of all, you have the separate modules folder, and the modules folder has what we call root modules, the top-level invocations.

And those are reusable.

That's like your service catalog for developers.

And then you have your environments here, and those kind of describe everything that you have for production, everything that you have for staging, and these might be broken out more; like, you should have project folders under prod.

If you didn't have them, this would be considered a monolithic project, or a "terralith".

You don't want a terralith, right?

So under prod you'd maybe have VPC, you'd maybe have EKS, and you'd maybe have some other project or service that you have.

And then under there all the Terraform code.
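The directory structure being described looks roughly like this; all the folder names here are illustrative:

```
modules/                  # reusable "root" modules: the service catalog
  vpc/
  eks/
environments/
  prod/
    vpc/                  # thin configuration calling ../../modules/vpc
    eks/
  staging/
    vpc/
    eks/
```

Each environment folder holds only configuration; the actual Terraform logic lives once under modules/.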

The problem is that these things still end up diverging, because you have to remember to open pull requests to promote those changes to all the environments, or you have to create one heavy pull request that modifies all those environments.

This has been a big pain for us.

Even at Cloud Posse.

So we started off with an interesting approach at Cloud Posse, which is to organize accounts, to treat accounts like applications.

And in that case when I say accounts.

I mean, Amazon accounts.

So like the root account or your master account is one configuration you have your staging configuration, your core configuration, your data configuration.

And what I love about this is you have like a strict shared nothing approach.

Even the git history shares nothing, and shared nothing has kind of been this holy grail to reduce the blast radius.

The other thing is, like, when your webhooks fire, they only fire on the dev account.

And because we have these strict partitions there's no accidental mistakes of updating the wrong environment.

And every change is explicit.

Now, there's a great quote in a podcast that just came out the other week on the Changelog, where they interview Kelsey Hightower; I think the topic was "monoliths are the future".

And there's the constant battle of bundling and unbundling: basically, anywhere you go, you consolidate and then expand, and then you realize that didn't work so well, so you consolidate again, and so on.

But my point here is more like one of the things he said, which was that the problem with microservices is that it requires developers to have a certain level of rigor.

I'm paraphrasing my own words.

It's asking for that rigor in an organization that wasn't practicing it before.

So how are they going to get it right this time by moving to microservices?

I want to find the exact quote somewhere; maybe somebody can post it in office hours if they have it handy.

But that was it.

So that's the thing here.

What I describe here.

This is beautiful.

And when you have a well oiled machine that is excellent at keeping track and promoting changes to all your environments and no change is left behind, then it works well, but this is an unrealistic expectation.

So that's why we're considering the HashiCorp-recommended best practice now of using multiple workspaces per repo, and under this model, when you open up a pull request, what you see is: OK, here's what's going to happen in production, staging, and anywhere else you're using that code, because inadvertently you might have drift, either from, you know, humans monkey-patching via the console, or because applies failed in some environments.

And that was never addressed.

So now you have diverged that way.

That's a valid source of error; or maybe you have other Terraform code that has imported the state or something.

And it's manipulating those same resources.

There have been bugs in Terraform providers and all that stuff.

So I want to see, when I open up a pull request, what's going to happen to all those environments; that's what I really like about this workspace approach.

That's the pattern we keep seeing; we'll see people that broke it out in a similar way, like in that reference architecture where you have the different AWS accounts, and it's interesting to me that you're like, well, now we're going to try this multiple-workspaces approach.

So where do you go from here?

And you're right to feel frustrated; you know, the ecosystem shares that frustration.

No no I know, just in general.

I would say, like, you know, in software development, more or less the best practices for how to do your pipelines and promote code are very well understood.

And we're trying to adapt some of those same things to infrastructure.

The problem is that we're different.

We're operating at different points in the software development lifecycle and the capabilities at our disposal are different.

So let's take, for example, infrastructure.

If you listen to this podcast or talk to us, you know I always talk about this.

It's like the four layers of infrastructure: you've got your foundational infrastructure, then you've got your platform, and on top of your platform you've got your shared services.

And then the last layer is your applications.

Most people working with this stuff assume layers one through three exist and don't have to worry about them.

It's a separation of concerns.

All they care about is deploying their app.

They don't have any sympathy or empathy for how you manage Layers 1 through 3.

But if you're in our industry, that's one of the biggest problems is that we have to maintain stability of the platform while also providing the ability to run multiple environments for developers and stuff like that.

So my reason for bringing up all of this is that Terraform as a tool doesn't work like deploying your standard microservice, your standard Go, Rails, or Node app or whatever.

Like, if you deploy a Node app and it fails, you can roll back, or you don't even need to expose that error at all because your health checks caught it and you're running parallel versions, and all that's automatic, et cetera.

When you're dealing with infrastructure and using a tool like Terraform, it's a lot more like doing database migrations without transactions.

So there is no recourse.

So how do you operate when you're dealing with this really critical stuff in a place where you have poor recourse.

So you need good processes for that.

And let's see here.

So that's why we're still trying to figure this out.

I think, as it relates to DevOps: what is the best course of action?

There's been Atlantis.

There's been Terraform cloud a lot of people are using Jenkins to roll this out.

Some people combine things like Spinnaker with it.

You can orchestrate the whole workflow there, because the problem is your standard CI/CD systems aren't quite built for this.

I don't think there's one answer; I don't think anybody's nailed it yet.

So it's a fair question. If you go back to your reference architecture, can you click on one of those accounts?

I'm just curious now: are you still referring to the Cloud Posse modules, or do you build modules within here?

No It's really all just config.

Both both.

So let me describe that a little bit more of what our strategy has been kind of like food for thought.

So here I have one of these AWS accounts, and under conf we have the projects.

So this is how we have kind of like the microservices of Terraform.

So it's not a monolith; it's a microservice architecture, so to say.

If I look at any one of these, we've been using a pattern similar to what you'd get with Gruntwork, where you have a centralized module repository.

So here we have the terraform-root-modules repo at Cloud Posse.

Where you pull those root modules from really doesn't matter; you could have a centralized service catalog at Cloud Posse, which works well for our customers because our customers are implementing our way of doing things.

Now, depending on how much experience and how many opinions you have about this stuff, you could fork our modules or you could start your own.

And what we have typically is the customer has their own modules.

We have our own modules, and you kind of pick and choose from those.

Here, what we have is we're using environment variables with Terraform, and this pattern worked really well in Terraform 0.11, but in Terraform 0.12, for example, Terraform broke a fundamental way we were using the product.

So when you run terraform init -from-module and you point at a module, you used to be able to initialize your current working directory with a remote module, even if your working directory had a file in it.

So that works really awesome.

We didn't need any wrappers we didn't need any tools.

But then Terraform 0.12 comes out and they say, no, the directory has to be empty.

Oh, and by the way, many Terraform commands, like terraform output, don't allow you to specify where your state directory is, so that it can get the outputs.

For most commands you can say terraform plan in this directory, terraform apply this plan file, or terraform init in this directory.

But other commands, like terraform show, I think, and terraform graph and terraform output, don't allow you to specify the directory.

So there's an inconsistency in the interface of Terraform and then they broke the ability to initialize the local directory.
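The -from-module workflow being discussed looks roughly like this; the module source URL is purely illustrative:

```shell
# Initialize a working directory from a remote module. Under 0.11 the
# target directory could already contain files (e.g. a backend config);
# since 0.12 it must be empty for -from-module to work.
mkdir -p prod/vpc && cd prod/vpc
terraform init -from-module=git::https://github.com/cloudposse/terraform-root-modules.git//aws/vpc
```

That 0.12 empty-directory requirement is what broke the pattern of keeping a small per-environment config file next to the initialized module.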

So anyways, my point with this is saying perhaps there is some confusion on our side as well, because the interface was changed underneath us.

So going back to your question under here.

So then if you go to this folder here, and you go to, like, terraform-root-modules, here are a bunch of opinionated implementations of modules for this company.

So here's the idea as the company grows in sophistication you're going to have more opinions on how you're going to do EMR how you're going to do Postgres how you do all those things.

And that's the purpose of this.

And then a developer, all they need to know is: look, I can follow best practices if I point to that.

And then your DevOps team, or the stakeholders with that label, so to say, can be deciding what the best practices are for implementing those modules.

Here we have our Terraform modules.

Now, I want to preface this: we have these as kind of boilerplate examples for how to implement common things, but they're not necessarily the canonical best practice for implementing them.

So our EKS root module here implements a basic EKS cluster with a single autoscaling group, basically a single node pool.

Now, that gets complicated, right, because companies are going to need a node pool with GPUs, they're going to need a high-memory node pool, they're going to need a high-CPU node pool, and how you mix and match that is too opinionated for us to generalize.

Yeah, well, that's all I can say, I guess.

Yeah, and that's kind of what we're going through now just figuring out what works for us deciding on the different structures of everything and definitely taking advantage or looking at what you guys have already done and looking at a lot of things.

And just reading all over.

So yeah.

Well, there are good Reddit posts; you know, everyone happens to blog on Medium.

So check that out.

Exactly. And I think that's the other thing: just finding documentation on 0.12 compared to 0.11.

And, you know, refining my Google searches and only searching the past, like, five months type of thing.

That's a good point.

Yeah, there's a lot.

You can end up on a lot of outdated stuff, especially with how fast this stuff moves.

So, you know, I was reading a blog from July of 2019 and blindly assumed they were talking about 0.12 when, in fact, they were talking about 0.11.

So yeah, but I got to move on to the next question.

I see Brian joined us.

He's over at AuditBoard, and he's been working on setting up Prometheus with EFS on ephemeral, short-lived clusters.

Kind of an interesting problem.

Those of you who attended the last few calls know this Prometheus topic has come up quite a lot.

I want to just kind of tee this question up a little bit and frame it for everyone.

So at Cloud Posse we've been supporting a lot of different monitoring solutions, but we always come back to the fact that Prometheus with Grafana is pretty awesome; the Kubernetes community provides a lot of nice dashboards, and it's proven to scale.

So one of the patterns we've been using, that others here have used as well and that works pretty well, is to actually host the Prometheus time-series database on EFS.

And I guess your mileage will vary; if you're running at Facebook scale, yeah, you're probably going to need to use Thanos or whatever, some bigger thing.

But EFS is highly scalable.

It's POSIX-compliant and it works ridiculously well with Prometheus for a simple implementation.

The problem that Brian has is his clusters are totally ephemeral: when they do a rollout, they bring up a whole new cluster, deploy a whole new stack to it, validate that it works, and shut down the old one.

And with Prometheus on EFS, what we've been using is the EFS provisioner, and with the EFS provisioner it'll automatically allocate a PVC (persistent volume claim) on your EFS file system.

Problem is, those volume IDs are unique; they're generated by the Kubernetes platform.

So if you have several clusters, how do you reattach that EFS file system?

If you're using the EFS provisioner, well, the kicker is, if you're doing it this way, then maybe the EFS provisioner isn't the right tool.

You can still use EFS, but the provisioner isn't going to be the right fit.

And instead, what you're going to need to do is mount the EFS file systems to the host using traditional methods of mounting NFS for your operating system.

So if you're using kops, you're going to use the hooks for when the instance starts up, and add a systemd hook that's going to mount that EFS file system to the host OS.

And now with Kubernetes, what you can do is mount the actual host path that's been mounted there into your container, and then you are the decider of the naming convention at that point, and of how you keep it secure and everything. Brian, is that clear?

I can talk more on that.

Yeah, no, that makes sense.

Yeah, it's unfortunate that I have to go that route.

But sometimes you don't get the turnkey solution.

Yeah, I mean, before the EFS provisioner we were using this pattern on CoreOS with Kubernetes for some time.

And it worked OK.

So I just wanted to point out, for those who are not familiar with the kops manifest and its capabilities: under the Cloud Posse reference architectures is kind of what we've been using in our consulting engagements to implement Kubernetes clusters with kops.

I'm mostly speaking here to you.

Mike, who is looking for inspiration.

This here is doing that poly repo architecture, which I'm undecided on at this point, after being totally gung-ho on it.

So we go here now to the kops private topology, which is our manifest for kops.

What I love about kops is that kops basically implemented a Kubernetes-like API for managing Kubernetes itself.

So if you look at any CRD, they're almost the same as these.

But then there's this escape hatch that works really well: you can define systemd units at startup that get installed.

So here you go, Brian: you'll create a systemd unit that just calls ExecStart to mount the EFS file system.
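A kops hook along those lines might look like this; the file system ID, region, and mount point are all made up for illustration:

```shell
# Hypothetical fragment of a kops cluster spec: a hook whose manifest is a
# systemd unit that mounts an EFS file system before the kubelet starts.
cat > efs-hook.yaml <<'EOF'
spec:
  hooks:
    - name: mount-efs.service
      before:
        - kubelet.service
      manifest: |
        [Unit]
        Description=Mount EFS for the Prometheus TSDB
        [Service]
        Type=oneshot
        ExecStartPre=/bin/mkdir -p /mnt/efs
        ExecStart=/bin/mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-west-2.amazonaws.com:/ /mnt/efs
EOF
grep -c 'ExecStart=' efs-hook.yaml   # sanity check: the unit mounts the volume
```

Because the mount point on the host is fixed, pods can then hostPath-mount a predictable directory regardless of which ephemeral cluster they land on.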

Yeah, have you guys looked at the EFS CSI driver?

No. So maybe there are more elegant implementations, like the one I pasted in the chat.

It was what I was looking into next, which I believe could solve my problem for me, just because I create the PV itself.

Then I can define the path for that persistent volume.

Oh, OK.

Yeah, if you can define a predictable path, then you're around it.

It's just, as long as you're in the territory of having random IDs, then yeah.

I believe, if I'm creating that persistent volume myself instead of going the PVC route, then I can create the volume.

And then when I'm provisioning Prometheus, I can just tell it which volume to mount, instead of going and creating a PVC through the Prometheus Operator.

OK.
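A statically provisioned PV along those lines might look like this; the name, capacity, and EFS file system ID are all illustrative:

```shell
# Hypothetical statically-provisioned PersistentVolume using the EFS CSI
# driver. Because the volumeHandle (file system ID) is fixed, a brand-new
# cluster can bind to the same data instead of generating a random volume.
cat > prometheus-pv.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-tsdb
spec:
  capacity:
    storage: 100Gi        # EFS is elastic, but the field is required
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""    # disable dynamic provisioning for this volume
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-12345678   # illustrative EFS file system ID
EOF
grep -c 'efs.csi.aws.com' prometheus-pv.yaml   # sanity check
```

A claim that references this PV by name (or by matching the empty storage class) would then reattach the same EFS data across cluster rebuilds.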

I think that's the route I'm going to try first.

If not, then obviously the kops systemd one is the one that for sure will work.

Yeah, that's your plan B.

Andrew Roth you might have some experience in this area.

Do you have anything to add to it.

I've never used the CSI driver; I've only used the EFS provisioner.

Our very own certified Kubernetes administrator: any insight here?

My initial thought was the same as yours: that the volume might have to be attached first, and then continue from there.

I haven't used the CSI driver before; I've only just started to mess around with it, but I don't have enough experience to really say much.

I've been playing around with Rook a bit.

All right.

Well, I guess that's the end of that question then, unless there's anything else you wanted to add, Brian.

No thank you.

Yeah Any other questions.

We got quite a lot of people on the call.

I just have a general question; it's Warren.

So I know in the company that I work for, we use a lot of you guys' repos.

I know a lot of it is revolving around AWS.

Now, do you have, or do you plan on doing, any repos for maybe DigitalOcean?

That's a good question.

Fun fact is that we use a bit of DigitalOcean for our own POCs, to keep our costs low and manageable.

I don't know whether we're going to really be investing directly in DigitalOcean, because of who we cater to, which is more like well-heeled startups.

And they need serious infrastructure.

So I think DigitalOcean is awesome for POCs, but I wouldn't want to base my $100 million startup on it.

Cool. I do have a follow-up comment: we actually do use that process of using Terraform Cloud with the workspaces.

How's that working out for you?

Anything you want to add, like a use case or firsthand experience so far?

Pretty good.

You know, anytime we have any issues with a certain, I guess, stage, we just have to do a pull request against that particular stage.

OK You just go from there.

And it's pretty simple.

You know, because I've only been involved with the company for a little over a year.

So which of the repo strategies are you guys using with Terraform Cloud?

I guess it depends on the stage; like, we have stages like prod, dev, UAT, QA.

So like maybe UAT and support, they're all branched off from the master branch.

OK, so you're using the branching strategy then, where UAT would be a separate branch?

No, it'll all be on the same branch in the git repo.

OK, so anything that's merged into master... are you using the official approach that they describe up here?

Once again, it sounds like a single branch, like a master, and you branch from that and then merge back. Yes, in the long run.

Yeah, but for the environments: is UAT just a workspace, or is UAT a project folder like this?

You said UAT will just be a workspace.

OK So yeah, you're using that strategy.

The first strategy we were talking about here: multiple workspaces per repo.

I wanted to expand on that, in that we have an older Terraform repo with a few projects that are using workspaces, but we have a lot of stuff where we're using Terragrunt to manage it.

And I haven't sat down really to think about it.

But is there a best practice for managing workspaces via Terragrunt?

I was exploring that the other day, just in terms of a POC, and out of the box Terragrunt does not support workspaces.

It's an anti-pattern for them.

And I would say, based on what I said earlier, that this used to be the official anti-pattern, like, don't use workspaces for this, but even HashiCorp has done an about-face on this.

So with Terragrunt, I'm not sure if they're going to be adding support for it or even addressing this topic.

However, I was kind of able to get it to work using, I don't know what the term is in Terragrunt, but I think the equivalent of hooks, right?

So I had a pre-hook, well, it was a prerequisite, yeah, for either the plan or the init or whatever.

I just used a hook to select the workspace, or create the workspace if the select failed.

And that seemed to more or less work for me.

So the challenge is you need to call this other command, which is terraform workspace select or terraform workspace new, and using the hooks was one way to achieve that.
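The hook workaround being described might look roughly like this in a terragrunt.hcl; the workspace name is illustrative, and since Terragrunt treats workspaces as an anti-pattern, this is a workaround rather than a supported feature:

```shell
# Hypothetical terragrunt.hcl with a before_hook that selects the
# workspace before every plan/apply, creating it if the select fails.
cat > terragrunt.hcl <<'EOF'
terraform {
  before_hook "select_workspace" {
    commands = ["plan", "apply"]
    execute  = ["sh", "-c",
      "terraform workspace select staging || terraform workspace new staging"]
  }
}
EOF
grep -c 'before_hook' terragrunt.hcl   # sanity check: one hook defined
```

The `select || new` idiom is what handles the first run, where the workspace doesn't exist yet.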

You can see here in a terminal window; I might still have that command from when I was using Terraform workspaces.

And I haven't been keeping too up to date with Terraform 0.12.

Have they made the change so that you can interpolate your state location, or not?

It's still not a thing; I can talk to that as well, actually.


So for everyone's clarification.

Let's see here.

That's a good example.

So the Terraform state backend with S3 requires a number of parameters, like the bucket name, the role you need to assume to use that bucket, key prefixes, all that kind of stuff.

And one thing that we've been practicing at Cloud Posse has been this share-nothing thing, right?

So here's an example.

Brian's talking about.

So you see this bucket name here.

You see this key prefix here.

You see the region.

Well, the shared-nothing approach, right.

You really don't want to share this bucket, especially if you have a multi-region architecture.

To be able to administer multiple regions when one region is down, you definitely have to have multiple state buckets, right.

However, because Terraform to this day does not support interpolation for this (basically, using variables), that's kind of limited how you can initialize the backend.

So one of the things that they introduced in a relatively late version of Terraform, like 0.9 or something, was the ability to set all the backend parameters with command-line arguments.

So you could say terraform init, and I think the argument was like -backend-config equals.

Yeah, bucket equals whatever, you know, something like that.

But then there was still no way to standardize that.

So that's why at Cloud Posse we were using something else.

There's this thing that you can use.

I'm going to type it out here.

So Terraform supports certain environment variables.

So you could say export TF_CLI_ARGS_plan, I think it was (the naming was kind of weird), equals, and then here you could say like -backend-config=bucket=my-bucket.

So because you had environment variables, you could kind of get around the fact that you can't use interpolation.
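A sketch of this environment-variable trick follows. The bucket and region values are made-up examples; note that since -backend-config is an argument to terraform init, the matching variable is TF_CLI_ARGS_init (Terraform appends TF_CLI_ARGS_name to the named subcommand):

```shell
# Standardize backend settings without interpolation (example values only).
export TF_CLI_ARGS_init="-backend-config=bucket=acme-terraform-state -backend-config=region=us-west-2"

# A plain `terraform init` would now behave as if those flags were passed:
#   terraform init
echo "$TF_CLI_ARGS_init"
```

You can export this from a Makefile or wrapper script so every operator initializes the backend the same way.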

Are we clear up to this point, before I kind of do my soul-searching moment here?

For me, this is clearer.

Yeah, OK.

But I'm just one voice.

Erik, are you trying to share your screen?

Yeah, you might need to.

It's definitely being shared because I see the green arrow around my window.

If you're in Zoom you might need to tab around to see it.

Yeah, I can see it in here.

Yeah zoom is funky like that.

I see it now.

Yeah So you see here.

I might be off on some camel case, some capitalization here somewhere, maybe.

But it is more or less what you can do to get around it.

So you can stick that in your Makefile.

You could have users set it, or you could have your wrapper script do it.

OK And then the last option.

Yeah, and the last option: if you're using Terragrunt, Terragrunt will help set this up for you by passing these arguments for you in your terragrunt.hcl.

So there are options right.

But a lot of these are self-inflicted problems; like I said, problems we've created for ourselves, right.

The problem we created for ourselves at Cloud Posse was this strict adherence to "do not share the Terraform state bucket across stages and accounts; always provision a new one."

This made it very difficult when you wanted to spin up a new account, because you had to wire up the state. For managing Terraform state with Terraform, we have our terraform-aws-tfstate-backend module; we use it all the time.

It works, but it's kind of messy when you've got to initialize Terraform state without Terraform state, using the module.

So it creates the bucket.

The DynamoDB table and all that stuff.

Then you run some commands via script to import that into Terraform, and now you can use Terraform the way it was meant to be used, with this bucket that was created by Terraform. But it's messy.

And you see the catch-22 here.

And you're doing this at every account; is that the idea?

So Gruntwork took a different approach with Terragrunt.

They have the tool Terragrunt provision the DynamoDB backend and the S3 bucket for you.

So it's kind of ironic that your whole purpose with using pure Terraform is to provision your infrastructure as code, and then they're using this escape hatch to create the bucket for you.

So let's just say that that was a necessary evil.

No, I get it.

It's a necessary evil.

Well, on the topic of necessary evils, let's talk about the alternative to this, to eliminate all this grief that we've been having.

Screw it.

A single state bucket, and just use IAM policies on path prefixes and workspaces to keep things secure.

The downside is, yes, we've got to use IAM policies and make sure that there are never any problems with those policies, or we can leak state.
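Scoping access by path prefix could look roughly like the following policy document. This is a sketch under assumptions: the bucket name, project prefix, and role wiring are all illustrative, and a real policy would also need to cover the lock table:

```hcl
# Sketch: let a "dev" role touch only its workspace prefix in the shared bucket.
data "aws_iam_policy_document" "dev_state_access" {
  statement {
    sid       = "DevWorkspaceStateOnly"
    actions   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
    resources = ["arn:aws:s3:::acme-terraform-state/eks/dev/*"]
  }

  statement {
    sid       = "ListStateBucket"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::acme-terraform-state"]
  }
}
```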

But it makes everything 10x, 100x easier.

It's like using Terraform Cloud. So one of the things Terraform Cloud came out with was managed state.

And that's free forever, or whatever, if you believe that.

But just having the managed state made a lot of things easier with Terraform Cloud.

I still like the idea of having control over it.

With Terraform state in S3 on Amazon, we manage all the policies and access to that.

So that's that.

All right.

So when you're using workspaces together with the shared state storage bucket, the other thing you've got to keep in mind is this extra parameter here.

So, workspace_key_prefix.

So if you're using the shared S3 bucket strategy, then you're going to want to always make sure you set the workspace_key_prefix, so that you can properly control IAM permissions on that bucket based on workspaces.

So a workspace might be dev.

A workspace might be prod.

Somewhat; thank you for explaining that. Perhaps that cleared up some confusion when it comes to workspaces.

But you said, where do you keep the state.

But if you divide that all up with IAM roles and policies, it can be done by keeping it in a single state.

A single bucket, I should say. And one of the reasons why we haven't liked this approach is...

OK, let's see.

And I just want to correct one thing; I was kind of saying it wrong, or misleadingly.

So the workspace_key_prefix would be kind of your project.

So if your project is EKS, the workspace_key_prefix would be eks, and then Terraform automatically creates a folder within there for every workspace.

So there'll be a folder for every workspace: prod, dev, all of it.

So there's that.
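The layout being described can be sketched like this; the bucket, table, and project names are illustrative assumptions:

```hcl
# Shared state bucket with a per-project workspace prefix (example values)
terraform {
  backend "s3" {
    bucket               = "acme-terraform-state"
    key                  = "terraform.tfstate"
    workspace_key_prefix = "eks"          # the "project" prefix
    region               = "us-west-2"
    dynamodb_table       = "acme-terraform-state-lock"
  }
}

# Terraform then stores each non-default workspace's state under the prefix:
#   dev  -> s3://acme-terraform-state/eks/dev/terraform.tfstate
#   prod -> s3://acme-terraform-state/eks/prod/terraform.tfstate
```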

Now, why we haven't liked this approach is this.

So, we're using IAM in production; let's say this bucket is in our master AWS account, and we're using IAM there to control access to dev, or to control access to the staging workspaces, or to control access to some arbitrary number of workspaces. What we're actually doing is modifying production IAM policies without ever testing those IAM policies in another environment. By procedure, like, we're not enforcing that.

You can still test these things.

But it's on your own accord that you're testing that somewhere else.

And what I don't like about it is that you're kind of doing the equivalent of cowboy IAM in production. Obviously it's with Terraform, as code, but nonetheless.

You almost need a directory server for this sort of thing.

Yeah Yeah Yeah Yeah.

That's interesting.

Is there IAM directory integration?

I haven't looked at trying to do that.

But Yeah.

Sorry, I've got some comments here from Alex: if you update the reference version in dev but then get pulled away, it gets forgotten, and the later environments come back to, like, the default.

So then coming back and figuring out where we dropped the ball requires some fancy diff work and just tedious investigation.

I kind of want a dashboard that tells me what version of every module I'm referencing in each environment.

This doesn't cover everything, but doing cool stuff like this is just messy, so yeah.

So Alex Siegmund is actually one of our recent customers, from one of our recent engagements.

So one of the problems that we have in our current approach, which we've done up to this point, has been that you might merge PRs for dev and apply those in dev, but those changes never get promoted through the pipeline.

And then they get lost in dev.

And that, I argue, is the whole problem with the poly-repo approach that we've been taking.

But it's also the problem with the directory structure approach that's been the canonical way of doing things in Terraform for some time.

The prod directory, the dev directory.

All of those things have the same problem, which is you forget what's been rolled out.

So that's why I like this approach where you have one PR, and you don't merge it right away.

OK There's variations of this implementation.

One variation is that the PR is never merged until there is a successful apply in every environment.

So then that PR is that dashboard that Alex is talking about.

He wants a dashboard that tells what version has been deployed everywhere.

Well, this is kind of like that dashboard when that PR is open.

Then you know that there's an inconsistency across environments.

And based on the status checks on your pull request, you see success in dev, success in staging.

And no update from production.

OK that's pretty clear.

Now you know where it's been updated, versus this approach where you then merge it.

And then you need to make sure that whatever happens after that has been orchestrated in the form of a pipeline, where you systematically promote that change to every environment that it needs to go to after that.

But now the onus is back on you; you have to implement something to make sure that happens.

In Terraform Cloud, they have this workflow where it will plan in every environment.

And then you can set up rules, I believe, on what happens when you merge.

So when you merge, maybe it goes automatically out to staging, and then you have maybe a process of clicking the button to apply to the other environments.

What's nice about this is you'll still see that that has not been applied to those other environments.

And you need that somewhere.

So whether you use the pull request to indicate that that's been applied or whether you use a tool like Terraform cloud to see that it's been applied or whether you use a tool like Spinnaker to create some elaborate workflow for this.

That's open ended.

Let's see.

I think we've got some more messages here.

"You've removed the need for such a dashboard by making it part of your process, ensuring that it's consistent across all repositories or environments."

Yes So I'm still not 100% sure.

OK, awesome.

So yeah, Alex says that this alternative strategy we're proposing eliminates some of the need for that, because in the other model the onus is basically on you to have the rigor to make sure these things get rolled out everywhere.

And especially the larger your team gets, the less oversight there is ensuring that that stuff happens.

So one thing I'm personally taking away from this is to try the workspace-recommended thing in the Terraform docs first, and then share and report back.

All these pitfalls.

Yes.

And to be candid, I have not watched or listened to it yet.

And you're going to find detractors for this, right.

But that's good.

In any rich ecosystem there are conflicts, and Anton Babenko, at... was it FOSDEM? I forget what the conference is... has just done a presentation kind of countering everything I said, saying why you have to have the directory approach.

So that might be a video to watch.

Is it good.

Anton is a great speaker.

I know, I'll look. I actually met this guy in person.

Yeah Yeah.

He was there last year.

Yeah super nice guy.

He's an awesome one.

Yeah, I really like him.

I met him up in San Francisco and at re:Invent. I think this guy goes to like 25 conferences a year or something.

So you were saying he kind of has the rationale, even with 0.12, to stay with the directory approach?

I think so.

So he is a very big Terragrunt user.

And Terragrunt is a community in and of itself, with their own ways of doing things.

Therefore, I suspect this talk will very much be promoting that, because the abstract was something like, why you have to do it this way.

I'll find it though while this is going.

Any other questions?

Yeah, I have a bit of a question that's a little bit higher level.

My experience with Terraform is that when you use it, it's kind of very low level.

It's a fairly thin abstraction over the API.

And you have, of course, you know, the built-in semantics that it has; it gives you rails, as it were: this is how we do state transitions.

So we do this.

So we do that.

And it's kind of like you know operate inside of that construct.

Yeah. What's your experience with, or thoughts around, using higher-order constructs, like what's available with the AWS CDK, for example, and some of the things you could do with that in a fully complete language?

Yeah Yeah.

It's good.

I like the question.

And let's see how I answer it.

So this has come up recently with one of our most recent engagements, and the developers on that team challenged us: why are we not using the CDK for some of this stuff?

Let's be totally honest.

The CDK is pretty new.

And we've been doing this for a long time.

So our natural bias is going to be that we want to use Terraform because it's just a richer experience.

But there are a lot of other reasons why I think one can argue for using something like Terraform and not something that's more Turing complete, like the CDK or Pulumi, and that's that all that power doesn't always translate to awesome outcomes or products.

And the problem is that when you can do everything, any way possible, you get into this scenario of why people hate Jenkins and why people hate, like, Groovy pipelines in Jenkins: because you develop these things that start off quite simple, quite elegant.

They do all these things you need, and then 25 people work on it, and it becomes a mush of pipeline code, a mush of infrastructure code.

If we're talking the CDK, right.

And things like that.

This is not saying you can't use it.

I'm thinking more like there's a time and place for it.

So if we talk about Amazon, in the Amazon ecosystem, an example I like to bring up is ECS.

ECS has been wildly popular as a quick way to get up and running with containers in a reliable way that's fully integrated with AWS services.

But it's been a colossal failure.

When you look at reusable code across organizations.

And this is where Kubernetes just kicks butt over ECS.

So in Kubernetes, Helm quickly became the dominant package manager in the ecosystem.

Yeah, there's been a lot of hatred towards Helm for some security things or whatnot, but it's been incredibly successful, because now there are literally hundreds and hundreds of Helm charts, many written by the developers of these things to release and deploy their software.

The reason why I bring up Helm and Kubernetes is that it's proved to be a very effective ecosystem.

We talk about Docker same thing incredibly productive ecosystem.

And so with Docker Hub, there's this registry.

And, you know, security implications aside, there's a container for everything.

People put them out there; your mileage may vary, and some might be exploitable, but that's part of the secret of why Docker has been so successful.

It's an easy DSL, an easy distribution platform, and it works everywhere.

Same pattern going back in the day to Ruby gems and then, you know, Python modules and all these things.

This is very effective.

Then we go to Amazon, and we have ECS, and there's none of that.

So we get all these snowflake infrastructures built in every organization to spec.

And then every time you come into a new company, at least for us as contractors, no two environments look the same.

They're using the same core tool stack.

But there's too many degrees of variation, and I don't like that.

So this is where I think that part of the success of Terraform has been that the language isn't that powerful, and that you are constrained in some of these things.

And then the concept of modules is that registry component, which allows for tremendous reusability across organizations.

And that's what I'm all for.

And that's, like, our whole mission statement at Cloud Posse: to build reusable infrastructure across organizations that's consistent and reliable.

So back to the CDK question; the answer that I gave the customer was this.

Let's do this.

Let's continue to roll out Terraform for all the foundational infrastructure, the stuff that doesn't change that much, the stuff that's highly reusable across organizations.

And then let's consider letting your developers use the CDK for the last layer of your infrastructure.

What I'm talking about there.

And I'm not sure at what point you joined.

But at the very beginning of the call, I talked about what I always talk about, which are the four layers of infrastructure: layer 1 is foundational infrastructure; layer 2 is your platform; layer 3 is your shared services; layer 4 is your applications.

For your applications, go for it.

Go nuts. You may use CloudFormation or, you know, the Serverless Framework. If somebody is using the Serverless Framework, which is purpose-built for doing Lambdas and providing the structure (Rails, but for Lambdas), use it.

I'm not going to say that because we use Terraform in this company, you're not going to be able to use Serverless; that's not the right answer.

So the answer is, it's going to depend on what layer you're operating at.

Yeah, I really like that formulation.

I missed that from the beginning of the call, but that really makes a lot of sense, because you want to have your foundations a little bit more rigid; you don't want that mush that you described earlier.

And that's where I think, at the lower level, the tight constructs that Terraform gives you, the high opinionation, I should say, make sense, because you can only do so much.

And moreover, it used to be with Terraform, you know, you'd have a terraform plan, and then the terraform apply could be quite different.

But I think the ecosystem has become much more mature at this point.

Yeah, and they really do a good job predicting when they're going to destroy your shit.

Yeah And yeah.

And they added just enough more functionality to HCL to make it less painful.

Which I think is going to quell some of the arguments around Turing completeness.

And then the other thing I wanted to say related to that is, the problem we had pre-HCL2 all the time was "count cannot be computed."

That was the bane of our existence.

And one of our top linked articles in our documentation was, like, all the reasons why count cannot be computed.

Now we almost don't see it anymore.

So I'm a lot happier with that.

The only other thing I was going to add and I'm not sure I'm 100% on this anymore.

Well, I wasn't 100% on it; so I was maybe 50 or 60% before, now maybe 30 or 40.

But I was wondering, maybe HCL is more like a CSS kind of language, and you need something like Sass on top of it to give a better developer experience.

But for all the reasons I mentioned about the CDK, my concern is that we would get back into this problem of non-reusable, vendor-locked kinds of solutions, and unless it comes from HashiCorp, you run the risk of running afoul of the vision they see for the product.

Also, Alex Siegmund shared it in the Zoom chat. Alex, keep us posted in the SweetOps office hours channel as well.

Yeah, this is the talk that Anton Babenko did at FOSDEM, and the recording has been posted; he just posted it on LinkedIn.

I'll look it up after this call and share it.

I do think, actually... boy, this has been a long conversation today.

I think we're already at the end here.

Are there any last parting questions.

No more, just a thank you.

Thanks for calling me out earlier.

And then taking the whole hour to talk about Terraform.

I appreciate that.

Well, that's great.

I really enjoyed today's session.

As always. So let's see, I'm going to just wrap this up here with the standard spiel.

All right, everyone looks like we reached the end of the hour.

That about wraps things up.

Remember to register for our weekly office hours if you haven't already; go to cloudposse.com/office-hours.

Again, that's cloudposse.com/office-hours.

Thanks again, everyone for sharing your questions.

I always get a lot out of this, and I hope you learned something from it.

A recording of this call will be posted to the office hours channel and syndicated to our podcast at podcast.cloudposse.com.

So see you guys next week.

Same place same time.

Thanks a lot.

Thank you, sir.

Public “Office Hours” (2020-01-29)

Erik OstermanOffice Hours

1 min read

Here's the recording from our DevOps “Office Hours” session on 2020-01-29.


Jenkins Pros & Cons (2020)

Erik OstermanCI/CD, Cloud Technologies, DevOpsLeave a Comment

9 min read

I spent some time this weekend to get caught up on the state of Jenkins in 2020. This post will focus on the pros and cons of Jenkins (not Jenkins X – which is a complete rewrite). My objective was to set up Jenkins following “Infrastructure as Code” best practices on Kubernetes using Helm. As part of this, I wanted to see a modern & clean UI throughout and create a positive developer experience. Below is more or less a braindump of this experiment.

Cons

  • Jenkins has a lot of redundant plugins. Knowing which one to use takes some experimentation and failed attempts. The most common example cited is “docker”. Personally, I don't mind the hunt – that's part of the fun.
  • Jenkins has many plugins that seem to be no longer maintained. It's important to make sure whatever plugins you choose are still receiving regular updates (as in something pushed within the last ~12 months).
  • Not all plugins are compatible with Declarative Pipelines. IMO using Declarative Pipelines is the current gold standard for Jenkins. Raw imperative groovy pipelines are notoriously complicated and unmanageable.
  • No less than a few dozen plugins are required to “modernize” Jenkins. The more plugins, the greater the chance there will be problems during upgrades. This can be somewhat mitigated by moving towards using command-line driven tools run inside containers as opposed to installing some of the more exotic plugins (credit: Steve Boardwell).
  • There's no (maintained) YAML interface for Jenkins Pipelines (e.g. Jenkinsfile.yaml). Most modern CI/CD platforms today have adopted YAML for pipeline configuration. In fact, Jenkins X has also moved to YAML. The closest thing I could find was an alpha-grade prototype with no commits in 18 months.
  • The Kubernetes Plugin works well but complicates Docker Builds. Running the Jenkins Slaves on Kubernetes and then building containers requires some trickery. There are a few options, but the easiest one is to modify the PodTemplate to bind-mount /var/run/docker.sock. This is not a best-practice, however, because it exposes the host-OS to bad actors. Basically, if you have access to the docker socket, you can do anything you want on the host OS. The alternatives like running “Podman”, “Buildah”, “Kaniko”, “Makisu”, or “Docker BuildKit” on Jenkins have virtually zero documentation, so I didn't try them.
  • The PodTemplate approach is problematic in a modern CI/CD environment. Basically, with a PodTemplate you have to define, before the Jenkins slave starts, the types of containers you're going to need as part of your pipeline. For example, you define one PodTemplate with docker, golang and terraform. When the Jenkins slave starts up, a Kubernetes Pod will be launched with 3 containers (docker, golang and terraform). One nice thing is that all those containers will be able to share a filesystem and talk over localhost since they are in the same Pod. Also, since it's a Pod, Kubernetes will be able to properly schedule where that Pod should start, and if you have autoscaling configured, new nodes will be spun up on-demand. The problem with this, however, is subtle. What if you want a 4th container to run that is a product of the “docker” container and shares the same filesystem? There's no really easy way to do that. These days, we frequently will build a docker container in one step, then in the next step run that container and execute some tests. I'm sure this can be achieved, but nowhere near as easily as with Codefresh.
  • It's still not practical today to run “multi-master” Jenkins for High Availability without using Jenkins Enterprise. That said, I think it's moot when operating Jenkins on Kubernetes with Helm. Kubernetes is constantly monitoring the Jenkins process and will restart it if unhealthy (and I tested this inadvertently!). Also, when using Helm if the rollout fails health checks, the previous generation will stay online allowing the bugs to be fixed.
  • Docker layer caching is non-trivial if running with Ephemeral Jenkins Slaves under Kubernetes. If you have a large pool of nodes, chances are that every build will hit a new node, thus not taking advantage of layer caching. Alternatively, if using the “Docker in Docker” (dind) build-strategy, every build will necessarily pull down all the layers. This will add considerably to both transit costs and build times, as docker images are easily 1G these days.
  • There's lots of stale/out-of-date documentation for Jenkins. I frequently stumbled on how to implement something that seemed pretty basic. Anyways, this is true of any mature ecosystem that has a tremendous amount of innovation, lots of open sources, and been around for 20 years.
  • The “yaml” escape hatch for defining pod specs is sinfully ugly. In fact, I think it's a horrible precedent that will turn people off from Jenkins. It's part of what gives it a bad rap. The rest of the Jenkinsfile DSL is rather clean and readable, but embedding raw YAML into my declarative pipelines is not a practice I would encourage for any team. To be fair, some of the ugliness could be eliminated by using readFile or readTrusted steps (credit: Steve Boardwell), but again it’s not that simple.
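For reference, the docker.sock bind-mount workaround and the raw-YAML “escape hatch” described in the list above look roughly like this in a Declarative Pipeline (image tags and names are illustrative, and again, exposing the docker socket is not a best practice):

```groovy
// Sketch: Kubernetes-plugin agent with /var/run/docker.sock bind-mounted
pipeline {
  agent {
    kubernetes {
      yaml '''
        apiVersion: v1
        kind: Pod
        spec:
          containers:
          - name: docker
            image: docker:19.03
            command: ['cat']
            tty: true
            volumeMounts:
            - name: docker-sock
              mountPath: /var/run/docker.sock
          volumes:
          - name: docker-sock
            hostPath:
              path: /var/run/docker.sock   # exposes the host OS -- use with care
      '''
    }
  }
  stages {
    stage('Build') {
      steps {
        container('docker') {
          sh 'docker build -t myapp:latest .'
        }
      }
    }
  }
}
```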

Pros

I would like to end this on a positive note. All in all, I was very pleasantly surprised by how far Jenkins has come in the past few years since we last evaluated it.

  • Helm chart makes it trivial to deploy Jenkins in a “GitOps” friendly way
  • Blue Ocean + Material Theme for Jenkins makes it look like any other modern CI/CD system
  • Rich ecosystem of Plugins enables the ultimate level customization, much more than any SaaS
  • Overall simple architecture to deploy (when compared to “modern” CI/CD systems). No need to run tons of backing services.
  • Easily extract metrics from your build system into a platform like Prometheus. Centralize your monitoring of things running inside of CI/CD infrastructure. This is very difficult (or not even possible) to do with many SaaS offerings.
  • Achieve greater economies of scale by leveraging your existing infrastructure to build your projects.  If you run Jenkins on Kubernetes, you immediately get all the other benefits. Spin up node pools powered by “SpotInst.com Ocean” and get cheap compute capacity with preemptible “spot” instances. If you're running Prometheus with Grafana, you can leverage that to monitor all your build infrastructure.
  • Integrate with Single Signon without paying the SSO tax levied by enterprise software.
  • Arguably the Jenkinsfile Declarative Pipelines DSL is very readable, in fact, it looks a lot like HCL (HashiCorp Configuration Language). To some, this will be a “Con” – especially if YAML is a requirement.
  • Jenkins “Configuration as Code” plugin supports nearly everything you'd need to run Jenkins itself in a “GitOps” compliant manner. And if it doesn’t there is always the configuration-as-code-groovy plugin which allows you to run arbitrary Groovy scripts for the bits you need (credit: Steve Boardwell).
  • Jenkins can be easily deployed for multiple teams. This is an easy way to mitigate one of the common complaints that Jenkins is unstable because lots of different teams have their hands in the cookie jar.
  • Jenkins can be used much like the modern CI/CD forms that use container steps rather than complicated groovy scripts. This is to say, yes, teams can do bad things with Jenkins but with the right “best practices” your pipelines should be about as manageable as any developed with CircleCI or Codefresh. Stick to using container steps to reduce the complexity in the pipelines themselves.
  • Jenkins Shared Libraries are also pretty awesome (and also one of the most polarizing features). What I like about the libraries is the ability for teams to define “Pipelines as Interfaces”. That is, applications or services in your organization should almost always be deployed in the same way. Using versioned libraries of pipelines helps to achieve this without necessarily introducing instability.
  • Just like with GitHub Actions, with Jenkins, it's possible to “auto-discover” new repositories and pipelines. This is sweet because it eliminates all the ClickOps associated with most other CI/CD systems including CircleCI, TravisCI, and Codefresh. I really like it when I can just create a new repository, stick in a Jenkinsfile, and it “just works”.
  • Jenkins supports what seems like an unlimited number of credential backends. This is a big drawback with most SaaS-based CI/CD platforms. With the Jenkins credential backends, it's possible to “plug and play” things like “AWS SSM Parameter Store”, “AWS Secrets Manager” or HashiCorp Vault. I like this more than trusting some smaller third-party to securely handle my AWS credentials!
  • Jenkins PodTemplates supports annotations, which means we can create specially crafted templates that will automatically assume AWS roles. This is rad because we don't even need to hardcode any AWS credentials as part of our CI/CD pipelines. For GitOps, this is a holy grail.
  • Jenkins is 100% Free and Open Source. You can upgrade and get commercial support from CloudBees, which also includes a “tried and tested” version of Jenkins (albeit more limited in the selection of plugins).
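As a taste of the “Configuration as Code” plugin mentioned above, a JCasC file can drive core settings declaratively. This is a minimal sketch; the hostname and message are assumptions:

```yaml
# jenkins.yaml -- minimal JCasC sketch (example values)
jenkins:
  systemMessage: "Configured entirely as code"
  numExecutors: 0            # build only on ephemeral Kubernetes agents
unclassified:
  location:
    url: https://jenkins.example.com/
```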

To conclude, Jenkins is still one of the most powerful Swiss Army knives to get the job done. I feel like with Jenkins anything is possible, albeit sometimes with more effort and 3 dozen plugins. As systems integrators, we're constantly faced with yet-unknown requirements that pop up at the last minute. Adopting tools that provide “escape hatches” provides a kind of “peace of mind” knowing we can solve any problem.

Parts of it feel quite dated, like the GUI Configurations, but that is mitigated by Configuration as Code and GitOps. I wish some things, like building and running Docker containers inside of pipelines on Kubernetes, were easier. Let's face it. Jenkins is not the cool kid on the block anymore and there are many great tools out there. But the truth is few will stand the test of time the way Jenkins has in the Open Source and Enterprise space.

Must-Have Plugins

  • kubernetes (we tested 1.21.2) is what enables Jenkins Slaves to be spun upon demand. It comes preconfigured when using the official Jenkins helm chart.
  • workflow-job (we tested 2.36)
  • credentials-binding (we tested 1.20)
  • git (we tested 4.0.0)
  • workflow-multibranch (we tested 2.21) is essential for keeping Jenkinsfiles in your repos. The multi-branch pipeline detects branches, tags, etc, from within a configured repository, so Jenkins works more like Circle, Codefresh or Travis.
  • github-branch-source (we tested 2.5.8) – once configured will scan your GitHub organizations for new repositories and automatically pick up new pipelines. I really wish more CI/CD platforms had this level of autoconfiguration.
  • workflow-aggregator (we tested 2.6)
  • configuration-as-code (we tested 1.35) allows nearly the entire Jenkins configuration to be defined as code
  • greenballs (we tested 1.15) because green has always been the color of success =P
  • blueocean (we tested 1.21.0) gives Jenkins a modern look. It clearly depicts stages, progress as well as most other systems we've seen like CircleCI, Travis or Codefresh.
  • pipeline-input-step (we tested 2.11)
  • simple-theme-plugin (we tested 0.5.1) allows all CSS to be extended. Combined with the “material” theme for Jenkins you get a complete facelift.
  • ansicolor (we tested 0.6.2) – because many tools these days have ANSI output like Terraform or NPM. It's easy to disable the color output, but as a developer, I like the colors as it helps me quickly parse the screen output.
  • slack (we tested 2.35)
  • saml (we tested 1.1.5)
  • timestamper (we tested 1.10) – because with long-running steps, it's helpful to know how much time elapsed between lines of output

Pro Tips

  • Add the following nginx-ingress annotation to make Blue Ocean the default:
    nginx.ingress.kubernetes.io/app-root: /blue/organizations/jenkins/pipelines
  • Use this Material Theme for Jenkins with the simple-theme-plugin to get a beautiful looking Jenkins
  • Hide the [Pipeline] output with some simple CSS and the simple-theme-plugin:
    .pipeline-new-node {
      display: none;
    }
    
    

References

When researching this post and my “Proof of Concept”, here are some of the links and articles I referenced.