Public “Office Hours” (2020-02-21)

Erik OstermanOffice Hours

Here's the recording from our DevOps “Office Hours” session on 2020-02-21.

We hold public “Office Hours” every Wednesday at 11:30am PST to answer questions on all things DevOps/Terraform/Kubernetes/CICD related.

These “lunch & learn” style sessions are totally free and really just an opportunity to talk shop, ask questions and get answers.

Register here: cloudposse.com/office-hours

Basically, these sessions are an opportunity to get a free weekly consultation with Cloud Posse where you can literally “ask me anything” (AMA). Since we're all engineers, this also helps us better understand the challenges our users have so we can better focus on solving the real problems you have and address the problems/gaps in our tools.

Machine Generated Transcript

Let's get the show star.

Welcome to Office hours.

It's February 19 2020.

My name is Eric Osterman and I'll be leading the conversation.

I'm the CEO and founder of cloud posse.

We are a DevOps accelerator.

We help startups own their infrastructure in record time by building it for you and then showing you the ropes.

For those of you new to the call the format is very informal.

My goal is to get your questions answered.

So feel free to arm yourself at anytime if you want to jump in and participate.

If you're tuning in from our podcast or YouTube channel, you can register for these live and interactive sessions by going to cloud posse slash office hours.

We host these calls every week we automatically post recording of this session to the office hours channel as well as follow up with an email.

So you can share it with your team.

If you want to share something in private.

Just ask.

And we can temporarily temporarily suspend the recording.

With that said, let's kick it off.

So I got a couple of talking points lined up here today.

If there aren't any other questions.

First, some practical recommend nations where one factor out of cabinet is a checklist I put together because it comes up time and time again, when we're working with customers.

So I want to share that with you guys.

Get your feedback.

Did I miss anything.

Is anything unclear.

Be helpful.

Also John on the call has a demo.

A nice little productivity hack for multi-line editing in the US code.

I've already seen it is pretty sweet.

So I want to make sure you all have that your back pocket.

Also Zach shared an interesting thing today.

I hadn't seen before, but it's been around for a while.

It's called Goldilocks and it's a clever way of figuring out how to right size you Kubernetes pod.

So we'll talk about that a little bit.

I am always on the list.

But we never seem to get to it.

Are you hotel ops interview questions.

So Yeah conversation was valuable over that as well.

All right, so before we get into any of these talking points.

I just want to go over any or cover any of your questions that you might have using Clyde technology or just general related to DevOps I got a question.

All right.

Fire away in a scenario where you have multiple clusters.

And you may not be using the same versions across these things.

Let's say you're in a scenario maybe you're actually managing for claims.

How do you deal with the versions of cubes deal being able to switch between cluster versions, and so on.

If you're getting strategies out there that you can actually utilize.

Yeah, I'll be happy to help answer that before I do anybody have any tips on how they do it.

All right.

So what we've been doing, because this has come up recently specifically for that use case that you have.

So I know Dale you are already very familiar with geodesic and what we do with that.

And that solves one kind of problem.

But it doesn't solve all the problems.

So for example, like you said, you need different versions of cubed cuddle.

Well, OK.

So cute cuddle I want to talk about one specific nuance about that.

And that is that you've got a list designed to be backwards compatible with I believe the last three minor releases or something.

So there's generally no harm in at least keeping it upgraded or updated but that might not still be the answer you're looking for.

One thing you wanted to I know we had to address this situation separately.

So that's why what we do is under cloud posse packages.

So this is good outcomes.

Cloud passes large packages we distribute a few versions of cubed cuddled pin at A few minor releases.

Now we think we only have the latest three versions because we only started doing this relatively recently.

But the nuance of what we're doing with our packages is using the alternative system that leave data and originally came up with.

And it's also supported on alpine.

So basically what you can do is you can install all three versions of the package.

And then you point the alternative to the version that you want to use to run those commands.

So that's one option and that requires a little bit of familiarity with the alternative systems and how to switch between them.

I think there's actually a menu.

Now that you can choose which one you want.

But the other option, which we use for Terraform is where you will want to have a bunch of 0 to 11 compatible projects and some zero total compatible projects and you want to have you don't want to be switching between alternatives to run them.

So what we did there was we actually installed them into two different paths on the system.

So like user local Terraform zero 802.11 then.

And that's where we stick the triple binary for that.

And then zero then user local Terraform 0/0 to been slash Terraform that would be the binary there.

So just by changing the search path based on your working directory or where you're at and you can use that version of Terraform you want without changing any make files without changing any tooling like that that expects it just to be Terraform in your search path.

I can dive into either one of those explanations more if you want to see more details on that.

Is that along the lines if you're looking for are you looking for something else.

Yeah Well, so there is a tool that came across before I really got into Judy's which was called tips, which happens like a form where you can actually switch between the versions are getting good.

Yeah, I was hoping that there was something that I hadn't found that was similar to that I would do a switch.

Gotcha Yeah.

There could be.

Like is it.

I just have a thing for Terraform also you're working on.

Yeah, I know your question.

Dale was related to Cuba.

But I mean, I'm just speaking for me personally.

I don't like the end switchers like rb ns of the world, the CSSM like the TAF cover version managers and stuff like that for us.

But obviously, it's a very popular pattern.

So somebody else is probably a better answer one.

Yeah OK.

So what.

There's one other idea on that.

Or let me just clarify the reason why I haven't been fond of the version managers for software of the purpose built version managers like that you need an IRB in for a real Ruby or whatever.

And then you need a terrible man for Terraform and then you need a one for coop kubectl and all these things.

What I don't like about it is where does the madness stop.

Like we technically need to be versioning all of this stuff, and that's why I don't like those solutions.

That's why I like the alternative system because it's built into how the OS package management system works.

Do you foresee taking a similar approach for Terraform and Virgin switching with.

So you could do in the future.

Well, so well.

So yeah.

So sorry.

I kind of glossed over it or handwaving so I go to get a cloud policy packages nap.

Now I don't know if you're using alpine and alpine as I come a little bit less popular lately with all the bad press about performance.

So let's see.

So if we go into the vendor folder here.

So we literally have a package for a cube cut all one of 13 one to 14 1 to 15.

So we will continue doing that for other versions and the way it works with alternatives is basically this.

And you don't have to use alpine to do this.

This is supported on the other OS distributions too.

But is you have to install the alternatives system.

And then when you have that, you can switch.

So OK.

So this here shows how to set up the alternatives that you have.

Then there's another command you use to select between the versions available.

And once you've installed all the versions it's very easy to select between those.

I just don't remember what it is up to.

It feels like a fuzzy wuzzy thing.

Yeah, I think he uses FCF for whatever that is.

Yeah You mean in your screen.

Oh, yeah.

It was.

Thank you.

Thank you guys.

It's a public service announcement.

I always forget to share my dream.

And I talk and wave and point to things like, I am.

Yeah So this is.

Yeah So here in cloud posse packages under vendor.

We have all our packages here is where we distribute the minor pin versions of cubed cuddle and then we also do the same thing like for Terraform likes.

All right.

Good question.

Anybody else have any dads that have one more.

If no one else is going to ask.

Also just really reminded me.

But it doesn't look like Andrew Roth is he on the call.

This conversation because Andrew has a similar tool to dude exec that he's calling dad's garage and the conversation was.

So if you look in I forget what channel it was a release engineering.

If you look in the release engineering channel those conversations.

Yeah Next question.

Dale So has anyone actually successfully done any multiple arctic bills for Docker.

Let's say that one more time multi architecture builds.

Oh, interesting.

So you're saying like Doctor for.

Oh, but doctor for arm and doctor for EMT 64 something that correct.

Yeah Yeah.

I'd be curious about that.

No personal experience.

I've been trying to do something while do that kind of build for going to frustrate petrol stop.

I can say is this where your Raspberry Pi could bring it closer.

Yeah So I figured, well, this is part of it.

But I just came across it where I'm really into issues with my Raspberry Pi three beetles where it's using R these seven and a lot of the packages are built to support our V8 which is 64 get to be 70 is 32-bit.

And I've run into the issue where it just would not outright launch little begin.

Figured out it was Dr. architecture that was the issue where some sources will support it.

Some will not.

Unless you're planning to do your own build and all that you know.

So I started to dig into it a little bit more guys like a rabbit hole.

Yes, it was.

It certainly was.

Yeah All this stuff is pretty easy.

So long as you stay mainstream on our protection.

What everyone else is doing.

But as soon as you want to do a slight minor variation.

The scope explodes.

Yeah, I'm at a point where much of an upgrade to the raspberry force.

Yeah And Yeah not better, just forget about it.

No, that's right.

No, but I did learned something new.

So that helps.

Yeah Yeah.

Echo Cole.

Any other questions before we get to a demo.

John, do you want to get your demo setup.

And let me know when you're ready to do that.

Yes code demo.

I'm ready to roll.

You ready to roll.

OK, let's do it.

I'm going to hand over controls.

Go ahead.

Oh, OK.

So just a quick introduction to everyone here.

So John's been a longtime member of suite ops.

He's hard core on form and developer hacks and productivity.

He gave a demo last week that was pretty interesting.

Yeah, check it out.

That was on using Terraform cloud and the new trigger triggers that are available in Terraform cloud.

So having him back on today he's going to show a cool little hack on how you can do multi-line editing which isn't something I almost.

I've seen it before.

And I kind of forgot about it.

And it's something like you almost don't think I don't need that.

But when you see it, you'll see that you need it.

And want it.

And I actually did as a quick follow up to the Haskell demo last week for run triggers and I did talk with them today and they aren't looking at doing a few UI enhancements as noted it is a beta right now.

So it's not meant for production use.

But we did talk about some use cases and some potential UI improvements and things like that.

But they are definitely excited about the feature.

So let's go over.

We see some stuff going on there.

But as far as multi-line editing.

So I'm a pretty big on efficiency.

And I hate wasting time.

And so it always.

I was managing a group of developers.

And I would always be frustrated watching them edit something because I tell them, hey guys if we're doing Terraform let's say, for instance, to keep it relevant hey go ahead and put these outputs and let's do an output for, let's say all of the available options here.

Let's do an output floor for those.

And it would basically go something like this.

All right output something output.

If I could type.

Typing is hard.

Something else.

And so you're just going through and slowly typing all of these outputs, which are all basically, let's say outputs from the telephone documentation.

And so you know I started showing them these little performance tweaks performance hacks to kind of improve on that speed.

And so what I'll do here is actually take all of these arguments for EC2 let's say if we were wrapping an EC2 instance.

And we wanted to provide all of these outputs and this is generally in the case of I mean, this is right now in the case of Terraform but it could be anything.

So we have a couple sections here.

Note we don't really care about.

So I'm just going to highlight the note.

The colon and the space and use commands to actually get rid of those.

So now we're basically dealing with just outputs.

Now, the key to multiplying editing is to find something that's common.

So the shortcut by default in Visual Studio Code your shortcuts may be different.

To do a single multi-line selection is commanding.

And so if I just use Command B on a is going to go through and select every day if I use Command Shift l it's going to select every a in the left in the document that I'm currently editing.

So it's very important to find something as unique if I just do a you can see here like PBS dash optimized single tenant CBS backed.

That's not really going to give me what I'm looking for and what I'm really looking for is to take the left side of this as the label on the right side as the description.

And so I can kind of get a little better by selecting the space around the dashes but then I still end up getting because normally they do just optional, but in this case, they did optional dash.

So it kind of catches this one as a little bit off.

So when you can't find something that's unique across all of them.

One thing that every document has is Lindbergh's.

And so if you have basically everything on a single line you can find all the line breaks and just use that.

And in this case, I'm going to actually get rid of these empty line breaks here.

So we can just have a clean structure.

And so if I basically go to the end of the line shift right arrow Command Shift l and then go back with my left.

I'm basically editing every single line.

Now I'm just at the end of all the lines.

So now I can move around and operate as if I'm on a single line, but I'm editing every line.

So if I'm doing an input here I can just type as I normally would.

And at this point, I'm inputting all don't remember how many variables it was but all 30 or 40 variables here and now have all of my descriptions filled out.

So that thing we can do is we know that for, let's say all of these options we don't want to have that in the documentation, because we're going to automatically output that we can just highlight all of the options Command Shift they'll delete it.

And then we can give it some default value as well.

And so the key is actually just moving around with command to go to front or back of line option to go by word as opposed to going by individual characters.

Because if you see line six here.

I'm here on start.

But I hit Option right arrow the same amount of times.

But here.

I'm in a different position because the length of the description changes.

So if I need to get to the beginning or to the end.

I need to use my command keys in order to more quickly get to that space.

But this is a very quick way as you can see, even with all the talking I've been able to adding all up.

Add in all of these values pretty quickly edit them get them in a massage place.

And now I can actually go in and tweak anything that I need to tweak.

There are times like this where you're not going to have as clean of a set of data.

I guess you could say so you may have to come in and make some manual adjustments or something like that, just to get this cleaned up.

But that's it.

Any questions around that.

I'm happy to answer.

But this is I just found us being something very important when I started creating modules and especially when I needed to output a whole bunch of outputs or have a whole bunch of inputs in that position.

Yeah, that was really helpful.

I've had that problem with outputs especially when you're writing modules and you want to basically proxy all the outputs from some UPS the upstream resource back out of your module.

I think this would save a lot of time.

Any questions, guys.

So it was good.

I've seen is a pretty common trend.

Like if you want to output multiple attributes of a single resource people define them individually.

Something I started doing was just outputting the entire resource itself.

Loving the end user choose from that map.

They want what they kind of want to interact with.

So is that a bad practice.

I know I've really seen it in public modules.

But I try to BBC I pretty much as output the entire VPC resource.

And then you can access it with dot notation just like you would.

The actual resource Yeah.

So you're on a terror form.

I want to correct.

So this was not something we could do in.

And so you had to output.

Every single one of these.

I switched also to doing a little bit more of just return the resource.

It's a lot better.

It's a lot easier to manage and maintain on that front.

But I guess there are some cases where you may want to adjust those things like if you have a case where you need to do.

Let's say out module.

Let's say you have an AWS instance, and you have multiple of those.

Something like that.

You may not want to output that as an idea and you may be using count just to hide it.

So this is where you kind of get into that place of doing these sort of things to kind of bring them into one output.

So I think there are some cases where you may want to do some of these custom outputs.

But yes, in general, I do like that you can return side got x I have underlined that question.

So does anyone know how can we do count dot index when we actually put resources for loop.

There's actually a few ways with a for loop it kind of depends on your scenario.

But there's a few different approaches.

I think actually the documentation is generally pretty good here on that.

So if I can find it real quick always go to a few different spots here.

Might be under expressions.

Finding it is always difficult for me.

Here anybody remember where it is it says four and discourage.

I think that's how you do the final.

Yeah So my question here is you know why I want to use for ages.

You know when you are using count.

And then you more do things maybe more the index I find on a list or maybe remove some items count actually messes up the whole sequence part.

So if you're using four for each eye creates you know results for each game.

Each item separately.

So it's kind of a set of the things or maybe a list of the things now.

But I still want to utilize the index update and had you know say, for example, if there is an area on the list, which has five items in it.

I want to trade on each of the index like zero 1, 2, 3, 4.

So I know what we were able to do it with the count.

I also do to find a way to do that similar end of stuff for each yet.

So I mean, the for each returns a map.

So what you could do is I think that use the values function to just access.

And it basically that will return a list of the values and kind of strip out the keys and then from there, you can column by index.

Oh, OK.

I tested out this page here actually goes through.

There was a place where they actually kind of gave a pretty solid breakdown.

Yeah right here.

And I linked to this in the chat but it's a pretty solid breakdown of when and why you could want to use one over the other.

And it kind of shows you like when to use count y you want to use count.

In this case, you know it's something that is kind of case by case.

Right you can use count for certain things.

But then there's certain things you may want to use a for each loop for.

So I would say it's case by case.

It really depends.

But I do agree with no one that in essence, I think everybody's use to count from 0 1 1.

So a lot of the times when you're transitioning and you want to use for each but then other resource references the original once you kind of get this mismatch.

So where possible.

I've been trying to convert from count to four each and use the keys instead of the indexes indices.

So I think it's just the problem occurs when you had previous code from 0 1 1 where you already have all the logic with contemplates and then switching it over is just an extra pain right.

Yeah So for that, you need to do a lot of DST DMB we trillion trillion more those resources and make that the data file and all graceful another question and maybe you can answer since you did a lot of modules.

So I was working on one.

One project for you know setting of the priorities of the rules and then is not to prioritise that dynamic solely.

If, for example, if we have a beautiful which deletes or maybe destroys and recreated the next room takes a bright step and then actually messes up the whole thing.

So where do you go to find a workaround for that kind of scenario.

I would have to defer that question to Andre who's been doing more of the low level Terraform stuff on like the modules and things like that.

Also, we are doing a lot more just I mean 99 percent of what we do as a company is uncovering eddies and therefore, like the bee rules is not something we deal with.

We just use ingress.

I do know that some of the stuff got a lot easier with zero 12 passing max around.

But I can't I can't give you a helpful answer right now if you ping Andre on the Terraform jam channel a that's at a k NYSE h he'll be able to answer that.

All right.

Thanks, Adam carolla.

I missed the question, but what was the question about the.

Yeah, John probably know the answer.

Yep So the question is for you know setting up the priorities of the rules in the application load balancer.

So when you have defined priorities for you know there'll be like lateral Terraform resource.

And then if there is you know if there is an instance where that rule.

Yes Yes regarding that I mean guess false destroyed if you're changing something.

The next rule takes the priority of it.

And then the whole sequence gets messed up.

So you know I was just looking to get there you know freeze to way where things remain as it is if at all.

If there is a requirement of regulating that list that rule and next rule doesn't get the priority that, yes, it is you actually can control that priority on the configuration.

So there's a priority.

Yeah So I am using that.

However, if there is an instance where you know by chance if that list never requires to get recreate it by, changing some settings, which requires us to get false destroying the default behavior of four it'll be easy.

You know, if I leave if, at least, the rule is deleted it actually changes the priority of all the rules which are coming next to that.

So if you are say, for example, at least that ruling with a priority of number five guess where you created all the rules after a number five will be unless you've had one.

One step about like six would become five 7 would become 6 and 8 would become seven.

I actually haven't seen that.

But up that may be because I always use factors of 10.

So I don't put them one by one.

I'll do like 01 10 20 up to whatever.

Yeah but.

But that may be why I haven't seen that actually.

But in all the primaries, we said we never I never seen them reset once their form applied again.

They were good to go.

Yeah, I play is good.

But if at all.

There is an instance where the results gets recreated that's where it starts freaking out.

So you know there is if you're not, you're not there.

There is a limit of number of rules that you can have and want to be.

And they will still really legacy application of the initial application, which required us to ensure that a lot or the Beatles.

And this is where I actually was able to replicate this issue interesting with how are you.

First of all, are you using you.

You wouldn't happen to be using the cloud posse AWS Albie ingress Dude, are you ingress.

No this is just simple it works on easy is not going to be noticed.

Well, yeah.

So the dad does a little bit of the confusing thing.

So what we did when we were designing our ADA are easy s modules and our modules for Albie we kind.

We made the opinion we took we had the opinion of making them feel like the Kubernetes in an interface for ingress.

So we created a module called from Adobe yes hb ingress which is which defines a rule.

So that it works similar to defining an English pressing Cuban entity even though you're defining a rule for an elite and using, say yes for example.

So when you do it this way, and you're defining a module explicitly for each rule.

And I it's been working for us.

But I can't say just like what John said, if we've encountered this thing where the priorities get re numbered or something like that hasn't been reported yet to us.

But the different part of what I want to make sure was that you weren't using something like account together with defining your rules.

So you know we do have some dynamic parts, which also involves doing this all also, again, again, a reference to the question that I asked prior to that by using for each and.

Yeah Yeah, that makes sense.

So you usually you're not emotionally mentioned the list, and then it starts freaking out.

I think that the problem.

Therefore, you have is probably more related to the module or the way you're doing the Terraform maybe than some fundamental limitation of like LP resource with themselves.

Yeah Yeah.

Because we actually hit an issue where sometimes if we have to recreate a target group like it if you change the name switching from one of the cloud posse modules, there was like a fundamental name change of target groups.

And so to delete sometimes they don't delete properly, they kind of hold on there and you so you can't believe it's hard to do the best thing use.

So we did a lot of manual deleting Warren just at this last week actually.

So we did a lot of manual deleting and I never saw priorities change all the priorities were stayed in line.

Yeah So target group is key.

But priorities only change if you know change the path or something like condition or something where that's where the listener role gets created.

Yeah but once you give it a priority number it should stay there.

I would Yeah.

But I want to I still want to fix the priorities for because those rules that dynamic.

So if you had a part, which is coming from a variable, which is a list.

Then you know you had another.

But then it really, again, speed it up a new listening to the next priority.

And then becomes harder to maintain.

Yeah So maybe look at adjusting your list and utilise a map or something to where you can define like a set priority for those that way when you loop over it.

It's a set priority every single time, no matter if you move it up and down in your list or something else, make it explicit.

Right OK.

All right.

Let's are there any other questions.

So high.

I have a question for my use cases like I kind of want to generate a report like based on like underutilized easy to incent and over Italy's ancient and automated to send the reports to an email.

Like if the incentives are running less than 10% I would like to see the instance list.

And if it's more than 80% I'd like to see the list of incentives.

So I just looking for ideas to achieve this catch up.

So the idea that somebody here might have a suggestion for that we work with mostly just infrastructure managed under Kubernetes and so we're not really concerned about any one instance, in and of itself.

From a reporting perspective.

But the.

Yeah anybody familiar with some tools to generate the reports that he's looking for AWS has the costing users report.

That's kind of built in.

I don't know about the request of emailing but I know that's in there.

Mm-hmm I could turn it into s.a. you could probably do that for sure.

Put this on the track.

It is probably the utilized utilization for some period of time right.

Yeah some period of time, like a week few days like the one of us.

Yeah somebody else comes to mind just feel free to share that either in this chat here or in office hours.

There are also I mean Yeah.

So like on talk of like there's the trusted advisor stuff, which will make similar kinds of recommendations.

And then there's I think they're SaaS services as well.

But I don't think that was necessarily what you're looking for.

Does this space for everything right outsourced totally outsourced go.

Yeah Any other questions.

I sent all right.

Well, let's cover a couple of the talking points here that came up.

Sure I know.

I definitely recognize a few people in the call here using Kubernetes a lot.

So I think this Goldilocks is a will be a welcome utility that was shared Goldilocks we haven't they I only learned of it today what I thought was interesting was how they went about implementing it.

So behind the scenes it's using the vertical pod auto scaler in just notification mode or in like debug mode.

So that it's not actually auto scaling your pods but it's making the recommendations for what your pods should be.

And then they slap a UI on that.

And here's what you get.

So I think why this is really valuable is.

Well, if you saw the viral video that went around this past week about how they dubbed it that famous video with Hitler reprimanding his troops.

Well, they took that, and they said that they parody the situation about running companies and production without setting your limits and how asinine that is.

Well, it's unfortunately, it's a pretty common thing.

So helping your developers and your teams know what the resort what the appropriate limits for your pods should be is an important thing.

So this here is a tool to make that easier just building that.

Like when I joined this team that was an issue as well.

I guess at the same time, companies were getting to companies in an early stage all the experience was not there.

And we will find that over time.

We have a lot of evictions going on within the cluster because we didn't have any resource limits set for these things.

So we actually had to go through the process of evaluating which why this kind of tool.

I didn't know existed until after questions.

Well came about which would really help because we actually implemented data to help us give us some color back in terms of what was going on.

And then started to send more traffic to see what 40 points are to determine that.

So I figured that everyone had their own methodology to determine what those resource limits were going to be as well.

And I saw this question for this flip was posted some equation in terms of how to determine the ratio of limits and you know that it was this whole thing so much looking forward to using this tool to make a determination as well.

Yeah, I think that's basically been what everyone's been doing is looking at either their Prometheus or looking at their graph on us or data dogs and determining those limits, which is fine.

What I like about this is that it just dumbs it down to the point where literally copied copy and paste this and stick it in your home values or you know if you're doing raw resources stick it in there and you're good to go.

There are a couple other tools I've seen specific.

So the concept here is right sizing, so right sizing your pods.

So if you're using ocean by spot in Pets.com so ocean is a controller for Kubernetes to right size your nodes.

Ocean also provides a dashboard to help you right size the workloads.

The pots in there.

So that's one option.

The other option is if you have cube cost cube.

Cost is a open source or open, core kind of product that will make recommendations like similar to trusted advisor.

But for Kubernetes and it also gives suggestions on how to right size your pots.

All right then let's see what was the other thing I add here.

So I asked with some feedback got it got a lot of great conversations going on in the general channel.

If you check yesterday related to this.

We're working on an engagement right now and the customer asks so what kind of apps are should we tackle first for migration to Kubernetes.

And it's a common question that we cover and all our engagement.

So I thought I just kind of whip it up and distill it to the characteristics of the ideal app.

I like to say that pretty much anything can run a Kubernetes the constructs are there the primitives are there to sort.

So to say lift and shift traditional workloads.

But traditional workloads aren't going to get the maximum benefit inside of Kubernetes like the ability to work with the horizontal pod out of sailors and use deployments and stuff like that.

So here are some considerations that I jotted down using one factor as the pattern.

So the total pull factor app.

I'm sure you've been working with this stuff for some time.

You've seen those recommendations before.

It looks something like this.

In my mind that they're slightly dated in terms of the terms as it relates to Kubernetes and it's a little bit academic.

So if we look at the explanations for how these things can work.

So OK.

So first of all, this is very opinionated just talking about like specifically Ruby and using gem files.

But let's generalize that right.

So the concept here is if we're talking about Kubernetes is we still want to pin our software releases.

But we also can just generalize that say distribute a darker and Docker file that has all your dependencies there in and into releases.

So that's kind of what I've done here in this link here.

And it's a check list.

So if you can go through and identify an at the check stuff all of these.

It's a perfect candidate to tackle first in terms of migration.

I like to say move your easiest apps first.

Don't move your hardest apps until you build all that operational competence.

So working down on the list here.

Let's see here.

So there's a little bit opinionated.

But we really feel like Polly repos now are so much easier to work with and deploy it with the pipelines to Cooper and Eddie.

So that's why we recommended working with that using obviously get based workflows.

So this is that you have a pull request review approval process.

So that you're not editing m.

Believe it or not, some companies do that.

We don't want that.

And then automated tests.

So if we want to have any type of continuous part of our delivery.

We need to make sure we have foundational tests.

Moving on then to dependencies that things that your services depend on are explicit.

One thing that we see far too often are hard coded host names in source code.

That's really bad.

We got to get those expected either into a configuration file or ideally environment variables like environment variables because those are a first class citizen.

And it is very easy to later services being loosely coupled.

This is that your services can start in any order.

Your your API service must be able to start even if your database isn't yet online.

It's frequently an older pattern where well, if the API can't connect to the database.

Well, then the API just exits and crashes.

And this can create a startup storm where your processes are constantly in a crash loop and things only start to settle once the back office kick in and your applications stop thrashing so much so hiccups.

So would that then be more on the developing team to ensure that kind of control.

Yeah So the idea is here that if you feel like this is not a hard requirements list.

But these are like, well, oh, if this jogs a memory.

Yeah, that's how our application works, then what we ask is that they change the way their application works.

So that it doesn't have these limitations.

So much actually working with a client that actually they got wind of your list, but it took your list the opposite.

Hopefully not on all the points, but most of them.

Oh no.

In fact, they're still using confusion.

Oh stop right there.

Take me back there.

Yeah Yeah.

Yeah Yeah.

So it sounds like you have a big project ahead of you, which might be changing some of the engineering norms that they've adopted over the years, we've been pretty lucky that most of the customers, like a lot of this stuff is intuitive for that we're like stuff they've already been practicing.

But then there would be like one or two things here.

There that are used like like the one thing that we get bit by all the time is like their application might be talking to S3 and then their application has some custom configuration for getting the access key and the secret access key.

So that precludes us using all the awesomeness of data.

Yes SDK for automatically discovering these things if you set up the environment correctly.

So having them undo those kinds of things would be another use case.

My main point would kind of going over these things would be to jog any reactions like anybody is totally against some of these recommendations or if any of these are controversial or anything that I've missed.

So please chime in if that's the case Banking services ideal services are totally stateless that is that you can kind of offload the durability to some other service some other service that's hopefully not running inside of humanity.

So of course, it's not saying the communities is not good for running stateful services.

It is it's just a lot larger scope right.

Managing a database inside of companies versus using art yes let's see there.

Anything else to call out.

This is a common thing.

Applications don't properly handle sig term.

So your apps want to exit when they receive that gracefully.

Some apps just ignore it.

And then what happens is ease waits until the grace period expires and then just you know speed kills your process the hard way, which we want to avoid.

Sticky sessions.

Let's get rid of those.

Oh, yes.

And this one here.

This is it.

This is surprisingly common, actually.

So we're all for feature flags.

We recommend feature flags all the time.

They can be implemented in different ways using environment variables is an awesome way of doing it.

But making your feature flags based on the environment or the stage you're operating in is a shortcut but not really an effective use of feature flags because you can't test that function.

You shouldn't you shouldn't be changing the environment in staging to test the feature.

You should be enabling or disabling the feature itself.

So that's a pet peeve of mine here and related to that.

It's also not using hardcoded host names even configurable host names in your applications.

That's that if you're running Kubernetes that's really the job of your ingress and your application should not be aware here or even care where you even care.

Exactly Yeah.

Alex siegmund posts in chat here.

See you mentioned on court finding that you should listen on non privileged ports.

But what's the harm of having your application math for 80 for example, if it's a website.

So the harm is really that.

Well, the only way you can listen on port 80 and you're inside your app is if your app starts as root.

And then that's contradictory to saying that you should run non privileged process.

So things that you should not be running your processes as root.

Yes, your container can be as rude.

Your application can start up as rude.

It combined to the port as rude and then it can drop permissions.

But so often things do not drop permissions and then you'll have all these services unnecessarily running as root.

And just for the vanity of the service running on a classical port like 80 is it really required.

And that's not the case.

So exactly so Alex says, aha.

So that's why you should run is not inside of the container.

Exactly So that's our point there.

I think a lot of times what I see is like when the doctor finds that you created because it works as a route.

No one if he goes back and say, all right, let me have a run as a privilege to use those.

So it actually works that way.

And they may hit upon button like the road blocks and don't get to work.

Probably not merely because of a lack of knowledge as to how to do that.

And then say, hey, you know, it works.

Let's just leave it.

And then it just goes out there.

And then something happens, and then that pattern gets replicated right because somebody says, oh, how do I deploy a new service.

Oh, just go look at this other people.

And then they copy paste that stuff over and then it replicates and quickly becomes the norm in the organization.

Yeah Yeah.

So logs.

This is nice to have.

I mean, it's not a hard requirement.

But it really does make a log.

Once you have all your logs centralized in a system like Cuban or any of the modern log systems like simple logic or Splunk is that if your logs are structure you're going to be able to develop much more powerful reports.

And most if you're using any language or framework they a lot of them support changing the way your events are emitted these days.

So no reason not to.

The most important thing is just don't write to this because we don't want to have to start doing log rotation and you can if quotas aren't set up correctly, which they usually aren't.

You can fill up the disk on the host OS.

So let's not do that.

Lastly this is another common problem.

I see is migrations run as part of the startup process when the app starts up.

So if you have a distributed environment you're running 25 3,100 pods or whatever, you don't want all of them trying to run a migration you want to, you want that to be an explicit step a separate step possibly run as a upgrade step in your health release or some pipeline process in your continuous delivery pipeline.

So making sure you can run your database migrations as a separate container or as a separate entry point when you start that container is important.

Same thing with Cron jobs.

We know those can be run in a specific way under communities.

So be nice if those are separate container and any sort of batch processing as well.

So the basics here is that these can run as jobs versus everything else should be either deployments or stateful sets or other primitives like a recommendation registered domain.

Put this under oh the register to the.

Exactly I do that.

Yeah, I should do that.

It's a good list.

So if you have any other recommendations.

Feel free to slack those to me.

I'll update this list.

What's your thought on the read only file system.

You know, that's good.

But you know it's probably good default. It's an optimization.

You know, I make a point here that file system access is OK.

It just shouldn't be.

It should be used for buffering or it should be used for caching or things like that.

But it shouldn't be used to persist data.

So perhaps.

Yeah having it scrapped to your point at your root file system should probably be configured at deployment to be read only.

But then provide a scratch space that is right.

I think I'll take a list and sentiment learned.

So I can.

I follow this.

There you go.

Let me follow up.

Let me know how that conversation runs or goes well.

So we've got five minutes left.

That's not enough time to really cover too much else.

Did this jog any memories or any other questions that you guys have something I know.

Yes, I have one for Kubernetes.

What's your experience with this to you.

I see a lot of hype around it.

Oh so I was actually, hypothetically you know thinking of a solution, which we can make where you can look up the geographic location.

And then we can do it again.

And it releases like release this particular piece to India.

And then the rest of the world to see if there was.

Yeah So any means.

How was holistic business, then.

Yeah And I'm sure there were others here will have some insights on that as well.

I'll share kind of my two bit cents on it.

My my biggest regret with this deal is that its CEO is in a first class cape as it is that service mesh functionality is in a first class thing inside a Kubernetes that we have to be deploying this seemingly high overhead of sidecar cars automatically to all our containers when they go out.

That said, the pattern is really required to do some of the more advanced things when you're running microservices for example, the releases that you're talking about here.

So the thing is that Kubernetes is by itself.

The primitives with ingress and services are perfect for a deal for deploying one app.

But then how do you want.

How you create this abstraction right for routing traffic between two apps that provide the same functionality.

And then track traffic shaping between them or when you run that really complex microservices architecture.

How do you get all that tracing between your apps.

And this are the stories that are lacking in Cuba and 80s out of the box.

So I think service meshes are more and more an eventual requirement as the company reaches sophistication in its utilization of communities.

But I would not recommend using a service mesh out of the box until you can appreciate that the primitives that communities ask.

It's kind of one of these things, we need when people start on E. Yes, I think that's great.

I mean, I personally don't like.

Yes, we spent a lot of time with these yes and I think ECM kind of shows the possibilities of what a container management platform can do.

And then once you've been using.

Yes with Terraform or CloudFormation for six months or a year you start to realize some things that you want to do are really hard.

And then you look over at companies and you get those things out of the box and really.

So this is how I kind of look at service measures.

I think you should start with the primitives that you get on communities.

And then once you realize the things that you're trying to do that are really hard to do.

And that the service mesh will solve.

That's the right time to start.

It's better, to grow into that need than try to just as Eric said just have a litter box and then try to use it for a problem that you don't currently have.

Yeah, I agree on that.

So Glu t.k. price supports for these two out of the box that you know eakins was becoming a big supporter of steel and you know looking at some stats that I've seen out of all of the profit, which goes to Kubernetes 19% of traffic is being sold by service measures to you.

So I don't know what the sample size.

But this isn't what I have seen the random numbers here and there.

So I agree on that.

I have done just for deployment in the Cuban 90s.

And I use traffic for that for it to be flight gross controller and it looks just like you know it wasn't a good fit for that particular deployment, but I don't think you will fly a big requirement of where you know you want to deploy a really complex solution for Ken everyday lives is mostly because that's where I see these two, you can bring in the.

So it can.

It's one way of solving that right.

The other one is to use a rich feature flags by buy something like launch darkly or flag or a couple of these up.

There's a couple of open source alternatives and in that model it requires a change kind of a little bit in your development model, and it requires knowledge of how to effectively use feature flags in a safe way.

But I think what I like about it is it puts the control back in your court versus offloading it to the service mesh.

It's kind of like the service mesh is fixing a software problem while the feature flags are fixing it with software.

Yeah Yeah like we had are quite similar scenario about what we did was to fix it with a feature flex.

So that actually took the model of actually shaping the development to suit that scenario where we could actually roll up features to selective users within our testing group.

And then they would actually do what you need to do from there.

And then we can actually break up for the next couple of weeks.

And so on.

Yep plus you get the benefit of immediate rollbacks right now you can just disable it immediately just by flipping the flag off.

Exactly So go.

So I think it.

I don't think there's a black and white answer on it.

I think that some change to the organization.

Some change the release process making smaller, more frequent changes all these things should be adopted including like trust based development.

Yeah, I agree.

Because I think introducing to their birth their idea behind future flags gives a bit more overhead on the internet operations team and taking it away from the developers themselves.

So unless you're going to train your developers to use steel to actually integrate those feature flights into the deployment.

So I think you have a better chance of using that tool darkly.

I'm starting to do that.

Yeah Yeah.

Good luck setting up the app with all this stuff in the Minikube and doing appropriate tests all that stuff for your development.

I want to know how it goes.

I still want to find some time to get my hands dirty beyond that.

But I may be doing it in a month or something.

How does seem that long term upkeep for that to be a headache.

Yeah All right, guys.

When we reach the end of the hour here we got to wrap things up.

So thank you so much for attending.

Remember, these office hours are hosted weekly.

You can register for the next one.

Bye going to cloud posse slash office hours.

Thanks again for sharing everyone.

Thank you, job for the live demo there.

Yes code was really awesome.

A recording of this call is going to be posted to the office hours channel and syndicated to our podcast at podcast.asco.org clown posse.

See you guys next week same time, same place.

Hey Eric can I ask you a quick question.

Yeah, sure.

So I just ran into an issue like 10 minutes ago and thought hate office hours is happening at clock.

All right, let's.

I do got to run to a pediatrician appointment right away.

So let's let's see if I can spot instances have you managed to deploy a target spot task with Terraform so.

Yes So we use that we have it.

We have an example.

I'm not going to say that it's a good example for you to start out with doing it.

But we have a Terraform.

Yes Yes.

Yes Atlantis module in there.

We're deploying Atlantis as you see as Fargate task using our library of Terraform modules on the plus side, it's a great example of showing you how to compose a lot of modules.

It's a great example of showing you that modules are composed of all.

And it's advanced example.

This example is exactly why I don't like you.

Yes And Terraform they don't go together with you there and we think we have a pretty advanced infrastructure.

It's ready in Fargo.

We're just going to go to Target spots.

So what do you see that module is called.

Oh stop.

Sorry Yeah.

No, no, no, no, no, no.

I don't have spot far gates Spock.

No worries, no worries.

We're going to an area that's interesting to provide provider strategy and see if we can make it do something.

But no worries.

Figured I'd just check.

Yeah, thanks for bringing it up, though.

So misunderstood.

Absolutely All right.

Take care.

But

Public “Office Hours” (2020-02-12)

Erik OstermanOffice Hours

Here's the recording from our DevOps “Office Hours” session on 2020-02-12.

We hold public “Office Hours” every Wednesday at 11:30am PST to answer questions on all things DevOps/Terraform/Kubernetes/CICD related.

These “lunch & learn” style sessions are totally free and really just an opportunity to talk shop, ask questions and get answers.

Register here: cloudposse.com/office-hours

Basically, these sessions are an opportunity to get a free weekly consultation with Cloud Posse where you can literally “ask me anything” (AMA). Since we're all engineers, this also helps us better understand the challenges our users have so we can better focus on solving the real problems you have and address the problems/gaps in our tools.

Machine Generated Transcript

Let's get the show started.

Welcome to Office hours is February 12th 2020.

My name is Eric Ostrom and I'll be leading the conversation.

I'm the CEO and founder of cloud policy.

We are a DevOps accelerator.

We help startups own their infrastructure in record time by building it for you and then showing you the ropes.

For those of you new to the call the format is very informal.

My goal is to get your questions answered.

So feel free to unmute yourself at any time if you want to jump in and participate if you're tuning in from our podcast or YouTube channel, you can register for these live and interactive sessions just by going to cloud posterior slash office hours again, cloud posse slash office hours.

We host these calls every week will automatically post a recording of this session to the office hours channel as well as follow up with an email.

So you can share with your team.

If you want to share something in private just ask him could temporarily suspend the recording.

With that said, let's kick things off.

So here are some talking points that we can cover to get the conversation going.

Obviously first, I want to first cover any of your questions.

So some things that came across or came up in the past week since we had our last call.

Terraform cloud.

Now supports triggers across workspaces.

John just shared that this morning.

I'll talk about that a little bit.

The new ADA of US clay is available with no more Python dependencies.

However, I'm still not celebrating it entirely based on my initial review.

Also this is really wise is quote that was simply put in our community yesterday or some things like you can't commit to the overhead required to run something you're introducing a weakness into the system rather than a strength as they'll quickly end up in the critical path.

So that was the way that Chris Child's said something and I want to talk about that some more.

See what reactions we get.

But before we go into those things.

Let's see what questions you guys have.

I have one thing when you're going through tariff for Terraform cloud can you also go through just your general experiences with Oprah.

And I were looking at using it earlier this week or just having a little bit of some pain doing so.

Yeah, just some general experience that would be useful.

I can give you some kind of like armchair review of Terraform cloud.

We are not using it in production as part of any customer engagements we've done our own prototypes and pieces.

So I think the best thing would be when we get to that point if the other people on the call that are actually doing it day to day.

I know John bland has been doing a lot better from cloud.

I don't let me paint him on suite ops.

Let's see if he can join and share some of its experiences.

Do you guys know if you can continue using remote state with S3 with Graham cloud.

I couldn't figure out how to do that.

Well, you should be able to let me explain.

Mm-hmm It's a good question.

But IM not 100 percent of what you put into it.

So yeah.

So I Yeah, I cannot speak from experience trying to do it.

What were your problems when you tried.

I mean, I assume you had the best credentials and everything hardcoded.

And if you had that provider block or that back in Setup set up it was airing or it requires that validates that you have a from Workspace back in.

I personally came to find the place where you can even put Intel from cloud the crates.

I would be in environments settings.

So there's the build up using it as an environment variable.

I know by guy I exactly you have to do that for every single workspace yet as retarded as it is.

Exactly we don't like that either.

No awesome.

John's joining us right now.

So John has spent a lot of time with Terraform cloud.

So he can probably answer some of these questions or you and Mark.

Welcome howdy.

I is going to have you mark have you gone to play with Terry from cloud at all yet.

No, I haven't even browsed the docs.

OK Just curious.

But Brian, your.

You've been dabbling with turn from cloud a little bit or.

Yeah, just taking it out.

It was because I was working on data from provision provisioning of my EFS housing on a kill two birds with one stone.

Yeah And dabbled with it didn't love it.

So I probably am just going to do my on provisioning code fresh.

Yeah, it's a little bit more intuitive for me, especially because I used her from CLI workspaces.

Yeah, it it'll be a lot easier for me to implement something that's driving reusable if I were to just do like a cut.

I could fetch that already does the right like workspaces commands for me.

Yeah, I'm hoping that maybe in a couple of weeks or a few weeks, maybe we can do a revised code fresh Terraform demo on this.

We did one about a year ago or more.

But this time Germany on my team is working on it.

And we want to kind of recreate some of the constructs of Terraform cloud.

But inside a code fresh.

So that it integrates more with all the other pipelines and workflows we have there.

On the topic of terror from I would want it, I would want it.

So like I go back and forth on my decision to use Terraform workspaces.

I love the fact that it was so easy to use the same configuration code for so many different environments and I've been able to take advantage of that.

Why didn't love was having to kind of act together way to get all the back end to point to different S3 buckets and different AWS accounts.

I'm curious if anybody's ever worked with that.

Plus if they ever switched off of it to go the other route where we kind of have configuration per her database account that might be less dry.

Such using tigre.

OK, let's.

Yeah OK, let's table that temporarily.

I see John just joined us here.

Let's start with the first question there on first hand accounts and experiences using Terraform cloud.

I know John has spent a lot of time with it.

So I'd love to for you guys to hear from him.

He's also a great speaker.

Well, thank you.

I've seen the check in the mail.

I actually I've done a lot.

Whatever form cloud as the primary c.I. for all of our Terraform and generally, I like it mainly because it's a little malleable like you can use it for like the whole Ci aspect or you can run your c.I.

I mean, you're Terraform in fresh air anywhere.

And it's just your back.

And that's it.

Instead of having S3 buckets everywhere all your state is just stored.

They're really easy to do remote data things that I saw terraforming it is really easy to do.

They have a provider that gives you poor access.

And I did see on the agenda there the talking points the workspace, the run triggers actually did it video or that be wasn't that only to already.

Well, yeah, except now.

Yeah, I've wanted to just play with it non-zero models were recorded and it's decent.

I think they have some improvements to do to be able to visualize it.

But we actually do utilize I forget who it was speaking Brian.

I think we do utilize multiple AWS environments and from our Terraform scripts where we set it up.

We actually have each workspace control or we tell each workspace, which the environment is going to use.

Now this is using access key and secretly preferably we'd have something a little more cleaner that was a lot more secure than just having stale access fees sitting around.

So that's one gripe I didn't have with it.

But in general, we've had a lot of success running it in Terraform cloud water.

OK So you're leveraging like the Terraform cloud.

I don't want to say it's like a feature chair from cloud.

But the best practice a chair from cloud using workspaces or using lots of workspaces and terrified and how has that been.

Because while workspaces has existed for some time it wasn't previously recommended as a pattern for separating you know multiple stages like production versus dev.

How's that working out.

It actually worked out really well, because locally where you set it up, you can set up locally.

Sure Yeah.

So I mean, the reflection here once again.

There you go.

There is a difference between tech from cloud workspaces and from CLI workspace stuff.

Yeah, sure.

So the this is just my little sample account that I play around with the tutorials.

But if this was set up locally in the CLI and because this prefix is the same.

I can set my prefix locally and my local workspaces would be called integer and separated.

And so locally.

It maps directly to the local CLI actually.

So I can say Terraform workspace, select integer.

And now I'm on that.

And I can see a plan and it'll run not playing right writing on Terraform cloud.

I don't have to have variables or nothing locally.

It'll run everything in there unless it's set up as a local because you have multiple settings here.

Would you be able to do that right now.

Sorry to put you on the spot.

This is exactly what we're trying to do.

And if you're saying that it's actually much easier than I initially thought then I might reinvestigate this.

Let's try it.

But thanks for roll.

It's lights up for everyone else.

Maybe if you just joined.

John has been put on the spot here to do a unplanned for demo of Terraform cloud and workspaces.

Possibly even the new beta triggers function.

But on a different sort as it's set up there and working.

I can definitely walk through the triggers.

Peter would be especially useful for us as well, because there are scenarios where we run multiple turns from the place where you can sleep to reports, a serious long shows.

And we can also get close to Yum It want to have like five minutes and then we can talk about some other things and come right back to this.

Yeah let me get a few of my things to worry about just the connection and all that sort of stuff.

Yeah Cool.

Let's do that.

And we'll just keep the conversation going on, other things.

See what we can talk about there.

All right.

Any other questions.

All right.

So I guess I'm going to skip the Terraform cloud talking point about triggers across workspaces.

I think that's going to be really awesome to get a demo.

Basically to set that up as you decompose your terror lists into multiple projects.

How do you still kind of get that same experience where once you apply changes in one environment can trigger changes in another environment.

And that's what these triggers are now for moving on AWS has announced this week, that there's a new clay available.

I'm not sure how new it is per se, but they are providing a Binary Release of this clay.

I suppose it's still probably in Python they're just compiling it to write code might.

The downside from when I was looking at it is it's not just a single binary, you can go download somewhere there's still like AWS clay installer.

So they're following like the Java pattern right where you still got to download zips and sell stuff.

Personally, I just I've gotten so spoiled by go projects which distribute single self-contained binary.

And I just download that from get up to this page.

And I'm set to go.

So has anybody given this new client, a shot.

Now you mark calling you out.

All right.

Don't buy it yet.

Cool And then there was one other thing that came up this week as somebody was asking kind of like you know I think the question the background question was like alternatives to running bolt and if it's worth it to run bolt and Chris Chris files responded quite succinctly so thank you.

We've heard this said before, but I thought this is a really succinct way of putting it.

And that's like if you can't commit to the overhead required to run some new technology like Cuba and 80s balls or console you're introducing a weakness into the system rather than a strength as the quickly end up in the critical path of your operations.

And I think this really resonated with me, especially since we run this DevOps accelerator and our whole premise is that we want our customers to run in and take ownership of their infrastructure.

But if they don't understand what they have, and what they're running, then it is a massive liability at the same time, which is why we only work with customers ultimately that have some in-house expertise to take over this stuff.

And also Alex just Yeah getting some thumbs up here from Alex Eagleman and Bryan side both agreeing with this statement actually.

Yeah, that actual response to the that's what's the response to something I mentioned.

So the original person asking a question that came about came at then and also get.

From what I understand.

And I just thought that maybe I'd remind them you know like maybe want to just give centralized management a shot the volt really is going to want to sound like I'm pushing it that much.

But the reality is if you look at a hash record that created Terraform created volt they make most of their money from both.

They really do put a lot of product hours into featuring sorry.

Why didn't the feature set that product.

So really, it's a mature solution.

Yeah 100 percent when it comes to houses response.

It's very true.

What happens is actually, a lot of the time is if you don't commit a lot of people they like they take the route token, and then they distribute it to everybody and it becomes more of a security hold than a security feature really.

Yeah And really, it's reminiscent of terminators as well.

In my opinion, we really need like a large team of people putting energy into that to actually make full use of it.

So it's not a burn anymore.

It's actually something that can help you pick up velocity exactly like you want to take these things when it gives you an advantage a competitive advantage for your business or the problems you're trying to solve.

Not just because it's a cool toy or sounds interesting, but Yeah, those are really good summary.

Thank you for it.

For a peer.

Just secrets management.

I would.

Probably doesn't have all the bells and whistles of a vault obviously.

But I went with it to be a secret manager.

Or you can use parameters store does it much easier to maintain.

Yeah Are you making copious use of the lambdas as well with Secrets Manager to do automatic rotations.

Not yet by but definitely something that I wish I had time for.

Yeah, because it also requires application changes right to get all right, John, are you.

You need some more time.

No really.

All right.

Awesome Let's get the show started.

So this is going to be a tear from cloud demo and possibly a demo of triggers across workspaces, which is a better feature and terrifying cloud.

So this isn't going to actually give me a plan, because I don't have the actual code for these repos locally on this computer.

But this is the time to show how the workspaces actually work.

So essentially, you set up your back in as remote hostname.

You don't really have to have it.

That's the default. What organization and in this case, I'm saying prefix.

So if I actually change this in the say name to random.

And if I did in a net on this.

It no.

Yes, I need to.

Yes, there because I already initialize.

That's why so by setting the name essentially, it's supposed to.

What did I miss.

They weren't just a minute ago, I promise.

It's always this way.

Let me clean up this dirt one.

But by saying a name I kind of found this by accident.

I didn't mean for it to do it.

But they go it'll actually create the workspace for you.

So you technically don't have to do anything to create a workspace.

It'll do it for you.

In that case, it doesn't give all the configuration there.

But if you utilize prefix here instead of name just wipe it.

What it does is basically create to Terraform cloud and says, hey, everything with this as a prefix is going to be my workspaces.

So in this case, I can say, let me select the integer workspace.

That's awesome.

And so if I do a workspace listed, you'll see that same list there.

And then you can do select separator any one of those.

You can also.

And I'd have an alias for Terraform by the way.

So that's why I'm just saying to you.

You can also get your state.

Of course, if you have access to that.

So we can say show we can pull that locally.

And so it'll output the actual state here.

And then my favorite part is actually planning.

So I don't have any variables everything.

Mind you this workspace doesn't have a lot anyway.

But it's actually running this plan on Terraform cloud.

It's piping the same output.

It's common just like you would normally expect.

So it's piping everything to my local CLI here.

But for console.

But it's basically this.

So you can see the output matches.

But the beauty of this part is I can have all of my variables in here completely hidden.

Any secrets that I want and none of my developers ever see them.

They never know that they exist or anything locally, but they can play an all day and do whatever they want.

And so this is destroying because I don't have the code.

So it's like, well, it's gone.

This random resource integer, but that's the quick run through of utilizing those workspaces locally.

I mean, it's really just this.

And I have a tee up bar set with my token locally.

So that's I have that work to also go back to the tower from cloud UI that we had the settings.

The variables because just because it came out the second a little while ago.

Brian was asking about environment variables you see there that bottom.

Brian Yeah.

If you need obvious credentials you can stick with me.

OK You're not using a dubious provider right now.

No, I'm not going to put them off.

Yeah, no.

OK And nothing precludes you from using the obvious provider.

So long as you still provide the credentials.

Yeah Yeah.

So you know your random workspace.

I got created.

Do you have to go.

Do you manually go in and add the eight of his grades for those I actually Terraform the entire thing.

So that's all done through Terraform so basically terraforming Terraform cloud.

So essentially I'll generate a workspace and that'll help my general settings there, and then I'll just you the last two or three variables.

And you can do environment variables this way too.

So you can kind of tie this in too with like your token refreshes and things of that sort.

Especially those I've mentioned or.

So there's the ball provider and you can actually tap into a ball here actually gets you a token key from your AWS or however you want to do your authentication there get your token from AWS your access key, et cetera.

And then plug that into Terraform cloud.

So that way it's all automated.

And you're not just wasting variables in there.

Do you have to run your own vault or do they run that for you know you could if you're running your own.

So I would assume assume someone doesn't have all.

How do you plumb anybody's credentials and or ask.

Yes token generation in.

So we just came.

So just utilize came to mark them all as you like.

You don't want to put that stuff in code right.

And so use came as encrypt the values manually.

We built a little internal tool to do it.

But encrypt those values put them in code and then once the workspace actually runs, it'll actually create manage update all the other workspaces.

So in essence, you have one workspace that has all the references to all the other work spaces are supposed to create and it'll configure everything.

And so in that one, it'll decrypted came and then add it to the project or the specific workspace as a environment variable.

And so there is when you say came as you're using SFA we do use that system to store that the product of the commercial kitchen blob no.

Now we just encrypt the value in games.

But we actually we actually use it as a Sim for farm gate secrets.

But this repo here is where I actually have a video of it where I kind of walk through how to do the full thing with Terraform your own workspace and then using their remote data as well to pull from.

And so the pipeline feature that was playing with earlier essentially this repo.

I mean, this workspace is going to trigger this one it's going to trigger this.

And so the way it's set up and they definitely say do not use this in production yet.

But these run triggers here.

So you can specify all your workspaces that you want to actually trigger something here.

And so anytime they trigger oh it's a loop because I already have that one set anytime they trigger they will actually trigger the next one.

When we delete these real quick, and I'll just show a click Run.

And so if I cue this one where it finally kicks off.

There we go.

So that's going to go through the plan.

And this is just going to generate a random integer.

It has an output and all it does is output the results of random integer.

And so once it finishes the plan is actually going to show me which or any workspaces that it will trigger next.

In this case random separator so if I go ahead and Confirm and so this one is applying if I come over to random separator nothing's happening here.

I'm not I haven't hit anything haven't pressing buttons.

This appliance finished and there's random separator that was triggered automatically.

And so you can see like it's essentially going to go down the line there.

The good thing is it will tell you here that the run was triggered from this workspace.

And it was triggered from that run.

So you can kind of rabbit hole your way backwards into finding where and what actually triggered that one.

And so if I confirm and apply on this one that someone is actually going to trigger the last one, which is random pat.

Now, pull up the code real quick as well so that when finished and random pat is here.

And there's random pit running.

Quick, quick question.

Do you guys ever use it because I know what the VCR is integrations.

You can actually kick off a plan and apply from GitHub for example together is that.

Yes exclusively.

Yeah And so this is these repos are actually tied up to get up as well.

You do the confirmed circuit collaborative send them to the UI here do it through the UI, you could tie it in like there's a CSI you can tie it in and do it through any c really.

And so there's the end of the pipeline.

But as you can kind of tail like you will rabbit hole right like you're here.

And then it's like, well, was generated from here and you go back there and it's like, well, one was generated from another one.

And so then you end up having to go back there.

So a good visual tool would be really useful.

Jenkins blue ocean or something where or circle C I kind of chose you the path of something would be really useful, but it's kind of interesting.

I'm sure that's coming.

Yeah So this code just to show this real quick isn't using the remote state.

And so I set up variables manually for this demo, but utilizing these variables.

And it just uses the remote state data to get the value from the integer workspace.

And then the pet basically uses to remote datas to get the workspace state for both the separator and the integer workspace and then it just uses it down here in the random bit.

So it's decent.

I'm liking where they're going.

Yeah, I think this has some potential, especially to minimize the configuration drift and simplify the effort of ensuring that change is promoted through multiple workspaces.

And the good thing is like if you saw on separator I actually had to confirm and of course, that depends on your settings.

Of course.

Because you can tell it if you want to auto apply or manual apply in this case, I'm set to make makes sense.

But it goes auto.

They would have just cheered it all the way down the rewind as many.

And so the practicality of it is like if you separate your networking stack from your application and you update your networking stack and for whatever reason, it needs to run the application form as well.

You can kind of automate that now as opposed to where you ran that one.

Now the one person in the company that knows the order thing can go in and manually hit q on something else.

So yeah.

So I think there was some questions in the chat here.

Let's see.

Alex Sieckmann asks, how do you handle the chicken and the egg problem with bootstrapping saying AWS account and then Terraform enterprise to have creds and such.

Actually a good question.

So it would have to come from some somewhere right.

So like especially if you set up like AWS Organizations.

And you had like your root account that you were set with you can utilize that Reed account.

And you can actually Terraform it obvious orbs and then once that new account is set up, you can assume role and those sort of things in order to access that other client.

I mean that other account.

But there is still some manual aspects of that right.

Like you have to search your email address and then that email address is your root account.

And then you want to kind of lock that down.

So you can do use like some service control policies and things of that sort.

But there's still a little bit of a manual piece to bootstrap a full account.

That's the part that really sucks and we go through this in every engagement right.

Because if you don't reset the password and have MFA added anybody with access to email of that root account of that sub account or for that matter can do a password reset.

And take over the account.

Yeah, exactly.

So there was a question about automated destruction.

So terrifying cloud actually requires you to set confirm destroy.

So what.

So if you do automate confirm destroy set to 1, then yes, you can.

You can delete from trip cloud.

But you can't cure or destroy unless you actually have that environment variable z So you can set it like and have it as a part of your workspace and then you destroy will actually destroy it.

Nice cool.

But yes, that aspect of the chicken and egg is something that is definitely something that could be cleaned up on either side, just to help the bootstrapping especially for the clients that have like 78 IBS accounts.

Yeah which isn't as abnormal as it sounds it's one enterprise.

Yeah Any other questions related to Terraform cloud and put this in queue.

Not sure cost value.

Opinions now it's way better than before.

It was rough like multiple Tens of thousands of dollars for the Enterprise version.

And so now it's actually to where you can basically sign up and utilize it now for free.

You have to keep in mind that it is still a subset of different features.

But it is really good.

And it is.

Obviously, if you're a small team and you don't have $100,000 to spend that yet.

But I figure that it is a subset.

Yep And so the main things that you do miss cost estimation is actually pretty cool.

It will tell you if you're starting up like the T3 micro how much that's going to cost or involve large it'll kind of give you those pieces as an estimate.

But it kind of helps.

And you can utilize sentinel which is basically poly policy as code utilized sentinel and say no one can create a project.

If the cost is over $1,000 or whatever.

Or you can say, hey notify somebody or whatever already requires approval.

And so syncing it was actually pretty useful.

And then, of course, you get the normal sample.

And this is the private install a small sample clustering as you go up.

But you can.

But funny how it goes from unlimited worth.

Everything else is unlimited and unlimited workspaces the enterprise no matter limited anymore just 100 plus.

But I mean, really, this free up to five users is pretty much all you really need unless you are on a larger team than movie roles and the role basically plan read right now and admin support.

But the private registry is actually pretty cool too.

I think, as I said, as a profession need to push back on enterprises that try to make you pay for security here the MLS ISO is pretty much the only thing controlling the keys to your castle and I don't think that it's right for people to hold security as a tool for making money.

Yeah, that's like always there 1, 2 right.

Yeah, but I hope we get the industry aligned with security as a first printing like the first class it isn't all products, not just for if you're willing to pay.

Yeah, there were other.

We've shared this before like the SSL attacks website, go to ss no doubt tax, then it says it all.

Yet it's funny.

It's the wall of shame and the price increases with pleasure.

Areas things I need to add terrible glad to know.

Exactly base price pressure.

So price.

It's just insane.

The gouging that goes on.

Look at HubSpot my Jesus.

63 percent increase 586 call us.

Well, you want to factor that's going to cost you.

Yeah Yeah.

Two factors another.

Yeah Well, I mean that comes usually with whatever you picked as you say.

So Right but I don't just a cloud offer to factor.

It does.

Yeah So I have one set up here just normal.

I use all the networks, but then again, I'm also I have my day jobs account on theirs.

And it's paid.

So maybe that's good.

Yeah, maybe that's where it comes from maybe that's not over for.

Any other questions on cloud for a small team.

Do you guys think $7 a month is worth it for just set to no.

It depends what kind of roles do you have in place for like your instruction.

My team is a team of one right now.

So there is no like actual rules automated but obviously being proactive about it.

Just when I'm speaking of infrastructure.

But as the team grows.

And I think we're growing our security function here too.

I think a lot of security engineers I'm talking to, they're doing it manually where they go into your database console and like check it you know your last $3 are public.

I was saying like we could automate this with a sentinel.

I was curious if I do say a team of five is $7 a month worth it if you have those sort of rules in place.

Yes Yeah.

This is basically like a pre-emptive you can choose to block or you can just walk.

And so in this case, it was essentially a function and they're adding to this resources and they end up pulling it back.

Right And so you can basically take those things.

And as you validate them you can give specific messages that you want.

And basically say yay or nay if it's approved and it'll basically block the run.

So it is a good way to catch it ahead of time.

And you can catch some of those things.

Another thing that you can do as a team of security.

We talked about open policy agent integration with Terraform that can also do some of this stuff and also someone else recommended comm test, which is built on top of.

OK and add support for HCl and Terraform plans as well.

Yeah, there's a little library that's like a Lancer as well to offset this one.

And it's pretty decent too.

And I can catch like $3 an HTTP where so they should be.

Yes And it also provides a way here to where you can take these rules and you can actually ignore it for like a specific line like if you want an ingress here and you don't care about this.

This rule here.

It's just it's a requirement.

You have to have it mean you can ignore it.

But you can tie this directly and with see I can just run to you set up for locally with Docker and so I would probably start there as opposed to going to sentinel because then you do have to manage quite well you have to write the Central Policy you need to manage that.

And then you assume a lot of that risk at that point to you know all you have to develop all those opinions on what you mean.

Right well that makes a lot of sense though, when you have like cyclops that focus on that if you're the 119 and suddenly just adds to your plate.

Can you share this link to get up 45 seconds and office hours channel and the episodes after shooting for.

And let's see.

So we got 15 minutes left or so 15.

There were some other questions here unrelated to Terraform cloud as one to see if we can get to that.

Alex, do you still want to talk about this your Prometheus question.

See I can't.

He is chatting in the Zoom chat.

Looks like Zac helped you out with the answer a little bit.

I guess I'll just read it for everyone else's benefit.

How do you know.

Let's see.

Assuming you have the Prometheus already running the Prometheus operator and you run gipsy deal get Prometheus all names faces you'd set up a Service Monitor.

Oh, this is from Zach.

I did not.

Yeah, I gave him like 1,000 foot view of an answer for how to set up a Service Monitor for Prometheus a custom service running in a cluster.

I wouldn't have answers so quickly if I weren't your candidate the exact same moment.

That's cool.

Yeah, maybe we can.

The essence, you don't have to make today.

Let's punt on the question to next week if you're able to join us and we can talk more about service mind stuff.

I'm curious if anyone else is using anything other than custom rules for you know, if there's any other tooling out there for Service Monitoring or adding you know people have multiple teams, multiple microservices and you know if there's any organizational strategies around tooling.

This in a declarative manner any I can answer how we do it.

But I'm interested also first before I talk about what other people are doing.

We've talked about just monitoring the individual services oh Yeah.

Just Prometheus right.

Hangs in multiple services and you know there could be a thumb roll.

You know that some of them come and go ensuring that generic monitoring gets put in place and teams that they want to put extra and additional monitoring and you know, for items you know that those are also able to be deployed.

Yeah I'm just really struggling with the getting a good template going I thought.

Yeah Are you using helm your.

I am.

Yeah then sorry I missed my computer's not responsive here.

So then yeah, I can kind of show you because this came up recently, for example, with century.

That's the good example, my ship is going to invest one but I'll show it.

So we've talked about in the past that we use this chart called the chart that we developed.

Zach, are you familiar with the chart.

Dude I am so familiar with the model chart.

OK created my own version of it.

So yeah.

Well, thank you.

Yeah So the pattern there that we have.

And then are you familiar with the service monitors that we like the Prometheus findings that we have in the model chart, you know I probably should go revisit that on that.

Honestly, I haven't looked at it.

OK So I will give an example of that here and I'm getting it cued up in my browser.

So let me rephrase the question or let me rephrase.

But let me restate the question and add some additional context.

So in my own words, I think what you're describing is how do you offload the burden of how a application is monitored to the application developers themselves or the teams at least responsible for that service.

In the old school model it kind of be like employer your services, and then you throw it over the fence to ops and say, hey, I was deployed.

Update now those are some archaic system like that.

And monitoring and that never worked well.

And it's like very much like this data center mentality static infrastructure.

And then you have a different model, which is kind of like an Datadog where it will maybe auto discover some of the things running there and figure out a strategy for monitoring it, which is magical but it isn't very scalable right.

Magic doesn't scale.

So you want something that allows configuration, but also doesn't bottleneck on some team to roll that stuff out.

So this is why I think Prometheus operators pretty rad.

Because you can deploy your monitoring configuration alongside your apps themselves.

So we had this just came up kind of like what you said Zack about you were just actively working on this other problem that Alex heads.

That's why I was fresh in your memory.

So this is something that we did yesterday, actually.

So we run a century on prem.

We've had some issues lately with century stop ingesting new events while everything seems totally normal.

So it's passing health checks.

Everything's running everything's green and hunky Dory.

But we wanted to catch the situation where it stopped processing events.

So at the bottom here, we've added using the motto chart, for example, we don't have to create a separate release for this technique but we're doing that here and using them on a chart.

What we do is then we add the Prometheus rules.

So we can monitor that the rate or the delta here of jobs started in five minutes over a five minute period is not zero or in this case is 0.

So that's when we alert k minus 1.

My point here.

Those So let's see are we using mono chart to deploy this.

Do you have something that keeps a baseline level of jobs starting a busy cluster a busy environment.

So like is this generalizable no.

But in this environment.

So here's the thing.

Oh, I think it's generalizable because you could make that Cron job you know it does.

So in our case, we have century Kubernetes deployed.

So we have a pod inside the cluster that is ingesting all of the events from the Kubernetes API and sending those to century.

So you could say that we just buy it by having that installed.

We have our own event generator because Kubernetes is always doing something right.

So we ran when we ran this query we side identified the two times over the past month that had the outage.

So we deployed it, and went live with that.

But I just want it.

So this mono chart though, is this pattern where you can define one chart that describes the interface of a Mike or service in your organization.

This happens to be ours that we use in customer engagements.

But you can add you can forget or you can create your own that does the same kind of thing.

And let me go over to our charts here and see Mike in more a different example that we have.

So here's an example, like a simple example of deploying an engine x container using our motto chart.

And the idea is that, what does everything you deploy to coordinate his needs.

Well, it needs.

Well, OK, if you're pulling private images you're going to need maybe possibly you'll need a secret.

So we define a way of doing port secrets.

Everything is going to need an image.

So we define a way of specifying the image.

Most things are going to need config maps.

We define a simple way of having consistent config maps and all of these things are a lot more terse than writing the raw Kubernetes resources.

But you can also then start adding other things in here like then we provide a simple way of defining infinity rules.

So you can specify an inline affinity rule, which is very verbose like this, or you can just use one of the kind of the macros the place holder ones that we define here should be on different node.

And this is an example of how you can kind of create a library of common kinds of alerts that we deploy.

Now I'm talking.

I'm conflating two things affinity rules with alerts.

I just happen to have an example here of infinity in helm files here to share your screen.

Oh my god.

I can do that again.

But I'm just always used to having my screen shared.

So I. Yeah So sorry.

OK So this makes it a little less handwaving then by seeing my screen here.

Here's what I had open, which was just an example of using our monocytes chart to define the Prometheus rules to alert on centuries.

So here at saying century job started minus a century job started five minutes ago.

And if that's zero we weren't on.

So mono chart itself.

Here are using Monroe chart just to define some rules.

But Monroe chart allows you to define your deployment.

So here's where deploying engine x we're setting some config map values we're setting some secret some environment variables.

Here's the definition of the deployment.

But we also, unfortunately, we don't have adjacent schema spec for this yet.

So you kind of got look at our examples of how we use Monroe chart.

And that's a drawback if I search for this here that we'll find a better example.

Who's so for example, I'm not sure Calhoun is going to help fly the where we use Monroe chart frequently is a lot of upstream charts that we depend on.

Don't always provide the resources we need.

So then we can use Monroe chart much like we used the rod chart to define the rules.

So here is where we're deploying Cam for a k I.

This is a controller that pulls metrics out of k I am and sends them to Prometheus.

So somebody provided a container for us.

But the chart was apart.

So we just used our monocytes chart instead.

So here we.

Define a bunch of service RGB monitor rules to monitor.

In this case kIm so this is complicated like using the raw expressions for Prometheus but I don't want to say that like in your case.

Zack what I would do is I would define canned policies like this that you can enable in your chart for four typical types of services.

OK So that is the route I'm going.

And so with the sanity check means I'm not going the wrong round.

I know it's just it seems like a lot of work.

It is.

But the thing is like so.

But nothing else does.

There's no one else is doing this.

So this is like I say, I don't see.

I haven't seen any other option out there, aside from magical auto discovery of things running for monitoring this thing where applications, deploy their own configuration for monitoring very.

I don't know of any Sas product that does that.

And it's very specific to the team and organization and the labeling that you have in place.

Yeah So.

All right.

Well, I mean, did the model chart is the right route in my mind as well.

I've been going that route.

I call it chart architecture.

Yeah but I'm using that to do a bunch of other deployments and not forget to microservice.

So this will be rolled into it.

So thank you for the answer.

Yeah, no, I want to just add one other thing that came up just for to help contrast the significance of what we're showing here is yes, this stuff is a bit messy.

I wish this could be cleaned up.

And it wasn't as dense, but when you compare this to like let's say, Datadog and Datadog has an API.

There's a careful provider for Datadog.

But I would say that's the classic way of setting a monitoring.

It's a tad better than using nodules because there's an API, you can use Terraform but it's not much better than using nodules because there's still this thing where you deploy your app and then this other thing has to run to configure monitoring for that versus what we're saying here is we deploy the app and the monitoring of the app as one package to Kubernetes using.

Well, we're almost out of time today.

Are there any last thoughts or questions related to perhaps as Prometheus stuff.

I didn't check if you posted anything else here.

Alex and Chad thank you for the Terraform cloud demo.

Thank you.

That was all a demo.

Thanks, man.

No problem.

Was month month this year.

I think it's half on sales next month.

The lies and generalizations which can be hard.

I suppose for most HP REST API.

You could do some kind of anomaly detection or basic five minute alerts but there is Yeah, there's not a general, there's no general metrics across all kinds of services.

So Yeah, that's right, Alex.

So that's what all these other services do like data dogs this thing is they'll provide you some good general kinds of alerts but nothing purpose built for your app.

All right then let's see.

I'm just going to be up closing.

Slide here.

Well well there you go.

There's my secret sauce.

That's what we're doing here.

We're at the end of the hour.

There are some links for you guys to check out.

You enjoyed office hours today.

Go ahead join our slack team if you haven't already joined that.

You can go to a cloud posse slash slack you sign up for a newsletter by going to cloud posse slash newsletter.

If you ever get registered for office hours.

Definitely go there and sign up.

So you get the calendar invite for future sessions.

We post these every single Wednesday.

We syndicate these to our podcasts.

If you go to cloud policy slash podcast.

You can find out the links where you can subscribe to this.

Like iTunes or whatever podcast player use connect with me on LinkedIn.

And thanks again for.

Yeah, for all your input and participation area.

This is awesome.

What makes meet UPS possible.

Thank you, job for that presentation.

And I'll see you guys all in the hall next week.

Take care.

Thank you guys.

Thank you.

Public “Office Hours” (2020-02-05)

Erik OstermanOffice Hours

Here's the recording from our DevOps “Office Hours” session on 2020-02-05.

We hold public “Office Hours” every Wednesday at 11:30am PST to answer questions on all things DevOps/Terraform/Kubernetes/CICD related.

These “lunch & learn” style sessions are totally free and really just an opportunity to talk shop, ask questions and get answers.

Register here: cloudposse.com/office-hours

Basically, these sessions are an opportunity to get a free weekly consultation with Cloud Posse where you can literally “ask me anything” (AMA). Since we're all engineers, this also helps us better understand the challenges our users have so we can better focus on solving the real problems you have and address the problems/gaps in our tools.

Machine Generated Transcript

Let's get the show started.

Welcome to Office hours.

It's February 5th 2020.

My name is Eric Osterman and I'll be meeting conversation.

I'm the CEO and founder of cloud posse.

We are a DevOps accelerator.

We help startups own their infrastructure by building it for you and then showing you the ropes.

For those of you new to the call the format is very informal.

My goal is to get your questions answered.

So feel free to unleash yourself at any time if you want to jump in and participate.

If you're tuning in from our podcast or on our YouTube channel, you can register for these live and interactive sessions by going to cloud pop second slash office hours.

We all these we host these calls every week and will automatically post a video recording of this session to the office hours channel as well as follow up with an email.

So you can share it with your team.

If you want to share something in private.

Just ask.

And we can temporarily suspend the recording.

And with that said, let's kick it off.

So I don't have really any demos prepared for today or more interesting talking points.

One of the things that we didn't get to cover on the last call or maybe what some of the interesting DevOps interview questions are what are some of your favorites.

Interestingly enough.

This has come up a couple of times lately in the jobs channel either people going in for hiring or people using some of these questions in their recent interviews.

So I'd like to hear that when Brian Tye who's a regular meals.

He hasn't joined yet.

So we'll probably cover that as soon as he joins.

He has been working on adopting Prometheus with Griffin on EFS but he has a very unique use case compared to others.

He does ephemeral blue green clusters for their environments and he's had a challenge using the CSS provision or on ephemeral clusters.

So you need a different strategy.

So he's got to talk about what that strategy could look like.

But I'm going to wait until they grow quicker.

Ender I work with Brian, I'll grab him and love.

Now OK, awesome.

Thanks Oh, yeah Andrea.

All right.

In the meantime, let's turn mic over.

Anybody else have some questions something interesting.

They're working on something, I want to share.

It's really open ended.

Well, I'm putting together.

I've been doing this as a skunk works at the office for the last couple of months.

But I'm putting together like a showcase of open source DevOps.

They don't have to be open source, but they have UPS tools.

OK So if anybody wants to anybody has something they want to contribute or any experiments they want to run or anything like that.

There Welcome to do that.

Cool can you just mention that in the office hours channel so that they know how to read you so Mike going to put you on the spot here.

What are you working on.

How'd you end up ending up in arms.

Yeah, so I'm we.

My companies recently started using Terraform and we found that cloud policy posi templates to be very helpful, especially when learning at all.

And so I just came here to kind of find out some practices beyond what's available online and a bunch of us on our team actually read the Terraform up and running the second edition.

Yes, that's a good one.

Yeah Yeah.

So we're just trying and more media specifically under trying to find best practices.

And I guess one of my questions was going to be, how does everybody kind of lay out their terror Terraform like I understand the concept of modules being reusable but next step is like defining different environments.

So like we're going to be using separate AWS accounts for different environments.

And so just wanted to get to more expert advice from you guys are just also just learn more about DevOps in general.

Yeah, it's a broad definition.

Nobody has an accurate their own.

Everybody has their own definition of DevOps.

So Yeah, it's loaded.

Really we don't even have to talk about that.

I go there to block all.

So Yeah, it's a hot topic.

I'm sure a lot of people here you can share kind of what their structure is.

There's no like canonical source of truth and there's been I would also like to say that there has been a pretty standardized convention for how layout projects for Terraform that I think is being challenged with the release of Terraform cloud.

And what I want to point out here is that.

So they actually caught dogs and gotten a lot better, especially like becoming a little bit more opinionated or idiomatic on how to arrange things.

And that's a good thing because they're the maintainers of.

So one of things I came across a few months ago when I was looking to explain this to a customer was, what the different strategies for.

And here, they kind of lay out, for example, the common repo architectures as my mouse pings and I can't scroll down.

This is the problem of screen sharing with do it.

All right.

I'll wait for that to wake up and continue talking what it lays out here is basically three strategies.

One is the strategy basically that hashi corp. is now recommending which you can briefly see at the top of my screen here, which is using them on our people or using kind of poorly mono repos.

What I mean by that is maybe breaking it out.

Maybe by a purpose.

So maybe you have one repository for networking one repository for some teams project or something like that.

But what you do is you don't have environment.

Let me see if I can.

Everyone Waking up as my mouse is going crazy there you go.

So multiple workspaces for people.

So what.

What has she caught.

But started doing was saying, hey, let's use work the workspaces concept for environments.

Even though originally, they said in their documentation don't use workspaces this way.

So I don't know if you saw it anywhere but now they've done a mea culpa on that an about face and we've been taking a second look at this.

And I think that it's making projects a lot easier actually for developers to understand when you just have the code and you just have the configuration and you don't conflate the two.

So the other approach is like more traditional and development, you have maybe a branch for each environment.

There's a controversial thing as well.

I don't like it because you have long live branches and keeping them in sync and merge conflicts and managing that they don't diverge is not.

It still takes extra effort.

Also, if you're an organization that leaves and trunk based development, then the branch mile won't long based long live branches like this isn't ideal.

And then there's the directory structure.

This is what's going to get to.

It's like this has been the canonical way of organizing Terraform projects maybe in large part because you know grunt works has a lot of influence in this area and with the usage of terror grants and tools like that.

This has been a good way of doing it, but there's some problems with this.

So what I like about it.

First of all, as you have the separate modules folder right in the modules folder has your what we call route modules that top level implications.

And those are reusable.

That's like your service catalog the developers.

And then you have your environments here and those kind of describe everything that you have for production everything that you have for staging and like these might be broken out more like you should have project folders enterprise.

You wouldn't have them.

This would be considered like a monolithic project or terror list.

You don't want triplets right.

So you underpriced you maybe have VPC you'd have maybe chaos and you'd have maybe some other project or service that you have.

And then under there all the Terraform code.

The problem is that these things still end up diverging because you have to remember to open for requests to promote those changes to all to all the environments, or you have to create one heavy.

Pull request that modifies all those environments.

This has been a big pain for us.

Even at cloud passes.

So we started off with an interesting approach a cloud posse which is, which is to organize accounts treat accounts like applications.

And in that case when I say accounts.

I mean, Amazon accounts.

So like the root account or your master account is one configuration you have your staging configuration, your core configuration, your data configuration.

And what I love about this is you have like a strict shared nothing approach.

Even the good history shares nothing and you share nothing has kind of been this holy grail to reduce the blast radius.

The other things like when your web hooks fire.

They only fire on the dev account.

And because we have these strict partitions there's no accidental mistakes of updating the wrong environment.

And every change is explicit.

Now there is a great quote in a podcast that just came out the other week, whatever the change log and they interview Kelsey Hightower on like I think the top was like the future is model.

And this is the constant battle of bundling and unbundling and bundling and unbundling is like basically, I guess you get anywhere you go to consolidate and then expand.

And then you expand and you realize that that can work well you consolidate and you expand it and you say so.

But my point here is more like one of the things he said, and that was like the problem with microprocessors is that it requires the developers to have a certain level of rigor.

I'm paraphrasing my own words.

It's asking it in an organization that wasn't practicing it before.

So how are they going to get it right.

This time by moving to microservices.

I want to find the exact quote somewhere maybe somebody can post office hours if they have it handy.

But that was it.

So that's the thing here.

What I describe here.

This is beautiful.

And when you have a well oiled machine that is excellent at keeping track and promoting changes to all your environments and no change is left behind, then it works well, but this is an unrealistic expectation.

So that's why I'm we're considering the Hasse corp. recommended this practice now under using multiple workspaces repo and under this model when you open up a pull request kind of what you see is OK, here's what's going to happen in production staging and anywhere else you're using that code because inadvertently you might have drift either from know human monkey patching going into the console or that maybe applies failed in some environments.

And that was never addressed.

So now you have diverged that way.

That's a valid error or maybe you have other Terraform code that is imported to state or something.

And it's manipulating those same resources.

There's been bugs and Terraform dividers and all that stuff.

So I want to see when I open up a board with what's going to happen to all those environments that that's what I really like about this workspace approach.

That's the opinion we keep getting like we'll see people that broke it out in a similar way that I can do Adam cat home experts where you'll have like the different end of US accounts and then it's free and it's interesting to me that you're like, well, now we're going to try this multiple workspaces so tight.

Where do I go.

Where do you go from here.

And you're right to feel frustrated and you know the ecosystem is in frustration towards you.

No no I know, just in general.

I would say like you know in software development more or less the best practices for how to do your pipeline some promote code is very well understood.

And we're trying to adapt some of those same things to infrastructure.

The problem is that we're different.

We're operating at different points in the software development lifecycle and the capabilities at our disposal are different.

So let's take, for example, infrastructure infrastructure.

But if you listen to this podcast you talk to us.

I always talk about this.

It's like the four layers of infrastructure you got your foundational infrastructure you shared services.

So you've got your platform and then on top of your platform you've got your shared services.

And then the last layer is your applications, most people working with this stuff.

Assume layers one through 3 exists and don't have to worry about it.

It's a separation of concerns.

All they care about is deploying their app.

They don't have any sympathy or empathy for how you manage Layers 1 through 3.

But if you're in our industry, that's one of the biggest problems is that we have to maintain stability of the platform while also providing the ability to run multiple environments for developers and stuff like that.

So my reason for bringing up all of this is that like Terraform as a tool doesn't work like deploying your standard microservice your standard go rails node app or whatever.

Like if you to play.

Note app, and it fails, you can roll back or you didn't even need to expose that error at all because your health checks caught it and you're running parallel versions of all that all that's offshore et cetera when you're dealing with infrastructure and using a tool like Terraform, it's a lot more like doing database migrations without transactions.

So there is no recourse.

So how do you operate when you're dealing with this really critical stuff in a place where you have poor recourse.

So you need good processes for that.

And let's see here.

So that's why we're still trying to figure this out.

I think as relates to DevOps what is the best course of action.

There's been Atlantis.

There's been Terraform cloud a lot of people are using Jenkins to roll this out.

Some people combine things like Spinnaker to it.

You can orchestrate the whole workflow, because the problem is you're standing CCD systems and the answer for this.

I don't think.

I don't think there is AI don't think anybody's just nailed it yet.

We're all so that it's a fair question if you go back to your cat home experts and you can click on one of those accounts.

I'm just curious now are you still referring to the cloud as opposed to modules or do you build modules within here.

No It's really all just config.

Both both.

So let me describe that a little bit more of what our strategy has been kind of like food for thought.

So So here I have one of these eight US accounts and under control, we have the projects.

So this is how we had kind of like the microservices of Terraform.

So it's not a monolith it's micro certain microservice architecture.

So to say, if I look in at any one of these we've been using a pattern similar to what like you'd get with grant works, which is where you have a centralized module repository.

So here we have the Terraform route modules and cloud posse.

So where you pull those root modules from really doesn't matter like you could have a centralized service catalog at cloud posse which works well for our customers because their customers are implementing.

I mean, our customers are implementing our way of doing.

Now if you know depending on how much experience and opinions you have about this stuff you could fork out modules you could start your own.

And what we have typically is the customer has their own modules.

We have our own modules and you kind of pick and choose from those here what we have is we're using environment variables with Terraform, which this pattern worked really well in Terraform 0 at eye level but in Terraform 0 12 for example Terraform broke a fundamental way we were using the product.

So when you run Terraform admits dash from module and you point out a module, you used to be able to initialize your current working directory with a remote module, even if your working directory had a file in it.

So that works really awesome.

We didn't need any wrappers we didn't need any tools.

But then Terraform 0 12 comes up and they say, no, the directory has to be empty.

Oh, and by the way, many Terraform commands like Terraform output.

Don't allow you to specify where your state directory is.

So that it could get the outputs.

So most commands you can say Terraform plan in this directorate Terraform applied this plan file or in the strategy Terraform it is in this directory.

But other commands like Terraform show.

I think and Terraform graph and Terraform output.

Don't allow you to specify the directory.

So there's an inconsistency in the interface of Terraform and then they broke the ability to initialize the local directory.

So anyways, my point with this is saying perhaps there is some confusion on our side as well, because the interface was changed underneath us.

So going back to your question under here.

So then if you go to this folder here and you go to like Terraform root modules here a bunch of opinionated implementations of modules for this company.

So here's the idea as the company grows in sophistication you're going to have more opinions on how you're going to do EMR how you're going to do Postgres how you do all those things.

And that's the purpose of this.

And then a developer all they need to know is that look, I can follow best practices.

If I point to that.

And then you DevOps team or people stakeholders in that with that label.

So to say, can be deciding what are the best practices for implementing those modules and then you've seen quite possibly them.

Here we have our terraforming modules.

Now our modules.

I want to preface this with.

We have these as kind of boiler plate examples for how to implement common things, but they're not necessarily the canonical best practice for implementing.

So our case root module here implements a basic case cluster with a single of scale group basically a single notebook.

Now that gets complicated right because starting companies they're going to need to know pool would use other.

And they're going to need a high memory node pool they're going to need a high you pool and how you mix and match that.

That's too opinionated for us to generalize Yeah.

Well, that's all I can say is yes.

Now, I guess.

And then what.

Yeah, and that's kind of what we're going through now just figuring out what works for us deciding on the different structures of everything and definitely taking advantage or looking at what you guys have already done and looking at a lot of things.

And just reading all over.

So yeah.

Well, that's a good read it posts.

You know everyone happens to blog and medium.

So take Jake out.

Exactly And think that's the other thing is just finding documentation on zero that 12 compared to zero 9/11.

And you know, refining my Google searches and only searching the past like five month type of thing.

That's a good point.

Yeah, there's not.

You can you can end up on a lot of outdated stuff, especially how fast and stuff.

So you know I was reading a blog from July of 2019 and blindly assumed that they were talking about doing about 12 when in fact, they were talking about zero 9/11.

So yeah, but I got to move on to the next question.

I see Brian Tyler joined us.

He's over at the audit board, and he's been working on setting up Prometheus on with the effects on ephemeral shortly clusters.

Kind of an interesting problem.

Those of you attended last few calls this Prometheus best thing has come up quite a lot.

I want to just kind of tee.

This question up a little bit and frame it for everyone.

So we called upon.

So we've been supporting a lot of different monitoring solutions, but we always come back to that yet Prometheus with refined eyes pretty awesome or Cuban community provides a lot of nice dashboards and it's proven to scale.

So one of the patterns we've been using that others here have used as well.

It works pretty well is to actually host the Prometheus time series database on each of us.

And I guess your mileage will vary and if you're running Facebook scale.

Yeah, you're probably going to need to use Daniels or whatever some bigger things.

But EFS is highly scalable.

It's POSIX compliant and it works ridiculously well with Prometheus for simple implementation.

The problem that Brian has is his clusters are totally ephemeral like when they do.

Roll out.

They bring up a whole new cluster and deploy a whole new stack to that.

And then verify that that validate that that works.

And shut down the old one.

And with Prometheus any offense.

Well, we've been using is the EFS provisionally and with the F has provisionally it'll automatically allocate a BBC system volume claim on your yet this file system.

Problem is those system ideas are unique.

They're generated by the company's platform.

So if you have several clusters how do you reattached any of this system.

If you're using the yet this provisional well the kicker is if you are doing it this way.

Well, then maybe the DFS provision provision or isn't the right tool.

You can still use CSS but the provision isn't going to be the right.

And instead, what you're going to need to do is mount the S file systems to the host using traditional methods amounting DFS fought for your operating system.

So if you're using if you're using cops you're going to use the hooks for when the service starts up and add a system hook that's going to mount that DFS file system to the host OS.

And now with Kubernetes what you can do is you can have you can mount the actual hosts volume that's been mounted there into your container and then you know what you're naming you are the decider of the naming convention at that point how you keep an secure and everything was Brian, is that clear.

I can talk more on that.

Yeah, no, that makes sense.

Yeah, it's unfortunate that I have to go that route.

But sometimes you don't get the turnkey solution.

Yeah, I mean so before the FSA provision or we were using this pattern like on core OS with Kubernetes for some time.

And it worked OK.

So I just wanted to point out for those who are not familiar with the cops manifest and capabilities that the cops manifest.

So under cloud positive reference architectures is kind of what we've been using in our consulting engagements to implement cognitive clusters with Cox.

I'm mostly speaking here to you.

Mike, who is looking for inspiration.

This here is doing that bolt poly repo architecture, which I'm undecided on at this point from being totally gung ho.

So we go.

So we.

So we go here now to the cops private topology, which is our manifest for cops.

What I love about cops is cops basically implemented a Kubernetes like API for managing Kubernetes itself.

So if you look at any C or D They're almost the same as these.

But then there's this escape hatch that works really well is that you can define system.

These units at startup that get installed.

So here you go.

Brian you'll create a system d unit that mount that that just calls exact start to mount the file system the file system.

Yeah Have you guys looked at offense CSI driver.

No So maybe there's other maybe there's more elegant implementations like I pasted in the chat.

But it was what I was looking into next, which I believe could solve my problem for me, just because I create the PVR itself.

Then I can define the path for that person simple.

Oh, OK.

Yeah If you can define in a predictable path, then you're around it.

Yeah, it's just so long as long as you're in the territory of having random ideas then yeah.

I believe, if I'm green that persistent volume myself instead of doing the PDC route, then I can create the volume.

And then when I'm provisioning Prometheus I can just tell it, which volume to mount instead of going and doing and creating a PVC through Prometheus.

Operator OK.

I think that's the route I'm going to try first.

If not, then obviously, the Cox system D1 is the one that for sure will work.

Yeah, that's your that's plan B self.

Andrew Roth you might have some experience in this area.

Do you have anything to add to it.

I've never used a CSI driver if only used.

I've only used DFS provision prisoner in that deal.

Our very own certified cougar Kubernetes administrator.

Any insight here will.

Your initial thought was the tree my have to have decreased itself be attached.

And then have to continue from there.

The CSI driver before.

I've only just started to mess around with it.

But not enough experience to really say anything much.

Playing around with rook sort of when it's set.

All right.

Well, I guess that's the end of that question then unless you have any other thing you wanted to add to that, Brian.

No thank you.

Yeah Any other questions.

We got quite a lot of people on the call.

I just have a general question.

It's worn.

So I know in the company that I work for we use a lot of the love you guys repost.

I know a lot of it is revolving around AWS.

Now do you have any or do you plan on doing any repos for like maybe Digital Ocean.

That's a good question.

But fun fact is that we use a bit of Digital Ocean kind of for our own pieces to keep our costs low and manageable.

I don't know if where would we're going to really be investing directly in Digital Ocean because of who we cater towards which are more like well-heeled startups.

And they need serious infrastructure.

So I think Digital Ocean is awesome for pieces.

But I wouldn't want to based my $100 million startup on it.

Cool I do have like a low com and also we actually do use that data process of using the telephone cloud with the workspaces.

How how's that working out for you.

Anything you want to add.

Like a for use case or firsthand experience so far.

And pretty good.

You know anytime we have any issues with a certain, I guess stage, we just have to do a pull request requested that particular stage.

OK You just go from there.

And it's pretty simple.

You know because I've only been involved with a company with a little over year.

So which of the repo strategies are you guys using with Terraform cloud the repo strategy.

I guess, depending on the stage or so like we have stages like proud of you 80q a.

So like maybe 80 and support they all branched off from the master branch.

OK So you're using the branching strategy, then where were you a team would be a separate branch, not in git repo it'll all be on the same branch in the git repo.

OK So anything that's merged into master any structures then or using the official approach that they describe up here.

Once again, it sounds like a single branch like a master of the branch from that.

And then merge back to yes in the long run.

Yeah, but are the environments is you a team is you just a workspace or is you a team a project folder like this.

You said you a team will just be a workspace.

OK So yeah, you're using that strategy.

The first strategy we were talking about here multiple workspace.

I wanted to expand on that in that we have an older Terraform repo that has a few of them that are using workspaces but we have a lot of stuff that we're doing tarragon to manage it.

And I haven't sat down really to think about it.

But is there a best practice for managing workspaces VA terabyte.

I was exploring that the other day just in terms of EPBC and out of the box.

Terry Grant does not support workspaces.

It's anti pattern for them.

And I would say that based on what I said earlier that what they used to be the official anti pattern.

Like you don't use workspaces for this but even hashi corp has done an about face on this.

So terror.

I'm not sure if they're going to be adding support for it or merging or even addressing this topic.

However, I was kind of able to get it to work using.

And I don't know what the term was in tarragon but I think the equivalent of hooks right.

So I had a pre had a well, it was a prerequisite Yeah, for either for the plan or the net or whatever.

I just use a hook to select the workspace.

And then the or create the workspace.

If it did any if the selecting failed.

And that seemed to more or less work for me.

So the challenge is you need to call this other command, which is Terraform workspace, select or Terraform workspace new and using the hooks was one way to achieve that.

You see here in a terminal window.

I might still have that question on those using Terraform workspaces.

And I haven't been keeping too up to date with Terraform 12.

Have they made the change.

So that you can interpolate your state as to your state like location or not.

It's still not a thing that's the amount of things I can talk to that as well actually.

Another vehicle that kind of.

So the OK.

So for everyone's clarification.

Let's see here.

That's a good example.

So the Terraform state back in with S3 requires a number of parameters like the bucket name the role you need to assume to use that bucket key prefixes all that kind of stuff.

And one thing that we've been practicing a cloud policy has been this thing like share nothing right.

So here's an example.

Brian's talking about.

So you see this bucket name here.

You see this key prefix here.

You see the region.

Well, the shared nothing approach right.

You really don't want to share this bucket especially if you're having a multi region architecture.

And you to be able to administrator multiple regions into that one region is down you definitely have to have multiple state buckets right.

However, because Terraform to this day does not support interpolation basically using variables for this.

That's kind of limited how you can initialize the back.

So one of the things that they introduced in a relatively late version of Terraform like Terraform from or something was the ability to set all the back end parameters with command line arguments.

So you could say terraforming minute.

And I think the argument was like config and then back n equals.

Yeah bucket equals whether I know something.

And then.

But then there was still no way to standardize that.

So that's why a cloud posse we were using.

There's this thing that you can use.

It's going to type it out here.

So Terraform supports a certain environment variables.

So you could say export TF Kawhi are was kind of weird t if Clyde plan args I think it was equals and then here you could say like dash BackendConfig equals bucket equals like my bucket.

So because you had environment variables you could kind of get around the fact that you can use interpolation.

Are we clear up to this point before I kind of do my soul searching moment here for me, this is clearer.

Yeah OK.

So I'm just one voice.

Eric, are you trying to shoot your screen.

Yeah, you might need to.

It's definitely being shared because I see the green arrow around my window.

You might need if you're in Zoom you might need to tab around to see it.

Yeah, I can see it in here.

Yeah zoom is funky like that.

I see it now.

Yeah So you see here.

I might be off by some kick camel.

Some capitalization here somewhere maybe.

But it is more or less what you can do to get around it.

So you can stick that in and see you can stick that in your makefile.

You could have users, you could have your wrapper script.

OK And then the last option.

Yeah if you're using terror ground terror Brent will help set this up for you using bypassing these arguments for you in your terror.

I grant that each sealed.

So there are options right.

But a lot of these are self-inflicted problems based on problems that we have created.

Like I said self click flick the problems we've created for our self right.

The problem we create for ourselves that cloud posse was this strict adherence to share do not share the Terraform state bucket across stages and accounts always provision a new one.

This made it very difficult when you wanted to spin up a new account because you had to wire up the state in managing Terraform state with Terraform we have a team of state back in module, we use it all the time.

It works, but it's kind of messy when you've got a initialize Terraform state without Terraform state using the module.

So it creates the bucket.

The Dynamo table and all that stuff.

Then you run some commands via script to import that into Terraform and now you can use Terraform the way it was meant to use with this bucket those created by Terraform but it's real.

And you see you see the catch-22 here.

And if you're doing this at every account is that idea.

So has she.

So grunt works took a different approach with Terry Grant and Terry Grant.

They have the tool Terry Grant provision.

The Dynamo be back in.

And the term the S3 bucket for you.

So it's kind of ironic that your whole purpose with using tenure reform is to provision your infrastructure as code and then they're using this escape hatch to create the bucket for you.

And so.

So let's just say that that was a necessary evil.

I'm not.

Know I get it.

It's a necessary evil.

Well, on the topic of necessary evils let's talk about the alternative to this to eliminate like all this grief that we've been having.

Screw it.

Single state bucket and just use IAM policies on path prefixes on workspaces to keep things secure.

The downside is yes, we gotta use IAM policies and make sure that there's never any problems with those policies, or we meet we can weak state.

But it makes everything 10x 100x easier.

It's like using.

So one of the things Terraform cloud came out with was managed state like.

And that's free forever or whatever you believe.

But just having the managed state made a lot of things easier with terrible cloud.

I still like the idea of having control over it.

We Terraform in S3 on Amazon where we manage all the policies and access to that.

So that's what.

All right.

So when you're done using workspaces together with the state storage bucket the other thing you gotta keep in mind is using this extra parameter here.

So workspace key prefix.

So if you're using the shared S3 buckets strategy, then you're going to want to always make sure you set the workspace key prefix so that you can properly control IAM permissions and that fucking based on workspaces.

So a workspace might be dead.

A workspace might be prob somewhat thank you very explaining that perhaps Teflon cleared up some confusion when it comes to workspaces.

But you said, where do you keep the state.

But if you divide that all overtime rules and policies it can be done by keeping it in a single state.

The single buffer could I should say one of the things Why we haven't liked this approach is that.

OK Let's say, OK.

And I just want to correct one thing I was kind of saying it wrong or misleading.

So the workspace key prefix would be kind of your project.

So if your project is case, the workspace key prefix would be e chaos and then Terraform automatically creates a folder within there for every workspace.

So there'll be a workspace folder for private workspace all to fit.

So there that.

Now why we haven't liked this approach is the.

So I am is a production grade where we're using I am in production, and let's say the master This bucket is in our master AWS count and we're using I am there and we're using I am to control access to death or control access to the staging workspaces or control access to some arbitrary number of workspaces while we're actually doing is we are modifying production.

IAM policies without ever testing those IAM policies in another environment by procedure like we're not enforcing.

You can still test these things.

But it's on your own accord that you're testing that somewhere else.

And that's what I don't like about it is that you're kind of doing the equivalent of cowboy.

I am in production, obviously with Terraform as code, but nonetheless, you almost need a directory server for this sort of thing.

Yeah Yeah Yeah Yeah.

That's interesting.

Is there.

I am directory integration.

I haven't looked trying to do that.

But Yeah.

So sorry once I got some comments here by Alex but if you update the reference version and dev but.

But then get pulled away, and it gets forgotten later environments come back to it's like default.

So the then coming back and being like, what if we dropped the ball on requires some fancy diff work and just tedious investigation.

I kind of want a dashboard that tells me what version of every module.

I'm referencing in each environment.

This doesn't cover everything but doing cool stuff like this is just messy, so Yeah.

So Alex.

So Alex siegmund is actually one of our recent customers and one of our recent engagements and I'm not.

So one of the problems that we have in our current approach that we've been doing, which we've done up to this point has been that you might merge pars for dead and apply those in dev but those never get promoted those changes never get promoted through the pipeline.

And then they get lost an advantage.

And that is the whole problem with what I argue is that with both the poorly repo approach that we've been taking.

But it's also the problem with the directory structure approach that's been the canonical way of doing things in Terraform for some time.

The proud directory the dev directory.

All of those things have the same problem, which you forget what's been ruled out.

So that's why I like this approach where you have one PR and you'd never work.

OK There's variations of this implementation.

One One variation one is that PR is never merged until there is an exit successful applied in every environment.

So then that PR is that dashboard that Alex is talking about.

He wants a dashboard that tells what version has been deployed everywhere.

Well, this is kind of like that dashboard when that PR is open.

Then you know that there's an inconsistency across environments.

And based on branch based on status checks and your pull request you see success in that success in staging.

And no, no update from production.

OK that's pretty clear.

Now where it's been updated versus this approach where you then merge it.

And then you need to make sure that whatever happens after that has been orchestrated in the form of a pipeline where you systematically promote that change to every environment that it needs to go after that.

But now the onus is back on you that you have to implement something to make sure that happens in Terraform cloud.

They have this workflow where it will plan in every environment.

And then when you merge it can then you can set up rules.

I believe on what happens when you merge.

So when you merge maybe it goes automatically out the staging, and then you have maybe a process of clicking the button to apply to the other environments.

What's nice about this is you'll still see that that has not been applied to those other environments.

And you need that somewhere.

So whether you use the pull request to indicate that that's been applied or whether you use a tool like Terraform cloud to see that it's been applied or whether you use a tool like Spinnaker to create some elaborate workflow for this.

That's open ended.

Let's see.

I think we've got some more messages here.

You've removed the need for such a dashboard by making part of your process ensuring that it's all repositories or environments.

Yes So I'm still not 100% sure.

OK, awesome.

So Yeah, Alex says that this alternative strategy we're proposing eliminates some of these needs for doing that because in the other model the process is basically the onus is on you to have the rigor to make sure these things get rolled out everywhere.

And especially the larger your team gets, the less oversight there is ensuring that that stuff happens.

And so are one thing I'm personally taking away from this is to try the work space recommended thing on Terraform docks first, and then you share and report back and report back.

All these pitfalls.

Yes also I am.

And to be candid, I have not.

I have not watched or listen to it yet.

And you're going to find detractors for this, right.

But that's good.

Many rich ecosystem conflicts and Anton Anton the thing go is a was it is it prolific is it forced them or I forget what the conference is he has just done a presentation on why you want to go on selling everything I said and saying, why you have to have the directory approach.

So that might be a video to watch.

Is it good.

Anton is a great speaker.

I know I'll look, I with the I actually met this guy upstairs.

Yeah Yeah.

He was there last year.

Yeah super nice guy.

He's awesome one.

Yeah, I really like.

I like him.

I met him up in San Francisco and reinvent I think this guy goes to like 25 conferences a year or something.

It's a.

So you were saying he kind of has the right combination even with zero.

That's 12 to stay with a director.

I think so.

So he is a very he's a very big terabyte user.

And Tara grant is a community in and of itself with their own ways of doing things.

So I therefore, I suspect this will be very much at promoting that because in the abstract it was something like, why you have to do this this way.

I'll find it though while this is going.

Any other questions you're I have a bit of a question that kind of is a little bit higher level.

But my experts whatever form is that when you use it, it's kind of very low level.

It's a fairly abstracted from the API.

And you have, of course, you know the built in kind of semantics that has you gives you rails as it were sort of like you know, this is how we just say transitions.

So we do this.

So we do that.

And it's kind of like you know operate inside of that construct.

Yeah What's your experience with four thoughts around using higher order constructs like what's available database TDK for example, in some of the things you could do with that in a fully complete language.

Yeah Yeah.

It's good.

I like the question.

And let's see how I answer it.

So this has come up recently with one of our most recent engagements and the developers kind of on that team challenged us to like, why are we not using TDK for some of the stuff.

Let's be like, let's be totally honest.

That like scene k is pretty new.

And we've been doing this for a long time.

So our natural bias is going to be that we want to use Terraform because it's just a richer experience.

But there's a lot of other reasons why I think one can argue for using something like Terraform and not something that's more Turing complete like SDK or blooming and that's that requirements don't always translate to awesome outcomes or products.

And the problem is that when you can do everything anything possible every way possible you get into this scenario of why people hate Jenkins and why people hate like Groovy pipelines and Jenkins because you develop these things that start off quite simple quite elegant.

They do all these things you need and then 25 people work on it and it becomes a mush of pipeline code a mush of infrastructure code.

If we're talking the c.d. k right.

And things like that.

This is not saying you can't use it.

I'm thinking more like there's a time and place for it.

So if we talk about Amazon in the Amazon ecosystem.

For example, I like to bring up is easy.

It's easy s has been wildly popular as a quick way to get up and running with containers in a reliable way that's fully integrated with AWS services.

But it's been a colossal failure.

When you look at reusable code across organizations.

And this is where Kubernetes just kicks butt over.

Yes Yes.

So in Kubernetes they quickly Helen became the dominant package manager in this ecosystem.

Yeah, there's been a lot of hatred towards helm for some security things or whatnot, but it's been incredibly successful because now, are literally hundreds and hundreds of helm charts many written by the developers of these things to release and deploy their software.

The reason why I bring up helm and Kubernetes is that's provided proved to be a very effective ecosystem.

We talk about Docker same thing incredibly productive ecosystem.

And so with Docker Hub.

There's this registry.

And you know security implications aside there's a container for everything.

People put them out there your mileage may vary and some might be exploitable but that's part of the secret.

Why doctor has been so successful.

It's easy DSL easy distribution platform and works everywhere.

Same pattern like going back in the days to Ruby gems and then you know Python modules and all these things.

This is very effective.

Then we go to Amazon and we have easy yes and there's none of that.

So we get all these snowflake infrastructures built in every organization to spec.

And then every company, every time you come into a new company at least as us as contractors you know two environments look the same.

They're using the same store tools stack.

But there's too many degrees of variation and I don't like that.

So this is where I think that the six part of the success of Terraform has been that the language isn't that powerful and that you are constrained in some of these things.

And then the concept of modules is that registry component, which allows for great tremendous usability across organizations.

And that's what I'm all for.

And that's like our whole mission statement that cloud passes to build reusable infrastructure across organizations that's consistent and reliable.

So back to TDK question and the answer that I gave the customer was this.

Let's do this.

Let's continue to roll out Terraform for all the foundational infrastructure, the stuff that doesn't change that much the stuff that's highly usable across organizations.

And then let's consider your developers to use TDK for the end for the last layer of your infrastructure.

What I'm talking about there.

And I'm not sure at what point you join.

But in the very beginning, the call.

I talked about what I always talk about which are the four layers of infrastructure basically layer one, layer two layer through layer 1 is foundational infrastructure layer 2.

This your platform layer 3 are your shared services layer 4 are your applications, your applications go for it.

Go nuts you may use CloudFormation, you know you server framework like if somebody is using the service framework, which is purpose built for doing lambdas and providing the structure other rails but for lambdas use it.

I'm not going to say because we use Terraform in this company, you're not going to be able to use service that's not the right answer.

So the answer is that's going to depend on what you wear at where you're operating.

Yeah, I really I really like that for Lamont.

I miss that from the beginning of the call, But that really makes a lot of sense because you want to have your foundations a little bit more rigid you don't want to have that much that you described earlier.

And that's where I think at a lower level the tie constructs that that Terraform gives you the high opinionation, I should say that makes sense, because you can only do so much.

And moreover you have a pretty mature kind of you used to be with Terraform it know you'd have temporal plant and then Terraform Apply could be quite different.

But I think this equipment has become much more mature at this point.

Yeah And they and they really do a good job predicting when they're going to destroy your shit.

Yeah And yeah.

And they have and they added just enough more functionality to HCM to make it less painful.

Which I think is going to quell some of the arguments around Turing completeness.

And then the other thing I wanted to say related to that is like the problem we had in pre h CO2 all the time was count of cannot be computed.

That was the bane of our existence.

And one of the top one of our top linked articles in our documentation was like, all the reasons why count cannot be computed.

Now that's almost we don't see it as much anymore.

So I'm a lot happier with that.

The only other thing I was going to add and I'm not sure I'm 100% on this anymore.

I was alone.

Well, I wasn't 100.

So I was maybe 50 60% before.

Now maybe 30, 40 but I was wondering like maybe maybe HDL is more of like a CSS language and you need something like Sas on top of it, to give a better developer experience.

But for all the reasons I mentioned about CTE came my concern is that we would get back into this problem of non reusable vendor lock kind of solutions and unless it comes from hashi core you run the risk of running afoul of kind of division.

They see for the product also.

Alex siegmund shared in the Zoom chat don't Alex keep you posted to the suite ops office hours channel as well.

Yeah, this is the.

This is the talk that the Anton Banco did at DeForest them and the recording has been posted and he just posted it is LinkedIn.

I'll look it up after this call and share it.

I do think actually though boy, this has been a long conversation today.

I think we already at the end here.

Are there any last parting questions.

No more just to thank you.

Thanks for calling me out earlier.

And then taking the whole hour to talk about her farm.

I appreciate that.

Well, that's great.

I really enjoyed today's session.

As always so lets see are you going to just wrap this up here with the standard spiel.

All right, everyone looks like we reached the end of the hour.

That about wraps things up.

Remember to register for our weekly office hours if you haven't already, go to cloud posterior slash office hours.

Again, that's cloud posse slash office hours.

Thanks again, everyone for sharing your questions.

I always get a lot out of this, and I hope you learned something from it.

A recording of this call will be posted to the office hours channel and syndicated to our podcast at podcast.asco.org dot cloud plus.

So see you guys next week.

Same place same time.

Thanks a lot.

Thank you, sir.

Public “Office Hours” (2020-01-29)

Erik OstermanOffice Hours

Here's the recording from our DevOps “Office Hours” session on 2020-01-29.

We hold public “Office Hours” every Wednesday at 11:30am PST to answer questions on all things DevOps/Terraform/Kubernetes/CICD related.

These “lunch & learn” style sessions are totally free and really just an opportunity to talk shop, ask questions and get answers.

Register here: cloudposse.com/office-hours

Basically, these sessions are an opportunity to get a free weekly consultation with Cloud Posse where you can literally “ask me anything” (AMA). Since we're all engineers, this also helps us better understand the challenges our users have so we can better focus on solving the real problems you have and address the problems/gaps in our tools.

Public “Office Hours” (2020-01-23)

Erik OstermanOffice Hours

Here's the recording from our DevOps “Office Hours” session on 2020-01-23.

We hold public “Office Hours” every Wednesday at 11:30am PST to answer questions on all things DevOps/Terraform/Kubernetes/CICD related.

These “lunch & learn” style sessions are totally free and really just an opportunity to talk shop, ask questions and get answers.

Register here: cloudposse.com/office-hours

Basically, these sessions are an opportunity to get a free weekly consultation with Cloud Posse where you can literally “ask me anything” (AMA). Since we're all engineers, this also helps us better understand the challenges our users have so we can better focus on solving the real problems you have and address the problems/gaps in our tools.

Machine Generated Transcript

Let's get the show started.

Welcome to Office hours.

It's January 22nd 2020.

My name is Eric Osterman and I'll be leading the conversation.

I'm the CEO and founder of cloud posse.

We are a DevOps accelerator.

We help startups own their infrastructure in record time.

By building it for you and then showing you the ropes.

For those of you new to the call the format of this call is very informal.

Michael my goal is to get your questions answered.

Feel free to unmute yourself anytime if you want to join.

Jump in and participate.

Excuse my dog.

He's having a fun time downstairs here.

We all these calls every week.

We'll automatically post a video recording of this session to the office hours channel as well as follow up with an email.

So you can share it with your team.

If you want to share something in private.

Just ask and we'll temporarily suspend the recording.

With that said, let's kick it off.

Here's the agenda for today.

Some talking points that we can cover just to get the conversation started.

One of the announcements for all of us using excess was pretty good is that the master nodes have now come down and cost.

It's ugly.

It's a 50% reduction across the board.

Again, this is only on the cluster itself.

Like the master nodes.

It has no bearing on your work or nodes themselves.

So there's a link to that.

The other good news is Terraform docs has finally come out of release and that they have an official 8th photo release.

It's actually up to eight that won already this morning that supports each seal.

2 If your team isn't automatically using Terraform docs I highly recommend it.

It's a great way to generate markdown documentation for all your Terraform code automatically.

Some people use pre commit hooks.

We have our own read generator that we use it for other people use it to validate your inputs have been set descriptions and stuff like that.

Also, we launched our references section.

This is if you go to cloud policy slash references.

This is links to articles and blog posts around the web that have been written about how to use some of our stuff since I think it's interesting to see how others have come to use our modules.

This is a great resource for that.

And the first like a technical topic of the day that I'd like to talk about.

If there are no other questions would be using open policy agent with Terraform and then I'm also curious about firsthand accounts using Jenkins X. So with that said, let's take this off.

Any questions are you sharing the right screen.

Oh my god.

I do not share my screen.

All right.

Let's do that alone.

Let's see here.

Sure I am not showing the right screen.

You see my notes.

All right.

The magic is gone.

All right share this window.

Hey you.

All right.

Thanks Ed.

There we go.

All right.

Well, if no other questions I want to share a status update on behalf of Brian Tye use the guy over at audit board or Sox up that had the challenge with those random Kubernetes notes getting rebooted.

Well, you finally got to the bottom of it and everything else he did to try and treat it.

Although it wasn't actually related to the problem.

So they rely on shoot not Datadog.

What's that monitoring tool.

Start off as an open source project tracking all the traffic in your cluster.

Andrew help me out here.

Sorry what is this doing.

Yeah I think cystic is what they're using.

So assisting for monitoring.

So they have cystic installed on all their nodes in the cluster and was finally able to find the stack trace that was getting thrown by going to the AWS web console.

And looking at the terminal output of one of the nodes that had rebooted.

We thought we'd actually done this got to actually do it.

So this is where the actual exception was final cut.

The problem was a null pointer exception insisted running on their nodes.

That was causing the nodes to crash.

So it was actually their monitoring platform itself.

Assad's platform albeit nonetheless, that was causing it to crash.

So they so they got that fixed and now they're nodes they're not crashing anymore.

They have a cluster that some people spun up period that they just kind of gave everyone access to it like room.

You know this is just kind of a playground whatever, again.

And which in theory sounds nice, but what's happening is.

People are not being good stewards of using the cluster properly and setting resource limits and requests and everything.

And so it's causing like starvation on the nodes and it's causing not just the one team to have issues, but like anyone using the cluster to have issues and everyone's going what's going on with the cluster.

My stuff's not I don't think my stuff is hurting the cluster.

And it's everyone pointing fingers at each other.

That's all it's turning into.

And it's just a giant.

So So what are your thoughts on how to resolve that.

Don't do that.

Well, yeah, but I mean, I agree that it should everyone should play fairly with each other.

But maybe there are some pretty simple thrones games like don't give anyone don't give anyone cluster admin.

And if a team wants to use the cluster as a sandbox create of create a namespace you create a service account for that namespace that has whatever access namespace they want.

Create a limit range on that namespace so that if they don't set a request or a limit that it sets default then they don't have access to the entire cluster they just have access to the namespace itself.

And we get people bitching about.

Well, I need to set you know AI need to set a cluster role or whatever.

It's like, well, if you're on cluster.

Yeah OK.

I agree with that.

That's what was basically going to be my recommendation exactly.

You said they're the only other augmentation to that.

But I think it's going to take more effort would also be to deploy something like one of these open policy agent based agent solutions that require limits to be set that require all of these.

Now for the roles that can kind of be sold perhaps if you had get UPS related to this cluster, you can add the roles but that has to go through this different process.

Right anything that is global or the cluster or the next cluster is going to be.

No one has access to it.

And if they want something in the cluster, it goes through CSG like Jenkins or whatever.

OK gotcha.

You know that.

So they register you know so they say I have this app you know I'm going to push up.

I'm going to push up new images to this Container Registry you know and then set up harness or flux or Argo or whatever on the other end, you know and it requires kind of an ops team, which is not super DevOps.

But at the same time, it's like if I give access to the cluster to everyone.

It's just going to cause tons of problems.

So like, yes, I have an ops team.

And you are the dev team.

But like I want you to come work with me to set up your Argo stuff.

And then we'll be in communication on Slack and we can join each other standups or whatever and do that for you know DevOps collaboration versus here's access to the cluster do whatever you want is a free for all.

But are the consequences.

So one of things you said just jog a memory I met with Marcin.

He's a founder of space.

Lift shared it in this tweet UPS earlier this week.

And one of that was pretty neat with what he's doing is two things that come from experience doing get UPS.

One is addressing a big problem with Terraform cloud and Terraform cloud.

The problem is down to a good way for you to bring your own tools.

You have to bake those into your git repo and if you have like 200 Git repos that all depend on a provider, you have to like bake in 200 binaries and on repos and yet no fun.

Or if you depend on other tools and use like local exact probationers and stuff like that.

A fun way to do that.

Terrible so in space left.

He took a different approach.

You can bring your own Docker container.

Kind of like what you have Andrew and what we have geodesic with your tool chamber.

But the other thing that he does.

And this is what jogged my memory with what you're saying is sometimes you need escape hatches.

Sometimes you need a way to run ad hoc commands.

But we also want those to be auditable.

An example is you mess up the Terraform states and you need to unlock like the Terraform state is locked somehow.

You need to run the force unlocked.

How do you do that.

In order for the other example is you're running an actual real live environment and you need to refactor where your projects are.

So you need to move resources between Terraform state.

So you need to use the Terraform import command to do that.

How do you do that in an auditable way.

So what he's done is he.

I forget what he calls it here.

I think he calls it tasks but he provides a way for you to run one off tasks as part of your CI/CD city.

The other thing he supports is a Sas product right now is you delegate roles to the Sas and then those roles allow this to run without hardcoded credentials which is the big problem right with Terraform cloud as you've got a hard put your credentials.

And tear from the open policy agency.

I posted a bunch of links this week in the Terraform channel related to the open policy agent.

I confess it was it's been brought up several times before blaze I know has been doing some audio seized on this one of our community members, but it was my scenes showing me his demo how he's integrated into space.

Lift using.

OK that got you really excited.

So I started doing some research on that.

And then I saw it's actually pretty straightforward.

How you could integrate this into your CI/CD pipelines.

So if you look at these two links here.

I'll post them as well to the office hours.

Channel Nazi era.

I know I forgot to also announce that office hours have started here in Libya peeing everyone that office hours started.

Yeah, right.

So office hours.

So there's the link to the open policy agent.

So in here they provide an example of basically how this works.

So here's an example of a typical resource that you define and Terraform code and then what you do is you basically generate the plan as Jason using this argument.

Now you have this Jason plan and then that makes it very easy to operate on the output of this in the open policy agent.

So they have some examples down here of where you can now describe what is described the policy for this.

So here here.

It has an example of looking over scanning over the plan.

Let's see what this is.

So number of deletions and resources of a given type.

So here.

He's counting the resources you could have, for example, a policy, which is that, hey, if there are no deletions that reduce maybe the number of approvals that you need or something to get this PR through.

So then someone else in the community posted a link to conf test, which seems even more rad because it adds support for lots of different languages or types.

So it supports out of the box.

Jason Hammel but also they have support for ACF too experimental.

So now you can actually do policy enforcement on your Terraform using totally open source tools.

So that should play well with like your Jenkins pipelines code fresh pipelines sufferable Andrea of you are giving this a shot at all.

I am not able to do much experimentation and learning lately and trust me.

So no less that Carlos.

This sounds like something might be up your alley given the level of sensitivity of some of the things you do operating at a hedge fund.

Carlos idea with you.

And I was muted.

Yeah Yes.

Sounds very nursing.

I haven't seen it until now.

Cool Yeah.

You are you see.

Let me know whether I re posted those links into my companies slack in my Terraform channel.

And somebody said open source told everyone that they heard of.

Oh, yeah.

Yeah because sentinel is Ohashi corpse enterprise offering.

So yeah, there's an open source sentinel was saying that what I was linking was open source sentinel.

Yeah, I know the equivalent of basically open source sentinel.

Gotcha exactly.

I hadn't heard of central I didn't.

Yeah So sentinel is an offering of terror from enterprise on prem.

I don't believe you can have it on the cloud version that host an version.

I believe like the way that opa has kind of created this standardized format for creating these policies using this language.

It's called Rigo right.

Oh is that the name of this.

We have opa opa language is called Rigo.

And the fact that opa has created this standard.

It's going to it's going to be it's going to have to be something super awesome nowadays for me to go with something that doesn't support opa.

Yeah like, you know, I'm looking at said and it's going to be like this you know proprietary thing that works.

You know, for this one aspect.

And that's kind of it versus what's the reason I went to what to Terraform in the beginning.

Well, it's because it was standardized like you know if I go AWS, Azure GCP whatever you know it's all kind of I can use Terraform.

I don't have to learn some new tool.

But I'm feeling is that same kind of mentality.

Now Yeah.

Yeah, I think so.

If for kind of testing or validation of all these formats and testing for us because certainly I mean, let's face it, the tools that we use move bleeding we fast and many of them lack sufficient validation on what they have or features.

But now, like let's say say helm file, for example, using this would be very easy to express.

Now some policies around helm files and some of your best is basically codifying your best practices.

Now are unhelpful in doing the same for Docker file et cetera.

So Yeah like what.

So for helm file what would be a good best practice that everything has been solvable flag that should be enforced in all of your home files.

What else would be a good one that you're pinning image tags or something.

I'm struggling right now on my head.

I was prepared for it.

But yeah, I think we could find some things to validate and help us install a little flag.

Oh so how file has moved in the direction of mirroring the interface of Terraform so like Terraform has a plan phase and an apply is now in Terraform the plan phase actually can generate a plan and then you can execute that plan in hell and file.

It has it something analogous to the plan.

But not an artifact like a plan file.

So what I'm referring to is Helm.

That's right.

So helm Def shows you that the potential changes without making those happen.

Well Ben helm file has two other commands.

One is apply and the other is destroyed in those mirror Terraform.

So the apply flag honors the installed flag versus sync.

I don't believe sync.

So when you're in sync I don't think it'll uninstall things.

But apply will uninstall things when you run home file.

Interesting that's not what the documentation is saying.

But maybe it more.

OK I let's get that right.

I might be wrong.

There is a nuance like that.

Let's see what the documentation is saying that the difference between apply and sync is that apply runs Def and if Def shows that something is different than it runs sync.

Am I getting that right.

So you applied the helm file apply sub command begins by executing.

If that defines that there are changes sync is executed adding the interactive flag instructs helm file to get your confirmation vs. sync sync sink says sink all resources from state file.

Yeah So I don't think it mentions it.

And it doesn't work like that.

It doesn't mention anything about that installed flag.

I suspect before consent either I could be wrong.

Maybe maybe it doesn't work that way.

So you know what I said I made.

Maybe I am misled or maybe the functionality has changed since we started using it.

But the idea that apply is intended to be used as part of your c.I. workflow.

We're getting in town file.

And I'm starting to try to come up with some best practices because know some of us.

And now have more experience with home file than others.

And so we're looking at things like, is it a fight to clean up on fail flag for example.

Is it a best practice to have that on or just leave it as a default. False like.

Yeah Well, so you're guy I would be careful where you about that in production and I'd be I'd be recommending and perhaps and staging.

But it depends.

Maybe not at all staging environments maybe just on your like demo or preview type environments.

This just came up today.

So let me explain a little bit more.

So as we know with home if you do an upgrade it will bail on you if the previous releases failed.

So then you can't move forward.

And if you add the force flag for example, then it will uninstall that failed release and then reinstall.

But that might be pretty disruptive if you're running in a production environment, especially since a failed upgrade or a fail deployment doesn't mean a dysfunctional service.

It just means the surface is in an unknown state.

So this is why you might not want to clean up resources if it'll help you debug what went wrong.

However, in a staging environment or like preview environments where you're deploying frequently and you don't want your builds to constantly fail, especially when you're developing, especially when you know things might be breaking and unstable, then I like the force flag would and possibly the cleanup flag, then to just keep things moving humming along.

Even in staging, I would like I would totally agree with that for dev but in staging.

Wouldn't you want to have like.

My thought process for staging is like, all right, I want to have whatever production has now.

And then run the exact same command that you're going to run in production and see what happens.

Yeah, I'm so sorry we're just overloading the term staging like every company does.

Staging in your context is like a production.

Yeah, we have something that we saw staging to us is an account stage where you will have environments with different designations so one designation would be production and production should be almost identical to production.

But then you have unlimited staging or preview environments.

Those are also running in staging or staging accounts.

But they have a different designation.

So it's just that different companies use these terms differently.

And that.

So we're saying the same thing.

We're just using different terms Eric can correct me if I'm wrong, but it almost still deployed to a field.

The play if there were previous deployment deployments that ass right.

My understanding if the most recent deploy failed the upgrade will fail unless you have the forced flag.

But I don't want to claim expertise on that.

I've been using helm files.

So long.

So help file kind of redefined some of those behaviors and use the force flag on helmet file.

The first thing it does is an uninstall of the release if it has failed to ensure that this seat successfully sells.

But the raw home functionality.

Maybe if somebody knows for sure, please check in I'm almost positive.

That's the case.

Just because I have instances where the initial deploy will fail and then and then I won't be able to deploy to that again until I do like a delete purge.

Right I will I will have long running deployments that will fail and then running like another deploy does work.

And I don't have to like delete that.

Delete that.

So you're saying when there are multiple helm releases you're able to do it.

But if there's only one and release.

You can't.

So for a particular release.

If it's been deployed successfully before and the mostly deploying as long as it was successful before.

OK that's worth that that might be the case.

I believe, at least in my case, I haven't seen that.

But definitely if it's the initial deploy of a released and it's never been deployed before and it fails, you cannot redeploy it until you kind of until you purge the that release.

Yeah So maybe what we're seeing to see my screen here.

And I think it's the atomic flag that helps with this magic here.

Last nice.

By the way Brian, I shared just before you joined I think the resolution to your issue with the random node reboots.

Oh, yeah.

Yeah, just for those that ever have to run into this is to check that easy to system logs.

We'll have all of your kernel logs versus checking on the box, or guess ship it yourself by think just if you're on a date.

Is there a ship for you.

She module is.

There's cool any other questions or insights here.

I need to come up with a demo of infrastructure as code using just my local computer.

So self-contained like a demo.

Yeah Any ideas.

There's like I could do Vagrant with you know spinning up a couple of virtual machines.

I love to use Terraform because that's what I use for.

For real infrastructure as code.

And so if I'm just showing code.

I'm going to show Terraform code, but I need to be able to do a demo of like, hey, look at me spinning up actual infrastructure.

But I won't have access to a native US account or anything because multiple people are going to do this.

I assume you're saying, what does it specifically need.

Siri woke up here.

Does it specifically need to demo kw s functionality or demo.

I see any kind of I see wonderful food for thought.

If it jogs your memory because I know you're working on dad's garage you're equivalent of dude s sick one of the truck true prototypes.

They did here.

It was just AI think it was a Yeah.

So I did a prototype of working with Minikube but then it also works with Docker for Mac and I assume most of your company would be had we'd have Docker for Mac installed right now.

He would think, OK, maybe then the value of what I'm saying is moot.

But what I was going to say is if you're running dog for Mac or probably dogs for Windows 2 you can just check off this box enabled Kubernetes.

That's not a bad idea though.

And that you do to play circuit you could do Terraform code that just spins up Docker containers.

Yeah Yeah.

Because all I really care about is showing like you know look here is code that spun up you know it spun up a bunch too with this exact diversion put up with this exact amount of memory and CPR you power that I told it to set up you know and it spun up three of them because I told it to spin up three of them you know.

And now I'm going to kill them all with one command.

Yeah, that is in that's infrastructure as code right there doctor.

Instead of virtual machines.

But the concept is the same.

Faster demo is lighter weight.

And as you know, there's a doctor providers you know you can also skip what I said about cougar genetics and you could just do plain vanilla doctor as well.

You provide the context of this demo is it like a lecture learn for your co-workers it's for an intro to deficit gops class.

So we've got a section on infrastructure as code and the, the customer wants a live demo of the infrastructure as code.

I did a.

So this demo does include AWS.

But I did talk on using Terraform to deploy and apply a simple web API on this young group in the US and then do a looping deployment of that.

So that's kind of fun.

I hope it's open source I can send it to you.

But it does require a database account.

But I actually was going to do this for my co-workers and provide and we'll just create a dummy database account provide them the credentials and then just ask them to do the Terraform from destroy after.

So you could technically do that right.

I guess it depends on the size of your class.

Yeah, that gives me some ideas.

Like, I could do something like I could do something like you know given the way that you guys currently handle your infrastructure.

How would you handle doing OS version upgrade.

Oh, well, you know we'd have to SSA into each and every one of them, then know apply all the commands and shepherd them through the whole pets versus cattle thing versus I could be like, I'm going to change this variable and then redeploy.

That's a good idea.

I like that.

Well, I did also which was really cool too when I did our people that was really cool when I did my demo was I deployed it to two different regions and two different environments all with like the same Terraform and yet.

It was just a search of a variable and when I did that demo everybody appreciated that.

Yeah, that's going to do well, you have identical environments.

But one is staging and one is of the product one might have t to X large is the staging one might have t to my.

Other than that Alice is wearing the same.

Yeah let's call that reminds me of one of the posts at one of our customers shared with us only because it was a nice summary of most many people's experience with Terraform like when you start to where you are today like when you start you start with small project.

And then you realize you need a staging environment.

So you clone that code in the same project.

And now you have two environments.

But any change is in the blast radius of all of those.

So then you move to move having separate projects and all of that stuff, you find that linking to Terraform channel what was the first time I heard of the term a terror list for Terraform monolith material.

I love it.

Yeah, it's not a perfect description of it.

Everyone knows what you mean, if they've ever done it.

Yes speaking of terror let's.

I was thinking about the wave Yeah.

You had showed.

The way you do the terror form infrastructure for your clients with like each client gets a GitHub organization.

And then there is what's under the organ.

We're going to like do you have is it just one big Terraform Apply or whatever for like the whole thing or is it no go out.

Yeah, it's all decomposed.

So basically, our thing is that we've been using it.

So here's like an organization.

And then each e.w. us account represents basically dedicated application and then so therefore, each one of those has a repository but then you go into each one of these repositories.

And then they have software just like your applications have software.

But your applications.

What do they do.

They pin at library versions.

So that's what we do here.

So you want to run the write the same versions of software in all these environments all these office accounts and we do that just by pinning like to a remote module.

So here we can throw a remote module.

So when we made changes to this directory that triggers a plan and apply workflow with Atlantis there's a plan applied just for the cloud trail directory or whatever except for the whole staging got Catholics did.

Exactly So basically, you open pour requests that modify any one of these projects.

And that will trigger, you can modify multiple projects at the same time.

There's nothing no multiple applies to do that they exist.

Well known in multiple plans multiple plans multiple tries to do that.

But all in the same pull requests and all automatic.

Right OK.

What does that look like.

Does it Lantz ask when you submit a pull request does Atlantis say you know this affected 3 Terraform projects.

And here's the plans for all three of them is that what it does.

Yeah So it would.

So you know the pull request was opened automatically or as the site was open manually by a human a developer.

Then the plan was it kicked off automatically.

And then we see this out here.

This happens to be output from helm file.

Not Terraform but we because we use Atlantis for both my file.

And terrible when we see it or what's going to change.

And then somebody approved it.

And then we can surgically target what part of that we want to apply for.

We're going to say Atlantis apply and it'll apply all genes catch a so one thing to note about Atlantis, which is really nice is that it locks that project that you're in.

So to developers can't be trying to modify it in this case cube system at the same time.

Because otherwise, those planning to apply those applies will be under each other's changes and they'll be at war with each other.

So Brian any interesting projects you're working on.

I am starting my Prometheus initiative.

Oh, nice.

Very cool.

Well, are you going to take a Federated approach or as in like a remote or remote storage.

No So basically, you have a centralized Prometheus but then each cluster runs its own Prometheus and the centralized Prometheus grapes.

The other ones.

That possible.

I'm not sure yet.

I actually just started yesterday looking at the helm file for.

But yeah I'll be honest, I'm not stupid familiar with Prometheus.

It's just I know that it's a lot more lightweight than what we're currently running, which is the Stig is third party.

But takes up takes up 30% of our CPU on our 4x 2x large is on a clause of the military boot and it's just for a mining tool is just asking for too much also is memory intensive as well.

So Prometheus obviously is on the other side of that spectrum where it is not a resource hog which is why I mean, yes.

I mean, it takes a brute force method to monitoring right.

It looks at like a packet inspection and everything happening on that box.

It's but I guess it's a testament to how fast sea views have come and how cheap memory is that they're able to get away with doing that.

What does.

But it's still a problem when you're doing things like you are saying, yeah.

Yeah, I have you guys gone the Federated Federated approach.

What would you say about that.

So we are we're starting a project in the next month or so.

And with a new customer.

And that's going to be doing Federated Prometheus.

The reason for that that cases, they're going to be running like dozens or more single tenant dedicated environments of their stack for customers is basically an enterprise that's so having dozens and dozens of Prometheus is in for fun as would just be too much noise and too hard to centrally manage.

Also scaling that architecture.

If you get if you only had central Prometheus would be dangerous because at some point, you're going to hit a tipping point.

And it's just not going to handle.

So with a Federated approach basically, you can run Prometheus because Prometheus is basically time series database.

In addition to its ability to harvest these metrics.

Now it can use like a real time series database like influx TV or post scripts with time series TV or whatever.

And so forth and a bunch of others.

But the simplest way, I think is just to think about Prometheus basically, it's a database, you know we can offer.

So then what you can do is you can run smaller instances of Prometheus on each cluster with a shorter retention period.

So you can still get real time data in, but you don't need to run massive deployments of Prometheus, which can get expensive because I'm running for methe s with less than 12 15 gigs allocated to the main Prometheus operator is required before for any sort of retention more than like a week or two.

So So in the Federated model basically set up like on a shared services cluster.

What we call core one for me at this instance that then scrapes the other ones.

And it can do that at its own pace and you can also downscale the precision in the process of doing that.

If you need to.

Plus if you do need absolute real time, you could still have Grafana running on all the clusters.

And you can have real time updates on those environments, different ways of looking at you guys use the remote data remote storage.

We don't.

So it's kind of it's been in our backlog of things to solve.

And there's been there's a few options of it.

The one that keeps coming up is Thanos where we look at the architecture.

I mean, it suddenly it increases the scope of which you've got to manage right.

And when is your monitoring system.

It's really important that it's up.

So I understand and appreciate the need to have a reliable back, but also the more that you the more moving pieces the bigger the problem is if something goes down.

And then what monitors the monitoring system and all these things.

So we've gotten away with using EFS and as scary as that sounds is actually less scary these days because DFS has basically x compatibility and you can schedule.

You can reserve.

IOPS on DFS.

So the problems that we've had with Prometheus have been related to AI ops and then we just bumped up you pay to play.

And it's not that much more to pay to play compared to engineering side.

So we just bumped up the AI ops and all our problems.

One way.

The other option.

I think Andrew's brought up before is that you can also just allocate more storage and then you get more high ops, which is great because it leaves it.

First of all, it just gives you more credits.

So if you still have first of all, IOPS don't need that higher baseline on of credits you're getting is not high enough given the amount of data you have just store a bunch of random data and the amount of money you a knowing that I got your signals pretty bad Andrew at least for me, is anyone else hearing the feedback a little bit.

Yeah So would I be able to use.

So you know I have a ephemeral cluster type situation would I be able to use ACFS with an outcast cluster.

Oh, yeah Yeah Yeah.

Yes cluster that ties back into the same ACFS.

Yeah Yeah Yeah, you could have a static Yeah.

Yeah, this file system.

For example, and just keep reusing that and unique.

Yes, your family.

Yes clusters.

Awesome Yeah.

That's really nice about CSS is it does not suffer from the same limitations of being it EFS is cross availability zone.

Yes versus IBS, which is not one.

So the issue.

People run into frequently when doing persistent volume storage in Kubernetes is let's say you spin up Jenkins and you have a three node cluster across three availability zones because you like high availability will who Jenkins spins up in USD one and then for some reason Jenkins restarts.

And next time Jenkins spins up in USD 1B.

But that EBS volume is in one a still Jenkins can't spin up because it can't connect to the b.s. volume.

Yeah So your host DFS does not have that problem.

Yeah In fact, we've changed the default.

I forget what the term is this default storyboard class.

The storage class for cabinet is to be cast and make it explicit.

If you want PBS like before certain situations.

But the that's worked out really well for us.

Plus it simplifies backups too, because you can just set up AWS backup on your DFS file system.

You don't have to worry about backups of every single volume provisioned by Kubernetes.

So it's a lot easier just to back up the whole file system.

We don't do that because it would be backing up random crap that we don't need.

But you can definitely do that.

Yeah, and if she's a shameless plug.

If use the cloud posse in DFS module.

It supports database backups.

Well, that's good to know.

Thank you.

I probably wouldn't have looked the VFX.

It sounds like the right thing to do.

But we were running it right now with so far we have maybe three months of retention in it.

And it's fine.

Obviously, we have to wait and Seattle goes up to six months.

But I don't see any problems right now.

Once we address the memory issues the meat is we've been running for way over six months.

And we're fine with it.

It's about.

I hope it's about ops right.

As long as your AI ops handle you know, if you are in production and your Amazon, of course, you shouldn't use EFS or your Google or whoever.

That's dumb, but if you're a small you know if your traffic is small go for it.

And don't listen to all the people saying, oh, you should never run Postgres on EFS because it's the NFL.

It it's fine.

Don't worry about it.

Try it.

And if it causes issues, then figure out something some other something else don't add Nazis don't add complexity.

Right off the bat until you have tried the less complex option and determined that it's not a good option.

Yeah, I think that's a good way of putting it.

Are you guys using the CSI driver for EFS.

I'm using the original or whatever that's you that.

I don't know.

There's a tool called the effect provision here that you deployed your QB native cluster that just provisions perceptive volumes for you given consistent volume claims and it works great.

I just posted what I was looking at in the office hours channel.

OK So you guys are using the offensive provision here a fester Virginia.

I think we have a health file for it too.

Thank you.

Thank you.

Good to know.

Yeah if this provision and here's the file that we used and work straight.

We've been on it over a year.

No problems.

And so you guys are testing out coubertin entities federation.

No, we have not on the federation root for Kubernetes just for clarification.

When I was mentioned federation that was for Prometheus, which is unrelated.

I often look into that or Prometheus sandwich.

Protip with the effects if you're going to go into Governor cloud for any reason, use Governor cloud WF cloud E does not have CSS.

We made that mistake and it has cost us.

Well O'Brien by the way, where I pretty much learned about what this looks like for the Federated Prometheus is from your pal Corey Gail over at gundam so they can definitely shed some light.

They're not doing it under Kubernetes.

They're doing it like on bare metal.

But they can tell you about why they did it.

Yeah And I know that they just started that Prometheus initiative like maybe half a year ago.

Yeah, even that period, which is really cool.

Yeah Cool.

So as any other parting thoughts here before we start wrapping things up for today.

Any other interesting news announcements you guys have seen.

On Hacker News don't you.

Apparently friggin what's it called Whatsapp.

Check if you're Jeff Bezos don't use Whatsapp.

Yeah, that more then that sounds too good.

Also I thought apple backups were encrypted.

But now they're saying they're not encrypted.

I have started using Keybase heavily and I'm in absolutely in love.

Awesome but keep AI mean, that's great for chat and validation or whatever.

And maybe with AWS can creations and whatnot.

But it's not going to help you back up your iPhone right.

I mean, I don't have an iPhone.

I'm using it.

Give me some Keybase has file storage.

And it even has encrypted get repositories.

I saw that.

Yeah What's interesting and underrated are they're exploding messengers.

If you're sending secrets once you get them to exploding messages like like Mission Impossible.

They literally explode in the UI.

I haven't seen that.

How does that work.

I mean that on your message.

Oh, yeah.

Sure enough.

Let's go.

Let's go.

I never use that.

I just don't really use it for the check because none of my friends are on it for me.

It's what we use in-house to send like some of our secrets.

Yeah It's one of our best practices.

Well, we start with a new customer.

I always recommend that they set up a team under key dates were for those situations where you need to share its secrets that don't fit into like the best practices of using a password manager or the best practices of using like how she caught ball.

Just kind of a one off.

Yeah, for the one off things right.

Because a lot of these other tools don't support direct sharing of secrets.

To individuals in an encrypted fashion just put your passwords in Slack.

What could go wrong.

Seriously they cannot get access.

I cannot get this stupid doctor provider to work or forget.

All right, everyone.

Well, on that note, I think we'll leave it.

Let it be.

We'll we're going to wrap up office hours here.

And thanks again for sharing everything.

I always learn so much from these calls reporting on this call is going to be posted in the office hours channel.

Do you guys next week same time, same place Thanks, guys.

Derek

Public “Office Hours” (2020-01-15)

Erik OstermanOffice Hours

Here's the recording from our DevOps “Office Hours” session on 2020-01-15.

We hold public “Office Hours” every Wednesday at 11:30am PST to answer questions on all things DevOps/Terraform/Kubernetes/CICD related.

These “lunch & learn” style sessions are totally free and really just an opportunity to talk shop, ask questions and get answers.

Register here: cloudposse.com/office-hours

Basically, these sessions are an opportunity to get a free weekly consultation with Cloud Posse where you can literally “ask me anything” (AMA). Since we're all engineers, this also helps us better understand the challenges our users have so we can better focus on solving the real problems you have and address the problems/gaps in our tools.

Machine Generated Transcript

There we go.

All right, everyone.

Let's get the show started.

Welcome to Office hours.

It's January 15th.

My name is Eric Osterman and I'll be leading the conversation.

I'm the CEO and founder of cloud posse.

We are a DevOps accelerator.

We help startups own their infrastructure in record time.

By building it for you and then showing you the ropes.

For those of you new to the call the format of this call is very informal.

My goal is to get your questions answered.

Feel free to unleash yourself at any time if you want to jump in and participate.

We host these calls every week will automatically post a video recording of this session to the office hours channel as well as follow up with an email.

So you can share it with your team.

If you want to share something in private.

Just ask.

And we can temporarily suspend the recording.

With that said, let's kick it off this Office Hours.

So here's some talking points.

I put together one announcement I'd like to make.

Just today.

I think hashi corp. announced that the Terraform providers are going to be distributed as part of the Hasse corp registry.

So this is like what is available today with Terraform models.

The caveat here is right now, it's only official providers they are working to open it up to third parties and other providers to publish their providers.

I guess for the security implications of this.

They're just being very careful about how they go about that.

Anyways, this exciting news because anyone who's tried to use third party providers.

It's been a real pain.

So if in the near future, they fixed that I'll be very happy.

Another public service announcement.

We said this last week.

Just want to share it again, because I'm pretty sure it's a bunch of companies are going to get bit by it.

The root certificates that effect RDS are raw and documents are expiring in March.

If you're using certificate validation with your instances that's going to break unless you improve upgrade your certificate bundle for those instances or fruit for the nodes that your clients run on those certificate bundles need to be updated.

All right.

So the next thing we have a question from here in our community posted and office hours.

He has also some feedback because he asks in Cuba and Eddie's office hours and got some answers there too.

So we'll see how that reconciles with what we recommend.

And then a demo.

If time permitting, we wanted to do this last week we ran out of time.

I just want to show you how we've been able to use CloudFlare as kind of like this Swiss army knife reverse proxy port injecting H2 MLC ss into sites that we don't necessarily control.

All right, let's start with you here.

So the question you'd asked was how to provide a module or a tool to our application developers to provision the databases for them, including the deployment of the secrets inside of communities and in office hours hours right now in that channel.

If you check.

There's a link there to a Terraform repository where it looks like you started kind of like EPBC with putting something together.

Well, it's basically for me, it's in our infrastructure as code repositories right now.

And we want it to or would like to move SAP basically to the application developers.

This is a private project of my the similar is pretty similar to the infrastructure that we use at work.

So you would go to the committee you just have modifier telephone from file.

Yeah, this is Bono.

All right.

That's another one for post.

I think Costco's Hamilton fire down on the bottom.

So easily these two would be one file and we could give this to our application developers.

And I can just maintain those and currently we get roughly around 2 to 10 requests for requests on our infrastructure scope repository and make some people.

And we still have our own people.

So currently kind of gets to the point where it's cumbersome to maintain all these requests and we were thinking about just like putting it out.

So I asked the same question in the communities office hours where a lot of knowledgeable people on.

They basically said, well versed in their developing experience because the environment is either.

So I don't know much about op's application therapy in general.

And yeah it just reduce their productivity from the application developers, because I didn't know the centerfold.

That's the feedback that I got from them.

But I'm happy to know what if you are doing something somebody.

So there's two things at stake here.

There's one is kind of like your stated approach or objective that you're trying to do right now.

And this is one way to do it.

What you describe here is kind of like how I would automate the production of perhaps your production grade backing services for how low to fire for those of you who don't know how notify or is a project that year working on.

It's a helm.

It's a tool to visualize differences between helm releases or helm charts.

And here is kind of like some of the infrastructure code to deploy that.

So what we have here is some Terraform code that provision that generates some random secrets that provisions a post gross role in the database, and then writes those secrets that were generated into Kubernetes into a specific namespace with those values, including the database, you or I in this case is what he's writing there.

So hold that thought doing this and Terraform is great.

And at some point, you're probably going to want to do that like for your production grade services.

This is what you're going to do.

Well, we don't recommend though, is for preview environments or pull requests environments or what we sometimes call unlimited staging environments.

We want to use Terraform in this case in this case.

Well, we would be using is as part of your helm chart you would have the ability to enable backing services that that chart depends on.

And then that makes it very easy to spin up a fully self-contained environment within Kubernetes just by deploying that chart into a release that.

Then the chart itself can generate those random passwords using the Rand function, whatever it is in home templates and it's easy.

OK So that solves kind of like I think the problem actually that you're trying to address right now.

But it doesn't solve kind of your initial requests like, how is this the right approach that we should take for safe production environments.

If I reframe your question.

And so that was the way I first, they actually interpreted it when I was senior question and it's something that we've been thinking about to a cloud passes like, how can we move the dependencies for backing services of your applications that we need to provision with Terraform closer to the applications themselves and not in like some centralized infrastructure repository.

Because you don't want to have to open one PR in the infrastructure repository like most companies have.

And then another PR free application coordinate that whole process.

Right the main problem because we only on regulatory push our infrastructure is cut to production or even to staging and application developers might be faster than us and want to stay up as a service to staging already.

And they have to wait for SRV to dodge a gift with a message.

That's a good point.

One that I didn't bring up.

So you're also talking about not existing cert not just existing services, but also net new services that are deployed.

What's that process.

Yeah OK.

So be the approach that we are starting to lean towards.

But I can't say that we have fully achieved it.

I think there's some others here on the call that have maybe Dale has done it or Andrew Roth maybe if he's on.

But would be to have the Terraform in each one of your microservices.

You have a Terraform application folder in there and that would have the deep.

Backing services that it needs.

Now what you describe here.

Yes, this is a perfect application of what you just what you have in here.

This should be captured inside of a Terraform module.

And then that module will be version separately and of itself.

And then you might want to have some automated tests related to that separately.

But then when you want to instantiate that.

How do you do that.

Well, that's where you're now in your application full.

Now in your applications repository let's say how modifiers your application repository there'll be a sub folder in there called Terraform and that's where you're going to invoke that module as part of your ICD pipeline.

The question is then what do we do about the integration secrets.

How do we get those there.

So that's going to work well in this case is that depending on what you're using for your CI/CD I'm going to say, let's say you're using Jenkins if you're using Jenkins it's going to use the standard credential store and then your pipeline will indicate that that's the credential store, it's going to use.

And then it can just go ahead and position this stuff from scratch.

But the challenge is if you have some credentials that can be programmatically generated and some that needs to be user generated.

That's the challenge.

And that's very often the case like you're integrating with third party guys and you developing a new service.

How do you get those secrets in there for the first time.

I don't have a brilliant answer for that.

We usually just see people kiddos in as part of that cold start.

Well, this kind of stuff your itself with time from very able.

We just put them in well.

And just put that very terrible variable in top secret configuration.

So that's how we do this kind of stuff that's almost an.

And then that also empowers the developer to set that environment variable in your CI/CD.

OK Yeah.

So this is how we do.

We are doing that or they are using basically the services and just to jq and try to get it somewhere else.

For example, Google Maps.

Someone wrote a best script to generate Google Maps secrets.

OK I don't have Terraform support.

But yeah if someone has ideas about us.

I would be super happy to know how you are doing this kind of stuff if you're doing it.

Mm-hmm It's more like exploratory right now.

So also I'm curious.

I mean, anybody interject in the answer I gave is was there some disconnect I missed there, which is the part that you're struggling with.

It's more like I wanted to know if people are already doing it and what our experience was with it because, well, that's the feedback that I got from the committee's office hours.

Yeah, I tried it.

And developers didn't understand why I'd now have to approve infrastructure as code as part of their deployment process.

So I had to prove they did it this way.

They approved basically the terrorism plan and the terror from Kenya to execute it.

And then the application promotion got executed.

And larger.

So you doubled your approval go gates because they moved to infrastructures called into I think it's going to largely that the organization culturally how they approach it.

I kind of react to that.

I this is this kind of like, oh keep other people's problems as it relates to code reviews.

I kind of reflect that.

Don't you want to grow you know like career wise and learn these things.

And I think requiring code owners in your repositories so that certain when certain files are changed that an approval comes from one of those people.

I think that's a way to mitigate some of that.

But that can be all bundled into the same code review process.

Perhaps it slows it down a little bit initially.

One other thing I wanted to say is.

So the challenge here is that we're spanning tool sets.

Right like if all of this works just through coo cuddle.

So to say or like the Kubernetes API it be a little bit easier versus here where I'm sorry.

Go ahead.

I'm saying long held versus here what we're doing is we're straddling tool sets we're straddling a Kubernetes tool set and Terraform toolset for deployment.

And this is where I think that now.

Now, if we look in this repository you're technically using Digital Ocean so it's a little different here.

Yeah And some of our answers would be probably more obvious focused.

But this is where the e.w. s service operator is also, I think, really exciting is the ability not necessarily this project.

But this technique.

This badges.

So you have a Cuban ID.

Oh, interesting.

It's been archived.

But basically, you have a CRT that you deploy that defines that part of the infrastructure.

So that you can stick within the same toolset to deploy.

That's cool.

I actually today found cube farm, which basically gives Cuba need to see ID for telephone providers.

I posted in the office also.

Oh, I will not exploit get net but I think it could also off my issue where I said, it's a second link not the first one where people now see I want to switch, or maybe it's the same coupon coupon looks like this is they've gotten a lot more in.

Yeah, I guess I have not seen this one because ranchers started with one as well.

You know.

Yeah Yeah.

I have not tried it out yet, but it seems like you get a C see d for your telephone providers and can then do the same stuff.

Mm-hmm Yeah Q4, they also do this.

Let me double check the OK.

So it is just I thought Jim form did more than Terraform but OK.

Never tried it yet.

So I. Just want it today like a couple of hours ago.

But I will take a look into it as anybody kick the tires and in Q4.

Yeah nice.

That is the first one she heard about it.

Yeah or it was this based from the one from Capgemini or is that a for I'm not sure.

I did not investigate too much, to be honest.

Other products as well.

Cute folks keep farmers from like app code.

Yeah Yeah.

That's what it was.

It only went to the website saying absolute they do a handful of things I haven't looked them too deeply.

Interesting looking products.

Yeah, thanks for sharing.

Yeah you nailed it.

That's kind of like what I was thinking is.

I think that's where things are going to get easier and easier.

There's more of this is that we can use communities as the scheduler for everything to do.

Cool well I didn't want to embarrass Dale here a little bit.

Dale just got his certified Cuban his administrator exam K And so pass that.

Congratulations Dale.

Glad to.

Glad you accomplished that.

Great Thanks.

And that means anybody else is serious Cooper related questions, ask Dale any other questions see here got something that chat that someone actually do the Cuban writers application developer certificate.

Oh the.

Now I Katie Yeah.

Is that next on your list.

This one actually came about a little bit.

Maybe I think to go more into the security side of things as well.

Data science.

I'm not sure.

I may be doing them watching.

Yeah by the way Dale posted the resources he used to pass that exam in the humanities channel in suite ops.

Check that out.

Yeah, the first link for Linux Academy really does it good.

Or if you want to brace yourself itself, but the one on you dummy itself actually does this practice test that are pretty spot armed with a set of questions that actually go with it.

So are doing both of them you pretty much cover everything.

I have a question does Michael seem Michael a long time listener.

It looks cold wherever you're at in New York City.

I finally kind of gotten the weeds with this product.

I'm talking about was just started here, which is the rule, you are all service within a code fresh and so you know one of the things I do is I was to create a branch context read your book to create a project.

The project has its share configuration.

So acts like a meadow namespace and then what I'm doing is because we have essentially microservices that need to as an application must not support.

So what I've done is I've used Helen file right to basically install all of the applications as they are from the release branch of the latest basically, from our sort of the next branch of the charts.

Then basically running it again with just a diff image tag.

What have you.

I'm trying to figure out the best way to do that next part do I. I tried to do help while apply with just the you using a selector and with a namespace.

But that would seem to do just it all just ended up being reverting back to all having the same image tag and not like overwriting it with that diff.

I'm not sure how to make it a command line.

I have to write a file.

Unsure about what I needed to do in that circumstance if that makes any sense.

Yeah let me summarize then my own words also add some more context, I think for everyone else.

So see you're using ranchers distribution community rancher.

This concept of projects and you're using code fresh as UCS platform.

You want to create preview environments or like these unlimited staging environments as we call them.

And to do that.

You've defined your entire application architecture inform form as part of multiple helmed charts or releases and all of those charts the releases are in this health file.

And now you're deploying that file as part of your CI/CD pipeline.

And when you build a new image you kind of you want to update that service.

Now, the part or this is the part where I lost you a little bit is the problem like, why don't file apply wouldn't be working for you.

Well, I mean, I think I was running a health file sync at first while I biggest because I guess initially, I'm doing this in the model regrow itself.

Right like so it's not all of the list right now at the moment weren't being published because we still kind of build a pipeline.

So they weren't officially published.

We don't have a version.

I mean, no question when asked the versioning system for this.

I'm trying to figure that out to be able to create canary builds that kind of stuff.

And whatnot with you know I have other tools I use for my libraries that I obviously like NPM libraries that use like auto and semantic versioning, but unsure how to be able to make that work here.

Aside from that what site was a question you asked.

No, I didn't.

So I struggle to see where I didn't understand the problem you're talking about with drifts or patches or things like that.

Were you using tools outside of the home or outside of the home file to do the patching or changing images.

No I mean, you know what I mean.

I just use code fresh to create an image tag.

And then I don't like save that file.

I just kind of pass that as a set you assets as a set to the second.

Like yes I do the home file plan, which applies everything.

And I really do help upgrade with the diff.

You know for the one service that I'm trying to see against the overall and trying to.

I was unsure a what.

Which one to run like Darren helm file sync a second time he'll apply the first time I went.

I'm not entirely clear which I should be running in order to be able to your fly.

It gets to think.

Yeah Yeah.

Yeah, I understand the confusion and so part of the confusion is like hell fire as the tool has been growing really rapidly accepting a lot of cars and things like that.

So sink was the original command that helm file came out with.

But then to kind of more closely aligned with the pattern of Terraform where you have plan and apply and destroy the whole file implement that the same things.

So from what I recall I think helped file apply honours that installed flag inside of your releases so that you can maintain the state of weather services installed or not installed while helm files sink.

I don't believe we'll uninstall services based on that installed flag versus otherwise.

So that's that.

Now what you're doing.

I can't I can't say it's wrong.

Like the two phase apply process.

Well but it's different than what we do.

So what we do is that configuration like 4 4 held file, you would use environments.

For example, in the environments if you like.

You don't want to rewrite that file if you want that dynamic and that can just refer to a go template macro for interpolation function for get in right to get there.

That's exactly what I'm doing actually to be clear.

OK I was doing that itself.

And what I found was just weird stuff like music.

Initially I'd like I have environments like with ephemeral and broad and QA which is where my static aesthetic values live right.

So that's where like you know the main thing is that the image tags is what essentially, I wanted.

All right.

Now, if I tried to float up the end up everything there then everything would have the same very value that I set.

But that just sounds like the schema is wrong if you're held a file.

So like that.

And I'd like, OK.

So not everyone is familiar with how file Alma file is a tool that helps you describe all the helm chart releases that you want to have in your cluster.

And the problem with helm is right that the values file is static.

It doesn't support any templating any interpolation functions.

Anything like that.

So the tool help file came out.

Let you describe now what helm releases you want to have and lets you as a developer kind of define your own schema in the form of what they call environments.

The problem with helm and helm charts is that every Helm chart every developer has their own way of describing how to install this application.

So it looks like a mess when you try and stitch these things together versus the helm file.

Now you can define that schema.

All right.

So in the schema your case, you have somewhere where it says image tag.

But it sounds like you only to find that in one place.

It sounds like when you want to achieve this you want to define the image tag for each service.

In this modern reboot what does difference.

What I'm trying to do for the ephemeral right is not only define I have a default. So falls back to like the next tag.

Yeah which is what we tag everything with.

Once it goes away.

What then I want to override just a single image tag for a single service.

Now what I'm actually doing now because I couldn't figure out how to use it twice.

So I'm just doing the whole file sync on everything.

And then is running the helm itself.

I guess the one released and upgraded that I say.

So Yeah Yeah Yeah like that can work.

Yeah, I think maybe if you're able to perhaps after the call or even during the if we have time.

If you can share some portion of that home file that you have been better at answering.

Also the other day the other concern is that using the next tag while it makes a lot of sense to humans.

It's also really hard to really have assurances of what's running and what we would typically do is we.

So we always tag the images with multiple tags and code fresh makes that very easy.

So Yeah, we may be tagged with next.

But we always tagged with the commit shop and the commit shop.

I'm doing the commission with the code fresh branch normalized thing in front of it.

So yeah.

Which is what my name spaces while I also create a new namespace each thing.

So exact that the image tag is that.

And that's for the federal environment.

And then once it gets promoted to the next environment.

It has that plus the next day.

Yeah, I'm only doing next because it was like, how to like how other than pulling down the latest release from code fresh from the registry.

Once I do that, then it will be that I can use that instead.

So let's see here.

I don't know how I can quickly.

And I go upstairs will pick and go to the computer.

I'm on mobile right now.

OK Yeah.

I don't think of an example.

I can pull up.

Well, the reason we fast enough for this to not waste a lot of time on the call.

I don't mean to bother everybody's questions is probably go while I'm getting upstairs and changing my zoom to the computer.

Sure thing.

Also here.

Are you using helm file as part of your process or not.

Yeah, but not us.

Michael's planning it to do.

So we have basically each application has a helm shots.

And then we have basic infrastructure, which is deployed by hand file.

So basically, our set manage our ingress.

All that stuff was two part time file.

But the other stuff is not thinking about what he's trying to achieve.

I think he could use jq to see which tech is currently report deployed.

And then check basically, if there's a difference.

And then only deploy the same file apply the changes that are actually out there because I think and sync and apply.

Difference is only that there's.

So my understanding if we look at this example of that.

I add here is that you'll have many releases of related to this one monetary one.

One could be bold for example.

And then there's an image and then there's the tag here.

And then he has multiple releases.

So here's bolt and then down here will be another one like refiner.

And let's say he's reusing this syntax everywhere where he has the image required environment variable image tag.

But he has that repeated everywhere.

Then how do you target just want changing the image tag for 0.

Maybe I understand.

So that your problem is that it's not deploying maybe if because you're using the next tag and if you just target one service.

How do you just target that one service to update.

I think the disconnect here is that if you were using to commit shy instead.

And then deploying like refine it with that commit shot.

Then that one will update.

But not the other ones because what you can do is you can combine required and with like a coalesce so you can say like Grafana.

Image tag.

Otherwise default to the default image tag.

So that lets you override individual tags for each individual environment or individual releases.

But I am guessing he is in the staircase right now.

I will I will pause that temporarily.

I'm here now.

Sure can share that.

OK go to office hours.

Was that my thing or should I share my screen.

If we want to let this go.

Yeah, we can.

I'll start right.

I'd rather not you know share my code on a thing because.

OK if you don't mind, I'm sure, you can also just as a reminder, these are syndicated.

Yeah, that's OK.

I don't mind that.

I just you know.

So you see my screen now.

So Here's what I have now.

We have a model repo obviously.

Each of the services up here, and then I just a ploy folder in which case, I have the charts themselves.

Yeah And then I have the releases.

And then I have appetite.

Yeah Yeah.

This is kind of what I'm doing with this right now.

Yep this looks good.

Now show me where the image tag is to find the image tag right now itself would you define for a environment.

Image right now is here.

So define OK next.

And then I override this.

But there's also set in my view, in the chart itself, which is.

Yeah So it's just the that I would do would you want to do this replace line 16 with a go template interpolation.

And there, you're going to look for the value of articles image tag and if articles image tag is empty, then you default it to next.

I would get that by clear.

I'm just doing this right now.

So it would be.

Yeah articles knows the way articles.

No, no, no, no.

So put like required and not require them.

I'm drawing a blank on the exact function name right now.

I think it's getting this.

Or is this just.

OK And not.

And space like this right.

Exactly no.

Yeah, I would stand you know standard environment syntax.

So be or.

Exactly So like that.

Yeah But what I would actually do because you want to surgically target one of your releases not all of you.

So now what I would do is I would change that to be a prefix that with articles image I got it makes total sense.

And then whenever I'm going over something if all value is to be next.

OK So behind idea if you would basically do a pipe and then default next.

Yes, exactly right.

I gotta say.

So the default is here.

That pipe pipe default next is a pipe.

I thought it was the other one I think.

I planning for a pipe.

So sheer pipeline and a string with next guy makes it work.

Yes each of my services in that way, I can.

Exactly So now you can surgically target each one of these with the tag you want to deploy it, which would be in your case, probably the commit shop is right on.

Well, thank you.

I appreciate that.

That's an enormous help because I know that you would do that.

If I do this.

Now I can just run the helm file just once right because yeah you just run it once, then you call start to export or whatever, just set that environment right also.

But to test this, I would just like to just export a ticket image tag and to have high Def and you get a Def and a church only chose this.

Let it change.

Yes it's changed.

So I said export Oracle's image tag.

And then do that locally.

Yeah Yeah.

OK that actually would solve many of issues.

I'm Morgan Brennan to double double the point.

So thanks, guys.

I appreciate that.

So oh come on now stop my share.

But I appreciate it.

You know I was looking forward to this like all week sounds like you finally have it is you that's like hamstringing things that Secretary has a question regarding to get UPS if there's no one.

Yeah, one quick question.

Do I get rid of the initial values that are in charge themselves only use his values.

No you don't.

You can leave those.

It doesn't matter because values that gamble will override whatever you define in those charts.

So if you make your chart something deployable by default, I think that's generally the best practice.

And now what I was doing.

Also thank you.

Next question.

Yeah, I feel that if anyone has any like experiences with pipeline that's code managing with many multiple development teams.

I'm open to hear it.

And I'm at a point now where I'm at, 20, 30, 40 repos and know there's old pipelines code that I have to redo and I'm struggling just working with some offshore teams and they won't let me touch stuff.

They won't accept my peers.

And it's driving me out of my mind.

Well, I'm kind of curious like if anyone's strategically worked around that other than myself.

I'm working on like using outside libraries and some of my pipeline is code and you know you know I keep on thinking I just want to do my own dedicated branch to do that.

You know.

That just seems wrong to me, seems like an empty pattern.

So I'm curious if there's any other.

Experiences out there.

So I can go on talk about it if you want.

So I had this.

I had a similar issue.

We had outsourced one of our projects and the team was really afraid of.

If I do the pipeline stuff and they had a very weird schema.

I think I a similar issue right now, whereas I deployed to production into a branch and has three different benches for environments correct.

Yeah, we are multiple branches we're doing per environment branch at the moment.

So one of the issues that annoyed me the most was that I never knew which features were deployed in which branch.

And so I Justin also had what I was saying goes to mass and we use tick UPS to promote this was the only way for me to manage it.

But also you could have the deployment process in a separate repo and have just updating the array will come across or put up the available talk attacks pushed into a different repo out of the cope update.

And then manage the deployment process in this secondary repository.

OK That's why I'm starting as a viewer toward.

So at least I know I'm not feeling like I'm totally off my rocker and go on that route maybe.

Eric Yeah.

So if I also know that low end under wrath has some experience with this.

I think I might have misinterpreted the original question.

If you want to just restate the problem again.

Well, so I've got a lot of pipeline as code in the repos as it probably should be.

It's supposed to be, and I'm to the point now where I've got several many multiple repos that I'm helping manage as a singular DevOps resource amongst multiple teams and certain teams and projects are more sensitive and they will not allow me to make pipeline changes easily.

It's just seeming like for although I am in a release you know I'm going to try to maintain a cadence with the development teams in some cases.

I need to make changes to their pipelines to energize and improve things.

And then I'll let you know I guess like a better way of describing it.

And it seems like there might be a better way to do so.

OK So they're concerned about changes in flight to kind of these pipelines that could reduce the stability of deployments.

So don't blame them honestly, you know I don't know.

I mean, these things break relative you can break pretty frequently.

OK So one other one other kind of procedural thing or organizational thing is adding code owners right so that when changes to the pipelines happen members of their team that are set as code owners fought for those pipelines have to improve on those changes.

So that's one way to enforce it without disabling kind of like ability to do pipelines as code.

So that's one thing.

And then.

So pipelines.

Yeah, that those themselves have their own lifecycle almost right that you need to be able to validate that that's not breaking things and you have are you are you a what are the common types of changes to the pipelines.

And are those pipeline kit.

Can those changes be validated, but by using preview environments.

And that's the changes that I'm looking at.

So I mean, we put these things together about a year ago.

And there's a voluminous amount of pipeline delivery code and things like that that I personally wrote that I'm not even proud of.

But it was needed at the time right.

Yeah So you they're saying, you know and that is we all know the more could the have and your pipelines the less simple it is and less important is the more it's Yeah breakable.

Right Yeah.

So I mean, you know I'm so I'm trying to simplify things down and make it declarative as possible and going back to some of these old still working pipelines and trying to like remove unneeded elements and things wrong along those lines.

So I am unfortunately stuck in Azure DevOps in this particular case.

But I'll use it to its fullest to do things like pull out pipelines into its own repo and reference that repo externally.

And I greatly simplified a lot of the various repos pipeline code.

But in the same regards I'd point to an outside repo for the code.

So that's better or worse, that's what I'm doing.

But I want to do this more for some of the other for some of the more delicate pipelines and I just I'm just making them tripping up and working with some of the developers on these things and maybe it's just me.

And you know.

But I just wonder if anyone else has dealt with that in any other way.

I keep on thinking, maybe I should just have a dedicated branch for pipeline code you know in each repo that they can't mess with and that sort of strips the control from the team.

And that's not that don't feel right.

You know some way to break other things.

So I don't know.

That's a sort of deep coda.

Are you able or using get up or using another piece.

Yes it's the area of DevOps repositories.

Oh, OK.

I'm not familiar with those.

It is a lot of hassle oh in the audio environments.

OK Yeah.

So the code owners might not apply to that.

Right on.

Maybe you in their branch policies and whatnot that I've been slowly been able to get enforced.

But yeah.

Code owners is they've got something similar.

Yeah, people notify you when there's changes.

But then getting them to approve.

It's a different story.

So what maybe I can have a private conversation with you because it's maybe been over the scope for this conversation or for this talk right now.

But I had an accountant exactly the same issues that you had this application team removed the info as he pipeline stuff that I introduced.

And because I have like 50 to repositories that I'm working on.

Yeah, I wasn't aware of it.

And then they said, well, we are not able to put any more because I restricted the UCLA access.

And yeah.

I wasn't even aware that's a push differently now maybe.

So if you want to 52 plus repositories at the same time right.

Gets hot, especially if you have 120 injured.

Yes stuff.

All of really smart people you know it's not like you know.

Yeah, they'll find a way around things.

Yeah, I'd like to chat you on this.

I appreciate that.

Thank you.

All right.

It's really nice to hear from I I was going to wait to eat or was asking if we were still going to do the CloudFlare demo and I do want to do that.

I was going to give him a second to join here.

I'll get CloudFlare teed up in the meantime.

Any questions.

No, I love puzzles.

Well, I guess if we're just I'm trying to play GitLab on queries and it's going to be publicly facing.

So I'm trying to do it like I wanted to be an exemplary deployment and I've been using their help chart or I should say the instructions for their Helm chart.

And I'm thinking, wait a minute.

There's got to be a better way to do this.

They just seem to be lots and lots of little moving parts.

Like oh you wanted to s three.

Then you got to go over to this demo file and tweak that.

Well, you want to use those three with IAM roles.

Oh, hang on a second.

Now I run it.

So suddenly I find myself trying to configure k I am.

It's just it's not that everyone, or is it just me.

It's not just you.

OK I will.

You'll you'll see all these beautiful demos of my I did.

It's that easy.

But the reality is once you get to like operationalizing something for production, you've got to worry about emeralds.

You've got to get those roles into your services.

You gotta choose if it's like, OK, in this case, you need an S but it'll keep you've got a provision that lucky you got the back.

So that's why we have so much freaking Terraform stuff to support coober neti stuff because there's so much that Cooper needs can't do.

That is like cloud platform specific.

Yeah, this is where you know it's just that we're a little bit early in terms of where things are in humanities but like the operator that terror that cube form that was shared earlier by a peer that looks interesting or like a service operator.

The idea here is that if you can get more of this stuff all in to Kubernetes you're not bridging tool sets.

Yeah then it gets easier because you can have a helm chart that says, hey provision this Terraform code and take the outputs of that and stick it into this community secret.

And then, you know, blah, blah, blah, blah, blah.

Do all this other stuff in Cuban is.

I don't know that many people.

I don't know anyone doing this, though really in production right now typically multiple pipelines that have steps that this part doesn't Terraform this part does the OK robinet is and trying to make this turnkey for net new applications doesn't.

I'll just man up and Yeah, I don't know anybody disagree with what I said there.

I would love to be a rock.

OK So here is CloudFlare CloudFlare workers have gotten a lot of attention CloudFlare workers are a little bit like running lambdas on the edge like AWS lambdas.

I am not like a super expert at doing this stuff.

But I think that's a testament to workers.

But because you don't have to be very hardcore to get a benefit out of it.

So let me go about how we've benefited from using workers.

And I think these are some common problems that others might have too.

So like you know we launched our podcast at a podcast club posse and that's through buzz sprout and buzz sprout offers branded kind of micro sites for your podcast.

Lots of other services do this too.

For example, get review what we use for a newsletter that also offers branded sites.

Thing is how do you have all these different micro sites that they never give you the ability to customize the menu on them.

They never give you the ability to change the CSS on these things yet is your brand.

You want to have control over those things.

So how do you do that.

Well CloudFlare workers makes this pretty awesome because you go over here like to your DNS settings.

And when you set up like the DNS for look load here for a second.

All right.

So let's take a look at the DNS for our podcast.

So here is the podcast.

No, that's the record.

Yeah here's the podcast.

So you see it's a C name to the buzz sprout a year old.

But it's proxy.

So these all requests are going through CloudFlare.

So if I go over here to work here.

I can now set up a rule that says podcast star first goes through this work right here.

And let me just make sure unfortunately, there's no really good secrets management thing here.

So let me just make sure that I'm not going to leak any secrets by opening up this example.

Yeah muted or I hit mute somehow.

Thanks for pointing it out.

All right.

So what we have here is an example of where we're going.

We're rewriting the content on the fly that we get back from the upstream requests from CloudFlare.

So what I'm doing here.

The objective is that buzz sprout.

They offer this podcast stuff.

But they hijack the canonical link.

So that they say buzz Broadcom is canonical for my podcast yet.

I have a branded site that's not cool.

So what did I do.

I just configure the CloudFlare worker to rewrite that content on the fly.

So here I rewrite all a tags linked tags meta tags on the fly when I see these patterns I just replace it with my podcast.

So that it gave me full control over the content there.

Which is cool.

I can also do things like rewrite the rewrite the site map site map XML to 0.2 links in our own site as opposed to their site.

So that was some of the examples there that we did.

Another example.

That's really cool is.

Let's see here, if you need to dynamically install headers like content security policies.

You can do that on the fly.

For example, we're using the Slack in plugin.

So the slack in no JSF for Slack invites.

We could go modify that code.

But then that's just more overhead that we got to do when we just want to do simple things like maybe at the end or so years.

The ability that we can do that on the fly for those requests.

Very simple little script streaming optimizations.

No, I want to go to slack archive redirect.

So many of you run single page application space you deploy those to service like s three, you're limited in what you can do.

So if you wanted to do a redirect it's maybe a little bit harder to do so very kind of pattern is you have a stub file that has a metal refresh equip in there and that made a refresh does the redirect but that kind of sucks because you first see a blank page and then redirect and it's not good.

So here's an example where we fetch the upstream.

And then look for that tag.

And then we do a rewrite automatically at 301 and serve that back to that.

We see that rejects matches.

So that's how we split up our outside.

We actually see the console logs.

Can you see someone somewhere.

Or so.

Yeah So what's cool is the way they've done it.

I mean, if I go here to I got a change domain.

In this case sweet UPS and then go to.

So here's this route I've configured everything to go to slack.

Archive redirects I can launch the editor.

I'm going to open up a slack archive redirects I'm going to go to that domain here.

Mobile are still all the console logs and everything.

My debug output I see here.

OK So this makes it really good that you can just iterate very quickly, you can update the preview as you're updating your code here without deploying it and seeing it.

So it's a safe kind of developed mode.

The downside is they don't provide any access to logs period.

Cloudflare it's not a server.

So what you got to do is you gotta integrate this century or some other third party service where you can basically post your exceptions in your log events and stuff to some third party service be related to this.

There's another similar feature that's existed in CloudFlare for a long time.

But it was implemented using workers.

I believe a behind the scenes.

But not now.

I'm just truly in love with it for these reasons.

So if we go to apps.

They have a marketplace on CloudFlare with all these hacks.

It's like a marketplace of hacks like the different blades of your Swiss army knife and there's a few in here that are really special.

So if I go over to install apps one is you can inject a site.

Now onto any page, you can add CSX onto any page, you can install a Tag Manager on any image, you can add each email to anything age.

So you can see how you can really do anything you want at this point with any page any site that's part of your network.

So like if I go now to cast that cloud posse I have a navigation menu here that's not provided by them.

That's provided by CloudFlare that drops me over to the newsletter.

Would you look at that.

I got a navigation menu that's not provided by them as provided by CloudFlare.

So all these things.

Let me stitch together all these different properties provided by different services that micro sites and get review.

They had a really bad CSX problem that made the stuff look like doesn't look beautiful.

Now Don't get me wrong.

But it looks atrocious before.

And I was able to inject mounts yes to clean this stuff up.

That's called the vertical wow.

This is amazing.

Yeah So CloudFlare is pretty rad in that sense.

But there are other examples.

I'm not going to say names of services that do this, but there are certain sites that can go to a pay up if you want to have a branded hostname.

Not anymore.

If you use CloudFlare you can change the origin of your requests and you can serve it under any domain you want on your site.

So you can do your own branded versions or micro sites of services like that.

Oh, wow.

This is under the free offering too.

Yeah you basically.

And distinctly CloudFlare.

I mean, God bless them you know, they are really fair in their pricing it's not that they have also overage pricing.

So you can start for free.

And then they have a reasonable rate that you can pay per million requests or something like that.

I hate these services that require you suddenly to you go from spent $50 a month.

Then they want a one year commitment at $1,500 a month.

How does the basic fee and compare to front say that when we're done.

It's just a basic TDM capabilities.

How does that compare it to something like we crowdfund.

Well, so this cloud front is pretty easy to beat that.

That's probably the worst CDM functionality wise that I've used.

Now it's gotten a lot better with the edge lambdas on cloud front, but the fact that any change you make to cloud front takes like 30 minutes to propagate.

That's a massive business liability right.

Like oh, we made some mistake an honest mistake.

OK this is down for 30 minutes.

I don't like that was pretty rad with CloudFlare is changes take seconds or less.

Most of the time.

And they manage it to Terraform or manually.

I confess we personally are not for these things.

But CloudFlare is investing a lot into their Terraform providers.

And I believe a lot of this stuff can be done with their form.

They also have a command line tool I for get what it's called, but they have a command line tool to manage this stuff and be more command line driven perhaps somebody can correct me maybe.

But I don't believe CloudFlare has scoped API keys.

And that's kind of like scary.

So Stephen Dunn fought for companies that want to use Terraform with CloudFlare and control kind of the blast radius of changes is you can actually use proxies with Terraform right.

So then there's a company called b.s. very, very good security and very good security is pretty rad because they'll be your man in the middle proxy and that you tokenized is basically that request.

You can use dummy credentials.

And you can scope requests to certain parts basically certain API endpoints.

And it will inject the tokens only for those requests only from IP is that you want only using credentials that you set and those credentials.

If x still traded or leaked are useless to anyone else.

So Yeah Terraform with very good security and CloudFlare is kind of like the pattern there.

We're almost at the end of the hour here.

So I'm not going to do a demo of this, but it's interesting.

We can do something arrange it a better demo of this.

The idea here is like you can create a route for like API that GitHub you can whitelist a certain IP range.

And then when these conditions are met.

For example, the path is such and such, then you can do replacements on fields in that request.

And since most service most things support the HP proxy convention, then you can just have this happen transparent.

All right Erik do you know anything about the kind of phone data protection laws that go into effect beginning of the year.

The GDPR stuff.

Well, I don't know how you call it in California.

But I think it is the equivalent of the same.

Yeah, we haven't been doing that much.

We haven't yet.

We haven't you know buy our customers been asked to help them with any of that stuff too much.

The cool thing, though, is with CloudFlare you can implement that across the board automatically.

You can install those retarded little pop UPS.

You have to have now on sites that say we're exporting cookies as if you didn't know.

And those cookies store information.

Well, with this if they click No CloudFlare can automatically always drop it.

So you can't track them.

Yeah There's some issues.

A bank in Germany implemented Taillefer last weekend and didn't update the data processing agreements on the back page and didn't inform the customers because they're around the details attack and basically counter.

They did so as to ask termination thing.

And now all the customers are pretty pissed off because they gave closure over all their banking information.

But that's the thing that think that happen happened Germany, and it's a pretty big GDP violation when it did, they screwed up the cash rules.

So they were cashing private data or not.

But I think just we'll see initial things that happened.

But they didn't inform the customers that there is now a middleman.

You have to do.

You have to have data processing agreements with the companies and they have to be accessible for your end users.

Yes And they did not do that.

And yet.

Now that 30 is that is really interesting.

So I don't know.

Yeah somebody else is using CloudFlare and you can share more information about that next time.

Maybe it really is.

All right, it looks like we've reached the end of the hour.

And that about wraps things up.

Thanks again for sharing.

I've learned a lot this time and that some others got the problem solved.

So that's pretty cool caught a recording of this call will be posted in the office hours channel.

We'll also be syndicating it to our podcast.

So see you guys next week same time, same place.

Thanks a lot.

All right.