Public “Office Hours” (2020-02-12)

Erik OstermanOffice Hours

1 min read

Here's the recording from our DevOps “Office Hours” session on 2020-02-12.

We hold public “Office Hours” every Wednesday at 11:30am PST to answer questions on all things DevOps/Terraform/Kubernetes/CICD related.

These “lunch & learn” style sessions are totally free and really just an opportunity to talk shop, ask questions and get answers.

Register here:

Basically, these sessions are an opportunity to get a free weekly consultation with Cloud Posse where you can literally “ask me anything” (AMA). Since we're all engineers, this also helps us better understand the challenges our users have so we can better focus on solving the real problems you have and address the problems/gaps in our tools.

Machine Generated Transcript

Let's get the show started.

Welcome to Office hours is February 12th 2020.

My name is Eric Ostrom and I'll be leading the conversation.

I'm the CEO and founder of cloud policy.

We are a DevOps accelerator.

We help startups own their infrastructure in record time by building it for you and then showing you the ropes.

For those of you new to the call the format is very informal.

My goal is to get your questions answered.

So feel free to unmute yourself at any time if you want to jump in and participate if you're tuning in from our podcast or YouTube channel, you can register for these live and interactive sessions just by going to cloud posterior slash office hours again, cloud posse slash office hours.

We host these calls every week will automatically post a recording of this session to the office hours channel as well as follow up with an email.

So you can share with your team.

If you want to share something in private just ask him could temporarily suspend the recording.

With that said, let's kick things off.

So here are some talking points that we can cover to get the conversation going.

Obviously first, I want to first cover any of your questions.

So some things that came across or came up in the past week since we had our last call.

Terraform cloud.

Now supports triggers across workspaces.

John just shared that this morning.

I'll talk about that a little bit.

The new ADA of US clay is available with no more Python dependencies.

However, I'm still not celebrating it entirely based on my initial review.

Also this is really wise is quote that was simply put in our community yesterday or some things like you can't commit to the overhead required to run something you're introducing a weakness into the system rather than a strength as they'll quickly end up in the critical path.

So that was the way that Chris Child's said something and I want to talk about that some more.

See what reactions we get.

But before we go into those things.

Let's see what questions you guys have.

I have one thing when you're going through tariff for Terraform cloud can you also go through just your general experiences with Oprah.

And I were looking at using it earlier this week or just having a little bit of some pain doing so.

Yeah, just some general experience that would be useful.

I can give you some kind of like armchair review of Terraform cloud.

We are not using it in production as part of any customer engagements we've done our own prototypes and pieces.

So I think the best thing would be when we get to that point if the other people on the call that are actually doing it day to day.

I know John bland has been doing a lot better from cloud.

I don't let me paint him on suite ops.

Let's see if he can join and share some of its experiences.

Do you guys know if you can continue using remote state with S3 with Graham cloud.

I couldn't figure out how to do that.

Well, you should be able to let me explain.

Mm-hmm It's a good question.

But IM not 100 percent of what you put into it.

So yeah.

So I Yeah, I cannot speak from experience trying to do it.

What were your problems when you tried.

I mean, I assume you had the best credentials and everything hardcoded.

And if you had that provider block or that back in Setup set up it was airing or it requires that validates that you have a from Workspace back in.

I personally came to find the place where you can even put Intel from cloud the crates.

I would be in environments settings.

So there's the build up using it as an environment variable.

I know by guy I exactly you have to do that for every single workspace yet as retarded as it is.

Exactly we don't like that either.

No awesome.

John's joining us right now.

So John has spent a lot of time with Terraform cloud.

So he can probably answer some of these questions or you and Mark.

Welcome howdy.

I is going to have you mark have you gone to play with Terry from cloud at all yet.

No, I haven't even browsed the docs.

OK Just curious.

But Brian, your.

You've been dabbling with turn from cloud a little bit or.

Yeah, just taking it out.

It was because I was working on data from provision provisioning of my EFS housing on a kill two birds with one stone.

Yeah And dabbled with it didn't love it.

So I probably am just going to do my on provisioning code fresh.

Yeah, it's a little bit more intuitive for me, especially because I used her from CLI workspaces.

Yeah, it it'll be a lot easier for me to implement something that's driving reusable if I were to just do like a cut.

I could fetch that already does the right like workspaces commands for me.

Yeah, I'm hoping that maybe in a couple of weeks or a few weeks, maybe we can do a revised code fresh Terraform demo on this.

We did one about a year ago or more.

But this time Germany on my team is working on it.

And we want to kind of recreate some of the constructs of Terraform cloud.

But inside a code fresh.

So that it integrates more with all the other pipelines and workflows we have there.

On the topic of terror from I would want it, I would want it.

So like I go back and forth on my decision to use Terraform workspaces.

I love the fact that it was so easy to use the same configuration code for so many different environments and I've been able to take advantage of that.

Why didn't love was having to kind of act together way to get all the back end to point to different S3 buckets and different AWS accounts.

I'm curious if anybody's ever worked with that.

Plus if they ever switched off of it to go the other route where we kind of have configuration per her database account that might be less dry.

Such using tigre.

OK, let's.

Yeah OK, let's table that temporarily.

I see John just joined us here.

Let's start with the first question there on first hand accounts and experiences using Terraform cloud.

I know John has spent a lot of time with it.

So I'd love to for you guys to hear from him.

He's also a great speaker.

Well, thank you.

I've seen the check in the mail.

I actually I've done a lot.

Whatever form cloud as the primary c.I. for all of our Terraform and generally, I like it mainly because it's a little malleable like you can use it for like the whole Ci aspect or you can run your c.I.

I mean, you're Terraform in fresh air anywhere.

And it's just your back.

And that's it.

Instead of having S3 buckets everywhere all your state is just stored.

They're really easy to do remote data things that I saw terraforming it is really easy to do.

They have a provider that gives you poor access.

And I did see on the agenda there the talking points the workspace, the run triggers actually did it video or that be wasn't that only to already.

Well, yeah, except now.

Yeah, I've wanted to just play with it non-zero models were recorded and it's decent.

I think they have some improvements to do to be able to visualize it.

But we actually do utilize I forget who it was speaking Brian.

I think we do utilize multiple AWS environments and from our Terraform scripts where we set it up.

We actually have each workspace control or we tell each workspace, which the environment is going to use.

Now this is using access key and secretly preferably we'd have something a little more cleaner that was a lot more secure than just having stale access fees sitting around.

So that's one gripe I didn't have with it.

But in general, we've had a lot of success running it in Terraform cloud water.

OK So you're leveraging like the Terraform cloud.

I don't want to say it's like a feature chair from cloud.

But the best practice a chair from cloud using workspaces or using lots of workspaces and terrified and how has that been.

Because while workspaces has existed for some time it wasn't previously recommended as a pattern for separating you know multiple stages like production versus dev.

How's that working out.

It actually worked out really well, because locally where you set it up, you can set up locally.

Sure Yeah.

So I mean, the reflection here once again.

There you go.

There is a difference between tech from cloud workspaces and from CLI workspace stuff.

Yeah, sure.

So the this is just my little sample account that I play around with the tutorials.

But if this was set up locally in the CLI and because this prefix is the same.

I can set my prefix locally and my local workspaces would be called integer and separated.

And so locally.

It maps directly to the local CLI actually.

So I can say Terraform workspace, select integer.

And now I'm on that.

And I can see a plan and it'll run not playing right writing on Terraform cloud.

I don't have to have variables or nothing locally.

It'll run everything in there unless it's set up as a local because you have multiple settings here.

Would you be able to do that right now.

Sorry to put you on the spot.

This is exactly what we're trying to do.

And if you're saying that it's actually much easier than I initially thought then I might reinvestigate this.

Let's try it.

But thanks for roll.

It's lights up for everyone else.

Maybe if you just joined.

John has been put on the spot here to do a unplanned for demo of Terraform cloud and workspaces.

Possibly even the new beta triggers function.

But on a different sort as it's set up there and working.

I can definitely walk through the triggers.

Peter would be especially useful for us as well, because there are scenarios where we run multiple turns from the place where you can sleep to reports, a serious long shows.

And we can also get close to Yum It want to have like five minutes and then we can talk about some other things and come right back to this.

Yeah let me get a few of my things to worry about just the connection and all that sort of stuff.

Yeah Cool.

Let's do that.

And we'll just keep the conversation going on, other things.

See what we can talk about there.

All right.

Any other questions.

All right.

So I guess I'm going to skip the Terraform cloud talking point about triggers across workspaces.

I think that's going to be really awesome to get a demo.

Basically to set that up as you decompose your terror lists into multiple projects.

How do you still kind of get that same experience where once you apply changes in one environment can trigger changes in another environment.

And that's what these triggers are now for moving on AWS has announced this week, that there's a new clay available.

I'm not sure how new it is per se, but they are providing a Binary Release of this clay.

I suppose it's still probably in Python they're just compiling it to write code might.

The downside from when I was looking at it is it's not just a single binary, you can go download somewhere there's still like AWS clay installer.

So they're following like the Java pattern right where you still got to download zips and sell stuff.

Personally, I just I've gotten so spoiled by go projects which distribute single self-contained binary.

And I just download that from get up to this page.

And I'm set to go.

So has anybody given this new client, a shot.

Now you mark calling you out.

All right.

Don't buy it yet.

Cool And then there was one other thing that came up this week as somebody was asking kind of like you know I think the question the background question was like alternatives to running bolt and if it's worth it to run bolt and Chris Chris files responded quite succinctly so thank you.

We've heard this said before, but I thought this is a really succinct way of putting it.

And that's like if you can't commit to the overhead required to run some new technology like Cuba and 80s balls or console you're introducing a weakness into the system rather than a strength as the quickly end up in the critical path of your operations.

And I think this really resonated with me, especially since we run this DevOps accelerator and our whole premise is that we want our customers to run in and take ownership of their infrastructure.

But if they don't understand what they have, and what they're running, then it is a massive liability at the same time, which is why we only work with customers ultimately that have some in-house expertise to take over this stuff.

And also Alex just Yeah getting some thumbs up here from Alex Eagleman and Bryan side both agreeing with this statement actually.

Yeah, that actual response to the that's what's the response to something I mentioned.

So the original person asking a question that came about came at then and also get.

From what I understand.

And I just thought that maybe I'd remind them you know like maybe want to just give centralized management a shot the volt really is going to want to sound like I'm pushing it that much.

But the reality is if you look at a hash record that created Terraform created volt they make most of their money from both.

They really do put a lot of product hours into featuring sorry.

Why didn't the feature set that product.

So really, it's a mature solution.

Yeah 100 percent when it comes to houses response.

It's very true.

What happens is actually, a lot of the time is if you don't commit a lot of people they like they take the route token, and then they distribute it to everybody and it becomes more of a security hold than a security feature really.

Yeah And really, it's reminiscent of terminators as well.

In my opinion, we really need like a large team of people putting energy into that to actually make full use of it.

So it's not a burn anymore.

It's actually something that can help you pick up velocity exactly like you want to take these things when it gives you an advantage a competitive advantage for your business or the problems you're trying to solve.

Not just because it's a cool toy or sounds interesting, but Yeah, those are really good summary.

Thank you for it.

For a peer.

Just secrets management.

I would.

Probably doesn't have all the bells and whistles of a vault obviously.

But I went with it to be a secret manager.

Or you can use parameters store does it much easier to maintain.

Yeah Are you making copious use of the lambdas as well with Secrets Manager to do automatic rotations.

Not yet by but definitely something that I wish I had time for.

Yeah, because it also requires application changes right to get all right, John, are you.

You need some more time.

No really.

All right.

Awesome Let's get the show started.

So this is going to be a tear from cloud demo and possibly a demo of triggers across workspaces, which is a better feature and terrifying cloud.

So this isn't going to actually give me a plan, because I don't have the actual code for these repos locally on this computer.

But this is the time to show how the workspaces actually work.

So essentially, you set up your back in as remote hostname.

You don't really have to have it.

That's the default. What organization and in this case, I'm saying prefix.

So if I actually change this in the say name to random.

And if I did in a net on this.

It no.

Yes, I need to.

Yes, there because I already initialize.

That's why so by setting the name essentially, it's supposed to.

What did I miss.

They weren't just a minute ago, I promise.

It's always this way.

Let me clean up this dirt one.

But by saying a name I kind of found this by accident.

I didn't mean for it to do it.

But they go it'll actually create the workspace for you.

So you technically don't have to do anything to create a workspace.

It'll do it for you.

In that case, it doesn't give all the configuration there.

But if you utilize prefix here instead of name just wipe it.

What it does is basically create to Terraform cloud and says, hey, everything with this as a prefix is going to be my workspaces.

So in this case, I can say, let me select the integer workspace.

That's awesome.

And so if I do a workspace listed, you'll see that same list there.

And then you can do select separator any one of those.

You can also.

And I'd have an alias for Terraform by the way.

So that's why I'm just saying to you.

You can also get your state.

Of course, if you have access to that.

So we can say show we can pull that locally.

And so it'll output the actual state here.

And then my favorite part is actually planning.

So I don't have any variables everything.

Mind you this workspace doesn't have a lot anyway.

But it's actually running this plan on Terraform cloud.

It's piping the same output.

It's common just like you would normally expect.

So it's piping everything to my local CLI here.

But for console.

But it's basically this.

So you can see the output matches.

But the beauty of this part is I can have all of my variables in here completely hidden.

Any secrets that I want and none of my developers ever see them.

They never know that they exist or anything locally, but they can play an all day and do whatever they want.

And so this is destroying because I don't have the code.

So it's like, well, it's gone.

This random resource integer, but that's the quick run through of utilizing those workspaces locally.

I mean, it's really just this.

And I have a tee up bar set with my token locally.

So that's I have that work to also go back to the tower from cloud UI that we had the settings.

The variables because just because it came out the second a little while ago.

Brian was asking about environment variables you see there that bottom.

Brian Yeah.

If you need obvious credentials you can stick with me.

OK You're not using a dubious provider right now.

No, I'm not going to put them off.

Yeah, no.

OK And nothing precludes you from using the obvious provider.

So long as you still provide the credentials.

Yeah Yeah.

So you know your random workspace.

I got created.

Do you have to go.

Do you manually go in and add the eight of his grades for those I actually Terraform the entire thing.

So that's all done through Terraform so basically terraforming Terraform cloud.

So essentially I'll generate a workspace and that'll help my general settings there, and then I'll just you the last two or three variables.

And you can do environment variables this way too.

So you can kind of tie this in too with like your token refreshes and things of that sort.

Especially those I've mentioned or.

So there's the ball provider and you can actually tap into a ball here actually gets you a token key from your AWS or however you want to do your authentication there get your token from AWS your access key, et cetera.

And then plug that into Terraform cloud.

So that way it's all automated.

And you're not just wasting variables in there.

Do you have to run your own vault or do they run that for you know you could if you're running your own.

So I would assume assume someone doesn't have all.

How do you plumb anybody's credentials and or ask.

Yes token generation in.

So we just came.

So just utilize came to mark them all as you like.

You don't want to put that stuff in code right.

And so use came as encrypt the values manually.

We built a little internal tool to do it.

But encrypt those values put them in code and then once the workspace actually runs, it'll actually create manage update all the other workspaces.

So in essence, you have one workspace that has all the references to all the other work spaces are supposed to create and it'll configure everything.

And so in that one, it'll decrypted came and then add it to the project or the specific workspace as a environment variable.

And so there is when you say came as you're using SFA we do use that system to store that the product of the commercial kitchen blob no.

Now we just encrypt the value in games.

But we actually we actually use it as a Sim for farm gate secrets.

But this repo here is where I actually have a video of it where I kind of walk through how to do the full thing with Terraform your own workspace and then using their remote data as well to pull from.

And so the pipeline feature that was playing with earlier essentially this repo.

I mean, this workspace is going to trigger this one it's going to trigger this.

And so the way it's set up and they definitely say do not use this in production yet.

But these run triggers here.

So you can specify all your workspaces that you want to actually trigger something here.

And so anytime they trigger oh it's a loop because I already have that one set anytime they trigger they will actually trigger the next one.

When we delete these real quick, and I'll just show a click Run.

And so if I cue this one where it finally kicks off.

There we go.

So that's going to go through the plan.

And this is just going to generate a random integer.

It has an output and all it does is output the results of random integer.

And so once it finishes the plan is actually going to show me which or any workspaces that it will trigger next.

In this case random separator so if I go ahead and Confirm and so this one is applying if I come over to random separator nothing's happening here.

I'm not I haven't hit anything haven't pressing buttons.

This appliance finished and there's random separator that was triggered automatically.

And so you can see like it's essentially going to go down the line there.

The good thing is it will tell you here that the run was triggered from this workspace.

And it was triggered from that run.

So you can kind of rabbit hole your way backwards into finding where and what actually triggered that one.

And so if I confirm and apply on this one that someone is actually going to trigger the last one, which is random pat.

Now, pull up the code real quick as well so that when finished and random pat is here.

And there's random pit running.

Quick, quick question.

Do you guys ever use it because I know what the VCR is integrations.

You can actually kick off a plan and apply from GitHub for example together is that.

Yes exclusively.

Yeah And so this is these repos are actually tied up to get up as well.

You do the confirmed circuit collaborative send them to the UI here do it through the UI, you could tie it in like there's a CSI you can tie it in and do it through any c really.

And so there's the end of the pipeline.

But as you can kind of tail like you will rabbit hole right like you're here.

And then it's like, well, was generated from here and you go back there and it's like, well, one was generated from another one.

And so then you end up having to go back there.

So a good visual tool would be really useful.

Jenkins blue ocean or something where or circle C I kind of chose you the path of something would be really useful, but it's kind of interesting.

I'm sure that's coming.

Yeah So this code just to show this real quick isn't using the remote state.

And so I set up variables manually for this demo, but utilizing these variables.

And it just uses the remote state data to get the value from the integer workspace.

And then the pet basically uses to remote datas to get the workspace state for both the separator and the integer workspace and then it just uses it down here in the random bit.

So it's decent.

I'm liking where they're going.

Yeah, I think this has some potential, especially to minimize the configuration drift and simplify the effort of ensuring that change is promoted through multiple workspaces.

And the good thing is like if you saw on separator I actually had to confirm and of course, that depends on your settings.

Of course.

Because you can tell it if you want to auto apply or manual apply in this case, I'm set to make makes sense.

But it goes auto.

They would have just cheered it all the way down the rewind as many.

And so the practicality of it is like if you separate your networking stack from your application and you update your networking stack and for whatever reason, it needs to run the application form as well.

You can kind of automate that now as opposed to where you ran that one.

Now the one person in the company that knows the order thing can go in and manually hit q on something else.

So yeah.

So I think there was some questions in the chat here.

Let's see.

Alex Sieckmann asks, how do you handle the chicken and the egg problem with bootstrapping saying AWS account and then Terraform enterprise to have creds and such.

Actually a good question.

So it would have to come from some somewhere right.

So like especially if you set up like AWS Organizations.

And you had like your root account that you were set with you can utilize that Reed account.

And you can actually Terraform it obvious orbs and then once that new account is set up, you can assume role and those sort of things in order to access that other client.

I mean that other account.

But there is still some manual aspects of that right.

Like you have to search your email address and then that email address is your root account.

And then you want to kind of lock that down.

So you can do use like some service control policies and things of that sort.

But there's still a little bit of a manual piece to bootstrap a full account.

That's the part that really sucks and we go through this in every engagement right.

Because if you don't reset the password and have MFA added anybody with access to email of that root account of that sub account or for that matter can do a password reset.

And take over the account.

Yeah, exactly.

So there was a question about automated destruction.

So terrifying cloud actually requires you to set confirm destroy.

So what.

So if you do automate confirm destroy set to 1, then yes, you can.

You can delete from trip cloud.

But you can't cure or destroy unless you actually have that environment variable z So you can set it like and have it as a part of your workspace and then you destroy will actually destroy it.

Nice cool.

But yes, that aspect of the chicken and egg is something that is definitely something that could be cleaned up on either side, just to help the bootstrapping especially for the clients that have like 78 IBS accounts.

Yeah which isn't as abnormal as it sounds it's one enterprise.

Yeah Any other questions related to Terraform cloud and put this in queue.

Not sure cost value.

Opinions now it's way better than before.

It was rough like multiple Tens of thousands of dollars for the Enterprise version.

And so now it's actually to where you can basically sign up and utilize it now for free.

You have to keep in mind that it is still a subset of different features.

But it is really good.

And it is.

Obviously, if you're a small team and you don't have $100,000 to spend that yet.

But I figure that it is a subset.

Yep And so the main things that you do miss cost estimation is actually pretty cool.

It will tell you if you're starting up like the T3 micro how much that's going to cost or involve large it'll kind of give you those pieces as an estimate.

But it kind of helps.

And you can utilize sentinel which is basically poly policy as code utilized sentinel and say no one can create a project.

If the cost is over $1,000 or whatever.

Or you can say, hey notify somebody or whatever already requires approval.

And so syncing it was actually pretty useful.

And then, of course, you get the normal sample.

And this is the private install a small sample clustering as you go up.

But you can.

But funny how it goes from unlimited worth.

Everything else is unlimited and unlimited workspaces the enterprise no matter limited anymore just 100 plus.

But I mean, really, this free up to five users is pretty much all you really need unless you are on a larger team than movie roles and the role basically plan read right now and admin support.

But the private registry is actually pretty cool too.

I think, as I said, as a profession need to push back on enterprises that try to make you pay for security here the MLS ISO is pretty much the only thing controlling the keys to your castle and I don't think that it's right for people to hold security as a tool for making money.

Yeah, that's like always there 1, 2 right.

Yeah, but I hope we get the industry aligned with security as a first printing like the first class it isn't all products, not just for if you're willing to pay.

Yeah, there were other.

We've shared this before like the SSL attacks website, go to ss no doubt tax, then it says it all.

Yet it's funny.

It's the wall of shame and the price increases with pleasure.

Areas things I need to add terrible glad to know.

Exactly base price pressure.

So price.

It's just insane.

The gouging that goes on.

Look at HubSpot my Jesus.

63 percent increase 586 call us.

Well, you want to factor that's going to cost you.

Yeah Yeah.

Two factors another.

Yeah Well, I mean that comes usually with whatever you picked as you say.

So Right but I don't just a cloud offer to factor.

It does.

Yeah So I have one set up here just normal.

I use all the networks, but then again, I'm also I have my day jobs account on theirs.

And it's paid.

So maybe that's good.

Yeah, maybe that's where it comes from maybe that's not over for.

Any other questions on cloud for a small team.

Do you guys think $7 a month is worth it for just set to no.

It depends what kind of roles do you have in place for like your instruction.

My team is a team of one right now.

So there is no like actual rules automated but obviously being proactive about it.

Just when I'm speaking of infrastructure.

But as the team grows.

And I think we're growing our security function here too.

I think a lot of security engineers I'm talking to, they're doing it manually where they go into your database console and like check it you know your last $3 are public.

I was saying like we could automate this with a sentinel.

I was curious if I do say a team of five is $7 a month worth it if you have those sort of rules in place.

Yes Yeah.

This is basically like a pre-emptive you can choose to block or you can just walk.

And so in this case, it was essentially a function and they're adding to this resources and they end up pulling it back.

Right And so you can basically take those things.

And as you validate them you can give specific messages that you want.

And basically say yay or nay if it's approved and it'll basically block the run.

So it is a good way to catch it ahead of time.

And you can catch some of those things.

Another thing that you can do as a team of security.

We talked about open policy agent integration with Terraform that can also do some of this stuff and also someone else recommended comm test, which is built on top of.

OK and add support for HCl and Terraform plans as well.

Yeah, there's a little library that's like a Lancer as well to offset this one.

And it's pretty decent too.

And I can catch like $3 an HTTP where so they should be.

Yes And it also provides a way here to where you can take these rules and you can actually ignore it for like a specific line like if you want an ingress here and you don't care about this.

This rule here.

It's just it's a requirement.

You have to have it mean you can ignore it.

But you can tie this directly and with see I can just run to you set up for locally with Docker and so I would probably start there as opposed to going to sentinel because then you do have to manage quite well you have to write the Central Policy you need to manage that.

And then you assume a lot of that risk at that point to you know all you have to develop all those opinions on what you mean.

Right well that makes a lot of sense though, when you have like cyclops that focus on that if you're the 119 and suddenly just adds to your plate.

Can you share this link to get up 45 seconds and office hours channel and the episodes after shooting for.

And let's see.

So we got 15 minutes left or so 15.

There were some other questions here unrelated to Terraform cloud as one to see if we can get to that.

Alex, do you still want to talk about this your Prometheus question.

See I can't.

He is chatting in the Zoom chat.

Looks like Zac helped you out with the answer a little bit.

I guess I'll just read it for everyone else's benefit.

How do you know.

Let's see.

Assuming you have the Prometheus already running the Prometheus operator and you run gipsy deal get Prometheus all names faces you'd set up a Service Monitor.

Oh, this is from Zach.

I did not.

Yeah, I gave him like 1,000 foot view of an answer for how to set up a Service Monitor for Prometheus a custom service running in a cluster.

I wouldn't have answers so quickly if I weren't your candidate the exact same moment.

That's cool.

Yeah, maybe we can.

The essence, you don't have to make today.

Let's punt on the question to next week if you're able to join us and we can talk more about service mind stuff.

I'm curious if anyone else is using anything other than custom rules for you know, if there's any other tooling out there for Service Monitoring or adding you know people have multiple teams, multiple microservices and you know if there's any organizational strategies around tooling.

This in a declarative manner any I can answer how we do it.

But I'm interested also first before I talk about what other people are doing.

We've talked about just monitoring the individual services oh Yeah.

Just Prometheus right.

Hangs in multiple services and you know there could be a thumb roll.

You know that some of them come and go ensuring that generic monitoring gets put in place and teams that they want to put extra and additional monitoring and you know, for items you know that those are also able to be deployed.

Yeah I'm just really struggling with the getting a good template going I thought.

Yeah Are you using helm your.

I am.

Yeah then sorry I missed my computer's not responsive here.

So then yeah, I can kind of show you because this came up recently, for example, with century.

That's the good example, my ship is going to invest one but I'll show it.

So we've talked about in the past that we use this chart called the chart that we developed.

Zach, are you familiar with the chart.

Dude I am so familiar with the model chart.

OK created my own version of it.

So yeah.

Well, thank you.

Yeah So the pattern there that we have.

And then are you familiar with the service monitors that we like the Prometheus findings that we have in the model chart, you know I probably should go revisit that on that.

Honestly, I haven't looked at it.

OK So I will give an example of that here and I'm getting it cued up in my browser.

So let me rephrase the question or let me rephrase.

But let me restate the question and add some additional context.

So in my own words, I think what you're describing is how do you offload the burden of how a application is monitored to the application developers themselves or the teams at least responsible for that service.

In the old school model it kind of be like employer your services, and then you throw it over the fence to ops and say, hey, I was deployed.

Update now those are some archaic system like that.

And monitoring and that never worked well.

And it's like very much like this data center mentality static infrastructure.

And then you have a different model, which is kind of like an Datadog where it will maybe auto discover some of the things running there and figure out a strategy for monitoring it, which is magical but it isn't very scalable right.

Magic doesn't scale.

So you want something that allows configuration, but also doesn't bottleneck on some team to roll that stuff out.

So this is why I think Prometheus operators pretty rad.

Because you can deploy your monitoring configuration alongside your apps themselves.

So we had this just came up kind of like what you said Zack about you were just actively working on this other problem that Alex heads.

That's why I was fresh in your memory.

So this is something that we did yesterday, actually.

So we run a century on prem.

We've had some issues lately with century stop ingesting new events while everything seems totally normal.

So it's passing health checks.

Everything's running everything's green and hunky Dory.

But we wanted to catch the situation where it stopped processing events.

So at the bottom here, we've added using the motto chart, for example, we don't have to create a separate release for this technique but we're doing that here and using them on a chart.

What we do is then we add the Prometheus rules.

So we can monitor that the rate or the delta here of jobs started in five minutes over a five minute period is not zero or in this case is 0.

So that's when we alert k minus 1.

My point here.

Those So let's see are we using mono chart to deploy this.

Do you have something that keeps a baseline level of jobs starting a busy cluster a busy environment.

So like is this generalizable no.

But in this environment.

So here's the thing.

Oh, I think it's generalizable because you could make that Cron job you know it does.

So in our case, we have century Kubernetes deployed.

So we have a pod inside the cluster that is ingesting all of the events from the Kubernetes API and sending those to century.

So you could say that we just buy it by having that installed.

We have our own event generator because Kubernetes is always doing something right.

So we ran when we ran this query we side identified the two times over the past month that had the outage.

So we deployed it, and went live with that.

But I just want it.

So this mono chart though, is this pattern where you can define one chart that describes the interface of a Mike or service in your organization.

This happens to be ours that we use in customer engagements.

But you can add you can forget or you can create your own that does the same kind of thing.

And let me go over to our charts here and see Mike in more a different example that we have.

So here's an example, like a simple example of deploying an engine x container using our motto chart.

And the idea is that, what does everything you deploy to coordinate his needs.

Well, it needs.

Well, OK, if you're pulling private images you're going to need maybe possibly you'll need a secret.

So we define a way of doing port secrets.

Everything is going to need an image.

So we define a way of specifying the image.

Most things are going to need config maps.

We define a simple way of having consistent config maps and all of these things are a lot more terse than writing the raw Kubernetes resources.

But you can also then start adding other things in here like then we provide a simple way of defining infinity rules.

So you can specify an inline affinity rule, which is very verbose like this, or you can just use one of the kind of the macros the place holder ones that we define here should be on different node.

And this is an example of how you can kind of create a library of common kinds of alerts that we deploy.

Now I'm talking.

I'm conflating two things affinity rules with alerts.

I just happen to have an example here of infinity in helm files here to share your screen.

Oh my god.

I can do that again.

But I'm just always used to having my screen shared.

So I. Yeah So sorry.

OK So this makes it a little less handwaving then by seeing my screen here.

Here's what I had open, which was just an example of using our monocytes chart to define the Prometheus rules to alert on centuries.

So here at saying century job started minus a century job started five minutes ago.

And if that's zero we weren't on.

So mono chart itself.

Here are using Monroe chart just to define some rules.

But Monroe chart allows you to define your deployment.

So here's where deploying engine x we're setting some config map values we're setting some secret some environment variables.

Here's the definition of the deployment.

But we also, unfortunately, we don't have adjacent schema spec for this yet.

So you kind of got look at our examples of how we use Monroe chart.

And that's a drawback if I search for this here that we'll find a better example.

Who's so for example, I'm not sure Calhoun is going to help fly the where we use Monroe chart frequently is a lot of upstream charts that we depend on.

Don't always provide the resources we need.

So then we can use Monroe chart much like we used the rod chart to define the rules.

So here is where we're deploying Cam for a k I.

This is a controller that pulls metrics out of k I am and sends them to Prometheus.

So somebody provided a container for us.

But the chart was apart.

So we just used our monocytes chart instead.

So here we.

Define a bunch of service RGB monitor rules to monitor.

In this case kIm so this is complicated like using the raw expressions for Prometheus but I don't want to say that like in your case.

Zack what I would do is I would define canned policies like this that you can enable in your chart for four typical types of services.

OK So that is the route I'm going.

And so with the sanity check means I'm not going the wrong round.

I know it's just it seems like a lot of work.

It is.

But the thing is like so.

But nothing else does.

There's no one else is doing this.

So this is like I say, I don't see.

I haven't seen any other option out there, aside from magical auto discovery of things running for monitoring this thing where applications, deploy their own configuration for monitoring very.

I don't know of any Sas product that does that.

And it's very specific to the team and organization and the labeling that you have in place.

Yeah So.

All right.

Well, I mean, did the model chart is the right route in my mind as well.

I've been going that route.

I call it chart architecture.

Yeah but I'm using that to do a bunch of other deployments and not forget to microservice.

So this will be rolled into it.

So thank you for the answer.

Yeah, no, I want to just add one other thing that came up just for to help contrast the significance of what we're showing here is yes, this stuff is a bit messy.

I wish this could be cleaned up.

And it wasn't as dense, but when you compare this to like let's say, Datadog and Datadog has an API.

There's a careful provider for Datadog.

But I would say that's the classic way of setting a monitoring.

It's a tad better than using nodules because there's an API, you can use Terraform but it's not much better than using nodules because there's still this thing where you deploy your app and then this other thing has to run to configure monitoring for that versus what we're saying here is we deploy the app and the monitoring of the app as one package to Kubernetes using.

Well, we're almost out of time today.

Are there any last thoughts or questions related to perhaps as Prometheus stuff.

I didn't check if you posted anything else here.

Alex and Chad thank you for the Terraform cloud demo.

Thank you.

That was all a demo.

Thanks, man.

No problem.

Was month month this year.

I think it's half on sales next month.

The lies and generalizations which can be hard.

I suppose for most HP REST API.

You could do some kind of anomaly detection or basic five minute alerts but there is Yeah, there's not a general, there's no general metrics across all kinds of services.

So Yeah, that's right, Alex.

So that's what all these other services do like data dogs this thing is they'll provide you some good general kinds of alerts but nothing purpose built for your app.

All right then let's see.

I'm just going to be up closing.

Slide here.

Well well there you go.

There's my secret sauce.

That's what we're doing here.

We're at the end of the hour.

There are some links for you guys to check out.

You enjoyed office hours today.

Go ahead join our slack team if you haven't already joined that.

You can go to a cloud posse slash slack you sign up for a newsletter by going to cloud posse slash newsletter.

If you ever get registered for office hours.

Definitely go there and sign up.

So you get the calendar invite for future sessions.

We post these every single Wednesday.

We syndicate these to our podcasts.

If you go to cloud policy slash podcast.

You can find out the links where you can subscribe to this.

Like iTunes or whatever podcast player use connect with me on LinkedIn.

And thanks again for.

Yeah, for all your input and participation area.

This is awesome.

What makes meet UPS possible.

Thank you, job for that presentation.

And I'll see you guys all in the hall next week.

Take care.

Thank you guys.

Thank you.

Author Details
Sorry! The Author has not filled his profile.
Author Details
Erik Osterman is a technical evangelist and insanely passionate DevOps guru with over a decade of hands-on experience architecting systems for AWS. After leading major cloud initiatives at CBS Interactive as the Director of Cloud Architecture, he founded Cloud Posse, a DevOps Accelerator that helps high-growth Startups and Fortune 500 Companies own their infrastructure in record time by building it together with customers and showing them the ropes.