Erik Osterman, Author at Cloud Posse

Public “Office Hours” (2020-03-18)

Erik Osterman | March 18, 2020 | Office Hours

Here's the recording from our DevOps “Office Hours” session on 2020-03-18.

We hold public “Office Hours” every Wednesday at 11:30am PST to answer questions on all things DevOps/Terraform/Kubernetes/CI/CD related.

These “lunch & learn” style sessions are totally free and really just an opportunity to talk shop, ask questions and get answers.

Register here: cloudposse.com/office-hours

Basically, these sessions are an opportunity to get a free weekly consultation with Cloud Posse where you can literally “ask me anything” (AMA). Since we're all engineers, this also helps us better understand the challenges our users have so we can better focus on solving the real problems you have and address the problems/gaps in our tools.

Machine Generated Transcript

Let's get the show started.

Welcome to Office Hours.

It's March 18, 2020.

My name is Erik Osterman and I'll be leading the conversation.

I'm the CEO and founder of Cloud Posse.

We are a DevOps accelerator.

We help startups own their infrastructure in record time by building it for you and then showing you the ropes.

For those of you new to the call, the format is very informal.

My goal is to get your questions answered.

So feel free to unmute yourself at any time you want to jump in and participate.

If you're tuning in from our podcast or YouTube channel, you can register for these live interactive sessions by going to cloudposse.com/office-hours.

We host these calls every week. We'll automatically post a recording of this session to the office hours channel, as well as follow up with an email so you can share it with your team.

If you want to share something in private, just ask, and we can temporarily suspend the recording.

So with that said, let's kick things off.

We have a couple of talking points here and another one in the Slack channel.

Again, if you haven't joined our Slack team, go to slack.cloudposse.com. Again, that's slack.cloudposse.com, and you can register and join the office hours channel there.

So one of the awesome recommendations was from Brian Tye.

If everyone can share some of their work from home tips and I'd like to expand that to maybe some productivity hacks I'm sure this affects pretty much everyone on the call here today.

So I'd like to learn from that.

The other question that we had just pop up here is a very common recurring question.

I think it comes up almost every office hours, but we learn something new every time a peer asks. He'd like to have his monitoring strategy vetted for deploying Prometheus Operator.

We'll get into that.

And Dale let's see here Dale just posted something Dale what's this about to get his guys working from home.

Awesome So this is by you actually yes.

Knight OK.

OK So let's Yeah let's review that in a second.

As soon as we first do the first order of business.

Any questions today outside of these two talking points that I just brought up there.

Yeah, I have one actually.

All right.

Has anyone done Kubernetes ingress behind a VPN?

So you can't use LetsEncrypt.

So we're looking at doing a node port service.

But everything kind of feels like a hacky thrown together mess.

Are you in, like, a private cloud scenario, or in AWS GovCloud behind a VPN?

Yeah not my area of focus.

So what.

First of all, what's the problem here.

I'm not sure I am understanding the problem. Like, why can't you just use regular Nginx ingress in that scenario?

And what's the relationship between the VPN and the ingress?

So because I'm behind a VPN, cert-manager with Let's Encrypt doesn't work.

Gotcha, automatic.

I can't, because I can't do the automatic verification.

So I'm looking at doing, like, an application load balancer and terminating TLS using a certificate from ACM, and then, I guess, instead of a LoadBalancer-type service for the Nginx ingress, doing a NodePort-type service for the Nginx ingress and pointing the ALB to the hosts on the port that Nginx ingress is on via the NodePort service.

But it feels it doesn't it doesn't feel very clean.

I was wondering if anyone else had.

Oh, yeah, I've done that.

And this is a much better way to do it.

Can you explain, you know, the cert thing again?

Yes. So cert-manager automatically goes out to Let's Encrypt and does the certificate generation using Let's Encrypt.

The way Let's Encrypt validates that you are the owner of the particular domain that you're trying to get a cert for is that cert-manager deploys a pod that publishes an HTTP endpoint that Let's Encrypt then goes and looks at.

And it's got some token or whatever.

And it's called HTTP-01 validation.

But if I'm behind a VPN, something from the open internet can't hit it.

Can you not use DNS-01 challenges? Because this only requires you basically to have a publicly reachable DNS record for the domain and modify that.
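For readers following along, here is a minimal Python sketch of what the two ACME challenge types discussed above ask you to publish, following RFC 8555. The domain, token, and account thumbprint values are made up for illustration; in practice cert-manager (or another ACME client) computes and publishes these for you.

```python
import base64
import hashlib
from typing import Tuple


def b64url(data: bytes) -> str:
    """Base64url without padding, the encoding ACME (RFC 8555) uses."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def http01_url(domain: str, token: str) -> str:
    """URL the CA fetches for HTTP-01; it must be reachable from the CA."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def dns01_record(domain: str, token: str, account_thumbprint: str) -> Tuple[str, str]:
    """Name and value of the TXT record the CA looks up for DNS-01."""
    # Key authorization is the token joined to the account key thumbprint.
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return f"_acme-challenge.{domain}", b64url(digest)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    domain = "app.internal.example.com"
    token = "example-token"
    thumbprint = "example-account-thumbprint"
    print(http01_url(domain, token))
    print(dns01_record(domain, token, thumbprint))
```

The point of the comparison: HTTP-01 needs the CA to reach your endpoint, while DNS-01 only needs the TXT record to be resolvable from the public internet, which is why neither helps when there is no public DNS at all.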

We don't have a public reachable DNS yes, it's almost no public.

It's almost there.

Yep Yeah, pretty much.

So I actually will try to do something similar, but some of what I was looking at.

Roger tried to run it globally but committed to the well.

Is it that you want to use Let's Encrypt?

No, not necessarily.

I just want ingress.

Yeah So take a look at how they do their range.

To me it was with us and start my angel.

I think they actually disable it and use something else to actually do it.

I think that may Patrick steer it in the right direction.

You're not be an external violations or requests for that check.

Oh, yeah.

And I mean, in this situation, since it is a very closed ecosystem, why not have your own CA in cert-manager itself?

It can manage your PKI for you, and you just need to trust that CA. Then you can even let Vault generate certs for you.

Yeah, I mean, cert-manager can handle that whole process. When we were using Cam, we set up cert-manager as a CA, which would then generate the certificates, and it's all a self-contained ecosystem that then works.

And we have to sell less now let's face it.

There's no nice solution for all of this because of those constraints that you have.

But to me, setting up your own private CA, that's something I hadn't thought about: setting up cert-manager as a CA that we would just have to tell everyone using it to trust.

Because Nginx ingress will create self-signed certificates that are signed as, you know, the Kubernetes fake certificate, or whatever.

And there's so difficult. Maybe you can kind of trust those.

But I think they change.

Like they don't kind of stay constant.

But every employee that ends next will be.

But right.

But if cert-manager can be set up as a stable, persistent CA, so that I can then say, OK, go trust this CA, that would work.

Yeah, that would make it so that I don't have to terminate, because I would prefer to avoid having to use ACM and terminating TLS at an application load balancer, because it adds a bunch of overhead.

I would rather just terminate TLS at Nginx ingress.
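As a rough illustration of the "cert-manager as your own CA" idea from this exchange, the Python sketch below builds the two Kubernetes manifests involved and prints them as JSON, which kubectl accepts. It assumes the `cert-manager.io/v1` API group (the version in use at the time of the call may have differed), and every resource name here (`internal-ca`, `internal-ca-key-pair`, `gitlab`, the hostname) is hypothetical. The CA key pair is assumed to already exist in a Secret.

```python
import json
from typing import Dict, List


def ca_cluster_issuer(name: str, ca_secret_name: str) -> Dict:
    """ClusterIssuer that signs certificates with a CA key pair stored in a Secret."""
    return {
        "apiVersion": "cert-manager.io/v1",  # assumption: current API group
        "kind": "ClusterIssuer",
        "metadata": {"name": name},
        "spec": {"ca": {"secretName": ca_secret_name}},
    }


def certificate(name: str, namespace: str, dns_names: List[str], issuer_name: str) -> Dict:
    """Certificate request; the signed cert and key land in the Secret '<name>-tls'."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "secretName": f"{name}-tls",
            "dnsNames": list(dns_names),
            "issuerRef": {"name": issuer_name, "kind": "ClusterIssuer"},
        },
    }


if __name__ == "__main__":
    issuer = ca_cluster_issuer("internal-ca", "internal-ca-key-pair")
    cert = certificate("gitlab", "default", ["gitlab.internal.example.com"], "internal-ca")
    print(json.dumps(issuer, indent=2))
    print(json.dumps(cert, indent=2))
```

Clients behind the VPN would then only need to trust the one CA certificate, rather than each ingress's generated certificate.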

And then there's.

I mean, doesn't AWS have all the private ACM stuff?

But I know that's very expensive.

And I don't know if it's in GovCloud.

For what it's worth, I'm doing something similar with Lambda right now, accessing on-prem, where we're rolling our own CA and connecting the VPC through, like, a VPN gateway, site-to-site VPN.

And then, yeah, we're setting up our own CA using Server Manager.

We're using AD, which was out of my control.

But yeah, we're using AD to generate the certs.

OK, thanks Adam.

Yeah Now that's good.

That's good info.

Thanks any other quick questions.

All right.

Well, let's get into the first talking point, then which is going to be working from home tips.

I'd like to first open this up to Dale since you've already shared something on this that you've put together.

Let me open that up on my shared screen here.

So we can all see what that's about.

Of course.

My corporate firewall blocks Instagram.

Is this your setup, Dale?

Yeah, it is.

That's why work from home.

So it looks like a Star Trek or something.

Pretty a pretty sweet system.

Only one monitor.

Get good scrub.

Yeah So Yeah.

Do you want to.

Do you want to narrate these slides what's going on here.

All right.

Our office actually implemented a work-from-home policy due to the whole COVID-19 virus.

So I've started to put together a list of things that actually worked for me in the past.

I had actually worked from home for approximately eight years prior to moving to New York, so these are tips that actually worked for me.

So even when I got here, one of the first things I did was to convert one of the bedrooms into a little office, so in case my girlfriend's here I could actually set some boundaries around that space, and make it as comfortable as possible.

So I like to be a bit more organized in what I'm doing.

I'm this freedom of words generally.

Even if you're in the office.

So I actually would start off with a natural list.

I do minty like a whiteboard to the side as well to keep a book or an iPod just to keep it kind of keep out things structured with that personal stuff.

I just use like things.

And for work related stuff I like.

That's in Europe just for everyone else's edification things Shazam app.

I do tracker.

It's not just things.

Yeah, there's an well.

Yeah thing that stuff.

So I use a lot of cloud-based applications, like everyone is more accustomed to doing as well.

So like Zoom slack jiro you know just to help with the collaboration or office metric does you use arms like generally for everything notifications just meetings as one would not just switch notice things like Blue Jeans are between blue jeans and Zoom and it suits us.

One practice I do maintain is getting dressed in the morning.

Not necessarily a button-down shirt, but just get out of your pajamas.

Take a shower.

Morning routines.

So you can mentally prepare yourself to actually get started.

Even with that.

I tend to set my desktop clear things off, make sure it's a little more organized and get seated.

As I mentioned earlier about setting boundaries: people think that because you're working at home, you're available.

Especially if you have a family, set those boundaries, and make sure that your voicemail actually states the times you're available so that people know. I go as far as putting a Post-it note on the door with my hours of operation.

So yeah, I think that's a really good one in setting those boundaries nick and then having the conversation with your family so that they know that this is the case that things haven't really changed you just happen to be at home now.

Yeah, no.

And for my office based on this image you didn't even look at that.

I do minimum.

But at Instagram.

I put a lot of what my desktop was before that.

But I work with that system disk from Jarvis.

I switched from dual monitors to a single ultrawide.

I invest a lot of time into making that space almost like a replica of what it would be like in an office space.

The music.

My laptop, whatever I would need to get things done, so whether I'm in the office or back home, I can still function as I would in either location.

No, I really I really like these arms for the monitors.

So you can move your screen around and get it up.

Especially when you're sitting a lot.

Having it at the right angle for your head is going to reduce some of that back pain and stress in your wrists by proper posture.

I think that's one thing that's not mentioned.

Nothing means that necessarily working from home.

But the prompt the difference between working from home often.

And the office is you have if you're not doing it often.

Yeah, pretty bad desk situation and chair situation.

So pay attention if you start having pain in your elbows and wrists, because you're probably sitting with bad posture.

Yeah, I currently suffer from a slight impingement that I did therapy for, and that was related to my posture at the desk.

I put Timothy Dalton like the romulo chair and I will hold my position as well just to keep some level of activity.

I like.

Yeah, the iPad thing, if you guys have iPad Pros.

I'm not sure about the other tablets I researched it.

But with iPad Pros, you can use that as a dual display.

I guess in 10.15 it's natively supported, but before that would do it.

That's awesome.

So actually, my screen that I'm sharing right now is an iPad tablet.

So it's a great way to get dual displays like today if you already have.

Yeah, it's very useful, especially to see things like that while I was actually in Jamaica I had used as a second monitor as well.

That kind of simulated what I would work normally.

But my whole workflow.

The last slide: I just spoke about taking a walk, taking that break.

Step outside. A lot of people don't realize that they spend so many hours indoors.

They don't get the sun, so they don't produce as much vitamin D, which may also matter in flu season, right?

Also, it helps for working through blockers, for your state of mind.

So step outside, clear your head, walk back in, and go back at it.

And then the other thing that I tend to do is to over-communicate, so we'll have check-ins as well with my direct supervisor and the team.

I also keep, like, an open Zoom, so if someone wants to, they can just jump into it.

I speak to it.

I like that.

Yeah Is that clear like I am.

Actually, that's what I'd like to talk a little bit more about I've been thinking about having as well is like for teams probably so not company wide and probably for maybe project related.

What about just having a Zoom room open that you can hang out in during the day.

You can mute yourself, you can stop the video doesn't have to be a loss of privacy or any of that.

But at least you can quickly hear any water cooler conversation that comes up related to topics on that as they want.

Now that we have that I actually implemented it.

So we use Google Meet, and every time someone joins our, like, coffee break room, as it's called, it triggers a message in our random channel.

You can just hop in as well.

So having this as well, made it so much nicer because like in the beginning, people were just sitting there in their forest and nobody was really talking about something.

And now people join, and now we also have a calendar invite for every day at 1:30 PM, and everyone is invited to join there.

So since you announce when it is.

So it's not all day.

You have it at is between specific hours kind of the day and it's open all the time.

You can point it there all the time, all day.

But like we have a dedicated session of 30 minutes where you can go in there.

That's what you announced your slack team.

Yeah you know in a general.

Yeah, we have a keyboard that are running throughout the day.

Ghost puppet because normally you in the office.

We'll just tap each other on the shoulder, and it also helps, I guess, mentally: you're just not feeling alone.

Yeah, that helps.

That's a good tip.

Yeah Yeah.

And you know, this whole thing does wash your hands as good as possible.

Then if you miss it.

But I do have other tips on my Instagram that just love some of the slides about between my coupon.

I use Docker and working from more and more.

I can't stress enough: over-communicate.

That was one of my notes too.

And I think that's a really important one is that I don't think it's hard to over communicate actually and most people are actually under communicating what they're working on.

So people are not really informed on what's progressing, where they're stuck and Yeah.

Any anybody else have thoughts on that.

So one of the things my team just recently started doing.

And we really like it is there's an app on Slack called Dixie that is daily asynchronous standups and you set it up for a particular time.

And it sends each member of the team a message saying, OK, it's time for stand up you know.

And it asks you the typical three questions.

What did you do yesterday.

What are you doing today.

Do you have any blockers.

And it has.

I think it has helped a lot with getting people to write down their thoughts because when we do, we do a stand up call every day also.

But sometimes that can just be.

Oh, yeah, I was working on this other thing.

And I'm still working on it.

That's kind of it.

But getting them to write it down gets them to go into a little bit more detail, and especially with the blockers portion, it's much quicker to get blockers resolved when you write them down in Slack and say, this is a blocker for me right now.

Someone almost always immediately goes and picks it up. Like, you know, a blocker for me is I have this pull request that's waiting to be approved.

And you know nine times out of 10 somebody goes, oh, I'll go look at it.

You know cause it's right in front of them.

I'm curious about this.

And I end there.

This is probably like one of the most common app categories almost that I see for Slack.

I'm curious about anybody who's been using a tool like this for, say, six months or more, and is still using it and still sees at least, let's say, 80% participation in the notifications.

My inherent skepticism, based on my own patterns, and this is like a confession here, is that anything automated that I know is going to happen every day at the same time, I tend to ignore, as opposed to those things that are infrequent.

So this is why, personally, I don't have hacks like that, like setting a reminder every day at the same time to do something, because then I just end up ignoring it. Anybody using this successfully in their company for a long time?

We've only been used there.

We've only been using Dixie since January, but I think as far as participation.

Our messages go out at 11:00 and we have our stand up at 11:30 and most of the stand up is going through the Dixie messages.

So if one of them isn't there you know, it's instantly you know kind of a polite name and shame kind of thing.

Why a why didn't you.

Why didn't you edit your stand.

You know.

But it's.

Oh, yeah sorry.

I forgot.

You know I got busy or whatever.

And yeah we haven't had any issues.

OK with people just forgetting about it because I mean, as long as leadership does it.

I think it tends to trickle down.

Yeah Any other.

Yeah Any other suggestions for working from home.

Brian any of your own tips or hacks you'd like to share or something in particular, you were thinking of when you asked the question in office hours General.

I actually don't have a lot of experience working outside the office often.

I only worked from home usually when I was sick.

So that's kind of the reason why I asked the question.

I do like the idea of the coffee break.

I why do we already know this is like.

The office banter that we had at our office.

Yeah, so I think we're going to try today.

Think you guys a suggestion that I realized that I actually am working later into the night.

So because there's not that like a drive home thing that kind of stops you from working.

So I'm trying to figure out what I can do to fix that.

Two things.

Two suggestions on that which help me, at least. One is making sure you set your office hours.

So I think developers have some different challenges from managers. Managers tend to live in their calendars and developers tend to just be pulled in every direction.

So it's sometimes harder to regiment. But what I was going to say is, for me, on my calendar, having definite work hours defined, so people aren't scheduling your time outside of hours.

And then the other one is disabling your Slack notifications on your phone and on your desktop automatically at 5:00 PM, or whenever it is you want your workday to stop.

Sure if you happen to be looking at it, you'll see it.

But at least hopefully it can give you the chance to close the laptop lid at a particular time and move on with your day and focus on family.

Yeah, I'd also make a comment: for the mobile apps, like, we use Teams internally in our organization, and they usually have quiet hours.

So for me personally, at like 6:00 PM, everything pretty much gets snoozed, and I don't see it till the next morning, which could be a good thing or it could be a bad thing.

But it's definitely helped me when trying to disconnect.

Yeah, but not for, like, uptime notifications and stuff.

Anything serious like that should be set up with escalation policies, actually.

Right. So those should be going to PagerDuty or OpsGenie or something like that, so that they escalate using that medium.

If it's urgent but overall, you can set, you can set different settings for different channels too.

Yeah, exactly.

Yes, we will.

We'll have a different.

Yeah, he's got a channel called alerts.

I would totally have different settings for the alerts channel than I would for the general channel or whatever.

Yeah, you can configure those settings.
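The routing idea in this exchange (urgent things escalate through a pager, everything else goes to chat and can wait out quiet hours) can be sketched in a few lines of Python. This is only an illustration of the policy as described on the call: the severity labels, the 6:00 PM to 8:00 AM window, and the destination names are assumptions, not settings from any particular tool.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Alert:
    name: str
    severity: str  # assumed labels: "critical", "warning", "info"


# Assumed quiet-hours window; it wraps past midnight.
QUIET_START = time(18, 0)
QUIET_END = time(8, 0)


def in_quiet_hours(now: time) -> bool:
    """True from QUIET_START until just before QUIET_END the next morning."""
    return now >= QUIET_START or now < QUIET_END


def route(alert: Alert, now: time) -> str:
    """Return 'page', 'chat', or 'hold' for an alert at a given time of day."""
    if alert.severity == "critical":
        return "page"  # escalation policy applies; quiet hours are ignored
    if in_quiet_hours(now):
        return "hold"  # delivered to chat once quiet hours end
    return "chat"


if __name__ == "__main__":
    print(route(Alert("database down", "critical"), time(23, 0)))
    print(route(Alert("disk 70% full", "warning"), time(23, 0)))
    print(route(Alert("disk 70% full", "warning"), time(10, 0)))
```

Keeping the critical path independent of chat settings is what lets people snooze Slack or Teams after hours without missing an outage.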

And if you guys do, like, an on-call rotation, those are the weeks where you never have quiet hours, you know.

So just think about her.

I've taken advantage of the team's feature the mobile app.

I tend to leave mine on because my team likes to just you know it even when we're outside of office hours we tend to like you know, we enjoy talking to each other.

And you know we'll put funny memes or whatever that we find.

And my software VP in particular is a night owl.

So he's up you know, every night at 10:30 doing interesting things because he is just one of those brilliant guys that is a manager.

But is smarter than I am at technical stuff.

And so he'll be up 10 30 posting links to sd 0 set.

So I like seeing that stuff.

But if it gets too much for me at any particular time, Slack has a snooze button: you can say snooze notifications for four hours or whatever.

And that's all I tend to do.

The other thing that's built into OS X is the notifications menu here: you slide up and you have this Do Not Disturb.

It's also helpful.

You also can just add click it.

And then at all unreal.

So like all options for childcare.

Yeah, we'll see what other.

I jotted down some other notes.

Oh, yeah.

One thing that wasn't brought up is whiteboarding.

This stuff has gotten really good.

It used to be horrible.

You know you see these chicken scratches on the screen that are unintelligible.

But if you have a tablet, like an iPad Pro with an Apple Pencil, together with either Microsoft Whiteboard, which is my personal favorite, or Google Jamboard (both of them are free), you can do really good, high-quality whiteboarding that is legible to others.

And then, if you're using Zoom, you can literally just share that screen on your tablet.

I could show you an example, if it's interesting.

Zoom even has fantastic whiteboarding features now.

Yes, Zoom does have pretty good stuff.

I would say it's a difference of whether this is something you want to persist and work on or collaborate on across Zoom sessions, something you want to centralize. Like, if you're using Jamboard, I mean, that fits into the whole G Suite, you know, office products.

Same with Microsoft Whiteboard.

So it's like you can continue to refer back to them and update them over a series of calls if you need to or even prepare for a call.

Yeah And as we see here.

Let's give it to you.

You just mentioned Microsoft's whiteboard and you're on a Mac.

I'm just hearing about this for the first time, and I only see that it's developed for Windows 10, so I was curious if you use it on your Mac.

So So my point is.

So my point with this.

Why they're so usable is with a stylus.

Right So I'm using the Apple pencil on that.

And it's as good as paper for me to write on there like the quality of my I think the quality of my sketches is just as good as if I was doing it in person somewhere got it.

OK I'll just throw this in there.

I use Evernote has pretty nice.

I'll do that with the apple pencil and you can you can share those sketches.

That's true Evernote has improved their work for sketching as well.

So I haven't what.

I haven't tried to do with Evernote is collaborating on the same sketch with other people.

I don't know how that is.

I know that works well with Whiteboard and Jamboard.

Yeah, that's a good question, because Evernote in general has been pretty poor at collaboration, real-time collaboration on a single note.

I always get note conflicts in that case.

I got to ask a silly question, but on the apple stylus can you.

Or maybe it's a software thing.

Can you change the shape of the tip.

And the size.

Well, yeah.

Yeah Well, that's on the software side.

So when you're using Jamboard or Whiteboard, you can change it from a pencil to a marker to a highlighter to a pen, with different widths for all of those, and grids.

So it helps you draw and they also have what do you call it.

I think it was called, but they'll auto detect the shapes.

So if you draw a circle it'll make it a perfect circle.

If that's it if you like that.

Yeah, it's like snap to whatever.

Yeah Yeah.

There are other tools you can look at on the profile.

But I put itself like stability and flow.

If you're really good.

The notes is another one.

And they're all tools all there.

But make for sketching flowing from Moscow.

And does the will divert as well.

It doesn't actually have that feature.

You just measure where you can draw certainly makes a perfect circle for you as well.

Yeah. What I liked about the Jamboard, though, is, like, if you are a G Suite shop, you have everything in one place.

Are you guys are you guys performing any interviews during this time.

Or are you guys going to put on all we are very firm.

We do.

I mean, we do remote interviews anyway.

So it's not really messing with it.

I mean, the very final one is in-person, but we could do a remote for that too. The in-person is just really: do they smell?

You know do they have good hygiene.

I mean, at this point, you've talked to them a bunch of times already.

So yeah.

And you guys like the whiteboard tool.

Possibly Yeah.

We use all kinds of stuff for interviews.

We've done some of like coding challenge type stuff that it's for some reason our legal department is it's giving us issues with that.

Yeah What the hell doesn't an announcement.

Actually, I recently tendered my resignation, so I'm onboarding at another company.

So I'll be doing the whole onboarding remotely as well.

So the next three weeks or two weeks and two days I'll be there.

You mean, there is an on site.

Well, there is a revolt. You catch him.

Congrats on the change.

Thanks interesting times to start.

No Yeah but you've been remote so much.

So gear I wanted to get to your question here while we have some time.

So you're pretty much a regular on these office hours, or have attended many of them.

You've heard our other talks on kind of like the Prometheus architectures.

Right. And I also have asked that a couple of times already.

But right now, I'm re implementing and rethinking.

Like I switch companies.

And we are currently like, OK.

And so that's why I like is it actually still the best thing to do.

It's just something else that I might be or should be looking out for.

So right now, my idea is one Prometheus Operator per cluster, which has short-term storage of maybe a week, and then one with long-term storage, which will go into Thanos, which I have never used before.

I've always used Elasticsearch for long-term metric data.

So yeah, I just wanted to get feedback on it and hear what you guys are doing in terms of this.

For example, what I really liked with Elasticsearch was that I could have jobs that basically would delete certain indexes after, like, three weeks or three months for different staging clusters, where the metrics are not that important for me for long-term storage.

But for production, I really would like to have some metrics, well, like forever.
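The per-environment retention described here (a few weeks or months for staging clusters, forever for production) can be sketched as a small Python function that picks which daily indexes are old enough to delete. The index naming scheme `metrics-<environment>-YYYY.MM.DD`, the environment names, and the exact retention periods are assumptions for illustration; the function only selects names and does not talk to Elasticsearch.

```python
from datetime import date, datetime, timedelta
from typing import Dict, List, Optional

# Assumed retention per environment; None means keep forever.
RETENTION: Dict[str, Optional[timedelta]] = {
    "staging": timedelta(weeks=3),
    "loadtest": timedelta(days=90),
    "production": None,
}

PREFIX = "metrics-"  # hypothetical index prefix


def expired_indices(index_names: List[str], today: date) -> List[str]:
    """Return the indexes whose age exceeds their environment's retention."""
    expired = []
    for name in index_names:
        if not name.startswith(PREFIX):
            continue  # not one of ours, leave it alone
        env_part, _, date_part = name[len(PREFIX):].rpartition("-")
        try:
            created = datetime.strptime(date_part, "%Y.%m.%d").date()
        except ValueError:
            continue  # unexpected name format, keep to be safe
        retention = RETENTION.get(env_part)
        if retention is None:
            continue  # production or unknown environment: never delete
        if today - created > retention:
            expired.append(name)
    return expired


if __name__ == "__main__":
    names = [
        "metrics-staging-2020.02.01",
        "metrics-staging-2020.03.10",
        "metrics-loadtest-2019.12.01",
        "metrics-production-2019.01.01",
    ]
    print(expired_indices(names, date(2020, 3, 18)))
```

Treating unknown environments as "keep" is a deliberate choice in this sketch, so a typo in an index name cannot cause data loss.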

Yeah And I forget who it was there was some participant.

Now this is probably back in December, November, and talked about Thanos.

So I don't have firsthand experience on Thanos.

And then there's another one competing against.

And so forget what it is.

Both of them had pros and cons and I wish I could find my notes on that plan.

Was it humor you know.

No it wasn't that one.

Anybody want to fill in while I do some rapid googling for what it's worth.

I took Erik and Andrew's advice on using EFS for Prometheus, and that's worked great for us.

I also got it working with my ephemeral clusters.

So the EFS is long-lived, but the Prometheus operators could be short-lived.

So, nice tool.

The interesting thing about it is it buys you a lot of runway, especially since you can provision more and more IOPS as necessary, and engineering time and effort is often more expensive than the provisioned IOPS, though.

So your mileage may vary with the scale of data you guys are dealing with.

I mean, if you're Facebook, it might be different.

But for most companies.

It's not that intense.

Plus when you go the tiered approach, the federated approach, with Prometheus, and you have multiple Prometheus instances with shorter retention... VictoriaMetrics was the other one.

Yeah And the challenge with some of these systems is they offload the a to another system that you still now have to manage.

And my concern with having a very complex monitoring infrastructure and architecture is then staying on top of monitoring your monitoring systems.

So the simpler this system is, the happier I am in my mental model, right.

So for me, the long-term storage is more like for historical data.

And if something is basically use the class us down what happened five minutes before that like stats for what it is actually meant to be.

And for alerting all the stuff that should be in the station cluster.

So that this will stay as simple as possible.

But long-term storage should still be there, in my opinion.

Without picture metrics or something you.

So actually have found something that I will look into.

So awesome.

Thanks for that.

And yeah, I think I found the original blog post. This might have been the one that compared Thanos with VictoriaMetrics, with the pros and cons of each and a pretty honest assessment of each one and the trade-offs. I am going to share that in office hours right now.

Thanks for that.

Yeah, I shared that as the thread of his question about I should residency.

Cool any other questions related to that or going back to the original talking point or any new questions.

It's really open ended here.

So if you haven't joined before we have quite a lot of people on the call here.

If you have any questions design decisions that you're trying to make in your organization is a great chance to get feedback on those.

And I have an abiding interest in any progress Andrew's made with the GitLab Helm charts.

The fact that we're the ones that actually work.

It works fine.

It's just complicated.

Yeah, I was.

Yeah, I had the same experience.

OK, well complicated like like all of these different things you know like external object.

Yeah, they're like there are lots of moving parts that don't necessarily line up.

I'd love for somebody to probably have a particular, I'm not doing so well the operator I want to get the operator to work because basically, I have I no longer have access to like unlimited data about us like I used to.

So I'm running a sort of cheapo DigitalOcean cluster where I'll sporadically bring stuff up and down.

So basically, I guess I just want a scale-to-zero GitLab server.

It doesn't have to be any particular object storage or any particular web server. I'm pretty flexible.

So if you're not.

If you're not going to use AWS S3, GitLab will provision MinIO. You used MinIO?

Yeah, I mean, you know, MinIO is sort of under the covers of a lot of little toy projects that I end up doing.

And that's good enough.

I suppose.

Yeah, we've been running MinIO on our prod cluster, actually.

So SAIC has this thing called the Innovation Factory, and part of it is this GitLab for people to use, because there wasn't really a good centralized Git solution that anyone could just go in and use.

But that's been in beta for about a year, because we just haven't had the resources to pour into it to get it ready for any kind of good SLOs or SLAs.

And so we started out just using MinIO and we're still using it.

And it works fine.

It's backed by EFS.

No problems, other than the other day, as in Monday, our EFS ran out of burst credits and everything came crashing down, to the point where it would not work at all.

So GitLab was completely unusable.

So all I had to do was go in and up the...

We weren't using provisioned IOPS at all.

And so I just provisioned some IOPS, and it was like a switch turned on.

I mean, it was like that.

Everything worked again.

You know it's.

$80 a month.

That's nothing.

You know, that's an hour and a half of my time.

You know.

So totally worth it.

I'm 100% on board with EFS.

I am not one of the doubters. All kinds of people say, oh yeah, don't run your stuff on NFS, don't run your database on it, or whatever.

Yeah, if you're Facebook.

Bad idea.

But we've been running it on EFS.

We've been running a Postgres database.

We've been running Gitaly, which is the backend service for all your, you know, command-line Git for GitLab.

We've been running MinIO off of EFS.

We've been running Jenkins off of EFS for over a year now.

And no problems whatsoever.

Zero zero problems.

Personally, I missed the first part there: where does GitLab depend on something like object storage, like MinIO? Is it mostly for the repositories, or where is it using generic object storage?

It doesn't require like tell you system.

I can tell you exactly what.

Yeah, exactly.

They do elicit a dependent docs but Yeah Yeah it'll do.

Well, you have to tell it what object storage.

Oh the registry.

Sorry, that's another important part.

So artifacts, backups, packages, plus the registry: those all go into buckets, into S3 buckets.

You don't have to use S3.

You can use MinIO, which is this open source tool that mimics the API of S3.

That makes perfect sense. When you said that, I was curious how they were doing Git on S3-like object storage.

And it seemed like a lot of work to implement. Gitaly itself does not; Gitaly is the backend service that does all the Git RPC stuff.

When you say git clone or whatever, you're talking to Gitaly. That doesn't use object storage. In Kubernetes it's a StatefulSet backed by a persistent volume claim, and that persistent volume claim is on EFS using the EFS provisioner.

Any additional questions related to this or new questions.

Yeah, the MinIO you're using... Exactly, so MinIO is a tool.

It's minack.

And you can you know, it's open source.

You can go get it on GitHub or whatever.

There's a helm chart for it and everything.

And it's a tool where you can basically host, on-premise,

an S3.

It has the S3 protocol, right?

Yeah, it's exactly the same API as Amazon S3.

So literally you can like you could have it.

It does require local storage underneath for it to do what it's doing, obviously, right?

Oh no, it uses EFS too.

I mean, it's just it all it requires is a persistent volume claim.

Right, to put things in. Build your own S3, kind of.

Yeah, that's exactly what it is.

It's played out on the street.

But what's cool about it is that for tools that use Amazon S3, MinIO is a drop-in replacement.

All you have to do is change the URL that it goes to. Amazon S3 is s3.amazonaws.com or whatever.

That's Amazon.

That's the URL for S3.

If you change it to wherever your MinIO is being served,

everything works.

It's all the same protocols, all the same authentication.

Yeah, it works great.
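To make the "just change the URL" point concrete, here is a minimal Python sketch. It only builds the connection settings for an S3 client; the MinIO hostname, port, and credentials are made-up placeholders, not anything from the call. The keyword names match what boto3.client("s3", ...) accepts, but boto3 itself is not imported so the sketch runs anywhere.

```python
from urllib.parse import urlparse

# Default public endpoint for Amazon S3.
AWS_S3_ENDPOINT = "https://s3.amazonaws.com"


def s3_client_kwargs(endpoint_url=AWS_S3_ENDPOINT,
                     access_key="EXAMPLE_KEY",
                     secret_key="EXAMPLE_SECRET"):
    """Build keyword arguments for an S3 client.

    Only endpoint_url differs between Amazon S3 and a self-hosted,
    S3-compatible server; the credential fields keep the same names.
    The default credentials are placeholders for illustration.
    """
    parsed = urlparse(endpoint_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a usable endpoint URL: {endpoint_url!r}")
    return {
        "endpoint_url": endpoint_url,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


# Talking to Amazon S3.
aws_kwargs = s3_client_kwargs()

# Talking to MinIO: same call, different URL (hypothetical in-cluster address).
minio_kwargs = s3_client_kwargs("http://minio.example.internal:9000")

# With boto3 installed, either dict can be passed straight through:
#   client = boto3.client("s3", **minio_kwargs)
print(aws_kwargs["endpoint_url"])
print(minio_kwargs["endpoint_url"])
```

Everything else in the calling code stays the same, which is why tools written against S3 usually only need an endpoint setting to point at MinIO.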

My understanding is that the architecture of MinIO isn't too radically complicated either in terms of components

and services, right?

So deploying it in Kubernetes is just one pod.

Yeah which is that's pretty amazing.

Yeah could he be up at all or anything like that per adding more feature functionality to this whole container base.

Sort of abstraction layer. I think if I were to get more advanced on storage right now, it would be with Rook Ceph.

I think Rook Ceph is tending to be the de facto favorite child right now for Kubernetes.

I'm actually trying to experiment with it on my Raspberry Pi cluster, with distributed object storage.

It seems pretty straightforward, simple enough in the face.

Are you using Ceph with that, or which one? Rook Ceph?

Yeah, so I've got the external USB drives hooked into the Raspberry Pis, and then Ceph, because Rook has multiple types of backend providers. You're thinking just Ceph, though?

Yeah Yeah.

Rook and Ceph are two different tools.

And I'm not I'm not you know I know how to spell them.

That's about that's about it at this point.

But since Rook is the CNCF-certified, or whatever, storage operator of choice for Kubernetes, it's going to get the best support compared to things like Gluster. Most people, when I talk to them about Gluster, say, oh, don't use that, it's a dumpster fire.

Yeah, it's a dumpster fire.

Totally that's the only one I can testify to firsthand experience.

I think many actually have a success story with Gluster.

I'd like to hear that. This OpenEBS stuff is becoming popular too.

Might be worth keeping an eye on.

Yeah, I think pretty well.

If you don't go with Rook, you end up using OpenEBS.

So either one should be fine.

Any tips and tricks for using EFS at large capacity? Is it better to have one large volume and just segregate it by path, or how do you slice it?

Price-wise, it's definitely better to have one large EFS, because the amount of IOPS you get is directly proportional to the number of gigabytes of data you're storing.

So if you've got 10 EFS instances that all have 80 gigabytes in them, the IOPS that you get from each one is tiny, but if you have one EFS with 800 gigabytes in it, you get a lot more
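The arithmetic behind that can be sketched in a few lines of Python. This assumes the bursting-mode rule of thumb from the AWS documentation of the time, a baseline of roughly 50 KiB/s of throughput per GiB stored; note that EFS scales throughput with size rather than IOPS as such, and the constant is an assumption to check against current AWS docs, not something stated on the call.

```python
# Assumed EFS bursting-mode baseline: about 50 KiB/s per GiB stored.
# Verify against current AWS documentation before relying on it.
BASELINE_KIB_PER_SEC_PER_GIB = 50


def baseline_throughput_mibps(stored_gib):
    """Baseline (non-burst) throughput in MiB/s for a file system of this size."""
    return stored_gib * BASELINE_KIB_PER_SEC_PER_GIB / 1024


small = baseline_throughput_mibps(80)    # one of ten 80 GiB file systems
large = baseline_throughput_mibps(800)   # a single 800 GiB file system

print(f"80 GiB file system:  {small:.2f} MiB/s baseline")
print(f"800 GiB file system: {large:.2f} MiB/s baseline")
```

The total across ten small file systems is the same as the one large one, but with a single file system any one workload can draw on the whole pooled budget, which is also where the noisy-neighbor concern comes from.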

IOPS. And then, of course, you can provision more throughput too, like on Monday.

I went and provisioned 20 MB/s.

And I think it's probably more than I need.

But you're able to change it like once every 24 hours.

So I can bring it down.

The house like 90 bucks, man.

Yes Yeah.

So I think one would be good.

But then you have to worry about blast radius concern that's right.

If all of a sudden a database is going crazy on your EFS volume,

Other stuff doesn't work.

So I don't let anyone touch the EFS other than the EFS provisioner.

No one else has access to it.

So with everything you have deployed using that EFS file system, you might have the noisy neighbor problem.

Yeah Yeah.

If you're sharing a big one, when it comes to IOPS, like if you run out of burst credits.

Definitely Yeah.

I mean, that's the problem we ran into on Monday.

As long as you have burst credits, then, yeah, you're fine.

That's what we.

That was one of the things that we ran into with Prometheus in the early days: the volume was so small that we didn't have enough burst credits.

So we ended up having to artificially... well, in our case, we provisioned more IOPS.

I know, Andrew, you've also said you can just write a zero-filled file, just garbage data, to increase the size of the file system.

Yeah, you can do that to you like you've got to just work out the pricing would you expect me to resist paying for it and keep the architecture.

Simple then.

Yeah imaging.

Yeah And I know I didn't do that when it came to actually needing to fix this thing when it all came crashing down on Monday.

I just provisioned for throughput.

Yeah, it's kind of like provisioned IOPS on RDS.

Yes It's just more expensive than just expanding the disk size.

Mm-hmm So yeah I've actually been considering and it's going to be a battle getting my team on board.

You know, they think RDS is God's greatest gift to humanity.

But with the way, I'm not I don't see the industry.

But with the way my industry and the government space is going, they really put a premium on making things

cloud agnostic.

So I've actually been a lot more interested in, and doing some more work on, looking at... everybody tells me not to run a database in Kubernetes.

But I kind of want to run a database in Kubernetes.

So you should look at CockroachDB. You've never lived until you've upgraded your whole Cockroach cluster with, like, 10 minutes' worth of planning and five minutes of execution,

and hardly even a blip of an outage.

So it does.

Man it feels pretty good.

Thompson So well that Wall Street felt like something else after they have shots.

Borkowski I got released.

I mentioned right.

Yeah, I had happily customized charts.

Yeah well this has got all kinds of databases.

Yeah So this is open.

I've heard of this before.

Yeah, and it's operators for Kubernetes that carry the business logic for managing these services on Kubernetes.

Oh, that's awesome.

Well, you know, I can't speak from firsthand account.

I just know that that's what their prerogative.

She just my business model.

I bought that site.

And I'm just sort of unclear where they're coming from.

But they make some really great stuff.

Do pod disruption budgets help?

I mean, can you set them up to, like, cap the amount of data storage access?

No, that's more about how frequently Kubernetes can nuke it and move it somewhere else.

Oh, OK.

Yeah Yeah.

So it's maintaining stability of the service.

Yeah over rebalancing pods in the cluster that's beautiful.
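As a rough illustration of what a pod disruption budget expresses, here is a small Python sketch that builds a PodDisruptionBudget manifest as a dictionary and prints it as JSON, which kubectl accepts as well as YAML. The names db-pdb and my-database are hypothetical, and the apiVersion shown is the one on current clusters; clusters from around the time of this recording used policy/v1beta1.

```python
import json


def pod_disruption_budget(name, app_label, min_available):
    """Return a PodDisruptionBudget manifest as a plain dict.

    min_available is the number of matching pods that must stay up during
    voluntary disruptions such as node drains. It says nothing about
    storage or I/O limits.
    """
    return {
        "apiVersion": "policy/v1",  # policy/v1beta1 on older clusters
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


# Hypothetical example: keep at least 2 database pods running during drains.
manifest = pod_disruption_budget("db-pdb", "my-database", 2)
print(json.dumps(manifest, indent=2))
```

So the budget limits how many pods Kubernetes may evict at once; capping storage throughput would have to be handled elsewhere.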

I am definitely going to check this out.

Well cool.

Yeah, I mean, we don't run production grade databases in the cluster but we've had no problem with staging and other stuff in the cluster.

So why? Mostly personnel issues: we don't have enough time to really understand that system.

But RDS is well understood.

So more of just a it's a safety fallback for us, unless we can actually engineer that test it, build it, make sure it works well and actually monitor it and operate it.

Well, it's a little more nuanced worth.

It's not worth the risk to us to not use it.

Yes that's fair.

If you've got a team of, like, 10 guys, you can fit that in.

Go for it.

We've got three guys.

That's not enough.

That's a really good point.

And it also goes back to that wise comment Chris Fowles made: if you're introducing software like this and you don't have the resources to manage the lifecycle of it,

it's going to be in the critical path and be a problem.

That's my paraphrasing his statement, which was, if you can't stand the heat, get out of the kitchen.

Yeah, more or less awesome, guys.

So that brings us to the end of the hour.

Thank you for sharing all the tips from working from home.

Brian, I expect you to be productive

now during the next two weeks as a result of this.

Thanks, everyone, for sharing.

Remember to register for our weekly office hours if you haven't already.

Go to cloudposse.com/office-hours. A recording of this call will be posted to the office hours channel as well as syndicated to our podcast at podcast.cloudposse.com, so you can subscribe using whatever podcast software you use.

See you next week.

Same place, same time.

All right, guys.

But I use.

Public “Office Hours” (2020-03-11)

Erik OstermanMarch 11, 2020Office Hours

Here's the recording from our DevOps “Office Hours” session on 2020-03-11.

We hold public “Office Hours” every Wednesday at 11:30am PST to answer questions on all things DevOps/Terraform/Kubernetes/CICD related.

These “lunch & learn” style sessions are totally free and really just an opportunity to talk shop, ask questions and get answers.

Register here: cloudposse.com/office-hours

Machine Generated Transcript

Let's get the show started.

Welcome to Office hours.

It's March 11, 2020.

My name is Erik Osterman and I'll be leading the conversation.

I'm the CEO and founder of Cloud Posse.

We are a DevOps accelerator that helps startups own their infrastructure in record time by building it for you and then showing you the ropes.

For those of you new to the call the format is very informal.

My goal is to get your questions answered.

So feel free to unmute yourself at any time you want to jump in and participate.

If you're tuning in from our podcast or YouTube channel, you can register for these live interactive sessions by going to cloudposse.com/office-hours.

Again, that's cloudposse.com/office-hours.

We host these calls every week. We'll automatically post a video recording of this session to the office hours channel on our Slack team, as well as follow up with an email so you can share it with your team.

If you want to share something in private just ask.

And we can temporarily suspend the recording.

With that said let's kick this off.

So what we have today are a couple of talking points that came up in the last day or so at least in my own dabbling here.

One thing I'm really excited about is that it was just announced yesterday or something that EKS now has envelope encryption for secrets; that is, those secrets are separately encrypted with a KMS key.

Not only that the e the Terraform.

Yes OK I linked to the wrong issue here.

That's the EKS module by the guys at terraform-aws-modules.

There is a referenced pull request into the Terraform provider to add support for this; it's already supported. And then the other thing is Helm, too.

I'm excited about this one.

Helm 3.2 is going to restore the functionality to create namespaces automatically for you.

I totally get your point.

Andrew I saw your comment there by baking gobbler on why it's nice sometimes not to have this functionality.

There are use cases, very frequently in preview environments where we bring up environments from scratch, where we want to have that namespace created for us. In that case having Helm do it just reduces the number of escape hatches we need to use to get stuff deployed, and that's nice. Any other news you guys have seen

that you'd like to call out? People around the world are doing a lot of work-from-home stuff.

Yeah about that.

I'm hoping that might kick off some kind of new wave of revolution.

More working.

Yeah work from home revolution might be cool.

Yeah and create even more problems in the commercial real estate sector.

As if retail stores shutting down weren't enough now.

Next thing we know, Google announces they're closing offices. I've been working from home for the past five or six years, and it's definitely better than going to the office.

Yeah 2 and 1/2 years for me I think it goes both ways obviously.

I'm going right now.

But that's why we have these office hours so we get some of the same banter.

So I've done.

I did two years.

I started an LLC you know my own business.

It was like you know tiny little consultant shop.

I did two years of that and got super lonely because it was literally the only people I talked to all day were customers.

And you have to be like on all the time when you're talking to a customer.

Now I'm going on two years with a team of like 10 people and we talk every day on Zoom and slack and everything and that's been 100% better.

Do you guys have like just channels you can join at any time and talk or hang out or get there or is it always.

Absolutely. There's definitely a random channel, and we have, like, video channels, or video rooms,

so to say, where people are just hanging out working. My team has a couple of different team accounts, and someone's usually in one of them, but not always.

I'm just curious if that's worked for anyone.

One thing we used to do as part of our help desk team, since the help desk was global: we'd have everybody on a Zoom all day long.

Meaning from when your shift started to when your shift ended, you were on a Zoom the whole time, and for the first two weeks you're kind of like, what the hell is going on, these people are just constantly watching me.

But then you realize it's so helpful because you could literally look up and be like oh like Tom's online like hey Tom.

Can you... you know, it is just so much easier, especially in a global help desk team, compared to somebody sitting next to you.

So I know that is one thing that I've done in the past, where you just had this one Zoom ID, the help desk Zoom, and everybody was just always on, and everyone's muted by default, and when you want to... I guess that's kind of interesting, and it's kind of a take on the Slack channel. Did people protest that at all?

I guess as people join the team like the first week or two is definitely kind of weird but you definitely realize the benefit of it though because like let's say a guy that's sitting next to you is actually at lunch you can just like hey you know I'm in New York you're in Austin or you're in London like I need help with this.

Oh Yeah I got you right now.

And it's just I feel like you solve problems a lot quicker.

Yeah that's kind of the benefit that we got out of it.

And you know I think there were.

Yeah I totally agree with that.

And then it's just like all right if you feel uncomfortable it's like you know just turn off your video when you're not at your desk you know.

Exactly but some people barely have to give up any privacy at all.

I mean turn you turn Video it's like Yeah.

And then if you're not there, then it's like all right you're not available.

But like being on the Zoom it's like hey can I bug you for this.

And if you're missing, it's like, oh, you might be with another customer or something. On my team, I won't say we're doing XP yet, but we've started doing a lot of pair programming.

And so that's been really nice on Zoom too. Typical XP is, you know, two desks, two chairs, two keyboards, two monitors, one computer.

That's like typical XP and you can you can mimic it with Zoom.

One person hosts and the other person clicks requests keyboard and Mouse Control and that way they can break in whenever they want.

And it's been nice. Yeah, that goes in line a little bit with what you were asking before we kicked off office hours, actually.

Your question was kind of what's your protocol for when it's OK to give somebody the keys to the kingdom.

Like what's the process for that.

How do you determine that somebody is ready for that level of responsibility and trust when they don't want it.

You know I mean over.

I don't know if people are over eager for it.

Then there's this I'm you know it's like how much do you really need.

You know, I try to eliminate those rights for myself where I can.

And, you know, typically that overeagerness disappears.

So it's a little bit like on a need to know basis.

And I think those need-to-know cases come up as their responsibility naturally increases.

So I don't think that one needs to give that out automatically or by any compulsory milestone lessons more Yeah for sure.

Cool any questions or interesting observations.

Anybody else? Highlights from the community?

This AWS Bottlerocket container OS.

What's interesting is that it's been around for a while.

I just stumbled across it today.

They just announced it.

Yeah but I think I've seen some mentions of it.

I'm a little burned out on the container-native OS thing, just with the number of OSes out there.

And then, you know, so like CoreOS, that's what it was, and it just went, well...

Like last week, I was on the same thing with RancherOS.

You know what.

When I looked at Bottlerocket I was like, man, this sounds a lot like RancherOS.

And I was looking at RancherOS, and then all of a sudden I found out they're not...

You know they're not working on it anymore.

And I was like oh OK.

Yeah, so the timing is a little bit off to come out with another OS when so many OSes are getting killed.

Well, if you're rolling your own Kubernetes clusters, what OS would you use?

Well on Amazon I'm just going to use the standard Amazon Linux whatever they ship default.

And if they're going to make this the default fine so be it.

I just I don't I basically I don't want to be concerned with it at the level that we operate in.

Different companies have many different requirements.

Yeah, I think certain enterprises, maybe like Disney or something, require that you must run this version of enterprise Red Hat.

But we try not to play that game if we can avoid it.

And companies get into that because they have their own RPM distributions or whatever, and their own signed packages, and their own way of doing it.

Teams that manage that.

Which then makes it even less palatable to pick:

OK, we're going to now suddenly use Bottlerocket, which has no historical proof that it's going to stick around for a while.

Now Amazon lets this be really interesting.

I don't know what services Amazon has deprecated in the last 12 months, for example, compared to, say, Google or others. I don't have an answer for that. When it stops making them any more money. Well, that's Google's thing, right?

That's Google's strategy right.

But Amazon has been a little bit more commit haven't you know more relationship commitment based on the.

The concern I would have with Bottlerocket is

I mean, just go to the GitHub repo and read the README, and you can immediately tell that the vast majority of their efforts are going to be focused on EKS on AWS only.

And so if you come in... like, there's a comment that got added.

I don't remember when it got added, but it was like, hey, this would be awesome on Raspberry Pi.

Not a single response.

But it's.

But it's like Yeah.

OK, Amazon's making Bottlerocket.

And they've already said their first, you know,

variant or whatever is going to be for EKS.

So unless you're using EKS, it's not for you yet.

It's going to take a while for all the variants to come out.

So I've been doing a lot of work with... the Air Force has this new initiative called DSOP, and it's got a bunch of different names, but Platform One is another name for it.

And this guy Nicolas Chaillan, he's the Air Force Chief Software Officer.

He's this guy from France.

He's got this crazy French accent, but he's brilliant. He's a serial entrepreneur.

He was a you know multi-millionaire by like 25.

And he's got a ton of patents and stuff. So he's kind of leading the charge on DevSecOps inside DoD.

And so he's got this whole initiative going with, you know, an OCI-container-native Kubernetes platform, and it's completely vendor agnostic.

So it's you know all my efforts lately have been a land up.

No we're not using that we're using native or whatever you know and/or.

So this whole Bottlerocket thing, he would just go, you know, that's AWS.

That's we're not using that.

No way.

If I can't install it anywhere in the world you know including on a frickin' Humvee I'm not interested.

Interesting so basically going for the lowest common denominator across these systems.

100 percent yes.

So they're using a lot of OpenShift, but they're specifically saying you're not allowed to use any of OpenShift's special sauce.

You know you can't use.

You're not allowed to use the OpenShift build runtime, or whatever it's called, and all that other special OpenShift stuff. You have to use whatever is CNCF compliant. Is OpenShift making any strides, or is it infeasible, rather, to have it installed strictly on top of Kubernetes?

Does it always have to be installed at the same level as the control plane itself? OpenShift is a distribution of Kubernetes, yeah.

So it is in place of vanilla Kubernetes. There's a bunch of different ones out there now: there's VMware Tanzu, there's OpenShift.

There was Rancher, but my understanding is Rancher is now building on top of... like, you can run Rancher on Kubernetes, yeah.

Right, Rancher's just a Helm deployment.

Now you can deploy it anywhere, on any Kubernetes cluster, with the dumbest namespace ever: cattle-system.

It's cattle-system.

Yeah I don't know.

And it's like you can't change it.

You try to change it and it breaks everything.

Exactly. You know, Rancher's nice.

I mean use it for a while now and that's what's going on here.

I'm liking ads.

I don't I've never quite gotten far enough along with the to get the value add.

So for me it's the user management.

It's so easy to provision users and say, OK, this is a new team that I'm bringing on and they need access to these five namespaces.

So I can hook it up to LDAP or whatever I want, and I can say, OK, these five users have access to these five namespaces, and done.

It seems like that should be a CRD type of deployment that you can just do declaratively.

When you add a new team. You know, there's a Terraform provider; I can make Terraform for any of that stuff.

Yeah, I just wondered how people are using Kubernetes to make business-logic types of constructs like that.

I'm looking forward to that day.

So yeah, speaking of CRDs, any new discoveries,

Zack? No.

No, I've been knee-deep in trying to figure out whether I'm making a huge mistake in pushing out managed Postgres schemas and users and stuff like that through Terraform or not.

So unfortunately most as opposed to as opposed to just having it all in some custom script somewhere you know.

I mean, right now I'm working around managed Postgres. It might be my own ignorance when it comes to Terraform, but at what point do you draw the line between the actual configuration of the system and the provisioning of it?

Right, so with managed Postgres, during deployments we need to have a firewall rule that's going to only allow certain IPs through.

So if you want to have pipelines that also do these updates and run Terraform to do these things, you need to be able to apply, basically, the CI/CD firewall access to make these changes to this managed Postgres instance.

And if you want to do that then you have to have some sort of dependency.

So you can't really use the provider for Postgres; with providers you can't have any real dependencies.

So I'm just working around all sorts of weird oddball issues like that and realizing how much Terraform I love and I hate at the same time.

Yeah and that was our experience.

Like in the provider thing, you're saying you can't provision the database and set up the permissions in the same project.

Is that the case for in general am I. Yeah no that's the case because.

Well, unless it's changed recently, we had the same problem: basically, the provider errors because the hostname doesn't exist, because you haven't created it yet.

All right.

So it's like day-two operations versus day zero. Same thing doing Amazon MQ services: you have to create the config, an XML config, but you can't create it until you know the hostnames of the brokers that you want networked together.

And it's kind of ugly passing in some count flags to make it do something really simple.

And then increment it to make it do something more like what you want.

Yeah, it's interesting. That's the same kind of scenario. Chicken-and-egg scenarios abound in Terraform.

So yeah, that cold start thing is one thing we don't overinvest in; we focus more on what the day-to-day operations are going to be like, adding or removing stuff, and worrying about that.

That being said how would you then move forward and do like a pass system where you add another type of craft came out or something along those lines.

Do you use Terraform or do you just make it into some other pipeline from custom script.

It basically comes down to this: what you're describing is a pipeline of operations, so move that into whatever system your organization has adopted. This is the problem with Terraform itself: there's no concept of an operator, unless you consider a provider to be that, and dropping into Go

every time we want to do that isn't what I think would be our solution for that problem.

Write a lot of people in my company would say to use ServiceNow everyone's ServiceNow there's some say service not cool you them fascinating.

Cool any other questions or talking points.

Oh I think someone cheesy had one right one selfishly.

What's everyone's thoughts on AWS certification?

I'll get one point of view.

I mean my.

Yeah, so AWS certifications are more valuable if you're going to go into working with enterprises, which use that whole résumé-filtering type of system.

The other is depending on the kind of company you want to work with.

So a consultancy like Cloud Posse, we move up the ranks the more certified engineers we have.

So you know that that makes an engineer who passes all the other requirements.

more interesting if they also happen to have AWS certifications, because we can then move towards the Advanced or Premier tiers.

What do you.

I guess.

What do you mean by.

You mean Cloud Posse moves up the rankings? So there are different tiers of AWS

partners, yes.

OK OK.

OK Yeah exactly.

OK, like how Microsoft has the gold partner, you know.

Exactly OK.

So that becomes that makes you more competitive if you want to work for a consultancy like that other than that.

What I think is great about our industry in general is it's a meritocracy.

It's based on what you've done recently and your accomplishments.

So that speaks, I think, more than just your technical understanding from some of these certifications, which can't make up for experience, for sure.

Yeah I kind of see it the same way as your GPA in college right.

When was the last time someone asked for your GPA in college? It was when you were applying for your first job, and hopefully nobody's been asked that question after their first job.

Right so therefore.

Yeah, I don't really have any certifications, and it hasn't hurt me yet.

And I've worked with a lot of people who have all kinds of certifications and are terrible so you know, just because you have a certification doesn't mean you're good.

I recently went through the process for Cloud Posse, and, you know, I've been working with AWS since 2006 or so.

So a really, really long time, since it was in private beta.

And many questions were very ambiguous to me, based on all my experience working with AWS.

And clearly they're prodding for one answer, and that answer is very much based on their documentation, so much so that you can often search for it and find that wording. But if you learned it more organically, it can be like, well, do you do it this way or this way or this way?

And that was my frustration.

So I had to literally without the flashcards and memorized the wording to get it right.

Yeah, because I'm looking into it now. I've been working with AWS heavily for the past two years and I know how our company uses it, and I almost took it as an opportunity. I think the Certified Solutions Architect Associate level gives you a big picture of AWS in general. Just studying for it, I think, is a great way to rapidly increase your exposure to all of that stuff.

And like anything in life I would do it if it's worth it for you.

I wouldn't do it just to reach some level, unless your goal is to work at a company that requires it; then that's the objective.

But if you're doing it because hypothetically this could help your job prospects that's maybe not a concrete enough goal.

OK good enough.

Yeah I think I'd already decided I just wanted to validate my decision.

No I appreciate it.

And it's also Yeah no let's leave it at that.

So Casey Kent asks a question in chat here.

Question, when you can get to it.

When do you think it's necessary to provision another cluster? Thinking about doing this to put data-related Kubernetes deployments, like Airflow and ETL, away from production infrastructure.

I could also add some tainted nodes on the production cluster as well.

Zack answers.

Personally, I like per-project clusters, and possibly dedicated stateful data and shared services clusters as well.

So yeah my two cents on this is so I can say what we've been doing.

And then I can share some of the pros and cons with our approach that I have to reconcile.

I don't think we have the perfect answer, or that anyone does. But Kubernetes is in itself a platform, right?

And there's two ways of looking at it.

One is that your operations team (depending on how big your company is, if you even have one) can be providing Kubernetes as a platform, where that platform is almost like Amazon is a platform.

So there is one production tier of Amazon for all of us.

Everyone in this Zoom session here, we're all using the same Amazon.

It's not like we have access to a staging Amazon.

Amazon is providing Amazon as a service to all of us at the same tier.

So you as a company, could be doing that with Kubernetes.

That means your staging environments, your preview environments, acceptance testing, data, everything could be running on that same Kubernetes platform, and that platform would need an SLA corresponding to the service with the highest SLA.

Now, what we've been doing at Cloud Posse is not that approach, because ultimately you need to dogfood your clusters. You need to have environments where you can be testing operational work at the platform layer outside of production, and if you're doing this strictly in a test environment, you don't have real-world usage going on and it's harder to pick up some of the kinds of things that can go wrong.

So what we predominantly do in a typical engagement is roll out a minimum of three clusters and up to about five clusters, and they work out like this.

We have a dev cluster; that's a sandbox account where we can literally do anything. There are basically zero SLAs on that cluster or that environment.

Then we have a staging cluster.

This is as good as production, without the strings attached of having to notify customers or anything like that if anything goes down, and it allows the team to build up operational competency in a production kind of environment.

And then we have a data cluster. This is kind of addressing your question directly, Casey. The data cluster is more for a specific team, and that team tends to be the data scientists, the machine learning folks. To operate in that environment they typically need elevated access to other kinds of data, data that perhaps emanates from production or different resources or endpoints.

So that cluster will have its own VPC and its own VPC gateways, and be in its own AWS account.

And then we can better control IAM and egress from that cluster.

And then lastly, there's the production cluster.

For the production cluster, what I mean by that is it's production for your end users, your customers.

But there's an Achilles heel to what I've described here, and it's that every cluster I describe is production for some user group.

So the staging cluster is more or less production, inwards facing, for the company and your QA engineers, and everything comes to a grinding halt from a QA process and testing process if that cluster is offline.

One other cluster I forgot to mention is a corp cluster. The corp cluster sits in a separate AWS account and is for internal-facing apps.

So it's like production for engineering. It runs your Keycloak servers and perhaps your Jenkins servers, et cetera.

Your point about running multiple node pools is a good one and I still think that is another tactic to have in your toolbox that you should use.

A perfect example is if you are for some reason running Atlantis. We would probably then say run Atlantis maybe in your corp cluster, but should it run in a standard node pool?

Probably not.

You should probably run it in a separate node pool that's more locked down in terms of what can run there.

And then that cluster as a whole, you really want to lock that down, because you have this God pod there that you can exec into and do anything.

So this is another example of the considerations for when you want to have really separate, segmented clusters and environments, where the reality of just using IAM and RBAC and all those things to lock it down is, in my mind, a little bit insufficient.

So the problem here is that we have this corp cluster that's production for internal apps. So, you know, it's OK if it's down a little bit, but it shouldn't be down a lot.

And then, yeah, your staging cluster, which is production for QA.

You have your dev cluster, which is kind of production for developers to do testing and stuff.

So, you know, the more unstable that is, the more everything else is impacted. And then production for your customers.

And the configuration of these clusters is more or less different, out of necessity, because they have different roles.

Does that mean we need a concept of staging for each of these clusters? Arguably, yes. It's just that we haven't found a customer that wants to invest in that level of automation.

Related to this, there was a Hacker News post one or two weeks ago announcing GKE's new pricing change for the control plane, and a lot of people were up in arms over that change. I forget exactly the context that led to this comment, but the comment was by Seth Vargo, and his comment was: wait, so why are you guys treating your clusters as pets?

They should be, you know, cattle, the clusters themselves. Part of me reacts like, all right, you're coming from Google. I know you do some amazing engineering over there and you're able to do all of that type of stuff, like Netflix does as well; they do full-on freaking blue-green regions and availability zones and all that stuff.

It's just that most people don't have the engineering bandwidth in-house to be able to orchestrate that with regularity.

And the reality is clusters have integration touch points to all the systems that you depend on.

There are API keys and secrets, webhooks and callbacks, and all of that stuff.

And orchestrating that from 0 to the end is a monumental task.

My point is it'd be nice if each of these clusters were production like I described, but also had the equivalent of like a blue green strategy that would allow a staggered rollout of each of these environments.

That was a mouthful.

Any questions on what I said, or clarifications?

Somebody also shared a link. Andrew Roth shared a link on how many clusters. I haven't seen that link.

Andrew do you want to summarize it.

I read this article a few months ago.

And it really brought home, you know, kind of the dilemma, because the core of the question is: you can do one big shared cluster, and it's cheaper and it's easier to manage, but then your blast radius is really big. Or you can go all the way to the other side and have clusters all over the place.

But with this table, for your particular environment you just have to pick which little box in the table you're in.

To summarize, it didn't give an answer.

I read this article as well.

Google also put forth some recommendations around this, and they explicitly recommend per-team or per-project types of setups.

I'll try to find a link and send it out here shortly, but yeah, there isn't an answer.

You know it's completely nebulous at the moment.

Well, there can never be an answer, I think, is the bigger point. You select which of these you're optimizing for. You can optimize for solving one of these.

You can't optimize for solving the entire matrix, right?

We're just getting started with Rancher, but I think it's going to make our lives easier when it comes to this kind of stuff, because we're going to be able to centralize user management but decentralize the clusters themselves.

So if I have a new team, I will manage those users in Rancher, create a cluster all for themselves, and use Rancher to give them access to that cluster.

And you're using kops, for all the reasons you mentioned?

I think we're going to get away from kops because, yeah, it's kind of not very secure.

We think RKE is what we're looking at right now. And OpenShift has really elaborate permissions.

Yeah, I think the other thing you were mentioning earlier is RKE.

RKE is Rancher Kubernetes Engine, OK.

It's a pretty nice offering there.

You have your servers, and it's how you install Kubernetes.

You know, it's an alternative to, like, kubeadm.

Gotcha. And it's a lot simpler.

It's one text config file, YAML, and you point it at the config file and you say rke up, and bam, there's your Kubernetes cluster.

And there is a currently-in-development, kind of beta Terraform provider that I've used that worked really well, actually.

So I'm excited about it.

Also for RKE?

Yep. So in one swift stroke I was able to use Terraform to provision EC2 nodes, then come along afterwards with RKE to install Kubernetes, and then even come along after that with the Helm provider for Terraform to install Rancher.

And so with one terraform apply I went from absolutely nothing to a VPC with nodes in it, with Kubernetes installed, with Rancher installed using Helm.

Look at that. Sounds pretty cool.

That was very exciting to me, and it all works perfectly.

Has the Helm provider been updated yet to support Helm 3?

I don't know, but I'm going to be looking into that soon myself.

Yeah, I looked into that maybe a month or so ago.

I was curious.

Any movement on that? As fast as many things in the Terraform ecosystem move, I was a little bit surprised that Helm 3 hadn't been supported already, since it was in beta for quite some time before going GA.

Well, if anything it should be easier, because there's no Tiller.

Yeah it again.

So, to do with the whole GitOps movement:

You guys are very heavy into Helmfile.

Do you see yourselves getting away from Helmfile and using something like Argo at some point?

All right.

I see.

I can do home.

I can.

Yeah Yeah.

Argo is kind of the workflow management, so yeah.

Yeah, if you need Helmfile less, you can define it in Argo.

I would say part of our challenge is that we need to be able to describe reusable components that we can use across customers and implementations, and Helmfile is kind of that reusable component that lets us describe the logic of how to do it.

There are things I love and hate about Helmfile, and part of the thing is that it's just been the Swiss army knife for us to solve anything.

In the end, the end user experience is not that bad.

Once we once we get to using environments I'll show you an example.

Then it's really quite nice.

So if I quickly go to look at an example of what that is: project X, and let's go to helmfiles, go to like or castle maybe.

So all we ultimately expose is just a very simple schema like this for what you need to change. It doesn't matter how nasty the rest is; the actual helmfile for this could be rather nasty, just like everything else, and Terraform can be pretty nasty.

So this is the schema that the maintainer of that chart exposed, which we have no control over, but we reduce that to our opinionated version of all you need to care about.

Yeah, we're doing something similar with helmfiles now. We've done GitLab, Keycloak, OpenLDAP, Clair. It looks a little dirty.

I'm trying to get her engagement.

I've been on.

Oh Yeah.

I used it single-handedly to construct and roll out a whole batch of clusters for our client as they made changes on the fly to their requirements.

So it is definitely a good glue tool.

Where I did struggle is bridging that gap between Argo-type applications and Helmfile.

So I mean, I was looking at the Helmfile operator, amongst other things, to ease that transition.

But I never really got that far.

So yeah, I did notice that there are ways to make Helmfile work through Argo.

It's not something that they do by default, but there are custom tools that you can apply to Argo to make it work.

It is undeniably one of my saving graces in this current project.

The reason why I'm even on this call is because of your helmfiles repo.

Thank you.

By the way of which you know.

So we're refactoring a lot of our helmfiles stuff to support the latest stuff I just showed you here, and we'll be contributing that upstream later on this year.

I can't say when, but as soon as the dust settles we'll get back to that. One thing that kind of made my head spin, and I'm not sure how I feel about it:

This conversation jogged this. Who shared this? This was Arkade at one point.

I don't know if I shared it or if I ran across it.

Yeah, so this kind of makes my head hurt a little bit, to go down this route.

This is somewhere like, well, why are we using text templating to parameterize values for Helm, and when is this madness going to stop, and when are we going to use a more formal language to do it?

And how can we.

And it also speaks to presenting a CLI for your team to install things, and giving you the ability to do all the testing exposed by Go.

I mean, this could be done in any language, right?

This could be done in Python or Ruby or whatever.

So they picked Go, obviously, for this project, and what makes my head hurt...

The end result is, I think, something that Blaze might have shared.

What in the end makes my head hurt is that for every app you want to add to the catalogue, you literally whip up a whole bunch of Go code to generate the actual install.

So yeah look no thanks.

You've definitely solved, like...

Yeah, you're not templating YAML anymore, but the barrier to entry and the maintenance around this...

I really wonder if this is going to be a long-lived project. Anything that puts it into a CLI certainly makes it easier to test the waters.

But it doesn't make it any easier to pipeline, though it certainly makes it more interesting.

But, you know, it's like the eksctl command.

You know, like, why?

Why would I use that if I have Terraform?

You know, it flies in the face of using GitOps, where you want to have a declarative description, a document that describes how to deploy it.

I want to have a t-shirt that says declarative nation on it.

That's the underpinning thing. I could be wrong, but I don't think that the makers of Arkade are interested in, you know, production-grade types of deployments.

You know, Arkade is strongly correlated with k3sup, which is strongly correlated with k3s, which is, you know, have a Kubernetes cluster up and running in 10 seconds.

You know, it's all about getting something going as quickly and easily as possible, and, well, hell yeah.

arkade install cert-manager: if that's all I have to fucking do, then cool.

Yeah, I guess where it's interesting is, I know there are other examples.

I'm sure you guys can give some examples, but this is kind of like our helmfiles repository, where we're distributing an opinionated installation of a bunch of helmfiles.

Are there any other distributions like that, like our helmfiles but using other tooling, that you guys can point to?

I appreciate it.

Just so I can get inspiration feel free to share that.

That's not Helm-based.

Or, what I'd like too: if anybody else sees another helmfile repository with a dozen or so helmfiles, do let me know about that too, because, you know, that's how we learn from each other, by seeing patterns that other people are using.

Yeah, I'm going to try to get ours open sourced, but no promises.

There are tools similar to Helmfile, but none as comprehensive and capable, in my mind.

So we used Helmsman for a long time, over a year, and then we switched to Helmfile. And Helmfile, by the way, has reached almost 2000 stars now.

So it's pretty exciting.

Hey, if anybody knows roboll, or whatever his name is, feel free to let him know that I'm OK.

I'd be open to working for Datadog now.

Yeah, he's not very involved at all anymore in the Helmfile project. I saw him chime in briefly when there was talk of contributing the project to a foundation.

But that was the last engagement I saw from him.

I'd never spoken I was so confused there for a second Andrew I thought you were talking but I think were.

No I'm sorry.

No I was just stretching out those.

No, I was saying we have a helmfile repo, you know, of not a dozen but maybe half a dozen now, and trying to get them open sourced is going to be a challenge.

Yeah, but I'm going to try to.

Yeah, you do.

Hit me up as soon as you do get those open sourced.

Yeah Yeah.

Because our whole philosophy was... So, like, our biggest one is GitLab by far, you know, because GitLab's Helm chart is a fucking monstrosity.

Sorry, pardon my French.

That's an understatement.

We want our people, when they run, you know, helmfile install gitlab or whatever the command ends up being, with all the defaults... there's a bunch more required parameters.

But once they have met all the required parameters, it deploys production ready. Like, it uses an external Postgres database. By default the chart uses a container, you know, which sometimes is OK.

But for us it's not right.

So what we say anyway.

So what we say instead is: we don't care what you run, but it has to be external.

So if you want to run a container on the side, and that's how you're going to run your production-ready GitLab, then fine.

But we will not, you know, turn on the little flag in the Helm chart to run the internal Postgres database.

Same thing with cert-manager; we don't turn on the internal cert-manager.

Same thing with MinIO.

We don't turn on MinIO.

We make them, you know, say, OK, what are the names of the S3 buckets, and, if it's not AWS S3, what's the URL to the S3 host?

That tends to work against you with helmfiles as well. And the little experiment has been very nice, you know, because that's been the number one challenge when it comes to Helm, especially with all these open source charts. It's like, OK, fine, I see that there's 500 different configuration parameters I can set, and that's awesome.

Which ones do I set for this to be production ready?

I have no idea.

And there's no documentation to tell me. And it's a little bit like that matrix we saw for how many clusters do I need: that matrix is going to be different from org to org.

Sure. There's this concept of composability, you know, which is like the property of being able to copy something out from one context and paste it into another.

I'm having a lot of, like, PTSD around the Helm stuff, and like Maven, like POM files, you know. Of course you have PTSD if you've been... you know, it's like you want to take this snippet of XML and drop it in, or just moving stuff from one file to another and not having the right context. That was one of the banes of my existence, and I don't know if Helmfile simplifies that or makes things more modular.

I have no experience.

I mean, I think Helmfile gives you the opportunity to be internally consistent within an organization, or composable that way, by having a common interface.

But our helmfiles aren't going to compose with your helmfiles, not in that same way. You almost always have to put your own framework around what your needs are.

So I found that doing things like not enabling default ingress for a lot of charts certainly helps, you know, keeping those things completely segmented and in their own module.

So I've got to keep things, you know, generic, but he talked about enabling ingress.

So the reason I asked about the process for giving people the keys to the castle is because we gave the keys to the castle too soon to someone, and a bunch of ingresses were created to things that shouldn't have been created. Hence, trust issues.

So now I get to look at admission controllers for all kinds of things: for ingresses, for Istio virtual services, for services.

If it's a service and it's not a ClusterIP, I want to reject it. And so I'm going to use OPA. I'm excited.

Yeah, if you can come up with a little demo or something, I'd really like that.

Yeah Yeah it looks deceptively easy.

I guess I'll say I'll put it that way.

Because, I mean, you know, the OPA docs are like, hey, here's eight lines of code and that'll do it.

So we'll see. You know, there is always a big gap between hello world and production.

Yeah, but I mean, it should be very straightforward.

You know, because admission controllers look at a particular type of resource, right, and just do checks on it.

So it should say: for all services, if type does not equal ClusterIP, reject, other than a small whitelist, like NGINX ingress, which needs a LoadBalancer. But we'll see.
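
As a rough sketch of the rule just described, the check could look like the Python below. This only illustrates the logic; a real OPA policy would be written in Rego, and the allowlist entry and the function name here are made up for the example.

```python
from typing import Optional

# Hypothetical allowlist of (namespace, name) pairs that may use a
# non-ClusterIP service type, e.g. the ingress controller's Service.
ALLOWED_EXCEPTIONS = {("ingress-nginx", "ingress-nginx-controller")}


def review_service(service: dict) -> Optional[str]:
    """Return a rejection message, or None if the Service is admitted."""
    meta = service.get("metadata", {})
    namespace = meta.get("namespace", "default")
    name = meta.get("name", "")
    # Kubernetes treats a missing spec.type as ClusterIP.
    svc_type = service.get("spec", {}).get("type", "ClusterIP")
    if svc_type == "ClusterIP" or (namespace, name) in ALLOWED_EXCEPTIONS:
        return None
    return f"Service {namespace}/{name} has type {svc_type}; only ClusterIP is allowed"


if __name__ == "__main__":
    candidate = {
        "metadata": {"namespace": "team-a", "name": "web"},
        "spec": {"type": "NodePort"},
    }
    print(review_service(candidate))
```

The same shape of check would apply to ingresses or Istio virtual services: pick the resource kind, inspect a field, and reject unless it is on a short exception list.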

I think this is interesting, because I think this is the fourth or fifth office hours in a row where OPA has come up.

So I think that is an indicator of how relevant that is

if you're doing anything serious with production on Kubernetes.

We've got five minutes left here.

Are there any closing thoughts or last minute questions.

It was nice to finally meet you at SCaLE.

Oh, yeah.

That's awesome.

Thanks for poking in and saying hi. That's Todd. Did you go to any other interesting talks at SCaLE?

The one that we're talking about was primarily around Java apps inside of Kubernetes, using things like Graal to minimize startup time and using things to compress Docker images down. We've got some apps that take 60 to 90 seconds to start up, and the prospect of getting that down to two or three seconds, or milliseconds, is more than just a little enticing.

Is that largely the image size, or is it Tomcat or whatever?

Spring Yeah spring I'm in the same boat dude.

I hear you.

We've got a bunch of Spring Boot apps and they're slow, to the point that whenever we're setting our limits, for memory we set the request and the limit the same, and then for the CPU we set the request and we just don't set the limit, because startup times get a little better if we just let it consume everything it can.
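
A minimal sketch of that sizing pattern, written as a Python helper that builds the resources mapping for a container spec. The function name and the example values are assumptions for illustration; only the shape (memory request equal to memory limit, CPU request with no CPU limit) comes from the discussion.

```python
def container_resources(memory: str, cpu_request: str) -> dict:
    """Build a Kubernetes container `resources` mapping.

    Memory request and limit are identical; CPU gets a request only,
    so the container may burst above it when the node has spare cycles.
    """
    return {
        "requests": {"memory": memory, "cpu": cpu_request},
        "limits": {"memory": memory},  # deliberately no "cpu" key
    }


if __name__ == "__main__":
    # Hypothetical sizing for a slow-starting JVM service.
    print(container_resources("2Gi", "500m"))
```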

I've been reading that you shouldn't set CPU limits, like, really for anything, because it's different than memory limits.

It doesn't work the same way, and you're just artificially limiting things when they don't need to be limited.

Yeah, originally we were kind of trying to avoid all types of oversubscription, and that's great for memory, but it's kind of screwing us whenever it comes to CPU.

Yeah, because I read an article; I don't remember where I would find it.

But it was talking about how the CPU scheduler, or whatever it's called, handles the requests and the limits.

You know, what it was saying is, if you've got five pods and they all want 100% of the CPU, the system will happily hand each of the pods 20% of its CPU.
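
The arithmetic behind that claim can be sketched as a simplified model, assuming every pod is busy at once, no CPU limits are set, and CPU time is divided in proportion to each pod's request. The pod names and millicore values below are invented for the example, and the real scheduler has more moving parts than this.

```python
from typing import Dict


def contended_cpu(requests_m: Dict[str, int], node_m: int) -> Dict[str, float]:
    """Split node CPU (millicores) in proportion to each pod's request.

    Models the case where every pod wants more CPU than it can get and
    no CPU limits are set.
    """
    total = sum(requests_m.values())
    if total <= 0:
        raise ValueError("at least one pod must request some CPU")
    return {pod: node_m * req / total for pod, req in requests_m.items()}


if __name__ == "__main__":
    # Five equally weighted pods on a one-core node: 200m (20%) each.
    pods = {f"pod-{i}": 200 for i in range(5)}
    print(contended_cpu(pods, 1000))
```

When the other pods are idle, nothing in this model stops one pod from using the whole node, which is the startup-time benefit mentioned above.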

No problem at all.

You know and it it can just kind of figure it all out.

And that made a lot of sense to me.

So I've stopped using CPU limits, or if I have to use CPU limits because there's a LimitRanger, then I'll do like 10.

I think the problem with not limiting it, to me, is quality of service.

If you want to provide any form of guarantees for a service and maintain latencies and such, that would be a bad idea.

Do you have a link to that? When you find it, it'd be great if you share that.

I'll try to find it. But I was going to say, have you looked at Quarkus for your Java apps?

Not heard of that.

Personally, it looks really interesting. It's called Quarkus.

I want to hang out with one.

I don't think.

I'm not sure if Spring is supported, but it compiles your Java app to native code.

So, I mean, the joke I made not too long ago, that, you know, two to three seconds?

No, try two to three milliseconds.

It's real.

Like, it's as fast as, you know, a Go app that's compiled to native code, and you can do the same things, like install your Java app that's built using Quarkus onto the scratch container, which has nothing in it.

It's just the bare Linux kernel with nothing in it, right?

It's literally the smallest it can possibly get.

Basically a native Go app.

The guy did mention Quarkus, and to be perfectly forthcoming, I had a hard time following some of what he said because he had a very heavy accent, but that's what he was talking about,

compiling it to a native app.

That's cool.

Well, if you continue your explorations on that and have any wins, it'd be great if you follow up and share those with us on a subsequent office hours.

I'll do it again.

Sounds like he is quite a bit ahead of me already.

He knows the names of the things.

So, all right, everyone.

Thank you for sharing.

I learned a few new things this office hours.

As always, remember to register for our weekly office hours if you haven't already. Go to cloudposse.com/office-hours and you'll receive an invite for your calendar.

Thanks again for all your time.

A recording of this call is going to be posted in the office hours channel and syndicated to our podcast at podcast.cloudposse.com.

So you can tune in.

However you listen to your podcasts.

See you guys next week.

Same time, same place thanks.

You guys have a good one everybody.

Public “Office Hours” (2020-03-04)

Erik OstermanMarch 4, 2020Office Hours

Here's the recording from our DevOps “Office Hours” session on 2020-03-04.

Machine Generated Transcript

Let's get the show started.

Welcome to Office Hours.

It's March 4th, 2020. My name is Erik Osterman and I'll be leading the conversation.

I'm the CEO and founder of Cloud Posse.

We are a DevOps accelerator.

We help startups own their infrastructure in record time by building it for you.

And then showing you the ropes.

For those of you new to the call the format is very informal.

My goal is to get your questions answered.

So feel free to unmute yourself at any time if you want to jump in and participate.

If you're tuning in from our podcast or YouTube channel, you can register for these live and interactive sessions by going to cloudposse.com/office-hours.

Again, that's cloudposse.com/office-hours.

We host these calls every week. We'll automatically post a video recording of this session to the office hours channel, as well as follow up with an email.

So you can share with your team.

If you want to share something in private just ask.

And we can temporarily suspend the recording.

That's it.

Let's kick this off.

So here are some talking points for today.

They are mostly the same as last week.

There's just a bunch of stuff.

We haven't had a chance to cover because we've had so many good questions.

So if there's ever idle conversation here some talking points.

So before we get to these let's open the floor.

Anybody have any questions problems interesting things they're working on that they'd like to share or ask.

I have a question.

All right.

Go ahead.

All right.

So mostly I've been working on a Makefile to help developers get through the repeated tasks and build containers efficiently.

One thing I started to put more focus on is the way in which we tag images.

So I'd like to know whether you add the git SHA as part of the image tag, as well as any particular standards that you find work best for tagging images.

For the images, or the naming convention?

Yeah Yeah.

So what we've been practicing, mostly as part of our pipelines, is that every push to the repo builds a Docker image and tags it with the git SHA,

and the short SHA.

Honestly, we never use the short hash.

We almost always just use the long commit hash, and the way we use that is then for separate pipelines.

So for example, if we have a separate pipeline that kicks off down the road and cuts a release like 3.2.3, that pipeline will look up the artifact for that commit SHA and tag it.

If there is no artifact, then that pipeline fails.

So basically, we decouple the building of images and artifacts from the process of retagging those images with the release version.
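
Here is a small Python sketch of that decoupling, under the assumption that a registry can be modeled as a mapping from image reference to digest. The class name, function names, and the in-memory registry are all hypothetical; a real pipeline would call its registry's API or CLI instead.

```python
from typing import Dict, List


class ArtifactNotFound(Exception):
    """Raised when no image was built for the requested commit."""


def build_tags(commit_sha: str) -> List[str]:
    """Tags applied on every push: the full SHA and a short SHA."""
    return [commit_sha, commit_sha[:7]]


def promote(registry: Dict[str, str], repo: str, commit_sha: str, version: str) -> str:
    """Retag the image already built for commit_sha as a release version.

    Nothing is rebuilt here; if the build pipeline never produced an
    image for this commit, the release fails.
    """
    source = f"{repo}:{commit_sha}"
    if source not in registry:
        raise ArtifactNotFound(f"no image found for {source}")
    target = f"{repo}:{version}"
    registry[target] = registry[source]  # same digest, additional tag
    return target


if __name__ == "__main__":
    sha = "0123456789abcdef0123456789abcdef01234567"
    registry = {f"example/app:{tag}": "sha256:aaaa" for tag in build_tags(sha)}
    print(promote(registry, "example/app", sha, "3.2.3"))
```

Because the release step only adds a tag to an existing digest, the image that ships is byte-for-byte the one that was built and tested for that commit.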

Now I see a couple different patterns happen here.

And it depends a little bit on what your continuous delivery or deployment strategy looks like. Companies that don't practice strict semver for their releases, because they want more of a streamlined process of things hitting master and then automatically going to staging, prod, or other environments,

those companies tend to use SHAs for that instead of using semver.

There is still a way to use semver with that, which is kind of nice, if you make semver part of your commit history.

So you have, like, a release file, release.yaml.

What we've seen is pipelines that then apply what's in the release.yaml on merge to master.

So then it'll do that for you, which is kind of nice, because then you have a totally Git-driven workflow for cutting releases. It's a little bit more rigid, in the sense that you can't just use the release functionality on GitHub then, if you want to be consistent. Does that answer any of your questions there?

Yeah, you did get it.

So additionally, I'm starting to see fuller adoption of replacing the MAINTAINER instruction in the Dockerfile with labels, which gives you a bit more flexibility, and we've started to use labels to also carry the Git SHA within the image.

The version as well, along with the actual build.

Do you see that happening as well?

So I think that that is an excellent idea.

If you are able to surface that information as part of your CI system and your Docker registry, and assuming that it helps your team or others reconstitute what happened.

So this is a big part of Codefresh: Codefresh uses these labels on images extensively.

So tagging the image, or labeling the images.

Labeling the images if they pass.

What do you call it.

Labeling it if your vulnerability scanning passes, labeling it with perhaps the build time.

So it almost makes the registry the source of truth for all that extra metadata about that image, and that follows that image around wherever it goes.

Have we been practicing it?

It's not something we've invested in.

But I don't think it's a bad idea.

Has it become a project for you?

Yeah, I just started to look closely at it.

Actually, I added things like build times to it.

Vendor ID.

Oh, yeah.

So exactly.

You're on it.

Yes What I'm going to do.

I'm doing it by passing a build argument.

Yeah So I'll get that.

Environment variable.

See I am just.

That's the right way to do it.
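To make the labeling idea concrete, here is a small Python sketch that assembles a `docker build` command line attaching the Git SHA, version, and build time as labels. It is only an illustration under stated assumptions: the environment variable name `GIT_COMMIT` and the function name are hypothetical, and it uses the `--label` flag directly with the standard OCI annotation keys rather than the build-argument approach described on the call.

```python
from datetime import datetime, timezone


def docker_build_command(image, env, version, created):
    """Return an argv list for `docker build` that labels the image with build metadata."""
    sha = env["GIT_COMMIT"]  # hypothetical variable name, assumed to be set by the CI system
    labels = {
        "org.opencontainers.image.revision": sha,
        "org.opencontainers.image.version": version,
        "org.opencontainers.image.created": created.isoformat(),
    }
    command = ["docker", "build", "--tag", f"{image}:{sha}"]
    for key in sorted(labels):
        command += ["--label", f"{key}={labels[key]}"]
    command.append(".")  # build context
    return command


command = docker_build_command(
    image="example/app",
    env={"GIT_COMMIT": "9fceb02d0ae598e95dc970b74767f19372d61af8"},
    version="3.2.3",
    created=datetime(2020, 3, 18, 18, 30, tzinfo=timezone.utc),
)
print(" ".join(command))
```

Because the labels travel with the image, anything that can inspect the image can read back which commit and build produced it.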

And where this gets interesting is that, obviously, this stuff is only as secure as your registry, and only as secure as your ability to add those labels or to preclude other systems or processes from labeling it.

But assuming that that process is secure.

This is a great way to also then enforce policies on what gets deployed inside of certain clusters based on those labels.

I'm not sure.

I'm pretty sure something like Twistlock will let you do that.

And I'm not sure if there's a way of doing it with OPA right now, but if somebody else knows the answer to that, let me know.

Awesome any other follow up questions to that or other questions.

How are you doing vulnerability scanning right now, Dale, on those images?

Yes this is quite by the personal side of it that we don't use to the hub or existing cortical testing.

So we admit it, but bill time within our history for other images as well as the runtime, which is something that I like.

Just like everything encapsulated them against other ceilings as well as it also has a kitchen like this with vault to execute the runtime but it doesn't look pretty good.

Well, we have to look into possibly replacing Twistlock with something like Clair or Falco.

So we're exploring that.

But you know put covered as part of that.

Have you explored the AWS container registry scanning, how it compares feature-wise, and how effective it is by comparison?

Yeah, so this is actually Clair; they are using Clair under the hood.

I don't know the level of control that we do get into everything.

But I'm thinking if we do it in duration.

So long.

So we'll get the reporting aspect of it.

But again, that just be based on just what you're offering what I saw it it didn't seem to be much.

Did you see that Quay has also been open sourced now?

Yes.

Yeah, well, to see how that compares, one thing I'd like to see in any system like that would be the ability to track the mean time to resolution for a CVE in the system.

So you don't want to shut down the service because a CVE is suddenly detected there and cause a blackout.

But you do want to track how long that persisted and when it was resolved.

Is that something that you guys are factoring in as well?

So we've had this year where or communication time varies because we have a private resource that keeps just doing that.

So we've been actually some teams and then we'll try to not limit what's possible.

So yeah.

So without a public plan that we have to put this out there not to build but develop a set of resources behind that, which is must start feeling.

So there's a lot of background noise where you are, Dale.

Any chance.

I know.

I know you're always in a well planned environment.

Yeah, pretty much, I'm in a coworking space today.

OK How much was a bigoted are all taken by the lifetime squatters there.

Yeah Yeah I know that fighting for the phone booths.

All right.

Well, let's see.

Maybe maybe it quiets down a little bit.

There's a question from Casey Kent.

He's been part of the community for a while now.

He asks in the chat: can you touch on the setup and best practices for the EFK stack on Kubernetes, if you have some time?

Sure certainly I can point you in the right direction in that case.

So this is very common question that comes up in the community.

Unfortunately, our office hours notes are not properly tagged on like what we talked about or what to do.

Just be aware of that past office hours have talked about this on how to set it up the.

So the EFK stack, for everyone: that's the Elasticsearch, Fluentd, and Kibana stack.

It's become pretty much the most common open source alternative for something like Splunk or Sumo Logic.

So what we would recommend in this case is, first of all, configuring Fluentd not to log directly to your Kibana, sorry, to your Elasticsearch instances, because it's so easy to overload Elasticsearch, and when Elasticsearch is unhappy it's really unhappy and it takes a long time to recover.

Also, scaling your Elasticsearch clusters so you can send a firehose to them is very expensive.

So what you're going to want to do is set up your Fluentd to log to something that can easily absorb the stream, like Kinesis if you're on AWS, which I think you are.

If you send all your logs from Fluentd directly into Kinesis, Kinesis is going to absorb those as fast as you can send them.

And then you have an excellent option to drain that to S3.

So you're going to want to send those logs from Kinesis into S3 for long-term storage, and you can have all of the lifecycle rules and policies there.

We have some great modules at Cloud Posse for it, like a log storage bucket that helps you manage those lifecycle rules very easily.

So you can consider that.

And then the other thing you're going to want to do is drain it for real time search into Elasticsearch.

So both these modes are supported by the Terraform resource for this.

Off the top of my head I forget exactly what it's called, but it will write directly into S3 and Elasticsearch.

So then the last thing is that for certain things you'll be able to use Athena if you want to query the data in S3.

So long as, I think, your query completes in, what is it, 30 minutes.

And then developers and so on have real-time access to the logs inside of Elasticsearch.

Now I want to point out one other thing that we've had a lot of success with that.

I like is that there's a little utility.

It's called kube-sentry.

I think there's two options for this.

There's two open source projects, and what it'll do is take all the events from the Kubernetes event log and ship those into Sentry. Sentry is the exception tracking tool.

And now what's cool is you see the most common exceptions bubbling up to the top.

The most common events and things happening.

And you can assign those to teams to look into, using all the conventions that you have in Sentry.

Sentry is also open source, or if you're using the hosted version, that works as well.

So Casey, was that a good overview of the architecture for setting that up?

Cool So he says that was what he was looking for.

And also two other notes.

I mean, we're typically using the managed Elasticsearch by AWS, and that also comes with Kibana out of the box.

So if you know the path.

There's some path to it.

I forget what it is, something.

But if you know that, then you can just access Kibana directly there.

You know, a lot of people speak very highly of Elastic Cloud and their hosting of Elasticsearch being more robust, with newer releases of Elasticsearch.

So that's a consideration as well.

I just.

And it can be controlled with Terraform like everything else.

The challenge there is if your organization has kind of a blank check to use AWS services.

Now sadly you've got to go get another vendor approved and maybe that's why you wouldn't use it.

All right.

Any any other.

Oh Andrew, I haven't seen you around.

Good to see you join today.

Where've you been I've been busy.

Well, so How's your.

Any interesting news to share with your projects there side projects perhaps dad's garage.

No, not really.

Not so much other than the.

I got that I got my team on board.

So we're going to work on it.

Oh, excellent.

I was not I was not able to get them on board with open sourcing our work.

But maybe someday Yeah but we're going to build it out.

Yeah I like my company in general.

It's not I wish we did more.

Yeah, I think it's very difficult to go from closed source to open source.

And it makes the in-house counsel very uneasy about that.

But if you can get them to agree that certain new projects will be open sourced from the start maybe components like modules and stuff.

That's a more clear-cut path to open source.

It sounds like we're there.

Well, I have our chief intellectual property counsel on board.

I have our vice president on board.

And it has just gotten pushed over to the back burner.

And it's really a shame because I'm so passionate about it that every few weeks.

I send out a you know, an email on this thread that has been going back for months now.

And like, hey, what's the status on this.

Oh, now what.

So when I was at CBS Interactive.

I was leading up the cloud architecture over there.

And that was one of my big drives: getting an open source policy, an open source initiative, at CBS.

And yet, I think it took the better part of a year before we were able to open source one project out of that.

Anybody else have any experience helping your organization open source code.

You have.

It's difficult; it can be like pulling teeth. Were you able to do any of that at your last place, John?

No, there was talk of doing it.

But, you know, just getting that ball rolling, there were a couple of little utility things here and there.

But you know, to get them to understand the value add of open sourcing is quite difficult. Yeah, actually, this is Blaze.

Hi, Blaze.

Hi So it turns out Mike.

So I've been working at Sumo for the last year, and they're pulling back on their open source initiatives.

Really Yes.

In fact, I am officially looking for another job.

Oh, yeah.

Anybody looking for a community evangelist, Blaze is your guy here.

That's Sumo Logic.

Yeah Yeah, that's where I was.

Yeah, they have I mean, I sort of get it.

They didn't say as much.

But they want to do an IPO.

So they're basically just not making any investments that don't have a direct immediate payback.

Yeah, I think long term it's probably a strategic mistake, because they want to have a bigger presence in the cloud native environment and the competition is just eating them up.

Yeah, that's an interesting one.

I hope, though, that.

And obviously, this is being recorded.

And we shouldn't talk about anything we shouldn't talk about.

So let's just keep that in mind.

But their open source agent,

I mean, I think it's great that the Sumo Logic collector is open source.

Hopefully they continue to invest in that.

I know that a number of companies are frustrated sometimes with missing log events and that the agent can be consuming considerable resources just to consume all those logs and stuff like that.

So I think the more people looking at it, the better.

No, actually they're going to be.

I think there's no question that they're less interested in investing in the collection because they don't really see it as their you know the installed agent.

So maybe relying more, therefore, on third-party agents like Fluentd. Yeah, Fluentd, that makes more sense.

Yeah, Fluentd and Prometheus are huge parts.

Although interestingly, they have a relatively limited participation in those projects.

But yeah.

So in terms of things being recorded.

So far everything's fine.

They've been very transparent as far as I can tell about that.

But I think that what would work well for me is someone who really wants to get as much adoption of their product as possible.

So if you guys know anybody who is like super aggressive about developer outreach and making sure that their stuff is easy to use, and it works well Yeah, I'm kind of in a little bit of a bind quite honestly, because I think you know for years at Yale for years at Google three years, it suddenly changes somebody you know it's just not the same.

And I find that if people aren't willing to get into continuous improvement.

And if they're not committed to excellence.

I end up getting into trouble by making suggestions or rubbernecking right away.

Well, I think one thing to look out for, though, if you do want to work more with open source, is to look for a company that started that way rather than one trying it out to see if they could get more customers.

That's such a good point.

Thank you.

Because with the former, where they start out that way, it's built into their DNA.

You can't really undo that.

But the latter.

It's kind of like instant zero.

Look at that sounds obvious.

Yeah Yeah.

So there.

I think there's a number.

Well, one company comes to mind.

I don't know.

You can check out Kubecost, the Kubecost channel, and see if they're looking for anybody.

They have open for being there and doing some interesting things.

I'm hearing a lot about Fairwinds as well.

Yeah, Fairwinds as well.

Yeah, Basecamp is coming out with their new HEY product for email.

So And I don't think you get more open source than the creator of Rails.

Oh my god.

I would give my left nut to work for Basecamp.

They've been hiring for a while.

So you might want to take a picture put that on the billboard.

All right.

So any other questions related to Cloud Posse repos, or DevOps in general, or best practices, or surveys?

Do you want to get a pulse on what other people are doing.

There's a great chance we've got about 70 people on the call right now.

So I've spent the last three weeks standing up some Amazon MQ instances, particularly around a network of brokers.

And I tried to use the Cloud Posse module.

But I couldn't really control the config file in any way.

And I'm having to write a lot of custom tooling around setting variables and making it result in XML that doesn't blow up.

So yeah, let me talk about that MQ module just for a second.

So all of our modules are borne out of actual engagements customer engagements and then we open source that we have this kind of open source first model where we start the modules open source.

The use of ActiveMQ was for an enterprise SaaS product that we were running on-prem.

And it turned out not to work well with Amazon's service.

So we had to cut back on it, and therefore we haven't had a reason to continue investing in that.

But I will say, with Maxim on my team, two or three weeks ago we had 130 open pull requests against our Terraform modules.

And I think we've gotten this down to like 13 or something.

So if you do want to spruce it up.

If you see any ways we can improve it, let us know.

Also, in Terraform, let me see.

My guess is that module is still HCL1, not HCL2, and some of the template file manipulation was really basic in HCL1.

So if we wanted to do any more advanced parameterization of that file, it would not have been feasible in HCL1.

Now, with HCL2, I think it's totally feasible.

So we could have a better, more powerful config that you could pass there, or just provide an escape hatch so that you provide the raw XML.

That's helpful.

I'm not directly familiar with that module right now.

So I might be misstating some things.

But did that clear anything up, or is there additional feedback you have on that?

That was pretty much it. At this point, I've had to pretty much rewrite everything from the ground up.

And if I can figure out any ways to piece any of that out of there,

And send it back.

Your way, I'd love to do that.

Yeah, for sure.

Feel free.

And this goes for anyone here.

If you have anything you want to contribute back.

You're not sure about the next steps to start on that.

You can always reach out to me on the SweetOps Slack. You do have to join the Slack team, by the way.

That's a good chance to promote that for a second.

So if you go to slack.sweetops.com you can join our Slack team.

And then my name's Erik on there; you can find me. Cool. Casey Kent asks in the chat about common patterns for machine learning infrastructure for continuously training, ingesting data, and ETL.

There are there's just a ton of stuff out there.

But it'd be nice to hear what you suggest.

So I can't speak to this personally as a subject matter expert.

I can describe a pretty common architecture pattern that one of our customers is using at a very high level.

But I'm not sure if that's even valuable.

You probably already know to that degree.

What I would say is there's the whole suite of Amazon's products for training the models for machine learning.

We've not touched or looked at it.

Maybe the people here on the Slack team have worked more with this.

Anybody have some context? I zoned out for the beginning of the question.

But have you checked out Kubeflow? Cloud Posse has not yet worked with Kubeflow.

Yeah, there's a bunch of different UI or API centric different tools.

I did a sort of Kubeflow workshop at a meetup at some point.

That was the extent of my knowledge, and I thought it was pretty useful for that beginning part.

And then one of the things is you can plug-in different platforms for how you want to host it.

Once you get the model built, specifically for models that get retrained a lot, like marketing models that have seasonality where you want to run a refit over and over again, something like that investment in Kubeflow seems to make sense.

But if it's a model you train a few times, then there's like dozens of different ways to do it.

None of which I've been super excited about.

But definitely, if you're not sure where to start, starting with Kubeflow itself, or typing "Kubeflow versus" so that you'll see all the other ways, seems like a good start.

Yeah, I had one thing to mention about a conference in town.

Yeah, SCaLE.

I think is this week.

Oh, yeah.

Thank you for bringing that up.

That's a good tip.

So if you're in Los Angeles or you're close enough, SCaLE is happening towards the end of this week.

I think it starts on maybe Thursday.

Yeah And runs through Sunday.

And then there's DevOpsDays on Friday.

I'm going to be at DevOpsDays this Friday at the Pasadena Convention Center.

Pretty much all day.

So if you're there, please hit me up on Slack and I will find a time to meet up for coffee or hang out.

Are you going to go Todd.

I'm not feeling well enough to go bummer.

My kids at home.

I'll be there Friday and Saturday.

Who that.

Sorry I'll be there got an awesome dog.

Thanks for letting me know.

Dude hit me up on Slack.

If you aren't around.

It's enough.

I mean, you bring up a serious note there, though, that a lot of conferences are being canceled like they're dropping like flies right now.

Google canceled theirs, and KubeCon Amsterdam just got canceled this morning.

Oh, really.

Yeah, they delayed KubeCon three months.

Yeah some of exactly some of them are postponing them or postponing indefinitely.

So it's too bad.

I'm going to take my chances and see extreme isn't there.

Probably but the you know bless their hearts.

The SCaLE team works for basically no pay.

And a very minimal budget for a conference of that size.

Some of the equipment for that recording is pretty dated.

Eric it's Adam Watson.

Hey not to add anything but Pasadena declared a state of emergency an hour ago.

So just a heads up.

All right.

Hopefully that doesn't mean they automatically have jurisdiction to cancel all conferences and stuff.

So Yeah, that's worth checking out to see if that's going to affect scale at all.

Yeah, just said that that was an hour ago.

Figured I'd float that.

Yeah, thanks Adam for bringing that up.

Nobody shoot the messenger.

On the topic of events.

I think nobody in this group is in Boston.

But if you know any people in Boston.

I'm related to knock at a stream that they try to record the talks as well.

So if I can record them for a friend observed 2020.

It's like a CNCF, OpenTelemetry-related event that a friend of mine is trying to put on.

So it's April 7th.

So my hope is that we can get through the like curve and then it will be back down by then but we'll see.

Worst case, we'll try to figure out rescheduling but the link in the observer shot.

So what's that what's that 24 hours of DevOps conference.

Forget what it's called that that might be our future.

What was that thing.

And that was like in December or something.

Yeah, and it was, like, last November: All Day DevOps.

Yeah, there's a couple of those, not related to DevOps, that have done the 24-hour thing.

Not my type of conference organizing.

I like sleeping occasionally.

Yeah, and I do like meeting people face to face.

Actually I mean, honestly, the reason why I go to conferences is to talk to meet the people and hear their stories less the actual talks themselves.

All right.

Any any other specific questions or otherwise maybe I'll jump into practical tricks for change management and get your feedback there for what you've done.

Let's see here.

Now, this came up.

I forget who it was that asked the community at large what you're doing for change management and change control.

I wanted to kind of inventory those tips and tricks to provide guidance, because I think just using GitHub isn't enough, just having IAM policies isn't enough, and just having CloudTrail audit logs isn't enough.

So what are the things that you have in place for change control? Here's a list of some of the things that came to mind as common best practices today.

So I guess the obvious thing to bring up is having a version control system.

This is your GitHub.

This is your GitLab or Bitbucket.

This is what allows you, if you're practicing infrastructure as code, then to point to the code that should have resulted in a change along the process here.

The next one being infrastructure as code defining the business logic of your infrastructure and using reusable modules for that.

So there's one thing just to write infrastructure code like raw Terraform resources.

But then I do want to capture that a module, like a Terraform module or Helm chart, is a discrete unit of business logic, which you can sign off on organizationally: this is how you do things.

And then you reduce the scope of change control when you're using reusable components, especially ones that you've signed off on in the organization. Automation.

Obviously, taking what you have now in source control and having a way of getting humans out of the equation, because humans are difficult to automate but source control is easy to audit, and anything that is machine-controlled automation you can continuously refine and improve and have controls in place.

Pull request workflow.

So basically how you enforce that every change is reviewed and approvals on that and related to that.

Having approval steps within your pipeline.

So you might have all the checks and balances in your GitHub, with branch protections and code owners requiring certain checks to pass and a certain number of reviewers.

But in the end, you might still want additional controls that are arbitrary, and having approval steps in your pipelines is an excellent way to have control over when things change and visibility when they change. Notifications.

I'm sure everyone.

I'm sure a lot of people here are already sending a lot of this stuff to slack.

One thing that we've really liked.

I was surprised how much I liked it: the ability to add GitHub comments on pretty much any commit SHA.

And then you have that history there.

So if you have a pull request, you can also comment on its commits and see when that pull request and its commits were deployed, and into what environment.

So that provides a nice living record of changes.

So as I was talking about earlier, using branch protections.

This is very, very much key to enforcing when stuff changes.

So this is something GitHub supports very much.

I'm less familiar with GitLab and Bitbucket.

Any users here using GitLab and Bitbucket?

How much of the branch protection functionality do they have compared to GitHub, assuming open source or paid, both?

And if you can make the delineation between those, that'd be great.

Yes, so my experience with GitLab is that it does actually have the enforcement.

I think you can actually set it up by default.

Then it starts with oh you can set it up organizationally.

That's nice.

Yeah, that really sucks that.

You can't do that with GitHub.

Yeah, so with GitLab, I have the most experience with on-prem GitLab open source, and I believe free GitLab is actually different.

It gives you more. On-prem GitLab open source gives you the rule that you can't merge if the pipeline hasn't passed.

But it does not give you pull request approvals.

No, you've got to pay for it.

Wow, you've got to pay for the pull request approvals.

It's the very bottom tier; it's only like $4 per user per month.

Yeah, but you've got to pay for the pull request approvals.

I'm not sure about the get left.

So what.

OK, that's good.

That's good.

Does GitLab have the concept of code owners?

I think that's a Git thing, isn't it?

Isn't that just a Git thing?

Well, what I mean is, it's got to be enforced at the pull request approval.

So code owners relate to approvals.

Yeah Yeah Yeah.

OK I got.

So GitLab does allow per-branch merge protection; it controls who can merge, and you have maintainers, developers and maintainers, or no one as a role.

And then you also have control over who can push to it.

OK with the same role.

Yeah, that's Yeah, that's correct.

Absolutely Yeah.

So code owners.

This is where you can basically have a file.

And that file will map a team to a path on the file system of that repository.

So you can say that anything in your Terraform IAM project, for example, has to be signed off by SecOps.
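The path-to-team mapping can be sketched in a few lines of Python. This is a deliberately simplified model, not the real CODEOWNERS matcher: it only supports a catch-all `*` and directory prefixes, whereas the real file format uses gitignore-style patterns. The team names and paths are made up for illustration; the one behavior it does mirror is that the last matching rule wins.

```python
# Ordered rules: (pattern, owners). Later rules override earlier ones.
RULES = [
    ("*", ["@example/platform"]),             # default owners for everything
    ("terraform/iam/", ["@example/secops"]),  # IAM changes need SecOps sign-off
]


def owners_for(path, rules):
    """Return the owners of `path` according to the last rule that matches it."""
    owners = []
    for pattern, rule_owners in rules:
        if pattern == "*" or path.startswith(pattern):
            owners = rule_owners
    return owners


print(owners_for("terraform/iam/roles.tf", RULES))
print(owners_for("README.md", RULES))
```

Combined with a branch protection that requires owner review, a mapping like this is what forces the sign-off rather than merely documenting it.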

For example, GitLab does support code owners in the bottom tier.

Not free.

OK, cool.

So in the starter or bronze tier, which is the $4 a month per user.

So the next step: everything we've discussed so far is a little bit at the mercy of your VCS solution.

Then we get the ability to enforce policies, and policy enforcement has been a really hot topic, getting a lot of attention especially towards the end of 2019, and I think it's going to get even bigger.

Now in 2020, with tools like Open Policy Agent, conftest (which builds on OPA), and tfsec, the tools at your disposal to enforce broader-level policies that make it easier to administer change control at an organizational level are reaching greater maturity. It's still early days, but it's at a point where it's usable now.

And some great videos and demos out there of it.

In fact, we have one on tfsec.

It's a basic example that John whipped up, and we'll link to it.

John, did you want to talk about tfsec today, or should we just share that video?

It's up to you.

OK, I guess she another team to an flat.

Like a show.

OK big question came I forgot to ask.

But yeah.

Can I show a quick example.

Are any users interested in seeing a demo right now of tfsec? tfsec is a purpose-built static analysis tool for Terraform to enforce policies on your code.

And using that together with GitHub Actions.

All right.

We got a thumbs up from Adam.

Yeah, sure.

OK, before I do that, I would actually add a line to use version pinning, and that using semver, when it comes to some things like Helm charts, is not even enough.

You have to use SHAs.

But that is a valid point.

Let me just add that. Can you write what you said in the office hours channel so I don't forget it?

And I'll update this with some of the caveats there, because there are some caveats, like semver is only as good as the maintainer's ability to practice it.

And the problem with, like, Helm is that many maintainers don't actively bump their versions.

So they're constantly squashing their version.

And that's the problem.

Like, I could push up 1.1.1.

And you can use it.

And then I could push up another 1.1.1 with changes.

And another good point here is that semver is not cryptographically secure, whereas SHAs are.

It has been shown that, if you are really bent on messing people up, you can probably find some version of a history to cause a duplicate SHA or something.

But generally, it's secure. Or, as John recommends, if you're using some tagging scheme, you can tag with the version plus the SHA.

That might.

Yeah, so that's about using semver and adding something to the end there.

The hard thing with SHAs, especially if you're looking at a repository, is knowing which one of those is the specific version that I want, and putting the version number in there makes that a little bit easier.

You still have to dig into the specific SHA.

But it can help to tack the SHA on the end.
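The difference between pinning a version string and pinning a hash can be shown with a short Python sketch. It swaps in a SHA-256 digest of the artifact contents (via the standard `hashlib` module) in place of a Git commit SHA, purely for illustration; the chart contents and the `(version, digest)` pin format are made up.

```python
import hashlib


def digest(artifact):
    """Return the SHA-256 hex digest of the artifact bytes."""
    return hashlib.sha256(artifact).hexdigest()


def verify(artifact, pinned_digest):
    """True only if the artifact is byte-for-byte what was pinned."""
    return digest(artifact) == pinned_digest


original = b"chart contents as first published under 1.1.1"
republished = b"chart contents republished under 1.1.1 with changes"

# Pin the human-readable version together with the content hash.
pin = ("1.1.1", digest(original))

print(verify(original, pin[1]))     # True
print(verify(republished, pin[1]))  # False
```

The version string alone cannot tell the two publications apart, while the digest can; keeping both gives you something readable and something verifiable.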

That's good.

Thanks Yeah.

Thanks for telling me that.

It's more of a security topic than it is a change management topic.

Yeah, absolutely.

It's just a question of yourself.

Yeah, I think it's hard to have one without considering the other.

Oh, sure.

Damage control.

So this list here is not exhaustive.

So I whipped this up in about 20 minutes.

So if anybody has points, things that you're doing and recommendations you have, I want to add them to this.

So please, if you maybe add to the thread here.

There's a link.

I posted with the change management in the office hours channel.

You add any suggestions.

There is a thread.

I will try and incorporate those into this.

All right.

So John, are you setup.

Awesome I got to hand over the reins and we're going to get a little nice here.

You know accidents.

This was this is unscripted and unplanned.

So forget it.

We will thank the demo gods for a successful demo.

Yeah, exactly.

Cheering so I kind of wanted to go through.

I guess I should share here right.

I kind of wanted to go through what tfsec and tflint are, showing the actions here as opposed to waiting for the actions to run.

But kind of speaking through that.

So one of the questions that came up in the chat after the video was posted was about using tflint as opposed to tfsec.

And so tfsec is essentially static analysis for your Terraform.

So it has a set of rules.

It's not super exhaustive, but it does have a pretty good set of rules for things that you want to watch out for, especially along the lines of security groups and security rules open to the internet.

So I have some basic Terraform here.

This is meant to fail.

So I have a CIDR block that's wide open.

Nothing really specific there.

This one is missing some configuration here.

And this Azure managed disk actually has encryption set to false.

So running tfsec here, and let me expand this a little bit.

It basically looks at my code and determines, hey, this CIDR block actually should not be wide open.

This one should use HTTPS, not HTTP; this one here is actually missing a VPC configuration.

And this one needs to be secured or encrypted.

But there are those times where you actually need something like if it's a web server right.

You need to actually utilize this open CIDR block here.

So it has the syntax to where you can actually tell it to ignore one of these one or more.

And you can see it actually is missing that one.

Now Same thing with you.

Laughs thanks you too.

Yes things like that.

But what you see here is that it didn't catch my tflint-specific code over here that is using a t1.2xlarge.

So the t1.2xlarge is a size that does not exist.

Right. So if I run tflint here, it doesn't catch any of these security issues, but it did catch that my specific instance type is invalid.

So I think there's not a direct one to one comparison to say, hey, you should use tflint instead of tfsec.

I think they both are useful for different purposes, even though there may be at times a little bit of overlap.

But as you can see, I'm using aws-vault.

I just have a shortcut for it because I don't like typing a lot.

But if I put in, like, the wrong AMI, it actually is talking to AWS, so I like to set my region and all that.

So it's actually talking to AWS and looking to see whether this is valid.

So there is a little bit of a cost here in the sense of like speed.

So if you hook it up to a pre-commit hook or something like that, it may take a second, depending on how big your Terraform project actually is.

But it's definitely pretty good.

How well does it work.

If you're using like almost exclusively modules and stuff like that.

I think it's actually operating at the resource level it does.

But they actually do have modules support.

And I started working to kind of get this up and I'll add another video that goes through these fully, but it actually can check a module to see if the actual types exist and actually go into the module to make sure that the resources in the modules actually are valid.

That can be a pro and a con, depending on the open source module that you may be using.

There may be some issues there.

But the good thing is that they do have this ignore flag that you can use there, as well as a full HCL config file that you can use.

And you can tell it to ignore certain modules in there, provide var files, variables, and specific credentials, and also tell it to disable certain rules.

But the rules are actually, I think it's 700 or something rules.

Yeah 700 plus rules wow that's a lot.

Yeah Yeah, that's good.

That's cool.

Are you able to add additional rules.

Yeah, they actually have a way to configure it actually didn't go through that part.

But there's a way to extend it beyond just the basic configuration.

Yeah, do you have a GitHub Action run that you've teed up?

We're almost out of time.

I think so.

I'm not sure we have time to go through the full run.

But I do have some, and so I have this one for tfsec, which is basically the same thing that we just looked at there.

So now we actually go to the other one.

So let's see a seconds here.

This is one of the passing runs.

It's very simple.

It's not a lot that is happening.

It's basically just running tfsec.

But the configuration usage is basically just this.

There are configuration changes that you can add in here variables those sort of things.

But this will pretty much run tfsec on the current directory and let you know whether or not it's passing or failing.

You can do it on the PR as well.

Of course.

But it's very useful to actually get a heads up that something actually happened at this time.

I know I can't see it because I'm not signed in in this browser.

But it does give the full output.

So related.

Also related to this, your tfsec training repo, that's open.

That's public right.

Yes Yeah.

Yeah All of those are public.

Yeah So this is.

So we'll share that in office hours.

This is the full output here.

So yeah, it's very useful.

Very quick to set up.

Very easy to use, and it definitely will help catch some of those issues that you just may miss.

Where I see this being exceptionally valuable is if you're practicing a traditional Git workflow where you deploy on merge to master, meaning that you've already lost the recourse to make any corrections by the time you've merged, and you want to mitigate the failures after merge to master.

So I think this is a really nice way to avoid that.

If you're not applying before merge to master, which is like the Atlantis workflow.

Exactly. And especially with tflint here.

If I just fat-finger that, and it's not some malicious issue, whenever I run it I find out, oh, I actually have an issue here, before I go through and apply.

It's just useful to find that stuff out as early as possible in terms of the software development cycle.

Something else that I've looked at in this space.

I haven't had a chance to use it yet, but it looks very interesting: you can use OPA to evaluate Terraform and YAML, and there's conftest.

So conftest builds on OPA in a more opinionated way as well.

So that's kind of cool.

The example, they gave us is actually kind of a useful one.

Yeah, the example they give is that you have decided that you don't want your Terraform scripts to be too big, so you create an OPA policy that says you're not allowed to create more than x number of resources with one Terraform apply, and OPA runs in your pipeline or wherever.

It runs before you apply and can actually stop you.

The term you like to use is blast radius.

You know your blast radius has to be smaller than a certain limit.

Yeah, but how does that work on the first apply, where you may be creating everything?

It looks at the plan.

It doesn't look at the apply.

It will create a Terraform plan and then look at the plan.
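The blast-radius idea described here can be sketched in a few lines. This is not an OPA or Rego policy; it is a plain Python stand-in that applies the same rule to the JSON form of a plan (the shape printed by terraform show -json). The function names and the sample plan are made up for illustration, and the limit is whatever number your team agrees on.

```python
import json

# Actions that actually touch infrastructure; "no-op" and "read" do not.
MUTATING_ACTIONS = {"create", "update", "delete"}


def blast_radius(plan: dict) -> int:
    """Count resources that the plan would create, update, or delete."""
    count = 0
    for rc in plan.get("resource_changes", []):
        actions = set(rc.get("change", {}).get("actions", []))
        if actions & MUTATING_ACTIONS:
            count += 1
    return count


def check_plan(plan: dict, max_changes: int) -> bool:
    """Return True when the plan stays within the allowed blast radius."""
    return blast_radius(plan) <= max_changes


# A tiny hand-written plan in the shape of the JSON plan output.
example_plan = json.loads("""
{
  "resource_changes": [
    {"address": "aws_instance.web", "change": {"actions": ["create"]}},
    {"address": "aws_s3_bucket.logs", "change": {"actions": ["no-op"]}},
    {"address": "aws_security_group.web", "change": {"actions": ["delete", "create"]}}
  ]
}
""")
```

In a pipeline, a check like this would run between plan and apply and fail the job when check_plan returns False. It also shows the cold-start concern raised on the call: a first apply of a module-heavy project will exceed almost any small limit.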

Yeah why does value.

Well, I like it.

I wonder how well it works in practice.

But yeah in principle, I like it.

The reason why is, if you're using a lot of modules, and modules of modules, it's very easy.

The plan doesn't care.

Well right.

When you do a Terraform plan, even if you're just using modules, it's going to tell you exactly what's actually being created.

Yeah but it's created the convention.

But usually I'll have 25 resources, at least, getting created if I'm using a module; for example, anything that does something more serious is going to be creating a lot of resources.

So the cold start problem like once it's been provisioned I can imagine that we don't want to see too many new resources created all the time.

But let's finish it up from scratch.

I wonder like I agree with small pull requests.

But even a small pull request can have a big plan.

Yeah, we're almost at the end of the hour, we got 5 minutes left any questions.

Related maybe to the security of plans: John, what would you say the learning curve is on OPA here?

I can't speak from firsthand account.

It's in line with the rest of the industry like to pretend it is.

And it's high along with everything else in there.

I think it's very readable.

But like any Arrigo reads it.

I think the hardest part about it is developing opinions that can be codified.

Into policies, not what like.

We know we want to do it.

What should we actually be doing.

Like that's the hard part.

The language is super readable very easy.

You'll pick it up in less than a half hour or something.

But get that with the same Walker actually you know where you to look like predefined plots.

That would make sense.

Like a use case.

Yeah, that was right where that's done.

Start small though.

I mean, we're starting.

We haven't really used it much.

But the first thing we're going to do is use it to create a mutating admission controller, and all it's going to do is check that every pod in the cluster has a label with a charge code so that we can track it back for billing.

That's all it's going to do.
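As a rough illustration of that first small policy, here is the check itself written in Python rather than Rego. The label key charge-code is an assumption (the call does not name the real key), and the returned dictionary only loosely mirrors an admission decision; it is not the Kubernetes AdmissionReview format.

```python
# Hypothetical label key; the real name used by the team is not given.
REQUIRED_LABEL = "charge-code"


def missing_charge_code(pod: dict) -> bool:
    """True when the pod manifest has no non-empty charge code label."""
    labels = pod.get("metadata", {}).get("labels") or {}
    return not labels.get(REQUIRED_LABEL)


def review(pod: dict) -> dict:
    """Build a simple allow/deny decision for one pod manifest."""
    if missing_charge_code(pod):
        return {
            "allowed": False,
            "message": f"pod is missing the '{REQUIRED_LABEL}' label",
        }
    return {"allowed": True, "message": ""}


labeled_pod = {"metadata": {"name": "api", "labels": {"charge-code": "cc-1234"}}}
unlabeled_pod = {"metadata": {"name": "worker"}}
```

Starting with a single rule like this keeps the hard part, agreeing on what the policy should be, separate from learning the tooling.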

Yeah quite a small win.

Let's go.

And that kind of a policy Agent would deter the use of the policy is a little different, right.

Because you wouldn't be doing that at Ci would you.

Well, that would go into a mutating admission controller.

Inside the cluster.

But you can use OPA that way.

Mm-hmm. Oh, but OPA can be used.

I mean, OPA is great.

OK. You can use Rego with OPA all over the place.

Yeah, I think it's giving some enterprise vendors that are trying to do policy stuff a run for their money.

Well, I could tell you one for sure.

HashiCorp's Sentinel.

It's very much in that same vein as OPA.

And I don't use it much.

But probably much for other issues.

OK Just scanning it real quick.

The home page, openpolicyagent.org, has a tiny little example for an admission controller.

It's eight lines, right?

Yeah, it's eight lines, and in those eight lines it checks that all pods come from trusted registries.
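For readers who have not seen that example, the rule is easy to picture. The sketch below is a Python equivalent, not the Rego from the OPA home page, and the registry prefix registry.example.com/ is a placeholder.

```python
# Placeholder list of registries the cluster is allowed to pull from.
TRUSTED_REGISTRIES = ("registry.example.com/",)


def untrusted_images(pod: dict) -> list:
    """Return every container image in the pod that is not from a trusted registry."""
    spec = pod.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    return [
        c.get("image", "")
        for c in containers
        if not c.get("image", "").startswith(TRUSTED_REGISTRIES)
    ]


pod = {
    "spec": {
        "containers": [
            {"name": "app", "image": "registry.example.com/team/app:1.0"},
            {"name": "sidecar", "image": "docker.io/library/nginx:1.17"},
        ]
    }
}
```

A pod would be rejected whenever untrusted_images returns a non-empty list, which is the same decision the eight-line policy makes.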

Wow That's good.

That's great.

That's a that's a really that's a great policy there.

I can see it to have.

And similarly to that, even if you need to use public images, it would be that you're pushing them to private registries or you have a pull-through cache for them, specifically Harbor.

Yeah all sorts.

Yeah And it's just a common engine for it.

I like that coming was pretty good.

And because it's open source, we can all build on it as a community, just like Terraform and Helm registries and stuff like that.

All right, everyone looks like we've reached the end of the hour.

And that about wraps things up for this week.

Remember to register for our weekly office hours if you haven't already; go to cloudposse.com/office-hours.

Again, thanks for sharing everything, John, and for the live demo there of the tfsec and tflint stuff; that was really interesting.

A recording of this call will be posted in the office hours channel and syndicated to our podcast at podcast.cloudposse.com.

See you next week.

Same place, same time.

Public “Office Hours” (2020-02-26)

Erik OstermanFebruary 26, 2020Office Hours

Here's the recording from our DevOps “Office Hours” session on 2020-02-26.

Machine Generated Transcript

Let's get the show started.

Welcome to Office hours.

It's February 26, 2020.

My name is Erik Osterman and I'm going to be leading the conversation.

I'm the CEO and founder of Cloud Posse.

We are a DevOps accelerator.

We help startups own their infrastructure in record time.

By building it for you and then showing you the ropes.

For those of you new to the call the format is very informal.

My goal is to get your questions answered.

So feel free to unmute yourself at any time if you want to jump in and participate.

If you're tuning in from our podcast or YouTube channel, you can register for these live and interactive sessions by going to cloudposse.com/office-hours.

Again, that's cloudposse.com/office-hours.

We host these calls every week. We'll automatically post a video recording of this session to the office hours channel, as well as follow up with an email.

So you can share it with your team.

If you want to share something in private that's OK.

Just ask and we'll temporarily suspend the recording.

With that said, let's kick it off.

So I have only two talking points for you today: one that has been on the docket for the last few weeks that we never got to, and a new one, which is some practical tricks for change management that Igor on the Slack team asked about this week.

So I thought I'd just jot some notes down.

This is a draft, not finalized, but these are just some of the things that we do when we work on our engagements with customers.

So with that said, I can turn it over to everyone who's attending.

Are there any questions you guys have related to Terraform, Kubernetes, Cloud Posse, DevOps, you name it?

Maybe me.

All right.

Go for it.

I mean, I ask is think anything actually or like this mean more or less.

I mean, let's try and keep it topical to you know what.

What we do here.

Yeah, SweetOps.

But yeah OK.

Yeah Security right.

So actually I came across SweetOps because I was googling these questions and your website is quite well positioned.

Google's who I am.

And when I was looking for things like what I mean.

If there's any tool out there that can help with the automation of software deployment to devices like, for example, let's say like I have like a session is out there, then we have software like for example, like a couple of repositories of love.

And it's a bit too chaotic to keep all those devices in the same way the same thing like I'm always running overseas with different software because then go offline and then go online again.

What I have right now is I have an unstable people running continuously on a loop, which is not very efficient.

I believe that it must be tools that can make this easy or maybe they can provide some kind of best for or graphic interface can help me keep track of fixing.

And like there are so many tools out there, and they have no idea what could be better than my incredible Look.

That's a good question.

So just to recap what you said: basically, you're looking to learn what some of the strategies are for managing deployments at IoT scale across lots of devices.

And to date you've just been using Ansible in a continuous loop.

So IoT is not a specialty that we have at Cloud Posse.

So I can't really speak to that.

I do know that Amazon has some offerings specific to IoT, and I believe just recently they announced another offering related to IoT and deployments.

It was just this past week or something.

I think I'd have to look it up.

But maybe there are people on the channel here who have other insights. Is anyone else here doing IoT deployments, or does anyone have any armchair commentary?

I'll turn my video on.

I'm not doing IoT deployments right now.

But I did interview a company that had hundreds of thousands of devices doing energy monitoring.

They wrote everything from scratch and had all of their energy devices polling for updates.

So it's a very interesting system, because you get devices that hadn't checked in for a while pop back up, and they'd be really old and have to get updates.

So they always had to maintain some update process from their first initial stuff to the latest, and they ran into problems.

They built all of their code in Go.

So it was a single binary that got shipped.

I can't remember exactly their process for shipping it.

But yeah yet unlike maybe it's just because people have different.

There's a lot of different chips you know embedded systems is very kind of like Android diverse operating system.

It's something that every company seems to do it a little bit differently.

But I would just check out the cloud offerings for their IoT edge gateway type stuff and see if there's any documentation, like getting-started tutorials, there.

Can I ask what your IoT system is like? What are the devices doing, and how complex is the structure?

Yeah, thanks a lot.

Thanks for your answer.

So our system is very similar to what you described before.

So we have energy management and monitoring systems: devices receive data from sensors, and all of this is sent to our platform hosted on AWS. Basically, what matters is that we keep all these devices connected through different networking possibilities. It can be LTE, 4G, 3G, Ethernet, or Wi-Fi, because these devices are placed in factories, and sometimes these factories are in remote areas in the mountains.

So you know how the internet connection is; it's very likely that they only connect once in a while.

And this is why we need to look for a solution that is going to realize when those devices are online again and deploy, and sometimes the tool running in the loop is not fast enough to pick up those devices.

You also have to be worried about what happens if you break the device right.

That's always the thing I'm so impressed about with things like I run Sonos for example, at home.

Imagine if they send out some update that breaks you know two million devices.

Oh my god.

Yeah, I broke a couple of devices, and I had to go there the next week to replace them.

But luckily that doesn't happen anymore.

So I'm happy about that.

Yeah, I know what you mean.

Right, have you joined the SweetOps Slack team?

Yeah Yeah.

Yeah Oh.

Are you in the office hours channel.

I didn't know what your slack username was.

I guess I didn't work.

But make sure if you're in the office hours channel, you'll see the link that I just shared there, which was what I was thinking of at least.

But no experience with it.

OK Yeah.

I just saw the link.

Thanks for sharing.

Yeah Cool.

All right.

Any other questions from anyone.

Thank you.

I have a question that I posted in the docker channel yesterday.

Let's see.

Yeah, it's basically that I see dockerd eating a whole lot of memory, especially when we don't run deploys very frequently, and we're using the version of Docker that is in Amazon's yum repo.

I don't know if that is different from the mainline version.

But yeah.

There seems to be some sort of memory leak.

We don't think it's coming from our services, because it's dockerd that ends up eating up all the memory.

So I don't know if anyone else have any experience with that.

Yeah, is anybody seeing dockerd memory leaks lately?

Yeah here is the message in the slack team.

It seems to scale with the number of requests that they're receiving to the Docker daemon.

So I mean, it seems to scale with the number of requests we see to our services.

Oh, god.

You got you guys I know.

That's why I think it's log related, because we log to stdout and ship it elsewhere.

Yeah, and you don't see it recover?

So it can't just be a buffering thing.

No, it just goes up and up and up until we restart the daemon.

What so from first hand experience the up and up and up thing is sometimes still up and up and up to a point, and then it will recover like with Prometheus.

We thought there was a memory leak.

But it turned out that, in our case, it needed a minimum of 12 gigs of memory.

But we had set our limits at what we thought was still pretty high.

So interesting.

So maybe it's just not.

Maybe it seems like a leak.

But it's not high enough.

Yeah, I guess it's just interesting that when we do redeploy, it drops down dramatically.

And that's when you redeploy the daemon itself?

Yeah, exactly.

No Yeah.

So it starts off.

Yeah Yeah no, no first hand account.

Just gut says that maybe it's probably not a leak.

And just buffering.

OK, cool.

All right.

Any other questions.

Yeah, this is a hurry and we have a different kind of scenario.

We use common env files between multiple deployments.

So we are trying to find the best solution to share common env files between multiple projects.

Yes, certainly.

I mean, I'm sure there are a lot of ways you can solve this.

I don't know about your specific use cases.

But I mean, you're saying environment files you're actually.

Like files that have environment variables in them.

That's got to.

Yes And so.

So the good thing there is your applications themselves support environment variables.

Now we just want to maybe consider an alternative interface for passing those settings with environment variables.

Are you familiar with a tool called Chamber, by Segment?

Oh no.

This is new for me.

Yeah So this a great little tool to be aware of.

Are you on Amazon (AWS) by any chance, or are you using a different cloud provider?

No, we don't use AWS.

OK, sorry.

So my advice here is.

OK So this particular tool, I was going to recommend doesn't really apply to what you're doing.

But the pattern translates to another tool.

I'm going to share in a second.

So just for everyone else's sake, I'm just going to explain in 30 seconds what chamber is, if you're not familiar with it.

So chamber is a CLI tool.

It's written in Go.

So you can download a single binary, and what you do is, when you call chamber, you pass it the SSM service, or SSM namespace, for your environment variables, like production.

OK, sorry.

This is exec.

This is not chamber here. Here, where is chamber being used? chamber exec.

So you see, you call chamber exec and the service name and then your command.

And then it'll export all those environment variables from that service.

But you can add any number of services there, just listed separated by spaces, and it'll merge those service namespaces and the environment variables therein into one, overwriting them in order.
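The merge order described here is the part that is easy to get wrong, so a small sketch may help. This is not chamber's code; it is a hypothetical Python function that shows the same idea: start from a base environment, then lay each service's settings on top in the order the services are listed, so later ones win. The service names and values are invented.

```python
def build_environment(base: dict, *services: dict) -> dict:
    """Merge service settings into a copy of base, later services overriding earlier ones."""
    env = dict(base)
    for params in services:
        env.update(params)
    return env


# Invented settings for two service namespaces.
shared = {"LOG_LEVEL": "info", "DB_HOST": "db.internal"}
app = {"LOG_LEVEL": "debug", "PORT": "8080"}

env = build_environment({"PATH": "/usr/bin"}, shared, app)
```

The resulting dictionary could be handed to subprocess.run(command, env=env) to launch a command with the merged settings, which is roughly the role the exec step plays on the call. Putting the shared namespace first lets each app override only what it needs.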

This will help you.

So basically, if you defined a service namespace for your apps you could then very easily share environment variables between them.

So you said that you're not using AWS.

Is this bare metal or are you using a different cloud.

We use cloud.

OK, which cloud.

What provider? Oracle.

OK. And so yeah, that's definitely outside of our wheelhouse.

But if you're using HashiCorp.

Do you guys have HashiCorp Vault?

No, we haven't started using it.

Do you have Consul?

So, Consul by HashiCorp. Actually, what we are doing is creating a file system that we mount, and we're keeping these files inside the mount.

OK. And then we are using a shell command to source them as environment variables. Yeah.

So I mean, that's certainly a common way of doing it is having a shared file system like that.

But it makes that file system your central point of failure, and traditionally scaling the storage, or the I/O, or the availability of that can be trickier.

So Consul is used for service discovery, but also basically for sharing configuration or settings in a highly available manner.

I believe it uses the Raft protocol for consensus.

It's relatively easy to deploy.

It's very common in enterprise and other kinds of settings.

If you don't have Consul today, it sounds like that might actually be a gap in what you guys are running.

The reason why I bring up Consul is that if you use Consul together with a tool called envconsul, you can achieve the same outcome as you can with the chamber command I was telling you about.

So with envconsul, you can have these shared settings that are stored in this highly available distributed key-value store, Consul.

And then expose those as environment variables to your commands.

So it's not exactly.

Maybe the answer you were looking for.

I mean, if you're going the NFS route, I would just generally avoid NFS if you can, from a best practices perspective.

I guess I have a related question.

If we were to use chamber, how would you get those secrets into, like, running Terraform?

Oh, yeah.

So Well, you have two options.

That's a great question that you ask.

I will.

I'll give you two options for using chamber with Terraform.

And I'll show you kind of the progression that we've taken at Cloud Posse, because you know we also learn what works and what doesn't.

So for the longest time, what we were doing is just setting all those settings in chamber as SSM parameters.

So let's see if we go back to Jimmy Carter.

So Terraform supports the use of environment variables for setting the values of Terraform variables.

But there's one annoying thing: Terraform requires that those environment variables be named TF_VAR_ followed by the variable name.

So if we were using Terraform, for example, you'd say export TF_VAR_foobar=123.

And now foobar will be available if I call terraform plan or something like that.

So when you're using chamber, you write:

chamber write TF_VAR_foobar 123.

Then you can now call chamber exec.

I did this slightly wrong.

It's going to be chamber write, then my namespace, so let's say prod, and then TF_VAR_foobar, and then, I think, 123. I might get the syntax slightly wrong, but you get the gist.

And then chamber exec prod -- terraform plan. What this is going to do is fetch the variables from the prod service namespace and export them before I call terraform plan.

Now one thing to be aware of with chamber is that it automatically normalizes the case of everything.

So actually what happens is these things become TF_VAR_FOOBAR, which means that in your variables file in Terraform, where typically you would have something like variable foobar in lower case, you have to change it.

Well, if you're using chamber, here's the thing that sucks.

You've got to call this upper case FOOBAR. But there are two ways of looking at this.

One is that it makes it very clear that you expect this to be set as an environment variable and upper case environment variables is more or less a standardised convention.

So that's one way of looking at it.

But at Cloud Posse, we didn't like that too much.

So we wrote a small little utility called tfenv. This tfenv is not a version manager; tfenv works like the env command, and tfenv will re-export all of these environment variables in a way that is consumable by Terraform.

So I'm just presenting this as an alternative way.

So if you're using tfenv, you can use it together with chamber and then pass the environment.
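To make the case problem concrete, here is a minimal sketch of the kind of re-mapping being described. It is not the tfenv implementation and makes no claim about how that utility actually works; the function name to_terraform_env is hypothetical. It takes an environment where everything has been upper-cased and rewrites TF_VAR_ entries so the variable part is lower case again.

```python
PREFIX = "TF_VAR_"


def to_terraform_env(environ: dict) -> dict:
    """Lower-case the variable name after TF_VAR_ and leave other entries untouched."""
    result = {}
    for key, value in environ.items():
        if key.startswith(PREFIX):
            key = PREFIX + key[len(PREFIX):].lower()
        result[key] = value
    return result


# What an upper-casing exec step might hand to the next command.
upper_cased = {"TF_VAR_FOOBAR": "123", "AWS_REGION": "us-west-2"}
remapped = to_terraform_env(upper_cased)
```

With a mapping like this in between, the Terraform side can keep declaring variable foobar in lower case, at the cost of one more tool in the command chain.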

All right.

I'm happy to go into more details on this.

This was a little bit hand wavy but the reason why I want to kind of skip over it is I want to point out that if you're using chamber and you're using Terraform there's almost no reason to use them together at the same time.

And let me explain.

So all chamber is doing is reading and writing settings to SSM. Well, Terraform supports SSM natively.

So if you use the aws_ssm_parameter data source from the Terraform AWS provider, you can just fetch those parameters directly from SSM.

So here's an example of reading a parameter named foo from SSM.

If you do this, what we're describing is something similar to what could be achieved with remote state in Terraform, where you're pulling the outputs from some other plan or some other process.

But the difference is, because it's in SSM, it's usable by all kinds of services, not just Terraform.
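The point that SSM is readable by any service, not only Terraform, can be shown from the application side. The sketch below assumes a boto3-style SSM client, whose get_parameter call takes Name and WithDecryption and returns the value under Parameter and Value. To keep the example runnable without AWS access, it uses a small fake client; the parameter path /prod/foo is invented.

```python
def read_parameter(ssm_client, name: str) -> str:
    """Fetch one parameter value, decrypting it if it is a SecureString."""
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


class FakeSSM:
    """Stand-in for a boto3 SSM client so the sketch runs without AWS access."""

    def __init__(self, values: dict):
        self._values = values

    def get_parameter(self, Name: str, WithDecryption: bool = False) -> dict:
        return {"Parameter": {"Name": Name, "Value": self._values[Name]}}


client = FakeSSM({"/prod/foo": "bar"})
value = read_parameter(client, "/prod/foo")
```

Against real AWS, the client would come from boto3.client("ssm") instead of FakeSSM, with credentials and region configured as usual.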

So of those two solutions presented.

Is there one you'd like to know more about, or was that sufficient detail?

I think that's a great starting off point.

Thank you.

We also have this in the Cloud Posse docs, if you just search our docs; it's a little bit out of date.

We have kind of how we use chamber here.

It's more or less current, because the interface of chamber hasn't changed too much.

But we explain a lot more of this here.

All right.

Any other questions I can help out with from anyone.

I can ask something that I probably already have part of answer to.

But no extra validation might be good.

So we just had a conversation on our team about secrets.

We've recently started to put secrets in Vault, related to a previous question here, and we recently deployed a whole bunch of stuff with Helm, got upset with Helm, and pulled it out, and we're now deploying our own stuff via Kustomize and other stuff via Helm.

It seems to work in OK fashion.

We've got Argo CD shipping those, which is pretty sweet.

Really like what we've let it go.

So far.

Thanks for the help.

Previous questions really did that work.

But the way that different applications want secrets differs.

Some need them as environment variables, some as a mounted volume, and for others we've injected them into environment variables in a different way.

So we have three different paradigms for injecting secrets out of Vault.

Yes, there is.

And that's what I wanted to ask.

The group is sort of like, OK, in those three situations.

What are the bad things that I can do.

So I can write best practices. For example, we just changed a bunch of our mounted secret volumes, which I think are my current preferred option, to have them in memory instead of on disk.

I forget the weird incantation, but you set something called Memory to get your mounted volumes to be in memory.

But is that better in memory or worse in memory?

How else can I make it harder for people who have maybe broken into Kubernetes anyway, or found an exploit that's just not published, and make it at least a little harder for them to get the crown jewels of our kingdom?

Any opinions anyone.

Andrew Roth on the call.

Yeah So I'll chime in.

Yes, I think first to recap kind of your question here is that you bring up a good point that we so we talk often about secrets management, and like there is kind of like a canonical way of doing it.

But you're really ultimately at the mercy of the applications you use.

So if you're using some Helm chart, you're at the mercy of how that Helm chart was written and how it manages its secrets, or if you have your own custom built in-house applications.

Yeah, pretty much every option available to you.

Because you are in control of how that you work.

Sometimes apps support environment variable.

Sometimes apps support configuration files.

So the reality is of secrets management is that one size does not fit all in the reality of integrating lots of different software.

So I think that what we should think about is prioritizing what you're doing for your internal apps and then going down from there.

The other consideration with all of this is local development: one can create some pretty robust solutions for managing secrets, but it's also going to complicate how you're writing that code, or using that code, for local development.

And when I say that, I'm kind of pointing to native bindings to AWS, or to HashiCorp Vault, or to, like, ASM, Amazon Secrets Manager.

So if you're making your app talk natively to these, I think on the one side you're getting a better, more secure application.

But you're also getting vendor lock-in and making it harder to develop those things locally versus using environment variables or perhaps a config file.

That's a very valid point.

When you go.

I just never go back.

So one thing that I've seen done is to give every developer access to a development namespace with some dummy stuff.

So when you do a Helm deploy in a specific environment, you can pull secrets from dev on your minikube cluster in the same way that you would on a Kubernetes cluster from your prod Vault namespace. It's going to take a lot of tooling to get that to work smoothly.

But that is what I guess our team is doing right now.

I don't know if it would scale to the whole engineering org.

And then a lot of the other apps have a separate, non-production way of running local development. Right now that's a Docker Compose file, plus a file that gets put in the right place.

And then the app is happy with that.

But we've got a lot of fragmentation because things are done differently across different microservices, which is concerning to me.

So yeah.

And then the downside, I think, of putting it in environment variables is that if you inspect a container you can get most of those.

So I mean environment variables are not the most secure.

They're just the most portable.

Yeah, but you're always weighing pros and cons, right?

And maybe.

So I guess let me rephrase that question.

So it sounds like you'd prefer environment variables over...

Not just for portability, which is...

And I've also been on calls with SecOps teams, which bring up exactly the point you talk about. Environment variables are a best practice

under the 12-factor pattern.

But the 12-factor pattern is not necessarily a best practice for security.

So these can be at odds with each other.

So I suspect, since you're running a few more Kubernetes clusters than we are, that you've also thought about other SecOps pieces that we don't really have.

I mean, we send logs and like and metrics to write logs dysfunction metrics in Iraq to my dismay of open source lack of open source and we've got some metrics on various different things.

If weird stuff happens to customers.

But what are you doing to make it harder for somebody to break in and inspect a container?

Or do you have any basic recommendations for somebody wanting to secure, not just secrets,

but their cluster in general?

I don't think I have the answer that you want to hear her second one.

But OK you fire away.

Taylor Yes go for it.

All right.

So there is Falco, which is the CNCF tool.

What was it called again, one more time?

It's called Falco.

Falco. Yeah.

So that was, like, runtime: you run Sysdig or Falco or other stuff. We're not quite there yet.

Currently, we actually use Twistlock, which does the same thing at runtime.

So it's like the scanning images.

What function is the US was the open source version of achieving that.

Refresh my memory, there are too many things that I can't remember: process tree analysis is something that I could trigger an alert off of?

So you said your game server would tell me about that process.

How much time have you invested into deploying the open source tool?

Did you look at just giving Sysdig a pile of money to use their stuff, which gives you some of that for free?

What do you know.

Yeah, so we're trying to get away from that space and perhaps replace Twistlock with something very similar.

I've actually looked at a few of them or not a lot of time invested really at this point too much to actually make that investment over.

I can Matthews take the post.

Watch out once they can deal.

It's a little hard to hear you.

Are you far from a microphone, or...?

Yeah Yeah.

I keep thinking I can talk to the screen and I can hear much better.

Yeah Yeah.

Yeah Cool.

Yeah. So I can post a link to the conference talks

in terms of how Falco

was used to actually manage and help to evade any intruders

in containers. So that's one thing.

But in terms of actual use, you haven't actually gotten around to that since?

Yeah, it's been a while since I left.

I was at a much larger company that courted Sysdig for a while, and we deployed that whole structure as a POC.

I don't know if we ended up engaging with them.

But I loved working with that team.

There are really smart engineers there. It's similar to how HashiCorp and any other company is: they've got a bunch of their stuff open sourced, and then you hit this cliff.

And if you want the rest of the features you have to pay, which is totally fine.

But I haven't found anybody that's using Falco in a purely open source fashion and being really happy with it.

But I would love to look into that. Something actually related to this:

Mahesh asked earlier today about something kind of related, which was about ways to lock down exec.

Is there a way, without RBAC, to lock down

the ability to kubectl exec, other than maybe eliminating the shell altogether?

That's probably the right answer right there.

I was going to say that, but even then you can attach to a running process with kubectl attach or something or other and do something there.

I can't remember how exactly. Are you familiar with attach?

What does

that do? Where does that attach you to?

That process's console, the running process's console.

Let me not fail at answering this question and instead just find a Stack Overflow post.

But and then meet my own second focus that I just like to interact container.

So maybe you still need a shell to attach. I'm uncertain.

I will try at some point to remove bash and sh, or remove all the shells, and see if I can attach. You'd have a tough time running anything without a shell.

Yeah, you certainly reduce it.

Dale, do you know off the top of your head?

I do not.

Gotcha all right.

And the only two cents that I would add to this:

I mean, it.

So I think you have a right.

Also the general thing in security is always to have the different layers of security.

And I think one of the first layers is to eliminate the vast population of your company

being able to interface directly with the clusters in the platform, and instead move towards Git-driven workflows for all of this stuff, where you have audit trails, you have an approval workflow, you can do linting, you can do policy enforcement, you can do all those things.

The second you want to try and create policies for humans,

you're in for a world of hurt, and the scope and complexity of doing this explodes rapidly.

So what I mean by that is this is not really the answer you wanted because, well, this doesn't solve it.

OK, so what do we do about our SREs, maybe, or other personnel

who need to access the clusters and do things? For those, you need different checks and balances.

All right.

Any other questions?

Erik, is anybody among the participants using Helmfile?

We've started using Helmfile a lot.

Yeah, I mean, we use Helmfile every day, all the time, all day long.

I think a lot of folks here now are using Helmfile as well.

Is there some specific Helmfile question you have?

Yeah, a question about when we use Helmfile.

Is it possible that we can share

files between multiple deployments?

I think we haven't started using that,

but I heard

that there is a possibility that we can share a common environment file. Yeah, yeah.

Yes, you can have an environment file and then use that.

Basically, when you call Helmfile you specify the environment that you want to use with a flag on the command line, and then that environment file more or less becomes your custom schema, or your custom interface, to your Helm charts.
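As a rough illustration of selecting an environment on the command line, the sketch below only builds the argument list; it does not run anything. `--environment` and `--file` are Helmfile's global flags and `apply` is one of its subcommands, while the helper name and the environment names `staging` and `prod` are assumptions.

```python
import shlex


def helmfile_command(environment, action="apply", helmfile_path="helmfile.yaml"):
    """Build the argv for running Helmfile against one named environment."""
    return ["helmfile", "--file", helmfile_path, "--environment", environment, action]


# Same helmfile, different environment: only the flag value changes.
for env_name in ("staging", "prod"):
    print(shlex.join(helmfile_command(env_name)))
```

The point of the design is that the caller only chooses an environment name; everything chart-specific stays behind that interface.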

So the problem with Helm charts is that there is no standard schema. Every

Helm chart author has their own schema that they come up with.

And it's improving with Helm 3, right,

in that you can have, what is it,

JSON Schema specifications to control it.

But it doesn't standardize it more than we have today.

So when you define your own environments in Helmfile, you basically get to define your own interface.

Thank you.

Yeah. Hey Erik, to kind of pull off of that:

we followed your code examples to set up Helmfile a little bit.

And eventually it wasn't working.

And then we ripped it all out and don't have any Helmfile anymore.

Sorry, that's OK.

Could have been easily fixed probably.

But the other reason that we ripped it all out was because we wanted to use Argo CD to deploy not only our own apps

but also other people's apps.

And so we have that all set up in Argo CD.

I don't know if I could describe that quickly or pull it up.

A series.

I don't know what's deployed.

So I don't have a demo environment that I can sort of like look at that maybe I could one second.

I mean, this is not terrifying.

One thing I'd say is that we are moving away from something that we have been doing in Helmfile.

So we're actively engaged in a project right now using Helmfile as well.

But what we're doing is we're moving away from the copious use of environment variables because of the complexity of managing configuration code with environment variables.

So therefore, we're moving to using environments in Helmfile. Basically, long story short:

when we started using Helmfile, environments didn't exist. Environments came much later.

And now we're looking to leverage environments a lot more.

Yeah, which is sort of the answer to that previous question a little bit:

how do you get that basic helmfile to get deployed to, you know, stage, prod, et cetera?

Yeah Yeah.

I can't share my screen because you're sharing.

I guess.

But let me stop.

There we go.

It's time to share hopefully.

Nothing nothing's terrifying.

Can you see my screen.

Yeah Yeah.

So we.

This is what our Argo CD looks like.

You can see it refresh.

I'm not going to click too much.

But you can see we've got a bunch of stuff being deployed, from one of our own apps, which looks like this and has various different processes running in it, to things like cert-manager and our metrics, and these types of things are working as expected.

We've only put one of our main services into it, a Node service, which is arguably less critical: if this goes down, our users won't be really, really upset.

So we're waiting until we feel comfortable with all the setup before we put more of our important services behind it.

We'll get a bunch of angry users.

But you know they'll have customer support and you know the world won't go on fire.

There won't be any New York Times news about it.

So figuring out how to make all this work with Argo CD is something that we have made a lot of progress on.

But it is a way where we would never use Helmfile, and you would use Helmfile to deploy a bunch of these things via Helm instead of the way that Argo CD is deploying it, correct?

Yeah, we also might have different like ambitions or different things we want to optimize for.

So often an individual company can make some trade-offs that we can't make. An individual company can build kind of a system here for deploying these apps

that is highly tailored specifically to your already unique requirements of what you want to deploy.

And you know what those requirements are today.

We... so, sorry.

One of those requirements is that it's easy for us to maintain it and to grow our team and hire people who know how to maintain it,

which, the more customized it is, the harder that becomes.

Yeah, one of the reasons why we're using Kubernetes itself,

for example, is to make it easier to containerize our services in a way that can be maintained by people who we don't have to train from the ground up,

because it turns out everybody else is using Kubernetes too.

For example.

Yeah, I'd love to be able to follow the same principles with the more nuanced aspects of our system.

But you're right.

We're getting into the weeds, like, OK, we still use this Splunk thing and Splunk only supports secrets one way.

So we have to do secrets their way in order to get this third-party vendor service running, for this forwarder, I should say. Yeah, yeah.

I hear that all the time.

And I am sympathetic to it, and I don't want it to be this hard.

I don't want it to be this complicated.

I guess we're just still in the early phases of letting this stuff get figured out.

Plus the term best practices is so fleeting in this world, because the capabilities are changing faster than the practices can evolve.

So making generalized statements on how to do this

is a quick way to end up eating one's words about six months from now, and we'll change everything 15 times before I wake up in the morning.

On the Helmfile versus Argo CD thing:

I'm sure there is a way where you could use just Argo CD for your own internal applications and Helmfile for just your external applications.

We're using a combination of Helm charts and Kustomize manifest files around our applications, in part because of the way Argo CD restricts you.

It's not actually doing a Helm deploy.

You can't even roll back through Helm.

It's doing a helm template, expanding it, and then applying the manifest files.
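The "template, expand, then apply" pattern described here can be sketched roughly as below. This is an illustration of the pattern using Helm 3 style `helm template` arguments piped into `kubectl apply -f -`; it is not a description of how Argo CD is implemented, and the function name, release, chart, and file names are all hypothetical.

```python
import subprocess


def render_then_apply(release, chart, values_file, namespace, dry_run=True):
    """Render a chart to plain manifests, then apply them without a Helm release."""
    render = ["helm", "template", release, chart,
              "--namespace", namespace, "--values", values_file]
    apply = ["kubectl", "apply", "--namespace", namespace, "-f", "-"]
    if dry_run:
        # Only report what would be run; nothing touches a cluster.
        return render, apply
    manifests = subprocess.run(render, check=True, capture_output=True, text=True).stdout
    subprocess.run(apply, input=manifests, check=True, text=True)
    return render, apply


render_cmd, apply_cmd = render_then_apply("my-app", "./charts/my-app", "values.yaml", "default")
print(" ".join(render_cmd), "|", " ".join(apply_cmd))
```

Because nothing is installed as a Helm release in this flow, Helm's own release history and rollback are not available, which matches the limitation mentioned above.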

But the other features.

You've got it.

I should click through to show those manifests: you can see the difference between deploys of each manifest file in the UI,

which is great for developers to sort of understand the different basic Kubernetes pieces.

Do you have an idea, from your experiences, of where Helmfile shines versus something like this?

Yeah let me let me show a little bit of both.

Also Alex was kind of asking to see an example of what I was talking about.

If we go here too.

Why does that.

Yeah So I'm going to go back here to the cloud possibly home files and show first I guess.

Let me go here.

So, taking a step back, let's not even talk about Helm or Helmfile yet.

What are the characteristics of systems that have proven massively successful almost viral in their adoption.

Back in the day, we go back to Ruby, and then we had Ruby gems, and it's that ability to have this registry where you can easily download all these gems and get immediate benefit out of it.

Moving on to Python, same thing.

And then moving on to Docker: the idea of containerization existed long before Docker came about. What Docker did is they combined these concepts, made it bite-sized and easy to consume, and had a registry

so that it was highly reusable.

Let's step forward to Helm.

The reason why I'm also still a big proponent of Helm is that we still need a way to package the knowledge of how to distribute apps, apps which maybe were never designed to run on Kubernetes, in a way that we can run them on Kubernetes.

And I get all the negative critique of Helm.

But I don't.

But I don't see any better alternative that achieves the other things that I just talked about.

So we have Kustomize and these other things.

I still don't see a registry,

so to say, for distributing that knowledge outside of your organization.

And if that stuff isn't built from day one to live outside of your organization,

what we're doing is we're building snowflakes, so to speak.

So I'm a huge fan of Helm and a proponent of it, especially with Helm 3 and the downfall of Tiller, which we have not ported over to yet.

But I specifically was looking at Helmfile, like, the use of deploying charts through it.

Yeah because we don't have any intentions of moving away from all of our vendors that we pay money to maintain outright.

So we're going to use Helm, because that's the easiest way for them to handle upgrades and whatnot.

We're plugging it into Argo CD, which takes away some of the value of Helm,

but not any of the things that you just said.

We're plugging it into Argo CD in order to get a bunch of other value from Kustomize; we have Helm plus Kustomize, and you can glue those two together.

They're not exclusive.

But in that way Helm is being used as a template engine, not as a package manager, and I want to say that these are distinctly different things. If I were using Helm strictly as a template engine,

I know I'd be less excited about it.

But yeah, I mean, the idea is we would publish our Helm charts

so that the world can use them, and then deploy them in our cluster our way, without a Helm deploy.

Yeah, that's totally fair to do.

You're totally allowed to do that if you want to.

But here.

Let me just continue with Helmfile.

And I bring up the example of Terraform and why I love Terraform so much.

I think it's been massively successful partially because of modules, and how easy it is to distribute the knowledge of building infrastructure with modules.

Compare that to CloudFormation, which in my mind hasn't had the level of popularity of Terraform, possibly because the syntax is ugly as sin, but also because of the lack of reusability across organizations to get started.

They're doing a little bit more of this,

I think, today with CDK, but I can't speak intelligently about that.

My point here is that with helmfiles we get much the same thing.

So what is the problem with Helm?

The problem with Helm is every vendor out there gets to define their own schema for how to install and manage that application.

Also, sometimes there are additional things that we want to achieve that the Helm chart doesn't do.

So we need the escape hatch to be able to do that.

What we're getting with Helmfile is basically this ability to share the knowledge of how to install the Elasticsearch exporter.

So all I kind of need to do is this.

And then we as Cloud Posse define our own schema for how to install this.

OK, I want to pause there for a moment

and say why I'm unhappy with what we have been doing a little bit with helmfiles.

So we were using environment variables to the extreme.

And in a separate office hours I'll talk about kind of my soul searching on that.

But let's go and look at one of these guys like, hey, I am so busy and in here we.

This is kind of like the schema that we need to follow.

And in order to be able to install all the custom CRDs using the raw chart,

Then we come down here.

And so there's just this.

Don't get me wrong.

There's a lot of ugly here.

But we're still able to wrap all the support provided upstream by the Helm chart maintainers.

So then our interface has been environment variables.

But for the aforementioned reasons, I'm less jazzed about using environment variables to the extreme, to the extent we have.

And that's why now

what we're doing is using the environment files instead.

And the environment files are designed to be more digestible by developers, or by the consumers, the users of helmfiles. I'm just going to try and pull up an example here,

if I can, quickly.

I'm in another window here, so if you don't see what I'm doing, just wait a second. This relates to your question, Alex, on what this looks like.

Sorry I'm still logging into this system.

Well, I love it.

Objects on phones.

Yeah, the EFS provisioner.

So this is what it ultimately looks like now when you're using environments.

So here's the EFS provisioner; here's the helmfile.

Here's what that looks like.

So they've defined their schema.

This is what the upstream Helm chart maintainers want.

This would be a lot to ask of an ordinary developer to figure out.

But by using environments we reduce this.

So this is the configuration file format that we follow that we use.

So this is all that they need to see; this is all that they need to set up and configure.

And we can add or subtract from this, we can define defaults.

So what we've defined is our default environment.

So these are all the settings that we have by default.

And then these are what we override per other environment.

So this is what I mean.

I like the interface that we're then ultimately able to expose with helmfiles, and you might be able to see the parallels between Terraform and Helmfile when you architect it this way.

So basically, our helmfiles become modules for Helm, and our environments are like the variables that you pass in tfvars files in Terraform, but we're doing it with YAML files with Helmfile. Is that more clear now?
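The "defaults plus per-environment overrides" idea can be sketched as a recursive merge. This is only an illustration of the concept under assumed names and values (`DEFAULTS`, `ENVIRONMENTS`, the `ingress` keys, the hostnames); it does not claim to reproduce Helmfile's exact merge semantics.

```python
import copy


def deep_merge(defaults, overrides):
    """Return defaults with overrides layered on top, merging nested dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical settings: what every environment gets unless it says otherwise.
DEFAULTS = {"replicas": 1, "ingress": {"enabled": False, "host": "app.example.com"}}

# Hypothetical per-environment overrides, analogous to per-environment tfvars.
ENVIRONMENTS = {
    "staging": {"ingress": {"enabled": True}},
    "prod": {"replicas": 3, "ingress": {"enabled": True, "host": "app.prod.example.com"}},
}


def values_for(environment):
    """Resolve the final values handed to the chart for one environment."""
    return deep_merge(DEFAULTS, ENVIRONMENTS.get(environment, {}))


print(values_for("staging"))
```

A developer only edits the small override block for their environment, while the mapping from those settings onto each chart's own schema stays inside the shared helmfile.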

Adam? Yeah, right.

It's helping me sort of understand more about how you guys are doing it.

I understand how it is.

But it would be difficult for me to do a direct apples-to-apples, or even apples-to-oranges, comparison, because we also have three clusters and an application to deploy to each of the three clusters with different values.

As it stands right now, when we update those, they get pushed.

So I think I would write, like, a Jenkins job to monitor for a helmfile change

and then automatically deploy it.

I would like to control that aspect of things a little bit.

But yeah.

So in this case here.

OK So I hear that.

OK. So in this case here, we inlined the helmfile.

But this could have been a remote file.

The reason why we're inlining it in here is because we're in transition, moving to this format from our upstream ones that use environment variables.

So what happens here is that we treat this as a monorepo, and then when this repository changes we're able to systematically apply those changes.

And since the configurations are per environment: you said you deploy to multiple clusters.

Well, so do we. We deploy multiple times to multiple clusters.

That's not a problem.

It's just more environments that you would be defining here right.

I think so.

I meant to say that these are good useful functionalities that both define the paradigms.

The Kustomize piece for us...

Let me think.

So when using some Helm charts we wanted to patch things, and you can't do that easily.

And with Kustomize the idea is that you can,

which is one of the upsides.

And so it's easier for me to compare Kustomize versus Helm, or Kustomize wrapped around Helm, because I know a lot about those.

But the wrapper around Helm using Helmfile,

I can't speak to as strongly.

So I was trying to get a little bit more of the specifics on that.

Yeah, we'll see.

Good Yeah.

No there are.

You're right that there are certain types of things we cannot do with this strategy like if we need to change the actual structure of what was generated by the helm chart.

Yeah, no go.

We can maybe rely on some other third-party things that might inject things in there, like Istio injecting into the pods.

But by and large, we haven't needed to do that as much as we've needed to add additional resources. Oftentimes, we need to change the way ingress works.

Well, most of the time, the authors allow a way to disable the built-in ingress. So what we always do is disable the built-in ingress, and then we use things like the raw chart or our monochart to add all those additional resources we need.

So here's a perfect example, where we're deploying cert-manager.

And cert-manager doesn't do everything we want it to do.

We want to install the CRDs, for example.

So here what we have is we're using the Kubernetes raw chart to deploy each of these CRDs, just defined as inline values.

I remember asking myself, why does the Cloud Posse chart not need to install CRDs separately?

And you just answered that question even though I didn't ask it, which is great.

I asked that question like a month ago.

We're not using your chart right now.

We have a separate manifest file that gets deployed after that chart gets deployed, to deploy them.

But looking at this, I would love to switch over to this.

Well, this is also that Helm 2 has limitations that Helm 3 doesn't, and managing CRDs in Helm 2 versus Helm 3 is different.

So this...

I don't believe we've adapted this strategy to embrace the capabilities of Helm 3, and maybe cert-manager has also updated its later releases to support the CRD hooks, or whatever they're called, in Helm 3.

They didn't when I was looking but that was now a month ago.

So maybe they did.

Yeah Yeah.

My example that came to mind, where I was saying that we wanted to patch something, was actually Atlantis, which I know you guys have an ECS version of.

Yeah like it.

We didn't like running Atlantis in Kubernetes, because then you have a god pod that somebody can exec into to do anything.

Yeah, it's pretty terrifying.

Hence my security questions that I prompted at the beginning of this.

Yeah Hands off truckers.

Yeah, as much as I love Kubernetes,

just because you can doesn't mean you should.

And that's kind of why we went the Fargate route.

Our first V1 of Atlantis was doing it in Kubernetes.

And we were successful with that almost immediately.

But then I realized, dude,

this is scary as all hell.

There's this pod sitting there which is God

in that account, at least, or whatever roles you give that pod to do things.

And the fact that Kubernetes out of the box lets you exec into that,

that wasn't sitting well.

We put it in a dedicated tools cluster, which has different things like Argo CD and Vault. It's just a cluster that basically doesn't get used by other teams and is properly segmented. In some ways that one tools cluster does have

god mode over the rest of our clusters.

Yeah So it's a great command and control plan.

If you want to get into that cluster.

What's the host, by the way.

Hold on.

Let me guess.

But then it's behind a VPN, et cetera, for those aspects. The previous one we had was just running on a standalone EC2 instance, spun up with Terraform, with maybe a not super polished playbook, absolutely.

But this seems like an upgrade in many cases. It's much more HA, and we've added a bunch of metrics

in the process, and got all the other benefits at the same time.

Right, but as an extra plus, I started to worry more about the security of our clusters.

So we're also improving our security by being scared out of our minds.

Plus you're streaming all the output from Terraform into your log.

So, you know, your RDS passwords and everything are being shared that way.

And we haven't put any passwords into Terraform, which is a pain in the butt,

because it means that there's a few pieces of infrastructure that are not managed by it, in order to not get those passwords.

There's a few ways you can mask them so they don't get streamed to logs, if you figure that out.

But yeah, we went down the masking route.

But it's like whack-a-mole.

Again, there's a lot of things you can't mask out of the box.
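A minimal sketch of the masking approach, and of why it turns into whack-a-mole: exact-match redaction only catches the forms of a secret you thought to list. The function name, placeholder, and sample values are assumptions for illustration.

```python
import base64


def mask_secrets(line, secrets, placeholder="***"):
    """Replace every known secret value in a log line with a placeholder."""
    # Longest first, so a secret that contains another secret is masked whole.
    for secret in sorted((s for s in secrets if s), key=len, reverse=True):
        line = line.replace(secret, placeholder)
    return line


known = ["hunter2"]
print(mask_secrets('password = "hunter2"', known))

# The same secret in another encoding slips straight through the mask.
encoded = base64.b64encode(b"hunter2").decode()
print(mask_secrets(f"secret_b64 = {encoded}", known))
```

Every new encoding, truncation, or derived value needs its own entry in the list, which is why masking alone tends not to be sufficient.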

Yeah, you can pull it out of, like, KMS or Vault to do some stuff, and try to have Terraform do it after the fact.

Basically Yeah Yeah, thanks.

This conversation was helpful for me to feel like.

Sure Yeah, sure, sure.

We didn't get to cover everything today.

But this was great.

Good conversation, guys.

We reached the end of the hour here.

So I got to wrap things up.

Thank you for joining us today. Here are some things to check out if you're new to Cloud Posse, SweetOps, and what we do.

Make sure to sign up for our Slack team if you haven't already.

If you're tuning in from a podcast or other media, make sure you register for office hours

so you can attend these in real time,

by going to cloudposse.com/office-hours.

We syndicate these as a podcast.

So if you'd like to consume it that way,

go ahead to cloudposse.com/podcast.

You can subscribe with whatever you use.

Anyways a recording of this is going to be posted to the office hours channel in a little bit automatically as well as posted to all our social media formats.

Thank you guys.

Talk to you next time.

Yeah, thank you.

Thank you, everyone.

Yeah, thanks.

Thanks Thank you.

Public “Office Hours” (2020-02-21)

Erik Osterman, February 20, 2020, Office Hours

Here's the recording from our DevOps “Office Hours” session on 2020-02-21.

Machine Generated Transcript

Let's get the show started.

Welcome to Office Hours.

It's February 19, 2020.

My name is Erik Osterman and I'll be leading the conversation.

I'm the CEO and founder of Cloud Posse.

We are a DevOps accelerator.

We help startups own their infrastructure in record time by building it for you and then showing you the ropes.

For those of you new to the call the format is very informal.

My goal is to get your questions answered.

So feel free to unmute yourself at any time if you want to jump in and participate.

If you're tuning in from our podcast or YouTube channel, you can register for these live and interactive sessions by going to cloudposse.com/office-hours.

We host these calls every week. We automatically post a recording of this session to the office hours channel as well as follow up with an email.

So you can share it with your team.

If you want to share something in private.

Just ask.

And we can temporarily suspend the recording.

With that said, let's kick it off.

So I got a couple of talking points lined up here today.

If there aren't any other questions.

First, some practical recommendations for 12-factor apps on Kubernetes. It's a checklist I put together because it comes up time and time again when we're working with customers.

So I want to share that with you guys.

Get your feedback.

Did I miss anything.

Is anything unclear.

Be helpful.

Also John on the call has a demo.

A nice little productivity hack for multi-line editing in VS Code.

I've already seen it; it's pretty sweet.

So I want to make sure you all have that in your back pocket.

Also Zach shared an interesting thing today.

I hadn't seen it before, but it's been around for a while.

It's called Goldilocks and it's a clever way of figuring out how to right-size your Kubernetes pods.

So we'll talk about that a little bit.

And always on the list,

but we never seem to get to it,

are DevOps interview questions.

So a conversation would be valuable on that as well.

All right, so before we get into any of these talking points,

I just want to cover any of your questions that you might have using Cloud Posse technology, or just generally related to DevOps. I've got a question.

All right.

Fire away. In a scenario where you have multiple clusters,

and you may not be using the same versions across these things,

let's say you're in a scenario where maybe you're actually managing them for clients.

How do you deal with the versions of kubectl, being able to switch between cluster versions, and so on?

Are there any strategies out there that you can actually utilize?

Yeah, I'll be happy to help answer that before I do anybody have any tips on how they do it.

All right.

So what we've been doing, because this has come up recently specifically for that use case that you have.

So I know, Dale, you are already very familiar with Geodesic and what we do with that.

And that solves one kind of problem.

But it doesn't solve all the problems.

So for example, like you said, you need different versions of kubectl.

Well, OK.

So kubectl: I want to talk about one specific nuance about that.

And that is that kubectl is designed to be backwards compatible with, I believe, the last three minor releases or something.

So there's generally no harm in at least keeping it upgraded or updated but that might not still be the answer you're looking for.
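For reference, the Kubernetes version skew policy documents kubectl as supported within one minor version, older or newer, of the API server, which is narrower than the "last three minor releases" recalled above. A small sketch of checking that, with hypothetical function names and example version strings:

```python
import re


def minor_version(version):
    """Parse 'v1.16.3' or '1.16' into a (major, minor) tuple of ints."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if not match:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def within_skew(client_version, server_version, max_minor_skew=1):
    """True if the client is within the allowed minor-version skew of the server."""
    client_major, client_minor = minor_version(client_version)
    server_major, server_minor = minor_version(server_version)
    return client_major == server_major and abs(client_minor - server_minor) <= max_minor_skew


print(within_skew("v1.16.3", "v1.15.10"))
print(within_skew("v1.17.0", "v1.14.8"))
```

A check like this could decide which pinned kubectl to put on the search path for a given cluster; commands outside the supported skew often still work, they are just not guaranteed to.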

One thing, though: I know we had to address this situation separately.

So what we do is under Cloud Posse packages.

So this is github.com/cloudposse/packages.

We distribute a few versions of kubectl pinned at a few minor releases.

Now, I think we only have the latest three versions because we only started doing this relatively recently.

But the nuance of what we're doing with our packages is using the alternatives system that I believe Debian originally came up with.

And it's also supported on Alpine.

So basically what you can do is you can install all three versions of the package.

And then you point the alternative to the version that you want to use to run those commands.

So that's one option and that requires a little bit of familiarity with the alternative systems and how to switch between them.

I think there's actually a menu.

Now that you can choose which one you want.
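To make the mechanism concrete, here is a minimal sketch of what an alternatives-style switch does: one stable symlink gets re-pointed at whichever installed version you select. This is a Python illustration only, not the real alternatives tooling; the temporary directory layout and the `select` helper are hypothetical.

```python
import os
import tempfile

def select(link_path, target_path):
    """Re-point link_path at target_path, the way an alternatives switch would."""
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target_path, tmp_link)
    os.replace(tmp_link, link_path)  # rename over the old link in one step

# Hypothetical layout: three pinned kubectl builds installed side by side.
root = tempfile.mkdtemp()
versions = {}
for v in ("1.13", "1.14", "1.15"):
    path = os.path.join(root, "kubectl-" + v)
    with open(path, "w") as f:
        f.write("kubectl " + v + "\n")
    versions[v] = path

link = os.path.join(root, "kubectl")
select(link, versions["1.15"])
select(link, versions["1.13"])  # switch to the older pinned version

with open(link) as f:
    active = f.read().strip()
print(active)
```

Anything that calls the stable name keeps working; only the link target changes.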

But the other option, which we use for Terraform, is for when you have a bunch of 0.11-compatible projects and some 0.12-compatible projects, and you don't want to be switching between alternatives to run them.

So what we did there was we actually installed them into two different paths on the system.

So like /usr/local/terraform/0.11/bin.

And that's where we stick the terraform binary for that.

And then /usr/local/terraform/0.12/bin/terraform; that would be the binary there.

So just by changing the search path based on your working directory, you can use the version of Terraform you want without changing any Makefiles or any tooling like that that expects it just to be terraform in your search path.
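As a rough sketch of the search-path idea, the function below picks a Terraform bin directory from the working directory and prepends it to the path. The install paths follow the layout described above as best it can be read from the recording; the project roots and the `search_path_for` helper are hypothetical.

```python
import os

# Assumed install layout, following the paths described above.
TERRAFORM_BIN = {
    "0.11": "/usr/local/terraform/0.11/bin",
    "0.12": "/usr/local/terraform/0.12/bin",
}

def search_path_for(cwd, project_versions, base_path):
    """Prepend the Terraform bin dir for the project that contains cwd."""
    cwd = os.path.normpath(cwd)
    # Longest matching project root wins, so nested projects can override.
    for root in sorted(project_versions, key=len, reverse=True):
        root_norm = os.path.normpath(root)
        if cwd == root_norm or cwd.startswith(root_norm + os.sep):
            bin_dir = TERRAFORM_BIN[project_versions[root]]
            return os.pathsep.join([bin_dir, base_path])
    return base_path  # outside any known project: leave the path alone

# Hypothetical project roots and the version each one expects.
projects = {"/work/legacy": "0.11", "/work/new": "0.12"}
base = "/usr/bin" + os.pathsep + "/bin"

print(search_path_for("/work/legacy/vpc", projects, base))
print(search_path_for("/work/new", projects, base))
print(search_path_for("/tmp", projects, base))
```

Because the binary is always just `terraform` on the path, Makefiles and wrappers need no changes.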

I can dive into either one of those explanations more if you want to see more details on that.

Is that along the lines of what you're looking for, or are you looking for something else?

Yeah. Well, so there is a tool that I came across before I really got into Geodesic, which was called tfswitch, which handles Terraform, where you can actually switch between the versions.

Yeah, I was hoping that there was something that I hadn't found that was similar to that, that would do a switch.

Gotcha Yeah.

There could be.

Like is it.

I just have a thing for Terraform also you're working on.

Yeah, I know your question, Dale, was related to kubectl.

But I mean, I'm just speaking for me personally.

I don't like the env switchers, like the rbenvs of the world, the tfenv-style version managers and stuff like that.

But obviously, it's a very popular pattern.

So somebody else probably has a better answer on that one.

Yeah OK.

So what.

There's one other idea on that.

Or let me just clarify the reason why I haven't been fond of the purpose-built version managers for software: you need an rbenv for Ruby or whatever.

And then you need a tfenv for Terraform, and then you need one for kubectl and all these things.

What I don't like about it is: where does the madness stop?

Like, we technically need to be versioning all of this stuff, and that's why I don't like those solutions.

That's why I like the alternatives system, because it's built into how the OS package management system works.

Do you foresee taking a similar approach for Terraform version switching?

So you could do in the future.

Well, so well.

So yeah.

So sorry.

I kind of glossed over it, hand-waving. So I go to github.com/cloudposse/packages now.

Now, I don't know if you're using Alpine, and Alpine has become a little bit less popular lately with all the bad press about performance.

So let's see.

So if we go into the vendor folder here.

So we literally have a package for kubectl 1.13, 1.14, 1.15.

So we will continue doing that for other versions and the way it works with alternatives is basically this.

And you don't have to use alpine to do this.

This is supported on the other OS distributions too.

But you have to install the alternatives system.

And then when you have that, you can switch.

So OK.

So this here shows how to set up the alternatives that you have.

Then there's another command you use to select between the versions available.

And once you've installed all the versions it's very easy to select between those.

I just don't remember what it is up to.

It feels like a fuzzy wuzzy thing.

Yeah, I think it uses fzf or whatever that is.

Yeah You mean in your screen.

Oh, yeah.

It was.

Thank you.

Thank you guys.

It's a public service announcement.

I always forget to share my screen.

And I talk and wave and point to things like, I am.

Yeah So this is.

Yeah. So here in cloudposse/packages, under vendor.

We have all our packages. Here is where we distribute the minor-pinned versions of kubectl, and then we also do the same thing for Terraform.

All right.

Good question.

Anybody else have any questions?

I have one more, if no one else is going to ask.

Also just really reminded me.

But it doesn't look like Andrew Roth is he on the call.

This conversation reminded me of it because Andrew has a similar tool to Geodesic that he's calling Dad's Garage.

So if you look in, I forget what channel it was, release-engineering.

If you look in the release engineering channel those conversations.

Yeah. Next question, Dale.

So has anyone actually successfully done any multi-arch builds for Docker?

Say that one more time.

Multi-architecture builds.

Oh, interesting.

So you're saying like Docker for ARM and Docker for AMD64, something like that, correct?

Yeah Yeah.

I'd be curious about that.

No personal experience.

I've been trying to do something while do that kind of build for going to frustrate petrol stop.

Is this for your Raspberry Pi Kubernetes cluster?

Yeah. So I figured, well, this is part of it.

But I just came across it where I'm running into issues with my Raspberry Pi 3 B models, where it's using ARMv7 and a lot of the packages are built to support ARMv8, which is 64-bit; ARMv7 is 32-bit.

And I've run into the issue where it just would outright not launch to begin with.

I figured out it was the Docker image architecture that was the issue, where some sources will support it.

Some will not.

Unless you're planning to do your own build and all that you know.

So I started to dig into it a little bit more, and it was like a rabbit hole.

Yes, it was.

It certainly was.

Yeah All this stuff is pretty easy.

So long as you stay mainstream on architecture, doing what everyone else is doing.

But as soon as you want to do a slight minor variation.

The scope explodes.

Yeah, I'm at a point where I might just upgrade to the Raspberry Pi 4s.

Yeah And Yeah not better, just forget about it.

No, that's right.

No, but I did learned something new.

So that helps.

Yeah Yeah.

OK, cool.

Any other questions before we get to a demo.

John, do you want to get your demo setup.

And let me know when you're ready to do that.

VS Code demo.

I'm ready to roll.

You ready to roll.

OK, let's do it.

I'm going to hand over controls.

Go ahead.

Oh, OK.

So just a quick introduction to everyone here.

So John's been a longtime member of SweetOps.

He's hardcore on Terraform and developer hacks and productivity.

He gave a demo last week that was pretty interesting.

Yeah, check it out.

That was on using Terraform Cloud and the new run triggers that are available in Terraform Cloud.

So having him back on today he's going to show a cool little hack on how you can do multi-line editing which isn't something I almost.

I've seen it before.

And I kind of forgot about it.

And it's something like you almost don't think I don't need that.

But when you see it, you'll see that you need it.

And want it.

And actually, as a quick follow-up to the HashiCorp demo last week for run triggers: I did talk with them today, and they are looking at doing a few UI enhancements. As noted, it is a beta right now.

So it's not meant for production use.

But we did talk about some use cases and some potential UI improvements and things like that.

But they are definitely excited about the feature.

So let's go over.

We see some stuff going on there.

But as far as multi-line editing.

So I'm a pretty big on efficiency.

And I hate wasting time.

And so it always.

I was managing a group of developers.

And I would always be frustrated watching them edit something because I tell them, hey guys if we're doing Terraform let's say, for instance, to keep it relevant hey go ahead and put these outputs and let's do an output for, let's say all of the available options here.

Let's do an output for those.

And it would basically go something like this.

All right output something output.

If I could type.

Typing is hard.

Something else.

And so you're just going through and slowly typing all of these outputs, which are all basically, let's say, outputs from the Terraform documentation.

And so you know I started showing them these little performance tweaks performance hacks to kind of improve on that speed.

And so what I'll do here is actually take all of these arguments for EC2 let's say if we were wrapping an EC2 instance.

And we wanted to provide all of these outputs and this is generally in the case of I mean, this is right now in the case of Terraform but it could be anything.

So we have a couple sections here.

Note we don't really care about.

So I'm just going to highlight the note.

The colon and the space and use commands to actually get rid of those.

So now we're basically dealing with just outputs.

Now, the key to multi-line editing is to find something that's common.

So the shortcut by default in Visual Studio Code (your shortcuts may be different) to do a single multi-line selection is Command-D.

And so if I just use Command-D on an "a", it's going to go through and select every "a" one at a time; if I use Command-Shift-L, it's going to select every "a" in the document that I'm currently editing.

So it's very important to find something that's unique. If I just do "a", you can see here, like "EBS-optimized", "single tenant", "EBS-backed".

That's not really going to give me what I'm looking for and what I'm really looking for is to take the left side of this as the label on the right side as the description.

And so I can kind of get a little better by selecting the space around the dashes but then I still end up getting because normally they do just optional, but in this case, they did optional dash.

So it kind of catches this one as a little bit off.

So when you can't find something that's unique across all of them.

One thing that every document has is line breaks.

And so if you have basically everything on a single line you can find all the line breaks and just use that.

And in this case, I'm going to actually get rid of these empty line breaks here.

So we can just have a clean structure.

And so if I basically go to the end of the line, Shift-Right Arrow, Command-Shift-L, and then go back with my left arrow.

I'm basically editing every single line.

Now I'm just at the end of all the lines.

So now I can move around and operate as if I'm on a single line, but I'm editing every line.

So if I'm doing an input here I can just type as I normally would.

And at this point, I'm inputting all don't remember how many variables it was but all 30 or 40 variables here and now have all of my descriptions filled out.

So the next thing we can do: we know that for, let's say, all of these "Optional" markers, we don't want to have that in the documentation, because we're going to automatically output that. We can just highlight one of them, Command-Shift-L, delete it.

And then we can give it some default value as well.

And so the key is actually just moving around with Command to go to the front or back of the line, and Option to go by word, as opposed to going by individual characters.

Because if you see line six here.

I'm here on start.

But I hit Option right arrow the same amount of times.

But here.

I'm in a different position because the length of the description changes.

So if I need to get to the beginning or to the end.

I need to use my command keys in order to more quickly get to that space.

But this is a very quick way. As you can see, even with all the talking, I've been able to add in all of these values pretty quickly, edit them, and get them into a massaged place.

And now I can actually go in and tweak anything that I need to tweak.

There are times like this where you're not going to have as clean of a set of data.

I guess you could say so you may have to come in and make some manual adjustments or something like that, just to get this cleaned up.

But that's it.
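The same transformation from the demo (left side of each documentation line as the label, right side as the description) can also be scripted. A minimal sketch, assuming lines shaped like "name - description"; the sample lines, the `to_output_blocks` helper, and the `aws_instance.this` resource address are hypothetical.

```python
import re

def to_output_blocks(doc_lines, resource="aws_instance.this"):
    """Turn 'name - description' lines into Terraform output blocks."""
    blocks = []
    for line in doc_lines:
        line = line.strip()
        if not line:
            continue  # skip the empty lines between entries
        name, sep, description = line.partition(" - ")
        if not sep:
            continue  # no " - " separator: leave it for manual cleanup
        # Drop a leading "(Optional)" / "(Required)" marker, as in the demo.
        description = re.sub(r"^\((Optional|Required)\)\s*", "", description)
        description = description.replace('"', '\\"')
        blocks.append(
            'output "{0}" {{\n'
            '  description = "{1}"\n'
            '  value       = {2}.{0}\n'
            '}}'.format(name.strip(), description, resource)
        )
    return "\n\n".join(blocks)

# Hypothetical sample lines in the "name - description" shape.
sample = [
    "ami - (Required) The AMI to use for the instance.",
    "",
    "ebs_optimized - (Optional) If true, the instance will be EBS-optimized.",
]
result = to_output_blocks(sample)
print(result)
```

As with the editor approach, messy input still needs a manual pass afterwards.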

Any questions around that.

I'm happy to answer.

But this is I just found us being something very important when I started creating modules and especially when I needed to output a whole bunch of outputs or have a whole bunch of inputs in that position.

Yeah, that was really helpful.

I've had that problem with outputs, especially when you're writing modules and you want to basically proxy all the outputs from some upstream resource back out of your module.

I think this would save a lot of time.

Any questions, guys.

So that was good.

What I've seen is a pretty common trend.

Like, if you want to output multiple attributes of a single resource, people define them individually.

Something I started doing was just outputting the entire resource itself.

Letting the end user choose from that map what they want to interact with.

So is that a bad practice?

I know I haven't really seen it in public modules.

But if I create a VPC, I pretty much just output the entire VPC resource.

And then you can access it with dot notation just like you would.

The actual resource Yeah.

So you're on Terraform 0.12, correct?

So this was not something we could do before.

And so you had to output.

Every single one of these.

I switched also to doing a little bit more of just return the resource.

It's a lot better.

It's a lot easier to manage and maintain on that front.

But I guess there are some cases where you may want to adjust those things like if you have a case where you need to do.

Let's say out module.

Let's say you have an AWS instance, and you have multiple of those.

Something like that.

You may not want to output that as an idea and you may be using count just to hide it.

So this is where you kind of get into that place of doing these sort of things to kind of bring them into one output.

So I think there are some cases where you may want to do some of these custom outputs.

But yes, in general, I do like that you can return the whole resource.

I have a related question.

So does anyone know how we can do count.index when we actually create resources with a for_each loop?

There's actually a few ways with a for loop it kind of depends on your scenario.

But there's a few different approaches.

I think actually the documentation is generally pretty good here on that.

So if I can find it real quick always go to a few different spots here.

Might be under expressions.

Finding it is always difficult for me.

Does anybody remember where it is? It's under for and for_each.

I think that's how you do the final.

Yeah. So my question here is, you know, why I want to use for_each.

You know, when you are using count, and then you move things, maybe change the index of an item in a list, or maybe remove some items, count actually messes up the whole sequence.

So if you're using for_each, it creates, you know, a resource for each key, each item separately.

So it's kind of a set of the things, or maybe a list of the things now.

But I still want to utilize the index. Say, for example, there is an array or a list which has five items in it.

I want to iterate on each of the indexes, like 0, 1, 2, 3, 4.

So I know we were able to do it with count.

I've yet to find a way to do that similar kind of stuff with for_each.

So I mean, the for each returns a map.

So what you could do, I think, is use the values function to just access them.

Basically, that will return a list of the values and kind of strip out the keys, and then from there you can call them by index.
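To illustrate why removing an item behaves so differently under the two approaches, here is a small Python simulation of position-based addressing (like count) versus key-based addressing (like for_each). It only models the addressing, not Terraform itself; the paths and helper names are hypothetical.

```python
def count_addresses(items):
    """Mimic count: each item is addressed by its position in the list."""
    return {"rule[{}]".format(i): item for i, item in enumerate(items)}

def for_each_addresses(items):
    """Mimic for_each: each item is addressed by its own key."""
    return {'rule["{}"]'.format(item): item for item in items}

def changed(before, after):
    """Addresses whose item differs, was added, or was removed."""
    keys = set(before) | set(after)
    return sorted(k for k in keys if before.get(k) != after.get(k))

paths = ["/api", "/admin", "/docs", "/health", "/static"]
without_admin = [p for p in paths if p != "/admin"]

by_count = changed(count_addresses(paths), count_addresses(without_admin))
by_key = changed(for_each_addresses(paths), for_each_addresses(without_admin))
print(by_count)
print(by_key)

# If a position is still needed alongside the keys, derive it from the
# ordered list rather than addressing resources by it.
index_of = {path: i for i, path in enumerate(paths)}
print(index_of["/docs"])
```

With positions, everything after the removed item shifts; with keys, only the removed item is affected.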

Oh, OK.

I tested out this page here actually goes through.

There was a place where they actually kind of gave a pretty solid breakdown.

Yeah right here.

And I linked to this in the chat but it's a pretty solid breakdown of when and why you could want to use one over the other.

And it kind of shows you when to use count and why you'd want to use count.

In this case, you know it's something that is kind of case by case.

Right you can use count for certain things.

But then there's certain things you may want to use a for each loop for.

So I would say it's case by case.

It really depends.

But I do agree that, in essence, I think everybody's used to count from 0.11.

So a lot of the time, when you're transitioning and you want to use for_each but other resources reference the original ones, you kind of get this mismatch.

So where possible, I've been trying to convert from count to for_each and use the keys instead of the indices.

So I think the problem just occurs when you had previous code from 0.11 where you already have all the logic with count in place, and then switching it over is just an extra pain, right?

Yeah. So for that, you need to do a lot of terraform state mv to move those resources and keep the state file graceful.

Another question, and maybe you can answer since you did a lot of modules.

So I was working on one.

One project for, you know, setting the priorities of the listener rules dynamically.

For example, if we have a rule which gets deleted, or maybe destroyed and recreated, the next rule takes its priority and then actually messes up the whole thing.

So where do you go to find a workaround for that kind of scenario.

I would have to defer that question to Andre who's been doing more of the low level Terraform stuff on like the modules and things like that.

Also, I mean, 99 percent of what we do as a company is on Kubernetes, and therefore, like, ALB rules are not something we deal with.

We just use ingress.

I do know that some of the stuff got a lot easier with 0.12, passing maps around.

But I can't give you a helpful answer right now. If you ping Andre in the terraform channel, that's @aknysh, he'll be able to answer that.

All right.

Thanks, Adam carolla.

I missed the question, but what was the question about the.

Yeah, John probably know the answer.

Yep. So the question is about, you know, setting up the priorities of the rules in the Application Load Balancer.

So when you have defined priorities for, you know, the ALB listener rule Terraform resource,

and then if there is an instance where that rule gets force-destroyed because you're changing something,

the next rule takes its priority.

And then the whole sequence gets messed up.

So, you know, I was just looking for a way to freeze things so they remain as they are.

If there is a requirement to recreate that listener rule, the next rule shouldn't get its priority.

Yes, you actually can control that priority in the configuration.

So there's a priority.

Yeah So I am using that.

However, if there is an instance where, by chance, that listener rule needs to get recreated, by changing some settings which require it to get force-destroyed, the default behavior of the ALB is, you know, if a listener rule is deleted, it actually changes the priority of all the rules which come after it.

So say, for example, a listener rule with a priority of number five gets recreated: all the rules after number five will be moved up by one.

One step up: like 6 would become 5, 7 would become 6, and 8 would become 7.

I actually haven't seen that.

But that may be because I always use multiples of 10.

So I don't put them one by one.

I'll do like 1, 10, 20, up to whatever.

Yeah, but.

But that may be why I haven't seen that, actually.

But with all the priorities we set, I've never seen them reset once Terraform applied again.

They were good to go.

Yeah, apply is good.

But if at all there is an instance where the resource gets recreated, that's where it starts freaking out.

So you know there is if you're not, you're not there.

There is a limit on the number of rules that you can have on one ALB.

And there was this really legacy application which required us to have a lot of ALB rules.

And this is where I actually was able to replicate this issue.

Interesting.

First of all, you wouldn't happen to be using the Cloud Posse terraform-aws-alb-ingress module, would you?

No, this is just a simple setup; it works on ECS, not on Kubernetes.

Well, yeah.

So that is a little bit of a confusing thing.

So when we were designing our ECS modules and our modules for ALB, we took the opinion of making them feel like the Kubernetes interface for ingress.

So we created a module called terraform-aws-alb-ingress, which defines a rule.

So that it works similar to defining an Ingress resource in Kubernetes, even though you're defining a rule for an ALB and using, say, ECS for example.

So when you do it this way, and you're defining a module explicitly for each rule.

And I it's been working for us.

But I can't say, just like what John said, that we've encountered this thing where the priorities get renumbered; nothing like that has been reported to us yet.

But the part I wanted to make sure of was that you weren't using something like count together with defining your rules.

So you know we do have some dynamic parts, which also involves doing this all also, again, again, a reference to the question that I asked prior to that by using for each and.

Yeah Yeah, that makes sense.

So you usually you're not emotionally mentioned the list, and then it starts freaking out.

I think that the problem you have is probably more related to the module, or the way you're doing the Terraform, than some fundamental limitation of the ALB resources themselves.

Yeah Yeah.

Because we actually hit an issue where sometimes we have to recreate a target group, like if you change the name. Switching from one of the Cloud Posse modules, there was a fundamental name change of target groups.

And so sometimes they don't delete properly; they kind of hang on there, so you can't delete the target group because it's still in use.

So we did a lot of manual deleting just this last week, actually.

So we did a lot of manual deleting, and I never saw priorities change; all the priorities stayed in line.

Yeah. So target group is key.

But priorities only change if you, you know, change the path or a condition or something; that's where the listener rule gets recreated.

Yeah but once you give it a priority number it should stay there.

I would Yeah.

But I still want to fix the priorities, because those rules are dynamic.

So if you have a path which is coming from a variable, which is a list,

then, you know, you add another path,

and then it will again spin up a new listener rule with the next priority.

And then it becomes harder to maintain.

Yeah. So maybe look at adjusting your list and utilize a map or something, where you can define a set priority for each of those, so that when you loop over it.

It's a set priority every single time, no matter if you move it up and down in your list or something else, make it explicit.
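A minimal sketch of that suggestion, modeled in Python rather than Terraform: priorities derived from list position move when the list changes, while priorities stored in a map next to each path stay put. The paths, the gaps of 10, and both helper functions are hypothetical.

```python
def positional_priorities(paths, start=1):
    """Priority derived from list position: editing the list renumbers rules."""
    return {path: start + i for i, path in enumerate(paths)}

def explicit_priorities(rules):
    """Priority stored with each rule; rejects duplicates up front."""
    seen = {}
    for path, priority in rules.items():
        if priority in seen:
            raise ValueError(
                "priority {} used by both {} and {}".format(
                    priority, seen[priority], path))
        seen[priority] = path
    return dict(rules)

# Hypothetical paths; gaps of 10 leave room to insert rules later.
rules = {"/api/*": 10, "/admin/*": 20, "/docs/*": 30}

before = positional_priorities(["/api/*", "/admin/*", "/docs/*"])
after = positional_priorities(["/api/*", "/docs/*"])
print(before["/docs/*"], after["/docs/*"])  # position-based priority moved

fixed_before = explicit_priorities(rules)
fixed_after = explicit_priorities(
    {p: n for p, n in rules.items() if p != "/admin/*"})
print(fixed_before["/docs/*"], fixed_after["/docs/*"])  # unchanged
```

The duplicate check is there because two rules on one listener cannot share a priority.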

Right OK.

All right.

Let's are there any other questions.

Hi.

I have a question. For my use case, I want to generate a report based on underutilized and overutilized EC2 instances, and automate sending the report to an email.

Like, if the instances are running at less than 10%, I would like to see the instance list.

And if it's more than 80%, I'd like to see the list of instances.

So I'm just looking for ideas to achieve this.

Gotcha.

So the idea that somebody here might have a suggestion for that we work with mostly just infrastructure managed under Kubernetes and so we're not really concerned about any one instance, in and of itself.

From a reporting perspective.

But the.

Yeah, anybody familiar with some tools to generate the reports that he's looking for?

AWS has the Cost and Usage Report.

That's kind of built in.

I don't know about the request of emailing but I know that's in there.

Mm-hmm I could turn it into s.a. you could probably do that for sure.

Put this on the track.

It is probably the utilized utilization for some period of time right.

Yeah some period of time, like a week few days like the one of us.

Yeah somebody else comes to mind just feel free to share that either in this chat here or in office hours.

There are also I mean Yeah.

So, on top of that, there's the Trusted Advisor stuff, which will make similar kinds of recommendations.

And then I think there are SaaS services as well.

But I don't think that was necessarily what you're looking for.
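For the report described in the question, the classification step itself is small. Here is a minimal sketch using the 10% and 80% thresholds mentioned above; it assumes the average CPU figures have already been collected from whatever monitoring source is in use. The instance IDs, the numbers, and the `utilization_report` helper are hypothetical, and sending the email is left out.

```python
def utilization_report(avg_cpu_by_instance, low=10.0, high=80.0):
    """Split instances into under- and over-utilized lists by average CPU %."""
    under = sorted(i for i, cpu in avg_cpu_by_instance.items() if cpu < low)
    over = sorted(i for i, cpu in avg_cpu_by_instance.items() if cpu > high)
    lines = ["Underutilized (< {:.0f}% CPU):".format(low)]
    lines += ["  {} ({:.1f}%)".format(i, avg_cpu_by_instance[i]) for i in under]
    lines.append("Overutilized (> {:.0f}% CPU):".format(high))
    lines += ["  {} ({:.1f}%)".format(i, avg_cpu_by_instance[i]) for i in over]
    return under, over, "\n".join(lines)

# Hypothetical instance IDs and weekly CPU averages.
samples = {"i-aaa": 4.2, "i-bbb": 55.0, "i-ccc": 91.3, "i-ddd": 9.9}
under, over, body = utilization_report(samples)
print(body)
```

The returned text could serve as the body of the email.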

There's a SaaS for everything, right? Outsource, totally outsource.

Yeah Any other questions.

I sent all right.

Well, let's cover a couple of the talking points here that came up.

Sure I know.

I definitely recognize a few people in the call here using Kubernetes a lot.

So I think this Goldilocks will be a welcome utility that was shared. I only learned of it today. What I thought was interesting was how they went about implementing it.

So behind the scenes it's using the Vertical Pod Autoscaler in just notification mode, or in like debug mode.

So that it's not actually auto scaling your pods but it's making the recommendations for what your pods should be.

And then they slap a UI on that.

And here's what you get.

So I think why this is really valuable is.

Well, if you saw the viral video that went around this past week about how they dubbed it that famous video with Hitler reprimanding his troops.

Well, they took that, and they parodied the situation of running Kubernetes in production without setting your limits and how asinine that is.

Well, unfortunately, it's a pretty common thing.

So helping your developers and your teams know what the appropriate limits for your pods should be is an important thing.

So this here is a tool to make that easier just building that.

Like when I joined this team that was an issue as well.

I guess at the time, the company was getting into Kubernetes at an early stage, and all the experience was not there.

And we found over time that we had a lot of evictions going on within the cluster because we didn't have any resource limits set for these things.

So we actually had to go through the process of evaluating which why this kind of tool.

I didn't know existed until after questions.

Well came about which would really help because we actually implemented data to help us give us some color back in terms of what was going on.

And then started to send more traffic to see what 40 points are to determine that.

So I figured that everyone had their own methodology to determine what those resource limits were going to be as well.

And I saw this question for this flip was posted some equation in terms of how to determine the ratio of limits and you know that it was this whole thing so much looking forward to using this tool to make a determination as well.

Yeah, I think that's basically what everyone's been doing: looking at either their Prometheus or their Grafana or Datadog and determining those limits, which is fine.

What I like about this is that it just dumbs it down to the point where you literally copy and paste this and stick it in your Helm values or, you know, if you're doing raw resources, stick it in there and you're good to go.
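For a sense of the manual approach (eyeballing usage graphs and picking numbers), here is one simple heuristic as a sketch: requests from typical usage, limits from the observed peak plus headroom. This is an assumption for illustration only and is not how Goldilocks or the Vertical Pod Autoscaler compute their recommendations; the sample data and helper names are hypothetical.

```python
def percentile(samples, pct):
    """Nearest-rank percentile (pct is an integer 1-100) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]

def with_headroom(peak, headroom_pct):
    """Peak plus a percentage of headroom, rounded up to a whole unit."""
    return (peak * (100 + headroom_pct) + 99) // 100

def suggest_resources(cpu_millicores, memory_mib, headroom_pct=20):
    """Suggest requests from typical usage and limits from peak plus headroom."""
    return {
        "requests": {
            "cpu": "{}m".format(percentile(cpu_millicores, 50)),
            "memory": "{}Mi".format(percentile(memory_mib, 50)),
        },
        "limits": {
            "cpu": "{}m".format(with_headroom(max(cpu_millicores), headroom_pct)),
            "memory": "{}Mi".format(with_headroom(max(memory_mib), headroom_pct)),
        },
    }

# Hypothetical usage samples for one container.
cpu = [120, 150, 90, 300, 110]
mem = [256, 260, 250, 310, 255]
suggestion = suggest_resources(cpu, mem)
print(suggestion)
```

The resulting dictionary has the same shape as a container's resources block, so it can be pasted into values with little editing.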

There are a couple other tools I've seen specific.

So the concept here is right sizing, so right sizing your pods.

So if you're using Ocean by spotinst.com: Ocean is a controller for Kubernetes to right-size your nodes.

Ocean also provides a dashboard to help you right-size the workloads.

The pods in there.

So that's one option.

The other option is if you have Kubecost.

Kubecost is an open source, or open-core, kind of product that will make recommendations similar to Trusted Advisor.

But for Kubernetes, and it also gives suggestions on how to right-size your pods.

All right, then let's see what the other thing was that I had here.

So I asked for some feedback and got a lot of great conversations going on in the general channel.

If you check yesterday, it's related to this.

We're working on an engagement right now and the customer asks, so what kind of apps should we tackle first for migration to Kubernetes?

And it's a common question that we cover in all our engagements.

So I thought I'd just kind of whip it up and distill it to the characteristics of the ideal app.

I like to say that pretty much anything can run on Kubernetes; the constructs are there, the primitives are there to, so to say, lift and shift traditional workloads.

But traditional workloads aren't going to get the maximum benefit inside of Kubernetes, like the ability to work with the Horizontal Pod Autoscalers and use Deployments and stuff like that.

So here are some considerations that I jotted down using 12-factor as the pattern.

So the twelve-factor app.

I'm sure you've been working with this stuff for some time.

You've seen those recommendations before.

It looks something like this.

In my mind that they're slightly dated in terms of the terms as it relates to Kubernetes and it's a little bit academic.

So if we look at the explanations for how these things can work.

So OK.

So first of all, this is very opinionated, just talking about specifically Ruby and using Gemfiles.

But let's generalize that, right?

So the concept here is, if we're talking about Kubernetes, we still want to pin our software releases.

But we also can just generalize that to say: distribute a Docker image and Dockerfile that has all your dependencies therein, pinned to releases.

So that's kind of what I've done here in this link here.

And it's a checklist.

So if you can go through and identify an app that checks off all of these.

It's a perfect candidate to tackle first in terms of migration.

I like to say move your easiest apps first.

Don't move your hardest apps until you build all that operational competence.

So working down on the list here.

Let's see here.

So there's a little bit opinionated.

But we really feel like polyrepos now are so much easier to work with and deploy with the pipelines to Kubernetes.

So that's why we recommend working with that, using, obviously, Git-based workflows.

So this is that you have a pull request review approval process.

So that you're not editing m.

Believe it or not, some companies do that.

We don't want that.

And then automated tests.

So if we want to have any type of continuous part of our delivery.

We need to make sure we have foundational tests.

Moving on then to dependencies that things that your services depend on are explicit.

One thing that we see far too often are hard coded host names in source code.

That's really bad.

We've got to get those extracted either into a configuration file or, ideally, environment variables. I like environment variables because those are a first-class citizen.

And they're very easy to change later.

Next, services being loosely coupled.

This is that your services can start in any order.

Your your API service must be able to start even if your database isn't yet online.

It's frequently an older pattern where well, if the API can't connect to the database.

Well, then the API just exits and crashes.

And this can create a startup storm where your processes are constantly in a crash loop and things only start to settle once the back-offs kick in and your applications stop thrashing so much.
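Both points (host names from environment variables instead of source code, and tolerating a database that is not up yet) can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the `DATABASE_HOST` variable name, the retry counts, and the stand-in `fake_connect` function are all hypothetical.

```python
import os
import time

def database_host(env=None):
    """Read the database host from the environment instead of hard-coding it."""
    env = os.environ if env is None else env
    return env.get("DATABASE_HOST", "localhost")

def connect_with_backoff(connect, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Keep retrying connect() with exponential back-off instead of exiting."""
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # give up only after the final attempt
            sleep(base_delay * (2 ** attempt))

# Stand-in for a real database client: fails twice, then succeeds.
calls = {"n": 0}
def fake_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("database not ready")
    return "connected to " + database_host({"DATABASE_HOST": "db.internal"})

delays = []  # record the waits instead of actually sleeping
result = connect_with_backoff(fake_connect, sleep=delays.append)
print(result, delays)
```

With retries inside the process, the API can start in any order relative to the database rather than relying on the orchestrator to restart it.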

So would that then be more on the developing team to ensure that kind of control.

Yeah. So the idea here is that this is not a hard requirements list.

But if this jogs a memory, like, oh yeah, that's how our application works, then what we ask is that they change the way their application works so that it doesn't have these limitations.

I'm actually working with a client that seems to have gotten wind of your list but took it the opposite way.

Hopefully not on all the points, but most of them.

Oh no.

In fact, they're still using ColdFusion.

Oh stop right there.

Take me back there.

Yeah Yeah.

So it sounds like you have a big project ahead of you, which might be changing some of the engineering norms that they've adopted over the years. We've been pretty lucky with most of our customers: a lot of this stuff is intuitive for them, or it's stuff they've already been practicing.

But then there would be one or two things here and there.

Like the one thing that we get bit by all the time is that their application might be talking to S3, and then their application has some custom configuration for getting the access key and the secret access key.

So that precludes us from using all the awesomeness of the AWS SDK for automatically discovering these things if you set up the environment correctly.

So having them undo those kinds of things would be another use case.

My main point in going over these things would be to jog any reactions, like if anybody is totally against some of these recommendations, or if any of these are controversial, or if there's anything that I've missed.

So please chime in if that's the case. Backing services: ideally, services are totally stateless; that is, you can offload the durability to some other service, one that's hopefully not running inside of Kubernetes.

Of course, that's not saying that Kubernetes is not good for running stateful services.

It is; it's just a lot larger scope, right?

Managing a database inside of Kubernetes versus using RDS. Let's see there.

Anything else to call out.

This is a common thing.

Applications don't properly handle SIGTERM.

So your apps should exit gracefully when they receive that.

Some apps just ignore it.

And then what happens is Kubernetes waits until the grace period expires and then just, you know, SIGKILLs your process the hard way, which we want to avoid.
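
For reference, graceful SIGTERM handling can be as small as the sketch below. It uses only the Python standard library; the one-second work loop is a stand-in (an assumption) for whatever the real application does between shutdown checks.

```python
import signal
import threading

shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request; the main loop finishes in-flight work and exits."""
    shutdown_requested.set()


def main():
    signal.signal(signal.SIGTERM, handle_sigterm)
    while not shutdown_requested.is_set():
        # Placeholder for real work, e.g. handling one request or one queue message.
        shutdown_requested.wait(timeout=1.0)
    print("SIGTERM received, exiting cleanly")


if __name__ == "__main__":
    main()
```

Because the handler only sets a flag, the process gets to finish its current unit of work and exit on its own before the grace period runs out.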

Sticky sessions.

Let's get rid of those.

Oh, yes.

And this one here.

This is it.

This is surprisingly common, actually.

So we're all for feature flags.

We recommend feature flags all the time.

They can be implemented in different ways using environment variables is an awesome way of doing it.

But making your feature flags based on the environment or the stage you're operating in is a shortcut but not really an effective use of feature flags because you can't test that function.

You shouldn't be changing the environment in staging to test the feature.

You should be enabling or disabling the feature itself.
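
To make the distinction concrete, here is a small sketch of a flag that is keyed on the feature itself rather than on the stage. The FEATURE_ prefix, the flag name, and the STAGE variable are hypothetical names chosen for illustration.

```python
import os


def flag_enabled(name, env=os.environ):
    """Return True when the feature's own variable is set to a truthy value."""
    value = env.get(f"FEATURE_{name.upper()}", "false")  # hypothetical naming scheme
    return value.strip().lower() in {"1", "true", "yes", "on"}


def checkout_flow(env=os.environ):
    # Discouraged alternative: branching on env.get("STAGE") == "staging",
    # which ties the behavior to where the code runs instead of to the feature.
    if flag_enabled("new_checkout", env):
        return "new checkout"
    return "legacy checkout"
```

With this shape, the feature can be switched on in any environment, including production, without pretending to be in a different stage.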

So that's a pet peeve of mine here and related to that.

It's also not using hardcoded host names, or even configurable host names, in your applications.

If you're running Kubernetes, that's really the job of your ingress, and your application should not be aware of, or even care about, the host name.

Exactly Yeah.

Alex Siegmund posts in chat here.

I see you mentioned under port binding that you should listen on non-privileged ports.

But what's the harm of having your application bind to port 80, for example, if it's a website?

So the harm is really that.

Well, the only way you can listen on port 80 inside your app is if your app starts as root.

And then that's contradictory to saying that you should run non-privileged processes.

So the thing is that you should not be running your processes as root.

Yes, your container can start as root.

Your application can start up as root.

It can bind to the port as root and then it can drop permissions.

But so often things do not drop permissions, and then you'll have all these services unnecessarily running as root.

And just for the vanity of the service running on a classical port like 80, is it really required?

That's not the case.

So exactly. Alex says, aha.

So that's why you should run as non-root inside of the container.

Exactly. So that's our point there.
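
A sketch of what listening on a non-privileged port can look like in application code follows. The PORT variable name and the 8080 default are assumptions for this example; the 1024 boundary is the usual Linux convention for privileged ports.

```python
import os
import socket


def listen_port(env=os.environ):
    """Pick an unprivileged port (>= 1024) so the process never needs root to bind."""
    port = int(env.get("PORT", "8080"))  # PORT is an assumed variable name
    if port < 1024:
        raise ValueError(f"port {port} is privileged; choose one >= 1024")
    return port


def open_listener(port):
    """Bind a TCP listener on all interfaces; works as a non-root user."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("0.0.0.0", port))
    sock.listen()
    return sock
```

Users can still reach the site on port 80 or 443, because the Kubernetes Service or ingress in front of the pod maps the public port to the container's high port.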

I think a lot of times what I see is that the Dockerfile gets created and it works as root.

No one ever goes back and says, all right, let me have it run as an unprivileged user so that it actually works that way.

Or they may hit a bunch of roadblocks and don't get it to work, probably merely because of a lack of knowledge as to how to do that.

And then say, hey, you know, it works.

Let's just leave it.

And then it just goes out there.

And then something happens, and then that pattern gets replicated right because somebody says, oh, how do I deploy a new service.

Oh, just go look at this other repo.

And then they copy paste that stuff over and then it replicates and quickly becomes the norm in the organization.

Yeah Yeah.

So logs.

This is nice to have.

I mean, it's not a hard requirement.

But it really does help a lot.

Once you have all your logs centralized in a system like Kibana or any of the modern log systems like Sumo Logic or Splunk, if your logs are structured you're going to be able to develop much more powerful reports.

And if you're using most any language or framework, a lot of them support changing the way your events are emitted these days.

So no reason not to.

The most important thing is just don't write to disk, because we don't want to have to start doing log rotation, and if quotas aren't set up correctly, which they usually aren't, you can fill up the disk on the host OS.

So let's not do that.
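
As one possible shape for structured logs written to standard output rather than to disk, here is a standard-library Python sketch. The chosen JSON field names are an assumption; real log pipelines usually add timestamps and request identifiers.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def build_logger(name="app", stream=None):
    """Return a logger that emits JSON lines to stdout (or to the given stream)."""
    target = stream if stream is not None else sys.stdout  # stdout, not a file on disk
    handler = logging.StreamHandler(target)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

The container runtime captures stdout, so log shipping and rotation become the platform's job instead of the application's.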

Lastly, this is another common problem I see: migrations run as part of the startup process when the app starts up.

So if you have a distributed environment where you're running 25, 30, 100 pods or whatever, you don't want all of them trying to run a migration. You want that to be an explicit, separate step, possibly run as an upgrade step in your Helm release or as some pipeline process in your continuous delivery pipeline.

So making sure you can run your database migrations as a separate container or as a separate entry point when you start that container is important.
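
A separate entry point for migrations can be as simple as a subcommand on the same image. This sketch assumes hypothetical serve and migrate commands whose bodies are placeholders; the point is only that starting the server never triggers a migration.

```python
import argparse


def run_migrations():
    # Placeholder: invoke your real migration tool here.
    return "migrations applied"


def serve():
    # Placeholder: start the web server; note that it does NOT run migrations.
    return "serving"


def main(argv=None):
    parser = argparse.ArgumentParser(prog="app")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("serve", help="start the application")
    sub.add_parser("migrate", help="run database migrations once, then exit")
    args = parser.parse_args(argv)
    return run_migrations() if args.command == "migrate" else serve()


if __name__ == "__main__":
    print(main())
```

The pods would run the serve command, while a single one-off job or pipeline step runs migrate once per release.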

Same thing with Cron jobs.

We know those can be run in a specific way under Kubernetes.

So it'd be nice if those are a separate container, and any sort of batch processing as well.

So the basics here is that these can run as jobs versus everything else should be either deployments or stateful sets or other primitives like a recommendation registered domain.

Put this under oh the register to the.

Exactly I do that.

Yeah, I should do that.

It's a good list.

So if you have any other recommendations.

Feel free to slack those to me.

I'll update this list.

What's your thought on the read only file system.

You know, that's good.

But you know it's probably good default. It's an optimization.

You know, I make a point here that file system access is OK.

It just shouldn't be.

It should be used for buffering or it should be used for caching or things like that.

But it shouldn't be used to persist data.

So perhaps, yeah, to your point, your root file system should probably be configured at deployment to be read-only.

But then provide a scratch space that is writable.

I think I'll take a list and sentiment learned.

So I can.

I follow this.

There you go.

Let me follow up.

Let me know how that conversation goes.

So we've got five minutes left.

That's not enough time to really cover too much else.

Did this jog any memories or any other questions that you guys have something I know.

Yes, I have one for Kubernetes.

What's your experience with Istio?

I see a lot of hype around it.

Oh so I was actually, hypothetically you know thinking of a solution, which we can make where you can look up the geographic location.

And then we can do it again.

And it releases like release this particular piece to India.

And then the rest of the world to see if there was.

Yeah So any means.

How was holistic business, then.

Yeah And I'm sure there were others here will have some insights on that as well.

I'll share kind of my two cents on it.

My biggest regret with Istio is that it isn't a first-class capability; that is, service mesh functionality isn't a first-class thing inside of Kubernetes, so we have to be deploying this seemingly high overhead of sidecars automatically to all our containers when they go out.

That said, the pattern is really required to do some of the more advanced things when you're running microservices for example, the releases that you're talking about here.

So the thing is that with Kubernetes by itself, the primitives with ingress and services are perfect for deploying one app.

But then how do you create this abstraction for routing traffic between two apps that provide the same functionality, and then do traffic shaping between them?

Or when you run that really complex microservices architecture, how do you get all that tracing between your apps?

These are the stories that are lacking in Kubernetes out of the box.

So I think service meshes are more and more an eventual requirement as the company reaches sophistication in its utilization of Kubernetes.

But I would not recommend using a service mesh out of the box until you can appreciate the primitives that Kubernetes has.

It's kind of like one of these things when people start on ECS; I think that's great.

I mean, I personally don't like ECS.

We spent a lot of time with ECS, and I think ECS kind of shows the possibilities of what a container management platform can do.

And then once you've been using ECS with Terraform or CloudFormation for six months or a year, you start to realize some things that you want to do are really hard.

And then you look over at Kubernetes and you get those things out of the box.

So this is how I kind of look at service meshes.

I think you should start with the primitives that you get on Kubernetes.

And then once you realize the things that you're trying to do that are really hard to do.

And that the service mesh will solve.

That's the right time to start.

It's better to grow into that need than to, as Erik said, just have it out of the box and then try to use it for a problem that you don't currently have.

Yeah, I agree on that.

So GKE provides support for Istio out of the box, and, you know, eakins was becoming a big supporter of Istio. And looking at some stats that I've seen, out of all of the traffic which goes to Kubernetes, 19% is being served by service meshes like Istio.

I don't know what the sample size was.

But this is what I have seen, random numbers here and there.

So I agree on that.

I have done just one deployment in Kubernetes.

And I used Traefik for that as the ingress controller, and it looked like Istio wasn't a good fit for that particular deployment. But I think if you have a big requirement where you want to deploy a really complex solution, for canary releases mostly, that's where I see Istio can come in.

So it can.

It's one way of solving that right.

The other one is to use rich feature flags with something like LaunchDarkly or Flagr or a couple of these others.

There's a couple of open source alternatives and in that model it requires a change kind of a little bit in your development model, and it requires knowledge of how to effectively use feature flags in a safe way.

But I think what I like about it is it puts the control back in your court versus offloading it to the service mesh.

It's kind of like the service mesh is fixing a software problem while the feature flags are fixing it with software.

Yeah, we had a quite similar scenario, and what we did was fix it with feature flags.

So that actually took the model of shaping the development to suit that scenario, where we could roll out features to selected users within our testing group.

And then they would do what they need to do from there.

And then we can ramp it up over the next couple of weeks.

And so on.

Yep plus you get the benefit of immediate rollbacks right now you can just disable it immediately just by flipping the flag off.
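
The pattern described here, rolling a feature out to selected users first and keeping an off switch for instant rollback, can be sketched as follows. This is not how LaunchDarkly or any specific product works internally; the function names, the allow list, and the percentage bucketing are illustrative assumptions.

```python
import hashlib


def in_rollout(user_id, flag, percent):
    """Deterministically place a user in one of 100 buckets for a given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percent


def feature_on(user_id, flag, percent, allow_list=frozenset(), kill_switch=False):
    """Decide whether a user sees the feature."""
    if kill_switch:            # setting this is the immediate rollback
        return False
    if user_id in allow_list:  # e.g. members of the internal testing group
        return True
    return in_rollout(user_id, flag, percent)
```

Raising the percentage over a few weeks widens the audience gradually, and because the bucketing is a hash of the flag and the user, a given user keeps a consistent experience between requests.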

Exactly So go.

So I think it.

I don't think there's a black and white answer on it.

I think that some change to the organization, some change to the release process, making smaller, more frequent changes: all these things should be adopted, including trunk-based development.

Yeah, I agree.

Because I think introducing Istio for the idea behind feature flags puts a bit more overhead on the operations team and takes it away from the developers themselves.

So unless you're going to train your developers to use Istio to actually integrate those feature flags into the deployment, I think you have a better chance using a tool like LaunchDarkly.

I'm starting to do that.

Yeah Yeah.

Good luck setting up the app with all this stuff in the Minikube and doing appropriate tests all that stuff for your development.

I want to know how it goes.

I still want to find some time to get my hands dirty beyond that.

But I may be doing it in a month or something.

How does seem that long term upkeep for that to be a headache.

Yeah All right, guys.

We've reached the end of the hour here, so we've got to wrap things up.

So thank you so much for attending.

Remember, these office hours are hosted weekly.

You can register for the next one by going to cloudposse.com/office-hours.

Thanks again for sharing everyone.

Thank you, job for the live demo there.

Yes code was really awesome.

A recording of this call is going to be posted to the office hours channel and syndicated to our podcast at podcast.cloudposse.com.

See you guys next week same time, same place.

Hey Eric can I ask you a quick question.

Yeah, sure.

So I just ran into an issue like 10 minutes ago and thought, hey, office hours is happening.

All right, let's.

I do got to run to a pediatrician appointment right away.

So let's see if I can help. Fargate Spot: have you managed to deploy a Fargate Spot task with Terraform?

Yes So we use that we have it.

We have an example.

I'm not going to say that it's a good example for you to start out with doing it.

But we have a Terraform ECS Atlantis module in there.

We're deploying Atlantis as an ECS Fargate task using our library of Terraform modules. On the plus side, it's a great example of showing you how to compose a lot of modules.

It's a great example of showing you that modules are composable.

And it's an advanced example.

This example is exactly why I don't like ECS and Terraform; they don't go together.

I'm with you there, and we think we have a pretty advanced infrastructure.

It's already in Fargate.

We're just trying to go to Fargate Spot.

So what do you see that module is called.

Oh stop.

Sorry Yeah.

No, no, no, no, no, no.

I don't have Fargate Spot.

No worries, no worries.

We're going to try something interesting with the capacity provider strategy and see if we can make it do something.

But no worries.

Figured I'd just check.

Yeah, thanks for bringing it up, though.

So misunderstood.

Absolutely All right.

Take care.

But

Public “Office Hours” (2020-02-12)

Erik OstermanFebruary 12, 2020Office Hours

Here's the recording from our DevOps “Office Hours” session on 2020-02-12.

We hold public “Office Hours” every Wednesday at 11:30am PST to answer questions on all things DevOps/Terraform/Kubernetes/CICD related.

These “lunch & learn” style sessions are totally free and really just an opportunity to talk shop, ask questions and get answers.

Register here: cloudposse.com/office-hours

Machine Generated Transcript

Let's get the show started.

Welcome to Office Hours. It's February 12th, 2020.

My name is Erik Osterman and I'll be leading the conversation.

I'm the CEO and founder of Cloud Posse.

We are a DevOps accelerator.

We help startups own their infrastructure in record time by building it for you and then showing you the ropes.

For those of you new to the call the format is very informal.

My goal is to get your questions answered.

So feel free to unmute yourself at any time if you want to jump in and participate. If you're tuning in from our podcast or YouTube channel, you can register for these live and interactive sessions just by going to cloudposse.com/office-hours. Again, cloudposse.com/office-hours.

We host these calls every week. We'll automatically post a recording of this session to the office hours channel as well as follow up with an email.

So you can share it with your team.

If you want to share something in private, just ask and we can temporarily suspend the recording.

With that said, let's kick things off.

So here are some talking points that we can cover to get the conversation going.

Obviously first, I want to first cover any of your questions.

So some things that came across or came up in the past week since we had our last call.

Terraform Cloud now supports triggers across workspaces.

John just shared that this morning.

I'll talk about that a little bit.

The new AWS CLI is available with no more Python dependencies.

However, I'm still not celebrating it entirely based on my initial review.

Also, there's a really wise quote that was simply put in our community yesterday, something like: if you can't commit to the overhead required to run something, you're introducing a weakness into the system rather than a strength, as they'll quickly end up in the critical path.

So that was the way that Chris Child's said something and I want to talk about that some more.

See what reactions we get.

But before we go into those things.

Let's see what questions you guys have.

I have one thing when you're going through tariff for Terraform cloud can you also go through just your general experiences with Oprah.

And I were looking at using it earlier this week or just having a little bit of some pain doing so.

Yeah, just some general experience that would be useful.

I can give you some kind of like armchair review of Terraform cloud.

We are not using it in production as part of any customer engagements we've done our own prototypes and pieces.

So I think the best thing would be when we get to that point if the other people on the call that are actually doing it day to day.

I know John Bland has been doing a lot with Terraform Cloud.

Let me ping him on SweetOps.

Let's see if he can join and share some of his experiences.

Do you guys know if you can continue using remote state with S3 with Terraform Cloud?

I couldn't figure out how to do that.

Well, you should be able to let me explain.

Mm-hmm It's a good question.

But IM not 100 percent of what you put into it.

So yeah.

So I Yeah, I cannot speak from experience trying to do it.

What were your problems when you tried.

I mean, I assume you had the AWS credentials and everything hardcoded.

And if you had that provider block or that backend set up, was it erroring, or does it validate that you have a Terraform workspace backend?

I personally couldn't find the place where you can even put the creds into Terraform Cloud.

It would be in the environment settings.

So there's the ability to set it as an environment variable.

I know, exactly: you have to do that for every single workspace, as tedious as it is.

Exactly we don't like that either.

No awesome.

John's joining us right now.

So John has spent a lot of time with Terraform cloud.

So he can probably answer some of these questions or you and Mark.

Welcome howdy.

I was going to ask you, Mark: have you gotten to play with Terraform Cloud at all yet?

No, I haven't even browsed the docs.

OK Just curious.

But Brian, you've been dabbling with Terraform Cloud a little bit?

Yeah, just taking it out.

It was because I was working on Terraform provisioning of my EFS, hoping to kill two birds with one stone.

Yeah. And I dabbled with it, didn't love it.

So I'm probably just going to do my own provisioning in Codefresh.

Yeah, it's a little bit more intuitive for me, especially because I use Terraform CLI workspaces.

Yeah, it'll be a lot easier for me to implement something that's DRY and reusable if I were to just do like a Codefresh step that already does the right workspaces commands for me.

Yeah, I'm hoping that maybe in a couple of weeks or a few weeks, we can do a revised Codefresh Terraform demo on this.

We did one about a year ago or more.

But this time Jeremy on my team is working on it.

And we want to kind of recreate some of the constructs of Terraform Cloud.

But inside of Codefresh.

So that it integrates more with all the other pipelines and workflows we have there.

On the topic of Terraform, I wanted to ask something.

So I go back and forth on my decision to use Terraform workspaces.

I love the fact that it was so easy to use the same configuration code for so many different environments and I've been able to take advantage of that.

What I didn't love was having to kind of hack together a way to get all the backends to point to different S3 buckets in different AWS accounts.

I'm curious if anybody's ever worked with that.

Plus, if they ever switched off of it to go the other route, where we kind of have configuration per AWS account that might be less DRY.

Such as using Terragrunt.

OK, let's.

Yeah OK, let's table that temporarily.

I see John just joined us here.

Let's start with the first question there on first hand accounts and experiences using Terraform cloud.

I know John has spent a lot of time with it.

So I'd love for you guys to hear from him.

He's also a great speaker.

Well, thank you.

I've seen the check in the mail.

I've actually done a lot with Terraform Cloud as the primary CI for all of our Terraform, and generally I like it, mainly because it's malleable: you can use it for the whole CI aspect, or you can run your Terraform in Codefresh or anywhere, and it's just your backend.

And that's it.

Instead of having S3 buckets everywhere, all your state is just stored there.

Remote data things in Terraform are really easy to do.

They have a provider that gives you full access.

And I did see on the agenda there the talking points the workspace, the run triggers actually did it video or that be wasn't that only to already.

Well, yeah, except now.

Yeah, I've wanted to just play with it non-zero models were recorded and it's decent.

I think they have some improvements to do to be able to visualize it.

But we actually do utilize, I forget who it was speaking, Brian I think, we do utilize multiple AWS environments from our Terraform scripts.

Where we set it up, we tell each workspace which AWS environment it's going to use.

Now this is using an access key and secret key. Preferably we'd have something a little cleaner that was a lot more secure than just having stale access keys sitting around.

So that's one gripe I do have with it.

But in general, we've had a lot of success running it in Terraform Cloud.

OK. So you're leveraging the Terraform Cloud, I don't want to say it's a feature of Terraform Cloud, but the best practice of Terraform Cloud of using workspaces, or using lots of workspaces in Terraform. And how has that been?

Because while workspaces has existed for some time it wasn't previously recommended as a pattern for separating you know multiple stages like production versus dev.

How's that working out.

It actually worked out really well, because locally where you set it up, you can set up locally.

Sure Yeah.

So I mean, the reflection here once again.

There you go.

There is a difference between Terraform Cloud workspaces and Terraform CLI workspaces.

Yeah, sure.

So the this is just my little sample account that I play around with the tutorials.

But if this was set up locally in the CLI, because this prefix is the same, I can set my prefix locally and my local workspaces would be called integer and separator.

And so it maps directly to the local CLI, actually.

So I can say terraform workspace select integer.

And now I'm on that.

And I can run a plan, and it'll run that plan right on Terraform Cloud.

I don't have to have variables or nothing locally.

It'll run everything in there unless it's set up as a local because you have multiple settings here.

Would you be able to do that right now.

Sorry to put you on the spot.

This is exactly what we're trying to do.

And if you're saying that it's actually much easier than I initially thought then I might reinvestigate this.

Let's try it.

But thanks for roll.

It's lights up for everyone else.

Maybe if you just joined.

John has been put on the spot here to do an unplanned demo of Terraform Cloud and workspaces.

Possibly even the new beta triggers function.

But on a different sort as it's set up there and working.

I can definitely walk through the triggers.

Peter would be especially useful for us as well, because there are scenarios where we run multiple turns from the place where you can sleep to reports, a serious long shows.

And we can also get close to Yum It want to have like five minutes and then we can talk about some other things and come right back to this.

Yeah let me get a few of my things to worry about just the connection and all that sort of stuff.

Yeah Cool.

Let's do that.

And we'll just keep the conversation going on, other things.

See what we can talk about there.

All right.

Any other questions.

All right.

So I guess I'm going to skip the Terraform cloud talking point about triggers across workspaces.

I think that's going to be really awesome to get a demo.

Basically, to set that up: as you decompose your terraliths into multiple projects, how do you still get that same experience where applying changes in one environment can trigger changes in another environment?

And that's what these triggers are now for. Moving on: AWS has announced this week that there's a new CLI available.

I'm not sure how new it is per se, but they are providing a binary release of this CLI.

I suppose it's still probably in Python; they're just compiling it to bytecode, maybe.

The downside, from when I was looking at it, is that it's not just a single binary you can go download somewhere; there's still like an AWS CLI installer.

So they're following like the Java pattern, right, where you've still got to download zips and install stuff.

Personally, I've gotten so spoiled by Go projects, which distribute a single self-contained binary.

And I just download that from the GitHub releases page.

And I'm set to go.

So has anybody given this new CLI a shot?

Now you mark calling you out.

All right.

Don't buy it yet.

Cool. And then there was one other thing that came up this week. Somebody was asking, I think the background question was about alternatives to running Vault and whether it's worth it to run Vault, and Chris files responded quite succinctly, so thank you.

We've heard this said before, but I thought this is a really succinct way of putting it.

And that's: if you can't commit to the overhead required to run some new technology like Kubernetes, Vault, or Consul, you're introducing a weakness into the system rather than a strength, as they'll quickly end up in the critical path of your operations.

And I think this really resonated with me, especially since we run this DevOps accelerator and our whole premise is that we want our customers to run in and take ownership of their infrastructure.

But if they don't understand what they have, and what they're running, then it is a massive liability at the same time, which is why we only work with customers ultimately that have some in-house expertise to take over this stuff.

And also Alex just Yeah getting some thumbs up here from Alex Eagleman and Bryan side both agreeing with this statement actually.

Yeah, that actual response to the that's what's the response to something I mentioned.

So the original person asking a question that came about came at then and also get.

From what I understand.

And I just thought that maybe I'd remind them, you know, maybe you want to just give centralized management a shot with Vault. I really don't want to sound like I'm pushing it that much.

But the reality is, if you look at HashiCorp, who created Terraform and created Vault, they make most of their money from Vault.

They really do put a lot of product hours into widening the feature set of that product.

So really, it's a mature solution.

Yeah 100 percent when it comes to houses response.

It's very true.

What happens, actually, a lot of the time is, if you don't commit, a lot of people take the root token and then distribute it to everybody, and it becomes more of a security hole than a security feature, really.

Yeah. And really, it's reminiscent of Kubernetes as well.

In my opinion, you really need a large team of people putting energy into that to actually make full use of it.

So it's not a burden anymore.

It's actually something that can help you pick up velocity exactly like you want to take these things when it gives you an advantage a competitive advantage for your business or the problems you're trying to solve.

Not just because it's a cool toy or sounds interesting, but Yeah, those are really good summary.

Thank you for it.

For pure secrets management, it probably doesn't have all the bells and whistles of Vault, obviously, but I went with AWS Secrets Manager.

Or you can use Parameter Store; it's much easier to maintain.

Yeah Are you making copious use of the lambdas as well with Secrets Manager to do automatic rotations.

Not yet by but definitely something that I wish I had time for.

Yeah, because it also requires application changes right to get all right, John, are you.

You need some more time.

No really.

All right.

Awesome Let's get the show started.

So this is going to be a Terraform Cloud demo and possibly a demo of triggers across workspaces, which is a beta feature in Terraform Cloud.

So this isn't going to actually give me a plan, because I don't have the actual code for these repos locally on this computer.

But this is the time to show how the workspaces actually work.

So essentially, you set up your backend as remote.

Hostname, you don't really have to have; that's the default.

Then your organization, and in this case, I'm saying prefix.

So if I actually change this to, say, name set to random.

And if I did an init on this.

It no.

Yes, I need to.

Yes, there because I already initialize.

That's why so by setting the name essentially, it's supposed to.

What did I miss.

It worked just a minute ago, I promise.

It's always this way.

Let me clean up this dirt one.

But by saying a name I kind of found this by accident.

I didn't mean for it to do it.

But there you go, it'll actually create the workspace for you.

So you technically don't have to do anything to create a workspace.

It'll do it for you.

In that case, it doesn't give all the configuration there.

But if you utilize prefix here instead of name,

what it does is basically go to Terraform Cloud and say, hey, everything with this as a prefix is going to be my workspaces.

So in this case, I can say, let me select the integer workspace.

That's awesome.

And so if I do a workspace list, you'll see that same list there.

And then you can do select separator, or any one of those.

You can also...

And I have an alias for Terraform, by the way.

So that's why I'm just typing the alias.

You can also get your state.

Of course, if you have access to that.

So we can say show, or we can pull that locally.

And so it'll output the actual state here.

And then my favorite part is actually planning.

So I don't have any variables or anything.

Mind you this workspace doesn't have a lot anyway.

But it's actually running this plan on Terraform Cloud.

It's piping the same output.

It's coming through just like you would normally expect.

So it's piping everything to my local CLI here.

But in the web console,

it's basically this.

So you can see the output matches.

But the beauty of this part is I can have all of my variables in here completely hidden.

Any secrets that I want and none of my developers ever see them.

They never know that they exist or anything locally, but they can plan all day and do whatever they want.

And so this is destroying because I don't have the code.

So it's like, well, it's gone,

this random integer resource. But that's the quick run-through of utilizing those workspaces locally.

I mean, it's really just this.

And I have a variable set with my token locally.

So that's how I have that working.

Can you also go back to the Terraform Cloud UI, where we had the settings?

The variables, just because it came up a little while ago.

Brian was asking about environment variables. You see there at the bottom, Brian?

Yeah.

If you need AWS credentials, you can stick them there.

OK. You're not using the AWS provider right now?

No, I'm not, so I'm not going to put them in.

Yeah, no.

OK. And nothing precludes you from using the AWS provider,

so long as you still provide the credentials.

Yeah. Yeah.

So, you know, your random workspace that got created:

do you have to manually go in and add the AWS creds for those?

I actually Terraform the entire thing.

So that's all done through Terraform, so I'm basically terraforming Terraform Cloud.

So essentially I'll generate a workspace, and that'll have my general settings there, and then I'll just add the last two or three variables.

And you can do environment variables this way too.

So you can kind of tie this in too with like your token refreshes and things of that sort.

Especially with Vault, as I've mentioned.

So there's the Vault provider, and you can actually tap into Vault here, get a token from AWS or however you want to do your authentication there, get your access key, et cetera.

And then plug that into Terraform cloud.

So that way it's all automated.

And you're not just pasting variables in there.

Do you have to run your own Vault, or do they run that for you?

No. You could use it if you're running your own.

So assume someone doesn't have Vault.

How do you plumb in AWS credentials and/or STS token generation?

So we just use KMS.

We just utilize KMS to encrypt them all because, like,

you don't want to put that stuff in code, right?

And so we use KMS to encrypt the values manually.

We built a little internal tool to do it.

But we encrypt those values, put them in code, and then once the workspace actually runs, it'll create, manage, and update all the other workspaces.

So in essence, you have one workspace that has all the references to all the other workspaces it's supposed to create, and it'll configure everything.

And so in that one, it'll decrypt the value with KMS and then add it to the project or the specific workspace as an environment variable.
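
[Editor's sketch] The "one workspace that manages the others" pattern described above could be expressed along these lines, assuming the `tfe` and `aws` providers are configured. Every name here (organization, workspace, variable names) is hypothetical, and the speaker's internal encryption tool is not shown; this only illustrates the decrypt-then-set-variable step.

```hcl
# Sketch only: all names are illustrative assumptions, not the speaker's code.

# KMS ciphertext (base64) that is safe to commit alongside the code.
variable "encrypted_aws_secret_access_key" {
  description = "Base64 ciphertext produced by encrypting the secret with KMS"
  type        = string
}

# Decrypts at plan/apply time; the caller needs kms:Decrypt on the key.
data "aws_kms_secrets" "creds" {
  secret {
    name    = "aws_secret_access_key"
    payload = var.encrypted_aws_secret_access_key
  }
}

# A workspace created and configured by the managing workspace.
resource "tfe_workspace" "managed" {
  name         = "random-integer"
  organization = "example-org"
}

# Push the decrypted value into the managed workspace as an environment variable.
resource "tfe_variable" "aws_secret_access_key" {
  workspace_id = tfe_workspace.managed.id
  category     = "env" # environment variable, not a Terraform variable
  key          = "AWS_SECRET_ACCESS_KEY"
  value        = data.aws_kms_secrets.creds.plaintext["aws_secret_access_key"]
  sensitive    = true # hidden in the UI once written
}
```

One caveat with this approach: the decrypted value is recorded in the state of the managing workspace, so access to that workspace's state needs to be restricted accordingly.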

And so when you say KMS, are you using SSM? Do you use that system to store the product of the encryption, the blob?

No, we just encrypt the value with KMS.

But we actually do use SSM for Fargate secrets.

But this repo here is where I actually have a video of it, where I kind of walk through how to do the full thing, terraforming your own workspaces and then using the remote state data as well to pull from.

And so, the pipeline feature that I was playing with earlier. Essentially this repo,

I mean, this workspace, is going to trigger this one, which is going to trigger this one.

And so the way it's set up, and they definitely say do not use this in production yet,

is with these run triggers here.

So you can specify all the workspaces that you want to actually trigger something here.

And so anytime they trigger... oh, it's a loop, because I already have that one set. Anytime they trigger, they will actually trigger the next one.

Let me delete these real quick, and I'll just show it and click Run.

And so if I queue this one, it finally kicks off.

There we go.

So that's going to go through the plan.

And this is just going to generate a random integer.

It has an output and all it does is output the results of random integer.

And so once it finishes the plan, it's actually going to show me which workspaces, if any, it will trigger next.

In this case, random separator. So if I go ahead and confirm, this one is applying. If I come over to random separator, nothing's happening here.

I haven't hit anything. I'm not pressing buttons.

This apply has finished, and there's random separator, which was triggered automatically.

And so you can see like it's essentially going to go down the line there.

The good thing is it will tell you here that the run was triggered from this workspace.

And it was triggered from that run.

So you can kind of rabbit hole your way backwards into finding where and what actually triggered that one.

And so if I confirm and apply on this one, this one is actually going to trigger the last one, which is random pet.

I'll pull up the code real quick as well. So that one finished, and random pet is here.

And there's random pet running.
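
[Editor's sketch] In the demo the run triggers are configured through the Terraform Cloud UI. The same integer to separator to pet chain could plausibly be declared with the `tfe` provider's `tfe_run_trigger` resource, assuming a provider version that includes it. The workspace and organization names below are assumptions.

```hcl
# Sketch only: workspace and organization names are assumed for illustration.
resource "tfe_workspace" "integer" {
  name         = "random-integer"
  organization = "example-org"
}

resource "tfe_workspace" "separator" {
  name         = "random-separator"
  organization = "example-org"
}

resource "tfe_workspace" "pet" {
  name         = "random-pet"
  organization = "example-org"
}

# A successful apply in "integer" queues a run in "separator".
resource "tfe_run_trigger" "integer_to_separator" {
  workspace_id  = tfe_workspace.separator.id # workspace that receives the run
  sourceable_id = tfe_workspace.integer.id   # workspace whose apply is the source
}

# A successful apply in "separator" queues a run in "pet".
resource "tfe_run_trigger" "separator_to_pet" {
  workspace_id  = tfe_workspace.pet.id
  sourceable_id = tfe_workspace.separator.id
}
```

Declaring the chain in one direction only avoids the loop the speaker runs into when a downstream workspace is also set as a source for an upstream one.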

Quick, quick question.

Do you guys ever use it with the VCS integrations? Because I know with the VCS integrations,

you can actually kick off a plan and apply from GitHub, for example. Do you use that?

Yes, exclusively.

Yeah. And so these repos are actually tied to GitHub as well.

For the confirm step, you could send people to the UI here and do it through the UI, or you could tie it in. There's a CLI, so you can tie it in and do it through any CI, really.

And so there's the end of the pipeline.

But as you can kind of tell, you will rabbit-hole, right? Like, you're here,

and then it's like, well, this was generated from here, and you go back there, and it's like, well, that one was generated from another one.

And so then you end up having to go back there.

So a good visual tool would be really useful.

Something like Jenkins Blue Ocean or CircleCI, which kind of shows you the path of something, would be really useful. But it's kind of interesting.

I'm sure that's coming.

Yeah. So this code, just to show this real quick, isn't using the remote state.

And so I set up variables manually for this demo, but utilizing these variables.

And it just uses the remote state data to get the value from the integer workspace.

And then the pet basically uses two remote state data sources to get the workspace state for both the separator and the integer workspace, and then it just uses them down here in the random pet.
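
[Editor's sketch] A minimal version of what the pet workspace is described as doing: reading the outputs of the integer and separator workspaces through two `terraform_remote_state` data sources and feeding them into a `random_pet`. The organization, workspace names, and output names (`integer`, `separator`) are assumptions; the demo code itself is not in the transcript.

```hcl
# Sketch only: organization, workspace names, and output names are assumed.
data "terraform_remote_state" "integer" {
  backend = "remote"

  config = {
    organization = "example-org"
    workspaces = {
      name = "random-integer"
    }
  }
}

data "terraform_remote_state" "separator" {
  backend = "remote"

  config = {
    organization = "example-org"
    workspaces = {
      name = "random-separator"
    }
  }
}

resource "random_pet" "this" {
  length = 2

  # Values published as outputs by the upstream workspaces.
  prefix    = tostring(data.terraform_remote_state.integer.outputs.integer)
  separator = data.terraform_remote_state.separator.outputs.separator
}

output "pet" {
  value = random_pet.this.id
}
```

Only root-level outputs of the upstream workspaces are visible this way, so each upstream workspace has to declare an `output` for anything a downstream workspace needs.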

So it's decent.

I'm liking where they're going.

Yeah, I think this has some potential, especially to minimize the configuration drift and simplify the effort of ensuring that change is promoted through multiple workspaces.

And the good thing is, if you saw on separator, I actually had to confirm. And of course, that depends on your settings.

Of course.

Because you can tell it if you want to auto apply or manual apply. In this case, I'm set to manual. Makes sense.

But if it was set to auto,

it would have just triggered all the way down the line.

And so the practicality of it is, if you separate your networking stack from your application, and you update your networking stack, and for whatever reason it needs to run the application Terraform as well,

you can kind of automate that now, as opposed to where you ran that one

and now the one person in the company that knows the order of things has to go in and manually hit queue on something else.

So yeah.

So I think there were some questions in the chat here.

Let's see.

Alex Sieckmann asks, how do you handle the chicken and the egg problem with bootstrapping, say, an AWS account and then Terraform Enterprise to have creds and such?

Actually a good question.

So it would have to come from somewhere, right?

So especially if you set up AWS Organizations

and you had your root account that you were set up with, you can utilize that root account.

And you can actually Terraform AWS Organizations, and then once that new account is set up, you can assume role and those sorts of things in order to access that other client.

I mean, that other account.

But there is still some manual aspects of that right.

Like, you have to set your email address, and then that email address is your root account.

And then you want to kind of lock that down.

So you can use some service control policies and things of that sort.

But there's still a little bit of a manual piece to bootstrap a full account.

That's the part that really sucks and we go through this in every engagement right.

Because if you don't reset the password and add MFA, anybody with access to the email of that root account, or of that sub-account for that matter, can do a password reset

And take over the account.

Yeah, exactly.

So there was a question about automated destruction.

So Terraform Cloud actually requires you to set CONFIRM_DESTROY.

So if you do automate it, with CONFIRM_DESTROY set to 1, then yes, you can.

You can delete from Terraform Cloud.

But you can't queue a destroy unless you actually have that environment variable set. So you can set it and have it as a part of your workspace, and then your destroy will actually destroy it.
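
[Editor's sketch] If the workspaces are themselves managed with the `tfe` provider, as described earlier in the session, the destroy guard could be set as a workspace environment variable like this. The `workspace_id` input variable is a hypothetical name used to keep the sketch self-contained.

```hcl
# Sketch only: `workspace_id` is a hypothetical input for illustration.
variable "workspace_id" {
  description = "ID of the Terraform Cloud workspace that may be destroyed"
  type        = string
}

# Terraform Cloud refuses to queue a destroy plan unless this
# environment variable is present on the workspace.
resource "tfe_variable" "confirm_destroy" {
  workspace_id = var.workspace_id
  category     = "env"
  key          = "CONFIRM_DESTROY"
  value        = "1"
}
```

Leaving the variable off workspaces that should never be torn down keeps the guard meaningful; adding it everywhere by default would defeat the purpose.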

Nice cool.

But yes, that aspect of the chicken and egg is definitely something that could be cleaned up on either side, just to help the bootstrapping, especially for the clients that have like 78 AWS accounts.

Yeah, which isn't as abnormal as it sounds. It's one enterprise.

Yeah. Any other questions related to Terraform Cloud? There's this one in the queue:

Not sure about cost versus value. Opinions?

Now it's way better than before.

It was rough, like multiple tens of thousands of dollars for the Enterprise version.

And so now it's actually to where you can basically sign up and utilize it now for free.

You have to keep in mind that it is still a subset of different features.

But it is really good,

obviously, if you're a small team and you don't have $100,000 to spend on that yet.

But, again, it is a subset.

Yep. And so of the main things that you do miss, cost estimation is actually pretty cool.

It will tell you, if you're starting up like a t3.micro, how much that's going to cost, or for something large, it'll kind of give you those pieces as an estimate.

But it kind of helps.

And you can utilize Sentinel, which is basically policy as code. Utilize Sentinel and say no one can create a project

if the cost is over $1,000 or whatever.

Or you can say, hey, notify somebody, or whatever requires approval.

And so Sentinel is actually pretty useful.

And then, of course, you get the normal SAML.

And this is the private install, with SAML and clustering as you go up.

But it's funny how it goes from unlimited workspaces.

Everything else is unlimited workspaces, and in Enterprise it's not unlimited anymore, just 100 plus.

But I mean, really, this free tier, up to five users, is pretty much all you really need, unless you are on a larger team and need roles, and the roles are basically plan, read, write, and admin.

But the private registry is actually pretty cool too.

I think, as I said, we as a profession need to push back on enterprises that try to make you pay for security. Here, SAML SSO is pretty much the only thing controlling the keys to your castle, and I don't think that it's right for people to hold security as a tool for making money.

Yeah, that's like always there 1, 2 right.

Yeah, but I hope we get the industry aligned on security as a first-class citizen in all products, not just if you're willing to pay.

Yeah, there were others.

We've shared this before, the SSO tax website. Go to sso.tax, and it says it all.

Yeah, it's funny.

It's the wall of shame, and the price increases with SSO.

I need to add Terraform Cloud there. Good to know.

Exactly. Base price, SSO price.

So price.

It's just insane.

The gouging that goes on.

Look at HubSpot. My Jesus.

63 percent increase. 586. "Call us."

Well, you want two-factor? That's going to cost you.

Yeah. Yeah.

Two-factor is another one.

Yeah. Well, I mean, that usually comes with whatever you picked as your SSO.

Right, but doesn't Terraform Cloud offer two-factor?

It does.

Yeah. So I have one set up here, just normal.

I use it and it works, but then again, I also have my day job's account on there.

And it's paid.

So maybe that's good.

Yeah, maybe that's where it comes from. Maybe that's not offered for free.

Any other questions on Terraform Cloud?

For a small team, do you guys think $7 a month is worth it just for Sentinel?

It depends. What kind of rules do you have in place for your infrastructure?

My team is a team of one right now.

So there are no actual rules automated, but I'm obviously being proactive about it

when I'm spinning up infrastructure.

But as the team grows...

And I think we're growing our security function here too.

I think a lot of the security engineers I'm talking to are doing it manually, where they go into your AWS console and check whether, you know, your S3 buckets are public.

I was saying we could automate this with Sentinel.

I was curious: for, say, a team of five, is $7 a month worth it if you have those sorts of rules in place?

Yes. Yeah.

This is basically a pre-emptive check. You can choose to block, or you can just warn.

And so in this case, it's essentially a function, and they're iterating over the resources and pulling them back.

Right And so you can basically take those things.

And as you validate them you can give specific messages that you want.

And basically say yay or nay if it's approved and it'll basically block the run.

So it is a good way to catch it ahead of time.

And you can catch some of those things.

Another thing that you can do as a team of security.

We talked about Open Policy Agent integration with Terraform, which can also do some of this stuff, and someone else recommended conftest, which is built on top of

OPA and adds support for HCL and Terraform plans as well.

Yeah, there's a little library that's like a linter as well, tfsec, this one.

And it's pretty decent too.

And it can catch things like S3 issues, or HTTP where it should be HTTPS.

Yes. And it also provides a way here where you can take these rules and actually ignore one for a specific line, like if you want an ingress here and you don't care about

this rule here.

It's just that it's a requirement.

You have to have it, so you can ignore it.

But you can tie this directly in with CI, or you can just run tfsec locally with Docker. And so I would probably start there as opposed to going to Sentinel, because then you have to write the Sentinel policy and you need to manage that.

And then you assume a lot of that risk at that point too. You know, you have to develop all those opinions on what you need.

Right. Well, that makes a lot of sense, though, when you have, like, SecOps that focus on that. If you're a one-man team, it suddenly just adds to your plate.

Can you share this link to GitHub for tfsec in the office hours channel?

And let's see.

So we've got 15 minutes left or so.

There were some other questions here unrelated to Terraform Cloud. I want to see if we can get to those.

Alex, do you still want to talk about your Prometheus question?

See I can't.

He is chatting in the Zoom chat.

Looks like Zac helped you out with the answer a little bit.

I guess I'll just read it for everyone else's benefit.

How do you know.

Let's see.

Assuming you have Prometheus already running with the Prometheus Operator, and you run kubectl get prometheus --all-namespaces, you'd set up a ServiceMonitor.

Oh, this is from Zach.

I did not.

Yeah, I gave him like a 1,000-foot view of an answer for how to set up a ServiceMonitor in Prometheus for a custom service running in a cluster.

I wouldn't have answered so quickly if I weren't working on the exact same thing at that moment.

That's cool.

Yeah, maybe we can...

Alex, since you don't have a mic today,

let's punt on the question to next week, if you're able to join us, and we can talk more about ServiceMonitor stuff.

I'm curious if anyone else is using anything other than custom rules, you know, if there's any other tooling out there for service monitoring. People have multiple teams, multiple microservices, and, you know, I'm curious if there are any organizational strategies around tooling

this in a declarative manner.

I can answer how we do it.

But I'm also interested first, before I talk, in what other people are doing.

Are we talking about just monitoring the individual services?

Oh, yeah.

Just Prometheus, right?

Teams and multiple services, and, you know, there could be a ton of them.

Some of them come and go. It's about ensuring that generic monitoring gets put in place, and that if teams want to put in extra, additional monitoring for their items, those are also able to be deployed.

Yeah, I'm just really struggling with getting a good template going.

Yeah. Are you using Helm?

I am.

Yeah. Then, sorry, my computer's not responsive here.

So then, yeah, I can kind of show you, because this came up recently, for example, with Sentry.

That's a good example. It might not be the best one, but I'll show it.

So we've talked in the past about how we use this chart called the monochart that we developed.

Zach, are you familiar with the monochart?

Dude, I am so familiar with the monochart.

OK. I created my own version of it.

So yeah.

Well, thank you.

Yeah So the pattern there that we have.

And then are you familiar with the service monitors, like the Prometheus bindings that we have in the monochart?

You know, I probably should go revisit that.

Honestly, I haven't looked at it.

OK So I will give an example of that here and I'm getting it cued up in my browser.

So let me rephrase the question or let me rephrase.

But let me restate the question and add some additional context.

So in my own words, I think what you're describing is: how do you offload the burden of how an application is monitored to the application developers themselves, or at least to the teams responsible for that service?

In the old-school model, it'd kind of be like: deploy your services, and then you throw it over the fence to ops and say, hey, this was deployed.

Update Nagios or some archaic system like that

for monitoring. And that never worked well.

And it's like very much like this data center mentality static infrastructure.

And then you have a different model, which is kind of like Datadog, where it will maybe auto-discover some of the things running there and figure out a strategy for monitoring them, which is magical, but it isn't very scalable, right?

Magic doesn't scale.

So you want something that allows configuration, but also doesn't bottleneck on some team to roll that stuff out.

So this is why I think Prometheus Operator is pretty rad.

Because you can deploy your monitoring configuration alongside your apps themselves.

So this just came up, kind of like what you said, Zack, about how you were just actively working on this other problem that Alex has.

That's why it was fresh in your memory.

So this is something that we did yesterday, actually.

So we run Sentry on-prem.

We've had some issues lately with Sentry where it stops ingesting new events while everything seems totally normal.

So it's passing health checks.

Everything's running, everything's green and hunky-dory.

But we wanted to catch the situation where it stopped processing events.

So at the bottom here, we've added, using the monochart... for example, you don't have to create a separate release for this technique, but we're doing that here and using the monochart.

What we do is then we add the Prometheus rules.

So we can monitor that the rate, or the delta here, of jobs started over a five-minute period is not zero. Or, for the alert condition, that it is zero.

So that's when we alert.

That's my point here.

So let's see, are we using monochart to deploy this?

Do you have something that keeps a baseline level of jobs starting? Is it a busy cluster, a busy environment?

So, like, is this generalizable? No.

But in this environment.

So here's the thing.

Oh, I think it's generalizable, because you could make a CronJob, you know, that does it.

So in our case, we have sentry-kubernetes deployed.

So we have a pod inside the cluster that is ingesting all of the events from the Kubernetes API and sending those to Sentry.

So you could say that, just by having that installed,

we have our own event generator, because Kubernetes is always doing something, right?

So when we ran this query, we identified the two times over the past month that we had the outage.

So we deployed it, and went live with that.

But I just want it.

So this monochart, though, is this pattern where you can define one chart that describes the interface of a microservice in your organization.

This happens to be ours that we use in customer engagements.

But you can fork it, or you can create your own that does the same kind of thing.

And let me go over to our charts here and see if I can find a different example that we have.

So here's an example, a simple example, of deploying an NGINX container using our monochart.

And the idea is: what does everything you deploy to Kubernetes need?

Well, it needs...

Well, OK, if you're pulling private images, you'll possibly need a pull secret.

So we define a way of doing pull secrets.

Everything is going to need an image.

So we define a way of specifying the image.

Most things are going to need config maps.

We define a simple way of having consistent config maps and all of these things are a lot more terse than writing the raw Kubernetes resources.

But you can also then start adding other things in here. For example, we provide a simple way of defining affinity rules.

So you can specify an inline affinity rule, which is very verbose like this, or you can just use one of the macros, the placeholder ones that we define here, like "should be on different node".

And this is an example of how you can kind of create a library of common kinds of alerts that we deploy.

Now I'm talking.

I'm conflating two things, affinity rules with alerts.

I just happen to have an example here of affinity in helmfiles.

You need to share your screen.

Oh my god.

I can do that again.

I'm just always used to having my screen shared.

So, yeah, sorry.

OK. So this makes it a little less hand-wavy, then, seeing my screen here.

Here's what I had open, which was just an example of using our monochart to define the Prometheus rules to alert on Sentry.

So here it's saying Sentry jobs started, minus Sentry jobs started five minutes ago.

And if that's zero, we alert on it.

So, monochart itself.

Here we're using monochart just to define some rules.

But monochart allows you to define your deployment.

So here's where we're deploying NGINX: we're setting some config map values, some secrets, some environment variables.

Here's the definition of the deployment.

But unfortunately, we don't have a JSON schema spec for this yet.

So you've kind of got to look at our examples of how we use monochart.

And that's a drawback. If I search for this here, we'll find a better example.

So, for example... I'm not sure how much this is going to help on the fly. Where we use monochart frequently is with a lot of upstream charts that we depend on,

which don't always provide the resources we need.

So then we can use monochart, much like we use the raw chart, to define the rules.

So here is where we're deploying this for kiam.

This is a controller that pulls metrics out of kiam and sends them to Prometheus.

So somebody provided a container for us.

But there wasn't a chart.

So we just used our monochart instead.

So here we

define a bunch of ServiceMonitor rules to monitor,

in this case, kiam. So this is complicated, like using the raw expressions for Prometheus, but what I want to say is that, in your case,

Zack, what I would do is define canned policies like this that you can enable in your chart for typical types of services.

OK So that is the route I'm going.

And so the sanity check means I'm not going the wrong route.

I know it's just it seems like a lot of work.

It is.

But the thing is,

nothing else does this.

No one else is doing this.

So, like I say,

I haven't seen any other option out there, aside from magical auto-discovery of things running, for monitoring this way, where applications deploy their own configuration for monitoring.

I don't know of any Sas product that does that.

And it's very specific to the team and organization and the labeling that you have in place.

Yeah So.

All right.

Well, I mean, the monochart is the right route in my mind as well.

I've been going that route.

I call it chart architecture.

Yeah, but I'm using that to do a bunch of other deployments, and not for a microservice yet.

So this will be rolled into it.

So thank you for the answer.

Yeah, no, I want to just add one other thing that came up, just to help contrast the significance of what we're showing here. Yes, this stuff is a bit messy.

I wish this could be cleaned up

and it wasn't as dense. But compare this to, let's say, Datadog. Datadog has an API.

There's a Terraform provider for Datadog.

But I would say that's the classic way of setting up monitoring.

It's a tad better than using Nagios, because there's an API and you can use Terraform, but it's not much better than using Nagios, because there's still this thing where you deploy your app and then this other thing has to run to configure monitoring for it. Versus what we're saying here: we deploy the app and the monitoring of the app as one package to Kubernetes.

Well, we're almost out of time today.

Are there any last thoughts or questions related to, perhaps, this Prometheus stuff?

I didn't check if you posted anything else here.

Alex, in chat: thank you for the Terraform Cloud demo.

Thank you.

That was an awesome demo.

Thanks, man.

No problem.

Was month month this year.

I think it's half on sales next month.

The lies and generalizations which can be hard.

I suppose for most HTTP REST APIs,

you could do some kind of anomaly detection or basic five-minute alerts, but, yeah, there are no general metrics across all kinds of services.

So, yeah, that's right, Alex.

So that's what all these other services do, like Datadog. They'll provide you some good general kinds of alerts, but nothing purpose-built for your app.

All right then, let's see.

I'm just going to bring up the closing

slide here.

Well well there you go.

There's my secret sauce.

That's what we're doing here.

We're at the end of the hour.

There are some links for you guys to check out.

If you enjoyed office hours today,

go ahead and join our Slack team if you haven't already.

You can go to cloudposse.com/slack. You can sign up for our newsletter by going to cloudposse.com/newsletter.

If you haven't yet registered for office hours,

definitely go there and sign up,

so you get the calendar invite for future sessions.

We post these every single Wednesday.

We syndicate these to our podcast.

If you go to cloudposse.com/podcast,

you can find the links where you can subscribe to this,

like iTunes or whatever podcast player you use. Connect with me on LinkedIn.

And thanks again

for all your input and participation.

This is awesome.

This is what makes meetups possible.

Thank you, John, for that presentation.

And I'll see you all on the call next week.

Take care.

Thank you guys.

Thank you.
