Public “Office Hours” (2020-03-25)

Erik Osterman | Office Hours

Here's the recording from our DevOps “Office Hours” session on 2020-03-25.

We hold public “Office Hours” every Wednesday at 11:30am PST to answer questions on all things DevOps/Terraform/Kubernetes/CICD related.

These “lunch & learn” style sessions are totally free and really just an opportunity to talk shop, ask questions and get answers.

Register here: cloudposse.com/office-hours

Basically, these sessions are an opportunity to get a free weekly consultation with Cloud Posse where you can literally “ask me anything” (AMA). Since we're all engineers, this also helps us better understand the challenges our users have so we can better focus on solving the real problems you have and address the problems/gaps in our tools.

Machine Generated Transcript

Let's get the show started.

Welcome to Office hours.

It's March 25th, 2020.

My name is Erik Osterman and I'll be leading the conversation.

I'm the CEO and founder of Cloud Posse.

We are a DevOps accelerator.

We help companies own their infrastructure in record time by building it for you and then showing you the ropes.

For those of you new to the call the format is very informal.

My goal is to get your questions answered.

If you want to jump in and participate, feel free to.

If you're tuning in from our podcast or YouTube channel, you can register for these live and interactive sessions by going to cloudposse.com/office-hours.

We host these calls every week and will automatically post a video recording of this session to the office hours channel, as well as follow up with an email so you can share it with your team.

If you want to share something in private, just ask and we can temporarily suspend the recording.

That's it.

Let's kick things off.

There are a bunch of questions brought up today in the office hours channel.

I thought we could use those as a starting point for some talking points.

And first, I want to, of course, start with getting your questions answered.

So any questions from the community.

Good evening.

I'm Scott Frost.

I'd like to understand if your Jenkins suite of Terraform tools that build a full end-to-end pipeline is complete.

Or is there additional work that we would need to do?

All right.

So if I'm getting your drift, you're talking about this Terraform module.

That's correct.

Gotcha So this.

I'll say that we don't typically use this module regularly because we're mostly using Codefresh.

We have kept it up to date as recently as five months ago.

And not only that, we have added Terratest tests to this module to validate that the server comes online and is functional.

That said, there are two pieces to this, right?

So there's this Terraform module here.

And what's unique about our Terraform module for Jenkins is that it does CI/CD of Jenkins itself using CodeBuild and CodePipeline.

So it's a little bit meta.

And then there's the Jenkins Docker image.

Now, as you can tell by the age of that, we're not actively investing in it, because we don't support it and none of our current customers are actively using it.

Also, we're moving more in a different direction: we're actually doing a project right now with Jenkins, but we're going to be deploying Jenkins with Helm on Kubernetes instead.

So this approach, I think, is valuable for those who don't want to run Kubernetes, but it's not the one we're investing in.

Thank you for that.

Looking at the architectural diagram apologies.

I'm no expert in this particular area.

I've gone through the Terraform code.

And I can understand most of this.

But the one bit that seems to be missing is the slaves configuration. Is that something that we need to build ourselves?

So, yeah, a good question.

So as you see in the diagram, and it's aging, this was designed so that you would then point Jenkins at a Kubernetes cluster.

So you might wonder why.

OK So first of all, if you're using Kubernetes is kind of moot.

You'd have to do something else.

Our thinking for why we created Jenkins as a standalone instance in Beanstalk, running outside of Kubernetes, was that we could run the GitOps commands on it to administer Kubernetes itself.

So that basically Jenkins could rebuild the Kubernetes cluster.

If we wanted to.

So one way is to have the slaves running as EC2 instances, and there's just a plugin for that.

And the other is running those slaves as pods under Kubernetes.

So if we go to cloud posse and search for Jenkins.

I just want to point out one thing.

So there are newer, better ways of doing this.

There's, like, Jenkins Configuration as Code.

This is not using that plugin.

This is using just straight-up Groovy.

I wasn't the one to implement it, so I'm not as fresh on it.

But the idea is, with this Groovy, here's how you can initialize it with EC2 instances as the workers.

And then I think there's... I think I'm maybe mistaken.

Anyways, look over this Jenkins container, because the Jenkins container is the one that sets up everything for that.

And if you look at the plugins, these are the plugins that it installs out of the box.

So there's an AWS plugin for running slaves on it.

Yes, there's also a plugin for, like, running using CodeBuild, I think.

So there are a few different options. Does anybody else have more to add to this?

Anybody else maintaining Terraform modules for Jenkins?

Thank you very much for the explanation.

If I may put you on the spot: whilst you guys are doing limited maintenance on this, as a company that might be looking at this approach probably for the first time, to get to roughly what your network diagram displays and the flexibility that we believe it brings to the table for us, should we start with this now?

Traditionally, my company has pretty much had EC2 machines that have just been Terraformed, with standalone Jenkins.

This is much more of an improvement.

Would you advise not going down this road and looking at some of the other Jenkins modules you've mentioned and steering away from this because it's old hat just so put a freeze on it.

So two things.

I mean, if this is something that's going to be in the critical path for your business and something you're going to want to extend or customize, I would probably start by forking the module and then customizing it to your needs there.

Also, I was just checking here.

We've also updated this.

So the Data Pipeline to do backups is simplified now, because AWS Backup came out since we did this diagram.

So now the backups here are done with AWS Backup.

So to answer your question, though like would I recommend this or something else.

I'm not sure.

I haven't done any research on Jenkins Terraform modules.

When we implemented this, it was like back in 2016 or something, at least two years ago.

When we started.

I think earlier than that, though.

And it was very bespoke to a particular you know you guys decided that you wanted to build Docker images using Jenkins save it to and deploy it to Elastic Beanstalk and if and if someone else wants to do something different for any of those, then this doesn't work for them right.

No, it's opinionated, as are a lot of the modules.

But here's the thing: you want to deploy Jenkins to AWS.

I guess you have three options, high level, right?

You could do Fargate, you could do ECS, or you could do Beanstalk.

And then lastly, you could just roll your own EC2 instances. If we roll our own EC2 instances, then we have to invent the whole deployment strategy for that.

So the reason why we went with Beanstalk is that Beanstalk has the concept of rolling updates and the concept of integrating with CodeBuild and CodePipeline to do the deployment.

The other thing is if you like roll your own.

You have to come up with the strategy of how do you update Jenkins itself.

I like the idea of having a Jenkins container or running that.

And then you can actually develop and work on that Jenkins container locally if you want.

So, I mean, sometimes, to be frank, I use this as an example of why you shouldn't use Jenkins.

Not why this is bad architecture.

It's just saying, if you take this to the extreme, this is like a best-practices implementation of the open source or community edition of Jenkins on AWS.

And let me explain why that is. You're deploying, in this case, an ELB for the traffic, and you need to have TLS terminated at that.

So we set that up, and then you want your access logs.

So we rotate them: we have those going to the bucket, but then you don't want to have those access logs persisted forever.

So we lifecycle those off into Glacier and then ultimately delete them.
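As a rough illustration of the "lifecycle off into Glacier, then delete" policy described above, here is a minimal Python sketch that builds a lifecycle configuration in the shape the S3 API accepts. The prefix `elb-logs/`, the rule ID, and the 30-day and 365-day thresholds are assumptions chosen for the example, not values from the module discussed on the call.

```python
import json


def access_log_lifecycle(prefix: str, glacier_after_days: int, delete_after_days: int) -> dict:
    """Build an S3 lifecycle configuration: archive logs to Glacier, then expire them."""
    if delete_after_days <= glacier_after_days:
        raise ValueError("logs must move to Glacier before they expire")
    return {
        "Rules": [
            {
                "ID": "archive-then-expire-access-logs",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move objects to the Glacier storage class after N days...
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"}
                ],
                # ...and delete them entirely after M days.
                "Expiration": {"Days": delete_after_days},
            }
        ]
    }


# Assumed thresholds, purely for illustration.
config = access_log_lifecycle("elb-logs/", glacier_after_days=30, delete_after_days=365)
print(json.dumps(config, indent=2))
```

With boto3 installed and credentials configured, a dictionary like this could be passed as the `LifecycleConfiguration` argument of the S3 client's `put_bucket_lifecycle_configuration` call; in a Terraform module the same rule would be expressed declaratively instead.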

And I mean, that's a lot to do just right here.

And we haven't even come to the point of Jenkins yet.

And then we deploy Jenkins, in this case, with Beanstalk. Look, it's legacy technology, but hey, it does the job and automates a classic application architecture where you have multiple nodes.

And then those are fronted by a load balancer. In this case, with Jenkins master instances, you can't have more than one: you can only have one online at a time, or you risk corruption of the file system.

But if you want to be able to do rolling updates, even if that means going down to zero and then back up again, we need to share the file system.

So the easiest way of doing that is using EFS as that file system.

And then we also need to worry about the file system running out of storage.

But OK.

Now we deploy EFS.

We need to have backups of that.

So how do we do backups? That historically was a real pain in the butt.

You'd provision a Data Pipeline; the Data Pipeline has, like, some Lambda that runs an aws s3 sync and syncs those to an S3 bucket, and then all the lifecycle rules go on those backups.

Now that's simplified using AWS's first-class service called AWS Backup. But then we've got: how do we update Jenkins?

So one thing is getting Jenkins deployed.

How do you keep Jenkins up to date, infrastructure-as-code-wise? That's what this is solving.

So the idea is that you have CodePipeline and CodeBuild, and you do that, again, staying within the Amazon ecosystem.

Now, if you want to use GitHub Actions or CircleCI, that would be a totally different solution.

So what we tried to do here was implement this using almost strictly AWS services.

Thank you for that.

What did you use to build your diagrams there?

Oh, so this is Lucidchart.

Yeah, they've got good...

They've got good AWS and Azure and GCP graphics and everything.

I would think Kubernetes makes this easier nowadays.

Totally Yeah.

So I mean Kubernetes I mean, once you get over the hurdle of deploying and setting up Kubernetes, which is its own roadblock.

But once you've got that, the whole TLS termination thing is, you know, that's incredible.

Yeah and an adaptation.

Yeah. So yeah, this gets radically simplified if you use Kubernetes to run and manage it.

But it introduces new complexities: if you're running Jenkins under Kubernetes and you're building Docker images, that suddenly is a lot more complicated, and you need to use one of the alternative Docker build tools.

Thank you for that.

So a lawsuit or a folder.

Yeah, a lot of room for that.

Any other questions.

No, not yet.

It doesn't have to be related to this just in general.

My impression was actually, you might.

So I'm working with Kubernetes and trying to figure out how to manage certificates.

And I came across a project called cert manager.

Yeah and started using that.

Is that pretty much the way to do it.

It's the first one I just happened to come across.

Yeah, that's the canonical like TLS certificate management software.

It's basically.

I believe it's Jetstack that manages that one; they built the first one, which was called kube-lego.

kube-lego is now end of life and has been superseded by cert-manager, and cert-manager does a lot more than just Let's Encrypt certificates.

It can also be your CA, your certificate authority, and help you with PKI infrastructure.

For example, if you need to do self-signed certificates for services inside of the cluster.

OK. How does that relate to the new certificate feature?

I saw that's coming in Kubernetes.

I think they announced something with certificates. Did anyone happen to see that?

Yeah. Is this for, like, secret encryption, or...?

I think there were people... so Kubernetes has an internal service for provisioning certificates for whatever internal services Kubernetes has, and people were using it, even though it was like, hey, this is really not meant to be used.

But people were using it anyway.

And so they saw that people were using it.

And they said, well, let's go ahead and make this a public API and kind of bless this.

That's cool.

Yeah, it was.

To be honest.

I was not aware of that.

So yeah, I saw the same thing from Sysdig that you're looking at now.

And at the same time I happened to be working on cert-manager, and I'm like, oh, what's this coming down the pike?

Is this going to change how things are potentially done in the future?

It sounds like maybe, I guess, if you wanted to use the root CA that's in the cluster.

Yeah. Yeah.

Or I see it maybe replacing the PKI stuff.

But I don't see it replacing Let's Encrypt.

So what are you using cert-manager for now?

Just actual, like, certs on the ingress controller.

Yeah. So you're going to want to use, like, Let's Encrypt.

Yeah, to do that.

So that when someone's hitting your ingress with the browser, they don't get "this cert is not trusted" or whatever.

The only way to get that green lock is to use a trusted root CA, which is what Let's Encrypt is getting you. One thing I really wish is that AWS would add the ACME protocol.

I remember it being really nice.

Yeah, I was recently doing some research for this.

And I don't remember exactly what I was searching for, but basically, you know, I love Let's Encrypt and I want to continue using Let's Encrypt, but if I could pay to raise those rate limits, I would do that in a heartbeat.

But they don't even provide a way to pay for better service if you need it.

So that's why I really wish there was any other, you know, official CA that provided the ACME protocol without limits.

So cert-manager's doing DNS validation right now with Route 53. Actually, I've got cert-manager creating the records necessary in the Route 53 zone for validation, because all of this is on a private network, using external-dns for that.
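To make the DNS validation setup above a bit more concrete, here is a minimal Python sketch that generates a cert-manager `ClusterIssuer` manifest using the ACME DNS-01 solver with Route 53. The issuer name, email address, region, and hosted zone ID are placeholders, and the `apiVersion` shown (`cert-manager.io/v1`) is an assumption that depends on the cert-manager release you run; older releases used alpha and beta versions of the same API.

```python
import json


def route53_dns01_issuer(name: str, email: str, region: str, hosted_zone_id: str) -> dict:
    """Build a ClusterIssuer manifest that validates via DNS-01 records in Route 53."""
    return {
        "apiVersion": "cert-manager.io/v1",  # assumption: varies by cert-manager release
        "kind": "ClusterIssuer",
        "metadata": {"name": name},
        "spec": {
            "acme": {
                # Let's Encrypt production ACME endpoint.
                "server": "https://acme-v02.api.letsencrypt.org/directory",
                "email": email,
                # Secret where cert-manager stores the ACME account key.
                "privateKeySecretRef": {"name": f"{name}-account-key"},
                "solvers": [
                    {
                        "dns01": {
                            "route53": {
                                "region": region,
                                "hostedZoneID": hosted_zone_id,
                            }
                        }
                    }
                ],
            }
        },
    }


# All four arguments are hypothetical placeholders.
issuer = route53_dns01_issuer(
    name="letsencrypt-dns",
    email="ops@example.com",
    region="us-east-1",
    hosted_zone_id="Z0000000EXAMPLE",
)
print(json.dumps(issuer, indent=2))
```

Because JSON is valid YAML, the printed document can be saved to a file and applied with `kubectl apply -f`. DNS-01 is the fit for the private-network case in the conversation, since HTTP-01 needs a publicly reachable endpoint.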

Yeah Yeah Yeah.

We can't use the HTTP endpoint because it's all private.

And just briefly, cert-manager is not the only option. If you're using, like, Istio, Istio will do it too.

There's, like, Gloo.

I think, you know, that'll do it.

You know, most of the more full-featured ingress options will do certificate management, you know, TLS termination, too.

Istio's is called, right now, Citadel.

I think if you search for Istio Citadel, you're going to find it.

But it's getting rolled into another service; the Istio guys are simplifying their architecture a little bit.

Yeah, that makes sense.

Lots of outsource.

Take a look at later.

We're not using Istio right now. Istio is really awesome.

And it does a lot of really cool stuff.

But you definitely pay the price with complexity, because it definitely adds complexity.

Yeah, supposedly they're going to make it simpler by just throwing everything into a single daemon.

As opposed to having all the parts spread out, it's just, like, istiod or something.

Yeah, but the perfect...

There's a couple of really good reasons to use Istio, so that, you know, if someone tells you Istio is not worth it, you don't need it, you can tell them to shove it. And one of the perfect reasons is the mutual TLS that you get.

So we recently had to set up Keycloak with OpenLDAP as the backing for the user database.

And the guys were digging down this rabbit hole trying to figure out how to get LDAPS to work.

So that, you know, through Keycloak, if I'm a user and I go to my Keycloak and I want to change my password, it makes a secure connection to LDAP to change the password, using TLS to do that, using LDAPS. And they're going, oh my god, we're having such a hard time getting this to work, we've got to figure out how to get OpenLDAP and Keycloak to share certificates and manage those certificates. And we're all like, hey, stop, stop, wait: just turn on Istio mutual TLS and you've got a frickin' steel tunnel all the way through.

That is solid.

Don't worry about all that mess at all: have an unencrypted LDAP connection that is encrypted by mutual TLS through Istio. Bam.

So that's why it is magical when it comes together like that cool, cool.

Thanks Any other questions related to anything else.

I posed the question.

So the last two days.

I had a couple of issues with provisioning EC2 instances, in particular C5 instances in eu-central-1a. I also placed this thread in Slack as well.

I posted it with an image of the error message.

I'm just wondering if you know where I can find out about something like this (there's also an image in the thread): how I can basically find out if instances of a type are available.

Yes I was Ezra all the way before.

So that's interesting. you know I don't.

I mean, it's a legitimate request and you should be able to just have an alert for that.

I know that's not answering your question.

But are you able to spin up a different type of instance, if that one is not available.

Yeah, I was able to adjust like it's all around us.

And we didn't care about multi-AZ in this case.

So the same instances were available in eu-central-1b and 1c. But in general, I don't know how to do it.

And I'm currently wondering if I should introduce Auto Scaling groups in AWS with a mixed instances policy.

So yeah.

So if you do, that's what I was going to kind of get at.

If you do a... what do they call it, a launch template or whatever.

A launch template.

Yeah, you can.

You can have a priority list of instance types and it'll try your first priority first.

And if it can't because they're out.

It'll do your second priority right.
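The priority-list behavior described above can be sketched in a few lines of Python. This is only an illustration of the fallback logic, not how AWS implements it: the `InsufficientCapacity` exception, the `fake_launch` function, and the instance types listed are all hypothetical stand-ins, and in practice an Auto Scaling group would do this for you.

```python
class InsufficientCapacity(Exception):
    """Stand-in for a provider's 'no capacity for this instance type' error."""


def launch_with_fallback(priorities, launch):
    """Try each instance type in priority order; return the first that launches."""
    for instance_type in priorities:
        try:
            return instance_type, launch(instance_type)
        except InsufficientCapacity:
            continue  # this type is sold out in the zone, try the next one
    raise InsufficientCapacity(f"no capacity for any of: {', '.join(priorities)}")


# Hypothetical launcher that pretends the first-choice type is sold out.
SOLD_OUT = {"c5.2xlarge"}


def fake_launch(instance_type):
    if instance_type in SOLD_OUT:
        raise InsufficientCapacity(instance_type)
    return f"i-fake-{instance_type}"


chosen, instance_id = launch_with_fallback(
    ["c5.2xlarge", "m5.2xlarge", "c5.xlarge"], fake_launch
)
print(chosen, instance_id)
```

The useful property is that the caller only sees a failure when every type in the list is unavailable, which is the "treat instance types as cattle" idea that comes up later in the call.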

So yeah, that's just what.

Because this is what I'm currently evaluating, if it's worth it, because all three zones being out of instances seems unlikely to me.

But in the case it happens, the instance state doesn't change. If you try to open up new instances, you don't get a notification. I don't know what's going on there.

I actually don't have any insights.

Which is just kind of weird.

We also only noticed because, well, our deployments were broken.

So the only reason why for us.

We actually noticed it.

Yeah, I don't even know if there's an API to be able to check if that exists without trying to launch it itself right.

This is the thing I was thinking of.

I just want to see if it had any mention of the problem that you're talking about.

So this is a set of Lambdas you can deploy to monitor your AWS limits.

When you're approaching them, you get alerted on that, which is also valuable.

But I don't know if this is going to tell you, well...

Like, you can't even launch an instance to begin with, even though you haven't reached your limit on it.

Maybe Yeah.

I mean, it wasn't my limit.

It didn't work on any of our accounts.

So it was just AWS.

This particular zone didn't have any instances of this type available.

And you've previously been able to launch them?

Yeah, it also worked in a certain sense and type and adopt a.

I switch some back to a smaller 1 again.

And that also.

I just could not open up.

See if I think slouch like for me.

I'm neutral.

I mean, they only have a certain capacity of bare metal servers, you know what I mean? That's where spot instances come from: you're buying extra capacity for cheap, right?

So there could be something in spot instance availability that you could look into.

You know, if spot instances aren't available, that means they're running out of that type of resource.

And the other thing to consider is spot fleets, which can then be composed of multiple instance types.

So don't know.

It's almost there.

It's almost like treating instance types as cattle as well.

Don't rely on just one type of cow.

Right now right.

Yeah. So that's what I'm currently thinking about, if I should do it, if it's worth it, because, well, for one service in particular, we spin up between 20 and 100 instances throughout the day, and if we cannot spin up instances, we're kind of in bad shape in terms of performance and stuff.

So, yeah.

Cool. Checking spot might be a good indicator.

You can see here.

So, Jeffrey. Jeffrey, are you on the call?

Yeah, I'm here.

Hey, Jeffrey.

Well, let's get to your question here.

Jeffrey wants to know the best way to do multi-region with Terraform on AWS.

Let me bring up the thread that you had posted.

So everyone can get the context here.

Yeah, I could also just repeat as well.

So basically what I'm looking at is to have a disaster recovery for Terraform essentially just a duplicate set of resources.

We have a Kubernetes cluster.

And we use all of S3 and then so we're primarily in the US east region.

We want to have a failover into, in my example, just us-west-2.

But then.

So what I was worried about is managing that Terraform remote state in S3, because AWS's S3 buckets are region-specific.

My initial attempt was just to have everything in us-east-1 managed by a bucket that was also in us-east-1, and then basically split it out that way.

So a bucket in us-west-2 would also manage all of the us-west-2 resources.

But then what if you actually have an entire region going down?

That's right.

And also, in between, we also have to read from a Terraform remote state as a data source.

I mean, for example, if you want to replicate a database from the source, then you also need to know the Amazon Resource Name from the other region.

But then if an entire region goes down, then I imagine that S3 bucket would be inaccessible.

So my next thought would be just to have a single S3 bucket that manages all your regions.

I mean, basically whatever regions you also have resources in.

But then you would have kind of like a cross-region replication of that bucket into somewhere else.

So in the case that a region actually goes down, you can basically just point it to the bucket in the region that hasn't failed.

Well, yeah, that's a great summary of the problem.

And two thoughts come to mind. First, I think thinking of it in terms of failover is a fragile approach; you should think of it as active-active as much as possible.

Now that might mean you run smaller sizes of your workloads in those two regions, but then you are actively testing them all the time.

Because what happens.

So frequently is you fail over.

But if you're not running real workloads on there, you don't really know that it's viable.

Yeah So that's something that you actually do want to do.

But under the direction of our management...

They want the capability to have a failover.

You know, that's just based off some regulations with some things that we're working with.

But then to actually just kind of turn off that capability until further down the line.

So at the moment, we can't have an active-active setup.

Then I'd like to propose another kind of, let's say, wild or wacky idea; a non-technical solution, really, to your problem.

The idea is to phase shift.

What you want to do is ensure that no failure of one system is coupled in some way to the failure of another system.

So, well, what about just not using Amazon as your state storage backend for Terraform?

Yeah, that's possible.

I mean, I know, for example, with GCP, their data storage is, like, multi-region.

I mean, I guess that's the advantage there.

I mean, like, well other, I guess you have any suggestions.

Well, so that was the one suggestion.

The other is though and I I mean, I still as much as possible.

I think you should still keep the state storage bucket totally separate and isolated per region, so that regions more or less share nothing, with the exception of some things that you mentioned, like VPC peering connections; you know, where does the state belong for that?

And then there's, like, setting up database replication. On the other side, you have the failover capabilities of the product, like the front end, and ideally you'll be able to address that just based on the health check capabilities of Route 53 or whatever system you have sitting in front of that.

So that traffic will automatically fail over to the available region as necessary.
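As a sketch of the health-check-based failover just described, the following Python builds a Route 53 change batch with a PRIMARY and a SECONDARY failover record for the same name. The hostname, the two regional targets, and the health check ID are hypothetical placeholders; the health check itself is assumed to exist already.

```python
import json


def failover_change_batch(name, primary_target, secondary_target, primary_health_check_id):
    """Build a Route 53 change batch with a primary/secondary failover record pair."""

    def record(role, target, health_check_id=None):
        rrset = {
            "Name": name,
            "Type": "CNAME",
            "SetIdentifier": role.lower(),  # must be unique per record in the pair
            "Failover": role,               # "PRIMARY" or "SECONDARY"
            "TTL": 60,                      # short TTL so clients pick up a failover quickly
            "ResourceRecords": [{"Value": target}],
        }
        if health_check_id:
            # Route 53 stops answering with this record when the check fails.
            rrset["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Changes": [
            record("PRIMARY", primary_target, primary_health_check_id),
            record("SECONDARY", secondary_target),
        ]
    }


# Every value below is a made-up placeholder.
batch = failover_change_batch(
    name="app.example.com.",
    primary_target="app.us-east-1.example.com",
    secondary_target="app.us-west-2.example.com",
    primary_health_check_id="00000000-0000-0000-0000-000000000000",
)
print(json.dumps(batch, indent=2))
```

A structure like this is what the Route 53 `change_resource_record_sets` API takes as its `ChangeBatch`; the equivalent Terraform would declare two record resources with failover routing policies.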

Now, if you need to do some operation like promoting one database to be primary.

I think that's going to depend on the technology.

And that's where you're going to have to be most careful about where that state is stored that you're not in a pickle on that.

Yeah And that's kind of exactly the.

It was during the process of restoring, or I guess promoting the read replica up to a master database and eventually reverting back, where we ran into that: even for the resource name, we still have to be reading from the remote state.

And then at some point when we want to start.

Let's say you make all of these internally consistent.

So remote state in us-west-2 should always be reading other remote states in us-west-2, never remote states in us-east-1.
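The "share nothing across regions" rule for remote state can be checked mechanically. Below is a small Python sketch that models stacks and the remote states they read, derives a per-region state bucket name, and flags any stack that reads state from another region. The stack names, the `acme` namespace, and the bucket naming scheme are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stack:
    name: str
    region: str
    reads_state_from: tuple = ()  # names of stacks whose remote state this one reads


def state_bucket(namespace: str, region: str) -> str:
    """One state bucket per region, so losing a region never blocks the other."""
    return f"{namespace}-{region}-terraform-state"


def cross_region_reads(stacks):
    """Return (reader, dependency) pairs where state is read across regions."""
    by_name = {stack.name: stack for stack in stacks}
    violations = []
    for stack in stacks:
        for dependency in stack.reads_state_from:
            if by_name[dependency].region != stack.region:
                violations.append((stack.name, dependency))
    return violations


# Hypothetical layout: the west database wrongly reads the east database's state.
stacks = [
    Stack("network-east", "us-east-1"),
    Stack("db-east", "us-east-1", ("network-east",)),
    Stack("network-west", "us-west-2"),
    Stack("db-west", "us-west-2", ("network-west", "db-east")),
]

print(state_bucket("acme", "us-west-2"))
print(cross_region_reads(stacks))
```

A check like this could run in CI before `terraform plan`, so a cross-region state dependency is caught before a regional outage makes it painful.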

I see.

OK Yeah.

OK So.

So that means that to bring up this infrastructure in us-west-2, you have a parallel set of resources in that region and parallel state files and all that.

OK So.

So essentially just like a cross region replication.

And replication has kind of some loaded concepts with it that things are getting like there's some process replicating it.

That's not what I'm saying or suggesting.

I'm saying that you actually have the ability to spin up a parallel stack in the other environment, of which most may be turned off for cost reasons.

And then you have your replica database that's turned down.

Now, there's also Aurora, which has its global database.

Now, I don't have firsthand experience with it.

But this seems to address part of that use case that you have.

And they support Postgres and MySQL too now.

MySQL 5.7, I guess.

OK I think right.

But some.

Yeah. So you might be paying a price for running that.

But I mean, it's probably going to be less than the engineering cost of implementing your own system.

Yeah Right that makes sense.

Thanks a lot.

So what are you proposing with the Aurora?

Well, so, just like Microsoft offered with the SQL database, they now have a global database.

So it should support regional failover, or the situation where a region is offline.

And then how were you applying that to Terraform?

It's so that I wouldn't have to have a source and a failover replica where one gets promoted in the case that a region goes down.

So it basically just handles all of that failover.

OK, and Route 53 is global.

So I presume the endpoint for the Aurora global database would work and route to the appropriate region.

Thank you.

So we covered that one.

All right.

So interstitial topic, I thought this was pretty rad.

Slightly disappointed.

But I get why they did it the way they did it.

So one of the nice things with Kubernetes is this API where you can just throw documents at it and it creates things for you, kind of like CloudFormation but genericized.

And then there are operators that handle the business logic to do whatever you want.

Well, for Terraform, there's been this gap. It would be awesome to be able to just deploy a document to Kubernetes that describes my RDS instance or something that I wanted to deploy, and then Kubernetes handles the scheduling and provisioning, invoking Terraform to do that.

So there's been a few attempts at doing this by other companies by other people basically rancher had one.

Lauren brought this question up again recently.

I'm going to put you up.

Pull this up.

Or not Lauren.

I mean, Ryan Ryan Smith.

So AWS has, like, their service operator, which is kind of a wrapper for calling CloudFormation, and let's see... yeah, this was the link to the one that Rancher had, and these all allow you to just use open source Terraform to provision Terraform code on Kubernetes. But now HashiCorp is coming out with a first-class provider for this. Not a provider.

I mean operator for this.

And it works in tandem with Terraform Cloud.

So basically, it triggers the runs to happen on Terraform Cloud for you.

And then Terraform Cloud can handle the approval workflows and exception handling and triggers that happen, like if this project runs, then trigger this other project, which would be really difficult if they had to reimplement all that inside of Kubernetes.

So I get why they're making this tightly integrated with Terraform Cloud. Is it clear what the value is of what this operator is providing for Kubernetes? I can go into more detail.

All right.

It's not clear, because my understanding of Kubernetes to begin with isn't there.

So I like it.

Yeah, I like to think of Kubernetes as two things.

It's a framework for cloud automation that's genericized across providers.

As, you know, Spring is to Java, as Rails is to Ruby, Kubernetes is to cloud, almost.

And then it's another thing.

It's a new generation of operating system.

So what are the characteristics of an operating system? An operating system has a kernel that does the scheduling; Kubernetes has the scheduler.

That's kind of like that concept, but it treats nodes as where it can distribute those workloads.

Another characteristic of an operating system is that it has a file system; well, Kubernetes has a primitive for file systems. Operating systems have primitives for configuration, like /etc.

That's where we typically have all the configuration; Kubernetes has ConfigMaps. The analogs go on and on and on.

So if you work within the bumpers of Kubernetes, then you can achieve automation much more easily.

So the problem with Terraform that we have today is: how do you orchestrate the actual business logic of deploying a module? Like, if you wanted to do blue-green with Terraform, where does that logic of blue-greenness go in Terraform?

There is no place. Or you could write your own Terraform provider to implement blue-green, or you could do wiki ops.

Basically, you describe in a wiki page how to do blue-green with your Terraform, or you can script that with Ansible, or you can script it with some Bash script, or maybe, you know, the future is actually orchestrating that stuff more with...

That's right.

Kubernetes and Kubernetes operators. I totally missed the punch line.

So, to make sure I understand the value add of this HashiCorp offering: it comes through this operator for Kubernetes.

Now, I don't know what the difference is between an operator and a custom resource definition.

The operator implements what the custom resource definition describes.

OK, thank you.
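
The operator/custom-resource split described above can be sketched in a few lines. This is a minimal illustration, not how the HashiCorp operator is implemented: the custom resource only declares desired state, and the operator is the loop that makes reality match it. `CustomResource`, `FakeCloud`, and `reconcile` are hypothetical names invented for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CustomResource:
    """Desired state, shaped the way a custom resource definition would describe it."""
    kind: str
    name: str
    spec: dict


@dataclass
class FakeCloud:
    """Hypothetical stand-in for whatever the operator manages outside the cluster."""
    resources: dict = field(default_factory=dict)


def reconcile(desired: list[CustomResource], cloud: FakeCloud) -> list[str]:
    """Make the cloud match the desired resources and report what was done."""
    actions = []
    wanted = {r.name: r.spec for r in desired}
    for name, spec in wanted.items():
        if name not in cloud.resources:
            cloud.resources[name] = dict(spec)
            actions.append(f"create {name}")
        elif cloud.resources[name] != spec:
            cloud.resources[name] = dict(spec)
            actions.append(f"update {name}")
    # Anything that exists but is no longer declared gets removed.
    for name in list(cloud.resources):
        if name not in wanted:
            del cloud.resources[name]
            actions.append(f"delete {name}")
    return actions
```

Running `reconcile` twice with the same desired list yields no actions the second time, which is the property that lets an operator run continuously.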

So I guess what I'm saying, though, is that you could take this operator, apply it to your Kubernetes stack, and give it whatever credentials it needs.

And then you supply a manifest saying, hey, besides this app,

I also want to run this, I don't know, Amazon Aurora global database instance.

And so when I spin up my app, it says, hey, by the way, before you do this, I need this DB. Kubernetes reaches out to AWS through Terraform, creates that Aurora instance, comes back with the endpoint, feeds it into my app, and I have a running app. I think that's the picture.

Yeah, thanks Jim for bringing that up.

That's a really good point.

So when you deploy your apps, here's an example app we have under Cloud Posse, and we're deploying this with Helmfile. Whether you use Helm or Helmfile is not what matters here.

But I wanted to point out some things that are happening here.

One is that we're deploying here a CRD for a gateway.

This is just a generic gateway that happens to be for Istio.

Here we have a virtual service.

And what I could add below here would be a CRD for Terraform.

So here: Terraform, provision this module for me for RDS.

So now we have one document language that handles the configuration and deployment for everything in Kubernetes and everything outside of Kubernetes.

And we can package all of that up in a Helm package. We can then have a Helm package for our application that spins up our services in Kubernetes, spins up our backing services like RDS with Terraform, provisions the IAM roles with Terraform, and deploys this CloudFormation from our third-party vendor, all in one thing.

What are the implications of this being Terraform Cloud as opposed to, say, open source Terraform?

So no first hand account.

Of course, this just happened.

Looking over the announcement and the video, the main thing seems to come down to two things.

One, state storage.

It works with the Terraform Cloud state storage.

So if you're not using that, that's going to be a deal breaker.

And then two, the next big deal breaker is that it works on the concept of remote runs.

So with Terraform cloud.

It's really cool.

You use the client to trigger runs in the cloud.

So even if your client disconnects or you close your laptop or whatever, everything keeps running and you just reattach to it in the cloud.

And that removes the need for managing the state at all, whether it be through S3 or your own backend. Terraform Cloud is doing that for you, which makes it a black box, more or less, and you don't have to worry about corruption of that state or securing it.

Quick question, as you've touched on Terraform Cloud: when you're using something like local-exec in Terraform, you'd normally give it something like sh or whatever your command might be.

My team operates on multiple OSes, Unix-based and Windows-based.

That's always been a challenge for us with that type of thing, ensuring we get the right type of commands. How do you handle that in Terraform Cloud?

I can answer it as much as I know from our experiments.

Also, let's see here if John is online. No, John's not online.

All right.

So to date, my understanding is that Terraform Cloud only supports Linux-like runners, and those are hosted.

So the Terraform local-exec command runs in that Linux environment.

The other problem is that they don't allow you to bring your own container.

So you're restricted to the commands that exist in that container.

So that's why what some people do is have a local-exec that goes and downloads the AWS CLI.

Or you have, excuse me, a local-exec that goes and downloads some other binary you depend on.

The other way it works is by convention: you take your .terraform folder and add that to your source control.

And in there you can put any providers that your modules depend on.

OK, so that partially answers your question.

That's how it works for Linux-based plan and apply and stuff like that.
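
The cross-platform concern raised in the question comes down to choosing an interpreter per operating system. Terraform's local-exec provisioner accepts an `interpreter` argument for this; the sketch below shows the same decision in Python rather than Terraform. The function name `local_exec_command` is hypothetical, and the two shell choices are assumptions for illustration.

```python
import platform
from typing import List, Optional


def local_exec_command(script: str, system: Optional[str] = None) -> List[str]:
    """Return an argv list that runs `script` with a shell suited to the OS.

    `system` defaults to the current platform; pass it explicitly to see
    what another OS would get.
    """
    system = system or platform.system()
    if system == "Windows":
        return ["cmd", "/C", script]
    # Linux, macOS, and other Unix-like systems share a POSIX shell.
    return ["/bin/sh", "-c", script]
```

On a hosted Linux runner only the second branch would ever be taken, which is why scripts written for Windows workstations need a Linux-compatible path before they can run remotely.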

I don't know if Terraform cloud has a solution that works on Windows yet, but I'm guessing it.

So I just.

Just as an observer of this,

I'm impressed and surprised by how much Terraform is happening on Windows workstations and in Windows environments, because it's something I haven't touched in 15 years.

Yeah, basically surprised.

And I look at my colleagues who use Windows daily and say, how? But I mean, I have nothing but respect for Microsoft these days. They've really turned a corner in my book, both supporting open source and providing awesome tools.

I mean, Visual Studio Code is arguably one of the best IDEs out there, and then there's the Windows Subsystem for Linux.

That's pretty cool.

And a strap that on.

And I think they've made that more first class and improved it.

So yeah, I get how it's possible. It's cool.

Call it the "Nadella-sance."

Yeah, or whatever they're calling it.

The "Nadella-sance"? Unpack that one for me.

I don't get that.

Satya Nadella is the new CEO, of course.

OK, got it.

He's turned the company around.

Yeah, that makes sense.

All right.

Well, then a personal shout-out here: if you guys wouldn't mind nominating me for HashiCorp Ambassador, that'd be pretty awesome.

I know I reached out to some of you directly, but I'll share this in the Slack channel.

I think that would help us reach more people.

So your nomination would be appreciated.

I just sent the link to the office hours channel.

Right, then another one.

I don't know if I'm going to be able to answer this in a way that's going to be satisfactory, because I don't think we've solved it, or that anybody has entirely solved it.

But Ben was asking in one of the channels,

and I cross-posted the link to his question in office hours here.

This one.

So hey folks.

I'm looking for some advice on how people are tackling the chicken and the egg problem with secrets management.

I had the idea to use Terraform to provision Vault, but with this comes the question: where do I get the secrets needed within the Terraform scripts?

Of course, I'd love to use Vault for that. One solution I've heard is to place the scripts in a super secret repository, along with the secrets, with access restricted to a select few.

And while I guess this works, something about it feels dodgy, but I guess these initial secrets have to be stored somewhere.

So there's a few questions here.

There's a few topics here.

So one is what to do about Vault's master unseal secrets specifically, and there's actually a great answer for that.

But then there's the more general question of what to do about this cold-start secret problem, because it's going to vary case by case with the type of secrets and the type of applications you're working with.

And we can't just generalize all that.

So with Vault, one of the nice enterprise features that they released last year at some point into the community edition is automatic unsealing with KMS keys.

I believe the Vault Terraform module that's GA on the HashiCorp website supports that automatic unsealing with KMS keys, and then it's just a challenge of

IAM.

So you've got to set up your IAM policies correctly to guard access.

But then let's talk about the other things that he mentions.

One thing is to have your secrets actually in a Git repository in plain text, with access limited to a select few.

I think we can all categorically say no to that.

We've all learned that this is not a good thing to do.

There's an augmentation of this that we see.

What you can do is encrypt those secrets using a public key; the cluster then has the private key and is able to decrypt these at deployment time, or you have that baked into the process and the CI/CD pipelines are able to decrypt them.

I still don't really like that solution too much.

The reason why is that it's opaque.

Not that secrets shouldn't be opaque, but I mean here you're committing encrypted binary data.

If somebody inadvertently, and I'm not saying a bad actor,

just accidentally messes up another secret,

I can't see that in the pull request.

I approve it, and this broken deployment of secrets goes out there.

I don't think that is that great either.

So ASM, Amazon Secrets Manager:

I think they obviously thought a lot about this when they implemented it and how it works.

So I want to talk about another key problem here with secrets and that's rotation right.

So you can't just rotate the secret and expect your cluster not to fail if you don't support concurrent secrets being live at the same time,

and then lifecycling out the old ones.

So you almost need to have blue-green secrets, kind of; for any given period of time you need multiple secrets online and working.

And this is how Amazon's STS API works: when it issues temporary credentials, your old credentials will eventually expire.

They don't expire at that moment.

And you have some overlap where both of them are valid.

And that's to have graceful rollovers.
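
The overlap window described above can be sketched as a small in-memory model. This is an illustration under stated assumptions, not how any AWS API is implemented: `RotatingSecret`, `SecretVersion`, and the `overlap_seconds` parameter are hypothetical names, and time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SecretVersion:
    value: str
    expires_at: Optional[float]  # None means this is the current version


class RotatingSecret:
    """Keeps old and new secret values valid at the same time during rotation."""

    def __init__(self, initial: str, overlap_seconds: float) -> None:
        self.overlap_seconds = overlap_seconds
        self.versions: List[SecretVersion] = [SecretVersion(initial, None)]

    def rotate(self, new_value: str, now: float) -> None:
        # The outgoing version keeps working until the overlap window closes.
        for version in self.versions:
            if version.expires_at is None:
                version.expires_at = now + self.overlap_seconds
        self.versions.append(SecretVersion(new_value, None))

    def is_valid(self, candidate: str, now: float) -> bool:
        return any(
            v.value == candidate and (v.expires_at is None or now < v.expires_at)
            for v in self.versions
        )
```

Consumers that still hold the old value keep working until the window closes, which gives a deployment time to roll out the new value without an outage.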

So in whatever solution we come up with here we need to take that into account, the other is we need to have some certainty of what is deployed and being able to point to a change that triggered it possibly.

And this is where, like, an idea:

let's just say, security aside, man,

wouldn't everything be so much easier if we didn't have to encrypt these,

if we could just have these secrets be in plain text in Git?

It would just work with the rest of our CI/CD pipelines, man.

And if everybody was just honest and nobody did anything bad, that would simplify everything.

The alternative to this, which Amazon Secrets Manager makes

so easy, is that you get a version ID that refers to that secret.

And that version ID has nothing sensitive in it, so you commit the version ID to your source control.

And that's how you roll out the changes.

So that gives you a systematic way of doing that.
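
The version-ID approach can be sketched with an in-memory stand-in for the secret store. This is not the AWS SDK; `VersionedSecretStore`, `resolve`, and the manifest shape are hypothetical and exist only to show that what lands in source control is a pointer, not a secret.

```python
import uuid
from typing import Dict, Tuple


class VersionedSecretStore:
    """In-memory stand-in for a secrets manager that versions every write."""

    def __init__(self) -> None:
        self._values: Dict[Tuple[str, str], str] = {}

    def put(self, name: str, value: str) -> str:
        # Each write gets a fresh, non-sensitive identifier.
        version_id = str(uuid.uuid4())
        self._values[(name, version_id)] = value
        return version_id

    def get(self, name: str, version_id: str) -> str:
        return self._values[(name, version_id)]


def resolve(manifest: Dict[str, Dict[str, str]], store: VersionedSecretStore) -> Dict[str, str]:
    """Turn committed references into real values at deploy time."""
    return {
        env_var: store.get(ref["secret"], ref["version_id"])
        for env_var, ref in manifest.items()
    }
```

Rolling out a rotated secret then becomes an ordinary reviewed commit that changes one `version_id`, and rolling back is reverting that commit.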

Now, how do we get the secrets into ASM in the first place?

I don't have the ideal answer

for that.

I think some kind of a front end for that, integrated with single sign-on, would be nice.

Obviously, you can still use the Amazon web console for that.

I think something more purpose-built for that would be nice.

And then lastly, I'd like to introduce another concept that I'm finding pretty awesome: the external secrets operator.

The external secrets operator for Kubernetes, and this is again more of a Kubernetes-centric solution, supports multiple backends, of which 1Password is one.

And I find it kind of cool because this allows you to present a really easy UI for the company

and the developers to update their secrets.

Is this going to work in a mega corporation with 2000 engineers.

Yeah, it's probably not the right solution.

But you know, for your smaller startup with 15 to 25 engineers this is probably going to save a world of hurt and complexity.

Just by using 1Password for a lot of those application-specific secrets.

The secrets I'm talking about are things like integration secrets,

third-party APIs, and stuff like that.

Another option is to just get out of the secrets management business altogether and use something like VGS.

So this is Very Good Security, and they become the third party that you entrust to manage secrets, and then you just have tokenized values everywhere you need a secret.

You can commit those tokenized values to your source control and deploy your code.

But the secrets themselves are with VGS, and the way it works is basically you export a proxy setting.

And that makes your requests go through

VGS. Now, would you do this for internal services, where they egress and ingress again?

No, that probably doesn't make sense.

But it makes more sense if you have third-party integrations where you need to handle security.

This might be an option.
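
The tokenization idea can be sketched as follows. This is a toy model of the concept only, not the VGS product or its API: `TokenVault`, `tokenize`, and `reveal_in` are hypothetical names, and the "proxy" is reduced to a string substitution on an outbound request body.

```python
import secrets
from typing import Dict


class TokenVault:
    """Toy third party that holds real values and hands back opaque tokens."""

    def __init__(self) -> None:
        self._by_token: Dict[str, str] = {}

    def tokenize(self, sensitive: str) -> str:
        # The token is random, so it reveals nothing about the real value.
        token = "tok_" + secrets.token_hex(8)
        self._by_token[token] = sensitive
        return token

    def reveal_in(self, outbound_body: str) -> str:
        """What the egress proxy does: swap tokens for real values in flight."""
        for token, value in self._by_token.items():
            outbound_body = outbound_body.replace(token, value)
        return outbound_body
```

The application and its repository only ever see the `tok_...` value; the real secret appears only on the far side of the proxy.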

All right.

So yeah.

Any other tips on secrets management things that you're doing.

Andrew Roth?

I have some ideas or suggestions.

I posted a link in office hours to the GitHub project bitnami-labs/sealed-secrets.

Gotcha Yeah, that's what we would be using if we had this problem at the moment.

Yeah, I remember looking at this.

Yeah, this is that same concept, basically, where you encrypt this data. They provide a CLI tool, kubeseal, to make it very easy to encrypt the secrets with the public key of the cluster.

So you can just commit the sealed secret to source control

and be done with it.

It's pretty hard to decrypt it.

And it creates a regular secret inside the cluster.

So that's a good tip.

What's the use case for this.

It's the same one.

Basically you need,

let's say, the database credentials for RDS or something, and you're not using RDS IAM.

Well, how do those credentials get to the cluster?

So this allows you to encrypt them.

This is a custom resource of type SealedSecret that holds this encrypted data that looks like this.

So here are the credentials for my service.

And then you deploy the operator inside of Kubernetes, and that operator looks for this custom resource that looks like this.

And when it sees it, it decrypts it

And writes a secret called my secret into that namespace here.

So here's the output of that operator.

It created a new resource of type Secret.

Here's the actual password decrypted from here.

So this is secure to put in Git.

This is not secure to put in Git.
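
The sealed/unsealed distinction above can be sketched with toy asymmetric encryption. This is textbook RSA with tiny primes, chosen only so the example runs on the standard library; it is not secure and is not what the Sealed Secrets controller uses. The names `seal` and `unseal` are hypothetical, and `pow(E, -1, ...)` needs Python 3.8 or later.

```python
from typing import List

# Toy textbook RSA with tiny primes: for illustration only, never for real secrets.
P, Q = 61, 53
N = P * Q                           # public modulus, safe to publish
E = 17                              # public exponent, safe to publish
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, held only by the "cluster"


def seal(plaintext: str) -> List[int]:
    """Anyone with the public key (E, N) can produce the sealed form."""
    return [pow(byte, E, N) for byte in plaintext.encode("utf-8")]


def unseal(sealed: List[int]) -> str:
    """Only the holder of D (the in-cluster operator) can recover the value."""
    return bytes(pow(c, D, N) for c in sealed).decode("utf-8")
```

The output of `seal` plays the role of the resource that is safe to commit, and the output of `unseal` plays the role of the regular secret that exists only inside the cluster.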

This is what your

applications need to consume.

This would only be able to be leveraged with Kubernetes.

Yeah, this is going to be a specific solution.

But this is, again, why Kubernetes makes so many things easier. At this point, if you're not already, seriously think about starting to move toward Kubernetes as a holistic approach to your entire business.

Thank you again.

Yeah It's like imagine that you could script this stuff.

You could orchestrate something like this, you could build something with Ansible or whatever, yada, yada, yada.

But whatever you build is going to be a snowflake that won't really work in anybody else's environment.

But when you deploy an operator like this, the operator is designed to work with Kubernetes.

So if you have Kubernetes and you like this strategy,

OK, let's just install the operator.

And now you get that functionality.

So ultimately extensible.

So the DOJ is using this at, like, full scale.

50,000 users.

Well, that's well that's a vote of confidence right there.

Maybe. But back to Very Good Security, then.

In other words, is the distinction that VGS is external to the cluster?

It is; it's a third party.

It's a SaaS service.

OK Yeah.

And the idea is that you basically get turnkey compliance.

This is you could say this is like Stripe so stripe all they do is credit cards.

Well, what if you wanted a service like Stripe that works with arbitrary data and any API endpoint.

That's VGS.

So you don't have to worry about any PHI, PII, PCI, any of that stuff,

depending on how you implement VGS.

Wow, nice.

So you basically get a token, and they support forms, so your customers can submit the data through a form and have the POST go directly to VGS.

So you never even touch it.

You never even see it.

And then you get back a token that represents that bucket of data,

which you can now operate with in your apps.

So what's the story behind this Terraform operator from HashiCorp?

Yeah, so we briefly just covered it.

Oh, did you.

I'm sorry.

I wasn't thinking about that.

Yeah wait a minute.

You missed a great show: an epic, full-on demo with all of the bells and whistles.

You're joking.

No, it's just really brief.

Honorable mention here: it only works with Terraform Cloud, and honestly, given the edge cases you've got to handle with Terraform, I get

why they made it that way, even though I personally wish it just worked with open source.

There are three or four other operators, at least, for Terraform on Kubernetes that I've seen.

But with this coming from HashiCorp, I think this is going to be the one that ultimately wins. Unfortunately, it requires HashiCorp for state storage.

And it actually uses HashiCorp's remote run capabilities for all the plan and apply.

And that's also how it works with approval steps and triggers and all the other functionality that Terraform Cloud brings.

Gotcha So it's a kind of a simplified one stop shop as long as you sell your soul to us.

Yep. And as a company that's recently implemented Terraform Cloud-like functionality by using a generalized CI/CD platform, in our case Codefresh,

yeah, it was hard.

And I can't say that we solved it, especially not as well as Terraform Cloud.

But what we did was get consistency.

So that it works like the rest of our pipelines and everything's under one system.

And we have full control over what that workflow looks like, versus what Terraform Cloud gives us. One other thing I saw that came up:

there's a guy, Sebastian, who is a friend of mine.

His company is called Scalr.

In their early product here, they reverse-engineered the Terraform Cloud API, basically.

So you can use Scalr as an alternative to Terraform Cloud.

What I think is interesting about that is the day that Atlantis, or some project like Atlantis, comes out as a self-hosted Terraform Cloud alternative.

So you don't have to make a contract with HashiCorp? Or is it that you pay for it hosted on-prem, for Terraform Cloud, or sorry,

for Scalr?

So they have their own user base.

I can't speak to the business objectives per se, but what they're trying to do is build an alternative to Terraform Cloud based more on open source solutions. I believe it works with

Open Policy Agent, and I believe it allows bring-your-own-container.

So that makes it compelling,

since with Terraform Cloud you can't do that.

Also let's see here.

I think it's SaaS,

or actually I think this is entirely self-hosted.

So I don't think this is this.

Us have gone into pricing at least.

OK, so yeah, just an honorable mention: that's what somebody else is doing out there.

There's also Spacelift, spacelift.io, but I don't think Spacelift

implements the Terraform Cloud API.

OK Whew all right.

Well, we are overtime here.

So I think it's time we got to wrap up for today.

I appreciate your time.

I think we covered all the talking points.

So here's some next actions for you if you haven't joined the call before.

Make sure you take some of these, including registering for our Slack team. Go to cloudposse.com/slack.

We also have a newsletter. I haven't been able to send it out as regularly as I'd like, but that's cloudposse.com/newsletter.

Register for these office hours if you want to get a calendar invite,

so you can block off the time.

I think that's the best way to be a regular on this. Go to cloudposse.com/office-hours for that.

If you ever miss an episode,

you can always go to cloudposse.com/podcast.

That'll redirect you to podcast.cloudposse.com, which is where we syndicate all of these office hours recordings.

So if you want to catch up on past episodes, that's a great place to go.

Everyone is more than welcome to add me on LinkedIn so we can stay in touch.

I share a lot of interesting information on LinkedIn.

So it's a great way to stay in touch professionally speaking and if you're interested.

What was that.

Oh, sorry.

Just a question.

You post all these office hours.

Are there summaries or descriptions for them?

Because when you see 30 or 40 videos, you're like, oh, yeah.

Yeah, that's a good point.

It's a labor of love.

I would love to summarize each of these videos and add that there.

I will say that what we are doing.

We do have the automated machine transcription.

So if there's a topic you're interested in looking for,

you can probably do a site search on cloudposse.com and search for something like Prometheus.

And let's see if this works. Well, it fails here on this one anyway.

Well, they are searchable from the transcripts.

But it's not ideal.

I know it's not the answer.

You're looking for.

So thanks for that feedback, though.

I'll take that to heart and see if we can't start adding some show notes for each one of these sessions.

So they're more discoverable.

And then, yeah, go to cloudposse.com/quiz if you want to find out what it's like working with Cloud Posse.

Thanks for your time, guys.

See you guys.

Same time, same place next week.

Take care. What was the last thing, after Scalr?

Oh, that was Spacelift,

spacelift.io. Oh, OK.

Thank you.

I'll post that in the office hours channel.

Adios, thanks very much for answering my questions this evening.

Goodbye.

Yeah, come again.

Public “Office Hours” (2020-03-18)

Erik OstermanOffice Hours

Here's the recording from our DevOps “Office Hours” session on 2020-03-18.

Machine Generated Transcript

Let's get the show started.

Welcome to

Office Hours.

It's March 18, 2020.

My name is Erik Osterman and I'll be leading the conversation.

I'm the CEO and founder of Cloud Posse.

We are a DevOps accelerator.

We help startups own their infrastructure in record time

by building it for you

and then showing you the ropes.

For those of you new to the call, the format is very informal.

My goal is to get your questions answered.

So feel free to unmute yourself at any time you want to jump in and participate.

If you're tuning in from our podcast or YouTube channel, you can register for these live interactive sessions by going to cloudposse.com/office-hours.

We host these calls every week. We automatically post a recording of this session to the office hours channel, as well as follow up with an email,

so you can share it with your team.

If you want to share something in private,

just ask,

and we can temporarily suspend the recording.

So with that said, let's kick things off.

We have a couple of talking points here and another one in the Slack channel.

Again, if you haven't joined our Slack team, go to slack.cloudposse.com, and you can register and join the office hours channel there.

So one of the awesome recommendations was from Brian Tye.

If everyone can share some of their work from home tips and I'd like to expand that to maybe some productivity hacks I'm sure this affects pretty much everyone on the call here today.

So I'd like to learn from that.

The other question that we had just popping up here is a very common recurring question.

I think it comes up almost every office hours, but we learn something new every time. A peer asks: he'd like to have his monitoring strategy vetted for deploying Prometheus Operator.

We'll get into that.

And Dale let's see here Dale just posted something Dale what's this about to get his guys working from home.

Awesome So this is by you actually yes.

Knight OK.

OK So let's Yeah let's review that in a second.

As soon as we first do the first order of business.

Any questions today outside of these two talking points that I just brought up there.

Yeah, I have one actually.

All right.

Have you.

Has anyone done Kubernetes ingress behind a VPN,

where you can't use Let's Encrypt?

We're looking at doing a NodePort service,

but everything kind of feels like a hacky, thrown-together mess.

Are you in a private cloud scenario, or is it AWS GovCloud behind a VPN?

Yeah not my area of focus.

So what.

First of all, what's the problem here.

I'm not sure I understand the problem. Why can't you just use the regular Nginx ingress in that scenario?

And what's the relationship between the VPN and the ingress?

So because I'm behind a VPN, cert-manager with Let's Encrypt doesn't work.

Gotcha, the automatic part.

Because I can't do the automatic verification.

So I'm looking at doing an application load balancer and terminating TLS using a certificate from ACM, and then, instead of a LoadBalancer-type service for the Nginx ingress, doing a NodePort-type service for the Nginx ingress and pointing the ALB at the hosts on the port that the Nginx ingress NodePort service is on.

But it feels it doesn't it doesn't feel very clean.

I was wondering if anyone else had.

Oh, yeah, I've done that.

And this is a much better way to do it.

Can you explain the cert issue again?

Yes. So cert-manager automatically goes out to Let's Encrypt and does the certification,

the certificate generation, using Let's Encrypt.

And the way

Let's Encrypt validates that you are the owner of the particular domain that you're trying to get a cert for is that cert-manager deploys a pod that publishes an HTTP endpoint, which Let's Encrypt then goes and looks at.

And it's got some token or whatever.

And it's called HTTP-01 validation.

But if I'm behind a VPN, something from the open internet can't hit it.

Can you not use DNS-01 challenges? Because this only requires you to have a publicly reachable DNS record for the domain and be able to modify that.

We don't have a public reachable DNS yes, it's almost no public.

It's almost there.

Yep Yeah, pretty much.

So I actually will try to do something similar, but some of what I was looking at.

Roger tried to run it globally but committed to the well.

Is it that you want to use Let's Encrypt?

No, not necessarily.

I just want ingress.

Yeah So take a look at how they do their range.

To me it was with us and start my angel.

I think they actually disable it and use something else to actually do it.

I think that may Patrick steer it in the right direction.

You're not be an external violations or requests for that check.

Oh, yeah.

And I mean, in this situation, since it is a very closed ecosystem, why not have your own CA with cert-manager itself?

It can manage your PKI for you, and you just need to trust that CA. Or can you even let Vault generate the certs?

Yeah, I mean, cert-manager or Vault can handle that whole process. When we were using Kiam, we set up cert-manager as a CA, which would then generate the certificates for Kiam, and it's all a self-contained ecosystem that just works.

And we have TLS. Now, let's face it:

there's no nice solution for all of this because of those constraints that you have.

But to me, running your own private CA sounds right. That's something I hadn't thought about: setting up cert-manager as a CA that we would just have to tell everyone using it to trust.

Because Nginx ingress will create self-signed certificates that show up as, you know, the "Kubernetes Ingress Controller Fake Certificate," or whatever.

And there's so difficult. Maybe you can kind of trust those.

But I think they change.

Like they don't kind of stay constant.

But every employee that ends next will be.

But right.

But if cert-manager can be set up as a stable, persistent CA, so that I can then say, OK, go

trust this CA, that would work.

Yeah, that would make it

so that I don't have to terminate at the load balancer, because I would prefer to avoid having to use ACM and terminating TLS at the application load balancer, because it adds a bunch of complexity

and overhead.

I would rather just terminate TLS at the Nginx ingress.

And then there's.

I mean, doesn't

AWS have all the private ACM stuff?

But I know that's very expensive.

I don't know.

And I don't know if it's in GovCloud. For what it's worth,

I'm doing something similar with Lambda right now, accessing on-prem, where we're rolling our own CA and the VPC goes through a VPN gateway, a site-to-site VPN.

And then, yeah, we're setting up our own CA using Server Manager.

We're using AD, which was out of my control.

But yeah, we're using AD to generate the certs.

OK, thanks Adam.

Yeah Now that's good.

That's good info.

Thanks any other quick questions.

All right.

Well, let's get into the first talking point, then which is going to be working from home tips.

I'd like to first open this up to Dale since you've already shared something on this that you've put together.

Let me open that up on my shared screen here.

So we can all see what that's about.

Of course.

My corporate firewall blocks Instagram.

Is this your setup, Dale?

Yeah, it is.

That's why work from home.

So it looks like a Star Trek or something.

Pretty a pretty sweet system.

Only one monitor.

Get good scrub.

Yeah So Yeah.

Do you want to.

Do you want to narrate these slides what's going on here.

All right.

Our office actually implemented a

work-from-home policy

due to the whole

COVID-19 virus.

So I pretty much have started to put together just a list of things that actually worked for me in the past.

I actually had worked from home for approximately eight years prior to moving to New York, so these are some tips that actually worked for me.

So when I got here, one of the first things I did was to convert one of the bedrooms into a little office, so that in case my girlfriend's here I could actually set some boundaries around that space and make it as comfortable as possible.

So I like to be a bit more organized in what I'm doing.

I'm this freedom of words generally.

Even if you're in the office.

So I actually would start off with a to-do list.

I do maintain a whiteboard to the side as well, and keep a notebook or an iPad, just to keep things structured.

I just use like things.

And for work related stuff I like.

That's in Europe just for everyone else's edification things Shazam app.

I do tracker.

It's not just things.

Yeah, there's an well.

Yeah thing that stuff.

So I use a lot of cloud-based applications, like everyone is more accustomed to doing as well.

So Zoom, Slack, Jira, you know, just to help with the collaboration. Our office generally uses Slack for everything, notifications and meetings, and we switch between BlueJeans and Zoom, and it suits us.

One practice

I do maintain is getting dressed in the morning.

Not necessarily a button-down shirt, but just get out of the pajamas.

Take a shower.

Morning routines.

So you can mentally prepare yourself to actually get started.

Even with that.

I tend to set my desktop clear things off, make sure it's a little more organized and get seated.

You know, I mentioned earlier about setting boundaries, and again, people think that because you're working at home, you're available.

Especially if you have a family, setting those boundaries, making sure that your voicemail actually says when you're available, things like that to let people know. I go as far as putting a Post-it note on the door with my hours of operation.

So yeah, I think that's a really good one in setting those boundaries nick and then having the conversation with your family so that they know that this is the case that things haven't really changed you just happen to be at home now.

Yeah, no.

And for my office based on this image you didn't even look at that.

I do minimum.

But at Instagram.

I put a lot of what my desktop was before that.

But I work with a sit-stand desk from Jarvis.

I switched from dual monitors to a single ultrawide.

I invest a lot of time into making that space almost like a replica of what it would be like in an office space.

The music.

My laptop, whatever I would need to get things done so well.

I mean, the office or me back home.

I can still function as I would in either location.

No, I really I really like these arms for the monitors.

So you can move your screen around and get it up.

Especially when you're sitting a lot.

Having it at the right angle for your head is going to reduce some of that back pain and stress in your wrists by proper posture.

I think that's one thing that's not mentioned.

It's not necessarily specific to working from home, but the difference between working from home and the office is that, if you're not doing it often, you probably have a pretty bad desk situation and chair situation.

So pay attention if you start having pain in your elbows and wrists, because you're probably sitting with bad posture.

Yeah, I currently suffer from a slight impingement that I did therapy for, and that was related to my posture at the desk.

I put Timothy Dalton like the romulo chair and I will hold my position as well just to keep some level of activity.

I like.

Yeah, the iPad thing, if you guys have iPad Pros.

I'm not sure about the other tablets I researched it.

But with iPad Pros, you can use that as a dual display.

I guess in macOS 10.15 it's natively supported, but before that a third-party app would do it.

That's awesome.

So actually my screen that I'm sharing right now is an iPad tablet.

So it's a great way to get dual displays like today if you already have.

Yeah, it's very useful, especially to see things like that while I was actually in Jamaica I had used as a second monitor as well.

That kind of simulated what I would work normally.

But my whole workflow.

The last slide.

I just spoke about taking a walk, taking that break.

Step outside. A lot of people don't realize that they spend so many hours indoors.

They don't get the sun, so they don't produce as much vitamin D, which may also help during flu season, right?

Also it helps for kind of working through blockers, state of mind.

Just step outside, clear your head.

Walk back in and go back at it.

And then the other thing that I tend to do is to over-communicate, so we'll have check-ins as well with my direct supervisor and with the team.

I also keep like an open Zoom room so that anyone can just jump into it.

I speak to it.

I like that.

Yeah Is that clear like I am.

Actually, that's what I'd like to talk a little bit more about I've been thinking about having as well is like for teams probably so not company wide and probably for maybe project related.

What about just having a Zoom room open that you can hang out in during the day.

You can mute yourself, you can stop the video doesn't have to be a loss of privacy or any of that.

But at least you can quickly hear any water cooler conversation that comes up related to topics on that as they want.

Now that we have that I actually implemented it.

So we use Google Meet, and every time someone joins our, like, coffee break room, as it's called, it triggers a message in our random channel.

You can just hop in as well.
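The "someone joined the coffee break room" notification described here could be posted to Slack with an incoming webhook. The following is only a minimal sketch: the webhook URL, the user name, and the room name are placeholders, and how the join event is detected on the Google Meet side is not covered in the call, so it is left out entirely.

```python
import json
import urllib.request


def build_join_notification(webhook_url: str, user: str, room: str) -> urllib.request.Request:
    """Build (but do not send) a Slack incoming-webhook request announcing a join."""
    payload = {"text": f"{user} just joined the {room}. Hop in!"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder webhook URL (assumption). Sending would be: urllib.request.urlopen(req)
req = build_join_notification(
    "https://hooks.slack.com/services/T000/B000/XXXX", "Alex", "coffee break room"
)
```

Building the request separately from sending it keeps the message format easy to check without touching a real workspace.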

So having this as well, made it so much nicer because like in the beginning, people were just sitting there in their forest and nobody was really talking about something.

And now people joined and now we also have a calendar in mind every time that one Saudi PM and everyone is invited to join there.

So since you announce when it is.

So it's not all day.

You have it at is between specific hours kind of the day and it's open all the time.

You can point it there all the time, all day.

But like we have a dedicated session of 30 minutes where you can go in there.

That's what you announced your slack team.

Yeah you know in a general.

Yeah, we have a keyboard that are running throughout the day.

Ghost puppet because normally you in the office.

We'll just tap each other on the shoulder, and it also helps, I guess, mentally: you're just not feeling alone.

Yeah, that helps.

That's a good tip.

Yeah Yeah.

And you know, this whole thing does wash your hands as good as possible.

Then if you miss it.

But I do have other tips on my Instagram that just love some of the slides about between my coupon.

I use Docker and working from more and more.

I can't stress that enough: over-communicate.

You That was one of my notes too.

And I think that's a really important one: I think it's hard to over-communicate, actually, and most people are actually under-communicating what they're working on.

So people are not really informed on what's progressing, where they're stuck and Yeah.

Any anybody else have thoughts on that.

So one of the things my team just recently started doing.

And we really like it: there's an app on Slack called Dixi that does daily asynchronous standups, and you set it up for a particular time.

And it sends each member of the team a message saying, OK, it's time for stand up you know.

And it asks you the typical three questions.

What did you do yesterday.

What are you doing today.

Do you have any blockers.

And it has.

I think it has helped a lot with getting people to write down their thoughts because when we do, we do a stand up call every day also.

But sometimes that can just be.

Oh, yeah, I was working on this other thing.

And I'm still working on it.

That's kind of it.

But getting them to write it down gets them to go into a little bit more detail, and especially with the blockers portion, it's much quicker to get blockers resolved when you write them down in Slack and say, this is a blocker for me right now.

Someone almost always immediately goes and picks it up, like, you know, a blocker for me is I have this pull request that's waiting to be approved.

And you know nine times out of 10 somebody goes, oh, I'll go look at it.

You know cause it's right in front of them.
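As a rough illustration of the written standup flow described above, the sketch below collects the three answers per person and pulls out the blockers, plus anyone on the team who has not answered yet. It does not reflect how the Slack app itself works; the class, field, and function names are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class StandupUpdate:
    author: str
    yesterday: str       # "What did you do yesterday?"
    today: str           # "What are you doing today?"
    blockers: str = ""   # "Do you have any blockers?"


def summarize(team, updates):
    """Return (members with no update, list of (author, blocker) pairs)."""
    by_author = {u.author: u for u in updates}
    missing = [member for member in team if member not in by_author]
    blockers = [(u.author, u.blockers) for u in updates if u.blockers.strip()]
    return missing, blockers


team = ["ana", "ben", "cho"]
updates = [
    StandupUpdate("ana", "reviewed PRs", "deploy staging", "pull request waiting for approval"),
    StandupUpdate("ben", "wrote docs", "more docs"),
]
missing, blockers = summarize(team, updates)
```

Surfacing the blockers as their own list is the point made in the discussion: once they are written down in front of everyone, someone tends to pick them up.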

I'm curious about this.

And I end there.

This is probably like one of the most common app categories almost that I see for Slack.

I'm curious about anybody who's been using a tool like this for, say, six months or more and still sees at least, let's say, 80% participation in the notifications.

My inherent skepticism, based on my own patterns, and this is like a confession here, is that anything that is automated that I know is going to happen every day at the same time,

I tend to ignore, as opposed to those things that are infrequent.

So this is why personally, I don't have hacks like that, that set a reminder every day at the same time to do something, because then I just end up ignoring it. Anybody using this successfully in their company for a long time?

We've only been using Dixi since January, but I think as far as participation,

our messages go out at 11:00 and we have our stand up at 11:30, and most of the stand up is going through the Dixi messages.

So if one of them isn't there you know, it's instantly you know kind of a polite name and shame kind of thing.

Why, why didn't you add your standup?

You know.

But it's.

Oh, yeah sorry.

I forgot.

You know I got busy or whatever.

And yeah we haven't had any issues.

OK with people just forgetting about it because I mean, as long as leadership does it.

I think it tends to trickle down.

Yeah Any other.

Yeah Any other suggestions for working from home.

Brian any of your own tips or hacks you'd like to share or something in particular, you were thinking of when you asked the question in office hours General.

I actually don't have a lot of experience working out of the office often.

I only work from home usually when I was sick.

So that's kind of the reason why I was asked the question.

I do like the idea of the coffee break.

I why do we already know this is like.

The office banter that we had at our office.

Yeah, so I think we're going to try today.

I'd like a suggestion from you guys: I realized that I actually am working later into the night.

Because there's not that, like, drive home thing that kind of stops you from working.

So I'm trying to figure out what I can do to fix that.

Two things.

Two suggestions on that that help me, at least. One is making sure you set your office hours.

So I think developers have some different challenges from managers. Managers tend to live in their calendars and developers tend to just be pulled in every direction.

So it's sometimes harder to regiment, but what I was going to say is, like, for me, on my calendar, having definite work hours defined.

So people aren't scheduling your time outside of those hours.

And then the other one is disabling your Slack notifications on your phone and on your desktop automatically at 5:00 PM, or whenever it is you want your workday to stop.

Sure if you happen to be looking at it, you'll see it.

But at least hopefully it can give you the chance to close the laptop lid at a particular time and move on with your day and focus on family.

Yeah, I'd also make a comment, like for the mobile apps: we use Teams internally in our organization and they usually have quiet hours.

Mm-hmm. So personally, like at 6:00 PM, mine pretty much gets snoozed and I don't see them till the next morning, which could be a good thing or it could be a bad thing.

But it's definitely helped me when trying to disconnect.

Yeah, not for, like, uptime notifications and stuff.

Also, anything serious like that should be set up with escalation policies, actually.

Right. So those should be going to PagerDuty or Opsgenie or something like that.

So that they escalate using that medium.

If it's urgent but overall, you can set, you can set different settings for different channels too.

Yeah, exactly.

Yes, we will.

We'll have a different.

Yeah, he's got a channel called alerts.

I would totally have different settings for the alerts channel than I would for the general channel or whatever.

Yeah, you can configure those settings.

And if you guys do, like, an on-call rotation, those are the weeks where you never have quiet hours, you know.
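The quiet-hours rules discussed here (mute after the workday ends, treat an alerts channel differently, no quiet hours while on call) can be written down as a small decision function. This is only a sketch of the policy, not a Slack or Teams setting: the 9:00 start time, the channel set, and the function name are assumptions, and truly urgent alerts would still go through an escalation tool rather than chat.

```python
from datetime import time

WORK_START = time(9, 0)     # assumed start of the workday
WORK_END = time(17, 0)      # "5:00 PM, or whenever you want your workday to stop"
ALWAYS_NOTIFY = {"alerts"}  # channels exempt from quiet hours (assumption)


def should_notify(channel: str, now: time, on_call: bool = False) -> bool:
    """Decide whether a chat notification should reach the user right now."""
    if on_call or channel in ALWAYS_NOTIFY:
        return True
    return WORK_START <= now < WORK_END


evening_general = should_notify("general", time(18, 0))
evening_alerts = should_notify("alerts", time(23, 0))
```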

So just think about her.

I've taken advantage of the team's feature the mobile app.

I tend to leave mine on because my team likes to just you know it even when we're outside of office hours we tend to like you know, we enjoy talking to each other.

And you know we'll put funny memes or whatever that we find.

And my software VP in particular is a night owl.

So he's up you know, every night at 10:30 doing interesting things because he is just one of those brilliant guys that is a manager.

But is smarter than I am at technical stuff.

And so he'll be up 10 30 posting links to sd 0 set.

So I like seeing that stuff.

But if it gets too much for me at any particular time, I just hit snooze. Slack has a snooze button; you can snooze notifications for four hours or whatever.

And then that's all tend to do.

The other thing is built into OS X is the notifications menu here, you slide up and you have this.

Do not disturb.

It's also helpful.

You also can just add click it.

And then at all unreal.

So like all options for childcare.

Yeah, we'll see what other.

I jotted down some other notes.

Oh, yeah.

One thing that wasn't brought up is white boarding.

This stuff has gotten really good.

It used to be horrible.

You know you see these chicken scratches on the screen that are unintelligible.

But if you have a tablet like an iPad Pro with an Apple Pencil, together with either Microsoft Whiteboard, which is my personal favorite, or Google Jamboard, both of which are free,

you can do really good, high quality whiteboarding on these that is legible by others.

And then, if you're using Zoom, you can literally just share that screen on your tablet.

I would show you an example, if it's interesting.

Zoom even has fantastic whiteboarding features now.

Yes, Zoom does have pretty good stuff.

I would say it's a difference of whether this is something you want to persist and work on or collaborate on across Zoom sessions, something you want to centralize. Like if you're using Jamboard,

I mean, that fits into the whole G Suite, you know, office products.

Same with Microsoft Whiteboard.

So it's like you can continue to refer back to them and update them over a series of calls if you need to or even prepare for a call.

Yeah And as we see here.

Let's give it to you.

You just mentioned Microsoft Whiteboard and you're on a Mac.

I'm just hearing about this for the first time, and I only see it's developed for Windows 10, so I was curious if you use it on your Mac.

So So my point is.

So my point with this.

Why they're so usable is with a stylus.

Right So I'm using the Apple pencil on that.

And it's as good as paper for me to write on there like the quality of my I think the quality of my sketches is just as good as if I was doing it in person somewhere got it.

OK I'll just throw this in there.

I use Evernote, which is pretty nice.

I'll do that with the Apple Pencil, and you can share those sketches.

That's true Evernote has improved their work for sketching as well.

So what I haven't tried to do with Evernote is collaborating on the same sketch with other people.

I don't know how that is.

I know that works well with Whiteboard and Jamboard.

Yeah, that's a good question, because Evernote in general has been pretty poor at collaboration, real-time collaboration on a single note.

I always get note conflicts in that case.

I got to ask a silly question, but on the apple stylus can you.

Or maybe it's a software thing.

Can you change the shape of the tip.

And the size.

Well, yeah.

Yeah Well, that's on the software side.

So when you're using Jamboard or Whiteboard you can change it from a pencil to a marker to a highlighter to a pen, with different widths, all those details, and grids.

So it helps you draw and they also have what do you call it.

I think it was called, but they'll auto detect the shapes.

So if you draw a circle it'll make it a perfect circle.

If that's it if you like that.

Yeah, it's like snap to whatever.

Yeah Yeah.

There are other tools you can look at on the profile.

But I put itself like stability and flow.

If you're really good.

The notes is another one.

And they're all tools all there.

But make for sketching flowing from Moscow.

And does the will divert as well.

It doesn't actually have that feature.

You just measure where you can draw certainly makes a perfect circle for you as well.

Yeah. What I liked about Jamboard, though, is, like, if you are a G Suite shop you have everything in one place.

Are you guys are you guys performing any interviews during this time.

Or are you guys going to put on all we are very firm.

We do.

I mean, we do remote interviews anyway.

So it's not really miss Messing with it.

I mean, the very final one is in person, but we could do a remote for that too. The in-person is just really: do they smell?

You know do they have good hygiene.

I mean, at this point, you've talked to them a bunch of times already.

So yeah.

And you guys like the whiteboard tool.

Possibly Yeah.

We use all kinds of stuff for interviews.

We've done some, like, coding challenge type stuff, but for some reason our legal department is giving us issues with that.

Yeah What the hell doesn't an announcement.

I'm actually, I recently tendered my resignation, so now I'm onboarding at another company.

So I'm experiencing the whole onboarding remotely as well.

So the next three weeks or two weeks and two days I'll be there.

You mean, there is an on site.

Well, there is a revolt. You catch him.

Congrats on the change.

Thanks interesting times to start.

No Yeah but you've been remote so much.

So gear I wanted to get to your question here while we have some time.

So you're pretty much a regular on these office hours, or have attended many of them.

You've heard our other talks on kind of like the Prometheus architectures.

Right And I also have to answer that a couple of times already.

But right now, I'm re implementing and rethinking.

Like I switch companies.

And we are currently like, OK.

And so that's why I'd like to ask: is it actually still the best thing to do?

Is there something else that I might be or should be looking out for?

So right now, my idea is, like, one Prometheus operator per cluster, which has short-term storage of maybe a week, and then move on with long-term storage, which will go into Thanos, which I have never used before.

I used Elasticsearch all the time for long-term metric data.

So yeah, I just wanted to get feedback on it and hear what you guys are doing in terms of this.

For example, what I really liked with Elasticsearch was that I could have jobs that basically would delete certain indexes after, like, three weeks or three months for different staging clusters, where the metrics are not that important for me for long-term storage.

But for production,

I really would like to have some metrics, well, like forever.
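The per-environment retention idea (drop staging metrics after a few weeks, keep production indefinitely) can be sketched as a small selection function. This is not an Elasticsearch API call; the environment names, the three-week figure for staging, and the index naming are assumptions for illustration, and the actual deletion would be done by whatever job or lifecycle tool is in use.

```python
from datetime import date, timedelta

# Retention per environment; None means "keep forever" (assumed policy).
RETENTION = {
    "staging": timedelta(weeks=3),
    "production": None,
}


def indexes_to_delete(indexes, today):
    """indexes: iterable of (name, environment, created_on) tuples."""
    doomed = []
    for name, env, created_on in indexes:
        keep_for = RETENTION.get(env)  # unknown environments are never deleted
        if keep_for is not None and today - created_on > keep_for:
            doomed.append(name)
    return doomed


sample = [
    ("metrics-staging-2020.01.01", "staging", date(2020, 1, 1)),
    ("metrics-staging-2020.03.20", "staging", date(2020, 3, 20)),
    ("metrics-prod-2019.01.01", "production", date(2019, 1, 1)),
]
expired = indexes_to_delete(sample, today=date(2020, 3, 25))
```

Defaulting unknown environments to "never delete" is a deliberately conservative choice for the sketch.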

Yeah And I forget who it was there was some participant.

Now this is probably back in December, November, and they talked about Thanos.

So I don't have firsthand experience on Thanos.

And then there's another one competing against it.

And I forget what it is.

Both of them had pros and cons and I wish I could find my notes on that plan.

Was it humor you know.

No it wasn't that one.

Anybody want to fill in while I do some rapid googling for what it's worth.

I took Erik and Andrew's advice on using EFS for Prometheus, and that's worked great for us.

I've also got it working with my ephemeral clusters.

So the EFS is long-lived,

but the Prometheus operators can be short-lived.

So, nice tool.

The interesting thing about it is it buys you a lot of runway, especially since you can provision more and more IOPS as necessary, and engineering time and effort is often more expensive than the provisioned IOPS, though.

So your mileage may vary in the scale of data you guys are dealing with.

I mean, if you're Facebook, it might be different.

But for most companies.

It's not that intense.

Plus, when you go the tiered approach, the federated approach with Prometheus, you have multiple Prometheus instances with shorter retention. VictoriaMetrics was the other one.

Yeah And the challenge with some of these systems is they offload the a to another system that you still now have to manage.

And my concern with having a very complex monitoring infrastructure and architecture is then staying on top of monitoring your monitoring systems.

So the simpler this system is, the happier it is in my mental model, right?

So for me, the long-term storage is more like for historical data.

And if something happens, basically if the cluster is down, what happened five minutes before that? That's what it is actually meant to be for.

And for alerting all the stuff that should be in the station cluster.

So that this will stay as simple as possible.

But long-term storage should still be there, in my opinion.

With VictoriaMetrics or something.

So actually have found something that I will look into.

So awesome.

Thanks for that.

And yeah, I think I found the original blog post. This might have been the one that compared Thanos with VictoriaMetrics and the pros and cons of each, a pretty honest assessment of each one and the trade-offs. I am going to share that in office hours right now.

Thanks for that.

Yeah, I shared that as the thread of his question about I should residency.

Cool any other questions related to that or going back to the original talking point or any new questions.

It's really open ended here.

So if you haven't joined before we have quite a lot of people on the call here.

If you have any questions design decisions that you're trying to make in your organization is a great chance to get feedback on those.

And I have an abiding interest in any progress Andrew's made with the GitLab Helm charts.

The fact that we're the ones that actually work.

It works fine.

It's just complicated.

Yeah, I was.

Yeah, I had the same experience.

OK, well complicated like like all of these different things you know like external object.

Yeah, they're like there are lots of moving parts that don't necessarily line up.

I'd love for somebody to probably have a particular, I'm not doing so well the operator I want to get the operator to work because basically, I have I no longer have access to like unlimited data about us like I used to.

So I'm running a sort of cheapo DigitalOcean cluster where I'll sporadically bring stuff up and down.

So basically, I guess I just want, like, a scale-to-zero

GitLab server, and I don't have any particular, you know, it doesn't have to be any particular object storage or any particular web server. I'm pretty flexible.

So if you're not.

If you're not going to use, like, AWS S3, GitLab will provision MinIO. Use MinIO.

Yeah, I mean, you know, MinIO is sort of like under the covers of a lot of, like, little toy projects that I end up doing.

And that's good enough.

I suppose.

And yeah, we've been running MinIO on our prod cluster, actually.

So SAIC has this thing called the innovation factory, and part of it is this GitLab for people to use, because there wasn't really a good centralized Git solution that anyone could just go in and use.

But you know, that's been in, like, beta for a year or so because we just haven't had the resources to pour into it to get it ready for any kind of good SLOs or SLAs.

And so we started out just using MinIO and we're still using it.

And it works fine.

It's backed by EFS.

No problems, other than the other day, as in, like, Monday, our EFS ran out of burst credits and everything came crashing down, like to the point where it would not work at all.

So, like, GitLab was completely unusable.

So all I had to do was go in and up

the... and we weren't using provisioned IOPS at all.

And so I just provisioned some IOPS and it was like a switch turned on.

I mean, it was like that.

Everything worked again.

You know it's.

$80 a month.

That's nothing.

You know, that's an hour and a half of my time.

You know.

So totally worth it.

I'm 100% on board with EFS.

I am not one of the doubters when it comes to, you know, all kinds of people say, oh, yeah, don't run your stuff on NFS, don't run your database or whatever.

Yeah If you're Facebook.

Best idea.

But we've been running it on EFS.

We've been running a Postgres database.

We've been running Gitaly, which is the backend service for all your, you know, Git command line for GitLab.

We've been running MinIO off of EFS.

We've been running Jenkins off of EFS for over a year now.

And no problems whatsoever.

Zero zero problems.

Personally, I missed the first part there. Where does GitLab depend on something like object storage, like MinIO? Well, mostly for, like, the repositories, and where it's been used it's a generic object storage.

It doesn't require like tell you system.

I can tell you exactly what.

Yeah, exactly.

They do elicit a dependent docs but Yeah Yeah it'll do.

Well, you have to tell it what object storage.

Oh the registry.

Sorry, that's another important part.

So artifacts, backups, packages, planets, registry, those all go into buckets, into S3 buckets.

You don't have to use S3.

You can use MinIO, which is this open source tool that mimics the API of S3.

That makes perfect sense. When you say it, I was curious how they were doing Git on S3-like object storage.

And it seemed like a lot of work to implement. Gitaly itself does not. Gitaly is the backend service that does all the Git RPC stuff.

When you say, you know, git clone whatever, you're talking to Gitaly. That doesn't use object storage; in Kubernetes it's a StatefulSet that's backed by a persistent volume claim, and that persistent volume claim is on EFS using the EFS provisioner.

Any additional questions related to this or new questions.

Yeah, the MinIO you're using? Exactly, so MinIO is a tool.

It's MinIO.

And you can you know, it's open source.

You can go get it on GitHub or whatever.

There's a helm chart for it and everything.

And it's a tool where you can basically host on premise

an S3.

It has the S3 protocol, right?

Yeah, it's exactly the same API as Amazon S3.

So literally you can like you could have it.

You can even use something like that doesn't require local storage on the earth for it to do what it's doing obviously right.

Oh no, it uses EFS too.

I mean, all it requires is a persistent volume claim.

Right, to put things in. Build your own S3, kind of.

Yeah, that's exactly what it is.

It's build-your-own S3.

But what's cool about it is that, for tools that use Amazon S3, MinIO is a drop-in replacement.

All you have to do is change the URL that it goes to. Amazon S3 is s3.amazonaws.com or whatever.

That's Amazon.

That's the URL for S3.

If you change it to wherever your MinIO is being served at,

everything works.

It's all the same protocols, it's all the same authentication.

Yeah, it works great.
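The "just change the URL" point can be illustrated with how an S3 client is typically configured. The sketch below only builds the keyword arguments; the MinIO hostname, port, and credentials are placeholders, and it assumes the boto3 library for the actual client (shown in a comment so the snippet runs without it installed).

```python
def s3_client_kwargs(access_key, secret_key, endpoint_url=None):
    """Keyword arguments for an S3 client; endpoint_url=None means Amazon S3 itself."""
    kwargs = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    if endpoint_url:
        # Point the same client at an S3-compatible server such as MinIO.
        kwargs["endpoint_url"] = endpoint_url
    return kwargs


aws_kwargs = s3_client_kwargs("EXAMPLE_KEY", "EXAMPLE_SECRET")
minio_kwargs = s3_client_kwargs(
    "EXAMPLE_KEY", "EXAMPLE_SECRET", "http://minio.example.internal:9000"
)

# With boto3 installed, either dictionary can be used the same way:
#   import boto3
#   s3 = boto3.client("s3", **minio_kwargs)
#   s3.list_buckets()
```

Everything apart from the endpoint stays the same, which is what makes it a drop-in replacement for tools that already speak S3.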

My understanding of the architecture of MinIO is that it isn't too radically complicated either in terms of components

and services, right?

So deploying it in Kubernetes is just one pod.

Yeah which is that's pretty amazing.

Yeah could he be up at all or anything like that per adding more feature functionality to this whole container base.

Sort of abstraction layer. I think if I were to get more advanced on storage right now, it would be with Rook/Ceph.

I think Rook/Ceph is tending to be the de facto favorite child right now for Kubernetes.

I'm actually trying to experiment with my grass spread by coastal wood and distributed optics for the law.

It seems pretty straightforward, simple enough in the face.

It looks when you're using set with that or on which one looks off.

Yeah. So I've got, like, the external USB drives hooked into the Raspberry Pis, and then set that up, because Rook supports multiple types of backend providers. You're thinking just Ceph, though.

Yeah Yeah.

Rook and Ceph are two different tools.

And I'm not, you know, I know how to spell them.

That's about it at this point.

But since Rook is the CNCF-certified, or whatever, choice of operator for Kubernetes, it's going to get the best support compared to things like Gluster, which most people, when I talk to them about Gluster, they say, oh, don't use that, it's a dumpster fire.

Yeah, it's a dumpster fire.

Totally that's the only one I can testify to firsthand experience.

If anyone actually has a success story with Gluster,

I'd like to hear it. This OpenEBS stuff is, like, becoming popular too.

Might be worth keeping an eye on.

Yeah, I think pretty well.

If you don't get a beautiful book you end up using open us.

So either one should be for any ticks.

Any tips and tricks for using EFS at large capacity? Like, is it better to have one large volume and just, like, segregate it path-based, or how do you do it?

Price-wise, it's definitely better to have one large EFS, because the amount of IOPS you get is directly proportional to the number of gigabytes of data you're storing.

So if you've got, you know, 10 EFS instances that all have 80 gigabytes in them, the IOPS that you get from each one is tiny, but if you have one EFS with 800 gigabytes in it you get a lot more

IOPS. And then, of course, you can provision more throughput too. Like on Monday,

I went and provisioned 20 MB/s.

And I think it's probably more than I need.

But you're able to change it like once every 24 hours.

So I can bring it down.
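The 80 GB versus 800 GB comparison can be made concrete with a little arithmetic. The call talks about IOPS; EFS bursting mode is described by AWS in terms of throughput, so this sketch uses throughput. The rate of 50 KiB/s of baseline throughput per GiB stored is an assumption taken from AWS's published bursting-mode figures and should be checked against current documentation.

```python
BASELINE_KIB_PER_S_PER_GIB = 50  # assumed bursting-mode baseline rate


def baseline_mib_per_s(stored_gib: float) -> float:
    """Baseline throughput in MiB/s for a file system holding stored_gib of data."""
    return stored_gib * BASELINE_KIB_PER_S_PER_GIB / 1024


each_small = baseline_mib_per_s(80)   # one of ten 80 GiB file systems
one_large = baseline_mib_per_s(800)   # a single 800 GiB file system
```

Under that assumption each small file system gets about 3.9 MiB/s and the large one about 39 MiB/s. The ten small ones add up to the same total, but a single busy workload on the shared file system can draw on the whole pool, which is also where the noisy-neighbor concern comes from.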

The house like 90 bucks, man.

Yes Yeah.

So I think one would be good.

But then you have to worry about blast radius concerns, right?

If all of a sudden a database is going crazy on your one big volume,

other stuff doesn't work.

So I don't let anyone touch the EFS other than the EFS provisioner.

No one else has access to it.

So the only things going onto that are what you have deployed using that EFS file system. You might have the noisy neighbor problem.

Yeah Yeah.

If you're sharing a big one, like, when it comes to IOPS, like if you run out of burst credits.

Definitely. Yeah.

I mean, that's the problem we ran into on Monday.

As long as you have burst credits, then, yeah, you're fine.

That's what we...

That was one of the things that we ran into with Prometheus in the early days: the volume was so small that we didn't have enough burst credits.

So we ended up having to artificially... well, in our case, we provisioned more IOPS.

I know Andrew, you've also said you can just write a zero-filled file, just garbage data, to increase the size of the file system.

Yeah, you can do that too. You've just got to work out the pricing, what you'd expect to be paying for it, and keep the architecture

simple then.
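The padding trick mentioned above, writing a file of zeros to grow the file system, might look like the following. It is a sketch, not a recommendation: the function name and chunk size are arbitrary, the demo writes to a temporary directory rather than a real EFS mount, and it writes real bytes instead of creating a sparse file on the assumption that only data actually written counts toward the metered size.

```python
import os
import tempfile


def write_zero_file(path: str, size_bytes: int, chunk_bytes: int = 1024 * 1024) -> int:
    """Write size_bytes of zeros to path in chunks; return the resulting file size."""
    zeros = bytes(chunk_bytes)
    remaining = size_bytes
    with open(path, "wb") as fh:
        while remaining > 0:
            n = min(chunk_bytes, remaining)
            fh.write(zeros[:n])
            remaining -= n
    return os.path.getsize(path)


# Small demo in a temporary directory; on a real mount the path would sit on the shared volume.
with tempfile.TemporaryDirectory() as tmp_dir:
    written = write_zero_file(os.path.join(tmp_dir, "padding.bin"), 3 * 1024 * 1024 + 5)
```

As noted in the conversation, whether padding or provisioning is cheaper depends on pricing, so it is worth working out the numbers before choosing.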

Yeah imaging.

Yeah And I know I didn't do that when it came to actually needing to fix this thing when it all came crashing down on Monday.

I just provisioned for throughput.

Yeah, it's kind of like a provision I have on already.

Yes It's just more expensive than just expanding the disk size.

Mm-hmm. So yeah, I've actually been considering, and it's going to be a battle getting my team on board,

because, you know, they think RDS is God's greatest gift to humanity.

But with the way, I don't know about the whole industry,

but with the way my industry and the government space is going, they really put a premium on making things

cloud agnostic.

So I've actually been a lot more interested in doing some more work on looking at... everybody tells me not to run a database in Kubernetes.

But I kind of want to run a database in Kubernetes.

So you should look at post stock SQL like you never lived until you like upgraded your whole Cocker cluster in like 10 minutes worth of planning and five minutes of execution.

And, like, hardly any, like, even a blip of an outage.

So it does.

Man it feels pretty good.

Thompson So well that Wall Street felt like something else after they have shots.

Borkowski I got released.

I mentioned right.

Yeah, I had happily customized charts.

Yeah well this has got all kinds of databases.

Yeah So this is open.

I've heard of this before.

Yeah. And it's operators for Kubernetes to manage the business logic for managing these services on Kubernetes.

Oh, that's awesome.

Well, you know, I can't speak from firsthand account.

I just know that that's what their prerogative.

She just my business model.

I bought that site.

And I'm just sort of unclear where they're coming from.

But they make some really great stuff.

Do pod disruption budgets help?

I mean, can you set them up to, like, cap the amount of data storage access?

No, that's more about how frequently Kubernetes can nuke it and move it somewhere else.

Oh, OK.

Yeah Yeah.

So it's maintaining stability of the service.

Yeah, over rebalancing pods in the cluster. That's beautiful.

I am definitely going to check this out.
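As a rough illustration of the pod disruption budget idea discussed above, the sketch below builds a PodDisruptionBudget manifest as a plain Python dictionary and prints it as JSON, which kubectl also accepts. The resource name, the `app` label, and the `minAvailable` value are hypothetical. The `policy/v1` API version is what current clusters use; clusters from around the time of this call used `policy/v1beta1`.

```python
import json


def pod_disruption_budget(name: str, app_label: str, min_available: int) -> dict:
    """Build a PodDisruptionBudget that limits voluntary evictions.

    It constrains how many matching pods Kubernetes may evict at once (for
    example during node drains); it does not cap storage access.
    """
    return {
        "apiVersion": "policy/v1",  # assumption: older clusters need policy/v1beta1
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


if __name__ == "__main__":
    # Hypothetical database workload with three replicas, keep two running.
    manifest = pod_disruption_budget("db-pdb", "my-database", 2)
    print(json.dumps(manifest, indent=2))
```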

Well cool.

Yeah, I mean, we don't run production grade databases in the cluster but we've had no problem with staging and other stuff in the cluster.

So why? Mostly personnel issues: we don't have enough time to really understand that system.

But RDS is well understood.

So it's more of a safety fallback for us, unless we can actually engineer that, test it, build it, make sure it works well, and actually monitor it and operate it well.

It's a little more nuanced, but it's not worth the risk to us to not use it.

Yes that's fair.

If you've got a team of like 10 guys, you can fit that in.

Go for it.

We've got three guys.

That's not enough.

That's a really good point.

And it also goes back to that wise comment from Chris Fowles: if you're introducing software like this and you don't have the resources to manage the lifecycle of it, it's going to be in the critical path and a problem.

That's my paraphrasing his statement, which was, if you can't stand the heat, get out of the kitchen.

Yeah, more or less awesome, guys.

So that brings us to the end of the hour.

Thank you for sharing all the tips from working from home.

Brian, I expect you to be productive now during the next two weeks as a result of this.

Thanks, everyone, for sharing.

Remember to register for our weekly office hours if you haven't already.

Go to cloudposse.com/office-hours. A recording of this call will be posted to the office hours channel as well as syndicated to our podcast at podcast.cloudposse.com, so you can subscribe using whatever podcast software you use.

See you next week.

Same place, same time.

All right, guys.

But I use.

Public “Office Hours” (2020-03-11)

Erik OstermanOffice Hours

Here's the recording from our DevOps “Office Hours” session on 2020-03-11.

Machine Generated Transcript

Let's get the show started.

Welcome to Office Hours.

It's March 11th, 2020.

My name is Erik Osterman and I'll be leading the conversation.

I'm the CEO and founder of Cloud Posse.

We are a DevOps accelerator that helps startups own their infrastructure in record time by building it for you and then showing you the ropes.

For those of you new to the call the format is very informal.

My goal is to get your questions answered.

So feel free to unmute yourself at any time you want to jump in and participate.

If you're tuning in from our podcast or YouTube channel, you can register for these live interactive sessions by going to cloudposse.com/office-hours.

Again, that's cloudposse.com/office-hours.

We host these calls every week. We'll automatically post a video recording of this session to the office hours channel on our Slack team, as well as follow up with an email so you can share it with your team.

If you want to share something in private just ask.

And we can temporarily suspend the recording.

With that said let's kick this off.

So what we have today are a couple of talking points that came up in the last day or so at least in my own dabbling here.

One thing I'm really excited about is that it was just announced yesterday or so that EKS now has envelope encryption for secrets. That is, in EKS those secrets are separately encrypted with a KMS key.

Not only that, the Terraform...

Yes, OK, I linked to the wrong issue here.

That's the EKS module by the guys at terraform-aws-modules.

There is a referenced pull request into the Terraform provider to add support for this; it's already supported. And then the other thing is Helm.

I'm excited about this one.

Helm 3.2 is going to restore the functionality to create namespaces automatically for you.

I totally get your point.

Andrew I saw your comment there by baking gobbler on why it's nice sometimes not to have this functionality.

But there are use cases, very frequently in preview environments where we bring up environments from scratch, where we want to have that namespace created for us. In that case, having Helm do it just reduces the number of escape hatches we need to use to get stuff deployed, and that's nice. Any other news you guys have seen?
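For the preview-environment case described above, a deploy script can let Helm create the namespace instead of creating it out of band. The sketch below only assembles the command line; the release name, chart path, and namespace are hypothetical, and it assumes Helm 3.2 or later, where the `--create-namespace` flag is available.

```python
import shlex


def helm_upgrade_command(release: str, chart: str, namespace: str,
                         create_namespace: bool = True) -> list:
    """Assemble a `helm upgrade --install` invocation for one environment."""
    cmd = ["helm", "upgrade", "--install", release, chart,
           "--namespace", namespace]
    if create_namespace:
        # Requires Helm >= 3.2; omit it to keep namespace creation explicit.
        cmd.append("--create-namespace")
    return cmd


if __name__ == "__main__":
    # Hypothetical preview environment for a pull request.
    cmd = helm_upgrade_command("my-app", "./charts/my-app", "pr-123")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Keeping the flag behind a parameter leaves room for the opposite preference mentioned in the discussion, where a missing namespace should fail the deploy.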

We'd like to call out people around the world are doing a lot of work from home stuff.

Yeah about that.

I'm hoping that might kick off some kind of new wave of revolution.

More working.

Yeah work from home revolution might be cool.

Yeah and create even more problems in the commercial real estate sector.

As if retail stores shutting down wasn't enough now.

Next thing we know, Google announces they're closing offices. I've been working from home for the past five or six years, and it's definitely better than going to the office.

Yeah, two and a half years for me. I think it goes both ways, obviously.

I'm going right now.

But that's why we have these office hours so we get some of the same banter.

So I've done.

I did two years.

I started an LLC you know my own business.

It was like you know tiny little consultant shop.

I did two years of that and got super lonely because it was literally the only people I talked to all day were customers.

And you have to be like on all the time when you're talking to a customer.

Now I'm going on two years with a team of like 10 people, and we talk every day on Zoom and Slack and everything, and that's been 100% better.

Do you guys have like just channels you can join at any time and talk or hang out or get there or is it always.

Absolutely. It was definitely not a random channel, and we have, like, video channels, or video rooms,

So to say where people are just hanging out working my team has a couple of different theme accounts and you know someone's usually in one of them but no one not usually.

I'm just curious if that's work for anyone.

One thing we used to do as part of our help desk team, since the help desk was global: we'd have everybody on a Zoom all day long.

Meaning from when your shift started to when your shift ended, you were on Zoom the whole time. For the first two weeks or so you're kind of like, what the hell is going on? These people are just constantly watching me.

But then you realize it's so helpful, because you could literally look up and be like, oh, Tom's online. Hey, Tom.

Can you... you know, it is just so much easier, especially in a global help desk team, compared to somebody sitting next to you.

So I know that is one thing that I've done in the past, where you just had this one Zoom ID, the help desk Zoom, and everybody was just always on, and everyone's just muted by default until you want to talk. I guess that's kind of interesting, and it's kind of a take on the Slack channel. Did people protest that at all?

I guess as people join the team, the first week or two is definitely kind of weird, but you definitely realize the benefit of it, because let's say the guy that's sitting next to you is actually at lunch. You can just say, hey, I'm in New York, you're in Austin or you're in London, I need help with this.

Oh Yeah I got you right now.

And it's just I feel like you solve problems a lot quicker.

Yeah that's kind of the benefit that we got out of it.

And you know I think there were.

Yeah I totally agree with that.

And then it's just like all right if you feel uncomfortable it's like you know just turn off your video when you're not at your desk you know.

Exactly but some people barely have to give up any privacy at all.

I mean turn you turn Video it's like Yeah.

And then if you're not there, then it's like all right you're not available.

But like being on the Zoom it's like hey can I bug you for this.

And if you're missing, it's like, oh, you might be with another customer or something. On my team, I won't say we're doing XP yet, but we've started doing a lot of pair programming.

And so that's been really nice on Zoom too. Typical XP is, you know, two desks, two chairs, two keyboards, two monitors, one computer.

That's typical XP, and you can mimic it with Zoom.

One person hosts and the other person clicks to request keyboard and mouse control, and that way they can break in whenever they want.

And it's been nice. Yeah, that goes in line a little bit with what you were asking before we kicked off office hours, actually.

Your question was kind of what's your protocol for when it's OK to give somebody the keys to the kingdom.

Like what's the process for that.

How do you determine that somebody is ready for that level of responsibility and trust when they don't want it.

You know I mean over.

I don't know if people are over eager for it.

Then there's this, you know, how much do you really need?

You know, I try to eliminate those rights for myself where I can.

And, you know, typically that overeagerness disappears.

So it's a little bit like on a need-to-know basis.

And I think those need-to-know cases come up as their responsibility naturally increases.

So I don't think one needs to give that out automatically or at any compulsory milestone. Yeah, for sure.

Cool. Any questions or interesting observations?

Anybody else have highlights from the community?

AWS Bottlerocket, the container OS.

What's interesting is that it's been around for a while.

I just stumbled across it today.

They just announced it.

Yeah but I think I've seen some mentions of it.

I'm a little burned out on the container-native OS thing, just with the number of OSes out there.

And then, like, you know, CoreOS, that's what it was, and it just went EOL.

Like, last week I was on the same thing with RancherOS.

You know what?

When I looked at Bottlerocket I was like, man, this sounds a lot like RancherOS.

And I was looking at RancherOS, and then all of a sudden I found out they're not, you know, they're not working on it anymore.

And I was like oh OK.

Yeah, so the timing is a little bit off to come out with another OS when so many OSes are getting killed.

Well, if you're rolling your own Kubernetes clusters, what OS would you use?

Well, on Amazon I'm just going to use the standard Amazon Linux, whatever they ship by default.

And if they're going to make this the default, fine, so be it.

I basically don't want to be concerned with it at the level that we operate in.

Different companies have many different requirements.

Yeah, I think certain enterprises, maybe like Disney or something, require that you must run this version of enterprise Red Hat.

But we try not to play that game if we can avoid it.

And companies get into that because they have their own RPM distributions or whatever, and their own signed packages, and their own way of doing it.

Teams that manage that.

Which then makes it even less palatable to say, OK, we're now suddenly going to pick Bottlerocket, which has no track record showing that it's going to stick around for a while.

Now Amazon, this will be really interesting.

I don't know, what services has Amazon deprecated in the last 12 months, for example, compared to, say, Google or others? I don't have an answer for that. Once it stops making them any more money... well, that's Google's thing, right?

That's Google's strategy, right?

But Amazon has been a little bit more committed, you know, more relationship- and commitment-based.

The concern I would have with bottle rocket is.

I mean, just go to the GitHub repo and read the README, and you can immediately tell that the vast majority of their efforts are going to be focused on EKS and AWS only.

And so if you come in... like, there's a comment that got added, I don't remember when it got added, but it was like, hey, this would be awesome on Raspberry Pi.

Not a single response.

But it's like, yeah, OK, Amazon's making Bottlerocket.

And they've already said their first, you know, variant or whatever is going to be for EKS.

So unless you're using EKS, it's not for you yet.

It's going to take a while for all the variants to come out.

So I've been doing a lot of work with the Air Force. It has this new initiative called DSOP, and it's got a bunch of different names, but Platform One is another name for it.

And this guy, Nicolas Chaillan, he's the Air Force Chief Software Officer.

He's this guy from France.

He's got this crazy French accent, but he's brilliant. He's a serial entrepreneur.

He was a multi-millionaire by like 25.

And he's got a ton of patents and stuff, so he's kind of leading the charge on DevSecOps inside DoD.

And so he's got this whole initiative going with, you know, an OCI container-native Kubernetes platform, and it's completely vendor agnostic.

So it's you know all my efforts lately have been a land up.

No we're not using that we're using native or whatever you know and/or.

So, like, this whole Bottlerocket thing, he would just go, you know, that's AWS.

We're not using that.

No way.

If I can't install it anywhere in the world, you know, including on a frickin' Humvee, I'm not interested.

Interesting so basically going for the lowest common denominator across these systems.

100 percent yes.

So even with... they're using a lot of OpenShift, but they're specifically saying you're not allowed to use any of OpenShift's special sauce.

You know, you can't use...

You're not allowed to use OpenShift build runtime or whatever it's called and all that other special OpenShift stuff. You have to use whatever is CNCF compliant.

Is OpenShift making any strides, or is it infeasible, rather, to have it installed strictly on top of Kubernetes?

Does it always have to be installed at the same level as the control plane itself?

OpenShift is a distribution of Kubernetes, yeah.

So it is in place of vanilla Kubernetes. There's a bunch of different ones out there now. There's, like, VMware Tanzu, there's OpenShift.

There was Rancher, but my understanding is that Rancher now builds on top of it; you can run Rancher on EKS, yeah.

Right, Rancher's just a Helm deployment.

Now you can deploy it anywhere, on any Kubernetes cluster, with the dumbest namespace ever: cattle-system.

It's cattle-system.

Yeah I don't know.

And it's like you can't change it.

You try to change it and it breaks everything.

Exactly you know ranchers nice.

I mean use it for a while now and that's what's going on here.

I'm liking ads.

I don't I've never quite gotten far enough along with the to get the value add.

So for me it's the user management.

It's so easy to provision users and say, OK, this is a new team that I'm bringing on, and they need access to these five namespaces.

So I can hook it up to LDAP or whatever I want, and I can say, OK, these five users have access to these five namespaces, and done.

It seems like that should be a CRD type of deployment that you can just do declaratively.

When you add new teams... you know, there's a Terraform provider; I can make Terraform for any of that stuff.

Yeah, I just wonder how many are using Kubernetes to make business-logic types of constructs like that.

I'm looking forward to that day.

So yeah, speaking of CRDs, any new discoveries, Zack?

No. I've been knee deep in trying to figure out whether I'm making a huge mistake in pushing out managed Postgres schemas and users and stuff like that through Terraform or not.

So, unfortunately, mostly, as opposed to just having it all in some custom script somewhere, you know.

I mean, right now I'm working around managed... it might be my own ignorance when it comes to Terraform, but at what point do you draw the line between the actual configuration of the system and the provisioning of it?

Right, so with managed Postgres, during deployments we need to have a firewall rule that's only going to allow certain IPs through.

So if you want to have pipelines that also do these updates and run Terraform to do these things, you need to be able to apply, basically, the CI/CD firewall access to make these changes to this managed Postgres instance.

And if you want to do that, then you have to have some sort of dependency.

But with the Postgres provider, you can't really have any real dependencies on providers.

So I'm just working around all sorts of weird oddball issues like that and realizing how much I love and hate Terraform at the same time.

Yeah, and that was our experience.

The provider thing you're seeing is how you can't provision the database and set up the permissions in the same project.

Is that the case in general?

Yeah, that's the case.

Well, unless it's changed recently, we had the same problem: basically, the provider errors because the hostname doesn't exist, because you haven't created it yet.
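One common way around this chicken-and-egg problem is to split the work into separate Terraform root modules and apply them in dependency order, so the Postgres provider is only configured once the host exists. The sketch below is one possible arrangement, not the speakers' actual setup: the stage names and the `stacks/` directory layout are hypothetical, `graphlib` needs Python 3.9 or later, and the `-chdir` global option assumes Terraform 0.14 or later.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical root modules. Each value lists the stages that must be
# applied first.
STAGES = {
    "database": set(),
    "ci-firewall-rule": {"database"},
    "schemas-and-users": {"database", "ci-firewall-rule"},
}


def apply_order(stages: dict) -> list:
    """Return stage names in an order that respects their dependencies."""
    return list(TopologicalSorter(stages).static_order())


def apply_commands(stages: dict) -> list:
    """One `terraform apply` per stage, each in its own working directory."""
    return [
        ["terraform", f"-chdir=stacks/{name}", "apply"]
        for name in apply_order(stages)
    ]


if __name__ == "__main__":
    for command in apply_commands(STAGES):
        print(" ".join(command))
```

Because each stage is its own apply, the schemas-and-users stage can read the database hostname from the earlier stage's outputs instead of depending on a resource that does not exist yet.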

All right.

So it's like day two operations versus day zero. It's the same thing doing Amazon MQ services: you have to create the config, an XML config, but you can't create it until you know the host names of the brokers that you want networked together.

And it's kind of ugly passing in some count flags to make it do something really simple.

And then incrementing it to make it do something more like what you want.

Yeah, it's interesting. That's the same kind of scenario. Chicken-and-egg scenarios abound in Terraform.

So yeah, that cold start thing is one thing that we therefore don't overinvest in; we focus more on what the day-to-day operations are going to be like, adding or removing stuff, and worry about that.

That being said how would you then move forward and do like a pass system where you add another type of craft came out or something along those lines.

Do you use Terraform, or do you just make it into some other pipeline with a custom script?

Basically, what you're describing comes down to a pipeline of operations, so move that into whatever system your organization has adopted. The problem with Terraform itself is that there's no concept of an operator, unless you consider a provider to be that, and dropping into Go every time we want to do that isn't what I think would be our solution for that problem.

Write a lot of people in my company would say to use ServiceNow everyone's ServiceNow there's some say service not cool you them fascinating.

Cool any other questions or talking points.

Oh I think someone cheesy had one right one selfishly.

What's everybody's thoughts on AWS certifications?

I'll give one point of view.

I mean, my...

Yeah, so AWS certifications are more valuable if you're going to go into working with enterprises, which use that whole résumé filtering type of system.

The other is, it depends on the kind of company you want to work with.

So for a consultancy like Cloud Posse, we move up the ranks the more certified engineers we have.

So, you know, that makes an engineer who passes all the other requirements more interesting if they also happen to have AWS certifications, because we can then move towards, like, the Advanced or Premier tiers.

I guess, what do you mean by Cloud Posse moving up the rankings?

So there are different tiers of AWS partners.

OK OK.

OK Yeah exactly.

OK, like how Microsoft has the Gold partner, you know.

Exactly OK.

So that becomes that makes you more competitive if you want to work for a consultancy like that other than that.

What I think is great about our industry in general is it's a meritocracy.

It's based on what you've done recently and your accomplishments.

So that speaks, I think, more than just your technical understanding from some of these certifications, which can't make up for experience, for sure.

Yeah, I kind of see it the same way as your GPA in college, right?

The last time someone asked for your GPA was when you were interviewing for a job out of college, and hopefully nobody's been asked that question after their first job.

Right, so therefore.

Yeah, I don't really have any certifications and it hasn't hurt me yet.

And I've worked with a lot of people who have all kinds of certifications and are terrible so you know, just because you have a certification doesn't mean you're good.

I recently went through the process for Cloud Posse, and, you know, I've been working with AWS since 2006 or so.

So a really, really long time, since it was in private beta.

And many questions were very ambiguous to me, based on all my experience working with AWS.

And clearly they're prodding for one answer, and that answer is very much based on their documentation, so much so that you can often search for it and find that wording. But if you learned it more organically, it can be like, well, do you do it this way, or this way, or this way?

And that was my frustration.

So I had to literally whip out the flashcards and memorize the wording to get it right.

Yeah, because I'm looking into it now. I've been working with AWS heavily for the past two years and I know how our company uses it, and I almost took it as an opportunity to see... I think the Certified Solutions Architect, like, the associate level, gives you just a big picture of AWS in general. You know, just studying for it, I think it's a great way to rapidly increase your exposure to all of that stuff.

And like anything in life, I would do it if it's worth it for you.

I wouldn't do it just for the sake of it, unless your goal is to get into a particular company and that company requires it; then that's the objective.

But if you're doing it because hypothetically this could help your job prospects, that's maybe not a concrete enough goal.

OK good enough.

Yeah I think I'd already decided I just wanted to validate my decision.

No I appreciate it.

And it's also Yeah no let's leave it at that.

So Casey Kent asks a question in chat here.

Question, when you can get to it:

When do you think it's necessary to provision another cluster? I'm thinking about doing this to put data-related Kubernetes deployments (Airflow, ETL, etc.) away from production infrastructure.

I could also add some tainted nodes on the production cluster as well.

Personally, I like per project clusters.

Zack answers.

Personally, I like per project clusters and possibly dedicated stateful data and shared services clusters as well.

So yeah my two cents on this is so I can say what we've been doing.

And then I can share some of the pros and cons with our approach that I have to reconcile.

I don't think we have the perfect answer, or that anyone does, but Kubernetes is in itself a platform, right?

And there's two ways of looking at it.

One is that your operations team (depending on how big your company is, you might not even have one) can be providing Kubernetes as a platform, where that platform is almost like how Amazon is a platform.

So there is one production tier of Amazon for all of us.

Everyone in this Zoom session here, we're all using the same Amazon.

It's not like we have access to a staging Amazon.

Amazon is providing Amazon as a service to all of us at the same tier.

So you, as a company, could be doing that with Kubernetes.

That means your staging environments, your preview environments, acceptance testing, data, everything could be running on that same Kubernetes platform, and that platform would have to meet the SLA of the service with the highest SLA.

What we've been doing at Cloud Posse is not that approach, because ultimately you need to dogfood your clusters. You need to have environments, outside of production, where you can test operational work at the platform layer. And if you're doing this strictly in a test environment, you don't have real-world usage going on, and it's harder to pick up some of the kinds of things that can go wrong.

So what we predominantly do in a typical engagement is roll out a minimum of three clusters and up to about five clusters, and they work out like this.

So we have a dev cluster. That's a sandbox account where we can literally do anything; there is basically zero SLA on that cluster or that environment.

Then we have a staging cluster.

This is as good as production, without the strings attached of having to notify customers or anything like that if anything goes down, and it allows the team to build up operational competency in a production kind of environment. And then we have a data cluster.

So this is kind of addressing your question directly, Casey. The data cluster is more for a specific team, and that team tends to be the data scientists.

The machine learning folks who operate in that environment typically need elevated access to other kinds of data, data that perhaps emanates from production, or different resources or endpoints.

So that cluster will have its own VPC and its own VPC gateways, and be in its own actual AWS account.

And then we can better control IAM and egress from that cluster.

And then lastly, there's the production cluster.

What I mean by the production cluster is that it's production for your end users, your customers.

But what I've described here has an Achilles heel, and it's that every cluster I describe is production for some user group.

So the staging cluster is more or less production, inward-facing, for the company and your QA engineers, and, you know, everything comes to a grinding halt from a QA process and testing process if that cluster is offline.

One other cluster I forgot to mention is a corp cluster. The corp cluster sits in a separate AWS account and is for internal-facing apps.

So it's like production for engineering. It runs your Keycloak servers and perhaps your Jenkins servers, et cetera.
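The layout described above can be summarized as a small data structure that records who each cluster is "production" for. This is only a restatement of the discussion in code form; the short cluster names and the wording of each field are my own labels, not names from an actual deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    production_for: str  # the user group that is blocked if this cluster is down
    notes: str = ""


# Labels are illustrative; the grouping follows the discussion above.
CLUSTERS = [
    Cluster("dev", "developers doing testing",
            "sandbox account, basically zero SLA"),
    Cluster("staging", "QA engineers and the company internally",
            "production-like, no customer notifications needed"),
    Cluster("data", "data science and machine learning teams",
            "own VPC and own AWS account"),
    Cluster("prod", "end users and customers"),
    Cluster("corp", "engineering (internal apps such as Keycloak or Jenkins)",
            "separate AWS account"),
]


def blocked_by_outage(cluster_name: str) -> str:
    """Who is impacted if the named cluster goes offline."""
    for cluster in CLUSTERS:
        if cluster.name == cluster_name:
            return cluster.production_for
    raise KeyError(cluster_name)


if __name__ == "__main__":
    for cluster in CLUSTERS:
        print(f"{cluster.name}: production for {cluster.production_for}")
```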

Your point about running multiple node pools is a good one and I still think that is another tactic to have in your toolbox that you should use.

A perfect example is if you are, for some reason, running Atlantis. We would probably then say run Atlantis maybe in your corp cluster, but should it run in a standard node pool?

Probably not.

It should probably run in a separate node pool that's more locked down in terms of what can run.

And then that cluster as a whole, you really want to lock that down, because you have this pod there, that God pod, that you can exec into and do anything.

So this is another example of considerations for when you want to have really separate, segmented clusters and environments, where just using IAM and RBAC and all those things to lock it down is, in my mind, a little bit insufficient.
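The separate node pool tactic usually combines a taint on the dedicated nodes with a matching toleration and node selector on the workload. The sketch below builds the relevant part of a pod spec as a Python dictionary. The `dedicated` taint key, the `pool` node label, and the container image are hypothetical; real clusters will have their own conventions.

```python
import json


def dedicated_pool_pod_spec(image: str, pool: str) -> dict:
    """Pod spec fragment that pins a workload to a tainted, dedicated pool.

    Assumes the pool's nodes carry the label pool=<pool> and the taint
    dedicated=<pool>:NoSchedule (both names are hypothetical).
    """
    return {
        # nodeSelector keeps the pod off every other pool.
        "nodeSelector": {"pool": pool},
        # The toleration lets the pod land on the tainted nodes; the taint
        # itself keeps ordinary workloads out of the pool.
        "tolerations": [
            {
                "key": "dedicated",
                "operator": "Equal",
                "value": pool,
                "effect": "NoSchedule",
            }
        ],
        "containers": [{"name": "app", "image": image}],
    }


if __name__ == "__main__":
    spec = dedicated_pool_pod_spec("example/atlantis:latest", "atlantis")
    print(json.dumps(spec, indent=2))
```

As the discussion notes, this limits where pods can run but is not a substitute for a separate cluster when the workload holds highly privileged credentials.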

So the problem here is that we have this corp cluster that's production for internal apps, so, you know, it's OK if it's down a little bit, but it shouldn't be down a lot.

And then, yeah, your staging cluster, which is production for QA.

You have your dev cluster, which is kind of production for developers to do testing and stuff.

So, you know, the more unstable that is, the more everything else is impacted. And then production for your customers.

And the configuration of these clusters is more or less different.

Out of necessity, because they have different roles. And does that mean we need a concept of staging for each of these clusters? Arguably, yes.

It's just that we haven't found a customer that wants to invest in that level of automation.

Related to this, there was a Hacker News post one or two weeks ago announcing GKE's new pricing change for the control plane, and, you know, a lot of people were up in arms over that change. I forget exactly the context that led to this comment, but the comment was by Seth Vargo, and his comment was: wait, so why are you guys treating your clusters as pets?

They should be, you know, cattle, the clusters themselves. Part of me reacts like, all right, you're coming from Google; I know you do some amazing engineering over there and you're able to do all of that type of stuff. Netflix does as well; they do full-on freaking blue-green regions and availability zones and all that stuff.

It's just that most people don't have the engineering bandwidth in-house to be able to orchestrate that with regularity.

And the reality is clusters have integration touch points to all the systems that you depend on.

They have API keys and secrets, webhooks and callbacks, and all of that stuff.

And orchestrating that from zero to the end is a monumental task.

My point is it'd be nice if each of these clusters were production, like I described, but also had the equivalent of a blue-green strategy that would allow a staggered rollout of each of these environments.

That was a mouthful.

Any questions on what I said, or clarifications? Somebody also shared a link.

Andrew Roth shared a link on how many clusters. I haven't seen that link.

Andrew, do you want to summarize it?

I read this article a few months ago.

And it really brought home, you know, kind of the dilemma, because the core of the question is: you can do one big, large shared cluster, and it's cheaper and easier to manage, but then your blast radius is really big; or you can go all the way to the other side and have clusters all over the place.

But there's this table, so for your particular environment you just have to pick which little box in this table you're in.

I'll summarize: it didn't give an answer.

I read this article as well.

Google also put forth some recommendations around this, and they explicitly recommend per-team/per-project types of setups.

I'll try to find a link and send it out here shortly, but yeah, there isn't an answer.

You know it's completely nebulous at the moment.

Well there can never be an answer I think is the bigger point is it's based on you select which of these you're optimizing for and you can optimize for one for solving one of these.

You can optimize for solving the entire matrix right.

We're just getting started with Rancher, but I think it's going to make our lives easier when it comes to this kind of stuff, because we're going to be able to centralize user management but decentralize the clusters themselves.

So if I have a new team, I will manage those users in Rancher, create a cluster all for themselves, and use Rancher to give them access to that cluster.

And you're running Rancher, and you're using kops, for all the aforementioned reasons?

I think we're going to get away from kops because, yeah, it's kind of not very secure.

We think RKE is what we're looking at right now. And OpenShift has really elaborate permissions. Yeah, I think the other thing you were mentioning earlier is RKE.

RKE is Rancher Kubernetes Engine.

It's a pretty nice offering there.

You have your servers, and it's how you install Kubernetes.

You know, it's an alternative to something like kubeadm.

Gotcha. And it's a lot simpler.

It's one text config file, YAML, and you point it at the config file and you say rke up, and bam, there's your Kubernetes cluster.

And there is a currently-in-development, kind of beta, Terraform provider that I've used that worked really well, actually.

So I'm excited about it.

Also for RKE?

Yep. So in one swift stroke I was able to use Terraform to provision EC2 nodes, then come along afterwards with RKE to install Kubernetes, and then even come along after that with the Helm provider for Terraform to install Rancher.

And so with one terraform apply I went from absolutely nothing to a VPC with nodes in it, with Kubernetes installed, with Rancher installed using Helm.

Look at that. That sounds pretty cool.

That was very exciting to me, and it all works perfectly. Has the Helm provider been updated yet to support Helm 3? I don't know, but I'm going to be looking into that soon myself.

Yeah I looked into that maybe a month or so ago.

I was curious.

Any movement on that? As quickly as many things in the Terraform ecosystem move, I was a little bit surprised that Helm 3 hadn't been supported already, since it was in beta for quite some time before going GA.

Well, if anything, it should be easier because there's no Tiller.

Yeah it again.

So, what about the whole GitOps movement?

You guys are very heavy into Helmfile.

Yeah, do you see yourselves getting away from Helmfile and using something like Argo at some point?

All right.

I see.

I can do home.

I can.

Yeah Yeah.

Argo is kind of the workflow management, so yeah.

Yeah, if you need Helmfile less, you can define it in Argo.

I would say part of our challenge is that we need to be able to describe reusable components that we can use across customers and implementations, and Helmfile is kind of that reusable component that lets us describe the logic of how to do it.

There are things I love and hate about Helmfile, and part of it is that it's just been the Swiss Army knife for us to solve anything.

In the end the end user experience is not that bad.

Once we get to using environments, I'll show you an example.

Then it's really quite nice.

So if I go to look at quickly an example of what that is project x And let's go to helm files go to like or castle maybe.

So all we ultimately expose is a very simple schema like this for what you need to change. It doesn't matter that the actual helmfile for this could be rather nasty, just like everything else in Terraform can be pretty nasty.

So this is the schema that the developer exposed, the maintainer of that chart, which we have no control over, but we reduce that to our opinionated version of it.

All you need to care about is that.

Yeah, we're doing something similar with helmfiles now. We've done GitLab, Keycloak, OpenLDAP, Clair. It looks a little dirty.

I'm trying to get her engagement.

I've been on.

Oh Yeah.

I used it single-handedly to construct and roll out a whole batch of clusters for our client as they made changes on the fly to their requirements.

So it is definitely a good glue tool.

Where I did struggle is bridging that gap between Argo-type applications and Helmfile.

So I mean, I was looking at the Helmfile operator, amongst other things, to ease that transition.

But I never really got that far.

So yeah, I did notice that there are ways to make Helmfile work through Argo.

It's not something that they do by default, but there are custom tools that you can apply to Argo to make it work.

It is undeniably one of my saving graces in this current project.

I was on mobile, and the reason why I'm even on this call is because of your helmfiles repo.

Thank you.

By the way of which you know.

So we're refactoring a lot of our helmfiles stuff to support the latest stuff I just showed you here, and we'll be contributing that stuff upstream later on this year.

I can't say when, but as soon as the dust settles we'll get back to that. One thing that kind of made my head spin, and I'm not sure how I feel about it:

This conversation jogged this. Who shared this? It was arkade, at one point.

I don't know if I share if I did if I run across it.

Yeah, so this kind of makes my head hurt a little bit, to go down this route.

This is somewhat like: well, why are we using text templating to parameterize values for Helm, and when is this madness going to stop, and when are we going to use a more formal language to do it?

And how can we.

And it also speaks to presenting a CLI for your team to install things, and giving you the ability to do all the testing exposed by Go.

I mean this could be done in any language right.

This could be done in Python or Ruby or whatever.

So Dave picked Go, obviously, for this project, and here is what makes my head hurt.

The end result is, I think, something that Blaze might have shared.

What in the end makes my head hurt is that for every app you want to add to the catalogue, you literally whip up a whole bunch of Go code to generate the actual install.

So yeah look no thanks.

You've definitely solved something, like, yeah, you're not templating YAML anymore, but then there's the barrier to entry and the maintenance around this.

I really wonder if this is going to be a long-lived project. Anything that puts it into a CLI certainly makes it easier to test the waters.

But it doesn't make it any easier to pipeline YAML. It certainly makes it more interesting.

But you know, it's like the eksctl command.

You know like why not.

Why would I use that if I have to reform any cable bill.

You know, it flies in the face of using GitOps, where you want to have a declarative description, a document that describes how to deploy it.

I want to have a t-shirt that says declarative nation on it.

That's the underpinning thing that makes it, I don't know, exciting. I could be wrong, but I don't think the makers of arkade are interested in production-grade types of deployments.

Because arkade is strongly correlated with k3sup, which is strongly correlated with k3s, which is, you know: have a Kubernetes cluster up and running in 10 seconds.

You know, it's all about getting something going as quickly and easily as possible, and well, hell yeah.

arkade install cert-manager: if that's all I have to fucking do, then...

Cool, you know. Yeah, I guess where it's interesting is what we see here. Now, I know there are other examples.

I'm sure you guys can give some examples, but this is kind of like our helmfiles repository, where we're distributing an opinionated installation of a bunch of helmfiles.

Are there any other distributions like that, like our helmfiles but using other tooling, that you guys can point to?

I appreciate it.

Just so I can get inspiration feel free to share that.

That is, something that's not Helmfile-based. And if anybody else sees another helmfiles repository with a dozen or so helmfiles,

do let me know about that too, because that's how we learn from each other: by seeing patterns that other people are using.

Yeah, I'm going to try to get ours open sourced, but no promises.

Yeah, there are other tools similar to Helmfile, but none as comprehensive and capable, in my mind.

So we used Helmsman for a long time, over a year, and then we switched to Helmfile. And Helmfile, by the way, has reached almost 2,000 stars now.

So it's pretty exciting.

Hey, if anybody knows roboll, or whatever his name is, feel free to let him know that I'm OK.

I'd be open to working for Datadog now.

Yeah, he's not very involved at all anymore in the Helmfile project. I saw him chime in briefly when there was talk of contributing the project to a cloud native foundation.

But that was the last engagement I saw from him.

I'd never spoken I was so confused there for a second Andrew I thought you were talking but I think were.

No I'm sorry.

No I was just stretching out those.

No, I was saying we have a helmfiles repo, you know, of not a dozen but maybe half a dozen now, and trying to get them open sourced is going to be a challenge.

Yeah but I'm going to try to.

Yeah you do.

Hit me up if you do get those open sourced.

Yeah Yeah.

Because our whole philosophy was... so, our biggest one is GitLab by far, you know, because GitLab's Helm chart is a fucking monstrosity.

Sorry pardon my French.

That's an understatement.

We want our people, when they run helmfile install gitlab, or whatever the command ends up being, with all the defaults... there's a bunch more required parameters.

But once they have met all the required parameters, it deploys production ready. Like, it uses an external Postgres database; by default the chart uses a container, which sometimes is OK.

But for us it's not right.

So what we say anyway.

So what we say instead is: we don't care what you run, but it has to be external.

So if you want to run a container on the side, and that's how you're going to run your production-ready GitLab, then fine.

But we will not, you know, turn on the little flag in the Helm chart to run the internal Postgres database.

Same thing with cert-manager: we don't turn on the internal cert-manager.

Same thing with MinIO.

We don't turn on MinIO.

We make them say, OK, what are the names of the S3 buckets, and, if it's not AWS S3, what's the URL to the S3 host.

Tend to work against you with helmfiles and a breaking amount as well. The little experiment has been very nice, because that's been the number one challenge when it comes to Helm, especially with all these open source charts. It's like, OK, fine, I see that there are 500 different configuration parameters I can set, and that's awesome.

Which ones do I set for this to be production ready.

I have no idea.

And there's no documentation to tell me. It's a little bit like that matrix we saw for how many clusters do I need: that matrix is going to be different from org to org.

Sure. There's this concept of composability, you know, which is the property of being able to copy something out from one context and paste it into another.

And I'm having a lot of, like, PTSD around the Helm stuff, and like Maven, like pom.xml. Of course you have PTSD if you've been there. You want to take this snippet of XML and drop it in, or you're just moving stuff from one file to another and not having the right context. That was one of the banes of my existence, and I don't know if Helmfile simplifies that or makes things more modular.

I have no experience.

I mean, I think Helmfile gives you the opportunity to be internally consistent within an organization, or composability that way, by having a common interface.

But our helmfiles aren't going to be composable with your helmfiles in that same way. You almost always have to put your own framework around what your needs are.

So I found that doing things like not enabling default ingress, and things like that, for a lot of charts certainly helps, you know, keeping those things completely segmented and in their own model.

So I got to keep things I got to keep things you know generic but he talked about enabling ingress.

So the reason I asked about the process for giving people the keys to the castle is because we gave the keys to the castle too soon to someone, and then a bunch of ingresses were created to things that shouldn't have been exposed, and now there are trust issues.

So now I get to look at admission controllers for all kinds of things: for ingresses, for Istio virtual services, for services.

If it's a service and it's not a ClusterIP, I want to reject it. So I'm going to use OPA, and I'm excited.

Yeah, if you can come up with a little demo or something, I'd really like that.

Yeah Yeah it looks deceptively easy.

I guess I'll say I'll put it that way.

Because, I mean, the OPA docs are like, hey, here's eight lines of code and that'll do it, you know.

So we'll see. You know, there is always a big gap between hello world and production.

Yeah but I mean it should be very straightforward.

You know, because admission controllers look at a particular type of resource and just do checks on it.

So it should say: for all services, if type does not equal ClusterIP, reject, other than a small whitelist, like NGINX ingress, which needs a LoadBalancer. But we'll see.
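The rule described above can be sketched in plain Python to make the logic concrete. This is only an illustration of the check, not OPA or Rego code and not the speaker's implementation; the allow-list entry (namespace and service name) and the function name are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical allow-list: (namespace, name) pairs that may use a
# non-ClusterIP service type, and which types they may use.
ALLOWED_EXCEPTIONS: Dict[Tuple[str, str], Set[str]] = {
    ("ingress-nginx", "ingress-nginx-controller"): {"LoadBalancer"},
}


def review_service(service: dict) -> Tuple[bool, str]:
    """Return (allowed, reason) for a Kubernetes Service manifest given as a dict."""
    meta = service.get("metadata", {})
    key = (meta.get("namespace", "default"), meta.get("name", ""))
    # Kubernetes treats an omitted spec.type as ClusterIP.
    svc_type = service.get("spec", {}).get("type", "ClusterIP")

    if svc_type == "ClusterIP":
        return True, "ClusterIP is always allowed"
    if svc_type in ALLOWED_EXCEPTIONS.get(key, set()):
        return True, f"{key[0]}/{key[1]} is allow-listed for {svc_type}"
    return False, f"service type {svc_type} is not allowed for {key[0]}/{key[1]}"
```

A real admission controller would receive an AdmissionReview request and return an allow/deny response; the decision itself would look much like this function.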

I think this is interesting, because I think this is the fourth or fifth office hours in a row where OPA has come up.

So I think that is an indicator of how relevant that is.

If you're doing anything serious with production on Kubernetes or Terraform.

We've got five minutes left here.

Are there any closing thoughts or last minute questions.

It was nice to finally meet you at SCaLE.

Oh Yeah.

That's awesome.

Thanks for poking in and saying hi. That's Todd red 10.

Did you go to any other interesting talks at SCaLE?

I had one. It was primarily around Java apps inside of Kubernetes, using things like GraalVM to minimize startup time and using things to compress Docker images down. We've got some apps that take 60 to 90 seconds to start up, and the prospect of getting that down to two or three seconds is more than just a little enticing.

Is that milliseconds?

Is that largely the image size, or is it just not the greatest, Tomcat or whatever?

Spring. Yeah, Spring. I'm in the same boat, dude.

I hear you.

We've got a bunch of Spring Boot apps and they're slow, to the point that whenever we're setting our limits, for memory we set the request and the limit the same, and then for the CPU we set the request and we just don't set the limit, because startup times get a little better if we just let it consume everything it can.
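As a small illustration of that convention, here is a sketch in Python that builds the `resources` stanza of a container spec as a plain dict: memory request equal to memory limit, a CPU request, and no CPU limit. The function name and the example values are assumptions for illustration only.

```python
def container_resources(memory: str, cpu_request: str) -> dict:
    """Build a Kubernetes container `resources` stanza as a plain dict."""
    return {
        "requests": {"memory": memory, "cpu": cpu_request},
        # Memory limit equals the memory request; the CPU limit is
        # intentionally omitted so the container can burst at startup.
        "limits": {"memory": memory},
    }


# Example with illustrative values: 1Gi of memory, half a core requested.
resources = container_resources("1Gi", "500m")
```

Note that omitting the CPU limit means the pod is not in the Guaranteed quality-of-service class, which is the trade-off raised later in this conversation.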

I've been reading that you shouldn't set CPU limits, like, really for anything, because it's different than memory limits.

It doesn't work the same way, and you're just artificially limiting things when they don't need to be limited.

Yeah, originally we were kind of trying to avoid all types of oversubscription, and that's great for memory, but it's kind of screwing us whenever it comes to CPU.

Yeah, because I read an article. I don't remember where I would find it.

But it was talking about how the CPU scheduler, or whatever it's called, handles the requests and the limits.

It was saying that if you've got five pods and they all request 100% CPU, the CPU of the system will happily hand each of the pods 20% of its CPU.
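The arithmetic in that example can be sketched as follows. This is a simplified model, assuming every pod is busy at the same time and no CPU limits are set, so that CPU time is divided in proportion to each pod's request; the function and pod names are hypothetical, and the real Linux scheduler is more involved.

```python
from typing import Dict


def cpu_under_contention(requests_millicores: Dict[str, int],
                         capacity_millicores: int) -> Dict[str, float]:
    """Split node CPU among busy pods in proportion to their CPU requests."""
    total = sum(requests_millicores.values())
    if total == 0:
        return {name: 0.0 for name in requests_millicores}
    return {
        name: capacity_millicores * request / total
        for name, request in requests_millicores.items()
    }


# Five pods each asking for a full core on a one-core (1000m) node.
pods = {f"pod-{i}": 1000 for i in range(5)}
shares = cpu_under_contention(pods, capacity_millicores=1000)
# Each pod ends up with 200 millicores, i.e. 20% of the node.
```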

No problem at all.

You know and it it can just kind of figure it all out.

And that made a lot of sense to me.

So I've stopped using CPU limits, or if I have to use CPU limits because there's a LimitRanger, then I'll do, like, 10.

I think the problem with no limit, to me, is quality of service.

And if you want to provide any form of guarantees to a service and maintain latencies, then that would be a bad idea.

If you have a link to that, when you find it, it'd be great if you share that.

I'll try to find it. But I was going to say, have you looked at Quarkus for your Java apps?

Not heard of that.

Personally, it looks really interesting. It's called Quarkus.

I want to hang out with one.

I don't think.

I'm not sure if Spring is supported, but it compiles your Java app to native code.

So, the joke I made not too long ago, that, you know, two to three seconds? No, try two to three milliseconds. It's real.

Like, it's as fast as, you know, a Go app that's compiled to native code, and you can do the same things, like install your Java app that's built using Quarkus onto the scratch container, which has nothing in it.

It's just the bare Linux kernel with nothing in it, right?

It's literally the smallest it can possibly get.

Basically a native Go app.

The guy did mention Quarkus, and to be perfectly forthcoming, I had a hard time following some of what he said because he had a very heavy accent, and that's what he was talking about.

IT to the native app.

That's cool.

Well, if you continue your explorations on that and have any wins, it'd be great if you follow up and share those with us on a subsequent office hours.

I'll do it again.

Sounds like he is quite a bit ahead of me already.

He knows the names of the things.

So all right everyone.

Thank you for sharing.

I learned a few new things this office hours.

As always, remember to register for our weekly office hours if you haven't already. Go to cloudposse.com/office-hours and you'll receive an invite for your calendar.

Thanks again for all your time.

A recording of this call is going to be posted in the office hours channel and syndicated to our podcast at podcast.cloudposse.com.

So you can tune in however you listen to your podcasts.

See you guys next week.

Same time, same place thanks.

You guys have a good one everybody.

Public “Office Hours” (2020-03-04)

Erik OstermanOffice Hours

Here's the recording from our DevOps “Office Hours” session on 2020-03-04.

We hold public “Office Hours” every Wednesday at 11:30am PST to answer questions on all things DevOps/Terraform/Kubernetes/CICD related.

These “lunch & learn” style sessions are totally free and really just an opportunity to talk shop, ask questions and get answers.

Register here: cloudposse.com/office-hours

Basically, these sessions are an opportunity to get a free weekly consultation with Cloud Posse where you can literally “ask me anything” (AMA). Since we're all engineers, this also helps us better understand the challenges our users have so we can better focus on solving the real problems you have and address the problems/gaps in our tools.

Machine Generated Transcript

Let's get the show started.

Welcome to Office hours.

It's March 4th, 2020. My name is Erik Osterman and I'll be leading the conversation.

I'm the CEO and founder of Cloud Posse.

We are a DevOps accelerator.

We help startups own their infrastructure in record time by building it for you.

And then showing you the ropes.

For those of you new to the call the format is very informal.

My goal is to get your questions answered.

So feel free to unmute yourself at any time if you want to jump in and participate.

If you're tuning in from our podcast or YouTube channel, you can register for these live and interactive sessions by going to cloudposse.com/office-hours.

Again, that's cloudposse.com/office-hours.

We host these calls every week. We'll automatically post a video recording of this session to the office hours channel, as well as follow up with an email.

So you can share with your team.

If you want to share something in private just ask.

And we can temporarily suspend the recording.

That's it.

Let's kick this off.

So here are some talking points for today.

They are mostly the same as last week.

There's just a bunch of stuff.

We haven't had a chance to cover because we've had so many good questions.

So if there's ever a lull in the conversation, here are some talking points.

So before we get to these let's open the floor.

Anybody have any questions problems interesting things they're working on that they'd like to share or ask.

I have a question.

All right.

Go ahead, Dale.

All right.

So mostly I've actually been using, like, Makefiles to help developers get through the repeated tasks of building containers efficiently.

So one thing I've started to put more focus on is the way in which we tag images.

So I'd like to know whether some of you use the git SHA as part of the image tag, as well as any particular standards that you guys find work best for tagging the images.

For the images, the naming convention.

Yeah Yeah.

So what we've been practicing, mostly as part of our pipelines, is that every push to the repo builds a Docker image and tags it with the git SHA.

And the short SHA.

Honestly we never use the short hash.

We almost always just use the long commit hash, and the way we use that is then for separate pipelines.

So for example, if we have a separate pipeline that kicks off down the road and cuts a release like 3.2.3, that pipeline will look up the artifact for that commit SHA and tag it.

If there is no artifact then that pipeline fails.

So basically, we decouple the building of images and artifacts from the process of retagging those images with the release version.
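The decoupling described above can be sketched with an in-memory stand-in for a registry. This is an illustrative model only: the dict-based registry, the function names, and the example repository and digest are all hypothetical, and a real pipeline would call the registry's API or the docker CLI instead.

```python
from typing import Dict, List


class MissingArtifactError(Exception):
    """Raised when a release is cut for a commit that has no built image."""


def tags_for_commit(sha: str) -> List[str]:
    """Tags applied by the build pipeline on every push: long and short SHA."""
    return [sha, sha[:7]]


def build(registry: Dict[str, str], repo: str, sha: str, digest: str) -> None:
    """Build pipeline: record the image digest under each SHA-based tag."""
    for tag in tags_for_commit(sha):
        registry[f"{repo}:{tag}"] = digest


def release(registry: Dict[str, str], repo: str, sha: str, version: str) -> str:
    """Release pipeline: retag the existing image for `sha`; never rebuild."""
    source = f"{repo}:{sha}"
    if source not in registry:
        # No artifact for this commit, so the release pipeline fails.
        raise MissingArtifactError(f"no image found for {source}")
    target = f"{repo}:{version}"
    registry[target] = registry[source]
    return target
```

Usage: calling `build(registry, "example/app", sha, digest)` on push and later `release(registry, "example/app", sha, "3.2.3")` leaves the release tag pointing at exactly the image that was built and tested for that commit.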

Now I see a couple different patterns happen here.

And it depends a little bit on what your continuous delivery or deployment strategy looks like. There are companies that don't practice strict semver for their releases, because they want more of a streamlined process of things hitting master and then automatically going to staging, prod, or other environments.

Those companies tend to use SHAs for that instead of using semver.

There is still a way to use semver with that, which is kind of nice, where you make semver part of your commit history.

So you have, like, a release file: release.yaml.

What we've seen is pipelines that then apply what's in the release.yaml on merge to master.

So then it'll do that for you, which is kind of nice because then you have a totally Git-driven workflow for cutting releases. It's a little bit more rigid, in the sense that you can't just use the release functionality on GitHub if you want to be consistent. Did that answer your question?

Yeah, you did get it.

So additionally, I'm starting to see fuller adoption of replacing the MAINTAINER instruction in the Dockerfile with labels, which gives you a bit more. I've started to use labels to also put the git SHA within it.

The version as well, along with the actual build.

Do you see that happening there?

So I think that that is an excellent idea.

If you are able to surface that information as part of your CI system and your Docker registry, and assuming that it helps your team or others reconstitute what happened.

So this is a big part of Codefresh: Codefresh uses these labels on images extensively.

So, tagging the image, or labeling the images: if it passes, labeling the image; if your security scanning, the vulnerability scanning, passes, labeling it, perhaps along with the build time.

So it almost makes the registry the source of truth for all that extra metadata about that image, and that follows that image around wherever it goes.

Have we been practicing it?

It's not been something we've invested in.

But I don't think it's a bad idea.

If it's become a project for you.

Yeah, I've just started to look closely at it.

Actually, I added, like, build times to it.

Vendor ID.

Oh, yeah.

So exactly.

You're on it.

Yes What I'm going to do.

I'm doing it by passing a build argument.

Yeah, so I'll get that from an environment variable in CI.

That's the right way to do it.
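As a rough sketch of attaching that metadata at build time, the Python below assembles a `docker build` command line with labels for the git SHA, version, and build time. It is a variation on what is described above: it passes `--label` flags directly rather than a build argument consumed by the Dockerfile. The function name and example image are hypothetical; the label keys follow the OCI image annotation names, which is a choice made here, not something stated in the conversation.

```python
from datetime import datetime, timezone
from typing import List, Optional


def docker_build_command(image: str, git_sha: str, version: str,
                         created: Optional[str] = None) -> List[str]:
    """Assemble a `docker build` command that tags and labels the image."""
    if created is None:
        # Build time in UTC, RFC 3339 style.
        created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    labels = {
        "org.opencontainers.image.revision": git_sha,
        "org.opencontainers.image.version": version,
        "org.opencontainers.image.created": created,
    }
    cmd = ["docker", "build", "--tag", f"{image}:{git_sha}"]
    for key, value in labels.items():
        cmd += ["--label", f"{key}={value}"]
    cmd.append(".")  # build context: current directory
    return cmd
```

In CI, the SHA would typically come from an environment variable provided by the CI system, and the resulting list can be passed to `subprocess.run`.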

And where this gets interesting is, I mean, obviously, this stuff is only as secure as your registry, and only as secure as your ability to add those labels and preclude other systems or processes from labeling it.

But assuming that that process is secure.

This is a great way to also then enforce policies on what gets deployed inside of certain clusters based on those labels.

I'm not sure.

I'm pretty sure using something like Twistlock will let you do that.

And I'm not sure if there's a way of doing it with OPA right now, but maybe somebody else knows the answer to that.

Let me know.

Awesome any other follow up questions to that or other questions.

How are you doing vulnerability scanning right now, Dale, on those images?

Yes this is quite by the personal side of it that we don't use to the hub or existing cortical testing.

So we admit it, but bill time within our history for other images as well as the runtime, which is something that I like.

Just like everything encapsulated them against other ceilings as well as it also has a kitchen like this with vault to execute the runtime but it doesn't look pretty good.

Well, we have to look into possibly replacing Twistlock with something like Clair or Falco.

So we're exploring that.

But you know put covered as part of that.

Have you explored the ECR container registry scanning, how it compares feature-wise, and how effective it is by comparison?

Yeah, so actually it was Clair; they are using Clair under the hood.

I don't know the level of control that we do get into everything.

But I'm thinking if we do it in duration.

So long.

So we'll get the reporting aspect of it.

But again, based on just what they're offering, from what I saw, it didn't seem to be much.

Did you see that Quay has also been open sourced now?

Yes.

Yeah, I wonder how that compares. So one thing I'd like to see in any system like that would be the ability to track the mean time to resolution for a CVE in the system.

So, like, you don't want to shut down the service because a CVE is suddenly detected there, and cause a blackout.

But you do want to track how long that persisted and when it was resolved.

Is that something that you guys are factoring in as well?
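The metric described above can be sketched as a small calculation. This assumes each finding is recorded as a pair of timestamps, when the CVE was first detected and when it was resolved (or `None` if still open); the function name and record shape are hypothetical and not tied to any particular scanner.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Finding = Tuple[datetime, Optional[datetime]]  # (detected_at, resolved_at)


def mean_time_to_resolution(findings: List[Finding]) -> Optional[timedelta]:
    """Average time from detection to resolution over resolved findings."""
    durations = [
        resolved - detected
        for detected, resolved in findings
        if resolved is not None  # open findings have no resolution time yet
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```

Open findings are excluded from the average here; tracking their current age separately would show how long unresolved CVEs have persisted.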

So we've had this year where or communication time varies because we have a private resource that keeps just doing that.

So we've been actually some teams and then we'll try to not limit what's possible.

So yeah.

So without a public plan that we have to put this out there not to build but develop a set of resources behind that, which is must start feeling.

So there's a lot of background noise there where you are, Dale.

Any chance.

I know.

I know you're always in an open-plan environment.

Yeah, very much, in a coworking space. That's, like, today.

OK How much was a bigoted are all taken by the lifetime squatters there.

Yeah Yeah I know that fighting for the phone booths.

All right.

Well, let's see.

Maybe maybe it quiets down a little bit.

There's a question from Casey Kent.

He's been part of the community for a while now.

Yes, he asks the question on Slack: can you touch on the setup and best practices with the EFK stack on Kubernetes, if you have some time?

Sure certainly I can point you in the right direction in that case.

So this is very common question that comes up in the community.

Unfortunately, our office hours notes are not properly tagged on like what we talked about or what to do.

Just be aware that past office hours have talked about how to set this up.

So, the EFK stack, for everyone: that's the Elasticsearch, Fluentd, and Kibana stack.

It's become pretty much the most common open source alternative for something like Splunk or Sumo Logic.

So what we would recommend in this case is, first of all, configuring Fluentd not to log directly to your Kibana, sorry, to your Elasticsearch instances, because it's so easy to overload Elasticsearch, and when Elasticsearch is unhappy it's really unhappy and it takes a long time to recover.

Also scaling your Elasticsearch clusters.

So you can send a firehose to them is very expensive.

So what you're going to want to do is set up your Fluentd to log to something like Kinesis, if you're on AWS, which I think you are.

So if you send all your logs from Fluentd directly into Kinesis, Kinesis is going to absorb those as fast as you can send them.

And then you have an excellent option to drain that to S3.

So you're going to want to send those logs from Kinesis into S3 for long term storage and you can have all of the lifecycle rules and policies there.

We have some great modules on Cloud Posse for it, like a log storage bucket that helps you manage those lifecycle rules very easily.

So you can consider that.

And then the other thing you're going to want to do is drain it for real time search into Elasticsearch.

So both these modes are supported by the Terraform provider by the Terraform resource for this.

Off the top of my head, I forget exactly what it's called, but it will write directly into S3 and Elasticsearch.

So then the last thing is, for certain things you'll be able to use Athena if you want to query the data in S3.

So long as, I think, your query completes in, what is it, 30 minutes. And then for developers and stuff like that, they have real-time access to the logs inside of Elasticsearch.

Now I want to point out one other thing that we've had a lot of success with that.

I like is that there's a little utility.

It's called kube-sentry.

I think there's two options for this.

There's two open source projects, and what it'll do is take all your events from the Kubernetes event log and ship those into Sentry. Sentry is the exception tracking tool.

And now what's cool is you see the most common exceptions bubbling up to the top.

The most common events and things happening.

And you can assign those to teams to look into, using all the conventions that you have in Sentry.

So Sentry is also open source, or if you're using the hosted version, that works as well.

So Casey, was that a good overview of the architecture for setting that up?

Cool So he says that was what he was looking for.

And also, two other notes.

I mean, we're typically using the managed Elasticsearch by AWS, and that also comes with Kibana out of the box.

So if you know the path.

There's some path to it.

I forget what it is something.

But if you know that, then you can just access Kibana directly there.

You know, a lot of people speak very highly of Elastic Cloud and their hosting of Elasticsearch being more robust, with newer versions and newer releases of Elasticsearch.

So that's a consideration as well.

I just.

And it can be controlled with Terraform like everything else.

The challenge there is if your organization has kind of a blank check to use AWS services.

Now suddenly you've got to go get another vendor approved, and maybe that's why you wouldn't use it.

All right.

Any any other.

Oh Andrew, I haven't seen you around.

Good to see you join today.

Where've you been?

I've been busy.

Well, so how's your...

Any interesting news to share with your projects there? Side projects, perhaps? Dad's Garage?

No, not really.

Not so much other than the.

I got that I got my team on board.

So we're going to work on it.

Oh, excellent.

I was not I was not able to get them on board with open sourcing our work.

But maybe someday Yeah but we're going to build it out.

Yeah I like my company in general.

It's not I wish we did more.

Yeah, I think it's very difficult to go from closed source to open source.

And it makes the in-house counsel very uneasy about that.

But if you can get them to agree that certain new projects will be open sourced from the start maybe components like modules and stuff.

That's the more clear-cut, cut-and-dried path to open source.

It sounds like we're there.

Well, I've got our...

I have our chief intellectual property counsel on board.

I have our vice president on board.

And it has just gotten pushed over to the back burner.

And it's really a shame because I'm so passionate about it that every few weeks.

I send out a you know, an email on this thread that has been going back for months now.

And like, hey, what's the status on this.

Oh, now what.

So when I was at CBS Interactive.

I was leading up the cloud architecture over there.

And that was one of my big drives: getting an open source policy, an open source initiative, at CBS.

And yet, I think it took the better part of a year before we were able to open source one project out of that.

Anybody else have any experience helping your organization open source code.

You have.

It's difficult; it can be like pulling teeth. Yeah, were you able to do any of that at your last place, John?

No, there was talk of doing it.

But, you know, just getting that ball rolling, just a couple of little utility things here and there.

But you know, to get them to understand the value add of open sourcing is quite difficult. Yeah, actually, this is Blaze.

Hi, Blaze.

Hi So it turns out Mike.

So I've been working at Sumo for the last year, and they're pulling back on their open source initiatives.

Really Yes.

In fact, I am officially looking for another job.

Oh, yeah.

Anybody looking for a community evangelist? Blaze is your guy here.

That's Sumo Logic.

Yeah Yeah, that's where I was.

Yeah, they have I mean, I sort of get it.

They didn't say as much.

But they want to do an IPO.

So they're basically just not making any investments that don't have a direct immediate payback.

Yeah, I think long term it's probably a strategic mistake, because they want to have a bigger presence in the cloud native environment and the competition is just eating them up.

Yeah, that's an interesting one.

I hope, though, that.

And obviously, this is being recorded.

And we shouldn't talk about anything we shouldn't talk about.

So let's just keep that in mind.

But their open source agent...

I mean, I think it's great that the Sumo Logic collector is open source.

Hopefully they continue to invest in that.

I know that a number of companies are frustrated sometimes with missing log events and that the agent can be consuming considerable resources just to consume all those logs and stuff like that.

So I think the more people looking at it, the better.

No, actually they're going to be.

I think there's no question that they're less interested in investing in the collection because they don't really see it as their you know the installed agent.

So maybe relying more, therefore, on third-party agents like Fluentd. Yeah, Fluentd, that makes more sense.

Yeah, Fluentd and Prometheus are huge parts.

Although interestingly, they have a relatively limited participation in those projects.

But yeah.

So in terms of things being recorded.

So far everything's fine.

They've been very transparent as far as I can tell about that.

But I think that what would work well for me is someone who really wants to get as much adoption of their product as possible.

So if you guys know anybody who is like super aggressive about developer outreach and making sure that their stuff is easy to use, and it works well Yeah, I'm kind of in a little bit of a bind quite honestly, because I think you know for years at Yale for years at Google three years, it suddenly changes somebody you know it's just not the same.

And I find that if people aren't willing to get into continuous improvement.

And if they're not committed to excellence.

I end up getting into trouble by making suggestions or rubbernecking right away.

Well, I think one thing to look out for, though, if you do want to work more with open source, is to look for a company that started that way rather than one trying it out to see if they could get more customers.

That's such a good point.

Thank you.

Because in the former, they start out that way; it's built into their DNA.

You can't really undo that.

But the latter.

It's kind of like instant zero.

Look at that sounds obvious.

Yeah Yeah.

So there.

I think there's a number.

Well, one company comes to mind.

I don't know.

You can check out Kubecost, the Kubecost channel, and see if they're looking for anybody.

They have open for being there and doing some interesting things.

Just hearing a lot about Fairwinds as well.

Yeah, Fairwinds as well.

Yeah, core product. Basecamp's coming out with their new HEY product for email.

So And I don't think you get more open source than the creator of Rails.

Oh my god.

I would give my left nut to work for Basecamp.

They've been hiring for a while.

So you might want to take a picture put that on the billboard.

All right.

So any other questions related to Cloud Posse repos or DevOps in general or best practices or surveys?

Do you want to get a pulse on what other people are doing.

There's a great chance we've got about 70 people on the call right now.

So I've been, in the last three weeks, standing up some Amazon MQ instances, particularly around network brokering.

And I tried to use the Cloud Posse module.

But I couldn't really control the config file in any way.

And I'm having to write a lot of custom tooling around setting variables and making it result in XML that doesn't blow up.

So yeah, let me talk about that MQ module just for a second.

So all of our modules are borne out of actual engagements customer engagements and then we open source that we have this kind of open source first model where we start the modules open source.

The use of ActiveMQ was for an enterprise SaaS product that we were running on prem.

And it turned out not to work well with Amazon's MQ service.

So we had to cut back on it, so, you know, therefore they didn't continue to invest.

We haven't had a reason to continue investment on that.

But I will say, Maxime on my team... Two or three weeks ago, we had 130 open pull requests against our Terraform modules.

And I think we've gotten this down to like 13 or something.

So if you do want to spruce it up.

Do you see any ways we can improve it.

Let us know also in terror.

Let me see.

My guess is that module is still HCL1, not HCL2.

Some of the template file manipulation was really basic in HCL1.

So if we wanted to do any more advanced parameterization of that file, it would not have been feasible in HCL1. With HCL2,

now I think it's totally feasible.

So we could have a better, more powerful config that you could pass there, or just provide an escape hatch so that you provide the raw XML.

That's helpful.

I'm not directly familiar with that module right now.

So I might be misstating some things.

But did that clear anything up, or is there additional feedback you have on that?

And that was pretty much it. At this point, I've had to pretty much write everything from the ground up.

And if I can figure out any ways to piece any of that out of there.

And send it back.

Your way, I'd love to do that.

Yeah, for sure.

Feel free.

And this goes for anyone here.

If you have anything you want to contribute back.

You're not sure about the next steps to start on that.

You can always reach out to me on the SweetOps Slack. You do have to join the Slack team, by the way.

That's a good chance to promote that for a second.

So if you go to slack.sweetops.com, you can join our Slack team.

And then my name's Eric on there; you can find me. Cool. Casey Kent asks in the chat: common patterns for machine learning infrastructure for continuously training, ingesting data, and ETL?

There are there's just a ton of stuff out there.

But it'd be nice to hear what you suggest.

So I can't speak to this personally as a subject matter expert.

I can describe a pretty common architecture pattern that one of our customers is using at a very high level.

But I'm not sure if that's even valuable.

You probably already know to that degree.

What I would say is there's the whole suite of, obviously, Amazon's products for training the models for machine learning.

We've not touched or looked at it.

Maybe the people here on the Slack team have done more with this.

Anybody have some context? Sorry, I zoned out for the beginning of the question.

But have you checked out Kubeflow? Cloud Posse has not yet worked with Kubeflow.

Yeah, there's a bunch of different UI or API centric different tools.

I did a sort of Kubeflow workshop at a meetup at some point.

That was the extent of my knowledge and thought I was pretty useful for that beginning part.

And then one of the things is you can plug-in different platforms for how you want to host it.

Once you get the model built, specifically for models that get retrained a lot, like marketing models that have seasonality that you want to refit over and over again, something like an investment in Kubeflow seems to make sense.

But if it's model you train a few times, then there's like dozens of different ways to do it.

None of which I've been super excited about.

But definitely, if you're not sure where to start, Kubeflow itself, or typing "Kubeflow versus" and then you'll see all the other ways, seems like a good start.

Yeah, I had one thing to mention about a conference in town.

Yeah, SCaLE.

I think is this week.

Oh, yeah.

Thank you for bringing that up.

That's a good tip.

So if you're in Los Angeles or you're close enough, SCaLE is happening towards the end of this week.

I think it starts on maybe Thursday.

Yeah And runs through Sunday.

And then there's DevOpsDays on Friday.

I'm going to be at DevOpsDays this Friday at the Pasadena Convention Center.

Pretty much all day.

So if you're there, please hit me up on Slack and I will find a time to meet up for coffee or hang out.

Are you going to go Todd.

I'm not feeling well enough to go bummer.

My kids at home.

I'll be there Friday and Saturday.

Who that.

Sorry I'll be there got an awesome dog.

Thanks for letting me know.

Dude hit me up on Slack.

If you are around.

It's enough.

I mean, you bring up a serious note there, though, that a lot of conferences are being canceled like they're dropping like flies right now.

Google canceled theirs, and KubeCon Amsterdam just got canceled this morning.

Oh, really.

Yeah, they delayed KubeCon three months.

Yeah some of exactly some of them are postponing them or postponing indefinitely.

So it's too bad.

I'm going to take my chances and see extreme isn't there.

Probably but the you know bless their hearts.

The SCaLE team works for basically no pay.

And a very minimal budget for a conference of that size.

Some of the equipment for that recording is pretty dated.

Eric it's Adam Watson.

Hey not to add anything but Pasadena declared a state of emergency an hour ago.

So just a heads up.

All right.

Hopefully that doesn't mean that

they automatically have jurisdiction to cancel all conferences and stuff.

So yeah, that's worth checking out to see if that's going to affect SCaLE at all.

Yeah, just said that that was an hour ago.

Figured I'd float that.

Yeah, thanks Adam for bringing that up.

Nobody shoot the messenger.

On the topic of events.

I think nobody in this group is in Boston.

But if you know any people in Boston.

I'm related to knock at a stream that they try to record the talks as well.

So if I can record them for a friend observed 2020.

It's like a CNCF OpenTelemetry-related event that a friend of mine is trying to put on.

So it's April 7th.

So my hope is that we can get through the like curve and then it will be back down by then but we'll see.

Worst case, we'll try to figure out rescheduling but the link in the observer shot.

So what's that what's that 24 hours of DevOps conference.

Forget what it's called that that might be our future.

What was that thing.

And that was like in December or something.

Yeah, and it was like November last year. All Day DevOps.

Yeah, there's a couple of those not related to DevOps that have done the 24-hour thing.

Not not my type of conference organizing for a.

I like sleeping occasionally.

Yeah And I do like meeting people face.

Actually I mean, honestly, the reason why I go to conferences is to talk to meet the people and hear their stories less the actual talks themselves.

All right.

Any any other specific questions or otherwise maybe I'll jump into practical tricks for change management and get your feedback there for what you've done.

Let's see here.

No, this came up.

I forget who it was that asked the community at large what you're doing for change management and change control.

I wanted to kind of inventory those tips and tricks to provide guidance, because I think just using GitHub isn't enough, just having IAM policies isn't enough, just having CloudTrail audit logs isn't enough.

So what are the things that you have in place for change control and here's kind of a list of some of the things that came to mind as a common best practices today.

So I guess the obvious thing to bring up is having a version control system.

This is your GitHub.

This is your GitLab or Bitbucket.

This is what allows you, if you're practicing infrastructure as code, then to point to the code that should have resulted in a change along the process here.

The next one being infrastructure as code defining the business logic of your infrastructure and using reusable modules for that.

So there's one thing just to write infrastructure code like raw Terraform resources.

But then I do want to capture that a module, like a Terraform module or Helm chart, is a discrete unit of business logic, which you can kind of sign off on organizationally: this is how you do things.

And then you reduce the scope of change control when you're using reusable components there, especially ones that you've signed off on in the organization. Automation.

Obviously, taking what you have now in source control and having a way of getting humans out of the equation, because humans are difficult to automate but source control is easy to audit, and anything that is machine-controlled automation you can continuously refine and improve and have controls in place.

Pull request workflow.

So basically how you enforce that every change is reviewed and approvals on that and related to that.

Having approval steps within your pipeline.

So you might have all the checks and balances in your GitHub with branch protections and code owners, requiring certain checks to pass and a certain number of reviewers.

But in the end, you might still want to have additional controls that are arbitrary, and having the ability to have approval steps in your pipelines is an excellent way to have control over when things change and visibility when they change. Notifications.

I'm sure everyone.

I'm sure a lot of people here are already sending a lot of this stuff to Slack.

One thing that we've really liked.

I was surprised how much I liked the ability to add GitHub comments on pretty much any commit SHA.

And then you have that history there.

So if you have a pull request, you can also comment on the commits and see when that pull request, and those commits, were deployed and into what environment.

So that provides a nice living record of changes.

So, as I was talking about earlier, there's using branch protections.

This is very, very much key to enforcing when stuff changes.

So this is something GitHub supports very much.

I'm less familiar with GitLab and Bitbucket.

Any users here using GitLab and Bitbucket?

How much of the branch protection functionality do they have compared to GitHub, assuming open source or paid, both?

And if you can make the delineation, that'd be great.

Yes. So my experience with GitLab is that it does actually have the enforcement.

I think you can actually set it up by default.

Oh, you can set it up organizationally.

That's nice.

Yeah, that really sucks that.

You can't do that with GitHub.

Yeah, I believe GitLab...

So I have the most experience with on-prem GitLab open source, and I believe free

GitLab is actually different.

It gives you more. On-prem GitLab open source gives you: you can't merge if the pipeline hasn't passed.

But it does not give you pull request approvals.

No, you've got to pay for it.

Wow, you've got to pay for the pull request approvals.

It's the very bottom tier; it's only like $4 per user per month.

Yeah, but you've got to pay for the pull request approvals.

I'm not sure about the GitLab...

So what.

OK, that's good.

That's good.

Does GitLab have the concept of code owners?

I think that's a Git thing, isn't it?

Isn't that just a Git thing?

Know what I mean is while entered.

But it's got to be enforced at the pull request approval.

So code owners relate to approvals.

Yeah Yeah Yeah.

OK I got.

So GitLab does allow per-branch merge protection: it controls who can merge. You have maintainers, developers and maintainers, or no one, as a role.

And then you also have control over who can push to it.

OK with the same role.

Yeah, that's Yeah, that's correct.

Absolutely Yeah.

So code owners:

this is where you can basically have a file,

and that file will map a team to a path on the file system of that repository.

So you can say that anything in your Terraform IAM project, for example, has to be signed off by SecOps.
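To make the code owners idea concrete, here is a small Python sketch of the mapping from repository paths to required reviewers. It is a simplification under stated assumptions: the team handles and paths are hypothetical, the pattern matching uses Python's `fnmatch` rather than the exact glob rules of any hosting platform, and it resolves conflicts with a "last matching rule wins" convention.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (path pattern, owning teams). Later rules take precedence.
RULES = [
    ("*", ["@example-org/platform"]),
    ("terraform/iam/*", ["@example-org/secops"]),
]


def owners_for(path, rules=RULES):
    """Return the teams that must sign off on a change to `path`."""
    owners = []
    for pattern, teams in rules:
        if fnmatchcase(path, pattern):
            owners = teams  # last matching rule wins
    return owners


print(owners_for("terraform/iam/roles.tf"))  # the SecOps team
print(owners_for("README.md"))               # the default platform team
```

The point of the pattern is that the review requirement lives in the repository next to the code, so changing who must approve a path is itself a reviewed change.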

GitLab does support code owners in the bottom tier of...

Not free.

OK, cool.

So in the starter or bronze tier, which is the $4 a month per user.

So the next step: everything we've discussed so far is a little bit at the mercy of your VCS solution.

Then we get the ability to enforce policies and policy enforcement has been a really hot topic getting a lot of attention especially towards the end of 2019 and I think it's going to even get bigger.

Now in 2020, with tools like Open Policy Agent, conftest, which builds on OPA, and tfsec, the tools at your disposal to enforce broader-level policies that make it easier to administer change control at an organizational level are reaching greater maturity. Still early days, but it's at a point

where it's usable now.

And there are some great videos and demos out there of it.

In fact, we have one on tfsec.

A basic example that John whipped up; we'll link to it. John, were you...

Did you want to talk about tfsec today, or should we just share that video?

It's up to you.

OK, I guess she another team to an flat.

Like a show.

OK big question came I forgot to ask.

But yeah.

Can I show a quick example.

Any users interested in seeing a demo right now of tfsec? tfsec is a purpose-built static analysis tool for Terraform to enforce policies on your code there.

And using that together with GitHub Actions.

All right.

We got a thumbs up from Adam.

Yeah, sure.

OK, before I do that, I would actually add a line to use version pinning; using SemVer, when it comes to some things like Helm charts, is not even enough.

You have to use SHAs.

But that is a valid point.

Let me just add that. Can you write what you said in the office hours channel so I don't forget it?

And I'll update this with that, with some of the caveats there, because there are some caveats: SemVer is only as good as the maintainer's ability to practice it.

And the problem with, like, Helm is that many maintainers don't actively bump their versions.

So they're constantly squashing their version.

And that's the problem.

Like, I could push up 1.1.

And you can use it.

And then I could push up another 1.1 with changes.

And another good point here is that SemVer is not cryptographically secure, whereas SHAs are.

So it's much.

It has been shown that, if you are really bent on messing people up,

you can probably find some version of a history to cause a duplicate SHA or something.

But generally, it's secure. Or, as John recommends, you can tag, if you're using some tagging scheme, yeah,

plus the SHA.

That might.

Yeah So that's about this endeavor to add something to the end there.

So the hard thing was some very especially if you're looking at a repository is knowing which one of those is the specific version that I want and putting the version number in there kind of makes that a little bit easier.

You still have to dig into the specific child.

But it can help by tagging on the shot in.
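The difference between pinning a version label and pinning content can be sketched in a few lines of Python. This is an illustration only: SHA-256 from `hashlib` stands in for whatever digest your tooling uses (Git commit SHAs are computed differently), and the chart contents and version numbers are made up.

```python
import hashlib


def digest(content: bytes) -> str:
    """Content digest; a stand-in for a commit or artifact SHA."""
    return hashlib.sha256(content).hexdigest()


def label(version: str, content: bytes) -> str:
    """Readable version plus a short digest, e.g. '1.1.0+<12 hex chars>'."""
    return f"{version}+{digest(content)[:12]}"


def verify_pin(version: str, content: bytes, pins: dict) -> bool:
    """True only if the content matches the digest recorded for that version."""
    expected = pins.get(version)
    return expected is not None and digest(content) == expected


# Hypothetical chart published twice under the same version number.
original = b"chart contents as first published"
republished = b"chart contents after the maintainer changed them"

pins = {"1.1.0": digest(original)}

print(verify_pin("1.1.0", original, pins))     # True
print(verify_pin("1.1.0", republished, pins))  # False: same version, new content
print(label("1.1.0", original))
```

The version number keeps the pin readable for humans, while the digest is what actually detects a release that was republished under the same number.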

That's good.

Thanks Yeah.

Thanks for telling me that.

It's more of a security topic than it is a change management one.

Yeah, absolutely.

It's just a question of yourself.

Yeah, I think it's hard to have one without considering the other.

Oh, sure.

Damage control.

So this list here is not exhaustive.

I whipped this up in about 20 minutes.

So if anybody has you know points out what you know things that basically, I want to add to this with things that you're doing and recommendations you have.

So please, if you maybe add to the thread here.

There's a link.

I posted with the change management in the office hours channel.

You add any suggestions.

There is a thread.

I will try and incorporate those into this.

All right.

So John, are you setup.

Awesome I got to hand over the reins and we're going to get a little nice here.

You know accidents.

This was this is unscripted and unplanned.

So forget it.

We will thank the demo gods for a successful...

Yeah, exactly.

Cheering so I kind of wanted to go through.

I guess I should share here right.

I kind of wanted to go through what tfsec and tflint are, showing the actions here as opposed to waiting for the actions to run and things.

But kind of speaking through that.

So one of the questions that came up in the chat after the video was posted was about using tflint as opposed to tfsec.

And so tfsec is essentially static analysis for your Terraform.

So it has a set of rules.

It's not super exhaustive, but it does have a pretty good set of rules of things that you want to watch out for, especially along the lines of like security groups security rules out on the internet.

So I have some basic Terraform here.

This is meant to fail.

So I have a CIDR block that's wide open.

Nothing really specific there.

CDP missing some configuration here.

And this Azure managed disk has encryption actually set to false.

So in running tfsec here, and expanding this a little bit,

it basically looks at my code and determines: hey, this CIDR block actually should not be wide open.

This one should actually use HTTPS, not HTTP. This one here is actually missing a VPC configuration.

And this one needs to be secured or encrypted.

But there are those times where you actually need something like if it's a web server right.

You need to actually utilize this open CIDR block here.

So it has the syntax to where you can actually tell it to ignore one of these one or more.

And you can see it actually is skipping that one now.
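The idea of a static check with an inline escape hatch can be sketched in Python. This is a toy stand-in, not how tfsec is implemented: it scans text line by line instead of parsing HCL, and the `example:ignore` marker is a hypothetical placeholder rather than the tool's real ignore syntax.

```python
IGNORE_MARKER = "example:ignore"  # hypothetical marker, not a real tool's syntax

# Made-up Terraform snippet: one rule that should fail, one explicitly ignored.
SAMPLE = '''resource "aws_security_group_rule" "ssh" {
  cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "web" {
  cidr_blocks = ["0.0.0.0/0"] # example:ignore public web server
}
'''


def find_open_cidrs(source: str) -> list:
    """Return 1-based line numbers with a wide-open CIDR that is not ignored."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if "0.0.0.0/0" in line and IGNORE_MARKER not in line:
            findings.append(number)
    return findings


print(find_open_cidrs(SAMPLE))  # [2]
```

Keeping the justification in the same comment as the ignore marker means the exception is visible in code review, which ties back to the change control discussion.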

Now Same thing with you.

Laughs thanks you too.

Yes things like that.

But what you see here is that there wasn't a catch on my specific t linked code over here that is utilizing a T12.

So the T12 extra large is a size that does not exist.

Right. So if I run tflint here, it doesn't catch any of these security issues, but it did catch that my specific instance type is invalid.

So I think there's not a direct one-to-one comparison to say, hey, you should use tflint instead of tfsec.
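The kind of check described here, catching a value that is syntactically fine but not valid for the provider, can be sketched like this in Python. The set of known instance types is a tiny illustrative subset, the regular expression is a shortcut instead of real HCL parsing, and `t2.extralarge` is a made-up invalid value.

```python
import re

# Tiny illustrative subset; a real linter ships or fetches the full list.
KNOWN_INSTANCE_TYPES = {"t2.micro", "t2.large", "t2.xlarge", "m5.large"}

INSTANCE_TYPE = re.compile(r'instance_type\s*=\s*"([^"]+)"')

# Made-up Terraform snippet with one valid and one invalid instance type.
SAMPLE = '''resource "aws_instance" "ok" {
  instance_type = "t2.micro"
}
resource "aws_instance" "typo" {
  instance_type = "t2.extralarge"
}
'''


def invalid_instance_types(source: str, known=KNOWN_INSTANCE_TYPES) -> list:
    """Return instance types in the source that are not in the known set."""
    return [t for t in INSTANCE_TYPE.findall(source) if t not in known]


print(invalid_instance_types(SAMPLE))  # ['t2.extralarge']
```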

I think they both are useful for different purposes, even though there may be at times a little bit of overlap.

But as you can see, I'm using aws-vault.

I just have it aliased to a shortcut because I don't like typing a lot.

But if I put in, like, a wrong AMI ID, it actually is talking to AWS, so I have to set my region and all that.

So it's actually talking to AWS and looking to see whether this is valid.

So there is a little bit of a cost here in the sense of like speed.

So if you hook it up to, like, a pre-commit hook or something like that,

it may take a second, depending on how big your actual Terraform project is.

But it's definitely a pretty good.

How well does it work.

If you're using like almost exclusively modules and stuff like that.

I think it's actually operating at the resource level it does.

But they actually do have modules support.

And I started working to kind of get this up and I'll add another video that goes through these fully, but it actually can check a module to see if the actual types exist and actually go into the module to make sure that the resources in the modules actually are valid.

That can have a little pro and con depending on the open source module.

But you may be using.

There may be some issues there.

But the good thing is that they do have this ignore flag that you can use there, as well as a full HCL config file that you can use.

And you can tell it to ignore certain modules in here, provide var files, variables, specific credentials, and also tell it to disable certain rules.

But the rules are actually, I think it's 700 or something rules.

Yeah 700 plus rules wow that's a lot.

Yeah Yeah, that's good.

That's cool.

Are you able to add additional rules.

Yeah, they actually have a way to configure it actually didn't go through that part.

But there's a way to extend it beyond just the basic configuration.

Yeah. Do you have a GitHub Actions run that you've teed up?

We're almost out of time.

I think so.

I'm not sure we have time to go through the full run.

But I do have this one for tfsec, which was basically the same thing that we just looked at there.

So now we actually go to the other one.

So let's see a seconds here.

This is one of the passing runs.

It's very simple.

It's not a lot that is happening.

It's basically just running tfsec.

But the configuration usage is basically just this.

There are configuration changes that you can add in here variables those sort of things.

But this will pretty much run tfsec on the current directory and let you know whether or not it's passing or failing.

You can do it on the PR as well.

Of course.

But it's very useful to actually get a heads up that something actually happened at this time.

I know I can't see it because it's not signed in

in this browser.

But it does give the full output.

So, related.

Also related to this: your tfsec training repo, that's open, that's public, right?

That's public right.

Yes Yeah.

Yeah All of those are public.

Yeah So this is.

So we'll share that in office hours.

This is the full output here.

So yeah, it's very useful.

Very quick to set up.

Very easy to use, and it definitely will help catch some of those issues that you just may miss.

Where I see this being exceptionally valuable is if you're practicing a traditional Git workflow where you deploy on merge to master, meaning that you've already lost the recourse to make any corrections by the time you've merged, and you want to mitigate the failures after merge to master.

So I think this is a really nice way to avoid that,

if you're not applying before merge to master, which is like the Atlantis

workflow.

Exactly. And especially with, like, tflint here.

If I just fat finger that.

And it's not like some malicious issue whenever I run it, I find out, oh, I actually have an issue here before I go through and apply it just as useful to find that stuff out as early as possible in terms of like a software development cycle.

Something else that I've looked at in this space.

I haven't had a chance to use it yet, but it looks very interesting: you can use OPA to evaluate Terraform, and conftest.

So conftest builds on OPA in a more opinionated way as well.

So that's kind of cool.

The example, they gave us is actually kind of a useful one.

Yeah, the example they give is, you know,

you have decided that you don't want your Terraform scripts to be too big, and you create an OPA policy that says you're not allowed to create more than x number of resources with one Terraform apply, and OPA runs in your pipeline or whatever.

But before you apply and can actually stop you from.

The term you like to use is blast radius.

You know your blast radius has to be smaller than a certain limit.

Yeah which actually has as I work on it because I work on the first apply where you may be generating it.

It looks at the plan.

It doesn't look at the apply.

It will create a Terraform plan and then look at the plan.
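The "blast radius" rule described here would normally be written as an OPA policy in Rego; the Python below is only a stand-in to show the logic. It assumes the plan has been exported as JSON (for example with `terraform show -json` on a saved plan file) and reads the `resource_changes` list from it. The sample plan and the limit are made up.

```python
import json

# Made-up, heavily trimmed example of a JSON plan export.
PLAN_JSON = '''
{
  "resource_changes": [
    {"address": "aws_s3_bucket.logs",    "change": {"actions": ["create"]}},
    {"address": "aws_iam_role.app",      "change": {"actions": ["update"]}},
    {"address": "aws_instance.web",      "change": {"actions": ["delete", "create"]}},
    {"address": "aws_vpc.main",          "change": {"actions": ["no-op"]}}
  ]
}
'''


def count_changes(plan: dict) -> int:
    """Count resources the plan would create, update, or delete."""
    return sum(
        1
        for rc in plan.get("resource_changes", [])
        if set(rc["change"]["actions"]) - {"no-op", "read"}
    )


def within_blast_radius(plan: dict, limit: int) -> bool:
    """True if the number of changed resources does not exceed the limit."""
    return count_changes(plan) <= limit


plan = json.loads(PLAN_JSON)
print(count_changes(plan))           # 3
print(within_blast_radius(plan, 2))  # False: this plan would be rejected
```

Because the check runs against the plan and not the source, it counts what modules expand into, which is exactly the cold-start concern raised in the discussion that follows.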

Yeah why does value.

Well, I like it.

I wonder how well it works in practice.

But yeah in principle, I like it.

The reason why is, like, you're using a lot of modules, and modules use modules, and it's very easy...

The plan doesn't care.

Well right.

When you do terraform plan, even if you're just using modules,

It's going to tell you exactly what's actually being.

Yeah but it's created the convention.

But usually I'll have 25 resources, at least, getting created if I'm using a module. For example, anything that does something more serious is going to be creating a lot of resources.

So the cold start problem like once it's been provisioned I can imagine that we don't want to see too many new resources created all the time.

But let's finish it up from scratch.

I wonder like I agree with small pull requests.

But even a small pull request can have a big plan.

Yeah, we're almost at the end of the hour, we got 5 minutes left any questions.

Related, maybe, to the stuff John did: what would you say the learning curve is on OPA here?

I can't speak from firsthand account.

It's in line with the rest of the industry like to pretend it is.

And it's high along with everything else in there.

I think it's very readable.

But like any Arrigo reads it.

I think the hardest part about it is developing opinions that can be codified.

Into policies, not what like.

We know we want to do it.

What should we actually be doing.

Like that's the hard part.

The language is super readable very easy.

You'll pick it up in less than a half hour or something.

But get that with the same Walker actually you know where you to look like predefined plots.

That would make sense.

Like a use case.

Yeah, that was right where that's done.

Start small though.

I mean, we're starting.

We haven't really used it much.

But the first thing we're going to do is use it to create a mutating admission controller, and all it's going to do is check that every pod in the cluster has a label with a charge code so that we can track back for billing reference.

That's all; that's all it's going to do.
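The check itself is small. In a cluster it would be written as a Rego policy served by OPA behind an admission webhook; the Python below only sketches the decision logic. The label key `charge-code` and the shape of the returned decision are assumptions for illustration.

```python
REQUIRED_LABEL = "charge-code"  # hypothetical label key


def review_pod(pod: dict) -> dict:
    """Allow the pod only if it carries a non-empty charge code label."""
    labels = pod.get("metadata", {}).get("labels") or {}
    if labels.get(REQUIRED_LABEL):
        return {"allowed": True}
    return {
        "allowed": False,
        "message": f"pod is missing required label '{REQUIRED_LABEL}'",
    }


labeled = {"metadata": {"name": "api", "labels": {"charge-code": "CC-1234"}}}
unlabeled = {"metadata": {"name": "worker"}}

print(review_pod(labeled))    # allowed
print(review_pod(unlabeled))  # denied, with a message
```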

Yeah quite a small win.

Let's go.

And that kind of a policy Agent would deter the use of the policy is a little different, right.

Because you wouldn't be doing that at CI, would you?

Well, that would go into a mutating admission controller

inside the cluster.

But you can use OPA that way.

Mm-hmm. Oh, but OPA can be used...

I mean, OPA is great.

OK. You can use Rego with OPA all over the place.

Yeah, it's I think he's giving some enterprise vendors a run for the money.

We're trying to do policy stuff.

Why I could tell you for sure.

HashiCorp's Sentinel.

It's very much so kind of locked in that same vein without Opie.

And I don't use it much.

But probably much for other issues.

OK Just scanning it real quick.

The home page, openpolicyagent.org, has a tiny little example for an admission controller.

It's eight lines, right?

Yeah, it's eight lines, and in those eight lines it checks that all pods come from trusted registries.
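The trusted registry rule is similarly compact. The real example on the OPA home page is written in Rego; this Python sketch only mirrors the logic, and the registry name is a hypothetical placeholder.

```python
# Hypothetical allow-list of registry prefixes.
TRUSTED_REGISTRIES = ("registry.example.com/",)


def untrusted_images(pod: dict) -> list:
    """Return container images that do not come from a trusted registry."""
    containers = pod.get("spec", {}).get("containers", [])
    return [
        c["image"]
        for c in containers
        if not c["image"].startswith(TRUSTED_REGISTRIES)
    ]


pod = {
    "spec": {
        "containers": [
            {"name": "app", "image": "registry.example.com/team/app:1.4.2"},
            {"name": "sidecar", "image": "docker.io/library/nginx:1.17"},
        ]
    }
}

print(untrusted_images(pod))  # ['docker.io/library/nginx:1.17']
```

An empty result means the pod would be admitted; mirroring public images into the trusted registry is how you keep using them under a rule like this.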

Wow That's good.

That's great.

That's a that's a really that's a great policy there.

I can see it to have.

And similarly to that, it would be like, even if you need to use public images, that you're pushing them either to private ones or you have a pull-through cache, like Artifactory, or specifically Harbor.

Yeah all sorts.

Yeah And it's just a common engine for it.

I like that coming was pretty good.

And because it's a resource we can all build on it as a community just like Terraform and Helm registries and stuff like that.

All right, everyone looks like we've reached the end of the hour.

And that about wraps things up for this week.

Remember to register for our weekly office hours if you haven't already; go to cloudposse.com/office-hours.

Again thanks for sharing everything.

Thanks, John, for the live demo there of the tfsec and tflint stuff; that was really interesting. A recording of this call will be posted in the office hours channel and syndicated to our podcast at podcast.cloudposse.com.

See you next week.

Same place, same time.