Public “Office Hours” (2020-03-11)

Erik Osterman | Office Hours

1 min read

Here's the recording from our DevOps “Office Hours” session on 2020-03-11.

We hold public “Office Hours” every Wednesday at 11:30am PST to answer questions on all things DevOps/Terraform/Kubernetes/CICD related.

These “lunch & learn” style sessions are totally free and really just an opportunity to talk shop, ask questions and get answers.

Register here:

Basically, these sessions are an opportunity to get a free weekly consultation with Cloud Posse where you can literally “ask me anything” (AMA). Since we're all engineers, this also helps us better understand the challenges our users have so we can better focus on solving the real problems you have and address the problems/gaps in our tools.

Machine Generated Transcript

Let's get the show started.

Welcome to Office hours.

It's March 11, 2020.

My name is Erik Osterman and I'll be leading the conversation.

I'm the CEO and founder of Cloud Posse.

We are a DevOps accelerator that helps startups own their infrastructure in record time by building it for you and then showing you the ropes.

For those of you new to the call the format is very informal.

My goal is to get your questions answered.

So feel free to unmute yourself at any time you want to jump in and participate.

If you're tuning in from our podcast or YouTube channel, you can register for these live interactive sessions by going to Cloud Posse office hours.

Again, that's Cloud Posse office hours.

After these calls every week, we'll automatically post a video of the recording to the office hours channel on our Slack team, as well as follow up with an email so you can share it with your team.

If you want to share something in private just ask.

And we can temporarily suspend the recording.

With that said let's kick this off.

So what we have today are a couple of talking points that came up in the last day or so at least in my own dabbling here.

One thing I'm really excited about is that it was just announced yesterday or so that EKS now has envelope encryption for secrets. That is, those secrets are separately encrypted with a KMS key.

Not only that, but the Terraform...

Yes, OK, I linked to the wrong issue here.

That's the KMS module by the guys at terraform-aws-modules.

There is a pull request against the Terraform AWS provider to add support for this; it's already supported on the AWS side. And then the other thing is Helm.
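For reference, support for this surfaced in the Terraform AWS provider as an encryption_config block on the EKS cluster resource. A minimal sketch; the cluster name, role, and subnets here are hypothetical:

```hcl
variable "subnet_ids" {
  type = list(string)
}

variable "cluster_role_arn" {
  type = string
}

# CMK used to envelope-encrypt Kubernetes secrets stored in etcd.
resource "aws_kms_key" "eks_secrets" {
  description = "EKS secrets envelope encryption key"
}

resource "aws_eks_cluster" "example" {
  name     = "example"
  role_arn = var.cluster_role_arn

  vpc_config {
    subnet_ids = var.subnet_ids
  }

  # Secrets get a second layer of encryption with the CMK above,
  # on top of the default at-rest encryption of etcd.
  encryption_config {
    resources = ["secrets"]

    provider {
      key_arn = aws_kms_key.eks_secrets.arn
    }
  }
}
```

Note that at the time of this call the provider support was still a pending pull request, so treat the exact schema as a sketch.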

I'm excited about this one.

Helm 3.2 is going to restore the functionality to create namespaces automatically for you.
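For context, Helm 2 created the release namespace implicitly, Helm 3.0 dropped that behavior, and Helm 3.2 brings it back behind an explicit flag. A sketch, with made-up release, chart, and namespace names:

```shell
# Helm 3.2+: create the target namespace if it doesn't exist yet,
# which is handy for ephemeral preview environments.
helm install my-app ./charts/my-app \
  --namespace preview-pr-42 \
  --create-namespace
```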

I totally get your point.

Andrew, I saw your comment there, and the one by bacongobbler, on why it's sometimes nice not to have this functionality.

Our use cases are very frequently preview environments, where we bring up environments from scratch and want that namespace created for us. In that case, having it created automatically just reduces the number of escape hatches we need to use to get stuff deployed. That's the nice part. Any other news you guys have seen?

We'd like to call out that people around the world are doing a lot of work-from-home stuff.

Yeah about that.

I'm hoping that might kick off some kind of new wave of revolution.

More working from home.

Yeah work from home revolution might be cool.

Yeah and create even more problems in the commercial real estate sector.

As if retail stores shutting down weren't enough already.

Next thing we know, Google announces they're closing offices. I've been working from home for the past five or six years and it's definitely better than going to the office.

Yeah, two and a half years for me. I think it goes both ways, obviously.

I'm going right now.

But that's why we have these office hours so we get some of the same banter.

So I've done.

I did two years.

I started an LLC you know my own business.

It was like you know tiny little consultant shop.

I did two years of that and got super lonely, because literally the only people I talked to all day were customers.

And you have to be like on all the time when you're talking to a customer.

Now I'm going on two years with a team of about 10 people, and we talk every day on Zoom and Slack and everything, and that's been 100% better.

Do you guys have channels you can join at any time and talk or hang out, or is it always scheduled?

Absolutely, we definitely have a random channel. And do you have video channels, video rooms so to say, where people are just hanging out working?

My team has a couple of different themed rooms, and someone's usually in one of them, but not always.

I'm just curious if that's worked for anyone.

One thing we used to do as part of our help desk team, since help desk was global, was have everybody on a Zoom all day long.

Meaning from when your shift started to when your shift ended, you were on a Zoom the whole time. And for the first two weeks you're kind of like, what the hell is going on, these people are just constantly watching me?

But then you realize it's so helpful, because you could literally look up and be like, oh, Tom's online. Hey, Tom.

Can you... you know, it's just so much easier, especially on a global help desk team, compared to somebody sitting next to you.

So I know that's one thing I've done in the past, where you just had this one Zoom bridge, that help desk Zoom, and everybody was just always on it, muted by default. I guess that's kind of interesting, and it's kind of a take on the Slack channel. Did people protest that at all?

I guess as people join the team, the first week or two is definitely kind of weird, but you definitely realize the benefit of it, because let's say the guy that's sitting next to you is actually at lunch. You can just be like, hey, I'm in New York, you're in Austin or you're in London, I need help with this.

Oh Yeah I got you right now.

And it's just I feel like you solve problems a lot quicker.

Yeah that's kind of the benefit that we got out of it.

And you know I think there were.

Yeah I totally agree with that.

And then it's just like, all right, if you feel uncomfortable, just turn off your video when you're not at your desk, you know.

Exactly but some people barely have to give up any privacy at all.

I mean, you turn off video and it's like, yeah.

And then if you're not there, then it's like all right you're not available.

But like being on the Zoom it's like hey can I bug you for this.

And if you're missing, it's like, oh, you might be with another customer or something. On my team, we've started doing... I won't say we're doing XP yet, but we started doing a lot of pair programming.

And so that's been really nice on Zoom too. You know, typical XP is two desks, two chairs, two keyboards, two monitors, one computer.

That's typical XP, and you can mimic it with Zoom.

One person hosts, and the other person clicks "request keyboard and mouse control", and that way they can break in whenever they want.

And it's been nice. That goes in line a little bit with what you were asking before we kicked off office hours, actually.

Your question was kind of: what's your protocol for when it's OK to give somebody the keys to the kingdom?

Like what's the process for that.

How do you determine that somebody is ready for that level of responsibility and trust when they want it?

You know what I mean?

I don't know if people are overeager for it.

Then there's this question of, you know, how much access do you really need?

You know, I try to eliminate those rights for myself where I can.

And, you know, typically that overeagerness disappears.

So it's a little bit like on a need to know basis.

And I think those need-to-know situations come up as responsibility naturally increases.

So I don't think one needs to give that out automatically or by any compulsory milestone. Yeah, for sure.

Cool any questions or interesting observations.

Anybody else have highlights from the community?

There's the AWS Bottlerocket container OS.

What's interesting is that it's been around for a while.

I just stumbled across it today.

They just announced it.

Yeah but I think I've seen some mentions of it.

I'm a little burned out on the container-native OS thing, just with the number of OSes that are already out there.

And then, like, you know, CoreOS, that's what it was, and it just went away.

Like last week, I was saying the same about RancherOS.

You know what.

When I looked at Bottlerocket, I was like, man, this sounds a lot like RancherOS.

And I was looking at RancherOS, and then all of a sudden I found out they're not... you know, they're not working on it anymore.

And I was like oh OK.

Yeah, so the timing is a little bit off to come out with another OS when so many OSes are getting killed.

Well, if you're running your own Kubernetes clusters, what OS would you use?

Well, on Amazon I'm just going to use the standard Amazon Linux, whatever they ship by default.

And if they're going to make this the default fine so be it.

Basically, I just don't want to be concerned with it at the level that we operate at.

Different companies have many different requirements.

Yeah, I think certain enterprises, maybe like Disney or something, require that you must run this version of enterprise Red Hat.

But we try not to play that game if we can avoid it.

And companies get into that because they have their own RPM distributions or whatever, and their own signed packages, and their own way of doing it.

Teams that manage that.

Which then makes it even less palatable to pick. OK, we're going to now suddenly adopt Bottlerocket, which has no historical proof that it's going to stick around for a while.

Now, Amazon, this would be really interesting.

I don't know, what services has Amazon deprecated in the last 12 months, for example, compared to, say, Google or others? I don't have an answer for that. Killing things that stop making them any more money, well, that's Google's thing, right?

That's Google's strategy right.

But Amazon has been a little bit more, you know, relationship- and commitment-based.

The concern I would have with Bottlerocket is...

I mean, just go to the GitHub repo and read the README, and you can immediately tell that the vast majority of their efforts are going to be focused on EKS; it's a native AWS OS only.

And so, like, there's a comment that got added... I don't remember when it got added, but it was like, hey, this would be awesome on Raspberry Pi.

Not a single response.

But it's like, yeah.

OK, Amazon's making Bottlerocket.

And they've already said their first variant, you know, whatever, is going to be for EKS.

So unless you're using EKS, it's not for you yet.

It's going to take a while for all the variants to come out.

So I've been doing a lot of work with... the Air Force has this new initiative, the DoD's DevSecOps initiative, and it's got a bunch of different names, but Platform One is another name for it.

And this guy Nicolas Chaillan, he's the Air Force Chief Software Officer.

He's this guy from France.

He's got this crazy French accent, but he's brilliant. He's a serial entrepreneur.

He was a you know multi-millionaire by like 25.

And he's got a ton of patents and stuff, so he's kind of leading the charge on DevSecOps inside the DoD.

And so he's got this whole initiative going with, you know, an OCI-container-native Kubernetes platform, and it's completely vendor-agnostic.

So all my efforts lately have been aligned with that. No, we're not using that; we're using native Kubernetes or whatever.

So like this whole Bottlerocket thing, he would just go, you know, that's AWS.

That's we're not using that.

No way.

If I can't install it anywhere in the world you know including on a frickin' Humvee I'm not interested.

Interesting so basically going for the lowest common denominator across these systems.

100 percent yes.

So even though they're using a lot of OpenShift, they're specifically saying you're not allowed to use any of OpenShift's special sauce.

You know, you can't use...

You're not allowed to use the OpenShift build runtime or whatever it's called, and all that other special OpenShift stuff; you have to use whatever is CNCF-compliant. Is OpenShift making any strides, or is it rather infeasible, to have it installed strictly on top of Kubernetes?

Does it always have to be installed at the same level as the control plane itself? OpenShift is a distribution of Kubernetes, yeah.

So it is in place of vanilla Kubernetes. There's a bunch of different ones out there now; there's like VMware's, there's OpenShift.

There was Rancher, but my understanding is Rancher is now built on top; you can run Rancher on Kubernetes, yeah.

Right, Rancher's just a Helm deployment now.

Now you can deploy it anywhere, on any Kubernetes cluster, with the dumbest namespace ever: cattle-system.

It's the cattle-system namespace.

Yeah I don't know.

And it's like you can't change it.

You try to change it and it breaks everything.

Exactly. You know, Rancher's nice.

I mean, I've used it for a while now, and that's what's going on here.

I'm liking it.

I've never quite gotten far enough along with it to get the value-add.

So for me it's the user management.

It's so easy to provision users and say, OK, here's this new team that I'm bringing on, and they need access to these five namespaces.

I can hook it up to LDAP or whatever I want, and I can say, OK, these five users have access to these five namespaces, and done.

It seems like that should be a declarative type of deployment.

When you add new teams... you know, there's a Terraform provider; I can make Terraform do any of that stuff.
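As a sketch of that, the rancher2 Terraform provider can express the team-onboarding flow just described. All identifiers below (cluster ID, team name, namespaces, LDAP group) are hypothetical, and the resource schemas are from memory, so treat this as illustrative:

```hcl
variable "cluster_id" {
  type = string
}

variable "ldap_group_principal_id" {
  type = string
}

# A Rancher project groups the team's namespaces together.
resource "rancher2_project" "team_a" {
  name       = "team-a"
  cluster_id = var.cluster_id
}

# The five namespaces the new team needs access to.
resource "rancher2_namespace" "team_a" {
  for_each   = toset(["api", "web", "worker", "cache", "jobs"])
  name       = "team-a-${each.key}"
  project_id = rancher2_project.team_a.id
}

# Bind an LDAP-backed group to the project as members, so every user
# in the group gets access to all namespaces in the project.
resource "rancher2_project_role_template_binding" "team_a" {
  name               = "team-a-members"
  project_id         = rancher2_project.team_a.id
  role_template_id   = "project-member"
  group_principal_id = var.ldap_group_principal_id
}
```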

Yeah, I just wonder about using Kubernetes to make business-logic type constructs like that.

I'm looking forward to that day.

So yeah, speaking of which, any new discoveries?

Zack?

I've been knee-deep in trying to figure out whether I'm making a huge mistake in pushing out managed Postgres schemas and users and stuff like that through Terraform or not.

As opposed to just having it all in some custom script somewhere, you know.

Right now I'm working with managed Postgres. It might be my own ignorance when it comes to Terraform, but at what point do you draw the line between the actual configuration of the system and the provisioning of it?

Right, so with managed Postgres, during deployments we need a firewall rule that's going to only allow certain IPs through.

So if you want to have pipelines that also do these updates and run Terraform, you need to be able to apply, basically, the CI/CD firewall access to make these changes to this managed Postgres instance.

And if you want to do that then you have to have some sort of dependency.

So you can't really use the Postgres provider, because providers can't have any real dependencies.

So I'm just working around all sorts of weird oddball issues like that, and realizing how much I love and hate Terraform at the same time.

Yeah and that was our experience.

Like, with the provider thing, you're seeing how you can't provision the database and set up the permissions in the same project.

Is that the case in general, am I right? Yeah, that's the case, because...

Well, unless it's changed recently, we had the same problem: basically, the provider errors because the hostname doesn't exist, because you haven't created it yet.

All right.

So it's like day-two operations versus day-zero. Same thing doing Amazon MQ: you have to create the config, an XML config, but you can't create it until you know the hostnames of the brokers you want networked together.

And it's kind of ugly, passing in some count flags to make it do something really simple, and then incrementing them to make it do something more like what you want.

Yeah, that's the same kind of scenario. Chicken-and-egg scenarios abound in Terraform.
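The count-flag workaround being described looks roughly like this with the community PostgreSQL provider. It's a two-phase apply, and every name here is hypothetical:

```hcl
# Phase 1: terraform apply -var='manage_db_objects=false'  (creates the instance)
# Phase 2: terraform apply -var='manage_db_objects=true'   (hostname now exists)
variable "manage_db_objects" {
  type    = bool
  default = false
}

variable "db_password" {
  type      = string
  sensitive = true
}

variable "app_db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "postgres" {
  identifier        = "example"
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "postgres"
  password          = var.db_password
}

# The postgresql provider is configured against a hostname that only
# exists after phase 1; until then, its resources must not be planned.
provider "postgresql" {
  host     = aws_db_instance.postgres.address
  username = "postgres"
  password = var.db_password
}

# Gated behind the count flag so phase 1 can succeed.
resource "postgresql_role" "app" {
  count    = var.manage_db_objects ? 1 : 0
  name     = "app"
  login    = true
  password = var.app_db_password
}
```

Splitting the database instance and the database objects into two separate Terraform projects (with the hostname passed via remote state) is the other common way to break this dependency.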

So yeah, that cold-start thing is one thing we don't overinvest in. We focus more on what the day-to-day operations are going to be like, adding or removing stuff, and worry about that.

That being said, how would you then move forward and do like a PaaS system, where you add another type of resource as requirements come up, or something along those lines?

Do you use Terraform, or do you just make it into some other pipeline with a custom script?

Basically it comes down to: what you're describing is a pipeline of operations, so move that into whatever system your organization has adopted. The problem with Terraform itself is that there's no concept of an operator, unless you consider a provider to be that, and dropping into Go every time we want to do that isn't what I think would be our solution for that problem.

Right, a lot of people in my company would say to use ServiceNow. Everyone says ServiceNow; some say ServiceNow's not cool. Fascinating.

Cool any other questions or talking points.

Oh, I think someone had one. Right, one selfishly.

What's everyone's thoughts on AWS certifications?

I'll give one point of view.

I mean my.

Yeah, so AWS certifications are more valuable if you're going to go into working with enterprises, which use that whole résumé-filtering type system.

The other is depending on the kind of company you want to work with.

So for a consultancy like Cloud Posse, we move up the ranks the more certified engineers we have.

So you know, that makes an engineer who passes all the other requirements more interesting if they also happen to have AWS certifications, because we can then move toward the Advanced or Premier tiers.

What do you... I guess, what do you mean by that?

Cloud Posse moves up the rankings? So there are different tiers of AWS...?

Yes, partners.


OK Yeah exactly.

OK, like how Microsoft has the Gold Partner program, you know.

Exactly OK.

So that makes you more competitive if you want to work for a consultancy like that. Other than that...

What I think is great about our industry in general is it's a meritocracy.

It's based on what you've done recently and your accomplishments.

So that speaks, I think, more than just your technical understanding from some of these certifications, which can't make up for experience, for sure.

Yeah I kind of see it the same way as your GPA in college right.

Was less once the last time someone asked for your GPA in college was when you're doing a job in his kitchen reason and hopefully hopefully nobody's been asked that question after their first job.

Right so therefore.

Yeah, I don't really have any certifications and it hasn't hurt me yet.

And I've worked with a lot of people who have all kinds of certifications and are terrible so you know, just because you have a certification doesn't mean you're good.

I recently went through the process for Cloud Posse, and you know, I've been working with AWS since 2006 or so.

So a really, really long time, since it was in private beta.

And many of the questions were very ambiguous to me, based on all my experience working with AWS.

And clearly they're prodding for one answer, and that answer is very much based on their documentation, so much so that you can often search for it and find that wording. But if you learned it more organically, it can be like, well, do you do it this way, or this way, or this way?

And that was my frustration.

So I had to literally whip out the flashcards and memorize the wording to get it right.

Yeah, because I'm looking into it now. I've been working with AWS heavily for the past two years and I know how our company uses it, and I almost took it as an opportunity to see the bigger picture. I think the Certified Solutions Architect Associate level gives you a big picture of AWS, and just studying for it, I think, is a great way to rapidly increase your exposure to all of that stuff.

And like anything in life I would do it if it's worth it for you.

I wouldn't do it otherwise, unless your goal is to reach some company that requires it; then it's that objective.

But if you're doing it because hypothetically this could help your job prospects that's maybe not a concrete enough goal.

OK good enough.

Yeah I think I'd already decided I just wanted to validate my decision.

No I appreciate it.

And it's also Yeah no let's leave it at that.

So Casey asks a question in chat here.

"Question, when you can get to it: when do you think it's necessary to provision another cluster? Thinking about doing this to put data-related Kubernetes deployments, like Airflow ETL, away from production infrastructure."

"I could also add some tainted nodes on the production cluster as well."

Zack answers: "Personally, I like per-project clusters, and possibly dedicated stateful data and shared-services clusters as well."

So yeah, my two cents on this: I can say what we've been doing, and then share some of the pros and cons of our approach that I have to reconcile.

I don't think we have the perfect answer; I don't think anyone does. But Kubernetes is in itself a platform, right?

And there's two ways of looking at it.

One is, depending on your operations team or how big your company is, if you even have that, you can be providing Kubernetes as a platform, where that platform is almost like how Amazon is a platform.

So there is one production tier of Amazon for all of us.

Everyone in this Zoom session here we're all using these same Amazon.

It's not like we have access to a staging Amazon.

Amazon is providing Amazon as a service to all of us at the same tier.

So you as a company, could be doing that with Kubernetes.

That means your Kubernetes is your staging environments, your preview environments, acceptance testing, data; everything could be running on that same platform, and that platform would have to have the SLA corresponding to the service with the highest SLA.

Well, what we've been doing at Cloud Posse is not that approach, because ultimately you need to dog-food your clusters. You need environments where you can test operational work at the platform layer outside of production, and if you do this in strictly a test environment, you don't have real-world usage going on and it's harder to pick up some of the kinds of things that can go wrong.

So what we predominantly do in a typical engagement is roll out a minimum of three clusters, and up to about five clusters, and it works out like this.

So we have a dev cluster in a sandbox account where we can literally do anything; there are basically zero SLAs on that cluster or that environment.

Then we have a staging cluster.

This is as good as production, without the strings attached of having to notify customers or anything like that if something goes down, and it allows the team to build up operational competency in a production kind of environment. And then we have a data cluster.

So this is kind of addressing your question directly, Casey: the data cluster is more for a specific team, and that team tends to be the data scientists.

The machine learning folks operating in that environment typically need elevated access to other kinds of data, perhaps data that emanates from production, or different resources or endpoints.

So that cluster will have its own VPC and its own VPC gateways, and be in its own actual AWS account.

And then we can better control IAM and egress from that cluster.

And then lastly, there's the production cluster.

And the production cluster, what I mean by that is it's production for your end users, your customers.

But what I've described here has an Achilles heel, and it's that every cluster I described is production for some user group.

So the staging cluster is more or less production, inward-facing, for the company and your QA engineers, and everything comes to a grinding halt in the QA process and testing process if that cluster is offline.

One other cluster I forgot to mention is the corp cluster. The corp cluster sits in a separate AWS account and is for internally-facing services.

So it's like production for engineering; it runs your Keycloak servers, and perhaps your Jenkins servers, et cetera.

Your point about running multiple node pools is a good one and I still think that is another tactic to have in your toolbox that you should use.

A perfect example: if you are for some reason running Atlantis, we would probably say run Atlantis maybe in your corp cluster. But should it run in a standard node pool?

Probably not.

You should probably run it in a separate node pool that's tighter, more locked down in terms of what can run there.

And then that cluster as a whole.

You really want to lock that down, because you have this god pod there that you can exec into and do anything.

So this is another example of considerations for when you want really separate, segmented clusters and environments, where the reality of just using IAM and RBAC and all those things to lock it down is, in my mind, a little bit insufficient.
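As a concrete version of the node-pool tactic mentioned above (the label and taint names here are invented for illustration), you can taint the dedicated nodes so only workloads that explicitly tolerate the taint, like Atlantis, land there:

```shell
# Taint every node in the dedicated pool; the scheduler will now keep
# ordinary workloads off of those nodes.
kubectl taint nodes -l node-pool=atlantis dedicated=atlantis:NoSchedule

# The Atlantis pod spec then carries a matching toleration for
# dedicated=atlantis, plus a nodeSelector on node-pool=atlantis,
# so it is the only thing scheduled onto that pool.
```

This isolates the scheduling, but as noted, it does not by itself contain a compromised pod; that is the argument for a fully separate cluster.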

So the problem here is that we have this corp cluster that's production for internal users, so you know, it's OK if it's down a little bit, but it shouldn't be down a lot.

And then, yeah, your staging cluster, which is production for QA.

You have your dev cluster which is kind of production for developers to do testing and stuff.

So you know, the more unstable that is, the more everything else is impacted. And then production for your customers.

And the configuration of these clusters is more or less different.

Out of necessity, because they have different roles. And does that mean we need a concept of staging for each of these clusters? Arguably...

Yes. It's just that we haven't found a customer that wants to invest in that level of automation. Related to this, there was a Hacker News post one or two weeks ago announcing GKE's new pricing change for the control plane, and a lot of people were up in arms over that change. I forget exactly the context that led to this comment, but it was by Seth Vargo, and his comment was: wait, so why are you guys treating your clusters as pets?

They should be, you know, cattle, the clusters themselves. Part of me reacts like, all right, you're coming from Google; I know you do some amazing engineering over there and you're able to do all that type of stuff. Netflix does as well; they do full-on blue-green regions and availability zones and all that stuff.

It's just that most people don't have the engineering bandwidth in-house to be able to orchestrate that with regularity.

And the reality is clusters have integration touch points to all the systems that you depend on.

The API keys and secrets, the webhooks and callbacks, and all of that stuff.

And orchestrating that from 0 to the end is a monumental task.

My point is it'd be nice if each of these clusters were production like I described, but also had the equivalent of like a blue green strategy that would allow a staggered rollout of each of these environments.

That was a mouthful.

Any questions on what I said, or clarifications? Somebody also shared a link.

Andrew Roth shared a link on how many clusters to run. I haven't seen that link.

Andrew do you want to summarize it.

I read this article a few months ago.

And it really brought home the dilemma, because the core of the question is: you can do one big shared cluster, which is cheaper and easier to manage, but then your blast radius is really big; or you can go all the way to the other side and have clusters all over the place.

But there's this table, so for your particular environment, you just have to pick which little box in the table you're in.

To summarize, it didn't give an answer. I read this article as well.

Google also put forth some recommendations around this, and they explicitly recommend per-team slash per-project type setups.

I'll try to find a link and send it out here shortly but Yeah there isn't an answer.

You know it's completely nebulous at the moment.

Well, there can never be an answer, I think, is the bigger point. It's based on which of these you're optimizing for, and you can optimize for solving one of these.

You can't optimize for solving the entire matrix, right?

We're just getting started with Rancher, but I think it's going to make our lives easier when it comes to this kind of stuff, because we're going to be able to centralize user management but decentralize cluster management, the clusters themselves.

So if I have a new team, I'll manage those users in Rancher and create a cluster all for themselves, and use Rancher to give them access to that cluster.

And you're using kops for all the aforementioned reasons you mentioned?

I think we're going to get away from kops, because yeah, it's kind of not very secure.

RKE is what we're looking at right now. And OpenShift has really elaborate permissions. Yeah, I think the other thing we were mentioning earlier is RKE.

RKE is Rancher's Kubernetes Engine, OK.

It's a pretty nice offering there.

You have your servers, and it's how you install Kubernetes.

You know, it's an alternative to, like, kubeadm.

Gotcha. And it's a lot simpler.

It's one text config file, a YAML, and you point it at the config file and you say "rke up", and bam, there's your Kubernetes cluster.

And there is a currently in development kind of beta Terraform provider that I've used that worked really well actually.

So I'm excited about it.

Also, "RKE", however you pronounce it.

Yep, so in one swift stroke I was able to use Terraform to provision EC2 nodes, then come along afterwards with RKE to install Kubernetes, and then even come along after that with the Helm provider for Terraform to install Rancher.

And so with one Terraform apply, I went from absolutely nothing to a VPC with nodes in it, with Kubernetes installed, with Rancher installed using Helm.
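A rough sketch of that chain, based on my recollection of the beta RKE provider's schema; attribute names may differ, and everything here (AMI, instance type, SSH details, the Rancher chart) is illustrative:

```hcl
variable "ami_id" {
  type = string
}

variable "ssh_key_path" {
  type = string
}

# 1. Terraform provisions the raw EC2 nodes.
resource "aws_instance" "node" {
  count         = 3
  ami           = var.ami_id
  instance_type = "t3.large"
}

# 2. The RKE provider installs Kubernetes onto those nodes over SSH.
resource "rke_cluster" "cluster" {
  dynamic "nodes" {
    for_each = aws_instance.node
    content {
      address = nodes.value.public_ip
      user    = "ubuntu"
      role    = ["controlplane", "etcd", "worker"]
      ssh_key = file(var.ssh_key_path)
    }
  }
}

# 3. The Helm provider talks to the freshly created cluster using the
#    credentials the RKE resource exports.
provider "helm" {
  kubernetes {
    host                   = rke_cluster.cluster.api_server_url
    client_certificate     = rke_cluster.cluster.client_cert
    client_key             = rke_cluster.cluster.client_key
    cluster_ca_certificate = rke_cluster.cluster.ca_crt
  }
}

# 4. Rancher itself is just a Helm release, in its cattle-system namespace.
resource "helm_release" "rancher" {
  name       = "rancher"
  repository = "https://releases.rancher.com/server-charts/stable"
  chart      = "rancher"
  namespace  = "cattle-system"
}
```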

Look at that sounds pretty cool though.

That was very exciting to me, and it all worked perfectly. Has the Helm provider been updated yet to support Helm 3? I don't know, but I'm going to be looking into that soon myself.

Yeah I looked into that maybe a month or so ago.

I was curious.

Any movement on that? As with many things in the Terraform ecosystem, I was a little bit surprised that Helm 3 hadn't been supported already, since it was in beta for quite some time before going GA.

Well, if anything it should be easier, because there's no Tiller.

Yeah it again.

So, what to do with the whole GitOps movement?

You guys are very heavy into helmfile.

Yeah, do you see yourselves getting away from helmfile and using something like Argo at some point?

All right.

I see.

I can do home.

I can.

Yeah Yeah.

Argos like those kind of the workflow management so Yeah.

Yeah, if you need it helmfile-less, you can define it in Argo.

I would say part of our challenge is that we need to be able to describe reusable components that we can use across customers and implementations, and our helmfiles are kind of that reusable component that lets us describe the logic of how to do it.

There are things I love and hate about helmfile, and part of it is that it's just been the Swiss army knife for us to solve anything.

In the end the end user experience is not that bad.

Once we get to using environments — I'll show you an example.

Then it's really quite nice.

So let me quickly pull up an example of what that is — project X — and let's go to the helmfiles and pick one of the charts, maybe.

So all we ultimately expose is just a very simple schema like this for what you need to change. It doesn't matter that the actual helmfile behind it could be rather nasty — just like everything else, Terraform can be pretty nasty too.

So this is the schema that the chart maintainer exposed, which we have no control over, but we reduce that to our opinionated version of it.

All you need to care about is this. — Yeah, yeah, we're doing something similar with helmfiles now.
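A reduced interface like that might look something like the following — purely illustrative, with hypothetical keys, not Cloud Posse's actual schema:

```yaml
# environments/prod.yaml (sketch) — the only values an end user is expected
# to touch; the underlying helmfile maps these few opinionated settings onto
# the chart's hundreds of upstream configuration parameters
hostname: app.example.com
replicas: 2
postgres:
  # external database, not a bundled in-cluster container
  host: app-prod.abc123.us-east-1.rds.amazonaws.com
```

The point is the contract: the end user sees half a dozen values, while the helmfile carries all the nastiness of the upstream chart.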

We've done GitLab, Keycloak,

OpenLDAP, Clair... it looks a little dirty.

I'm trying to get her engagement.

I've been on.

Oh Yeah.

I used it single-handedly to construct and weave out a whole batch of clusters for our client as I made changes on the fly to their requirements.

So it is definitely a good glue tool.

Where I did struggle is bridging that gap between Argo-type applications and helmfile.

So I mean, I was looking at the helmfile operator, amongst other things, to ease that transition.

But I never really got that far.

So yeah, I did notice that there are ways to make helmfile work through Argo.

It's not something they do by default, but there are custom tools that you can apply to Argo

to make it work.

It is undeniably one of my saving graces in this current project.

I was on mobile, and the reason why I'm even on this call is because of your helmfile repo.

Thank you.

By the way of which you know.

So we're refactoring a lot of our helmfile stuff to support the latest things I just showed you here, and we'll be contributing that upstream later this year.

I can't say when, but as soon as the dust kind of settles we'll get back to that. One thing that kind of made my head spin — and I'm not sure how I feel about it...

But this conversation jogged my memory — who shared this? Was it... arkade, at one point?

I don't know if I shared it when I ran across it.

Yeah so this kind of make makes my head hurt a little bit to go down this route and this is kind of like this.

This is somewhere like, well, why are we using text templating to, you know, parameterize values for Helm, and when is this madness going to stop, and when are we going to use a more formal language to do it?

And how can we.

And it also speaks to presenting a CLI for your team to install things, and giving you, like, the ability to do all the testing exposed by Go.

I mean this could be done in any language right.

This could be done in Python or Ruby or whatever.

So Dave picked Go, obviously, for this project, and what makes my head hurt —

The end result is, I think, something that Blaze might have shared.

What in the end makes my head hurt is that for every app you want to add to the catalogue, you literally whip up a whole bunch of Go code to generate the actual YAML to install.

So yeah look no thanks.

You definitely solved, like — yeah, you're not templating YAML anymore,

but the barrier to entry and the maintenance around this...

I really wonder if this is going to be a long-lived project. Mind you, anything that puts it into a CLI certainly makes it easier to test the waters.

But it doesn't make it any easier to pipeline YAML — it certainly makes it more interesting, though.

But you know, it's like the `eksctl` command.

You know, like, why not?

Why would I use that if I have Terraform and kubectl?

You know, it flies in the face of using, kind of, GitOps, where you want to have a declarative description — a document that describes how to deploy it.

I want to have a t-shirt that says declarative nation on it.

That's the underpinning thing. I mean, I don't know — I could be wrong, but I don't think the makers of arkade are interested in, you know, production-grade types of deployments.

You know, because arkade is strongly correlated with k3sup, which is strongly correlated with k3s, which is, you know, "have a Kubernetes cluster up and running in 10 seconds."

You know, it's all about getting something going as quickly and easily as possible — and, well, hell yeah.

`arkade install cert-manager` — if that's all I have to fucking do, then...

Cool, you know. Yeah, I guess where it's interesting is, like, what we see here — to me this is... now, I know there are other examples.

I'm sure you guys can give some examples, but this is kind of like our helmfiles repository, where we're distributing an opinionated installation as a bundle of helmfiles.

Are there any other distributions like that — like our helmfiles, but using other tooling — that you guys can point to?

I appreciate it.

Just so I can get inspiration feel free to share that.

It doesn't have to be helmfile-based.

It doesn't have to be helmfile-based or what we do — if anybody else sees another repository with a dozen or so helmfiles,

do let me know about that too, because, you know, that's how we learn from each other: seeing the patterns other people are using.

Yeah, as to the other tools — I'm going to try to get ours open source, no problem.

Yeah, there are other tools similar to helmfile, but none as comprehensive and capable in my mind.

So we used Helmsman for a long time — over a year — and then we switched to helmfile. And helmfile, by the way, has reached almost 2,000 stars now.

So it's pretty exciting.

Hey, if anybody knows roboll — or whatever his name is — feel free to let him know that I'm OK.

I'd be open to working for Datadog now.

Yeah, he's not very involved at all anymore in the helmfile project.

I saw him chime in briefly when there was talk of contributing the project to a cloud-native foundation.

But that was the last engagement I saw from him.

I was so confused there for a second, Andrew — I thought you were talking, but I think it was someone else.

No I'm sorry.

No I was just stretching out those.

No, I was saying we have a helmfile repo, you know.

Of not a dozen, but maybe half a dozen now — and trying to get them open source is going to be a challenge.

Yeah but I'm going to try to.

Yeah you do.

Hit me up if you do get those open-sourced.

Yeah Yeah.

Because our whole philosophy was... So, like, our biggest one is GitLab by far, you know, because GitLab's Helm chart is a fucking monstrosity.

Sorry pardon my French.

That's an understatement.

We want our people, when they run, you know, `helmfile install gitlab` or whatever the command ends up being — well, beyond the defaults there are a bunch more required parameters.

But once they have met all the required parameters, it deploys production-ready. Like, it uses an external Postgres database — by default the chart uses a container, you know, which sometimes is OK.

But for us it's not right.

So what we say anyway.

So what we say instead is: we don't care what you run, but it has to be external.

So if you want to run a container on the side, and that's how you're going to run your "production-ready" GitLab, then fine.

But we will not you know turn on the little flag in the helm chart to run the internal Postgres database.

Same thing with like cert manager we don't turn on the internal cert manager.

Same thing with MinIO.

We don't turn on MinIO.

We make them, you know, say: OK, what are the names of the S3 buckets, and, you know, if it's not AWS S3, what's the URL of the S3 host?
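In the GitLab chart, that general shape looks roughly like the following sketch — treat the exact keys as approximate from memory and verify them against the chart's documentation, and the hostname is a made-up placeholder:

```yaml
# values.yaml (sketch) — disable the bundled dependencies and point at
# externally managed services instead; verify key names against the
# GitLab chart docs before relying on them
postgresql:
  install: false          # don't run Postgres as an in-cluster container
certmanager:
  install: false          # cert-manager is managed separately
global:
  minio:
    enabled: false        # use real object storage instead of bundled MinIO
  psql:
    host: gitlab-prod.abc123.us-east-1.rds.amazonaws.com   # hypothetical external DB
```

The philosophy above is exactly this: the flags for the bundled services stay off, and the required parameters force you to name the external replacements.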

That can tend to work against you with helmfiles to a breaking amount as well. But the little experiment has been very nice, you know, because it's been the number one challenge when it comes to Helm, especially with all these open source charts: it's like, OK, fine, I see that there are 500 different configuration parameters I can set, and that's awesome.

Which ones do I set for this to be production ready.

I have no idea.

And there's no documentation to tell me. It's a little bit like that matrix we saw for "how many clusters do I need" — that matrix is going to be different from yours, and different from place to place.

Sure, there's this concept of composability, you know, which is like the property of being able to copy something out of one context and paste it into another.

And that seems to be... like, I'm having a lot of PTSD around the Helm stuff, and like Maven — like pom.xml, you know. Of course you have PTSD if you've been through it: you want to take this snippet of XML and drop it in, or, you know, just moving stuff from one file to another and not having the right context — that was one of the banes of my existence, and I don't know if helmfile simplifies that or makes things more modular.

I have no experience.

I mean, I think helmfile gives you the opportunity to be internally consistent within an organization — composability that way — by having a common interface.

But my helmfiles aren't going to compose with your helmfiles in that same way; you almost always have to put your own framework around what your needs are.

So I found that doing things like not enabling the default ingress and things like that for a lot of charts certainly helps — you know, keeping those things completely segmented and in their own module.

So I've got to keep things, you know, generic. But you talked about enabling ingress —

So the reason I asked about the process for giving people the keys to the castle and stuff is because we gave the keys to the castle too soon to someone, and a bunch of ingresses were created to things that shouldn't have been created — and trust issues.

So now I get to look at admission controllers for all kinds of things: for ingresses, for Istio virtual services, for services.

If it's a service and it's not a ClusterIP, I want to reject it. And so I'm going to use OPA — I'm excited.

Yeah, yeah — you could, like, come up with a little demo or something. We'd really, really like that if you do.

Yeah Yeah it looks deceptively easy.

I guess I'll say I'll put it that way.

Because, I mean, you know, the OPA docs are like: hey, here's eight lines of code and that'll do it, you know.

So we'll see — you know, there is always a big gap between hello world and production.

Yeah but I mean it should be very straightforward.

You know because admission controllers look at a particular type of resource right and just do checks on it.

So it should say: for all services, if type does not equal ClusterIP, reject — other than a small whitelist, like NGINX ingress needs a LoadBalancer. But we'll see, yeah.
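As a hedged sketch of that policy — using OPA Gatekeeper's ConstraintTemplate format, with the template name and whitelist entry made up for illustration — it might look something like:

```yaml
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8sclusteriponly            # hypothetical template name
spec:
  crd:
    spec:
      names:
        kind: K8sClusterIPOnly
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sclusteriponly

        # Services allowed to be non-ClusterIP (e.g. the ingress controller's
        # LoadBalancer) — hypothetical name, adjust for your cluster
        whitelist := {"nginx-ingress-controller"}

        # Reject any Service whose type is not ClusterIP and isn't whitelisted
        violation[{"msg": msg}] {
          input.review.object.kind == "Service"
          type := input.review.object.spec.type
          type != "ClusterIP"
          not whitelist[input.review.object.metadata.name]
          msg := sprintf("Service type %v is not allowed", [type])
        }
```

A matching `K8sClusterIPOnly` constraint resource would then scope this to the namespaces you care about — which is roughly the "eight lines of code" the docs promise, plus the production plumbing around it.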

I think this is interesting, because this is the fourth or fifth office hours in a row where OPA's come up.

So I think that is an indicator of how relevant that is.

If you're doing anything serious with production on Kubernetes.

We've got five minutes left here.

Are there any closing thoughts or last minute questions.

It was nice to finally meet you at SCaLE.

Oh Yeah.

That's awesome.

Thanks for poking in and saying hi. Did you go to any other interesting talks at SCaLE? The one we were talking about was primarily around Java apps inside of Kubernetes — using things like GraalVM to minimize startup time, using things to compress Docker images down. We've got some apps that take 60 to 90 seconds to start up, and the prospect of getting that down to two or three seconds is more than just a little enticing — let alone milliseconds.

Is that largely the image size, or is it the framework — Tomcat or whatever?

Spring. Yeah, Spring — I'm in the same boat, dude.

I hear you.

We've got a bunch of Spring Boot apps, and they're slow — to the point that, whenever we're setting our limits, for memory we set the request and the limit

the same, and then for the CPU we set the request and we just don't set the limit, because startup times get a little better if we just let it consume everything it can.
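That pattern — memory request equal to the memory limit, a CPU request with no CPU limit — looks like this in a container spec (the numbers are illustrative, not from the call):

```yaml
# Container resources (illustrative values): guaranteed-style memory,
# burstable CPU so a slow-starting JVM can soak up idle cores at boot
resources:
  requests:
    memory: "2Gi"
    cpu: "500m"
  limits:
    memory: "2Gi"    # same as the request — no memory oversubscription
    # no cpu limit on purpose: startup can burst across spare CPU
```

Memory request == limit avoids the OOM-kill surprises of oversubscription, while omitting the CPU limit avoids throttling during startup.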

I've been reading that you shouldn't set CPU limits, like, really for anything, because it's different than memory limits.

It doesn't work the same way, and you're just artificially limiting things when they don't need to be limited.

Yeah — originally we were kind of trying to avoid all types of oversubscription, and that's great for memory, but it's kind of screwing us when it comes to CPU.

Yeah, because I read a lot — I read an article somewhere, I'd have to find where.

But it was, you know, talking about how the CPU scheduler, or whatever it's called, handles the requests and the limits.

You know, what it was saying is: if you've got five pods and they all request 100% CPU, the system will happily hand each of the pods 20% of its CPU.

No problem at all.

You know and it it can just kind of figure it all out.

And that made a lot of sense to me.

So I've stopped using CPU limits — or if I have to use CPU limits because there's a LimitRanger, then I'll do like 10.

I think the problem with no limits, to me, is quality of service.

If you want to provide any form of service guarantees and maintain latencies, then that could be a bad idea.

Do you have a link to that?

When you find it, it'd be great if you shared it.

I'll try to find it. But I was going to say — have you looked at Quarkus for your Java apps? — Not heard of that.

Personally, it looks really interesting from what I hear. It's called Quarkus.

I want to hang out with one.

I don't think.

I'm not sure if Spring is supported, but it compiles your Java app to native code.

So, I mean — like the joke I made not too long ago — you know, two to three seconds?

No, try

two to three milliseconds. It's real.

Like, it's as fast as, you know, a Go app that's compiled to native code. And you can do the same things, like install your Java app that's built using Quarkus onto the scratch container, which is, you know —

It's just a bare Linux image with nothing in it, right?

It's literally the smallest it can possibly get.

Basically a native Go app.

The guy did mention Quarkus, and to be perfectly forthcoming, I had a hard time following some of what he said because he had a very heavy accent — but that's what he was talking about.

Compiling it to a native app.

That's cool.

Well, if you continue your explorations on that and have any wins, it'd be great if you follow up and share those with us on a subsequent office hours.

I'll do it again.

Sounds like he's quite a bit ahead of me already.

He knows the names of the things.

So all right everyone.

Thank you all for sharing.

I learned a few new things this Office Hours.

As always, remember to register for our weekly office hours if you haven't already — go to cloudposse.com/office-hours and you'll receive an invite on your calendar.

Thanks again for all your time.

A recording of this call is going to be posted in the office hours channel and syndicated to our podcast — that's podcast.cloudposse.com.

So you can tune in.

However you listen to your podcasts.

See you guys next week.

Same time, same place thanks.

You guys have a good one everybody.

Author Details
Erik Osterman is a technical evangelist and insanely passionate DevOps guru with over a decade of hands-on experience architecting systems for AWS. After leading major cloud initiatives at CBS Interactive as the Director of Cloud Architecture, he founded Cloud Posse, a DevOps Accelerator that helps high-growth Startups and Fortune 500 Companies own their infrastructure in record time by building it together with customers and showing them the ropes.