20 messages
Brad McCoy almost 5 years ago
Hi everyone, I have two online events coming up that will be recorded, covering CKAD/CKA study and exam tips. Here are the links for those interested:
Matt Gowie almost 5 years ago
Not exactly a Kubernetes question, but I figured folks in this channel would know whether what I'm talking about exists. Does anyone know of a network/TCP proxy tool out there that will do a manage-and-forward pattern (my own made-up term for describing this) for long-lived TCP connections?
I have a client running on K8s, and one of their primary microservices holds long-lived TCP socket connections with many thousands of clients through an AWS NLB. The problem is that whenever we do a deployment and update those pods, the TCP connections require a reconnection, which causes problems on the client side. So to provide a better experience for the clients, we're looking at what we can do to keep those TCP connections alive. My first thought is a proxy layer that manages the socket connections with the clients and then forwards socket connections to the actual service pods. That way, even if the pods are swapped out behind the scenes, the original socket connection is still up and there are no adverse effects on the clients.
Shreyank Sharma almost 5 years ago
Hi All,
we have a 4-node Kubernetes cluster in production, deployed using kops on AWS:
3 worker nodes and one master
nodes are c4.2xlarge, 16 GB memory each
Along with other pods we have Elasticsearch deployed using Helm; there we have:
3 elasticsearch-data pods consuming 4000 MB of memory each
3 elasticsearch-master pods consuming 2600 MB of memory each
3 elasticsearch-client pods consuming 2600 MB of memory each
All are distributed among the nodes, but on one node one of the elasticsearch-data pods keeps restarting, about 2-3 times daily, always on that same node.
I described the restarted pod, which just says:
Last State:  Terminated
  Reason:    OOMKilled
  Exit Code: 137
  Started:   Tue, 02 Mar 2021 20:31:07 +0530
  Finished:  Wed, 03 Mar 2021 17:46:02 +0530
and there are no events.
When I checked the syslog of the node on which the pod restarted, it shows:
C2 CompilerThre invoked oom-killer: gfp_mask=0x24000c0(GFP_KERNEL), nodemask=0, order=0, oom_score_adj=901
C2 CompilerThre cpuset=6126d0823d683f51d04603c4c6464c030464d3748c916c1a46621936846aac01 mems_allowed=0
CPU: 2 PID: 7743 Comm: C2 CompilerThre Not tainted 4.9.0-9-amd64 #1 Debian 4.9.168-1
Hardware name: Amazon EC2 c5.2xlarge/, BIOS 1.0 10/16/2017
..............
The version of Elasticsearch is 6.7.0.
Has anyone experienced the same issue? How can we solve these pod restarts?
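A minimal sketch of how to compare the container limit against the JVM heap on the affected pod (the pod name is an example, and ES_JAVA_OPTS is how the 6.x Elasticsearch images usually receive -Xms/-Xmx):
# memory limit Kubernetes enforces on the container (first container assumed)
kubectl get pod elasticsearch-data-0 -o jsonpath='{.spec.containers[0].resources.limits.memory}'
# heap options the Elasticsearch JVM was started with
kubectl exec elasticsearch-data-0 -- env | grep -i java_opts
If the heap plus the JVM's off-heap usage comes close to the 4000 MB limit, the OOMKilled/137 pattern above is the expected outcome.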
Shreyank Sharma almost 5 years ago
Hi All,
Under what conditions will a pod exceed its memory limit?
In my Kubernetes cluster I have Elasticsearch deployed using Helm; the Elasticsearch version is 6.7.0.
We have 3 elasticsearch-data pods, 2 elasticsearch-master pods, and 1 client.
The memory limit for the elasticsearch-data pods is 4 GB, but one of the data pods gets restarted about 5-6 times every day (OOM killed). When I checked the pod's memory and CPU usage in Grafana, I can see that one of the elasticsearch-data pods is using twice the memory limit (8 GB).
So I wanted to know under what conditions a pod will exceed its memory limit.
Also, in the syslog when the oom_kill happened:
C2 CompilerThre invoked oom-killer: gfp_mask=0x24000c0(GFP_KERNEL), nodemask=0, order=0, oom_score_adj=901
[28621138.637578] C2 CompilerThre cpuset=441fa5603f64f86888937bc911269fca47dfcdb318648cc1ac0832cdfb07134d mems_allowed=0
[28621138.639850] CPU: 5 PID: 7749 Comm: C2 CompilerThre Not tainted 4.9.0-9-amd64 #1 Debian 4.9.168-1
[28621138.641757] Hardware name: Amazon EC2 c5.2xlarge/, BIOS 1.0 10/16/2017
[28621138.643152] 0000000000000000 ffffffff85335284 ffffa53882de7dd8 ffff8dda11dec040
.........
[28621138.662399] [<ffffffff85615f82>] ? schedule+0x32/0x80
[28621138.663485] [<ffffffff8561bc48>] ? async_page_fault+0x28/0x30
[28621138.669097] memory: usage 4096000kB, limit 4096000kB, failcnt 383494862
Here it shows around 4 GB,
but at the end:
[ pid ]   uid  tgid  total_vm     rss  nr_ptes  nr_pmds  swapents  oom_score_adj  name
[28621138.876368] [24691]    0  24691      256       1        4        2         0  -998  pause
[28621138.878201] [ 7436] 1000   7436  2141564  989996     2342       11         0   901  java
[28621138.879983] Memory cgroup out of memory: Kill process 7436 (java) score 1870 or sacrifice child
[28621138.881978] Killed process 7436 (java) total-vm:8566256kB, anon-rss:3941732kB, file-rss:18252kB, shmem-rss:0kB
here it is showing total-vm as 8 GB.
I am confused why it shows 4 GB in one place and 8 GB in another.
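A rough way to see what the kernel's 4 GB figure refers to, assuming cgroup v1 paths (which match the Debian 4.9 kernel in the log) and an example pod name:
# the limit and current usage the OOM killer compares; this counts resident memory and page cache, not virtual memory like total_vm
kubectl exec elasticsearch-data-0 -- cat /sys/fs/cgroup/memory/memory.limit_in_bytes
kubectl exec elasticsearch-data-0 -- cat /sys/fs/cgroup/memory/memory.usage_in_bytes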
Issif almost 5 years ago
I just released a kubectl plugin I developed at my day job, maybe someone will find it useful: https://github.com/qonto/kubectl-duplicate
btai almost 5 years ago (edited)
What's the recommendation for pod-level IAM roles? I know the ones used the most initially were kube2iam and kiam, but IIRC one or both of them had rate-limiting issues (which is why I avoided them initially). I know AWS came out with one here, and I was just curious what people are using nowadays, or whether there's a general consensus on the best one: https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/ Personally I'm using kops, so I'm curious if there are issues integrating with AWS IRSA.
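For context, the IRSA approach from that AWS post attaches the role at the service-account level rather than at the node, roughly like this (the role name and ARN are hypothetical, and on kops the OIDC provider and pod identity webhook would have to be set up separately):
kubectl annotate serviceaccount my-app eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/my-app-irsa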
Mr.Devops almost 5 years ago
Hi, does anyone have a recommended approach for injecting passwords into kops templates using cloud-init?
Fernanda Martins almost 5 years ago
Hello everyone, I am configuring federated Prometheus to monitor multiple clusters for the first time. Any tips on how to organize the operators, etc.? Thanks!
Azul almost 5 years ago
Anyone using EKS with Fargate profiles? I am on a project where we started using it, and we had to submit a support request to increase the maximum number of profiles in the cluster from 10 to 20. A Fargate profile maps to a Kubernetes namespace, so I'm essentially looking at 20 namespaces in these EKS clusters. That is, in my view, a fairly small number, and I expect it to grow as we add more apps onto the cluster. It may be worth mentioning at this point that by default I can launch about 1000 Fargate nodes on an EKS cluster, so EKS was designed to scale. Anyway, the docs list the Fargate profiles quota as one that can be raised through the console, but that's incorrect, so I raised a support request to do this. The feedback I received was that they would raise it with the Fargate service team as it was a fairly large increase. My thought here: is he serious? We're talking about 20 namespaces/Fargate profiles. What exactly is large about this request? A Google search didn't show any relevant posts about the number of Fargate profiles, so I thought of coming here to ask: who here is using Fargate on EKS, and how many namespaces are you using?
M Hunter almost 5 years ago
Hi. Can anyone recommend articles/videos on configuring k8s on an air-gapped system? The OS is CentOS 7. Thanks!
Eric Berg almost 5 years ago
Anybody have thoughts on where I should use .Release.Name vs .Chart.Name?
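A quick illustration of the difference, assuming a chart directory named myapp (both names below are placeholders):
# the release name is chosen at install time; the chart name comes from Chart.yaml
helm install prod-api ./myapp
# inside the templates, .Release.Name renders as "prod-api" and .Chart.Name as "myapp",
# so resource names that must be unique per install usually build on .Release.Name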
Shreyank Sharma almost 5 years ago
Hi All, we have an Elasticsearch cluster running in our Kubernetes cluster, deployed using a Helm chart, and fluentd is sending logs from each node.
We have 2 data nodes, 2 master nodes, and a client node, and since yesterday the data nodes have been in a not-ready state; because of that the client keeps getting restarted, as do fluentd and kibana:
elastichq-7cf55c6bbc-998pq              1/1   Running   0     1y
elasticsearch-client-5dbccbd776-7kpwk   1/1   Running   79    1d
elasticsearch-data-0                    0/1   Running   18    1d
elasticsearch-data-1                    0/1   Running   21    1d
elasticsearch-master-0                  1/1   Running   0     1d
elasticsearch-master-1                  1/1   Running   0     1d
fluentd-fluentd-elasticsearch-hhh8v     1/1   Running   147   1y
fluentd-fluentd-elasticsearch-ksfnx     1/1   Running   110   1y
fluentd-fluentd-elasticsearch-lnbll     1/1   Running   94    1y
kibana-b7768db9d-r57st                  1/1   Running   347   1y
logstash-0                              1/1   Running   6     1y
After describing the fluentd pod, I found:
Killing container with id <docker://fluentd-fluentd-elasticsearch>: Container failed liveness probe.. Container will be killed and recreated.
After referring to some links I found:
------------
Data nodes — store data and execute data-related operations such as search and aggregation
Master nodes — in charge of cluster-wide management and configuration actions such as adding and removing nodes
Client nodes — forward cluster requests to the master node and data-related requests to data nodes
-------------------------------------------------------
The kubectl get events output says the readiness probe failed for the elasticsearch-data pods (we increased the timeout values and recreated all pods again),
so I am assuming the client is failing because the elasticsearch-data pods are in a not-ready state. Also, in one of the data pods I can see:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to data/java_pid1.hprof ...
Unable to create data/java_pid1.hprof: File exists
The data pod's memory limit is 4 GB and the heap is 1.9 GB, which I think is fine.
Since the master node is responsible for adding and removing nodes, I went inside the master pod and ran:
curl localhost:9200/_cat/nodes
elasticsearch-client-5dbccbd776-7kpwk
* elasticsearch-master-1
elasticsearch-master-0
The data pods are not listed here. After checking the logs of the master node, I can see a lot of:
[INFO ][o.e.c.s.ClusterApplierService] [elasticsearch-master-0] removed {{elasticsearch-data-1}
so the master keeps adding and removing the data pods, and in the master-1 logs I can see:
org.elasticsearch.transport.NodeDisconnectedException:
We did a
helm upgrade <chartname> -f custom_valuefile.yaml --recreate-pods
which did not work.
Is there any workaround or solution for this behaviour? Thanks in advance.
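A few checks that might narrow down why the data pods never become ready, assuming the same pod names and the default port 9200 used above:
# overall cluster state and which nodes have actually joined
kubectl exec elasticsearch-master-0 -- curl -s 'localhost:9200/_cluster/health?pretty'
kubectl exec elasticsearch-master-0 -- curl -s 'localhost:9200/_cat/nodes?v'
# what the readiness probe checks and why it fails on the data pods
kubectl describe pod elasticsearch-data-0 | grep -A3 Readiness
kubectl get events --field-selector involvedObject.name=elasticsearch-data-0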
btai almost 5 years ago
I wasn't aware of this, but the ridiculously low allowed pod count on EKS (e.g. 29 pods on an m4.large) is tied specifically to the AWS VPC CNI. Apparently we can skirt around that issue by uninstalling the default CNI and installing a different one. Has anyone tried doing this? https://docs.projectcalico.org/getting-started/kubernetes/managed-public-cloud/eks
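For reference, the linked Calico doc describes the swap roughly as follows for a cluster with no existing workloads (the manifest URL may have changed since, and the kubelet max-pods setting on the nodes still has to be raised separately):
# remove the AWS VPC CNI and install Calico in VXLAN mode
kubectl delete daemonset -n kube-system aws-node
kubectl apply -f https://docs.projectcalico.org/manifests/calico-vxlan.yaml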
Padarn almost 5 years ago
We’re looking for a nice way to orchestrate performance tests in a k8s cluster, any suggestions?
An example scenario: we want to test the performance of using Redis vs using MinIO as an object cache. We would like to be able to easily set up, run the test, and tear down.
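One lightweight pattern, as a sketch: treat each run as an ephemeral namespace with the system under test installed via Helm and the load generator run as a Job (the chart choice and the load-generator image below are placeholders):
# setup
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install cache-under-test bitnami/redis -n perf --create-namespace
# run the test as a Job and wait for it to finish
kubectl create job -n perf cache-bench --image=example/load-generator:latest
kubectl wait -n perf --for=condition=complete job/cache-bench --timeout=30m
kubectl logs -n perf job/cache-bench
# teardown
helm uninstall -n perf cache-under-test
kubectl delete namespace perf
Swapping Redis for MinIO would only change the chart and the endpoint the load generator targets.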
Andrea almost 5 years ago
Hi, to anyone who's running Windows worker nodes, can you please share/suggest how to collect the pod logs? On the Linux nodes I've been fairly happy with fluent-bit (deployed as a Helm chart): fluent-bit collects the logs and sends them to Elasticsearch. I'm not having much luck with the same procedure on Windows though...
Christian almost 5 years ago
Do people generally use managed node groups now?