prometheus
Archived0401
erik 12 months ago
archived the channel
Sean over 2 years ago (edited)
Anyone know how to query Prometheus to determine:
• a) What the current cardinality of a metric is?
• b) What the cardinality will be with some labels removed?
For example, a metric like this (with millions of entries):
mymetric_seconds_sum{
  app="myapp",
  instance="192.168.33.93:8983",
  job="kubernetes-pods",
  kubernetes_namespace="myapp",
  kubernetes_pod_name="myapp-76b58548cb-8sd8h",
  method="CreateMagicEvent",
  pod_template_hash="76b58548cb",
  status="ABORTED"
}
Then that same metric reduced down to only these labels:
app
kubernetes_namespace
method
status
^ thinking a script that parses the group expr: and queries the Prometheus API for the series count.
Sean over 2 years ago
How are y'all scaling prometheus-server when 1 instance can't handle the huge amount of metrics (so starts OOMing)?
We are adopting Prometheus-Operator and debating between:
• a) "one prometheus per namespace": the benefit for us would be that we have >100 namespaces and a few dozen teams, so we can have reporting and isolate impact to each namespace.
• b) "functional sharding": Prometheus shard X scrapes all pods of Services A, B and C, while shard Y scrapes pods from Services D, E and F.
• c) "automatic sharding": the targets are assigned to Prometheus shards based on their addresses.
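On the cardinality question in Sean's earlier message, here is a minimal sketch rather than a definitive tool. In PromQL, count() over a selector gives the number of series currently present, and count(count by (<kept labels>) (...)) gives the number of distinct label combinations that would remain after dropping the other labels. The helper names and the PROM_URL default are assumptions for illustration; /api/v1/query is the standard Prometheus instant-query endpoint. On a metric with millions of series these queries can be expensive, so treat this as a starting point.

```python
import json
import urllib.parse
import urllib.request

# Assumed address of the Prometheus server; adjust for your environment.
PROM_URL = "http://localhost:9090"


def current_cardinality_query(metric: str) -> str:
    """PromQL that counts the series currently present for `metric`."""
    return f"count({metric})"


def reduced_cardinality_query(metric: str, keep_labels: list[str]) -> str:
    """PromQL that counts distinct combinations of `keep_labels` only."""
    by = ", ".join(keep_labels)
    return f"count(count by ({by}) ({metric}))"


def instant_query(expr: str, base_url: str = PROM_URL) -> float:
    """Run an instant query via /api/v1/query and return its single value."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=30) as resp:
        body = json.load(resp)
    result = body["data"]["result"]
    # count() over an empty selection returns no samples at all.
    return float(result[0]["value"][1]) if result else 0.0


if __name__ == "__main__":
    metric = "mymetric_seconds_sum"
    keep = ["app", "kubernetes_namespace", "method", "status"]
    print(current_cardinality_query(metric))
    print(reduced_cardinality_query(metric, keep))
    # Uncomment when a Prometheus server is reachable at PROM_URL:
    # print(instant_query(reduced_cardinality_query(metric, keep)))
```

Comparing the two numbers for the same metric shows how much the label reduction would save before any relabeling is changed.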
Hamza over 2 years ago
Hey, I've deployed Prometheus using the prometheus-community/kube-prometheus-stack helm chart, but for some reason the targets are not up, as you can see. Any idea?
It's deployed on k3s, but I assume it will be the same as k8s regarding this issue of targets.
Sam almost 3 years ago
Hello everyone, does anyone have instructions on how to monitor EC2 and RDS instances using AWS managed Prometheus and Grafana?
zadkiel about 3 years ago
Hey there, I'd like to delete a label from a single metric: nginx_ingress_controller_ssl_certificate_info.
I'm using the labeldrop action in the metricRelabelings section. I tried the following, but it's not supported by Prometheus, and I get an error in the logs saying that only the regex option can be used with labeldrop:
- action: labeldrop
  sourceLabels:
    - __name__
    - pod
  regex: 'nginx_ingress_controller_ssl_certificate_info;pod'
If I simply use
- action: labeldrop
  regex: 'pod'
it removes the pod label from all metrics exported by nginx.
Are you aware of another way to delete a single label from a single metric?
Thank you
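One commonly used approach, offered here as an assumption and not something confirmed in this thread: labeldrop only matches label names, so use a replace rule keyed on __name__ that sets pod to an empty string, since Prometheus removes labels whose value is empty. In ServiceMonitor terms that would be action: replace, sourceLabels: [__name__], regex: the metric name, targetLabel: pod, replacement: "". The Python below is only a toy emulation of that one rule, to show that other metrics keep their pod label. apply_replace and the sample label sets are hypothetical, not a Prometheus API.

```python
import re

# Mirrors the relabel rule fields; key names follow Prometheus' relabel_config.
RULE = {
    "action": "replace",
    "source_labels": ["__name__"],
    "separator": ";",
    "regex": "nginx_ingress_controller_ssl_certificate_info",
    "target_label": "pod",
    "replacement": "",
}


def apply_replace(labels: dict[str, str], rule: dict) -> dict[str, str]:
    """Toy emulation of the `replace` action on one series' label set."""
    joined = rule["separator"].join(
        labels.get(name, "") for name in rule["source_labels"]
    )
    # Prometheus anchors relabel regexes, so a full match is required.
    if re.fullmatch(rule["regex"], joined) is None:
        return dict(labels)  # rule does not apply; labels unchanged
    out = dict(labels)
    # Only literal replacements are handled in this sketch (no $1 references).
    value = rule["replacement"]
    if value == "":
        out.pop(rule["target_label"], None)  # empty value removes the label
    else:
        out[rule["target_label"]] = value
    return out


# Hypothetical series for illustration.
cert = {"__name__": "nginx_ingress_controller_ssl_certificate_info", "pod": "p1", "host": "a"}
other = {"__name__": "nginx_ingress_controller_requests", "pod": "p1"}
print(apply_replace(cert, RULE))
print(apply_replace(other, RULE))
```

Only the certificate metric loses its pod label; the second series passes through untouched because its name does not match the regex.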
ghostface over 3 years ago
I'm using the kube-prometheus-stack chart and I'm trying to get my pods' labels into the metrics.
See my pod labels below.
Labels: app=xx-failing-nginx
        app.kubernetes.io/instance=xx-failing-nginx
        app.kubernetes.io/name=xx-failing-nginx
        xx.net/dataclassification=confidential
        xx.net/environment=uat
        xx.net/networkposition=internal
        xx.net/owner=edge
        xx.net/priority=P3
        xx.net/product=hft
        xx.net/service=xx-failing
        env=uat
        helm.sh/chart=xx-service-0.0.33
I added the below to the values file for the chart, in order to bring all of the labels from the pod into the metrics.
kube-state-metrics:
  prometheus:
    monitor:
      relabelings:
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
But this hasn't worked... the only difference it's made is the below:
kube_pod_container_status_waiting_reason{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="kube-prometheus-stack", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.6.0", container="app", endpoint="http", helm_sh_chart="kube-state-metrics-4.20.2", instance="10.1.1.xx:xx", job="kube-state-metrics", namespace="uat", pod="xx-failing-nginx-5cb58d8977-8r46t", pod_template_hash="646b9498dd", reason="CrashLoopBackOff", release="kube-prometheus-stack", service="kube-prometheus-stack-kube-state-metrics"}
It seems to have added only the labels and values from the kube-state-metrics service monitor instead of the labels from my pod above.
Alex Box over 3 years ago
Hi all, wondering if anyone has a solution they could share for running analytics over historical alerts from alertmanager? For example, "alert X fired 10 times in July and 30 times in August". This would allow the monitoring team in company Y to investigate the biggest drain for the on-call shift over time. I understand it's a design decision of AM to not persist any state but in medium/large environments I think it's an important area that often gets overlooked. Thanks.
michael sew over 3 years ago
Q: has anybody set up AWS CloudWatch metrics to Prometheus using CloudWatch Metric Streams? Newbie DBA, trying to get RDS metrics into prometheus/grafana. I'm told that a std exporter would start to cost $$ because of constant polling. Would the following architecture work?
cloudwatch metric streams => Kinesis Data Firehose => Prometheus ??
Niv Weiss over 3 years ago
Hey all, I've been trying to install prometheus (prometheus-community/prometheus) on EKS Fargate (serverless) with AMP (Amazon Managed Prometheus) and I'm getting this error:
Readiness probe failed: Get "http://10.0.xx.xxx:9090/-/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Liveness probe failed: Get "http://10.0.xx.xxx:9090/-/healthy": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Failed to create pod sandbox: rpc error: code = Unavailable desc = error reading from server: read unix @->/run/containerd/containerd.sock: read: connection reset by peer
Status:
terminated - OOMKilled (exit code: 255)
This is the command that I'm running when I'm installing prometheus:
helm install -n prometheus prometheus -f prometheus-variables.yml prometheus-community/prometheus
prometheus-variables.yml:
nodeExporter:
  enabled: false
alertmanager:
  enabled: false
serviceAccounts:
  server:
    name: amp-iamproxy-ingest-service-account
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::xxxxxxxxxx:role/amp-iamproxy-ingest-role"
server:
  persistentVolume:
    enabled: false
  remoteWrite:
    - url: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/xxxxxxxxxxxxx/api/v1/remote_write
      sigv4:
        region: us-east-1
      queue_config:
        max_samples_per_send: 1000
        max_shards: 200
        capacity: 2500
Can someone help me understand what can be the problem?
Balazs Varga over 4 years ago
How and where can I see that my metrics relabel works and removed the unused labels?
michael sew over 4 years ago
hi guys, can somebody help explain how exporters work? I'm trying to set up a database exporter (postgres, oracle).
am I supposed to run an exporter binary on the prometheus VM or container?
what if the target DBs are in separate AWS accounts and AWS regions? Just curious what the topology is supposed to look like.
Balazs Varga over 4 years ago
hello all, we have a Prometheus in our cluster on AWS, and as I see it the node where Prometheus is running has massive incoming data, over 80MB/sec. Is that expected? Where can I find docs about that?
Abel Luck almost 6 years ago
I'm writing a small exporter for a custom service, does anyone know what an exporter's /metrics should return when requests to the underlying service fail entirely, and so no metrics can be returned?
Erik Osterman (Cloud Posse) almost 6 years ago
Adding @U010XGY9B46 bot
Erik Osterman (Cloud Posse) almost 6 years ago
@UUB28NLDS help keep tabs!
Mark Howard over 6 years ago
Question: anyone using Prometheus to monitor Azure PaaS resources?
tamsky over 6 years ago
@Tamlyn Rhodes Avoiding CloudWatch might be helpful as well. What's behind the personal or business requirement to use CloudWatch?
joshmyers over 6 years ago
@Tamlyn Rhodes perhaps influx/telegraf may be helpful, it is like a pipe on steroids and can do prom > cloudwatch with filters and processors etc
Andriy Knysh (Cloud Posse) over 6 years ago
if you have any improvements, PRs are welcome
Tamlyn Rhodes over 6 years ago
Have a good weekend
Tamlyn Rhodes over 6 years ago
OK, thanks for your help. I'll investigate other approaches.
Andriy Knysh (Cloud Posse) over 6 years ago
Andriy Knysh (Cloud Posse) over 6 years ago
Andriy Knysh (Cloud Posse) over 6 years ago
also take a look at these releases, it might help
Andriy Knysh (Cloud Posse) over 6 years ago (edited)
@Tamlyn Rhodes I'm not sure how to change the metric types between Prometheus and CloudWatch. https://github.com/cloudposse/prometheus-to-cloudwatch is just a proxy that scrapes Prometheus URLs, converts the format, and sends the metrics to CloudWatch. It does not assume anything. It might be possible to change the module to do some logic.
Tamlyn Rhodes over 6 years ago
Thanks. I think gauge type metrics (e.g. current memory usage) work OK but not all metrics can be tracked that way. For instance "total number of requests" needs a counter because it tracks events rather than a value. The problem arises because the Prometheus client in my container reports "123 requests have occurred since the container was restarted" but when this gets forwarded to CloudWatch it is interpreted as "123 requests occurred right now" and in the next update 30 seconds later it thinks there have been another 123 requests whereas there have been none.
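The mismatch described here is often handled by forwarding per-interval deltas instead of the raw cumulative counter. A minimal sketch, assuming the forwarder remembers the previous sample for each series. counter_delta is a hypothetical helper and not part of prometheus-to-cloudwatch. A drop in the value is treated as a counter reset (for example a container restart), in which case the new value itself is taken as the increase since the restart.

```python
from typing import Optional


def counter_delta(previous: Optional[float], current: float) -> float:
    """Increase of a cumulative counter between two consecutive scrapes."""
    if previous is None:
        return 0.0  # first observation: no baseline yet, so report nothing
    if current < previous:
        return current  # counter reset: count only what accumulated since restart
    return current - previous


# Example scrape sequence; the 4.0 follows a container restart.
samples = [123.0, 123.0, 130.0, 4.0, 10.0]
deltas = []
last = None
for value in samples:
    deltas.append(counter_delta(last, value))
    last = value
print(deltas)
```

With deltas, an idle 30-second interval forwards 0 and not another 123, and a restart does not produce a negative or inflated value.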
Erik Osterman (Cloud Posse) over 6 years ago
@Andriy Knysh (Cloud Posse) can probably help. But it probably comes down to using something like counters vs gauges. Prometheus supports multiple types of metrics whereas I am not sure if CloudWatch does (if so they don't call it gauge). From working with other monitoring systems it is common to support both. I'd be surprised if there isn't a way to achieve it.
Tamlyn Rhodes over 6 years ago
That leads to funny looking graphs like this in CloudWatch when a container is restarted.
Tamlyn Rhodes over 6 years ago
How do you deal with Prometheus and CloudWatch's different models of metrics gathering? Prometheus assumes reported metrics are cumulative totals, whereas CloudWatch assumes each reported value reflects the current value.
Tamlyn Rhodes over 6 years ago
Hello, I have a question about https://github.com/cloudposse/prometheus-to-cloudwatch
rohit over 6 years ago
thanks
rohit over 6 years ago
bad timing for me (I am in CST), will try to join
Erik Osterman (Cloud Posse) over 6 years ago
Every Wednesday at 11:30 am PST
rohit over 6 years ago
time please
Erik Osterman (Cloud Posse) over 6 years ago
Our next office hours is tomorrow
Erik Osterman (Cloud Posse) over 6 years ago
I'd be happy to give a demo
rohit over 6 years ago
nothing specific, we are thinking about using prometheus
Erik Osterman (Cloud Posse) over 6 years ago
is there something specific you're interested in?
Erik Osterman (Cloud Posse) over 6 years ago
#office-hours topics are really driven by whoever attends
rohit over 6 years ago
@Erik Osterman (Cloud Posse) Hi. Are you planning to talk about prometheus anytime soon during office-hours?
Igor Rodionov over 6 years ago
@Igor Rodionov has joined the channel
Erik Osterman (Cloud Posse) over 6 years ago
@Igor Rodionov deployed something like that. Not specifically for Lambdas though.
tamsky over 6 years ago
Has anyone here used https://github.com/weaveworks/prom-aggregation-gateway for aggregating metrics from Lambda functions?
Curious if anyone has field notes to share.
tamsky over 6 years ago
afaik, there's no custom SRV record type that provides more than service port and weight.
tamsky over 6 years ago
You don't typically need to adjust the path, as it's always /metrics on standard exporters. What's your use case where the /metrics endpoint also needs to be discoverable?
tamsky over 6 years ago
Port should be automatic if you're using SRV records
Abel Luck over 6 years ago
@tamsky thanks for the link to the example configs, I didn't know about that.
However, by custom path/port I meant a config such that the path and port are discovered from the DNS SRV entry
tamsky over 6 years ago
@Abel Luck have you checked out the example configs for DNS service discovery?
- https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config
- https://github.com/prometheus/prometheus/blob/release-2.10/config/testdata/conf.good.yml#L79-L123
That's a full example that touches most of the things you mentioned
- endpoint lookup by DNS name (Lines 96-99)
- custom /metrics path (Line 90)
- and to get a custom port, I'd replace Line 91 (scheme: https) with port: <custom#>