prometheus

Archived

Archive: https://archive.sweetops.com/prometheus/

erikover 1 year ago

archived the channel

Seanalmost 3 years ago(edited)

Anyone know how to query Prometheus to determine:
• a) What the current cardinality is of a metric?
• b) What the cardinality will be with some labels removed?
For example, a metric like this (with millions of entries):

mymetric_seconds_sum{
  app="myapp",
  instance="192.168.33.93:8983",
  job="kubernetes-pods",
  kubernetes_namespace="myapp",
  kubernetes_pod_name="myapp-76b58548cb-8sd8h",
  method="CreateMagicEvent",
  pod_template_hash="76b58548cb",
  status="ABORTED"
}

Then that same metric reduced down to only these labels:

app
kubernetes_namespace
method
status

^ thinking a script that parses the group expr: and queries the Prometheus API for the series count.
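
A rough sketch of that script idea, for what it's worth. `count(<metric>)` returns the current number of series, and `count(count by (<labels>) (<metric>))` returns how many series would remain if only those labels were kept. The helper names and the `base_url` value are made up for illustration; `local_cardinality` does the same arithmetic on label sets you already have, without a server.

```python
import json
import urllib.parse
import urllib.request


def cardinality_query(metric):
    """PromQL for the current number of series of a metric."""
    return f"count({metric})"


def reduced_cardinality_query(metric, keep_labels):
    """PromQL for the series count if only keep_labels were retained."""
    by = ", ".join(keep_labels)
    return f"count(count by ({by}) ({metric}))"


def local_cardinality(series, keep_labels=None):
    """Same calculation on a list of label dicts, without a server."""
    if keep_labels is None:
        return len({tuple(sorted(s.items())) for s in series})
    return len({tuple(s.get(k, "") for k in keep_labels) for s in series})


def run_instant_query(base_url, promql):
    """Run an instant query through /api/v1/query and return the count.

    base_url is an assumption, for example "http://localhost:9090".
    """
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    result = body["data"]["result"]
    # count() over no series yields an empty vector, so treat that as 0.
    return int(float(result[0]["value"][1])) if result else 0


# Tiny made-up sample: two pods reporting the same method/status pair.
sample = [
    {"app": "myapp", "method": "CreateMagicEvent", "status": "ABORTED",
     "kubernetes_pod_name": "myapp-a"},
    {"app": "myapp", "method": "CreateMagicEvent", "status": "ABORTED",
     "kubernetes_pod_name": "myapp-b"},
    {"app": "myapp", "method": "CreateMagicEvent", "status": "OK",
     "kubernetes_pod_name": "myapp-a"},
]
before = local_cardinality(sample)
after = local_cardinality(sample, ["app", "method", "status"])
```

On a metric with millions of series the inner `count by` may itself be expensive, so it is probably worth trying it on a narrower selector first.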

Seanalmost 3 years ago

How are y’all scaling prometheus-server when 1 instance can’t handle the huge amount of metrics (so starts OOMing)?

We are adopting Prometheus-Operator and debating between:
• a) “one prometheus per namespace”: benefit for us would be that we have >100 namespaces and a few dozen teams, so can have reporting and isolate impact to each namespace.
• b) “functional sharding”: the Prometheus shard X scrapes all pods of Service A, B and C while shard Y scrapes pods from Service D, E and F.
• c) “automatic sharding”: the targets will be assigned to Prometheus shards based on their addresses.
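
For option (c), the usual building block is the `hashmod` relabel action: hash the target address, take it modulo the shard count, and have each shard keep only the targets whose result equals its own number. The sketch below only simulates that idea. The MD5-based hash and the helper names are assumptions for illustration and may not match what Prometheus computes internally.

```python
import hashlib


def shard_for(address, num_shards):
    """Map a target address to a shard number in [0, num_shards)."""
    digest = hashlib.md5(address.encode("utf-8")).digest()
    # Use the low 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[8:], "big") % num_shards


def targets_for_shard(addresses, shard, num_shards):
    """Targets that one shard would keep (the 'keep' step after hashmod)."""
    return [a for a in addresses if shard_for(a, num_shards) == shard]


# Made-up addresses, split across three shards.
addresses = [f"10.0.0.{i}:8080" for i in range(1, 21)]
assignment = {s: targets_for_shard(addresses, s, 3) for s in range(3)}
```

Every target lands on exactly one shard and the mapping is stable across restarts, but changing the shard count reshuffles most targets.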

Hamzaalmost 3 years ago

Hey, I've deployed Prometheus using the prometheus-community/kube-prometheus-stack helm chart, but for some reason the targets are not up, as you can see. Any idea why?
it's deployed on k3s, but I assume it will be the same as k8s regarding this issue of targets

Samover 3 years ago

Hello everyone, does anyone have instructions on how to monitor EC2 and RDS instances using AWS managed Prometheus and Grafana?

zadkielover 3 years ago

Hey there, I'd like to delete a label from a single metric: nginx_ingress_controller_ssl_certificate_info.
I'm using the labeldrop action in the metricRelabelings section. I tried the following, but it's not supported by Prometheus and I get an error in the logs saying that only the regex option can be used with labeldrop:

                  - action: labeldrop
                    sourceLabels:
                      - __name__
                      - pod
                    regex: 'nginx_ingress_controller_ssl_certificate_info;pod'

If I simply use

                  - action: labeldrop
                    regex: 'pod'

it removes the pod label from all metrics exported by nginx.
Are you aware of another way to delete a single label from a single metric?
Thank you
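
Since labeldrop only accepts a regex on label names, it cannot be scoped to one metric. A commonly used workaround is a `replace` rule keyed on `__name__` that sets `pod` to an empty string, because a label with an empty value is treated as absent. The sketch below simulates that behaviour; the `rule` dict mirrors the fields of a metricRelabelings entry, and `apply_replace` is a made-up helper, not Prometheus code.

```python
import re

# Mirrors a metricRelabelings entry; field names follow the snippet above.
rule = {
    "action": "replace",
    "sourceLabels": ["__name__"],
    "regex": "nginx_ingress_controller_ssl_certificate_info",
    "targetLabel": "pod",
    "replacement": "",
}


def apply_replace(labels, rule, separator=";"):
    """Simulate a 'replace' relabel rule on one series' labels."""
    value = separator.join(labels.get(name, "") for name in rule["sourceLabels"])
    # Relabel regexes are anchored, so the whole value has to match.
    if re.fullmatch(rule["regex"], value) is None:
        return dict(labels)
    out = dict(labels)
    out[rule["targetLabel"]] = rule["replacement"]
    # A label whose value is empty is treated as absent.
    return {k: v for k, v in out.items() if v != ""}


cert = {"__name__": "nginx_ingress_controller_ssl_certificate_info",
        "pod": "nginx-abc", "host": "example.com"}
other = {"__name__": "nginx_ingress_controller_requests", "pod": "nginx-abc"}
cert_after = apply_replace(cert, rule)
other_after = apply_replace(other, rule)
```

Only the certificate metric loses its `pod` label; every other series passes through unchanged.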

ghostfaceover 3 years ago

i'm using the kube-prometheus-stack chart and i'm trying to get my pods labels into the metrics.

see my pod labels below.

Labels:       app=xx-failing-nginx
              app.kubernetes.io/instance=xx-failing-nginx
              app.kubernetes.io/name=xx-failing-nginx
              xx.net/dataclassification=confidential
              xx.net/environment=uat
              xx.net/networkposition=internal
              xx.net/owner=edge
              xx.net/priority=P3
              xx.net/product=hft
              xx.net/service=xx-failing
              env=uat
              helm.sh/chart=xx-service-0.0.33

i added the below to the values file for the chart, in order to bring all of the labels from the pod into the metrics.

kube-state-metrics:
  prometheus:
    monitor:
      relabelings:
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)

but this hasn't worked...the only difference it's made is the below:

kube_pod_container_status_waiting_reason{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="kube-prometheus-stack", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.6.0", container="app", endpoint="http", helm_sh_chart="kube-state-metrics-4.20.2", instance="10.1.1.xx:xx", job="kube-state-metrics", namespace="uat", pod="xx-failing-nginx-5cb58d8977-8r46t", pod_template_hash="646b9498dd", reason="CrashLoopBackOff", release="kube-prometheus-stack", service="kube-prometheus-stack-kube-state-metrics"}

it seems to have added only labels and the values from the kube-state-metrics service monitor instead of the labels from my pod above.

Alex Boxalmost 4 years ago

Hi all, wondering if anyone has a solution they could share for running analytics over historical alerts from alertmanager? For example, “alert X fired 10 times in July and 30 times in August”. This would allow the monitoring team in company Y to investigate the biggest drain for the on-call shift over time. I understand it’s a design decision of AM to not persist any state but in medium/large environments I think it’s an important area that often gets overlooked. Thanks.
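
One possible approach, sketched under assumptions: point an Alertmanager webhook receiver at a small service that stores every notification, then aggregate the stored records later. The code below covers only the aggregation step. It assumes each stored record keeps the `status`, `labels`, and `startsAt` fields from the webhook payload; `firings_per_month` and the sample data are hypothetical.

```python
from collections import Counter


def firings_per_month(alerts):
    """Count distinct firings per (alertname, 'YYYY-MM')."""
    seen = set()
    counts = Counter()
    for alert in alerts:
        if alert["status"] != "firing":
            continue
        # Repeat notifications carry the same labels and startsAt, so count once.
        key = (frozenset(alert["labels"].items()), alert["startsAt"])
        if key in seen:
            continue
        seen.add(key)
        month = alert["startsAt"][:7]  # RFC 3339 timestamps start with YYYY-MM
        counts[(alert["labels"]["alertname"], month)] += 1
    return counts


# Made-up records, including one repeat notification and one resolve.
records = [
    {"status": "firing", "labels": {"alertname": "X"}, "startsAt": "2021-07-03T10:00:00Z"},
    {"status": "firing", "labels": {"alertname": "X"}, "startsAt": "2021-07-03T10:00:00Z"},
    {"status": "resolved", "labels": {"alertname": "X"}, "startsAt": "2021-07-03T10:00:00Z"},
    {"status": "firing", "labels": {"alertname": "X"}, "startsAt": "2021-08-11T02:30:00Z"},
    {"status": "firing", "labels": {"alertname": "Y"}, "startsAt": "2021-08-12T08:15:00Z"},
]
monthly = firings_per_month(records)
```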

michael sewalmost 4 years ago

Q: has anybody setup AWS cloudwatch metrics to Prometheus using Cloudwatch Metric Streams? Newbie DBA, trying to get RDS metrics into prometheus/grafana. I'm told that a std exporter would start to cost $$ because of constant polling. Would the following architecture work?

cloudwatch metric streams => Kinesis Data Firehose => Prometheus ??

Niv Weissalmost 4 years ago

Hey all, I've been trying to install prometheus (prometheus-community/prometheus) on EKS fargate (serverless) with AMP (amazon managed prometheus) and I’m getting this error:
Readiness probe failed: Get "http://10.0.xx.xxx:9090/-/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Liveness probe failed: Get "http://10.0.xx.xxx:9090/-/healthy": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Failed to create pod sandbox: rpc error: code = Unavailable desc = error reading from server: read unix @->/run/containerd/containerd.sock: read: connection reset by peer
Status: terminated - OOMKilled (exit code: 255)
This is the command that I’m running when I’m installing prometheus:
helm install -n prometheus prometheus -f prometheus-variables.yml prometheus-community/prometheus
prometheus-variables.yml:

nodeExporter:
    enabled: false

alertmanager:
    enabled: false

serviceAccounts:
  server:
    name: amp-iamproxy-ingest-service-account
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::xxxxxxxxxx:role/amp-iamproxy-ingest-role"

server:
  persistentVolume:
    enabled: false
  remoteWrite:
    - url: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/xxxxxxxxxxxxx/api/v1/remote_write
      sigv4:
        region: us-east-1
      queue_config:
        max_samples_per_send: 1000
        max_shards: 200
        capacity: 2500

Can someone help me understand what can be the problem?

Balazs Vargaalmost 5 years ago

How and where can I see that my metric relabeling works and has removed the unused labels?

michael sewalmost 5 years ago

hi guys, can somebody help explain how exporters work ? i'm trying to setup a database exporter (postgres, oracle).
am i supposed to run an exporter binary on the prometheus VM or container?
what if the target DBs are in separate AWS Accounts and AWS regions? Just curious what the topology is supposed to look like.

Balazs Vargaalmost 5 years ago

hello all, We have a prometheus in our cluster on AWS and, as I see, the node where prometheus is running has massive incoming traffic, over 80MB/sec. Is that expected? Where can I find docs about that?

Abel Luckover 6 years ago

I'm writing a small exporter for a custom service, does anyone know what an exporter's /metrics should return when requests to the underlying service fail entirely, and so no metrics can be returned?
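
A convention many exporters seem to follow is to still answer the scrape successfully and expose a `<prefix>_up` gauge set to 0 when the backing service is unreachable (the alternative being a 5xx response so the scrape itself fails). A minimal sketch that renders the text exposition format by hand; the `myservice` prefix, `render_metrics`, and the sample stats are hypothetical.

```python
def render_metrics(fetch_stats):
    """Build a text-format /metrics body; fetch_stats may raise on failure."""
    lines = [
        "# HELP myservice_up Whether the last query to the service succeeded.",
        "# TYPE myservice_up gauge",
    ]
    try:
        stats = fetch_stats()
    except Exception:
        # Service unreachable: report only the health gauge.
        lines.append("myservice_up 0")
        return "\n".join(lines) + "\n"
    lines.append("myservice_up 1")
    for name, value in sorted(stats.items()):
        lines.append(f"# TYPE myservice_{name} gauge")
        lines.append(f"myservice_{name} {value}")
    return "\n".join(lines) + "\n"


def failing_fetch():
    raise ConnectionError("service is down")


def working_fetch():
    return {"queue_length": 4}


down_body = render_metrics(failing_fetch)
up_body = render_metrics(working_fetch)
```

With this shape an alert on `myservice_up == 0` can tell "service down" apart from "exporter down", which shows up as the scrape-level `up` being 0.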

Erik Osterman (Cloud Posse)over 6 years ago

Adding @U010XGY9B46 bot

Erik Osterman (Cloud Posse)over 6 years ago

@UUB28NLDS help keep tabs! 😉

Mark Howardalmost 7 years ago

Question, Anyone using prometheus to monitor Azure Paas resources?

tamskyabout 7 years ago

@Tamlyn Rhodes Avoiding Cloudwatch might be helpful as well. What's behind the personal or business requirement to use Cloudwatch?

joshmyersabout 7 years ago

@Tamlyn Rhodes perhaps influx/telegraf may be helpful, it is like a pipe on steroids and can do prom > cloudwatch with filters and processors etc

Andriy Knysh (Cloud Posse)about 7 years ago

if you have any improvements, PRs are welcome 🙂

Tamlyn Rhodesabout 7 years ago

Have a good weekend 🙂

Tamlyn Rhodesabout 7 years ago

OK, thanks for your help. I'll investigate other approaches.

Andriy Knysh (Cloud Posse)about 7 years ago

https://github.com/cloudposse/prometheus-to-cloudwatch/releases/tag/0.3.0

Andriy Knysh (Cloud Posse)about 7 years ago

https://github.com/cloudposse/prometheus-to-cloudwatch/releases/tag/0.2.0

Andriy Knysh (Cloud Posse)about 7 years ago

also take a look at these releases, it might help

Andriy Knysh (Cloud Posse)about 7 years ago(edited)

@Tamlyn Rhodes I’m not sure how to change the metric types between prometheus and CloudWatch. https://github.com/cloudposse/prometheus-to-cloudwatch is just a proxy that scrapes prometheus URLs, converts the format, and sends the metrics to CloudWatch. It does not assume anything. It might be possible to change the module to do some logic.

Tamlyn Rhodesabout 7 years ago

Thanks. I think gauge type metrics (e.g. current memory usage) work OK but not all metrics can be tracked that way. For instance "total number of requests" needs a counter because it tracks events rather than a value. The problem arises because the Prometheus client in my container reports "123 requests have occurred since the container was restarted" but when this gets forwarded to Cloudwatch it is interpreted as "123 requests occurred right now" and in the next update 30 seconds later it thinks there have been another 123 requests whereas there have been none.
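
To make the mismatch concrete, here is a sketch of the conversion a forwarder would need for counters: turn cumulative readings into per-interval increments and treat any drop as a restart. This is not something the thread says prometheus-to-cloudwatch does; `counter_deltas` is a hypothetical helper.

```python
def counter_deltas(samples):
    """Convert cumulative counter readings into per-interval increments."""
    deltas = []
    previous = None
    for value in samples:
        if previous is None:
            # No baseline yet, so nothing can be reported for the first reading.
            deltas.append(0)
        elif value < previous:
            # Counter went backwards: the process restarted, count from zero.
            deltas.append(value)
        else:
            deltas.append(value - previous)
        previous = value
    return deltas


# 123 requests seen twice in a row, then 7 more, then a restart.
readings = [123, 123, 130, 5, 9]
increments = counter_deltas(readings)
```

Forwarding the increments rather than the raw readings means a quiet 30-second interval is reported as 0 new requests instead of another 123.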

Erik Osterman (Cloud Posse)about 7 years ago

@Andriy Knysh (Cloud Posse) can probably help. But it probably comes down to using something like counters vs gauges. Prometheus supports multiple types of metrics whereas I am not sure if CloudWatch does (if so they don’t call it gauge). From working with other monitoring systems it is common to support both. I’d be surprised if there isn’t a way to achieve it.

Tamlyn Rhodesabout 7 years ago

That leads to funny looking graphs like this in Cloudwatch when a container is restarted.

Tamlyn Rhodesabout 7 years ago

How do you deal with Prometheus's and Cloudwatch's different models of metrics gathering? Prometheus assumes reported metrics are summed whereas Cloudwatch assumes each reflects current values.

Tamlyn Rhodesabout 7 years ago

Hello, I have a question about https://github.com/cloudposse/prometheus-to-cloudwatch

rohitabout 7 years ago

thanks

rohitabout 7 years ago

bad timing for me (I am in CST), will try to join

Erik Osterman (Cloud Posse)about 7 years ago

Every Wednesday at 11:30 am PST

Erik Osterman (Cloud Posse)about 7 years ago(edited)

https://zoom.us/j/684901853

rohitabout 7 years ago

time please

Erik Osterman (Cloud Posse)about 7 years ago

Our next office hours is tomorrow

Erik Osterman (Cloud Posse)about 7 years ago

I'd be happy to give a demo

rohitabout 7 years ago

nothing specific, we are thinking about using prometheus

Erik Osterman (Cloud Posse)about 7 years ago

is there something specific you're interested in?

Erik Osterman (Cloud Posse)about 7 years ago

#office-hours topics are really driven by whoever attends

rohitabout 7 years ago

@Erik Osterman (Cloud Posse) Hi. Are you planning to talk about prometheus anytime soon during office-hours?

Igor Rodionovabout 7 years ago

@Igor Rodionov has joined the channel

Erik Osterman (Cloud Posse)about 7 years ago

@Igor Rodionov deployed something like that. not specifically for lambdas though.

tamskyabout 7 years ago

Has anyone here used https://github.com/weaveworks/prom-aggregation-gateway for aggregating metrics from Lambda functions?
Curious if anyone has field notes to share.

tamskyabout 7 years ago

afaik, there's no custom SRV record type that provides more than service port and weight.

tamskyabout 7 years ago

You don't typically need to adjust the path, as it's always /metrics on standard exporters. What's your use case where the /metrics endpoint also needs to be discoverable?

tamskyabout 7 years ago

Port should be automatic if you're using SRV records

Abel Luckabout 7 years ago

@tamsky thanks for the link to the example configs, I didn't know about that.

However, when I said custom path/port I meant a config such that the path and port are discovered from the DNS SRV entry

tamskyabout 7 years ago

@Abel Luck have you checked out the example configs for DNS service discovery?
- https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config
- https://github.com/prometheus/prometheus/blob/release-2.10/config/testdata/conf.good.yml#L79-L123

That's a full example that touches most of the things you mentioned
- endpoint lookup by DNS name (Lines 96-99)
- custom /metrics path (Line 90)
- and to get a custom port, I'd replace Line 91 (scheme: https) with port: <custom#>
