#prometheus - August 2023 | Slack Archive

How are y’all scaling prometheus-server when one instance can’t handle the volume of metrics and starts OOMing?

We are adopting Prometheus-Operator and debating between:
• a) “one prometheus per namespace”: we have >100 namespaces and a few dozen teams, so the benefit for us would be per-namespace reporting and isolating the impact of any problem to a single namespace.
• b) “functional sharding”: Prometheus shard X scrapes all pods of Services A, B and C, while shard Y scrapes all pods of Services D, E and F.
• c) “automatic sharding”: the targets will be assigned to Prometheus shards based on their addresses.
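
For anyone wondering what option c) looks like mechanically, here is a rough Python sketch of address-based sharding. It assumes the usual approach of hashing each target's address and taking it modulo the shard count, which is the idea behind the `hashmod` relabel action. The function names and example addresses are made up for illustration, and the exact hash (MD5, lower 8 bytes) is my assumption, so don't expect the shard numbers to match a real deployment.

```python
import hashlib


def shard_for(address: str, num_shards: int) -> int:
    """Return the shard index for a target address (hypothetical helper)."""
    # Assumption: MD5 of the address, lower 8 bytes read as a big-endian integer.
    digest = hashlib.md5(address.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % num_shards


def assign_targets(targets, num_shards):
    """Group targets by shard; every target lands in exactly one shard."""
    shards = {i: [] for i in range(num_shards)}
    for target in targets:
        shards[shard_for(target, num_shards)].append(target)
    return shards


if __name__ == "__main__":
    # Made-up pod addresses, purely for illustration.
    example_targets = [f"10.0.{i // 50}.{i % 50}:9100" for i in range(200)]
    for shard, members in assign_targets(example_targets, 3).items():
        print(f"shard {shard}: {len(members)} targets")
```

The trade-off compared with a) and b): the split is even-ish and needs no manual mapping, but any given shard ends up with an arbitrary slice of every service, so queries have to fan out across all shards.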