- Add /api/metrics/prometheus endpoint using prom-client (unauthenticated for scraping)
- Update middleware to allow unauthenticated access to prometheus endpoint
- Add /api/metrics permission routing (platform:read for GET)
- Install prom-client dependency
- Add metrics.md with Grafana dashboard JSON, Prometheus config, alerting rules
Admin Platform Metrics: Grafana / Prometheus Configuration Guide
Overview
The Admin service exposes two metrics endpoints:
| Endpoint | Format | Purpose |
|---|---|---|
| GET /api/metrics | JSON | Frontend pages / manual inspection / API consumers |
| GET /api/metrics/prometheus | Prometheus Text | Prometheus scraping |
The Prometheus endpoint requires no authentication and can be scraped directly.
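The access rules for the two endpoints can be sketched as a small lookup. This is an illustration only, not the service's actual middleware: `resolveAccess` and `AccessRule` are hypothetical names. The sketch relies on two things the change description states: the Prometheus endpoint is unauthenticated, and `GET /api/metrics` requires `platform:read`.

```typescript
// Hypothetical sketch of the access rules for the two metrics endpoints.
// The names below are illustrative and not taken from the Admin codebase.
type AccessRule =
  | { kind: "public" }                              // no authentication needed
  | { kind: "permission"; permission: string }      // caller must hold this permission
  | { kind: "unmatched" };                          // not a metrics route handled here

function resolveAccess(method: string, pathname: string): AccessRule {
  // Normalize trailing slashes so "/api/metrics/" behaves like "/api/metrics".
  const path = pathname.replace(/\/+$/, "") || "/";

  if (path === "/api/metrics/prometheus") {
    return { kind: "public" };
  }
  if (path === "/api/metrics" && method.toUpperCase() === "GET") {
    return { kind: "permission", permission: "platform:read" };
  }
  return { kind: "unmatched" };
}
```

Because the scrape endpoint is public, it may be worth restricting it at the network level (for example, to the Prometheus host) if entity counts are considered sensitive.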
Collected metrics
All metrics are exposed through a single Gauge, platform_entity_count, which carries two labels, entity and window:
# HELP platform_entity_count Platform entity counts by time window
# TYPE platform_entity_count gauge
platform_entity_count{entity="users",window="total"} 1000
platform_entity_count{entity="users",window="27h"} 5
platform_entity_count{entity="users",window="7d"} 32
platform_entity_count{entity="users",window="30d"} 150
platform_entity_count{entity="workspaces",window="total"} 50
platform_entity_count{entity="workspaces",window="27h"} 1
...
platform_entity_count{entity="skills",window="30d"} 45
Entities: users, workspaces, projects, repos, rooms, skills
Windows: total (cumulative), 27h (last 27 hours), 7d (last 7 days), 30d (last 30 days)
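To make the label layout concrete, here is a dependency-free sketch that renders one line per entity and window in the text format shown above. The real endpoint uses prom-client, so this is not the service's implementation; `renderEntityCounts`, `Counts`, and the other names are assumptions made for illustration.

```typescript
// Illustrative only: renders the gauge in Prometheus text format without prom-client.
const ENTITIES = ["users", "workspaces", "projects", "repos", "rooms", "skills"] as const;
const WINDOWS = ["total", "27h", "7d", "30d"] as const;

type EntityName = (typeof ENTITIES)[number];
type TimeWindow = (typeof WINDOWS)[number];
type Counts = Record<EntityName, Record<TimeWindow, number>>;

function renderEntityCounts(counts: Counts): string {
  const lines: string[] = [
    "# HELP platform_entity_count Platform entity counts by time window",
    "# TYPE platform_entity_count gauge",
  ];
  // One sample per (entity, window) pair: 6 entities x 4 windows = 24 series.
  for (const entity of ENTITIES) {
    for (const win of WINDOWS) {
      lines.push(
        `platform_entity_count{entity="${entity}",window="${win}"} ${counts[entity][win]}`
      );
    }
  }
  // The text format expects a trailing newline after the last sample.
  return lines.join("\n") + "\n";
}
```

With a fixed set of six entities and four windows, the metric yields 24 series per scrape target, so label cardinality stays small.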
Prometheus configuration
prometheus.yml
scrape_configs:
  - job_name: 'admin-metrics'
    scrape_interval: 60s
    metrics_path: '/api/metrics/prometheus'
    static_configs:
      - targets: ['<admin-host>:<port>']
        labels:
          env: 'production'
          service: 'admin'
K8s ServiceMonitor (if you use prometheus-operator)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: admin-metrics
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: admin
  endpoints:
    - port: http
      path: /api/metrics/prometheus
      interval: 60s
Grafana Dashboard
Recommended panel configuration
Panel 1: Entity totals (Stat Panel)
Query:
platform_entity_count{window="total"}
Visualization: Stat
- Show: Value
- Color mode: Background
- Thresholds: set according to your actual business needs
Panel 2: 27-hour growth trend (Time Series / Bar Gauge)
Query:
platform_entity_count{window="27h"}
Visualization: Bar Gauge
- Display: Basic
- Show: Value
Panel 3: 7-day / 30-day comparison (Bar Chart)
Query:
platform_entity_count{window=~"7d|30d"}
Visualization: Bar Chart
- Group by: entity
- Bar mode: grouped
Panel 4: Summary table of all counts (Table Panel)
Query:
platform_entity_count
Transform:
1. Labels to fields
2. Pivot by entity
3. Organize fields
Visualization: Table
Dashboard JSON template
Import the following JSON into Grafana (Dashboard → Import → Paste JSON):
Note: uid and datasource must be adjusted to match your actual Prometheus data source.
{
"dashboard": {
"title": "Admin Platform Metrics",
"tags": ["admin", "platform"],
"timezone": "browser",
"panels": [
{
"id": 1,
"title": "Entity totals",
"type": "stat",
"gridPos": { "h": 4, "w": 24, "x": 0, "y": 0 },
"targets": [
{
"expr": "platform_entity_count{window=\"total\"}",
"legendFormat": "{{entity}}"
}
],
"options": {
"colorMode": "background",
"graphMode": "none",
"justifyMode": "auto"
},
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 100 },
{ "color": "red", "value": 1000 }
]
}
}
}
},
{
"id": 2,
"title": "New in the last 27 hours",
"type": "bargauge",
"gridPos": { "h": 6, "w": 12, "x": 0, "y": 4 },
"targets": [
{
"expr": "platform_entity_count{window=\"27h\"}",
"legendFormat": "{{entity}}"
}
],
"options": {
"displayMode": "gradient",
"orientation": "horizontal"
}
},
{
"id": 3,
"title": "Last 7 days / 30 days comparison",
"type": "barchart",
"gridPos": { "h": 6, "w": 12, "x": 12, "y": 4 },
"targets": [
{
"expr": "platform_entity_count{window=~\"7d|30d\"}",
"legendFormat": "{{entity}} ({{window}})"
}
],
"options": {
"barRadius": 0.05,
"groupWidth": 0.7,
"orientation": "auto"
}
},
{
"id": 4,
"title": "Metrics summary table",
"type": "table",
"gridPos": { "h": 8, "w": 24, "x": 0, "y": 10 },
"targets": [
{
"expr": "platform_entity_count",
"format": "table",
"instant": true
}
],
"transformations": [
{ "id": "labelsToFields", "options": {} },
{
"id": "organize",
"options": {
"excludeByName": { "Time": true, "__name__": true },
"indexByName": { "entity": 0, "window": 1, "Value": 2 }
}
}
],
"options": {
"showHeader": true,
"sortBy": [{ "desc": false, "displayName": "entity" }]
}
}
],
"time": { "from": "now-24h", "to": "now" },
"refresh": "1m"
},
"overwrite": true
}
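The template above does not pin a data source on its panels. One way to apply the note about uid and datasource is to stamp a Prometheus data source reference onto every panel before importing. The sketch below assumes the `{ "type": "prometheus", "uid": "..." }` reference shape used by recent Grafana versions; `withPrometheusDatasource` and the interface names are hypothetical, and the UID is a placeholder you would replace with your own.

```typescript
// Illustrative helper: returns a copy of the template with a data source set on each panel.
interface DatasourceRef {
  type: string;
  uid: string;
}

interface Panel {
  id: number;
  title: string;
  datasource?: DatasourceRef;
  [key: string]: unknown;
}

interface DashboardTemplate {
  dashboard: { panels: Panel[]; [key: string]: unknown };
  overwrite?: boolean;
}

function withPrometheusDatasource(template: DashboardTemplate, uid: string): DashboardTemplate {
  const datasource: DatasourceRef = { type: "prometheus", uid };
  // Copy rather than mutate, so the original template can be reused for other environments.
  return {
    ...template,
    dashboard: {
      ...template.dashboard,
      panels: template.dashboard.panels.map((panel) => ({ ...panel, datasource })),
    },
  };
}
```

For example, `JSON.stringify(withPrometheusDatasource(template, "<your-datasource-uid>"), null, 2)` would produce JSON ready to paste into the import dialog, assuming `template` holds the parsed JSON from this section.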
Alerting rules (optional)
Prometheus rules
groups:
  - name: admin-entity-growth
    rules:
      # Alert when more than 100 users are added within 27 hours
      - alert: HighUserGrowth27h
        expr: platform_entity_count{entity="users", window="27h"} > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "New users in the last 27 hours ({{ $value }}) exceed the threshold"
      # Alert when no repositories have been added in the last 7 days
      - alert: NoRepoGrowth7d
        expr: platform_entity_count{entity="repos", window="7d"} == 0
          and on() platform_entity_count{entity="repos", window="total"} > 0
        for: 1h
        labels:
          severity: info
        annotations:
          summary: "No new repositories in the last 7 days"
Verification
# 1. JSON format
curl http://localhost:3000/api/metrics | jq .
# 2. Prometheus format
curl http://localhost:3000/api/metrics/prometheus
# Expected output:
# HELP platform_entity_count Platform entity counts by time window
# TYPE platform_entity_count gauge
platform_entity_count{entity="users",window="27h"} 0
platform_entity_count{entity="users",window="30d"} 0
platform_entity_count{entity="users",window="7d"} 0
platform_entity_count{entity="users",window="total"} 5
...
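Beyond eyeballing the curl output, the check can be automated. The sketch below parses the text response and reports any entity/window combination that is missing. It is a minimal, hand-rolled parser for this one metric, not a general Prometheus parser, and `parseEntityCounts` and `missingSeries` are hypothetical helper names.

```typescript
// Illustrative verification helpers for the platform_entity_count output.
interface MetricSample {
  entity: string;
  timeWindow: string;
  value: number;
}

// Extracts key="value" pairs from the text between the braces.
function parseLabels(block: string): Record<string, string> {
  const labels: Record<string, string> = {};
  const labelPattern = /(\w+)="([^"]*)"/g;
  let m: RegExpExecArray | null;
  while ((m = labelPattern.exec(block)) !== null) {
    const [, key = "", val = ""] = m;
    labels[key] = val;
  }
  return labels;
}

function parseEntityCounts(text: string): MetricSample[] {
  const linePattern = /^platform_entity_count\{([^}]*)\}\s+(\S+)$/;
  const samples: MetricSample[] = [];
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    // Skip blank lines and the HELP / TYPE comment lines.
    if (line === "" || line.charAt(0) === "#") continue;
    const match = linePattern.exec(line);
    if (match === null) continue;
    const [, labelBlock = "", rawValue = ""] = match;
    const labels = parseLabels(labelBlock);
    const entity = labels["entity"];
    const timeWindow = labels["window"];
    const value = Number(rawValue);
    if (entity === undefined || timeWindow === undefined || isNaN(value)) continue;
    samples.push({ entity, timeWindow, value });
  }
  return samples;
}

// Returns "entity/window" keys that were expected but not found in the samples.
function missingSeries(
  samples: MetricSample[],
  entities: readonly string[],
  windows: readonly string[]
): string[] {
  const seen: Record<string, boolean> = {};
  for (const s of samples) seen[`${s.entity}/${s.timeWindow}`] = true;
  const missing: string[] = [];
  for (const entity of entities) {
    for (const win of windows) {
      const key = `${entity}/${win}`;
      if (!seen[key]) missing.push(key);
    }
  }
  return missing;
}
```

A smoke test could fetch `http://localhost:3000/api/metrics/prometheus`, pass the body to `parseEntityCounts`, and fail if `missingSeries` returns a non-empty list for the six entities and four windows listed earlier.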