- Add /api/metrics/prometheus endpoint using prom-client (unauthenticated for scraping)
- Update middleware to allow unauthenticated access to prometheus endpoint
- Add /api/metrics permission routing (platform:read for GET)
- Install prom-client dependency
- Add metrics.md with Grafana dashboard JSON, Prometheus config, alerting rules

# Admin Platform Metrics: Grafana / Prometheus Configuration Guide

## Overview

The Admin service exposes two metrics endpoints:

| Endpoint | Format | Purpose |
|------|------|------|
| `GET /api/metrics` | JSON | Frontend pages / manual inspection / API consumers |
| `GET /api/metrics/prometheus` | Prometheus Text | Prometheus scraping |

The Prometheus endpoint **requires no authentication** and can be scraped directly.

## Collected metrics

All metrics are exposed through the `platform_entity_count` Gauge, which carries two labels, `entity` and `window`:

```
# HELP platform_entity_count Platform entity counts by time window
# TYPE platform_entity_count gauge
platform_entity_count{entity="users",window="total"} 1000
platform_entity_count{entity="users",window="27h"} 5
platform_entity_count{entity="users",window="7d"} 32
platform_entity_count{entity="users",window="30d"} 150
platform_entity_count{entity="workspaces",window="total"} 50
platform_entity_count{entity="workspaces",window="27h"} 1
...
platform_entity_count{entity="skills",window="30d"} 45
```

Entities: `users`, `workspaces`, `projects`, `repos`, `rooms`, `skills`

Windows: `total` (cumulative), `27h` (last 27 hours), `7d` (last 7 days), `30d` (last 30 days)

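To illustrate how entities and windows map onto labelled series, here is a minimal sketch that renders counts into the Prometheus text format by hand. It uses no library, and `Counts` and `renderEntityCounts` are hypothetical names; the real endpoint is built with prom-client, which may order the lines differently.

```typescript
// Hypothetical input shape: counts[entity][window] -> number of entities.
type Counts = { [entity: string]: { [win: string]: number } };

// Window labels listed in this guide, in the order this sketch emits them.
const WINDOWS: string[] = ["total", "27h", "7d", "30d"];

// Render the platform_entity_count gauge family as Prometheus text.
function renderEntityCounts(counts: Counts): string {
  const lines: string[] = [
    "# HELP platform_entity_count Platform entity counts by time window",
    "# TYPE platform_entity_count gauge",
  ];
  Object.keys(counts).forEach((entity) => {
    const perWindow = counts[entity] || {};
    WINDOWS.forEach((win) => {
      const value = perWindow[win];
      if (value === undefined) return; // no sample for this window
      lines.push(
        'platform_entity_count{entity="' + entity + '",window="' + win + '"} ' + value
      );
    });
  });
  return lines.join("\n") + "\n";
}

console.log(
  renderEntityCounts({ users: { total: 1000, "27h": 5, "7d": 32, "30d": 150 } })
);
```

Each (entity, window) pair becomes its own series, so six entities and four windows yield 24 series in total.
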
## Prometheus configuration

### prometheus.yml

```yaml
scrape_configs:
  - job_name: 'admin-metrics'
    scrape_interval: 60s
    metrics_path: '/api/metrics/prometheus'
    static_configs:
      - targets: ['<admin-host>:<port>']
        labels:
          env: 'production'
          service: 'admin'
```

### K8s ServiceMonitor (if using prometheus-operator)

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: admin-metrics
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: admin
  endpoints:
    - port: http
      path: /api/metrics/prometheus
      interval: 60s
```

## Grafana Dashboard

### Recommended panel configuration

#### Panel 1: Entity totals (Stat Panel)

```
Query:
  platform_entity_count{window="total"}

Visualization: Stat
  - Show: Value
  - Color mode: Background
  - Thresholds: set according to actual business needs
```

#### Panel 2: 27-hour growth trend (Time Series / Bar Gauge)

```
Query:
  platform_entity_count{window="27h"}

Visualization: Bar Gauge
  - Display: Basic
  - Show: Value
```

#### Panel 3: 7-day / 30-day comparison (Bar Chart)

```
Query:
  platform_entity_count{window=~"7d|30d"}

Visualization: Bar Chart
  - Group by: entity
  - Bar mode: grouped
```

#### Panel 4: Summary table of totals (Table Panel)

```
Query:
  platform_entity_count

Transform:
  1. Labels to fields
  2. Pivot by entity
  3. Organize fields

Visualization: Table
```

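The transform chain for the table panel reshapes one row per (entity, window) series into one row per entity. Here is a minimal sketch of that reshaping in plain TypeScript, outside Grafana. `EntitySample`, `PivotRow` and `pivotByEntity` are hypothetical names used only for illustration, and Grafana's own transformations may name or order columns differently.

```typescript
// One instant-query result: the two labels plus the sample value (assumed shape).
interface EntitySample {
  entity: string;
  window: string;
  value: number;
}

// One output row per entity, holding a value for each window seen.
interface PivotRow {
  entity: string;
  windows: { [win: string]: number };
}

// Group samples by entity, keeping first-seen entity order.
function pivotByEntity(samples: EntitySample[]): PivotRow[] {
  const byEntity: { [entity: string]: PivotRow } = {};
  const order: string[] = [];
  samples.forEach((s) => {
    let row = byEntity[s.entity];
    if (row === undefined) {
      row = { entity: s.entity, windows: {} };
      byEntity[s.entity] = row;
      order.push(s.entity);
    }
    row.windows[s.window] = s.value;
  });
  return order.map((entity) => byEntity[entity]);
}

const table = pivotByEntity([
  { entity: "users", window: "total", value: 1000 },
  { entity: "users", window: "27h", value: 5 },
  { entity: "workspaces", window: "total", value: 50 },
  { entity: "workspaces", window: "27h", value: 1 },
]);
console.log(JSON.stringify(table, null, 2));
```
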
### Dashboard JSON template

Import the following JSON into Grafana (Dashboard → Import → Paste JSON):

> Note: adjust `uid` and `datasource` to match your actual Prometheus data source.

```json
{
  "dashboard": {
    "title": "Admin Platform Metrics",
    "tags": ["admin", "platform"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Entity totals",
        "type": "stat",
        "gridPos": { "h": 4, "w": 24, "x": 0, "y": 0 },
        "targets": [
          {
            "expr": "platform_entity_count{window=\"total\"}",
            "legendFormat": "{{entity}}"
          }
        ],
        "options": {
          "colorMode": "background",
          "graphMode": "none",
          "justifyMode": "auto"
        },
        "fieldConfig": {
          "defaults": {
            "thresholds": {
              "mode": "absolute",
              "steps": [
                { "color": "green", "value": null },
                { "color": "yellow", "value": 100 },
                { "color": "red", "value": 1000 }
              ]
            }
          }
        }
      },
      {
        "id": 2,
        "title": "New in the last 27 hours",
        "type": "bargauge",
        "gridPos": { "h": 6, "w": 12, "x": 0, "y": 4 },
        "targets": [
          {
            "expr": "platform_entity_count{window=\"27h\"}",
            "legendFormat": "{{entity}}"
          }
        ],
        "options": {
          "displayMode": "gradient",
          "orientation": "horizontal"
        }
      },
      {
        "id": 3,
        "title": "Last 7 days / 30 days comparison",
        "type": "barchart",
        "gridPos": { "h": 6, "w": 12, "x": 12, "y": 4 },
        "targets": [
          {
            "expr": "platform_entity_count{window=~\"7d|30d\"}",
            "legendFormat": "{{entity}} ({{window}})"
          }
        ],
        "options": {
          "barRadius": 0.05,
          "groupWidth": 0.7,
          "orientation": "auto"
        }
      },
      {
        "id": 4,
        "title": "Metrics summary table",
        "type": "table",
        "gridPos": { "h": 8, "w": 24, "x": 0, "y": 10 },
        "targets": [
          {
            "expr": "platform_entity_count",
            "format": "table",
            "instant": true
          }
        ],
        "transformations": [
          { "id": "labelsToFields", "options": {} },
          {
            "id": "organize",
            "options": {
              "excludeByName": { "Time": true, "__name__": true },
              "indexByName": { "entity": 0, "window": 1, "Value": 2 }
            }
          }
        ],
        "options": {
          "showHeader": true,
          "sortBy": [{ "desc": false, "displayName": "entity" }]
        }
      }
    ],
    "time": { "from": "now-24h", "to": "now" },
    "refresh": "1m"
  },
  "overwrite": true
}
```

## Alerting rules (optional)

### prometheus rules

```yaml
groups:
  - name: admin-entity-growth
    rules:
      # Alert when user growth within 27 hours exceeds 100
      - alert: HighUserGrowth27h
        expr: platform_entity_count{entity="users", window="27h"} > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "New users in the last 27 hours ({{ $value }}) exceeded the threshold"

      # Alert when repos show zero growth over 7 days
      - alert: NoRepoGrowth7d
        expr: platform_entity_count{entity="repos", window="7d"} == 0
          and on() platform_entity_count{entity="repos", window="total"} > 0
        for: 1h
        labels:
          severity: info
        annotations:
          summary: "No new repos in the last 7 days"
```

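To make the two alert conditions concrete, here is a small sketch that evaluates them against a single snapshot of values in plain TypeScript. It mirrors only the threshold logic, not the `for:` duration that Prometheus applies before firing. `CountSnapshot` and `firingAlerts` are hypothetical names, and the `"entity/window"` key format is an assumption made for this example.

```typescript
// Snapshot of platform_entity_count values keyed as "entity/window" (assumed format).
type CountSnapshot = { [key: string]: number };

// Return the names of the alerts whose conditions hold for this snapshot.
function firingAlerts(s: CountSnapshot): string[] {
  const alerts: string[] = [];
  const users27h = s["users/27h"];
  const repos7d = s["repos/7d"];
  const reposTotal = s["repos/total"];

  // HighUserGrowth27h: more than 100 new users in the 27h window.
  if (users27h !== undefined && users27h > 100) {
    alerts.push("HighUserGrowth27h");
  }
  // NoRepoGrowth7d: zero new repos in 7d, but only when some repos exist at all.
  if (repos7d === 0 && reposTotal !== undefined && reposTotal > 0) {
    alerts.push("NoRepoGrowth7d");
  }
  return alerts;
}

console.log(firingAlerts({ "users/27h": 120, "repos/7d": 0, "repos/total": 8 }));
```

The second condition corresponds to the `and on()` clause: without the check on the total, a freshly deployed instance with no repos would match the zero-growth condition immediately.
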
## Verification

```bash
# 1. JSON format
curl http://localhost:3000/api/metrics | jq .

# 2. Prometheus format
curl http://localhost:3000/api/metrics/prometheus

# Expected output:
#   # HELP platform_entity_count Platform entity counts by time window
#   # TYPE platform_entity_count gauge
#   platform_entity_count{entity="users",window="27h"} 0
#   platform_entity_count{entity="users",window="30d"} 0
#   platform_entity_count{entity="users",window="7d"} 0
#   platform_entity_count{entity="users",window="total"} 5
#   ...
```

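Beyond eyeballing the output, one could check that every expected entity and window combination is present. The sketch below parses scraped text and lists the missing series. `missingSeries` is a hypothetical helper, and the regular expression assumes the `entity` label precedes `window`, as in the sample output in this guide.

```typescript
const ENTITIES: string[] = ["users", "workspaces", "projects", "repos", "rooms", "skills"];
const EXPECTED_WINDOWS: string[] = ["total", "27h", "7d", "30d"];

// Return the "entity/window" pairs that do not appear in the scraped text.
function missingSeries(text: string): string[] {
  const seen: { [key: string]: boolean } = {};
  // Assumes label order entity, then window (as in the sample output).
  const re = /^platform_entity_count\{entity="([^"]+)",window="([^"]+)"\}\s+\S+$/;
  text.split("\n").forEach((line) => {
    const m = re.exec(line.trim());
    if (m) seen[m[1] + "/" + m[2]] = true;
  });
  const missing: string[] = [];
  ENTITIES.forEach((e) => {
    EXPECTED_WINDOWS.forEach((w) => {
      if (!seen[e + "/" + w]) missing.push(e + "/" + w);
    });
  });
  return missing;
}

// Sample taken from the expected output, which shows only the users series.
const scraped = [
  "# HELP platform_entity_count Platform entity counts by time window",
  "# TYPE platform_entity_count gauge",
  'platform_entity_count{entity="users",window="27h"} 0',
  'platform_entity_count{entity="users",window="30d"} 0',
  'platform_entity_count{entity="users",window="7d"} 0',
  'platform_entity_count{entity="users",window="total"} 5',
].join("\n");
console.log(missingSeries(scraped));
```

An empty result means all 24 series were found; anything else names the gaps to investigate.
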