gitdataai/admin/metrics.md
ZhenYi 27cd4ea83c
feat(admin/metrics): add Prometheus-compatible metrics endpoint and ops documentation
- Add /api/metrics/prometheus endpoint using prom-client (unauthenticated for scraping)
- Update middleware to allow unauthenticated access to prometheus endpoint
- Add /api/metrics permission routing (platform:read for GET)
- Install prom-client dependency
- Add metrics.md with Grafana dashboard JSON, Prometheus config, alerting rules
2026-04-26 14:49:25 +08:00

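# Admin Platform Metrics: Grafana / Prometheus Configuration Guide
## Overview
The Admin service exposes two metrics endpoints:

| Endpoint | Format | Purpose |
|------|------|------|
| `GET /api/metrics` | JSON | Frontend pages / manual inspection / API consumers |
| `GET /api/metrics/prometheus` | Prometheus text | Prometheus scraping |

The Prometheus endpoint **requires no authentication** and can be scraped directly.
## Collected metrics
All metrics are exposed through a single `platform_entity_count` gauge with two labels, `entity` and `window`: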
```
# HELP platform_entity_count Platform entity counts by time window
# TYPE platform_entity_count gauge
platform_entity_count{entity="users",window="total"} 1000
platform_entity_count{entity="users",window="27h"} 5
platform_entity_count{entity="users",window="7d"} 32
platform_entity_count{entity="users",window="30d"} 150
platform_entity_count{entity="workspaces",window="total"} 50
platform_entity_count{entity="workspaces",window="27h"} 1
...
platform_entity_count{entity="skills",window="30d"} 45
```
Entities: `users`, `workspaces`, `projects`, `repos`, `rooms`, `skills`.

Windows: `total` (cumulative), `27h` (last 27 hours), `7d` (last 7 days), `30d` (last 30 days).
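
To make the series layout concrete, here is a minimal sketch that renders the same exposition text by hand: one sample per entity and window, 24 series in total. The real endpoint is generated with prom-client; the type and function names below (`EntityName`, `TimeWindow`, `renderEntityCounts`) are hypothetical and exist only for illustration.

```typescript
// Hypothetical names for illustration; the service itself uses prom-client.
type EntityName = "users" | "workspaces" | "projects" | "repos" | "rooms" | "skills";
type TimeWindow = "total" | "27h" | "7d" | "30d";

const ENTITIES: EntityName[] = ["users", "workspaces", "projects", "repos", "rooms", "skills"];
const WINDOWS: TimeWindow[] = ["total", "27h", "7d", "30d"];

type Counts = Record<EntityName, Record<TimeWindow, number>>;

// Render one gauge sample per (entity, window) pair in Prometheus text format.
function renderEntityCounts(counts: Counts): string {
  const lines: string[] = [
    "# HELP platform_entity_count Platform entity counts by time window",
    "# TYPE platform_entity_count gauge",
  ];
  for (const entity of ENTITIES) {
    for (const win of WINDOWS) {
      const value = counts[entity][win];
      lines.push(`platform_entity_count{entity="${entity}",window="${win}"} ${value}`);
    }
  }
  return lines.join("\n") + "\n";
}
```

Because every window is a label on the same gauge, a dashboard can select one window with a single label matcher such as `{window="total"}` instead of querying a separate metric per window.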
## Prometheus configuration
### prometheus.yml
```yaml
scrape_configs:
  - job_name: 'admin-metrics'
    scrape_interval: 60s
    metrics_path: '/api/metrics/prometheus'
    static_configs:
      - targets: ['<admin-host>:<port>']
        labels:
          env: 'production'
          service: 'admin'
```
### K8s ServiceMonitor (if using prometheus-operator)
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: admin-metrics
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: admin
  endpoints:
    - port: http
      path: /api/metrics/prometheus
      interval: 60s
```
## Grafana Dashboard
### Recommended panel configuration
#### Panel 1: Entity totals (Stat panel)
```
Query:
  platform_entity_count{window="total"}
Visualization: Stat
  - Show: Value
  - Color mode: Background
  - Thresholds: set according to actual business needs
```
#### Panel 2: 27-hour growth (Time Series / Bar Gauge)
```
Query:
  platform_entity_count{window="27h"}
Visualization: Bar Gauge
  - Display: Basic
  - Show: Value
```
#### Panel 3: 7-day vs. 30-day comparison (Bar Chart)
```
Query:
  platform_entity_count{window=~"7d|30d"}
Visualization: Bar Chart
  - Group by: entity
  - Bar mode: grouped
```
#### Panel 4: Summary table (Table panel)
```
Query:
  platform_entity_count
Transform:
  1. Labels to fields
  2. Pivot by entity
  3. Organize fields
Visualization: Table
```
```
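
The pivot step in Panel 4 turns one sample per (entity, window) pair into one row per entity with a column per window. The sketch below shows that reshaping in plain TypeScript; it is not Grafana's implementation, and `Sample` and `pivotByEntity` are hypothetical names used only to illustrate the shape of the resulting table.

```typescript
// Hypothetical shape of one scraped sample after "Labels to fields".
interface Sample {
  entity: string;
  window: string;
  value: number;
}

// Group samples into one row per entity, keyed by window label.
function pivotByEntity(samples: Sample[]): Record<string, Record<string, number>> {
  const rows: Record<string, Record<string, number>> = {};
  for (const s of samples) {
    const row = rows[s.entity] || (rows[s.entity] = {});
    row[s.window] = s.value;
  }
  return rows;
}
```

With the six entities and four windows listed earlier, the pivoted table has six rows and four value columns.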
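### Dashboard JSON template
Import the following JSON into Grafana (Dashboard → Import → Paste JSON).
> Note: `uid` and `datasource` must be set to match your actual Prometheus data source. The outer `dashboard` / `overwrite` wrapper matches the request body of Grafana's HTTP API (`POST /api/dashboards/db`); if the import dialog rejects it, paste only the inner `dashboard` object.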
```json
{
  "dashboard": {
    "title": "Admin Platform Metrics",
    "tags": ["admin", "platform"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Entity totals",
        "type": "stat",
        "gridPos": { "h": 4, "w": 24, "x": 0, "y": 0 },
        "targets": [
          {
            "expr": "platform_entity_count{window=\"total\"}",
            "legendFormat": "{{entity}}"
          }
        ],
        "options": {
          "colorMode": "background",
          "graphMode": "none",
          "justifyMode": "auto"
        },
        "fieldConfig": {
          "defaults": {
            "thresholds": {
              "mode": "absolute",
              "steps": [
                { "color": "green", "value": null },
                { "color": "yellow", "value": 100 },
                { "color": "red", "value": 1000 }
              ]
            }
          }
        }
      },
      {
        "id": 2,
        "title": "New in the last 27 hours",
        "type": "bargauge",
        "gridPos": { "h": 6, "w": 12, "x": 0, "y": 4 },
        "targets": [
          {
            "expr": "platform_entity_count{window=\"27h\"}",
            "legendFormat": "{{entity}}"
          }
        ],
        "options": {
          "displayMode": "gradient",
          "orientation": "horizontal"
        }
      },
      {
        "id": 3,
        "title": "Last 7 days vs. 30 days",
        "type": "barchart",
        "gridPos": { "h": 6, "w": 12, "x": 12, "y": 4 },
        "targets": [
          {
            "expr": "platform_entity_count{window=~\"7d|30d\"}",
            "legendFormat": "{{entity}} ({{window}})"
          }
        ],
        "options": {
          "barRadius": 0.05,
          "groupWidth": 0.7,
          "orientation": "auto"
        }
      },
      {
        "id": 4,
        "title": "Metrics summary table",
        "type": "table",
        "gridPos": { "h": 8, "w": 24, "x": 0, "y": 10 },
        "targets": [
          {
            "expr": "platform_entity_count",
            "format": "table",
            "instant": true
          }
        ],
        "transformations": [
          { "id": "labelsToFields", "options": {} },
          {
            "id": "organize",
            "options": {
              "excludeByName": { "Time": true, "__name__": true },
              "indexByName": { "entity": 0, "window": 1, "Value": 2 }
            }
          }
        ],
        "options": {
          "showHeader": true,
          "sortBy": [{ "desc": false, "displayName": "entity" }]
        }
      }
    ],
    "time": { "from": "now-24h", "to": "now" },
    "refresh": "1m"
  },
  "overwrite": true
}
```
## Alerting rules (optional)
### Prometheus rules
```yaml
groups:
  - name: admin-entity-growth
    rules:
      # Alert when more than 100 users are added within 27 hours
      - alert: HighUserGrowth27h
        expr: platform_entity_count{entity="users", window="27h"} > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $value }} new users in the last 27 hours, above the threshold"
      # Alert when no repositories have been added in 7 days
      - alert: NoRepoGrowth7d
        expr: platform_entity_count{entity="repos", window="7d"} == 0
          and on() platform_entity_count{entity="repos", window="total"} > 0
        for: 1h
        labels:
          severity: info
        annotations:
          summary: "No new repositories in the last 7 days"
```
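
The `and on()` clause in `NoRepoGrowth7d` is there because the two sides carry different `window` labels and would never match on their full label sets; `on()` matches them regardless of labels, so the alert fires only when the platform already has repositories. The sketch below mirrors the two conditions in plain TypeScript as a reading aid. It does not model the `for:` durations, and `Sample`, `lookup`, and `activeAlertConditions` are hypothetical names.

```typescript
// Hypothetical shape of one scraped sample.
interface Sample {
  entity: string;
  window: string;
  value: number;
}

// Return the value of the matching series, or undefined if it is absent.
function lookup(samples: Sample[], entity: string, win: string): number | undefined {
  for (const s of samples) {
    if (s.entity === entity && s.window === win) return s.value;
  }
  return undefined;
}

// Names of the rules whose expressions are currently true (ignoring `for:`).
function activeAlertConditions(samples: Sample[]): string[] {
  const active: string[] = [];
  const users27h = lookup(samples, "users", "27h");
  if (users27h !== undefined && users27h > 100) active.push("HighUserGrowth27h");

  const repos7d = lookup(samples, "repos", "7d");
  const reposTotal = lookup(samples, "repos", "total");
  // Mirrors `== 0 and on() ... > 0`: zero growth only matters once repos exist.
  if (repos7d === 0 && reposTotal !== undefined && reposTotal > 0) {
    active.push("NoRepoGrowth7d");
  }
  return active;
}
```

A fresh deployment with no repositories at all therefore stays quiet, which is the case the guard is meant to exclude.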
## Verification
```bash
# 1. JSON format
curl http://localhost:3000/api/metrics | jq .
# 2. Prometheus format
curl http://localhost:3000/api/metrics/prometheus
# Expected output:
# # HELP platform_entity_count Platform entity counts by time window
# # TYPE platform_entity_count gauge
# platform_entity_count{entity="users",window="27h"} 0
# platform_entity_count{entity="users",window="30d"} 0
# platform_entity_count{entity="users",window="7d"} 0
# platform_entity_count{entity="users",window="total"} 5
# ...
```
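
Beyond eyeballing the curl output, a small check can confirm that every entity and window combination is present. The sketch below parses the exposition text and lists any missing series. It assumes the label order `entity`, `window` shown in the samples above, and the names `parseEntityCounts` and `missingSeries` are hypothetical.

```typescript
// Hypothetical shape of one parsed sample.
interface Sample {
  entity: string;
  window: string;
  value: number;
}

const EXPECTED_ENTITIES = ["users", "workspaces", "projects", "repos", "rooms", "skills"];
const EXPECTED_WINDOWS = ["total", "27h", "7d", "30d"];

// Parse platform_entity_count lines; comment lines and other metrics are skipped.
function parseEntityCounts(text: string): Sample[] {
  const pattern = /^platform_entity_count\{entity="([^"]+)",window="([^"]+)"\}\s+(\S+)$/;
  const samples: Sample[] = [];
  for (const line of text.split("\n")) {
    const m = pattern.exec(line.trim());
    if (m) samples.push({ entity: m[1], window: m[2], value: Number(m[3]) });
  }
  return samples;
}

// Return "entity/window" keys that the endpoint did not report.
function missingSeries(samples: Sample[]): string[] {
  const seen: Record<string, boolean> = {};
  for (const s of samples) seen[`${s.entity}/${s.window}`] = true;
  const missing: string[] = [];
  for (const entity of EXPECTED_ENTITIES) {
    for (const win of EXPECTED_WINDOWS) {
      const key = `${entity}/${win}`;
      if (!seen[key]) missing.push(key);
    }
  }
  return missing;
}
```

Feeding the response body of `/api/metrics/prometheus` to `parseEntityCounts` and then to `missingSeries` should yield an empty list when all 24 series are exposed.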