CNPG's cluster-ro service already handles load balancing and failover,
so the application-level Vec + random_range is redundant.
- db_read: Vec<DatabaseConnection> → Option<DatabaseConnection>
- database_read_replicas returns Option<String> instead of Vec<String>
- health checks now explicitly ping both writer() and reader()
- remove unused rand dependency from libs/db
Hook tasks and email metrics were missing from /metrics because
describe_counter! was never called before install_recorder(), so
unincremented counters were not exported. Room metrics appeared
because RoomMetrics::new() already described them.
- apps/git-hook: describe 8 hook_tasks_*/hook_sync_* counters
- apps/email: describe 8 email_* counters
- both: add metrics = "0.22" as direct dependency
- Main sitemap index at /sitemap.xml referencing 4 sub-sitemaps
- /sidemap/static: fixed routes (homepage, auth, marketing pages)
- /sidemap/users: public user profiles sorted alphabetically
- /sidemap/projects: public projects sorted alphabetically
- /sidemap/repos: public repos sorted alphabetically
- Redis cache with 8h TTL (no refresh on access), key: sidemap:{type}
- robots.txt Sitemap URL uses main_domain() with https:// forced
- All sitemap loc entries use https:// base URL
Health monitoring:
- gitserver: /health endpoint on port 8021 (DB + Redis ping)
- git-hook: hyper health server on port 8083 with /health
- email-worker: hyper health server on port 8084 with /health
- K8s probes updated to httpGet for all three services
Metrics (via /metrics endpoint):
- git-hook: hook_tasks_total/success/failed/locked/retried/exhausted,
hook_sync_branches/tags_changed_total
- email: email_queued/consumed/sent/failed_total,
email_validation_skipped/build_errors/send_attempts_total
Skip terminal access logs for noisy K8s probe and monitoring
endpoints: /health, /metrics, and /ws path prefix.
Applied to both the main app and static file server.
Actix-web's Data<T> extractor expects T to implement Clone,
so PrometheusHandle (which is Clone) does not need to be
wrapped in Arc. The Arc type mismatch caused /metrics to
return 500.
- Delete admin metrics dashboard page (admin/metrics/page.tsx)
- Delete LineChart component (used only by metrics)
- Remove "指标监控" nav link from Sidebar
- Remove getMetrics/exportMetricsCsv from admin-rpc.ts client
- Remove /api/admin/metrics and /api/admin/metrics/export HTTP routes
from adminrpc (was also leaking metrics via HTTP)
- Remove metrics RPC methods (get_metrics, export_metrics_csv) from
adminrpc gRPC server and their helper functions
- Remove spawn_redis_metrics_flusher from app/main.rs
- Remove redis_metrics module from observability crate
- Remove redis/deadpool-redis deps from observability Cargo.toml
When OTLP is enabled, init_tracing_subscriber() must defer so that
init_otlp() is the sole caller of try_init(). Without this, the adminrpc
binary crashes with "global default trace dispatcher already set".
Both init_tracing_subscriber() and init_otlp() were calling try_init()
on the global tracing dispatcher, causing "global default trace dispatcher
has already been set" at runtime when APP_OTEL_ENABLED=true.
Fix: simplify the API so init_tracing_subscriber() never installs the
subscriber — it either calls try_init() immediately (non-OTLP mode) or
returns without installing (OTLP mode, defer=true). init_otlp() now
builds the complete subscriber stack (registry + env_filter + fmt_layer +
otel_layer) and calls try_init() once.
init_tracing_subscriber() signature: (level, defer) → ()
init_otlp() signature: (endpoint, service_name, _, log_level) → Result
The fmt layer is replicated inside init_otlp() for the OTLP path.
Separate binary for Kubernetes internal admin RPC communication
(SessionAdmin service on port 9090). Includes:
- Redis cluster pool via session_manager
- OTLP tracing with env-driven configuration
- Tracing subscriber init (JSON to stderr)
- Graceful startup with connection verification
PrometheusHandle was moved into the HttpServer Fn closure but Fn
closures require Clone (not FnOnce). Wrap in web::Data before
cloning into the closure.
- apps/app: remove mod logging, replace init_tracing_subscriber() call,
remove slog macros from main.rs, remove logging.rs
- apps/gitserver: remove slog usage from main.rs
- apps/git-hook: remove slog from main.rs
- apps/email: remove slog from main.rs
GitHook controller was generating a Deployment without any persistent
storage — only a ConfigMap volume at /config. The worker needs /data to
access repo storage paths (APP_REPOS_ROOT defaults to /data/repos).
Changes:
- GitHookSpec: added storage_size field (default 10Gi), matching the
pattern already used by GitServerSpec
- git_hook.rs reconcile(): now creates a PVC ({name}-data) before the
Deployment, mounts it at /data, and sets APP_REPOS_ROOT=/data/repos
- git-hook-crd.yaml: synced storageSize field into the CRD schema
GitServiceHooks was renamed to HookService in the hook module refactor.
Updated main.rs to:
- Use HookService::new() with correct parameters (no http client)
- Call start_worker() which returns CancellationToken
- Wait on cancel.cancelled() instead of double-awaiting ctrl_c
- Clone token before moving into signal handler task
- Add `start_sync_task()` in agent/sync.rs: spawns a background task
that syncs immediately on app startup, then every 10 minutes.
- `sync_once()` performs a single pass; errors are logged and swallowed
so the periodic task never stops.
- Remove authentication requirement from OpenRouter API (no API key needed).
- Call `service.start_sync_task()` from main.rs after AppService init.
- Also update the existing `sync_upstream_models` (HTTP API) to remove
the now-unnecessary API key requirement for consistency.
- Time does not implement Display, use .0 (inner DateTime<Utc>) and
to_rfc3339() instead.
- phase is &str, convert to String to match JobStatusResult.phase.