Commit Graph

25 Commits

Author SHA1 Message Date
ZhenYi
468007177f fix(hooks,email): add describe_counter! to pre-register metrics
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
Hook tasks and email metrics were missing from /metrics because
describe_counter! was never called before install_recorder(), so
unincremented counters were not exported. Room metrics appeared
because RoomMetrics::new() already described them.

- apps/git-hook: describe 8 hook_tasks_*/hook_sync_* counters
- apps/email: describe 8 email_* counters
- both: add metrics = "0.22" as direct dependency
2026-04-26 00:42:59 +08:00
ZhenYi
d593354ba9 feat: add sitemap index with static/users/projects/repos sub-sitemaps
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
- Main sitemap index at /sitemap.xml referencing 4 sub-sitemaps
- /sidemap/static: fixed routes (homepage, auth, marketing pages)
- /sidemap/users: public user profiles sorted alphabetically
- /sidemap/projects: public projects sorted alphabetically
- /sidemap/repos: public repos sorted alphabetically
- Redis cache with 8h TTL (no refresh on access), key: sidemap:{type}
- robots.txt Sitemap URL uses main_domain() with https:// forced
- All sitemap loc entries use https:// base URL
2026-04-26 00:06:18 +08:00
ZhenYi
da9e96f6dd feat: add /robots.txt blocking sensitive paths from crawlers
Disallows: /api/, /health, /metrics, /ws/, /avatar/, /blob/,
/media/, /static/, /assets/
2026-04-25 23:49:50 +08:00
ZhenYi
10836730ed feat: add health endpoints and Prometheus metrics to git-hook and email-worker
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
Health monitoring:
- gitserver: /health endpoint on port 8021 (DB + Redis ping)
- git-hook: hyper health server on port 8083 with /health
- email-worker: hyper health server on port 8084 with /health
- K8s probes updated to httpGet for all three services

Metrics (via /metrics endpoint):
- git-hook: hook_tasks_total/success/failed/locked/retried/exhausted,
  hook_sync_branches/tags_changed_total
- email: email_queued/consumed/sent/failed_total,
  email_validation_skipped/build_errors/send_attempts_total
2026-04-25 23:45:48 +08:00
ZhenYi
8b47f677bb fix(avatar): add upload API routes and fix URL path prefix
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
- Add /api/users/me/avatar and /api/projects/{name}/avatar multipart upload endpoints
- Fix avatar URL path: missing /avatar prefix (static.gitdata.ai/avatar/{file})
- Fix project avatar: Utc::now() → .timestamp(), missing extension, wrong return type
- Replace broken SkipNoisyPaths middleware with self-contained RequestLogger
  (actix-web 4.13 body type incompatibility with newer actix-http)
- Exclude /assets/* requests from main app logger
- Exclude /avatar/*, /blob/*, /media/*, /static/* from static server logger
- Fix TypingEvent missing sender_type field in ws_universal.rs and connection.rs
- Wire real fetch-based upload in user profile settings
- Add project avatar upload UI to project settings page
2026-04-25 23:19:22 +08:00
ZhenYi
b00d42ee8d chore(app): exclude health/metrics/WS from access logger output
Skip terminal access logs for noisy K8s probe and monitoring
endpoints: /health, /metrics, and /ws path prefix.
Applied to both the main app and static file server.
2026-04-25 22:51:21 +08:00
ZhenYi
91bebba45e fix(app): remove redundant Arc wrapper around PrometheusHandle
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
Actix-web's Data<T> extractor expects T to implement Clone,
so PrometheusHandle (which is Clone) does not need to be
wrapped in Arc. The Arc type mismatch caused /metrics to
return 500.
2026-04-25 21:09:32 +08:00
ZhenYi
0a9dfef9b4 chore: remove unused instance_id import in main.rs
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
2026-04-25 09:55:08 +08:00
ZhenYi
ae601774df chore(admin): remove all metrics/observability features
- Delete admin metrics dashboard page (admin/metrics/page.tsx)
- Delete LineChart component (used only by metrics)
- Remove "指标监控" nav link from Sidebar
- Remove getMetrics/exportMetricsCsv from admin-rpc.ts client
- Remove /api/admin/metrics and /api/admin/metrics/export HTTP routes
  from adminrpc (was also leaking metrics via HTTP)
- Remove metrics RPC methods (get_metrics, export_metrics_csv) from
  adminrpc gRPC server and their helper functions
- Remove spawn_redis_metrics_flusher from app/main.rs
- Remove redis_metrics module from observability crate
- Remove redis/deadpool-redis deps from observability Cargo.toml
2026-04-23 15:42:00 +08:00
ZhenYi
f125fb0c02 fix(adminrpc): pass otel_enabled as defer arg to avoid double-init
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
When OTLP is enabled, init_tracing_subscriber() must defer so that
init_otlp() is the sole caller of try_init(). Without this, the adminrpc
binary crashes with "global default trace dispatcher already set".
2026-04-22 23:47:15 +08:00
ZhenYi
acd7fe8f6c fix(email): pass defer argument to init_tracing_subscriber
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
2026-04-22 23:36:40 +08:00
ZhenYi
6310dfda2f fix(gitserver,git-hook): pass defer argument to init_tracing_subscriber
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
The init_tracing_subscriber() function now takes a second `defer: bool`
argument. These binaries do not use OTLP, so pass false.
2026-04-22 23:32:43 +08:00
ZhenYi
8defac98ad fix(observability): resolve tracing double-init runtime panic
Both init_tracing_subscriber() and init_otlp() were calling try_init()
on the global tracing dispatcher, causing "global default trace dispatcher
has already been set" at runtime when APP_OTEL_ENABLED=true.

Fix: simplify the API so init_tracing_subscriber() never installs the
subscriber — it either calls try_init() immediately (non-OTLP mode) or
returns without installing (OTLP mode, defer=true).  init_otlp() now
builds the complete subscriber stack (registry + env_filter + fmt_layer +
otel_layer) and calls try_init() once.

init_tracing_subscriber() signature: (level, defer) → ()
init_otlp() signature: (endpoint, service_name, _, log_level) → Result

The fmt layer is replicated inside init_otlp() for the OTLP path.
2026-04-22 23:28:56 +08:00
ZhenYi
f67c788cbe feat(gRPC): migrate admin RPC from Redis Pub/Sub to Tonic gRPC
- libs/rpc/admin: tonic-prost generated server + client wrappers
- apps/adminrpc: standalone binary with all 8 admin RPC methods
- Redis Pub/Sub JSON-RPC code removed from admin module
- libs/agent: add React agent loop for ReAct pattern
- proto/admin.proto: updated with list_workspace_sessions, is_user_online
2026-04-22 22:39:06 +08:00
ZhenYi
962bf0312d feat(observability): Phase 6 OTLP tracing + Prometheus metrics endpoint
OTLP tracing:
- libs/observability/otlp.rs: SdkTracerProvider via HTTP/proto OTLP exporter
- libs/observability/tracing_middleware.rs: Actix-web span with trace_id propagation
- libs/observability/tracing_fmt.rs: JSON fmt + registry.try_init for layered init
- libs/rpc: gRPC method spans via info_span
- libs/agent, libs/room, libs/service, libs/api: structured tracing throughout

Prometheus metrics:
- libs/observability/prometheus_exporter.rs: /metrics HTTP handler + metrics crate
- libs/observability/metrics_middleware.rs: HttpMetrics middleware + AtomicU64
- libs/observability/redis_metrics.rs: Redis counter poller via RedisMetrics
- libs/room/metrics.rs: RoomMetrics (connections, messages, presence counters)

Config env vars: APP_OTEL_ENABLED, APP_OTEL_ENDPOINT, APP_OTEL_SERVICE_NAME
2026-04-22 10:27:54 +08:00
ZhenYi
fbd228f17e feat(adminrpc): new standalone binary for admin gRPC service
Separate binary for Kubernetes internal admin RPC communication
(SessionAdmin service on port 9090). Includes:

- Redis cluster pool via session_manager
- OTLP tracing with env-driven configuration
- Tracing subscriber init (JSON to stderr)
- Graceful startup with connection verification
2026-04-21 23:06:11 +08:00
ZhenYi
4aaee59fa4 fix(app): PrometheusHandle must be Data-wrapped before Fn closure capture
PrometheusHandle was moved into the HttpServer Fn closure but Fn
closures require Clone (not FnOnce). Wrap in web::Data before
cloning into the closure.
2026-04-21 23:05:54 +08:00
ZhenYi
236aebe4ea refactor(apps): migrate app, gitserver, git-hook, email from slog to tracing
- apps/app: remove mod logging, replace init_tracing_subscriber() call,
  remove slog macros from main.rs, remove logging.rs
- apps/gitserver: remove slog usage from main.rs
- apps/git-hook: remove slog from main.rs
- apps/email: remove slog from main.rs
2026-04-21 22:30:01 +08:00
ZhenYi
81e6ee3d48 feat(observability): Phase 1-5 slog structured logging across platform
Phase 1: add libs/observability crate (build_logger, instance_id);
  remove duplicate logger init from 4 crates
Phase 2: Actix-web RequestLogger with trace_id; MetricsMiddleware + HttpMetrics
Phase 3: Git SSH handle.rs slog struct; HTTP handler Logger kv
Phase 4: AI client eprintln -> slog warn; billing ai_usage_recorded log
Phase 5: SessionManager slog; workspace alert slog 2.x syntax
2026-04-21 13:44:12 +08:00
ZhenYi
fb91f5a6c5 feat(admin): add admin panel with billing alerts and model sync
- Add libs/api/admin with admin API endpoints:
  sync models, workspace credit, billing alert check
- Add workspace_alert_config model and alert service
- Add Session::no_op() for background tasks without user context
- Add admin/ Next.js admin panel (AI models, billing, workspaces, audit)
- Start billing alert background task every 30 minutes
2026-04-19 20:48:59 +08:00
ZhenYi
3354055e6d fix(operator): mount /data PVC into git-hook deployment
GitHook controller was generating a Deployment without any persistent
storage — only a ConfigMap volume at /config. The worker needs /data to
access repo storage paths (APP_REPOS_ROOT defaults to /data/repos).

Changes:
- GitHookSpec: added storage_size field (default 10Gi), matching the
  pattern already used by GitServerSpec
- git_hook.rs reconcile(): now creates a PVC ({name}-data) before the
  Deployment, mounts it at /data, and sets APP_REPOS_ROOT=/data/repos
- git-hook-crd.yaml: synced storageSize field into the CRD schema
2026-04-17 14:15:38 +08:00
ZhenYi
7c042c7b9d fix(git-hook): use HookService instead of non-existent GitServiceHooks
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
GitServiceHooks was renamed to HookService in the hook module refactor.
Updated main.rs to:
- Use HookService::new() with correct parameters (no http client)
- Call start_worker() which returns CancellationToken
- Wait on cancel.cancelled() instead of double-awaiting ctrl_c
- Clone token before moving into signal handler task
2026-04-17 13:54:17 +08:00
ZhenYi
9368df54da feat(service): auto-sync OpenRouter models on app startup and every 10 minutes
- Add `start_sync_task()` in agent/sync.rs: spawns a background task
  that syncs immediately on app startup, then every 10 minutes.
- `sync_once()` performs a single pass; errors are logged and swallowed
  so the periodic task never stops.
- Remove authentication requirement from OpenRouter API (no API key needed).
- Call `service.start_sync_task()` from main.rs after AppService init.
- Also update the existing `sync_upstream_models` (HTTP API) to remove
  the now-unnecessary API key requirement for consistency.
2026-04-16 22:35:34 +08:00
ZhenYi
df976d16cb fix(operator): Time to_string and phase type mismatch
- Time does not implement Display, use .0 (inner DateTime<Utc>) and
  to_rfc3339() instead.
- phase is &str, convert to String to match JobStatusResult.phase.
2026-04-15 09:38:46 +08:00
ZhenYi
93cfff9738 init 2026-04-15 09:08:09 +08:00