Commit Graph

16 Commits

Author SHA1 Message Date
ZhenYi
f125fb0c02 fix(adminrpc): pass otel_enabled as defer arg to avoid double-init
When OTLP is enabled, init_tracing_subscriber() must defer so that
init_otlp() is the sole caller of try_init(). Without this, the adminrpc
binary crashes with "global default trace dispatcher has already been set".
2026-04-22 23:47:15 +08:00
ZhenYi
acd7fe8f6c fix(email): pass defer argument to init_tracing_subscriber
2026-04-22 23:36:40 +08:00
ZhenYi
6310dfda2f fix(gitserver,git-hook): pass defer argument to init_tracing_subscriber
The init_tracing_subscriber() function now takes a second `defer: bool`
argument. These binaries do not use OTLP, so pass false.
2026-04-22 23:32:43 +08:00
ZhenYi
8defac98ad fix(observability): resolve tracing double-init runtime panic
Both init_tracing_subscriber() and init_otlp() were calling try_init()
on the global tracing dispatcher, causing "global default trace dispatcher
has already been set" at runtime when APP_OTEL_ENABLED=true.

Fix: simplify the API so that exactly one function installs the
subscriber. init_tracing_subscriber() either calls try_init() immediately
(non-OTLP mode) or returns without installing (OTLP mode, defer=true).
In OTLP mode, init_otlp() builds the complete subscriber stack (registry +
env_filter + fmt_layer + otel_layer) and calls try_init() once.

init_tracing_subscriber() signature: (level, defer) → ()
init_otlp() signature: (endpoint, service_name, _, log_level) → Result

The fmt layer is replicated inside init_otlp() for the OTLP path.
2026-04-22 23:28:56 +08:00
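The single-install rule described in this commit can be modelled with the standard library alone. The following is a minimal sketch, not the project's code: GLOBAL_SUBSCRIBER, try_install and installed are hypothetical stand-ins for the global tracing dispatcher and try_init(), the init functions take simplified arguments, and they return Result here (unlike the `()` signature above) only to make the failure visible.

```rust
use std::sync::OnceLock;

// Stand-in for the process-wide tracing dispatcher: it can be set only once.
static GLOBAL_SUBSCRIBER: OnceLock<&'static str> = OnceLock::new();

/// Hypothetical analogue of try_init(): fails if a subscriber is already installed.
fn try_install(name: &'static str) -> Result<(), String> {
    GLOBAL_SUBSCRIBER
        .set(name)
        .map_err(|_| "global default trace dispatcher has already been set".to_string())
}

/// Mirrors the (level, defer) shape: with defer = true nothing is installed,
/// leaving installation to the OTLP path.
fn init_tracing_subscriber(_level: &str, defer: bool) -> Result<(), String> {
    if defer {
        return Ok(());
    }
    try_install("fmt")
}

/// Simplified OTLP path: the full stack is installed here, exactly once.
fn init_otlp(_endpoint: &str, _service_name: &str, _log_level: &str) -> Result<(), String> {
    try_install("fmt+otel")
}

/// Reports which stack, if any, has been installed.
fn installed() -> Option<&'static str> {
    GLOBAL_SUBSCRIBER.get().copied()
}
```

Calling init_tracing_subscriber(level, true) followed by init_otlp(...) succeeds in this model, while any second installation attempt returns the "already been set" error, which is the failure the commit removes.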
ZhenYi
f67c788cbe feat(gRPC): migrate admin RPC from Redis Pub/Sub to Tonic gRPC
- libs/rpc/admin: tonic-prost generated server + client wrappers
- apps/adminrpc: standalone binary with all 8 admin RPC methods
- Redis Pub/Sub JSON-RPC code removed from admin module
- libs/agent: add agent loop implementing the ReAct pattern
- proto/admin.proto: updated with list_workspace_sessions, is_user_online
2026-04-22 22:39:06 +08:00
ZhenYi
962bf0312d feat(observability): Phase 6 OTLP tracing + Prometheus metrics endpoint
OTLP tracing:
- libs/observability/otlp.rs: SdkTracerProvider via HTTP/proto OTLP exporter
- libs/observability/tracing_middleware.rs: Actix-web span with trace_id propagation
- libs/observability/tracing_fmt.rs: JSON fmt + registry.try_init for layered init
- libs/rpc: gRPC method spans via info_span
- libs/agent, libs/room, libs/service, libs/api: structured tracing throughout

Prometheus metrics:
- libs/observability/prometheus_exporter.rs: /metrics HTTP handler + metrics crate
- libs/observability/metrics_middleware.rs: HttpMetrics middleware + AtomicU64
- libs/observability/redis_metrics.rs: Redis counter poller via RedisMetrics
- libs/room/metrics.rs: RoomMetrics (connections, messages, presence counters)

Config env vars: APP_OTEL_ENABLED, APP_OTEL_ENDPOINT, APP_OTEL_SERVICE_NAME
2026-04-22 10:27:54 +08:00
ZhenYi
fbd228f17e feat(adminrpc): new standalone binary for admin gRPC service
Separate binary for Kubernetes internal admin RPC communication
(SessionAdmin service on port 9090). Includes:

- Redis cluster pool via session_manager
- OTLP tracing with env-driven configuration
- Tracing subscriber init (JSON to stderr)
- Graceful startup with connection verification
2026-04-21 23:06:11 +08:00
ZhenYi
4aaee59fa4 fix(app): PrometheusHandle must be Data-wrapped before Fn closure capture
PrometheusHandle was moved into the HttpServer factory closure, but that
closure must be Fn (callable repeatedly), not FnOnce. Wrap the handle in
web::Data before the closure captures it, and clone it inside the closure.
2026-04-21 23:05:54 +08:00
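The Fn-versus-FnOnce point can be illustrated without Actix. This is a minimal sketch under stated assumptions: Handle, App, build_workers and make_factory are hypothetical names, and Arc stands in for web::Data (which wraps an Arc internally).

```rust
use std::sync::Arc;

// Stand-in for PrometheusHandle; the real type comes from the metrics exporter.
struct Handle {
    name: String,
}

// Stand-in for the per-worker application state built by the factory.
struct App {
    handle: Arc<Handle>,
}

/// Like an HTTP server constructor, this requires Fn because the factory
/// is called once per worker.
fn build_workers<F: Fn() -> App>(factory: F, workers: usize) -> Vec<App> {
    (0..workers).map(|_| factory()).collect()
}

fn make_factory(handle: Handle) -> impl Fn() -> App {
    // Wrap once, outside the closure (the role web::Data plays in Actix).
    let shared = Arc::new(handle);
    // The closure owns `shared` and only clones it per call, so it stays Fn.
    move || App {
        handle: Arc::clone(&shared),
    }
}
```

Returning `App { handle: shared }` directly from the closure would move the captured value out on the first call and make the closure FnOnce, which build_workers would reject at compile time.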
ZhenYi
236aebe4ea refactor(apps): migrate app, gitserver, git-hook, email from slog to tracing
- apps/app: remove mod logging, replace it with an init_tracing_subscriber()
  call, remove slog macros from main.rs, remove logging.rs
- apps/gitserver: remove slog usage from main.rs
- apps/git-hook: remove slog from main.rs
- apps/email: remove slog from main.rs
2026-04-21 22:30:01 +08:00
ZhenYi
81e6ee3d48 feat(observability): Phase 1-5 slog structured logging across platform
Phase 1: add libs/observability crate (build_logger, instance_id);
  remove duplicate logger init from 4 crates
Phase 2: Actix-web RequestLogger with trace_id; MetricsMiddleware + HttpMetrics
Phase 3: Git SSH handle.rs slog struct; HTTP handler Logger kv
Phase 4: AI client eprintln -> slog warn; billing ai_usage_recorded log
Phase 5: SessionManager slog; workspace alert slog 2.x syntax
2026-04-21 13:44:12 +08:00
ZhenYi
fb91f5a6c5 feat(admin): add admin panel with billing alerts and model sync
- Add libs/api/admin with admin API endpoints:
  sync models, workspace credit, billing alert check
- Add workspace_alert_config model and alert service
- Add Session::no_op() for background tasks without user context
- Add admin/ Next.js admin panel (AI models, billing, workspaces, audit)
- Start billing alert background task every 30 minutes
2026-04-19 20:48:59 +08:00
ZhenYi
3354055e6d fix(operator): mount /data PVC into git-hook deployment
GitHook controller was generating a Deployment without any persistent
storage — only a ConfigMap volume at /config. The worker needs /data to
access repo storage paths (APP_REPOS_ROOT defaults to /data/repos).

Changes:
- GitHookSpec: added storage_size field (default 10Gi), matching the
  pattern already used by GitServerSpec
- git_hook.rs reconcile(): now creates a PVC ({name}-data) before the
  Deployment, mounts it at /data, and sets APP_REPOS_ROOT=/data/repos
- git-hook-crd.yaml: synced storageSize field into the CRD schema
2026-04-17 14:15:38 +08:00
ZhenYi
7c042c7b9d fix(git-hook): use HookService instead of non-existent GitServiceHooks
GitServiceHooks was renamed to HookService in the hook module refactor.
Updated main.rs to:
- Use HookService::new() with correct parameters (no http client)
- Call start_worker() which returns CancellationToken
- Wait on cancel.cancelled() instead of double-awaiting ctrl_c
- Clone token before moving into signal handler task
2026-04-17 13:54:17 +08:00
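The shutdown flow in this commit (one token, cloned for the signal handler, with a single wait point) can be sketched with threads from the standard library. Everything here is an assumption for illustration: CancelToken is a hand-rolled stand-in for the async CancellationToken the commit mentions, start_worker takes no parameters, and a spawned thread plays the role of the ctrl_c handler.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Minimal stand-in for a cancellation token: cloneable, and waitable from any clone.
#[derive(Clone, Default)]
struct CancelToken {
    inner: Arc<(Mutex<bool>, Condvar)>,
}

impl CancelToken {
    /// Marks the token as cancelled and wakes every waiter.
    fn cancel(&self) {
        let (flag, cv) = &*self.inner;
        *flag.lock().unwrap() = true;
        cv.notify_all();
    }

    /// Blocks until some clone of this token has been cancelled.
    fn cancelled(&self) {
        let (flag, cv) = &*self.inner;
        let mut done = flag.lock().unwrap();
        while !*done {
            done = cv.wait(done).unwrap();
        }
    }

    fn is_cancelled(&self) -> bool {
        *self.inner.0.lock().unwrap()
    }
}

/// Hypothetical start_worker(): hands back the token that stops the worker.
fn start_worker() -> CancelToken {
    CancelToken::default()
}

fn run_until_signal() -> bool {
    let cancel = start_worker();
    // Clone before moving into the signal handler, so the original stays usable.
    let for_signal = cancel.clone();
    // This thread stands in for the ctrl_c handler task.
    let signal = thread::spawn(move || for_signal.cancel());
    // Single wait point: block on the token rather than on the signal itself.
    cancel.cancelled();
    signal.join().unwrap();
    cancel.is_cancelled()
}
```

Waiting on the token rather than on the signal means the worker and the signal handler share one source of truth for shutdown.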
ZhenYi
9368df54da feat(service): auto-sync OpenRouter models on app startup and every 10 minutes
- Add `start_sync_task()` in agent/sync.rs: spawns a background task
  that syncs immediately on app startup, then every 10 minutes.
- `sync_once()` performs a single pass; errors are logged and swallowed
  so the periodic task never stops.
- Remove authentication requirement from OpenRouter API (no API key needed).
- Call `service.start_sync_task()` from main.rs after AppService init.
- Also update the existing `sync_upstream_models` (HTTP API) to remove
  the now-unnecessary API key requirement for consistency.
2026-04-16 22:35:34 +08:00
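The "sync immediately, then on an interval, never stop on error" behaviour can be sketched as a plain loop. This is an illustrative model only: run_sync_loop is a hypothetical name, the closure stands in for sync_once(), and the `rounds` bound exists just to keep the sketch finite (the task described above runs indefinitely, every 10 minutes).

```rust
use std::thread;
use std::time::Duration;

/// Runs `sync_once` immediately, then again after each `interval`, for `rounds` passes.
/// Errors are logged and swallowed so that one failed pass never stops the loop.
/// Returns the number of failed passes.
fn run_sync_loop<F>(interval: Duration, rounds: usize, mut sync_once: F) -> usize
where
    F: FnMut() -> Result<usize, String>,
{
    let mut failures = 0;
    for round in 0..rounds {
        // No delay before the first pass: it runs right at startup.
        if round > 0 {
            thread::sleep(interval);
        }
        match sync_once() {
            Ok(count) => eprintln!("synced {count} models"),
            Err(err) => {
                failures += 1;
                eprintln!("model sync failed: {err}");
            }
        }
    }
    failures
}
```

A failed pass only increments the counter and writes a log line, so the next pass still runs on schedule.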
ZhenYi
df976d16cb fix(operator): Time to_string and phase type mismatch
- Time does not implement Display; use .0 (inner DateTime<Utc>) and
  to_rfc3339() instead.
- phase is &str, convert to String to match JobStatusResult.phase.
2026-04-15 09:38:46 +08:00
ZhenYi
93cfff9738 init
2026-04-15 09:08:09 +08:00