Commit Graph

38 Commits

Author SHA1 Message Date
ZhenYi
ec29673d14 chore: update dependencies and project config
Update Cargo dependencies and bump gingress actix-web version.
Adjust package.json scripts.
2026-05-11 17:05:39 +08:00
ZhenYi
bf7e6cf0a0 fix(app): initialize tracing immediately and fix replica timeout units
- Change init_tracing_subscriber defer=false so logs appear before DB
  connection fails (was deferred when OTEL enabled, producing no output)
- Fix replica connection pool timeouts from from_millis to from_secs
  (write connection was fixed earlier but replica was missed)
2026-05-11 00:43:36 +08:00
ZhenYi
7148c8fd39 feat(gingress): add Git UA routing and convert gingress to Helm templates
- Route requests with git/JGit User-Agent directly to gitserver backend
- Parse gingress.io/git-backend annotation (format: namespace/name:port)
- Convert static gingress YAML to Helm templates under deploy/templates/gingress/
- Add gingress config block to values.yaml (namespace, replicas, ports, resources)
2026-05-10 22:47:18 +08:00
ZhenYi
4e2a39a5c0 fix(workspace): resolve all cargo check warnings across workspace
Remove unused imports and add #[allow(dead_code)] annotations to
intentionally retained fields/methods. Also add deploy/.server.yaml
to .gitignore to prevent accidental credential exposure.
2026-05-10 21:56:08 +08:00
ZhenYi
ba2490dab4 feat(core): initialize project with access control and AI integration 2026-05-10 21:01:21 +08:00
ZhenYi
14f6e1e500 feat(core): initialize project with access control and AI integration
- Add gitignore and prettier configuration files for project scaffolding
- Implement room access control service with project member verification
- Create user access key management with CRUD operations and activity logging
- Add accordion UI component for frontend expandable sections
- Implement room AI configuration with list, upsert, and delete operations
- Add AI event types for agent join/leave/status change tracking
- Create streaming AI processing services for mode and react patterns
- Build room AI service with model detection and idempotency handling
- Integrate chat service orchestration for AI message processing
- Add typing indicators and stream cancellation for AI interactions
- Implement mention parsing and context extraction for AI agents
2026-05-03 06:04:31 +08:00
ZhenYi
2a9ec6d509 feat(tag): vectorize repo tags after hook sync with incremental embedding + FC tool
- HookWorker gains optional embed_service field
- Captures changed tag names during webhook dispatch, batch-embeds after completion
- HookService auto-inits EmbedService from config for standalone git-hook binary
- Adds agent dep to git crate (no circular dep)
- SSH/HTTP servers no longer call start_worker (dedicated git-hook handles it)
- git_tag_search FC tool for agent semantic tag search with project isolation
2026-04-28 13:04:10 +08:00
ZhenYi
65627a8662 fix(app): fix session key to use SHA-512 (64 bytes)
cookie::Key requires exactly 64 bytes, SHA-256 only produces 32 bytes
Change to SHA-512 and slice to 64 bytes for correct key length
2026-04-27 16:40:20 +08:00
ZhenYi
6a123170a1 fix: harden session key derivation from APP_SESSION_SECRET
- Reject secrets shorter than 32 bytes (fall back to generated key)
- Use SHA-256 hash instead of naive byte cycling to derive the key
  (cycling "password" to 64 bytes gave extremely low entropy)
2026-04-27 13:59:31 +08:00
ZhenYi
09645d8641 fix: resolve multiple bugs across backend and frontend
Security fixes:
- Remove WS token from plaintext log output (ws_universal.rs)
- Replace weak LCG PRNG with rand::thread_rng() for access key generation
- Add project membership check to issue triage endpoint (prevent unauthorized AI usage)
- Validate deepLinkUrl to prevent javascript: navigation (XSS defense-in-depth)

Data integrity fixes:
- Fix UUID truncation in AI model sync (as_u128() as i64 -> timestamp_millis)
- Wrap PR cascade delete in database transaction
- Add missing cascade deletes for room_message_reaction, room_message_edit_history, room_notifications
- Fix N+1 query for last_commit_times (single grouped query instead of per-repo)

Panic prevention:
- Replace unwrap() with safe fallbacks in health/metrics endpoints (email, git-hook apps)
- Replace unwrap() in access key scopes serialization
- Replace expect() in tool executor result map with synthetic error
- Replace expect() in log level parsing with default fallback

Logic bugs:
- Fix users_online metric double-decrement (decrement only when count reaches 0)
- Fix Map iteration + deletion bug in universal-ws.ts onclose handler
- Fix stale audioStream reference in catch block (use local stream variable)
- Add missing reInit event cleanup in carousel.tsx
- Fix email retry backoff integer overflow ((1 << i) as u64 -> 1u64 << i)

React fixes:
- Use message.id instead of index as key in message-list
- Add audio stream cleanup on unmount in use-audio-recording
2026-04-27 13:54:21 +08:00
ZhenYi
bdb5393835 fix: resolve 30+ bugs from security audit
Critical:
- CORS: replace allow_any_origin + credentials with env-configured origins
- XSS: escape HTML before dangerouslySetInnerHTML in search results
- Path traversal: sanitize storage keys to reject ".." components
- Auth missing: add Session requirement to git init/open/is-repo endpoints
- Transaction: wrap issue cascade delete in DB transaction

High:
- Mutex poisoning: replace unwrap() with poison-recovering guards
- Drop tokio::spawn: use runtime handle or fallback thread for lock release
- Redis KEYS: replace with non-blocking SCAN for typing events
- SSH panic: handle missing stdin/stdout/stderr gracefully
- LFS auth: remove x-user-uid header injection vector, generate per-request tokens

Medium:
- Memory leak: remove Box::leak in provider normalization
- Race conditions: query closed count directly instead of subtraction
- Silent failures: add tracing::warn for AI tasks, room events, activity logs
- Frontend nav: sync activeRoomId when initialRoomId prop changes
- Duplicate nav: remove redundant setActiveRoom in delete handler
- Callback conflict: skip undefined values in updateCallbacks merge
- Stale closure: use wsClient state instead of wsClientRef.current in useMemo

Low:
- Captcha: validate captcha not empty before login submission
- Broadcast capacity: reduce from 100K to 1000
- Error handling: add try/catch for removeMember and updateMemberRole
- Loading state: show placeholder instead of null in RepositoryContextProvider
- WebSocket: add heartbeat ping and jitter to reconnect backoff
2026-04-27 10:57:23 +08:00
ZhenYi
15483b4e95 chore(static): remove duplicate profile.release — already defined in workspace root 2026-04-26 16:41:24 +08:00
ZhenYi
0b5dc98ce5 refactor(db): simplify read-replica to single connection for CNPG
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
CNPG's cluster-ro service already handles load balancing and failover,
so the application-level Vec + random_range is redundant.

- db_read: Vec<DatabaseConnection> → Option<DatabaseConnection>
- database_read_replicas returns Option<String> instead of Vec<String>
- health checks now explicitly ping both writer() and reader()
- remove unused rand dependency from libs/db
2026-04-26 01:03:39 +08:00
ZhenYi
468007177f fix(hooks,email): add describe_counter! to pre-register metrics
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
Hook tasks and email metrics were missing from /metrics because
describe_counter! was never called before install_recorder(), so
unincremented counters were not exported. Room metrics appeared
because RoomMetrics::new() already described them.

- apps/git-hook: describe 8 hook_tasks_*/hook_sync_* counters
- apps/email: describe 8 email_* counters
- both: add metrics = "0.22" as direct dependency
2026-04-26 00:42:59 +08:00
ZhenYi
d593354ba9 feat: add sitemap index with static/users/projects/repos sub-sitemaps
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
- Main sitemap index at /sitemap.xml referencing 4 sub-sitemaps
- /sidemap/static: fixed routes (homepage, auth, marketing pages)
- /sidemap/users: public user profiles sorted alphabetically
- /sidemap/projects: public projects sorted alphabetically
- /sidemap/repos: public repos sorted alphabetically
- Redis cache with 8h TTL (no refresh on access), key: sidemap:{type}
- robots.txt Sitemap URL uses main_domain() with https:// forced
- All sitemap loc entries use https:// base URL
2026-04-26 00:06:18 +08:00
ZhenYi
da9e96f6dd feat: add /robots.txt blocking sensitive paths from crawlers
Disallows: /api/, /health, /metrics, /ws/, /avatar/, /blob/,
/media/, /static/, /assets/
2026-04-25 23:49:50 +08:00
ZhenYi
10836730ed feat: add health endpoints and Prometheus metrics to git-hook and email-worker
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
Health monitoring:
- gitserver: /health endpoint on port 8021 (DB + Redis ping)
- git-hook: hyper health server on port 8083 with /health
- email-worker: hyper health server on port 8084 with /health
- K8s probes updated to httpGet for all three services

Metrics (via /metrics endpoint):
- git-hook: hook_tasks_total/success/failed/locked/retried/exhausted,
  hook_sync_branches/tags_changed_total
- email: email_queued/consumed/sent/failed_total,
  email_validation_skipped/build_errors/send_attempts_total
2026-04-25 23:45:48 +08:00
ZhenYi
8b47f677bb fix(avatar): add upload API routes and fix URL path prefix
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
- Add /api/users/me/avatar and /api/projects/{name}/avatar multipart upload endpoints
- Fix avatar URL path: missing /avatar prefix (static.gitdata.ai/avatar/{file})
- Fix project avatar: Utc::now() → .timestamp(), missing extension, wrong return type
- Replace broken SkipNoisyPaths middleware with self-contained RequestLogger
  (actix-web 4.13 body type incompatibility with newer actix-http)
- Exclude /assets/* requests from main app logger
- Exclude /avatar/*, /blob/*, /media/*, /static/* from static server logger
- Fix TypingEvent missing sender_type field in ws_universal.rs and connection.rs
- Wire real fetch-based upload in user profile settings
- Add project avatar upload UI to project settings page
2026-04-25 23:19:22 +08:00
ZhenYi
b00d42ee8d chore(app): exclude health/metrics/WS from access logger output
Skip terminal access logs for noisy K8s probe and monitoring
endpoints: /health, /metrics, and /ws path prefix.
Applied to both the main app and static file server.
2026-04-25 22:51:21 +08:00
ZhenYi
91bebba45e fix(app): remove redundant Arc wrapper around PrometheusHandle
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
Actix-web's Data<T> extractor expects T to implement Clone,
so PrometheusHandle (which is Clone) does not need to be
wrapped in Arc. The Arc type mismatch caused /metrics to
return 500.
2026-04-25 21:09:32 +08:00
ZhenYi
0a9dfef9b4 chore: remove unused instance_id import in main.rs
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
2026-04-25 09:55:08 +08:00
ZhenYi
ae601774df chore(admin): remove all metrics/observability features
- Delete admin metrics dashboard page (admin/metrics/page.tsx)
- Delete LineChart component (used only by metrics)
- Remove "指标监控" nav link from Sidebar
- Remove getMetrics/exportMetricsCsv from admin-rpc.ts client
- Remove /api/admin/metrics and /api/admin/metrics/export HTTP routes
  from adminrpc (was also leaking metrics via HTTP)
- Remove metrics RPC methods (get_metrics, export_metrics_csv) from
  adminrpc gRPC server and their helper functions
- Remove spawn_redis_metrics_flusher from app/main.rs
- Remove redis_metrics module from observability crate
- Remove redis/deadpool-redis deps from observability Cargo.toml
2026-04-23 15:42:00 +08:00
ZhenYi
f125fb0c02 fix(adminrpc): pass otel_enabled as defer arg to avoid double-init
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
When OTLP is enabled, init_tracing_subscriber() must defer so that
init_otlp() is the sole caller of try_init(). Without this, the adminrpc
binary crashes with "global default trace dispatcher already set".
2026-04-22 23:47:15 +08:00
ZhenYi
acd7fe8f6c fix(email): pass defer argument to init_tracing_subscriber
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
2026-04-22 23:36:40 +08:00
ZhenYi
6310dfda2f fix(gitserver,git-hook): pass defer argument to init_tracing_subscriber
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
The init_tracing_subscriber() function now takes a second `defer: bool`
argument. These binaries do not use OTLP, so pass false.
2026-04-22 23:32:43 +08:00
ZhenYi
8defac98ad fix(observability): resolve tracing double-init runtime panic
Both init_tracing_subscriber() and init_otlp() were calling try_init()
on the global tracing dispatcher, causing "global default trace dispatcher
has already been set" at runtime when APP_OTEL_ENABLED=true.

Fix: simplify the API so init_tracing_subscriber() never installs the
subscriber — it either calls try_init() immediately (non-OTLP mode) or
returns without installing (OTLP mode, defer=true).  init_otlp() now
builds the complete subscriber stack (registry + env_filter + fmt_layer +
otel_layer) and calls try_init() once.

init_tracing_subscriber() signature: (level, defer) → ()
init_otlp() signature: (endpoint, service_name, _, log_level) → Result

The fmt layer is replicated inside init_otlp() for the OTLP path.
2026-04-22 23:28:56 +08:00
ZhenYi
f67c788cbe feat(gRPC): migrate admin RPC from Redis Pub/Sub to Tonic gRPC
- libs/rpc/admin: tonic-prost generated server + client wrappers
- apps/adminrpc: standalone binary with all 8 admin RPC methods
- Redis Pub/Sub JSON-RPC code removed from admin module
- libs/agent: add React agent loop for ReAct pattern
- proto/admin.proto: updated with list_workspace_sessions, is_user_online
2026-04-22 22:39:06 +08:00
ZhenYi
962bf0312d feat(observability): Phase 6 OTLP tracing + Prometheus metrics endpoint
OTLP tracing:
- libs/observability/otlp.rs: SdkTracerProvider via HTTP/proto OTLP exporter
- libs/observability/tracing_middleware.rs: Actix-web span with trace_id propagation
- libs/observability/tracing_fmt.rs: JSON fmt + registry.try_init for layered init
- libs/rpc: gRPC method spans via info_span
- libs/agent, libs/room, libs/service, libs/api: structured tracing throughout

Prometheus metrics:
- libs/observability/prometheus_exporter.rs: /metrics HTTP handler + metrics crate
- libs/observability/metrics_middleware.rs: HttpMetrics middleware + AtomicU64
- libs/observability/redis_metrics.rs: Redis counter poller via RedisMetrics
- libs/room/metrics.rs: RoomMetrics (connections, messages, presence counters)

Config env vars: APP_OTEL_ENABLED, APP_OTEL_ENDPOINT, APP_OTEL_SERVICE_NAME
2026-04-22 10:27:54 +08:00
ZhenYi
fbd228f17e feat(adminrpc): new standalone binary for admin gRPC service
Separate binary for Kubernetes internal admin RPC communication
(SessionAdmin service on port 9090). Includes:

- Redis cluster pool via session_manager
- OTLP tracing with env-driven configuration
- Tracing subscriber init (JSON to stderr)
- Graceful startup with connection verification
2026-04-21 23:06:11 +08:00
ZhenYi
4aaee59fa4 fix(app): PrometheusHandle must be Data-wrapped before Fn closure capture
PrometheusHandle was moved into the HttpServer Fn closure but Fn
closures require Clone (not FnOnce). Wrap in web::Data before
cloning into the closure.
2026-04-21 23:05:54 +08:00
ZhenYi
236aebe4ea refactor(apps): migrate app, gitserver, git-hook, email from slog to tracing
- apps/app: remove mod logging, replace init_tracing_subscriber() call,
  remove slog macros from main.rs, remove logging.rs
- apps/gitserver: remove slog usage from main.rs
- apps/git-hook: remove slog from main.rs
- apps/email: remove slog from main.rs
2026-04-21 22:30:01 +08:00
ZhenYi
81e6ee3d48 feat(observability): Phase 1-5 slog structured logging across platform
Phase 1: add libs/observability crate (build_logger, instance_id);
  remove duplicate logger init from 4 crates
Phase 2: Actix-web RequestLogger with trace_id; MetricsMiddleware + HttpMetrics
Phase 3: Git SSH handle.rs slog struct; HTTP handler Logger kv
Phase 4: AI client eprintln -> slog warn; billing ai_usage_recorded log
Phase 5: SessionManager slog; workspace alert slog 2.x syntax
2026-04-21 13:44:12 +08:00
ZhenYi
fb91f5a6c5 feat(admin): add admin panel with billing alerts and model sync
- Add libs/api/admin with admin API endpoints:
  sync models, workspace credit, billing alert check
- Add workspace_alert_config model and alert service
- Add Session::no_op() for background tasks without user context
- Add admin/ Next.js admin panel (AI models, billing, workspaces, audit)
- Start billing alert background task every 30 minutes
2026-04-19 20:48:59 +08:00
ZhenYi
3354055e6d fix(operator): mount /data PVC into git-hook deployment
GitHook controller was generating a Deployment without any persistent
storage — only a ConfigMap volume at /config. The worker needs /data to
access repo storage paths (APP_REPOS_ROOT defaults to /data/repos).

Changes:
- GitHookSpec: added storage_size field (default 10Gi), matching the
  pattern already used by GitServerSpec
- git_hook.rs reconcile(): now creates a PVC ({name}-data) before the
  Deployment, mounts it at /data, and sets APP_REPOS_ROOT=/data/repos
- git-hook-crd.yaml: synced storageSize field into the CRD schema
2026-04-17 14:15:38 +08:00
ZhenYi
7c042c7b9d fix(git-hook): use HookService instead of non-existent GitServiceHooks
Some checks are pending
CI / Rust Lint & Check (push) Waiting to run
CI / Rust Tests (push) Waiting to run
CI / Frontend Lint & Type Check (push) Waiting to run
CI / Frontend Build (push) Blocked by required conditions
GitServiceHooks was renamed to HookService in the hook module refactor.
Updated main.rs to:
- Use HookService::new() with correct parameters (no http client)
- Call start_worker() which returns CancellationToken
- Wait on cancel.cancelled() instead of double-awaiting ctrl_c
- Clone token before moving into signal handler task
2026-04-17 13:54:17 +08:00
ZhenYi
9368df54da feat(service): auto-sync OpenRouter models on app startup and every 10 minutes
- Add `start_sync_task()` in agent/sync.rs: spawns a background task
  that syncs immediately on app startup, then every 10 minutes.
- `sync_once()` performs a single pass; errors are logged and swallowed
  so the periodic task never stops.
- Remove authentication requirement from OpenRouter API (no API key needed).
- Call `service.start_sync_task()` from main.rs after AppService init.
- Also update the existing `sync_upstream_models` (HTTP API) to remove
  the now-unnecessary API key requirement for consistency.
2026-04-16 22:35:34 +08:00
ZhenYi
df976d16cb fix(operator): Time to_string and phase type mismatch
- Time does not implement Display, use .0 (inner DateTime<Utc>) and
  to_rfc3339() instead.
- phase is &str, convert to String to match JobStatusResult.phase.
2026-04-15 09:38:46 +08:00
ZhenYi
93cfff9738 init 2026-04-15 09:08:09 +08:00