Commit Graph

18 Commits

Author SHA1 Message Date
ZhenYi
88dd3a5f61 fix: log silently dropped errors in compaction and SSH path handling
- Add tracing::warn! when conversation compaction fails (was let _ = e)
- Add tracing::debug! when SSH path canonicalize fails (was let _ = e)
2026-04-27 14:01:25 +08:00
ZhenYi
0a272ed63a fix: start SSH rate limiter cleanup and fix ToolContext reset per tool call
- Start SSH rate limiter cleanup task that was missing (prevent memory leak)
- Create single ToolContext outside tool execution loop so max_tool_calls
  and max_depth guards actually fire across batch tool calls (was creating
  fresh context per call, bypassing all limits)
2026-04-27 13:57:47 +08:00
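The ToolContext fix above can be illustrated with a minimal sketch. The field names, `register_call`, and both `run_batch_*` helpers are hypothetical and not taken from the repository; only the idea (one context created outside the loop so the counter accumulates) comes from the commit message.

```rust
/// Hypothetical per-request budget shared by every tool call in a batch.
pub struct ToolContext {
    max_tool_calls: usize,
    calls_made: usize,
}

impl ToolContext {
    pub fn new(max_tool_calls: usize) -> Self {
        Self { max_tool_calls, calls_made: 0 }
    }

    /// Records one call, or returns an error once the budget is spent.
    pub fn register_call(&mut self) -> Result<(), String> {
        if self.calls_made >= self.max_tool_calls {
            return Err(format!("tool call limit of {} reached", self.max_tool_calls));
        }
        self.calls_made += 1;
        Ok(())
    }
}

/// Fixed shape: ONE context lives outside the loop, so the guard can fire.
/// Returns how many calls were allowed to run.
pub fn run_batch_shared(tools: &[&str], max_tool_calls: usize) -> usize {
    let mut ctx = ToolContext::new(max_tool_calls);
    let mut allowed = 0;
    for _tool in tools {
        if ctx.register_call().is_ok() {
            allowed += 1;
        }
    }
    allowed
}

/// Buggy shape: a fresh context per call resets the counter every time.
pub fn run_batch_fresh(tools: &[&str], max_tool_calls: usize) -> usize {
    let mut allowed = 0;
    for _tool in tools {
        let mut ctx = ToolContext::new(max_tool_calls);
        if ctx.register_call().is_ok() {
            allowed += 1;
        }
    }
    allowed
}
```

With a limit of 2 and four tools in the batch, the shared version stops after two calls while the per-call version lets all four through.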
ZhenYi
09645d8641 fix: resolve multiple bugs across backend and frontend
Security fixes:
- Remove WS token from plaintext log output (ws_universal.rs)
- Replace weak LCG PRNG with rand::thread_rng() for access key generation
- Add project membership check to issue triage endpoint (prevent unauthorized AI usage)
- Validate deepLinkUrl to prevent javascript: navigation (XSS defense-in-depth)

Data integrity fixes:
- Fix UUID truncation in AI model sync (as_u128() as i64 -> timestamp_millis)
- Wrap PR cascade delete in database transaction
- Add missing cascade deletes for room_message_reaction, room_message_edit_history, room_notifications
- Fix N+1 query for last_commit_times (single grouped query instead of per-repo)

Panic prevention:
- Replace unwrap() with safe fallbacks in health/metrics endpoints (email, git-hook apps)
- Replace unwrap() in access key scopes serialization
- Replace expect() in tool executor result map with synthetic error
- Replace expect() in log level parsing with default fallback

Logic bugs:
- Fix users_online metric double-decrement (decrement only when count reaches 0)
- Fix Map iteration + deletion bug in universal-ws.ts onclose handler
- Fix stale audioStream reference in catch block (use local stream variable)
- Add missing reInit event cleanup in carousel.tsx
- Fix email retry backoff integer overflow ((1 << i) as u64 -> 1u64 << i)
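The backoff overflow in the last item can be sketched as follows. An integer literal defaults to i32 when nothing else constrains it, so the shift happens in 32 bits before the cast. `MAX_SHIFT` and both function names are assumptions added for illustration; the commit itself only states the change to `1u64 << i`.

```rust
/// Hypothetical cap on the exponent so the delay stays bounded.
pub const MAX_SHIFT: u32 = 16;

/// Exponential backoff in seconds for a 0-based retry attempt.
/// Shifting a u64 value keeps the full 64-bit range available.
pub fn backoff_secs(attempt: u32) -> u64 {
    1u64 << attempt.min(MAX_SHIFT)
}

/// The same shift done in 32 bits, using checked arithmetic so the
/// failure is visible as `None` instead of a panic or a wrapped value.
pub fn shift_as_i32(attempt: u32) -> Option<i32> {
    1i32.checked_shl(attempt)
}
```

At attempt 31 the 32-bit shift already yields a negative number (the sign bit), and at attempt 32 it is out of range entirely, which is why the shift is performed on a u64.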

React fixes:
- Use message.id instead of index as key in message-list
- Add audio stream cleanup on unmount in use-audio-recording
2026-04-27 13:54:21 +08:00
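The users_online fix in this commit (decrement only when a user's connection count reaches 0) might look roughly like the sketch below. The `Presence` type and its methods are hypothetical; the real metric code is not shown in the commit message.

```rust
use std::collections::HashMap;

/// Hypothetical presence tracker: one gauge, many connections per user.
#[derive(Default)]
pub struct Presence {
    connections: HashMap<u64, usize>, // user id -> open connection count
    users_online: usize,
}

impl Presence {
    pub fn connect(&mut self, user: u64) {
        let count = self.connections.entry(user).or_insert(0);
        if *count == 0 {
            // First connection for this user: they are now online.
            self.users_online += 1;
        }
        *count += 1;
    }

    pub fn disconnect(&mut self, user: u64) {
        let remaining = match self.connections.get_mut(&user) {
            Some(count) => {
                *count -= 1;
                *count
            }
            // Unknown user or duplicate close event: nothing to decrement.
            None => return,
        };
        if remaining == 0 {
            // Last connection closed: only now does the gauge drop.
            self.connections.remove(&user);
            self.users_online -= 1;
        }
    }

    pub fn users_online(&self) -> usize {
        self.users_online
    }
}
```

A user with two open tabs stays counted as online until the second tab disconnects, and a stray extra disconnect is ignored.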
ZhenYi
cce9d216b8 fix: resolve 4 remaining "design decision" bugs
- SSH rate limiter: wire SshRateLimiter into SSHServer with IP-based
  rate limiting on new_client connections
- Room startup: cap initial room load at 1000 via limit() to prevent
  resource exhaustion on large instances
- WS token exposure: only include token in URL for cross-origin
  connections; same-origin web clients authenticate via secure cookies
- CSRF: confirmed SameSite::Lax + Secure + HttpOnly are all set
  (session config defaults)
2026-04-27 11:20:38 +08:00
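As a rough sketch of IP-based limiting on new connections, the following uses a sliding window per address plus a cleanup pass. `IpRateLimiter`, its fields, and `tracked_ips` are hypothetical stand-ins rather than the repository's `SshRateLimiter`; the method name `is_ip_allowed` is borrowed from an older commit in this log, but its signature here is an assumption.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Hypothetical sliding-window limiter keyed by client IP.
pub struct IpRateLimiter {
    max_per_window: usize,
    window: Duration,
    hits: HashMap<IpAddr, Vec<Instant>>,
}

impl IpRateLimiter {
    pub fn new(max_per_window: usize, window: Duration) -> Self {
        Self { max_per_window, window, hits: HashMap::new() }
    }

    /// Returns true if a new connection from `ip` at `now` is within the limit.
    pub fn is_ip_allowed(&mut self, ip: IpAddr, now: Instant) -> bool {
        let window = self.window;
        let recent = self.hits.entry(ip).or_default();
        // Forget connection attempts that fell out of the window.
        recent.retain(|t| now.duration_since(*t) < window);
        if recent.len() >= self.max_per_window {
            return false;
        }
        recent.push(now);
        true
    }

    /// Drops idle entries so the map does not grow without bound.
    /// A background task would call this periodically.
    pub fn cleanup(&mut self, now: Instant) {
        let window = self.window;
        self.hits.retain(|_, times| {
            times.retain(|t| now.duration_since(*t) < window);
            !times.is_empty()
        });
    }

    pub fn tracked_ips(&self) -> usize {
        self.hits.len()
    }
}
```

Passing `now` in explicitly keeps the limiter deterministic under test; production code would more likely read the clock internally.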
ZhenYi
bdb5393835 fix: resolve 30+ bugs from security audit
Critical:
- CORS: replace allow_any_origin + credentials with env-configured origins
- XSS: escape HTML before dangerouslySetInnerHTML in search results
- Path traversal: sanitize storage keys to reject ".." components
- Auth missing: add Session requirement to git init/open/is-repo endpoints
- Transaction: wrap issue cascade delete in DB transaction
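The storage key check from the path traversal item could be sketched like this. `is_safe_storage_key` is a hypothetical name, and the check is deliberately stricter than the commit describes: besides ".." it also rejects absolute paths, a leading "." and empty keys, which is an assumption rather than confirmed behavior.

```rust
use std::path::{Component, Path};

/// Returns true only when every component of `key` is a plain name.
/// Rejects "..", a leading ".", root or prefix components, and empty keys.
pub fn is_safe_storage_key(key: &str) -> bool {
    !key.is_empty()
        && Path::new(key)
            .components()
            .all(|c| matches!(c, Component::Normal(_)))
}
```

Note that component splitting follows the host platform's separator, so keys containing backslashes would need separate handling on Unix hosts.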

High:
- Mutex poisoning: replace unwrap() with poison-recovering guards
- Drop tokio::spawn: use runtime handle or fallback thread for lock release
- Redis KEYS: replace with non-blocking SCAN for typing events
- SSH panic: handle missing stdin/stdout/stderr gracefully
- LFS auth: remove x-user-uid header injection vector, generate per-request tokens
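For the mutex poisoning item, one common way to build a poison-recovering guard with the standard library is shown below. The helper name `lock_recover` is an assumption; the commit does not say how its guards are implemented.

```rust
use std::sync::{Mutex, MutexGuard};

/// Locks `m`, recovering the guard even if another thread panicked while
/// holding the lock. The protected data may be mid-update in that case,
/// so this only suits state that stays valid after a partial write.
pub fn lock_recover<T>(m: &Mutex<T>) -> MutexGuard<'_, T> {
    m.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}
```

Compared with `lock().unwrap()`, this keeps one panicking thread from turning every later lock attempt into another panic.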

Medium:
- Memory leak: remove Box::leak in provider normalization
- Race conditions: query closed count directly instead of subtraction
- Silent failures: add tracing::warn for AI tasks, room events, activity logs
- Frontend nav: sync activeRoomId when initialRoomId prop changes
- Duplicate nav: remove redundant setActiveRoom in delete handler
- Callback conflict: skip undefined values in updateCallbacks merge
- Stale closure: use wsClient state instead of wsClientRef.current in useMemo

Low:
- Captcha: validate captcha not empty before login submission
- Broadcast capacity: reduce from 100K to 1000
- Error handling: add try/catch for removeMember and updateMemberRole
- Loading state: show placeholder instead of null in RepositoryContextProvider
- WebSocket: add heartbeat ping and jitter to reconnect backoff
2026-04-27 10:57:23 +08:00
ZhenYi
64dc27161b chore(git): minor fixes and improvements across git library modules
Apply small fixes across multiple git ops files: handle errors, improve
type safety, and refine HTTP handler and SSH git operations.
2026-04-27 08:28:09 +08:00
ZhenYi
0c1a9ddf98 refactor(git): migrate libs/git from slog to tracing
- Remove all use slog::* imports and log: slog::Logger fields
- ssh/handle.rs: replace slog macro chains with tracing::{info!, warn!,
  error!, debug!}; remove log field from GitSshHandle
- ssh/authz.rs, ssh/mod.rs, ssh/server.rs: remove slog Logger fields
- http/: auth.rs, handler.rs, mod.rs, routes.rs: remove slog usage
- hook/: pool worker, sync modules, webhook_dispatch.rs: remove slog
2026-04-21 22:29:26 +08:00
ZhenYi
81e6ee3d48 feat(observability): Phase 1-5 slog structured logging across platform
Phase 1: add libs/observability crate (build_logger, instance_id);
  remove duplicate logger init from 4 crates
Phase 2: Actix-web RequestLogger with trace_id; MetricsMiddleware + HttpMetrics
Phase 3: Git SSH handle.rs slog struct; HTTP handler Logger kv
Phase 4: AI client eprintln -> slog warn; billing ai_usage_recorded log
Phase 5: SessionManager slog; workspace alert slog 2.x syntax
2026-04-21 13:44:12 +08:00
ZhenYi
ef61b193c4 fix(git/hook): refine Redis queue worker, remove dead code, fix warnings
- pool/mod.rs: pass shared http_client Arc to HookWorker
- worker.rs: remove double-locking (sync() manages its own lock),
  await all webhook handles before returning, share http_client,
  hoist namespace query out of loop
- redis.rs: atomic NAK via Lua script (LREM + LPUSH in one eval)
- sync/lock.rs: increase LOCK_TTL from 60s to 300s for large repos
- sync/mod.rs: split sync/sync_work, fsck_only/fsck_work, gc_only/gc_work
  so callers can choose locked vs lock-free path; run_gc + sync_skills
  outside the DB transaction
- hook/mod.rs: remove unused http field from HookService
- ssh/mod.rs, http/mod.rs: remove unused HookService/http imports
2026-04-17 13:05:07 +08:00
ZhenYi
8fb2436f22 feat(git): add Redis-backed hook worker with per-repo distributed locking
- pool/worker.rs: single-threaded consumer that BLMPOPs from Redis queues
  sequentially. K8s replicas provide HA — each pod runs one worker.
- pool/redis.rs: RedisConsumer with BLMOVE atomic dequeue, ACK/NAK, and
  retry-with-json support.
- pool/types.rs: HookTask, TaskType, PoolConfig (minimal — no pool metrics).
- sync/lock.rs: Redis SET NX EX per-repo lock to prevent concurrent workers
  from processing the same repo. Lock conflicts are handled by requeueing
  without incrementing retry count.
- hook/mod.rs: HookService.start_worker() spawns the background worker.
- ssh/mod.rs / http/mod.rs: ReceiveSyncService RPUSHes to Redis queue.
  Both run_http and run_ssh call start_worker() to launch the consumer.
- Lock conflicts (GitError::Locked) in the worker are requeued without
  incrementing retry_count so another worker can pick them up.
2026-04-17 12:33:58 +08:00
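The locking behavior described in this commit can be approximated without Redis. In the sketch below an in-memory map stands in for `SET NX EX` (acquire only if no unexpired lock exists), and a lock conflict requeues the task with its retry count untouched. `RepoLocks`, `run_or_requeue`, and the `HookTask` fields are hypothetical, and a real distributed lock would also verify ownership before release.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

#[derive(Debug, Clone, PartialEq)]
pub struct HookTask {
    pub repo: String,
    pub retry_count: u32,
}

/// In-memory stand-in for a per-repo lock with a TTL.
#[derive(Default)]
pub struct RepoLocks {
    expiry_by_repo: HashMap<String, Instant>,
}

impl RepoLocks {
    /// Mirrors SET NX EX: succeeds only when no unexpired lock is held.
    pub fn try_acquire(&mut self, repo: &str, ttl: Duration, now: Instant) -> bool {
        let held = self
            .expiry_by_repo
            .get(repo)
            .map_or(false, |expiry| *expiry > now);
        if held {
            return false;
        }
        self.expiry_by_repo.insert(repo.to_string(), now + ttl);
        true
    }

    pub fn release(&mut self, repo: &str) {
        self.expiry_by_repo.remove(repo);
    }
}

/// Runs the task if the repo lock is free. On conflict the task goes back
/// on the queue unchanged, so retry_count is not incremented.
pub fn run_or_requeue(
    task: HookTask,
    locks: &mut RepoLocks,
    queue: &mut VecDeque<HookTask>,
    ttl: Duration,
    now: Instant,
) -> bool {
    if !locks.try_acquire(&task.repo, ttl, now) {
        queue.push_back(task);
        return false;
    }
    // The actual sync work would run here while the lock is held.
    locks.release(&task.repo);
    true
}
```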
ZhenYi
eeb99bf628 refactor(git): drop hook pool, sync execution is now direct and sequential
- Remove entire pool/ directory (RedisConsumer, CpuMonitor, LogStream, HookTask, TaskType)
- Remove Redis distributed lock (acquire_lock/release_lock) — K8s StatefulSet
  scheduling guarantees exclusive access per repo shard
- Remove sync/lock.rs, sync/remote.rs, sync/status.rs (dead code)
- Remove hook/event.rs (GitHookEvent was never used)
- New HookService exposes sync_repo / fsck_repo / gc_repo directly
- ReceiveSyncService now calls HookService inline instead of LPUSH to Redis queue
- sync/mod.rs: git2 operations wrapped in spawn_blocking for Send safety
  (git2 types are not Send — async git2 operations must not cross await points)
- scripts/push.js: drop 'frontend' from docker push list (embedded into static binary)
2026-04-17 12:22:09 +08:00
ZhenYi
3de4fff11d feat(service): improve model sync and harden git HTTP/SSH stability
Model sync:
- Filter OpenRouter models by what the user's AI client can actually access,
  before upserting metadata (avoids bloating with inaccessible models).
- Fall back to direct endpoint sync when no OpenRouter metadata matches
  (handles Bailian/MiniMax and other non-OpenRouter providers).

Git stability fixes:
- SSH: add 5s timeout on stdin flush/shutdown in channel_eof and
  cleanup_channel to prevent blocking the event loop on unresponsive git.
- SSH: remove dbg!() calls from production code paths.
- HTTP auth: pass proper Logger to SshAuthService instead of discarding
  all auth events to slog::Discard.

Dependencies:
- reqwest: add native-tls feature for HTTPS on Windows/Linux/macOS.
2026-04-17 00:13:40 +08:00
ZhenYi
0a998affbb refactor(git): remove SSH rate limiting
SSH is deployed inside a Kubernetes cluster, where rate limiting
at the application layer is unnecessary. Remove all SSH rate
limiter code:
- SshRateLimiter from SSHandle and SSHServer structs
- is_user_allowed checks in auth_publickey, auth_publickey_offered
- is_repo_access_allowed in exec_request
- is_ip_allowed in server::new_client
- rate_limiter module and start_cleanup
2026-04-16 22:40:59 +08:00
ZhenYi
bbf2d75fba fix(git): harden hook pool retry, standardize slog log format
- Add retry_count to HookTask with serde(default) for backwards compat
- Limit hook task retries to MAX_RETRIES=5, discard after limit to prevent
  infinite requeue loops that caused 'task nack'd and requeued' log spam
- Add nak_with_retry() in RedisConsumer to requeue with incremented count
- Standardize all slog logs: replace "info!(l, "msg"; "k" => v)" shorthand
  with "info!(l, "{}", format!("msg k={}", v))" across ssh/authz.rs,
  ssh/handle.rs, ssh/server.rs, hook/webhook_dispatch.rs, hook/pool/mod.rs
2026-04-16 21:41:35 +08:00
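The retry cap in this commit can be sketched with an in-memory queue in place of Redis. `MAX_RETRIES`, `retry_count`, and `nak_with_retry` are names from the commit message, but the signature and the `VecDeque` queue below are assumptions made so the example is self-contained.

```rust
use std::collections::VecDeque;

pub const MAX_RETRIES: u32 = 5;

#[derive(Debug, Clone, PartialEq)]
pub struct HookTask {
    pub repo: String,
    /// Number of times this task has already been requeued.
    pub retry_count: u32,
}

/// Requeues a failed task with an incremented retry count, or discards it
/// once the limit is reached. Returns true if the task was requeued.
pub fn nak_with_retry(queue: &mut VecDeque<HookTask>, mut task: HookTask) -> bool {
    if task.retry_count >= MAX_RETRIES {
        // Dropping the task here is what breaks the infinite requeue loop.
        return false;
    }
    task.retry_count += 1;
    queue.push_back(task);
    true
}
```

A task that fails every time is requeued five times and then dropped, rather than cycling through the queue indefinitely.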
ZhenYi
02847ef1db fix(git): downgrade russh to 0.50.4, remove flate2 feature, fix log format
- Downgrade russh from 0.55.0 to 0.50.4
- Remove unused flate2 feature from russh dependency
- Use info!(logger, "{}", format!(...)) for channel lifecycle log messages
2026-04-16 20:58:01 +08:00
ZhenYi
1090359951 fix(git): add SSH channel lifecycle logging and fix password auth username check
- Remove user=="git" restriction from auth_password: the actual user is
  determined by the token, not the SSH username, matching Gitea's approach
- Add channel_open_session logging with explicit flush to verify
  CHANNEL_OPEN_CONFIRMATION reaches the client
- Add pty_request handler (reject with log) so git clients that request
  a PTY are handled gracefully instead of falling through to default
- Add subsystem_request handler (log + accept) so git subsystems are
  visible in logs
- Prefix unused variables with _ to eliminate warnings
2026-04-16 20:40:17 +08:00
ZhenYi
cef4ff1289 fix(git): harden HTTP and SSH git transports for robustness
HTTP:
- Return Err(...) instead of Ok(HttpResponse::...) for error cases so
  actix returns correct HTTP status codes instead of 200
- Add 30s timeout on info_refs and handle_git_rpc git subprocess calls
- Add 1MB pre-PACK limit to prevent memory exhaustion on receive-pack
- Enforce branch protection rules (forbid push/force-push/deletion/tag)
- Simplify graceful shutdown (remove manual signal handling)

SSH:
- Fix build_git_command: use block match arms so chained .arg() calls
  are on the Command, not the match expression's () result
- Add MAX_RETRIES=5 to forward() data-pump loop to prevent infinite
  spin on persistent network failures
- Fall back to raw path if canonicalize() fails instead of panicking
- Add platform-specific git config paths (/dev/null on unix, NUL on win)
- Start rate limiter cleanup background task so HashMap doesn't grow
  unbounded over time
- Derive Clone on RateLimiter so SshRateLimiter::start_cleanup works
2026-04-16 20:11:18 +08:00
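The canonicalize fallback from the SSH list can be expressed with the standard library alone. The function name `resolve_repo_path` is hypothetical, and where real code would presumably log the error, this sketch just discards it.

```rust
use std::path::{Path, PathBuf};

/// Resolves `raw` to a canonical absolute path when possible. If
/// canonicalization fails (for example, the path does not exist yet),
/// the raw path is returned unchanged instead of panicking.
pub fn resolve_repo_path(raw: &Path) -> PathBuf {
    match std::fs::canonicalize(raw) {
        Ok(canonical) => canonical,
        // A real handler would log `_err` here before falling back.
        Err(_err) => raw.to_path_buf(),
    }
}
```

Callers that rely on the result for access checks should keep in mind that the fallback path has not had symlinks or ".." segments resolved.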
ZhenYi
93cfff9738 init
2026-04-15 09:08:09 +08:00