Commit Graph

612 Commits

Author SHA1 Message Date
ZhenYi
0a272ed63a fix: start SSH rate limiter cleanup and fix ToolContext reset per tool call
- Start SSH rate limiter cleanup task that was missing (prevent memory leak)
- Create single ToolContext outside tool execution loop so max_tool_calls
  and max_depth guards actually fire across batch tool calls (was creating
  fresh context per call, bypassing all limits)
2026-04-27 13:57:47 +08:00
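The ToolContext change above can be illustrated with a minimal sketch. The `ToolContext` below is a hypothetical, stripped-down stand-in with only a call counter; the real type, its `max_depth` guard, and its error type are not shown in the commit message and are not modeled here.

```rust
/// Hypothetical, simplified stand-in for the real ToolContext.
struct ToolContext {
    calls: usize,
    max_tool_calls: usize,
}

impl ToolContext {
    fn new(max_tool_calls: usize) -> Self {
        Self { calls: 0, max_tool_calls }
    }

    /// Records one tool call, failing once the shared budget is spent.
    fn register_call(&mut self) -> Result<(), String> {
        if self.calls >= self.max_tool_calls {
            return Err(format!("max_tool_calls ({}) exceeded", self.max_tool_calls));
        }
        self.calls += 1;
        Ok(())
    }
}

/// Runs a batch against ONE shared context, so the limit spans the whole batch.
/// Returns how many calls were allowed to run.
fn run_batch(tool_names: &[&str], max_tool_calls: usize) -> usize {
    // Created once, outside the loop. Creating it inside the loop would reset
    // the counter on every call and the guard below could never fire.
    let mut ctx = ToolContext::new(max_tool_calls);
    let mut executed = 0;
    for _name in tool_names {
        if ctx.register_call().is_err() {
            break;
        }
        executed += 1;
    }
    executed
}
```

With a limit of 2, a batch of four tool calls stops after the second one.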
ZhenYi
09645d8641 fix: resolve multiple bugs across backend and frontend
Security fixes:
- Remove WS token from plaintext log output (ws_universal.rs)
- Replace weak LCG PRNG with rand::thread_rng() for access key generation
- Add project membership check to issue triage endpoint (prevent unauthorized AI usage)
- Validate deepLinkUrl to prevent javascript: navigation (XSS defense-in-depth)

Data integrity fixes:
- Fix UUID truncation in AI model sync (as_u128() as i64 -> timestamp_millis)
- Wrap PR cascade delete in database transaction
- Add missing cascade deletes for room_message_reaction, room_message_edit_history, room_notifications
- Fix N+1 query for last_commit_times (single grouped query instead of per-repo)

Panic prevention:
- Replace unwrap() with safe fallbacks in health/metrics endpoints (email, git-hook apps)
- Replace unwrap() in access key scopes serialization
- Replace expect() in tool executor result map with synthetic error
- Replace expect() in log level parsing with default fallback

Logic bugs:
- Fix users_online metric double-decrement (decrement only when count reaches 0)
- Fix Map iteration + deletion bug in universal-ws.ts onclose handler
- Fix stale audioStream reference in catch block (use local stream variable)
- Add missing reInit event cleanup in carousel.tsx
- Fix email retry backoff integer overflow ((1 << i) as u64 -> 1u64 << i)

React fixes:
- Use message.id instead of index as key in message-list
- Add audio stream cleanup on unmount in use-audio-recording
2026-04-27 13:54:21 +08:00
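The email retry backoff fix in this commit moves the shift onto a `u64` (`1u64 << i`), since a plain `1 << i` defaults to a 32-bit integer. A minimal sketch of that idea follows; the function name, the base delay, and the upper cap are assumptions for illustration and do not come from the commit.

```rust
use std::time::Duration;

/// Exponential backoff delay for retry attempt `attempt` (0-based).
/// The shift happens on a u64, so it cannot overflow a 32-bit integer.
/// The `max_secs` cap is an assumption added for this sketch.
fn retry_delay(attempt: u32, base_secs: u64, max_secs: u64) -> Duration {
    // checked_shl returns None when the shift amount is 64 or more.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    // saturating_mul avoids wrapping when base * factor exceeds u64::MAX.
    Duration::from_secs(base_secs.saturating_mul(factor).min(max_secs))
}
```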
ZhenYi
f36f08e3c4 fix: remaining unwrap panics and new bugs discovered during audit
- email worker: replace Mailbox::parse().unwrap() with match to
  handle invalid recipient addresses gracefully
- metrics middleware: RwLock poison recovery on read/write locks
  to prevent panic on thread panic
- access key: SystemTime::now() unwrap_or_default instead of unwrap
  for clock-before-epoch edge case
- chpc: NaiveDateTime and_hms_opt unwrap_or MIN/MAX fallbacks
- push notification: second code path fixed for let-chain unwrap
- ai_streaming: constant UUID parse use expect() instead of unwrap
2026-04-27 11:30:01 +08:00
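The RwLock poison recovery mentioned for the metrics middleware can be sketched with the standard library alone. The counter type and the function names are hypothetical; the pattern shown is `unwrap_or_else(|e| e.into_inner())`, which takes the guard out of a `PoisonError` instead of panicking.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Reads the counter even if a writer panicked while holding the lock.
fn read_count(lock: &RwLock<u64>) -> u64 {
    // For a plain counter the data is still usable after a panic, so recover
    // the guard from the PoisonError rather than unwrapping.
    *lock.read().unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Increments the counter, recovering from poisoning the same way.
fn increment(lock: &RwLock<u64>) {
    let mut guard = lock.write().unwrap_or_else(|poisoned| poisoned.into_inner());
    *guard += 1;
}

/// Poisons the lock by panicking in another thread while holding the write guard.
/// Only used to demonstrate the recovery path.
fn poison(lock: Arc<RwLock<u64>>) {
    let _ = thread::spawn(move || {
        let _guard = lock.write().unwrap();
        panic!("simulated panic while holding the lock");
    })
    .join();
}
```

Recovering the guard is only reasonable when a half-finished write cannot leave the data in an inconsistent state, which is the assumption made here for simple metrics counters.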
ZhenYi
df42af2ed0 fix: remaining push notification unwrap in second code path
- Fix second copy of push_subscription unwrap that was in a
  tokio::spawn block with different indentation
- Replace constant UUID parse unwrap with expect()
2026-04-27 11:23:48 +08:00
ZhenYi
68b70330b8 docs: mark all 38 bugs as resolved in audit report
2026-04-27 11:21:13 +08:00
ZhenYi
cce9d216b8 fix: resolve 4 remaining "design decision" bugs
- SSH rate limiter: wire SshRateLimiter into SSHServer with IP-based
  rate limiting on new_client connections
- Room startup: cap initial room load at 1000 via limit() to prevent
  resource exhaustion on large instances
- WS token exposure: only include token in URL for cross-origin
  connections; same-origin web clients authenticate via secure cookies
- CSRF: confirmed SameSite::Lax + Secure + HttpOnly are all set
  (session config defaults)
2026-04-27 11:20:38 +08:00
ZhenYi
763d47dc45 fix: silent AI billing failures (add tracing::warn for billing errors)
2026-04-27 11:15:15 +08:00
ZhenYi
1e975c0837 fix: regex injection in message search + semaphore expect panic
- Escape regex special chars in highlightText to prevent ReDoS
- Replace semaphore.acquire().expect() with graceful skip
- Add toast error feedback for search failures
- Remove unsafe (resp.data as any) bypass
2026-04-27 11:12:26 +08:00
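The regex escaping step in this commit lives in the frontend `highlightText` helper, whose code is not shown. As a language-neutral illustration, here is a small Rust sketch of the same idea: every character with special meaning in a regular expression gets a backslash prefix before the user's search text is turned into a pattern. The function name and the exact character set are assumptions.

```rust
/// Escapes regex metacharacters so user input is matched literally.
fn escape_regex(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        // Characters with special meaning in common regex dialects.
        if matches!(
            ch,
            '\\' | '.' | '*' | '+' | '?' | '^' | '$' | '{' | '}' | '(' | ')' | '|' | '[' | ']'
        ) {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}
```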
ZhenYi
2842a62d35 docs: update bug audit report with fix status
2026-04-27 11:02:57 +08:00
ZhenYi
e96bb29434 fix: additional bugs - push notification unwraps and as any cleanup
- Replace Option::unwrap() with let-chains for push subscription fields
- Remove unsafe (repo as any).branch_count access in settings
2026-04-27 11:01:59 +08:00
ZhenYi
bdb5393835 fix: resolve 30+ bugs from security audit
Critical:
- CORS: replace allow_any_origin + credentials with env-configured origins
- XSS: escape HTML before dangerouslySetInnerHTML in search results
- Path traversal: sanitize storage keys to reject ".." components
- Auth missing: add Session requirement to git init/open/is-repo endpoints
- Transaction: wrap issue cascade delete in DB transaction

High:
- Mutex poisoning: replace unwrap() with poison-recovering guards
- Drop tokio::spawn: use runtime handle or fallback thread for lock release
- Redis KEYS: replace with non-blocking SCAN for typing events
- SSH panic: handle missing stdin/stdout/stderr gracefully
- LFS auth: remove x-user-uid header injection vector, generate per-request tokens

Medium:
- Memory leak: remove Box::leak in provider normalization
- Race conditions: query closed count directly instead of subtraction
- Silent failures: add tracing::warn for AI tasks, room events, activity logs
- Frontend nav: sync activeRoomId when initialRoomId prop changes
- Duplicate nav: remove redundant setActiveRoom in delete handler
- Callback conflict: skip undefined values in updateCallbacks merge
- Stale closure: use wsClient state instead of wsClientRef.current in useMemo

Low:
- Captcha: validate captcha not empty before login submission
- Broadcast capacity: reduce from 100K to 1000
- Error handling: add try/catch for removeMember and updateMemberRole
- Loading state: show placeholder instead of null in RepositoryContextProvider
- WebSocket: add heartbeat ping and jitter to reconnect backoff
2026-04-27 10:57:23 +08:00
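The path traversal item in this commit (rejecting ".." components in storage keys) could look roughly like the sketch below. The function name is hypothetical, and the check is slightly stricter than the commit text: it also rejects empty keys and absolute paths, which is an assumption of this example.

```rust
use std::path::{Component, Path};

/// Returns true when `key` is a non-empty relative path made only of
/// normal components (no "..", no root, no drive prefix).
fn is_safe_storage_key(key: &str) -> bool {
    if key.is_empty() {
        return false;
    }
    Path::new(key)
        .components()
        .all(|c| matches!(c, Component::Normal(_)))
}
```

Note that `Path` splits on the platform's separators, so a key using backslashes is treated as a single component on Unix; a service that accepts keys from mixed clients may want to normalize separators first.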
ZhenYi
0f441f5eb4 fix(docker): use ubuntu:24.04 base image for all runtime Dockerfiles
Resolves GLIBC_2.39 mismatch error — CI builds on ubuntu-latest
(24.04) which links against glibc 2.39, but debian:bookworm-slim
only provides glibc 2.36, causing binary execution failure.
2026-04-27 09:42:02 +08:00
ZhenYi
3f1f0d5e23 chore(service/git): minor fixes in service layer git operations
Small adjustments to commit, init, refs, star, and watch operations
in the service layer.
2026-04-27 08:28:27 +08:00
ZhenYi
64dc27161b chore(git): minor fixes and improvements across git library modules
Apply small fixes across multiple git ops files: handle errors, improve
type safety, and refine HTTP handler and SSH git operations.
2026-04-27 08:28:09 +08:00
ZhenYi
a26551343c fix(frontend): refresh WS token after connection failures and handle AI/repo events
Clear wsToken on auth-related close codes (3000-4999), connection
timeout, and after 3 consecutive reconnect failures so the next connect
attempt fetches a fresh token. Add onRoomAiUpdated and onRepoChanged
callbacks that re-fetch AI configs and repo list when pushed via WS.
Fix AI member list to never display raw UUID.
2026-04-26 23:59:07 +08:00
ZhenYi
c8eba28e7a feat(frontend): add repo type to mention autocomplete system
Add 'repo' to MentionType across all editor types, include repos in the
@ trigger pool, add repo badge (green chip), Repos section in the
mention dropdown, and MentionBadge styles. Wire projectRepos from
room context into IMEditor mentionItems.
2026-04-26 23:58:59 +08:00
ZhenYi
adbc0705db feat(room): inject repository details into AI system prompt on mention
When a user mentions a repository in room chat, extract the repo name
from @[repo:name:label] brackets, look up the full repo model from the
database, and inject its details (name, description, default branch,
visibility) into the AI message context. Works independently of
embed_service availability.
2026-04-26 23:58:52 +08:00
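Extracting the repo name from `@[repo:name:label]` mentions, as described above, can be sketched with plain string operations. The function name is hypothetical, and the sketch assumes that repo names contain neither `:` nor `]`; the database lookup and prompt injection are out of scope.

```rust
/// Extracts repo names from mentions of the form `@[repo:name:label]`.
/// Assumes names contain no ':' or ']' characters.
fn extract_repo_mentions(text: &str) -> Vec<String> {
    const PREFIX: &str = "@[repo:";
    let mut names = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find(PREFIX) {
        let after = &rest[start + PREFIX.len()..];
        // The mention ends at the first closing bracket; stop if it is missing.
        let Some(end) = after.find(']') else { break };
        let inner = &after[..end]; // "name:label"
        if let Some((name, _label)) = inner.split_once(':') {
            if !name.is_empty() {
                names.push(name.to_string());
            }
        }
        rest = &after[end + 1..];
    }
    names
}
```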
ZhenYi
d72019e39f feat(room): add WS events for AI config and repo lifecycle changes
Add RoomAiUpdated, RepoCreated, RepoUpdated, RepoDeleted event types.
Publish RoomAiUpdated after room_ai upsert/delete and repo events
after repo create/update. Always set model_name in AI list response
(fallback to "AI {uuid}" when model lookup fails) so frontend never
displays a raw UUID.
2026-04-26 23:58:33 +08:00
ZhenYi
283835eb26 fix(agent/sync): avoid double /v1/ prefix in model sync URL
When APP_AI_BASIC_URL already ends with /v1 (e.g. openrouter.ai/api/v1),
appending /v1/models produces /v1/v1/models. Detect trailing /v1 and
only append /models in that case.
2026-04-26 23:58:25 +08:00
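The trailing `/v1` detection described above amounts to a few lines. This is a sketch with a hypothetical function name; trimming a trailing slash before the check is an extra assumption not stated in the commit.

```rust
/// Builds the model-list URL without doubling the `/v1` segment.
fn models_url(base: &str) -> String {
    // Assumption: tolerate a trailing slash on the configured base URL.
    let base = base.trim_end_matches('/');
    if base.ends_with("/v1") {
        format!("{base}/models")
    } else {
        format!("{base}/v1/models")
    }
}
```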
ZhenYi
c7a8bc0458 refactor(fctool): extract tool modules into standalone fctool crate
Move git_tools, file_tools, and project_tools from libs/service into a
new libs/fctool crate with correct workspace dependencies. Fixes the
rev.len() >= 40 bug in all git tool resolve functions (OID check needs
exact 40-char hex, not just >= 40). Adds 4 new git blob tools
(blob_get, blob_exists, blob_content, blob_create). Fixes parameter
naming inconsistency in repos.rs and adds project_name to list_repos
output. Removes unused excel/pdf/ppt/word file tools.
2026-04-26 23:58:16 +08:00
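The OID check fix described above (exactly 40 hex characters rather than "at least 40") can be expressed as a small predicate. The function name is hypothetical; the sketch covers SHA-1 object IDs only.

```rust
/// True only for a full SHA-1 object ID: exactly 40 ASCII hex digits.
/// A `len() >= 40` test would also accept long branch or tag names.
fn is_full_oid(rev: &str) -> bool {
    rev.len() == 40 && rev.bytes().all(|b| b.is_ascii_hexdigit())
}
```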
ZhenYi
0e53f4a69f fix(room): fix two major memory leaks
1. WS disconnect now unsubscribes from user_notification_inner.
   Previously, every WebSocket connection created a broadcast channel
   for user notifications that was never removed on disconnect, causing
   unbounded growth proportional to unique connected users over time.

2. Room worker tasks now use the manager's room_shutdown_txs channel
   instead of a local broadcast channel. shutdown_room() sends on this
   channel, so when a room is deleted the worker task receives the signal
   and terminates, releasing its DashMap (capacity 10,000) and all
   captured closures. Previously the worker ran forever.
2026-04-26 16:52:20 +08:00
ZhenYi
15483b4e95 chore(static): remove duplicate profile.release (already defined in workspace root)
2026-04-26 16:41:24 +08:00
ZhenYi
7d7103e271 feat(observability): use human-readable log format for terminals
When stdout is connected to a TTY, use tracing_subscriber's pretty
format with colors instead of single-line JSON. Non-TTY (container
logs, pipes) continue to output JSON for log aggregation.

Override auto-detection via APP_LOG_FORMAT=json|pretty.

Also adds APP_LOG_PRETTY=true to use serde_json::to_string_pretty
for human-readable JSON output (useful for development/debugging).
2026-04-26 16:39:03 +08:00
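The format selection described above (TTY auto-detection with an `APP_LOG_FORMAT` override) can be sketched using only the standard library. The enum and function names are hypothetical, and treating unrecognized override values as "auto-detect" is an assumption; wiring the result into tracing_subscriber is not shown.

```rust
use std::io::IsTerminal;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum LogFormat {
    Json,
    Pretty,
}

/// Picks the format from an optional override and the TTY state.
/// Unrecognized override values fall back to auto-detection (assumption).
fn choose_log_format(override_value: Option<&str>, stdout_is_tty: bool) -> LogFormat {
    match override_value.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
        Some("json") => LogFormat::Json,
        Some("pretty") => LogFormat::Pretty,
        _ => {
            if stdout_is_tty {
                LogFormat::Pretty
            } else {
                LogFormat::Json
            }
        }
    }
}

/// Reads APP_LOG_FORMAT and checks whether stdout is a terminal.
fn log_format_from_env() -> LogFormat {
    let override_value = std::env::var("APP_LOG_FORMAT").ok();
    choose_log_format(override_value.as_deref(), std::io::stdout().is_terminal())
}
```

Keeping the decision in a pure function makes it testable without touching the environment or the real stdout.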
ZhenYi
ecf9f33b26 refactor(agent/sync): remove OpenRouter dependency, use upstream /v1/models directly
The upstream AI endpoint already returns complete model metadata:
- name, owned_by, context_length, max_output_tokens
- capabilities (vision, tool_call, reasoning)
- pricing (input, output, cache_read, cache_write, currency)

Remove the OpenRouter fallback entirely and parse the upstream
response directly for all sync operations. Both sync_upstream_models
(API) and sync_once (background task) now use a single unified path.

Changes:
- Remove OpenRouter types and fetch_openrouter_models()
- Add UpstreamModel / UpstreamCapabilities / UpstreamPricing types
- Parse capabilities from upstream instead of inferring from name
- Use real pricing from upstream instead of defaulting to 0.00
- Simplify sync flow: list → parse → upsert (no filtering/matching)
- Add provider normalizations for moonshot, zai, minimax, qwen
2026-04-26 16:30:41 +08:00
ZhenYi
a8e3b0f5a8 fix(agent/sync): handle multiple /v1/models response formats
The upstream AI endpoint returns an OpenAI-compatible format, but the
response body parsing was fragile. Make it resilient:
1. Try standard OpenAI format: { "data": [{id}, ...] }
2. Try raw array: [{id}, ...]
3. Try alternate format: { "models": [{id}, ...] }
4. Log actual response body (first 500 chars) when all formats fail

Also adds a warning log with the raw response on parse failure so
future debugging is straightforward.
2026-04-26 16:26:57 +08:00
ZhenYi
30713786bf revert(db): remove check_compatibility — method not available in sqlx 0.8
The check_compatibility(false) method was added in the previous commit
but does not exist in sqlx 0.8.x used by sea-orm 2.0. The warning
"Failed to obtain server version" is cosmetic and does not affect
functionality.
2026-04-26 15:49:51 +08:00
ZhenYi
8a23a22c9b fix(agent/sync): make OpenRouter fetch optional, fallback to direct sync
When OpenRouter's public /api/v1/models endpoint fails (network error,
timeout, parse failure), the entire sync was aborted — meaning models
accessible from the user's AI endpoint were never synced.

Now: if OpenRouter fetch fails, fall back to sync_models_direct for all
available models instead of returning an error. Both sync_upstream_models
(API) and sync_once (background task) have this fix.
2026-04-26 15:49:34 +08:00
ZhenYi
31ed420186 fix(db): disable sqlx check_compatibility for non-standard PostgreSQL servers
Cloud-managed PostgreSQL variants (PolarDB, CockroachDB, etc.) may
not return a standard version string, causing:
  "Failed to obtain server version. Unable to check client-server
   compatibility."

Setting check_compatibility(false) on both writer and reader
connections silences this harmless warning.
2026-04-26 15:36:13 +08:00
ZhenYi
638dfd7a6e feat(agent/sync): sync non-OpenRouter models from upstream endpoint
When upstream /v1/models returns models not yet in OpenRouter's catalog
(e.g. brand-new models like DeepSeek-V4), also upsert them through the
same pipeline (provider → model → version → pricing → capabilities →
parameter_profile) with inferred defaults, instead of silently dropping
them. Previously the direct-sync fallback only triggered when *zero*
OpenRouter matches existed.
2026-04-26 15:17:33 +08:00
ZhenYi
27cd4ea83c feat(admin/metrics): add Prometheus-compatible metrics endpoint and ops documentation
- Add /api/metrics/prometheus endpoint using prom-client (unauthenticated for scraping)
- Update middleware to allow unauthenticated access to prometheus endpoint
- Add /api/metrics permission routing (platform:read for GET)
- Install prom-client dependency
- Add metrics.md with Grafana dashboard JSON, Prometheus config, alerting rules
2026-04-26 14:49:25 +08:00
ZhenYi
fb27918285 feat(admin): remove daily report, add platform metrics endpoint
Remove daily report system (page, API routes, cron scheduler) as it is
no longer needed. Add /api/metrics endpoint exposing total and time-
windowed counts (27h, 7d, 30d) for users, workspaces, projects, repos,
rooms, and skills.

Also clean up dead code:
- Remove OpenRouter sync and alerts check routes
- Remove syncModels/checkAlerts from adminrpc client
- Remove unused adminRpcAvailable state from platform sessions page
- Fix handleEdit displayName comparison bug in platform users page
- Simplify pricing sync to create 0-price defaults
2026-04-26 14:44:21 +08:00
ZhenYi
660ffd6acb chore(api): remove entire admin module
Admin Next.js app handles all admin tasks directly via database access.
Only health check endpoint was remaining, not worth maintaining.
2026-04-26 14:08:15 +08:00
ZhenYi
8ea826e6ad chore(api): remove admin billing endpoint
Admin Next.js app handles billing directly via database access now.
2026-04-26 14:05:52 +08:00
ZhenYi
ef767297f7 chore(api): remove admin AI model CRUD routes
Admin Next.js app now handles DB access directly for provider/model/
version/pricing management. Keep only health, sync, alerts, and billing.
2026-04-26 14:04:01 +08:00
ZhenYi
99ebfc14a7 fix(frontend): scrollToIndex smooth option uses behavior property
TanStack Virtual uses 'behavior' for scroll animation, not 'smooth'.
2026-04-26 13:31:11 +08:00
ZhenYi
6eb65a5c65 feat(observability): inject _msg field for VictoriaLogs compatibility
Add MsgJsonFormat custom event formatter that outputs JSON with _msg as
the first field, required by VictoriaLogs for full-text search. HTTP
middleware stores interpolated "METHOD /path" in thread-local buffer
for the formatter to read on span-close events.
2026-04-26 13:31:05 +08:00
ZhenYi
07e74c230c feat: thinking_content column + first-project budget logic
- Add thinking_content column to room_message table
- Migration for thinking_content column
- ws-protocol update with streaming chunk types
- Billing: first project gets $10, first workspace gets $30
- Subsequent projects/workspaces get $0 budget
2026-04-26 13:11:06 +08:00
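The first-project budget rule listed above can be written as a small pure function. The names are hypothetical, amounts are expressed in US cents, and "first" is assumed to mean that the owner has no existing item of that kind; the commit message does not say how this is determined.

```rust
#[derive(Clone, Copy)]
enum Owned {
    Project,
    Workspace,
}

/// Initial budget in US cents: $10 for a first project, $30 for a first
/// workspace, and $0 for every subsequent one.
fn initial_budget_cents(kind: Owned, existing_count: u64) -> u64 {
    if existing_count > 0 {
        return 0;
    }
    match kind {
        Owned::Project => 1_000,
        Owned::Workspace => 3_000,
    }
}
```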
ZhenYi
0939aa240b fix(frontend): ordered chunk rendering + initial scroll-to-bottom
- OrderedStreamChunks renders think/answer interleaved per arrival order
- parseSavedChunks parses stored __chunks__ JSON on page refresh
- Tool call chunks hidden from frontend display
- Fix streaming join('') instead of join('\n') to avoid per-token newlines
- Fix MessageList scroll-to-bottom using virtualizer.scrollToIndex
- Remove unused streamingContent/streamingThinkingContent state
- Add retryable error patterns for HTTP connection issues
2026-04-26 13:10:51 +08:00
ZhenYi
f5e3da35b0 feat(room): store ordered streaming chunks + billing integration
- Save thinking_content as {"__chunks__": [{type, content}]} for replay
- Tool call sanitization — don't expose raw results to frontend
- Billing record_ai_usage integration
- Room service module refactoring into service/ directory
2026-04-26 13:10:42 +08:00
ZhenYi
b4b5538447 feat(agent): add ordered stream chunk collection + retry for HTTP errors
- StreamChunk/StreamChunkType types for preserving arrival order
- Chunk collection in call_stream_once and process_stream
- Add "error sending request" and "Http client error" to retryable errors
- StreamResult includes chunks vector for ordered replay
2026-04-26 13:10:26 +08:00
ZhenYi
0b5dc98ce5 refactor(db): simplify read-replica to single connection for CNPG
CNPG's cluster-ro service already handles load balancing and failover,
so the application-level Vec + random_range is redundant.

- db_read: Vec<DatabaseConnection> → Option<DatabaseConnection>
- database_read_replicas returns Option<String> instead of Vec<String>
- health checks now explicitly ping both writer() and reader()
- remove unused rand dependency from libs/db
2026-04-26 01:03:39 +08:00
ZhenYi
468007177f fix(hooks,email): add describe_counter! to pre-register metrics
Hook tasks and email metrics were missing from /metrics because
describe_counter! was never called before install_recorder(), so
unincremented counters were not exported. Room metrics appeared
because RoomMetrics::new() already described them.

- apps/git-hook: describe 8 hook_tasks_*/hook_sync_* counters
- apps/email: describe 8 email_* counters
- both: add metrics = "0.22" as direct dependency
2026-04-26 00:42:59 +08:00
ZhenYi
02b7a5beda feat(gitserver): add /robots.txt to disallow all crawlers
- Returns Disallow: / for all user-agents
- Points crawlers to main site sitemap via APP_GIT_HTTP_DOMAIN
2026-04-26 00:16:21 +08:00
ZhenYi
7eb9c5a7fb docs: update monitoring metrics document with static-server probes
and correct gitserver named-port configuration
2026-04-26 00:14:16 +08:00
ZhenYi
fd232354cc fix(gitserver): correct health probe port path in k8s template
2026-04-26 00:11:48 +08:00
ZhenYi
a4dd25304c docs: add monitoring metrics operations document
Covers: endpoints, metrics list, Prometheus scrape configuration,
K8s probe YAML, and example Alertmanager alerting rules
2026-04-26 00:10:45 +08:00
ZhenYi
d593354ba9 feat: add sitemap index with static/users/projects/repos sub-sitemaps
- Main sitemap index at /sitemap.xml referencing 4 sub-sitemaps
- /sidemap/static: fixed routes (homepage, auth, marketing pages)
- /sidemap/users: public user profiles sorted alphabetically
- /sidemap/projects: public projects sorted alphabetically
- /sidemap/repos: public repos sorted alphabetically
- Redis cache with 8h TTL (no refresh on access), key: sidemap:{type}
- robots.txt Sitemap URL uses main_domain() with https:// forced
- All sitemap loc entries use https:// base URL
2026-04-26 00:06:18 +08:00
ZhenYi
a8494cc032 chore(api): add sidemap module
2026-04-25 23:50:23 +08:00
ZhenYi
da9e96f6dd feat: add /robots.txt blocking sensitive paths from crawlers
Disallows: /api/, /health, /metrics, /ws/, /avatar/, /blob/,
/media/, /static/, /assets/
2026-04-25 23:49:50 +08:00
ZhenYi
10836730ed feat: add health endpoints and Prometheus metrics to git-hook and email-worker
Health monitoring:
- gitserver: /health endpoint on port 8021 (DB + Redis ping)
- git-hook: hyper health server on port 8083 with /health
- email-worker: hyper health server on port 8084 with /health
- K8s probes updated to httpGet for all three services

Metrics (via /metrics endpoint):
- git-hook: hook_tasks_total/success/failed/locked/retried/exhausted,
  hook_sync_branches/tags_changed_total
- email: email_queued/consumed/sent/failed_total,
  email_validation_skipped/build_errors/send_attempts_total
2026-04-25 23:45:48 +08:00