When a user mentions a repository in room chat, extract the repo name
from @[repo:name:label] brackets, look up the full repo model from the
database, and inject its details (name, description, default branch,
visibility) into the AI message context. Works independently of
embed_service availability.
Add RoomAiUpdated, RepoCreated, RepoUpdated, RepoDeleted event types.
Publish RoomAiUpdated after room_ai upsert/delete and repo events
after repo create/update. Always set model_name in AI list response
(fallback to "AI {uuid}" when model lookup fails) so frontend never
displays a raw UUID.
When APP_AI_BASIC_URL already ends with /v1 (e.g. openrouter.ai/api/v1),
appending /v1/models produces /v1/v1/models. Detect trailing /v1 and
only append /models in that case.
Move git_tools, file_tools, and project_tools from libs/service into a
new libs/fctool crate with correct workspace dependencies. Fixes the
rev.len() >= 40 bug in all git tool resolve functions (OID check needs
exact 40-char hex, not just >= 40). Adds 4 new git blob tools
(blob_get, blob_exists, blob_content, blob_create). Fixes parameter
naming inconsistency in repos.rs and adds project_name to list_repos
output. Removes unused excel/pdf/ppt/word file tools.
1. WS disconnect now unsubscribes from user_notification_inner.
Previously, every WebSocket connection created a broadcast channel
for user notifications that was never removed on disconnect, causing
unbounded growth proportional to unique connected users over time.
2. Room worker tasks now use the manager's room_shutdown_txs channel
instead of a local broadcast channel. shutdown_room() sends on this
channel, so when a room is deleted the worker task receives the signal
and terminates, releasing its DashMap (capacity 10,000) and all
captured closures. Previously the worker ran forever.
When stdout is connected to a TTY, use tracing_subscriber's pretty
format with colors instead of single-line JSON. Non-TTY (container
logs, pipes) continue to output JSON for log aggregation.
Override auto-detection via APP_LOG_FORMAT=json|pretty.
Also adds APP_LOG_PRETTY=true to use serde_json::to_string_pretty
for human-readable JSON output (useful for development/debugging).
The upstream AI endpoint already returns complete model metadata:
- name, owned_by, context_length, max_output_tokens
- capabilities (vision, tool_call, reasoning)
- pricing (input, output, cache_read, cache_write, currency)
Remove the OpenRouter fallback entirely and parse the upstream
response directly for all sync operations. Both sync_upstream_models
(API) and sync_once (background task) now use a single unified path.
Changes:
- Remove OpenRouter types and fetch_openrouter_models()
- Add UpstreamModel / UpstreamCapabilities / UpstreamPricing types
- Parse capabilities from upstream instead of inferring from name
- Use real pricing from upstream instead of defaulting to 0.00
- Simplify sync flow: list → parse → upsert (no filtering/matching)
- Add provider normalizations for moonshot, zai, minimax, qwen
The upstream AI endpoint returns an OpenAI-compatible format, but the
response body parsing was fragile. Make it resilient:
1. Try standard OpenAI format: { "data": [{id}, ...] }
2. Try raw array: [{id}, ...]
3. Try alternate format: { "models": [{id}, ...] }
4. Log actual response body (first 500 chars) when all formats fail
Also adds a warning log with the raw response on parse failure so
future debugging is straightforward.
The check_compatibility(false) method was added in the previous commit
but does not exist in sqlx 0.8.x used by sea-orm 2.0. The warning
"Failed to obtain server version" is cosmetic and does not affect
functionality.
When OpenRouter's public /api/v1/models endpoint fails (network error,
timeout, parse failure), the entire sync was aborted — meaning models
accessible from the user's AI endpoint were never synced.
Now: if OpenRouter fetch fails, fall back to sync_models_direct for all
available models instead of returning an error. Both sync_upstream_models
(API) and sync_once (background task) have this fix.
Cloud-managed PostgreSQL variants (PolarDB, CockroachDB, etc.) may
not return a standard version string, causing:
"Failed to obtain server version. Unable to check client-server
compatibility."
Setting check_compatibility(false) on both writer and reader
connections silences this harmless warning.
When upstream /v1/models returns models not yet in OpenRouter's catalog
(e.g. brand-new models like DeepSeek-V4), also upsert them through the
same pipeline (provider → model → version → pricing → capabilities →
parameter_profile) with inferred defaults, instead of silently dropping
them. Previously the direct-sync fallback only triggered when *zero*
OpenRouter matches existed.
Remove daily report system (page, API routes, cron scheduler) as it is
no longer needed. Add /api/metrics endpoint exposing total and time-
windowed counts (27h, 7d, 30d) for users, workspaces, projects, repos,
rooms, and skills.
Also clean up dead code:
- Remove OpenRouter sync and alerts check routes
- Remove syncModels/checkAlerts from adminrpc client
- Remove unused adminRpcAvailable state from platform sessions page
- Fix handleEdit displayName comparison bug in platform users page
- Simplify pricing sync to create 0-price defaults
Add MsgJsonFormat custom event formatter that outputs JSON with _msg as
the first field, required by VictoriaLogs for full-text search. HTTP
middleware stores interpolated "METHOD /path" in thread-local buffer
for the formatter to read on span-close events.
- Save thinking_content as {"__chunks__": [{type, content}]} for replay
- Tool call sanitization — don't expose raw results to frontend
- Billing record_ai_usage integration
- Room service module refactoring into service/ directory
- StreamChunk/StreamChunkType types for preserving arrival order
- Chunk collection in call_stream_once and process_stream
- Add "error sending request" and "Http client error" to retryable errors
- StreamResult includes chunks vector for ordered replay
CNPG's cluster-ro service already handles load balancing and failover,
so the application-level Vec + random_range is redundant.
- db_read: Vec<DatabaseConnection> → Option<DatabaseConnection>
- database_read_replicas returns Option<String> instead of Vec<String>
- health checks now explicitly ping both writer() and reader()
- remove unused rand dependency from libs/db
- Main sitemap index at /sitemap.xml referencing 4 sub-sitemaps
- /sidemap/static: fixed routes (homepage, auth, marketing pages)
- /sidemap/users: public user profiles sorted alphabetically
- /sidemap/projects: public projects sorted alphabetically
- /sidemap/repos: public repos sorted alphabetically
- Redis cache with 8h TTL (no refresh on access), key: sidemap:{type}
- robots.txt Sitemap URL uses main_domain() with https:// forced
- All sitemap loc entries use https:// base URL
Health monitoring:
- gitserver: /health endpoint on port 8021 (DB + Redis ping)
- git-hook: hyper health server on port 8083 with /health
- email-worker: hyper health server on port 8084 with /health
- K8s probes updated to httpGet for all three services
Metrics (via /metrics endpoint):
- git-hook: hook_tasks_total/success/failed/locked/retried/exhausted,
hook_sync_branches/tags_changed_total
- email: email_queued/consumed/sent/failed_total,
email_validation_skipped/build_errors/send_attempts_total
- Add sender_type field to TypingEvent (user/ai)
- Change Redis TTL from 10s to 60s for AI typing persistence
- Broadcast typing.start/stop with sender_type=ai when AI stream starts/ends
- Replay active AI typing events from Redis on new WS subscribe
- Fix ai.stream_chunk WS payload missing display_name and chunk_type
- Add initial thinking chunk on AI stream start for immediate indicator
- service now delegates model/provider/pricing logic to agent crate
- ChatService built at startup with EmbedService (graceful degradation)
- RoomService wired with EmbedService for Qdrant embedding
- Add error types for embedding service
- message.rs: after creating a text message, spawn async task to
embed and upsert into Qdrant collection "room:{project}:{room_id}"
- service.rs: RoomService now takes optional EmbedService, embed on
every message creation
- build.rs: generate .gz and .br compressed variants at build time,
embed both into the FRONTEND constant
- dist.rs: content-type for SPA fallback uses index.html path explicitly,
explicit extension mapping for Vite content-hashed filenames
- frontend lib: get_frontend_asset_compressed returns (data, encoding, etag)
CommitReflogEntry now includes offset_minutes field, populated from
sig.when().offset_minutes() — matches git's internal timezone offset
representation. Used by git_tools for accurate reflog timestamps.
RoomMessageEvent was losing the AI model name because the
From<RoomMessageEnvelope> impl hardcoded display_name: None.
Add display_name to RoomMessageEnvelope and propagate it through
all AI streaming code paths (chat, ReAct, non-streaming).
Member messages keep display_name: None.
- ReAct streaming: collect all ReactStep chunks into reasoning_buffer;
if no Answer step is emitted, persist the full reasoning chain instead
of empty content
- All AI error paths (reasoning loop failure, non-streaming errors) now
send user-visible [AI error: ...] messages instead of silently dropping
- Fix borrow checker: clone content before struct init, use should_log bool
to avoid double-borrow on err_msg
- Streaming path: on tool_call execution error, emit an [Observation]
chunk so the model sees the failure and can retry/adapt
- Non-streaming path: inject error as a user message so the loop
continues with error context, not silently stop
ReAct loop was terminating early when the model returned:
[Agent ran through N steps...]
{"thought": "...", "action": {...}}
The extract_json function only checked the string start or code fences.
Now scans for { or [ at non-word positions and uses depth-counting
to strip trailing text, allowing JSON buried anywhere in the response.
service.rs: Replace per-message Lua+DB seq with simple INCR, only
reconcile DB every 1000 messages (99.9% queries eliminated).
storage.rs: Replace N+1 GET loop with single MGET for both
get_user_sessions and get_workspace_sessions (N+1 → 2 roundtrips).
TypingStart/TypingStop actions are intercepted in ws_universal.rs so
this match arm is never reached, but we need a safe fallback instead
of std::hint::unreachable_unchecked().
RoomConnectionManager now holds a cache field and typing_inner broadcast
map. broadcast_typing() persists start/stop to Redis (SETEX 10s / DEL)
and broadcasts via tokio channel. ws_universal.rs handles TypingStart/
TypingStop actions and streams typing events to WS clients.
Add TypingEvent struct in queue::types for broadcast-based typing
indicators, and TypingStart/TypingStop variants in RoomEventType for
WebSocket event dispatch.
- Delete admin metrics dashboard page (admin/metrics/page.tsx)
- Delete LineChart component (used only by metrics)
- Remove "指标监控" nav link from Sidebar
- Remove getMetrics/exportMetricsCsv from admin-rpc.ts client
- Remove /api/admin/metrics and /api/admin/metrics/export HTTP routes
from adminrpc (was also leaking metrics via HTTP)
- Remove metrics RPC methods (get_metrics, export_metrics_csv) from
adminrpc gRPC server and their helper functions
- Remove spawn_redis_metrics_flusher from app/main.rs
- Remove redis_metrics module from observability crate
- Remove redis/deadpool-redis deps from observability Cargo.toml
Both init_tracing_subscriber() and init_otlp() were calling try_init()
on the global tracing dispatcher, causing "global default trace dispatcher
has already been set" at runtime when APP_OTEL_ENABLED=true.
Fix: simplify the API so init_tracing_subscriber() never installs the
subscriber — it either calls try_init() immediately (non-OTLP mode) or
returns without installing (OTLP mode, defer=true). init_otlp() now
builds the complete subscriber stack (registry + env_filter + fmt_layer +
otel_layer) and calls try_init() once.
init_tracing_subscriber() signature: (level, defer) → ()
init_otlp() signature: (endpoint, service_name, _, log_level) → Result
The fmt layer is replicated inside init_otlp() for the OTLP path.
Users who accepted a project invitation could not see that project
on their /user/{username} page because get_user_projects only queried
projects where created_by == user_uid, ignoring project_members entries.
Now unions created_projects and member_projects with privacy filtering.
Column::Project was used twice instead of Column::User in the query,
causing the filter to be project_id = X AND project_id = user_uid
which never matches.
- session_manager/manager.rs: remove slog::Logger field, update new()
and with_config() to remove log parameter
- email/lib.rs: remove slog::Logger from AppEmail::init()
- rpc/admin/server.rs: remove slog::Logger from serve() and spawn(),
replace with tracing::info!/error!
- Remove all use slog::* imports and log: slog::Logger fields
- ssh/handle.rs: replace slog macro chains with tracing::{info!, warn!,
error!, debug!}; remove log field from GitSshHandle
- ssh/authz.rs, ssh/mod.rs, ssh/server.rs: remove slog Logger fields
- http/: auth.rs, handler.rs, mod.rs, routes.rs: remove slog usage
- hook/: pool worker, sync modules, webhook_dispatch.rs: remove slog
- Remove all use slog::* imports and log: slog::Logger fields
- Replace slog macros with tracing::{info!, warn!, error!, debug!}
- metrics.rs: upgrade metrics 0.21→0.22, remove register_*! macros,
use functional API: metrics::gauge!(), metrics::counter!(),
metrics::histogram!(), metrics::describe_gauge!() etc.
- RoomMetrics: all fields now use functional metrics API, dynamic
room_id labels passed as owned String to avoid lifetime issues
- RoomService: remove pub log: slog::Logger field
- connection.rs: remove log from subscribe_room_events,
subscribe_project_room_events, subscribe_task_events_fn
- room-context: dedup by id not seq (streaming seq=0); single atomic
setStreamingContent with delta detection; preserve reactions from WS
- MessageBubble: fix avatar lookup (members before IIFE); handleReaction
deps (no message.reactions); add reactions to wsMessageToUiMessage
- MessageInput: memoize mentionItems; fix upload path with VITE_API_BASE_URL
- IMEditor: warn on upload failure instead of silent swallow
- RoomSettingsPanel: sync form on room switch; loadModels before useEffect
- DiscordChatPanel: extract inline callbacks to useCallback stable refs
- Rewrite DiscordChannelSidebar with @dnd-kit drag-and-drop:
rooms are sortable within categories; dragging onto a different
category header assigns the room to that category
- Add inline 'Add Category' button: Enter/Esc to confirm/cancel
- Wire category create/move handlers in room.tsx via RoomContext
- Fix onAiStreamChunk to accumulate content properly and avoid
redundant re-renders during AI streaming (dedup guard)
- No backend changes needed: category CRUD and room category update
endpoints were already wired
Converts AppEmail from blocking sync sends to a background worker via
mpsc channel, adds SMTP pool tuning (min_idle 5, max_size 100), and
3-retry backoff on send failures.
The login function calls auth_2fa_status before set_user(user.uid), so
context.user() returns None and causes Unauthorized error on subsequent
logins after logout. Extracts auth_2fa_status_by_uid as an internal
helper accepting a Uuid, preserving the context-based wrapper for API
endpoints that require an authenticated user.
Add the second half of the password reset flow: /password/confirm API
endpoint with token validation, transactional password update, and a
frontend confirm-password-reset-page with proper error handling for
expired/used tokens. Updates generated SDK/client bindings.
Add AI-accessible tools for reading structured files (CSV, JSON/JSONC,
Markdown, SQL) and searching repository content (git_grep). Also adds
git_blob_get to retrieve raw blob text content with binary detection.
Deferred: Excel, Word, PDF, PPT modules are stubbed out due to library
API incompatibilities (calamine 0.26, lopdf 0.34, quick-xml 0.37).
InvitationResponse was missing project_name and invited_by_username fields,
causing /invitations accept to redirect to /project/undefined.
Now populated via async from_model() with batch DB lookups.
- Refactor room-context.tsx with improved WebSocket state management
- Enhance room-ws-client.ts with reconnect logic and message handling
- Update Discord layout components with message editor improvements
- Add WebSocket universal endpoint support in ws_universal.rs
- Add workspace_my_pending_invitations() for listing pending invites
- Add workspace_accept_invitation_by_slug() to accept by slug without token
- Register new routes: GET /workspaces/me/invitations, POST /workspaces/invitations/accept-by-slug
Frontend:
- Add Discord-style 3-column layout (server icons / channel sidebar / chat)
- AI Studio design system: new CSS token palette (--room-* vars)
- Replace all hardcoded Discord colors with CSS variable tokens
- Add RoomSettingsPanel (name, visibility, AI model management)
- Settings + Member list panels mutually exclusive (don't overlap)
- AI models shown at top of member list with green accent
- Fix TS errors: TipTap SuggestionOptions, unused imports, StarterKit options
- Remove MentionInput, MentionPopover, old room components (废弃代码清理)
Backend:
- RoomAiResponse returns model_name from agents.model JOIN
- room_ai_list and room_ai_upsert fetch model name for each config
- AiConfigData ws-protocol interface updated with model_name
Note: RoomSettingsPanel UI still uses shadcn defaults (未完全迁移到AI Studio)
Backend:
- Add MENTION_TAG_RE matching new `<mention type="..." id="...">label</mention>` format
- Extend extract_mentions() and resolve_mentions() to parse new format (legacy backward-compatible)
Frontend:
- New src/lib/mention-ast.ts: AST types (TextNode, MentionNode, AiActionNode),
parse() and serialize() functions for AST↔HTML conversion
- MentionPopover: load @repository: from project repos, @ai: from room_ai configs
(not room members); output new HTML format with ID instead of label
- MessageMentions: use AST parse() for rendering (falls back to legacy parser)
- ChatInputArea: insertMention now produces `<mention type="user" id="...">label</mention>`
- RoomParticipantsPanel: onMention passes member UUID to insertMention
- RoomContext: add projectRepos and roomAiConfigs for mention data sources
Create migration m20260417_000001_add_stream_to_room_ai that adds
the `stream BOOLEAN NOT NULL DEFAULT true` column to room_ai.
The model definition includes this field but the original migration
was missing it.
Also format libs/api/room/ws.rs and add gitdata.ai to allowed
WS origins.
- Fix ReactionGroup.count: i64 -> i32 and users: Vec<Uuid> -> Vec<String>
to match frontend ReactionItem (count: number, users: string[]).
Mismatched types caused the WS reaction update to silently fail.
Also update ReactionItem in api/ws_types.rs to match.
- Add activeRoomIdRef guard in onRoomReactionUpdated to prevent stale
room state from processing outdated events after room switch.
- Switch from prev.map() to targeted findIndex+spread in onRoomReactionUpdated
to avoid unnecessary array recreation.
Backend:
- Add reaction_batch handler: GET /api/rooms/{room_id}/messages/reactions/batch?message_ids=...
- Register route in libs/api/room/mod.rs
- Backend already had message_reactions_batch service method, just needed HTTP exposure
Frontend:
- Add ReactionListData import to room-context.tsx
- Fix thisLoadReactions client type (was using broken NonNullable<ReturnType<>>)
- Remove unused oldRoomId variable
- Delete unused useRoomWs.ts hook (RoomWsClient has no on/off methods)
- Remove unused EmojiPicker function and old manual overlay picker from RoomMessageBubble
- Remove unused savedToken variable in room-ws-client
Frontend:
- P0: Replace constant estimateSize(40px) with content-based estimation
using line count and reply presence for accurate virtual list scroll
- P1: Replace Shadow DOM custom elements with styled spans for @mentions,
eliminating expensive attachShadow calls per mention instance
- P1: Remove per-message ResizeObserver (one per bubble), replace with
static inline toolbar layout to avoid observer overhead
- P2: Fix WS token re-fetch on every room switch by preserving token
across navigation and not clearing activeRoomIdRef on cleanup
Backend:
- P1: Fix reaction check+insert race condition by moving into transaction
instead of separate query + on-conflict insert
- P2: Fix N+1 queries in get_mention_notifications with batch fetch
for users and rooms using IN clauses
- P2: Update room_last_activity in broadcast_stream_chunk to prevent
idle room cleanup during active AI streaming
- P3: Use enum comparison instead of to_string() in room_member_leave
- reaction.rs: query before insert to detect new vs duplicate reactions,
only publish Redis event when a reaction was actually added
- room.rs: delete Redis seq key on room deletion to prevent seq
collision on re-creation
- message.rs: use Redis-atomic next_room_message_seq_internal for
concurrent safety; look up sender display name once for both
mention notifications and response body; add warn log when
should_ai_respond fails instead of silent unwrap_or(false)
- ws_universal.rs: re-check room access permission when re-subscribing
dead streams after error to prevent revoked permissions being bypassed
- RoomChatPanel.tsx: truncate reply preview content to 80 chars
- RoomMessageList.tsx: remove redundant inline style on message row div
- Add `check_room_access` before `manager.subscribe()` in ws_universal
to prevent unauthorized room subscription (security fix)
- Fix busy-wait in `poll_push_streams`: sleep 50ms when streams are
empty, yield only when there are active streams
- Re-subscribe dead rooms after stream errors so events are not
permanently lost until manual reconnect
- Fix streaming message placeholder using fake content as room_id:
use chunk.room_id from backend instead
- Show toast error on history load failures instead of silent fallback
- git_commit_graph: use default_branch when rev is None
- git_commit_graph_react: use default_branch when rev is None
- git_commit_reflog: fall back to default_branch when HEAD is detached
Fixes errors:
- reference 'refs/heads/master' not found (graph-react endpoint)
- HEAD has no name (reflog endpoint when HEAD is detached)
1. branch_list: Fix is_head comparison
- head_name now keeps full ref (e.g., "refs/heads/main")
- Previously stripped to short name causing is_head to always be false
2. branch_get/branch_exists: Fix name resolution for branches with '/'
- Previously, "feature/testing" was assumed to be a remote branch
- Now tries both refs/heads/ and refs/remotes/ candidates
- Correctly handles local branches like "feature/testing"
3. sync_refs: Fix upstream reference format
- Was storing "refs/remotes/{}/branch" (broken pattern)
- Now properly queries git2 for actual upstream branch name
Move all specific branch routes before /branches/{name} to prevent
route shadowing. Previously, routes like /branches/rename, /branches/move,
/branches/upstream, /branches/diff, etc. were shadowed by /branches/{name}.
Move specific routes (/commits/reflog, /commits/branches, /commits/tags)
before parameterized routes (/commits/{oid}) to avoid route shadowing.
Previously, /commits/reflog was matched by /commits/{oid} with oid="reflog",
causing InvalidOid("reflog") errors.
Also fixes other potential route shadowing issues in commit routes.
The database stores short branch names (e.g., "main"), but git2's push_ref()
requires full reference names (e.g., "refs/heads/main"). This fixes all
service-layer endpoints to convert default_branch to the full ref format.
Fixed endpoints:
- git_readme: convert to refs/heads/{branch}
- git_commit_count: convert to refs/heads/{branch}
- git_contributors: convert to refs/heads/{branch}
- git_commit_log: convert to refs/heads/{branch}
- git_commit_walk: convert to refs/heads/{branch}
Resolves errors:
- Internal("the given reference name 'main' is not valid")
- Add #[serde(default)] to MergeAnalysisQuery.their_oid and MergeRefAnalysisQuery fields
since they come from path parameters, not query strings
- git_readme/git_commit_count/git_contributors: default to repo.default_branch
instead of HEAD to avoid errors when HEAD points to refs/heads/master but
the actual default branch is main
sync_tags held git2 types (StringArray, Repository) across .await points
on sea-orm DB operations. Applied the same pattern as sync_refs:
- Added collect_tag_refs() sync helper that collects all tag metadata
(name, oid, description, tagger) into owned TagTip structs.
- sync_tags now calls collect_tag_refs() before any .await, so no git2
types cross the async boundary.
- Removed #[allow(dead_code)] from TagTip now that it's consumed.
- pool/worker.rs: only dispatch webhooks after sync succeeds (skip on error);
pass max_retries from PoolConfig to HookWorker
- sync/branch.rs: replaced with stub (sync_refs moved to commit.rs)
- sync/commit.rs: add collect_git_refs() that collects BranchTip/TagTip from
git2 entirely within one sync call; sync_refs now uses owned data so no
git2 types cross .await boundaries (future is Send)
- sync/fsck.rs: remove extraneous "HEAD" arg from rollback git update-ref
(correct syntax is: update-ref -m <msg> <ref> <new-sha> — no old-sha)
- webhook_dispatch.rs: touch_count uses Expr::col().add(1) for atomic
increment instead of overwriting with Expr::value(1)
- pool/mod.rs: pass config.redis_max_retries to HookWorker
- sync/mod.rs: wrap scan_skills_from_dir in spawn_blocking to avoid
blocking the async executor; use to_path_buf() to get owned PathBuf
- pool/worker.rs: replace 500ms poll sleep with cancellation-is_cancelled
check (eliminates artificial latency); add exponential backoff on Redis
errors (1s base, 32s cap, reset on success)
- pool/redis.rs: add 5s timeout on pool.get() for all three methods
(next, ack_raw, nak_with_retry) to prevent indefinite blocking on
unresponsive Redis
- sync/gc.rs: add comment explaining why git gc --auto non-zero exit
is benign
- webhook_dispatch.rs: remove unnecessary format! wrappers in slog macros
- config/hook.rs: document max_concurrent intent (K8s operator/HPA, not
the single-threaded worker itself)
- pool/mod.rs: pass shared http_client Arc to HookWorker
- worker.rs: remove double-locking (sync() manages its own lock),
await all webhook handles before returning, share http_client,
hoist namespace query out of loop
- redis.rs: atomic NAK via Lua script (LREM + LPUSH in one eval)
- sync/lock.rs: increase LOCK_TTL from 60s to 300s for large repos
- sync/mod.rs: split sync/sync_work, fsck_only/fsck_work, gc_only/gc_work
so callers can choose locked vs lock-free path; run_gc + sync_skills
outside the DB transaction
- hook/mod.rs: remove unused http field from HookService
- ssh/mod.rs, http/mod.rs: remove unused HookService/http imports
- pool/worker.rs: single-threaded consumer that BLMPOPs from Redis queues
sequentially. K8s replicas provide HA — each pod runs one worker.
- pool/redis.rs: RedisConsumer with BLMOVE atomic dequeue, ACK/NAK, and
retry-with-json support.
- pool/types.rs: HookTask, TaskType, PoolConfig (minimal — no pool metrics).
- sync/lock.rs: Redis SET NX EX per-repo lock to prevent concurrent workers
from processing the same repo. Lock conflicts are handled by requeueing
without incrementing retry count.
- hook/mod.rs: HookService.start_worker() spawns the background worker.
- ssh/mod.rs / http/mod.rs: ReceiveSyncService RPUSHes to Redis queue.
Both run_http and run_ssh call start_worker() to launch the consumer.
- Lock conflicts (GitError::Locked) in the worker are requeued without
incrementing retry_count so another worker can pick them up.
Model sync:
- Filter OpenRouter models by what the user's AI client can actually access,
before upserting metadata (avoids bloating with inaccessible models).
- Fall back to direct endpoint sync when no OpenRouter metadata matches
(handles Bailian/MiniMax and other non-OpenRouter providers).
Git stability fixes:
- SSH: add 5s timeout on stdin flush/shutdown in channel_eof and
cleanup_channel to prevent blocking the event loop on unresponsive git.
- SSH: remove dbg!() calls from production code paths.
- HTTP auth: pass proper Logger to SshAuthService instead of discarding
all auth events to slog::Discard.
Dependencies:
- reqwest: add native-tls feature for HTTPS on Windows/Linux/macOS.
All log calls in sync.rs now use slog macros:
- sync_once: uses logger passed as parameter (sourced from AppService.logs)
- sync_upstream_models (HTTP API): uses self.logs
- Remove use of tracing::warn/info/error entirely
The periodic background sync task and the HTTP API handler now
write to the same slog logger as the rest of the application.
Drop all hard-coded model-name lookup tables that hardcoded specific
model names and prices:
- infer_context_length: remove GPT-4o/o1/Claude/etc. fallback table
- infer_max_output: remove GPT-4o/o1/etc. output token limits
- infer_pricing_fallback: remove entire hardcoded pricing table
- infer_capability_list: derive from architecture.modality only,
no longer uses model name strings
Also fix stats: if upsert_version fails, skip counting and continue
rather than counting model but not version (which caused
versions_created=0 while pricing_created>0 inconsistency).
SSH is deployed inside Kubernetes cluster where rate limiting
at the application layer is unnecessary. Remove all SSH rate
limiter code:
- SshRateLimiter from SSHandle and SSHServer structs
- is_user_allowed checks in auth_publickey, auth_publickey_offered
- is_repo_access_allowed in exec_request
- is_ip_allowed in server::new_client
- rate_limiter module and start_cleanup
- Add `start_sync_task()` in agent/sync.rs: spawns a background task
that syncs immediately on app startup, then every 10 minutes.
- `sync_once()` performs a single pass; errors are logged and swallowed
so the periodic task never stops.
- Remove authentication requirement from OpenRouter API (no API key needed).
- Call `service.start_sync_task()` from main.rs after AppService init.
- Also update the existing `sync_upstream_models` (HTTP API) to remove
the now-unnecessary API key requirement for consistency.
Fail fast if repo storage_path does not exist on disk, before
blocking the thread pool with spawn_blocking. This prevents
invalid tasks from consuming blocking threads while failing.
- Fix match pattern: `Ok(Ok(Err(_)))` must be treated as NOT a panic,
since execute_task_body returns Err(()) on task failure (not a panic).
Previously the Err path incorrectly set panicked=true, which could
cause pool to appear unhealthy.
- Add "task failed" log before retry/discard decision so real error
is visible in logs (previously only last_error on exhausted-retries
path was logged, which required hitting 5 retries to see the cause).
- Convert 3 remaining pool/mod.rs shorthand logs to format!() pattern.
- Add retry_count to HookTask with serde(default) for backwards compat
- Limit hook task retries to MAX_RETRIES=5, discard after limit to prevent
infinite requeue loops that caused 'task nack'd and requeued' log spam
- Add nak_with_retry() in RedisConsumer to requeue with incremented count
- Standardize all slog logs: replace "info!(l, "msg"; "k" => v)" shorthand
with "info!(l, "{}", format!("msg k={}", v))" across ssh/authz.rs,
ssh/handle.rs, ssh/server.rs, hook/webhook_dispatch.rs, hook/pool/mod.rs
- Downgrade russh from 0.55.0 to 0.50.4
- Remove unused flate2 feature from russh dependency
- Use info!(logger, "{}", format!(...)) for channel lifecycle log messages
- Remove user=="git" restriction from auth_password: the actual user is
determined by the token, not the SSH username, matching Gitea's approach
- Add channel_open_session logging with explicit flush to verify
CHANNEL_OPEN_CONFIRMATION reaches the client
- Add pty_request handler (reject with log) so git clients that request
a PTY are handled gracefully instead of falling through to default
- Add subsystem_request handler (log + accept) so git subsystems are
visible in logs
- Prefix unused variables with _ to eliminate warnings
- Add LFS_MAX_OBJECT_SIZE (50 GiB) and validate object sizes in both the
batch advisory check and the upload_object streaming loop to prevent
unbounded disk usage from malicious clients
- Fix HTTP rate limiter: track read_count and write_count separately so
a burst of writes cannot exhaust the read budget (previously all
operations incremented read_count regardless of type)
HTTP:
- Return Err(...) instead of Ok(HttpResponse::...) for error cases so
actix returns correct HTTP status codes instead of 200
- Add 30s timeout on info_refs and handle_git_rpc git subprocess calls
- Add 1MB pre-PACK limit to prevent memory exhaustion on receive-pack
- Enforce branch protection rules (forbid push/force-push/deletion/tag)
- Simplify graceful shutdown (remove manual signal handling)
SSH:
- Fix build_git_command: use block match arms so chained .arg() calls
are on the Command, not the match expression's () result
- Add MAX_RETRIES=5 to forward() data-pump loop to prevent infinite
spin on persistent network failures
- Fall back to raw path if canonicalize() fails instead of panicking
- Add platform-specific git config paths (/dev/null on unix, NUL on win)
- Start rate limiter cleanup background task so HashMap doesn't grow
unbounded over time
- Derive Clone on RateLimiter so SshRateLimiter::start_cleanup works
Backend:
- Atomic seq assignment via Redis Lua script: INCR + GET run atomically
inside a Lua script, preventing duplicate seqs under concurrent requests.
DB reconciliation only triggers on cross-server handoff (rare path).
- Broadcast channel capacity: 10,000 → 100,000 to prevent message drops
under high-throughput rooms.
Frontend:
- Optimistic sendMessage: adds message to UI immediately (marked
isOptimistic=true) so user sees it instantly. Replaces with
server-confirmed message on success, marks as isOptimisticError on
failure. Fire-and-forget to IndexedDB for persistence.
- Seq-based dedup in onRoomMessage: replaces optimistic message by
matching seq, preventing duplicates when WS arrives before REST confirm.
- Reconnect jitter: replaced deterministic backoff with full jitter
(random within backoff window), preventing thundering herd on server
restart.
- Visual WS status dot in room header: green=connected, amber
(pulsing)=connecting, red=error/disconnected.
- isPending check extended to cover both old 'temp-' prefix and new
isOptimistic flag, showing 'Sending...' / 'Failed' badges.
validate_origin() only allowed localhost origins by default, causing
production WebSocket connections to be rejected. Now it reads
APP_DOMAIN_URL and APP_STATIC_DOMAIN from env and automatically
adds their http/https/ws/wss variants to the allowed origins list.
Also add APP_DOMAIN_URL to the production configmap.