- Add `check_room_access` before `manager.subscribe()` in ws_universal
to prevent unauthorized room subscription (security fix)
- Fix busy-wait in `poll_push_streams`: sleep 50ms when streams are
empty, yield only when there are active streams
- Re-subscribe dead rooms after stream errors so events are not
permanently lost until manual reconnect
- Fix streaming message placeholder using fake content as room_id:
use chunk.room_id from backend instead
- Show toast error on history load failures instead of silent fallback
- git_commit_graph: use default_branch when rev is None
- git_commit_graph_react: use default_branch when rev is None
- git_commit_reflog: fall back to default_branch when HEAD is detached
Fixes errors:
- reference 'refs/heads/master' not found (graph-react endpoint)
- HEAD has no name (reflog endpoint when HEAD is detached)
1. branch_list: Fix is_head comparison
- head_name now keeps full ref (e.g., "refs/heads/main")
- Previously stripped to short name causing is_head to always be false
2. branch_get/branch_exists: Fix name resolution for branches with '/'
- Previously, "feature/testing" was assumed to be a remote branch
- Now tries both refs/heads/ and refs/remotes/ candidates
- Correctly handles local branches like "feature/testing"
3. sync_refs: Fix upstream reference format
- Was storing "refs/remotes/{}/branch" (broken pattern)
- Now properly queries git2 for actual upstream branch name
Move all specific branch routes before /branches/{name} to prevent
route shadowing. Previously, routes like /branches/rename, /branches/move,
/branches/upstream, /branches/diff, etc. were shadowed by /branches/{name}.
Move specific routes (/commits/reflog, /commits/branches, /commits/tags)
before parameterized routes (/commits/{oid}) to avoid route shadowing.
Previously, /commits/reflog was matched by /commits/{oid} with oid="reflog",
causing InvalidOid("reflog") errors.
Also fixes other potential route shadowing issues in commit routes.
The database stores short branch names (e.g., "main"), but git2's push_ref()
requires full reference names (e.g., "refs/heads/main"). This fixes all
service-layer endpoints to convert default_branch to the full ref format.
Fixed endpoints:
- git_readme: convert to refs/heads/{branch}
- git_commit_count: convert to refs/heads/{branch}
- git_contributors: convert to refs/heads/{branch}
- git_commit_log: convert to refs/heads/{branch}
- git_commit_walk: convert to refs/heads/{branch}
Resolves errors:
- Internal("the given reference name 'main' is not valid")
- Add #[serde(default)] to MergeAnalysisQuery.their_oid and MergeRefAnalysisQuery fields
since they come from path parameters, not query strings
- git_readme/git_commit_count/git_contributors: default to repo.default_branch
instead of HEAD to avoid errors when HEAD points to refs/heads/master but
the actual default branch is main
GitHook controller was generating a Deployment without any persistent
storage — only a ConfigMap volume at /config. The worker needs /data to
access repo storage paths (APP_REPOS_ROOT defaults to /data/repos).
Changes:
- GitHookSpec: added storage_size field (default 10Gi), matching the
pattern already used by GitServerSpec
- git_hook.rs reconcile(): now creates a PVC ({name}-data) before the
Deployment, mounts it at /data, and sets APP_REPOS_ROOT=/data/repos
- git-hook-crd.yaml: synced storageSize field into the CRD schema
The git-hook worker needs /data to access repo storage paths, but the
deployment defined the shared-data PVC volume at pod level without
attaching it to the container via volumeMounts. Added the missing
volumeMounts block (name: shared-data, mountPath: /data) so the PVC is
properly mounted into the git-hook container, consistent with app and
static deployments.
GitServiceHooks was renamed to HookService in the hook module refactor.
Updated main.rs to:
- Use HookService::new() with correct parameters (no http client)
- Call start_worker() which returns CancellationToken
- Wait on cancel.cancelled() instead of double-awaiting ctrl_c
- Clone token before moving into signal handler task
sync_tags held git2 types (StringArray, Repository) across .await points
on sea-orm DB operations. Applied the same pattern as sync_refs:
- Added collect_tag_refs() sync helper that collects all tag metadata
(name, oid, description, tagger) into owned TagTip structs.
- sync_tags now calls collect_tag_refs() before any .await, so no git2
types cross the async boundary.
- Removed #[allow(dead_code)] from TagTip now that it's consumed.
- pool/worker.rs: only dispatch webhooks after sync succeeds (skip on error);
pass max_retries from PoolConfig to HookWorker
- sync/branch.rs: replaced with stub (sync_refs moved to commit.rs)
- sync/commit.rs: add collect_git_refs() that collects BranchTip/TagTip from
git2 entirely within one sync call; sync_refs now uses owned data so no
git2 types cross .await boundaries (future is Send)
- sync/fsck.rs: remove extraneous "HEAD" arg from rollback git update-ref
(correct syntax is: update-ref -m <msg> <ref> <new-sha> — no old-sha)
- webhook_dispatch.rs: touch_count uses Expr::col().add(1) for atomic
increment instead of overwriting with Expr::value(1)
- pool/mod.rs: pass config.redis_max_retries to HookWorker
- sync/mod.rs: wrap scan_skills_from_dir in spawn_blocking to avoid
blocking the async executor; use to_path_buf() to get owned PathBuf
- pool/worker.rs: replace 500ms poll sleep with cancellation-is_cancelled
check (eliminates artificial latency); add exponential backoff on Redis
errors (1s base, 32s cap, reset on success)
- pool/redis.rs: add 5s timeout on pool.get() for all three methods
(next, ack_raw, nak_with_retry) to prevent indefinite blocking on
unresponsive Redis
- sync/gc.rs: add comment explaining why git gc --auto non-zero exit
is benign
- webhook_dispatch.rs: remove unnecessary format! wrappers in slog macros
- config/hook.rs: document max_concurrent intent (K8s operator/HPA, not
the single-threaded worker itself)
- pool/mod.rs: pass shared http_client Arc to HookWorker
- worker.rs: remove double-locking (sync() manages its own lock),
await all webhook handles before returning, share http_client,
hoist namespace query out of loop
- redis.rs: atomic NAK via Lua script (LREM + LPUSH in one eval)
- sync/lock.rs: increase LOCK_TTL from 60s to 300s for large repos
- sync/mod.rs: split sync/sync_work, fsck_only/fsck_work, gc_only/gc_work
so callers can choose locked vs lock-free path; run_gc + sync_skills
outside the DB transaction
- hook/mod.rs: remove unused http field from HookService
- ssh/mod.rs, http/mod.rs: remove unused HookService/http imports
- pool/worker.rs: single-threaded consumer that BLMPOPs from Redis queues
sequentially. K8s replicas provide HA — each pod runs one worker.
- pool/redis.rs: RedisConsumer with BLMOVE atomic dequeue, ACK/NAK, and
retry-with-json support.
- pool/types.rs: HookTask, TaskType, PoolConfig (minimal — no pool metrics).
- sync/lock.rs: Redis SET NX EX per-repo lock to prevent concurrent workers
from processing the same repo. Lock conflicts are handled by requeueing
without incrementing retry count.
- hook/mod.rs: HookService.start_worker() spawns the background worker.
- ssh/mod.rs / http/mod.rs: ReceiveSyncService RPUSHes to Redis queue.
Both run_http and run_ssh call start_worker() to launch the consumer.
- Lock conflicts (GitError::Locked) in the worker are requeued without
incrementing retry_count so another worker can pick them up.
Model sync:
- Filter OpenRouter models by what the user's AI client can actually access,
before upserting metadata (avoids bloating with inaccessible models).
- Fall back to direct endpoint sync when no OpenRouter metadata matches
(handles Bailian/MiniMax and other non-OpenRouter providers).
Git stability fixes:
- SSH: add 5s timeout on stdin flush/shutdown in channel_eof and
cleanup_channel to prevent blocking the event loop on unresponsive git.
- SSH: remove dbg!() calls from production code paths.
- HTTP auth: pass proper Logger to SshAuthService instead of discarding
all auth events to slog::Discard.
Dependencies:
- reqwest: add native-tls feature for HTTPS on Windows/Linux/macOS.
All log calls in sync.rs now use slog macros:
- sync_once: uses logger passed as parameter (sourced from AppService.logs)
- sync_upstream_models (HTTP API): uses self.logs
- Remove use of tracing::warn/info/error entirely
The periodic background sync task and the HTTP API handler now
write to the same slog logger as the rest of the application.
Drop all hard-coded model-name lookup tables that hardcoded specific
model names and prices:
- infer_context_length: remove GPT-4o/o1/Claude/etc. fallback table
- infer_max_output: remove GPT-4o/o1/etc. output token limits
- infer_pricing_fallback: remove entire hardcoded pricing table
- infer_capability_list: derive from architecture.modality only,
no longer uses model name strings
Also fix stats: if upsert_version fails, skip counting and continue
rather than counting model but not version (which caused
versions_created=0 while pricing_created>0 inconsistency).
SSH is deployed inside Kubernetes cluster where rate limiting
at the application layer is unnecessary. Remove all SSH rate
limiter code:
- SshRateLimiter from SSHandle and SSHServer structs
- is_user_allowed checks in auth_publickey, auth_publickey_offered
- is_repo_access_allowed in exec_request
- is_ip_allowed in server::new_client
- rate_limiter module and start_cleanup
- Add `start_sync_task()` in agent/sync.rs: spawns a background task
that syncs immediately on app startup, then every 10 minutes.
- `sync_once()` performs a single pass; errors are logged and swallowed
so the periodic task never stops.
- Remove authentication requirement from OpenRouter API (no API key needed).
- Call `service.start_sync_task()` from main.rs after AppService init.
- Also update the existing `sync_upstream_models` (HTTP API) to remove
the now-unnecessary API key requirement for consistency.
Fail fast if repo storage_path does not exist on disk, before
blocking the thread pool with spawn_blocking. This prevents
invalid tasks from consuming blocking threads while failing.
- Fix match pattern: `Ok(Ok(Err(_)))` must be treated as NOT a panic,
since execute_task_body returns Err(()) on task failure (not a panic).
Previously the Err path incorrectly set panicked=true, which could
cause pool to appear unhealthy.
- Add "task failed" log before retry/discard decision so real error
is visible in logs (previously only last_error on exhausted-retries
path was logged, which required hitting 5 retries to see the cause).
- Convert 3 remaining pool/mod.rs shorthand logs to format!() pattern.
- Add retry_count to HookTask with serde(default) for backwards compat
- Limit hook task retries to MAX_RETRIES=5, discard after limit to prevent
infinite requeue loops that caused 'task nack'd and requeued' log spam
- Add nak_with_retry() in RedisConsumer to requeue with incremented count
- Standardize all slog logs: replace "info!(l, "msg"; "k" => v)" shorthand
with "info!(l, "{}", format!("msg k={}", v))" across ssh/authz.rs,
ssh/handle.rs, ssh/server.rs, hook/webhook_dispatch.rs, hook/pool/mod.rs
- Downgrade russh from 0.55.0 to 0.50.4
- Remove unused flate2 feature from russh dependency
- Use info!(logger, "{}", format!(...)) for channel lifecycle log messages
- Remove user=="git" restriction from auth_password: the actual user is
determined by the token, not the SSH username, matching Gitea's approach
- Add channel_open_session logging with explicit flush to verify
CHANNEL_OPEN_CONFIRMATION reaches the client
- Add pty_request handler (reject with log) so git clients that request
a PTY are handled gracefully instead of falling through to default
- Add subsystem_request handler (log + accept) so git subsystems are
visible in logs
- Prefix unused variables with _ to eliminate warnings
- Add LFS_MAX_OBJECT_SIZE (50 GiB) and validate object sizes in both the
batch advisory check and the upload_object streaming loop to prevent
unbounded disk usage from malicious clients
- Fix HTTP rate limiter: track read_count and write_count separately so
a burst of writes cannot exhaust the read budget (previously all
operations incremented read_count regardless of type)
HTTP:
- Return Err(...) instead of Ok(HttpResponse::...) for error cases so
actix returns correct HTTP status codes instead of 200
- Add 30s timeout on info_refs and handle_git_rpc git subprocess calls
- Add 1MB pre-PACK limit to prevent memory exhaustion on receive-pack
- Enforce branch protection rules (forbid push/force-push/deletion/tag)
- Simplify graceful shutdown (remove manual signal handling)
SSH:
- Fix build_git_command: use block match arms so chained .arg() calls
are on the Command, not the match expression's () result
- Add MAX_RETRIES=5 to forward() data-pump loop to prevent infinite
spin on persistent network failures
- Fall back to raw path if canonicalize() fails instead of panicking
- Add platform-specific git config paths (/dev/null on unix, NUL on win)
- Start rate limiter cleanup background task so HashMap doesn't grow
unbounded over time
- Derive Clone on RateLimiter so SshRateLimiter::start_cleanup works
onMessageEdited optimistically set edited_at, then fetched the full
message. If the fetch failed the "Edited" indicator persisted even though
the content was stale. Fix by capturing the original edited_at and
reverting it in the catch block — consistent with editMessage rollback.
connect() is async/fire-and-forget — if the user switches rooms while
WS is still connecting, the subscribeRoom() call captures the stale
(activeRoomId) closure value and subscribes to the wrong room. Fix by
re-reading activeRoomIdRef.current after the await so we always subscribe
to the room that is active when the connection actually opens.
- useEffect([wsClient]): remove wsClient from deps to prevent
React StrictMode double-mount from disconnecting the real client.
First mount connects client-1; StrictMode cleanup disconnects it.
Second mount connects client-2; first mount's second cleanup would
then disconnect client-2, leaving WS permanently unconnected.
Changing to useEffect([]) + optional chaining fixes this.
- revokeMessage: add optimistic removal + rollback on server rejection,
consistent with editMessage pattern. Previously a failed delete left the
message visible with no feedback.
- sendMessage: guard with sendingRef to prevent concurrent in-flight
sends (was missing — rapid clicks could create duplicate messages)
- resubscribeAll: log at warn level instead of silently swallowing,
so operators can observe auth expiry or persistent failure patterns
- RoomMessageBubble: apply opacity-60 when isPending or isFailed,
and hide action toolbar for pending messages (can't react/act on
unconfirmed messages)
Backend:
- Atomic seq assignment via Redis Lua script: INCR + GET run atomically
inside a Lua script, preventing duplicate seqs under concurrent requests.
DB reconciliation only triggers on cross-server handoff (rare path).
- Broadcast channel capacity: 10,000 → 100,000 to prevent message drops
under high-throughput rooms.
Frontend:
- Optimistic sendMessage: adds message to UI immediately (marked
isOptimistic=true) so user sees it instantly. Replaces with
server-confirmed message on success, marks as isOptimisticError on
failure. Fire-and-forget to IndexedDB for persistence.
- Seq-based dedup in onRoomMessage: replaces optimistic message by
matching seq, preventing duplicates when WS arrives before REST confirm.
- Reconnect jitter: replaced deterministic backoff with full jitter
(random within backoff window), preventing thundering herd on server
restart.
- Visual WS status dot in room header: green=connected, amber
(pulsing)=connecting, red=error/disconnected.
- isPending check extended to cover both old 'temp-' prefix and new
isOptimistic flag, showing 'Sending...' / 'Failed' badges.