Root cause: publish_reaction_event sends a RoomMessageEvent (not
reaction_added) with reactions field set. The onRoomMessage handler
previously returned prev immediately when a duplicate ID was found,
skipping the reaction update entirely.
Fix:
- Add reactions field to RoomMessagePayload so TypeScript knows it's there
- When a duplicate message ID is found AND payload carries reactions,
update the existing message's reactions field instead of ignoring
- ReactionItem and ReactionGroup have identical shapes, so assignment works
Backend:
- Add reaction_batch handler: GET /api/rooms/{room_id}/messages/reactions/batch?message_ids=...
- Register route in libs/api/room/mod.rs
- Backend already had message_reactions_batch service method, just needed HTTP exposure
Frontend:
- Add ReactionListData import to room-context.tsx
- Fix thisLoadReactions client type (was using broken NonNullable<ReturnType<>>)
- Remove unused oldRoomId variable
- Delete unused useRoomWs.ts hook (RoomWsClient has no on/off methods)
- Remove unused EmojiPicker function and old manual overlay picker from RoomMessageBubble
- Remove unused savedToken variable in room-ws-client
Manual getBoundingClientRect positioning caused the picker to appear at
the far right of the room and shift content. Replaced with shadcn
Popover which handles anchor positioning, flipping, and portal rendering
automatically.
loadMore(null) fires immediately — cached messages render instantly.
WS connect runs in parallel; subscribeRoom and reaction batch-fetch
use WS-first request() which falls back to HTTP transparently.
- setup() now awaits client.connect() before subscribeRoom and loadMore,
ensuring the connection is truly open so WS is used for both
- subscribeRoom / reactionListBatch: already WS-first via RoomWsClient.request()
- IDB paths (initial + loadMore) now call thisLoadReactions to batch-fetch
reactions via WS with HTTP fallback, fixing the missing reactions bug
- thisLoadReactions helper: batch-fetches reactions for loaded messages
via WS (RoomWsClient.request() does WS-first → HTTP fallback automatically)
- Called after both IDB paths (initial load + loadMore) so reactions are
populated even when messages come from IndexedDB cache
- Also deduplicated API-path reaction loading to use the same helper
On auto-reconnect (scheduleReconnect), attempt connection with the stored
wsToken first. If the WS closes immediately (server rejected the token),
fall back to fetching a fresh token and retrying. Only requests a new
token when the existing one fails or when connect() is called manually
with forceNewToken=true.
- Remove clearRoomMessages on room switch: IDB cache now persists across
visits, enabling instant re-entry without API round-trip
- Increase initial API limit from 50 → 200: more messages cached upfront
- Add loadOlderMessagesFromIdb: uses 'by_room_seq' compound index to
serve scroll-back history from IDB without API call
- loadMore now tries IDB first before falling back to API
- Remove useTransition/useDeferredValue from RoomMessageList
- Wrap component in memo to prevent unnecessary re-renders
- Use requestAnimationFrame to defer scroll state updates
- Remove isUserScrolling state (no longer needed)
- Simplify auto-scroll effect: sync distance check + RAF deferred scroll
- Add replyMap memo to decouple reply lookup from row computation
- Stabilize handleEditConfirm to depend on editingMessage?.id only
- Remove Performance Stats panel (RoomPerformanceMonitor)
- Fix initial room load being skipped: `setup()` called `loadMoreRef.current`
which was null on first mount (ref assigned in later effect). Call `loadMore`
directly so the initial fetch always fires. WS message.list used when
connected, HTTP fallback otherwise.
- Rewrite useRoomWs to use shared RoomWsClient instead of creating its own
raw WebSocket, eliminating duplicate WS connection per room.
- Remove dead loadMoreRef now that setup calls loadMore directly.
Frontend:
- P0: Replace constant estimateSize(40px) with content-based estimation
using line count and reply presence for accurate virtual list scroll
- P1: Replace Shadow DOM custom elements with styled spans for @mentions,
eliminating expensive attachShadow calls per mention instance
- P1: Remove per-message ResizeObserver (one per bubble), replace with
static inline toolbar layout to avoid observer overhead
- P2: Fix WS token re-fetch on every room switch by preserving token
across navigation and not clearing activeRoomIdRef on cleanup
Backend:
- P1: Fix reaction check+insert race condition by moving into transaction
instead of separate query + on-conflict insert
- P2: Fix N+1 queries in get_mention_notifications with batch fetch
for users and rooms using IN clauses
- P2: Update room_last_activity in broadcast_stream_chunk to prevent
idle room cleanup during active AI streaming
- P3: Use enum comparison instead of to_string() in room_member_leave
- reaction.rs: query before insert to detect new vs duplicate reactions,
only publish Redis event when a reaction was actually added
- room.rs: delete Redis seq key on room deletion to prevent seq
collision on re-creation
- message.rs: use Redis-atomic next_room_message_seq_internal for
concurrent safety; look up sender display name once for both
mention notifications and response body; add warn log when
should_ai_respond fails instead of silent unwrap_or(false)
- ws_universal.rs: re-check room access permission when re-subscribing
dead streams after error to prevent revoked permissions being bypassed
- RoomChatPanel.tsx: truncate reply preview content to 80 chars
- RoomMessageList.tsx: remove redundant inline style on message row div
- Add `check_room_access` before `manager.subscribe()` in ws_universal
to prevent unauthorized room subscription (security fix)
- Fix busy-wait in `poll_push_streams`: sleep 50ms when streams are
empty, yield only when there are active streams
- Re-subscribe dead rooms after stream errors so events are not
permanently lost until manual reconnect
- Fix streaming message placeholder using fake content as room_id:
use chunk.room_id from backend instead
- Show toast error on history load failures instead of silent fallback
- git_commit_graph: use default_branch when rev is None
- git_commit_graph_react: use default_branch when rev is None
- git_commit_reflog: fall back to default_branch when HEAD is detached
Fixes errors:
- reference 'refs/heads/master' not found (graph-react endpoint)
- HEAD has no name (reflog endpoint when HEAD is detached)
1. branch_list: Fix is_head comparison
- head_name now keeps full ref (e.g., "refs/heads/main")
- Previously stripped to short name causing is_head to always be false
2. branch_get/branch_exists: Fix name resolution for branches with '/'
- Previously, "feature/testing" was assumed to be a remote branch
- Now tries both refs/heads/ and refs/remotes/ candidates
- Correctly handles local branches like "feature/testing"
3. sync_refs: Fix upstream reference format
- Was storing "refs/remotes/{}/branch" (broken pattern)
- Now properly queries git2 for actual upstream branch name
Move all specific branch routes before /branches/{name} to prevent
route shadowing. Previously, routes like /branches/rename, /branches/move,
/branches/upstream, /branches/diff, etc. were shadowed by /branches/{name}.
Move specific routes (/commits/reflog, /commits/branches, /commits/tags)
before parameterized routes (/commits/{oid}) to avoid route shadowing.
Previously, /commits/reflog was matched by /commits/{oid} with oid="reflog",
causing InvalidOid("reflog") errors.
Also fixes other potential route shadowing issues in commit routes.
The database stores short branch names (e.g., "main"), but git2's push_ref()
requires full reference names (e.g., "refs/heads/main"). This fixes all
service-layer endpoints to convert default_branch to the full ref format.
Fixed endpoints:
- git_readme: convert to refs/heads/{branch}
- git_commit_count: convert to refs/heads/{branch}
- git_contributors: convert to refs/heads/{branch}
- git_commit_log: convert to refs/heads/{branch}
- git_commit_walk: convert to refs/heads/{branch}
Resolves errors:
- Internal("the given reference name 'main' is not valid")
- Add #[serde(default)] to MergeAnalysisQuery.their_oid and MergeRefAnalysisQuery fields
since they come from path parameters, not query strings
- git_readme/git_commit_count/git_contributors: default to repo.default_branch
instead of HEAD to avoid errors when HEAD points to refs/heads/master but
the actual default branch is main
GitHook controller was generating a Deployment without any persistent
storage — only a ConfigMap volume at /config. The worker needs /data to
access repo storage paths (APP_REPOS_ROOT defaults to /data/repos).
Changes:
- GitHookSpec: added storage_size field (default 10Gi), matching the
pattern already used by GitServerSpec
- git_hook.rs reconcile(): now creates a PVC ({name}-data) before the
Deployment, mounts it at /data, and sets APP_REPOS_ROOT=/data/repos
- git-hook-crd.yaml: synced storageSize field into the CRD schema
The git-hook worker needs /data to access repo storage paths, but the
deployment defined the shared-data PVC volume at pod level without
attaching it to the container via volumeMounts. Added the missing
volumeMounts block (name: shared-data, mountPath: /data) so the PVC is
properly mounted into the git-hook container, consistent with app and
static deployments.
GitServiceHooks was renamed to HookService in the hook module refactor.
Updated main.rs to:
- Use HookService::new() with correct parameters (no http client)
- Call start_worker() which returns CancellationToken
- Wait on cancel.cancelled() instead of double-awaiting ctrl_c
- Clone token before moving into signal handler task
sync_tags held git2 types (StringArray, Repository) across .await points
on sea-orm DB operations. Applied the same pattern as sync_refs:
- Added collect_tag_refs() sync helper that collects all tag metadata
(name, oid, description, tagger) into owned TagTip structs.
- sync_tags now calls collect_tag_refs() before any .await, so no git2
types cross the async boundary.
- Removed #[allow(dead_code)] from TagTip now that it's consumed.
- pool/worker.rs: only dispatch webhooks after sync succeeds (skip on error);
pass max_retries from PoolConfig to HookWorker
- sync/branch.rs: replaced with stub (sync_refs moved to commit.rs)
- sync/commit.rs: add collect_git_refs() that collects BranchTip/TagTip from
git2 entirely within one sync call; sync_refs now uses owned data so no
git2 types cross .await boundaries (future is Send)
- sync/fsck.rs: remove extraneous "HEAD" arg from rollback git update-ref
(correct syntax is: update-ref -m <msg> <ref> <new-sha> — no old-sha)
- webhook_dispatch.rs: touch_count uses Expr::col().add(1) for atomic
increment instead of overwriting with Expr::value(1)
- pool/mod.rs: pass config.redis_max_retries to HookWorker
- sync/mod.rs: wrap scan_skills_from_dir in spawn_blocking to avoid
blocking the async executor; use to_path_buf() to get owned PathBuf
- pool/worker.rs: replace 500ms poll sleep with cancellation-is_cancelled
check (eliminates artificial latency); add exponential backoff on Redis
errors (1s base, 32s cap, reset on success)
- pool/redis.rs: add 5s timeout on pool.get() for all three methods
(next, ack_raw, nak_with_retry) to prevent indefinite blocking on
unresponsive Redis
- sync/gc.rs: add comment explaining why git gc --auto non-zero exit
is benign
- webhook_dispatch.rs: remove unnecessary format! wrappers in slog macros
- config/hook.rs: document max_concurrent intent (K8s operator/HPA, not
the single-threaded worker itself)
- pool/mod.rs: pass shared http_client Arc to HookWorker
- worker.rs: remove double-locking (sync() manages its own lock),
await all webhook handles before returning, share http_client,
hoist namespace query out of loop
- redis.rs: atomic NAK via Lua script (LREM + LPUSH in one eval)
- sync/lock.rs: increase LOCK_TTL from 60s to 300s for large repos
- sync/mod.rs: split sync/sync_work, fsck_only/fsck_work, gc_only/gc_work
so callers can choose locked vs lock-free path; run_gc + sync_skills
outside the DB transaction
- hook/mod.rs: remove unused http field from HookService
- ssh/mod.rs, http/mod.rs: remove unused HookService/http imports
- pool/worker.rs: single-threaded consumer that BLMPOPs from Redis queues
sequentially. K8s replicas provide HA — each pod runs one worker.
- pool/redis.rs: RedisConsumer with BLMOVE atomic dequeue, ACK/NAK, and
retry-with-json support.
- pool/types.rs: HookTask, TaskType, PoolConfig (minimal — no pool metrics).
- sync/lock.rs: Redis SET NX EX per-repo lock to prevent concurrent workers
from processing the same repo. Lock conflicts are handled by requeueing
without incrementing retry count.
- hook/mod.rs: HookService.start_worker() spawns the background worker.
- ssh/mod.rs / http/mod.rs: ReceiveSyncService RPUSHes to Redis queue.
Both run_http and run_ssh call start_worker() to launch the consumer.
- Lock conflicts (GitError::Locked) in the worker are requeued without
incrementing retry_count so another worker can pick them up.
Model sync:
- Filter OpenRouter models by what the user's AI client can actually access,
before upserting metadata (avoids bloating with inaccessible models).
- Fall back to direct endpoint sync when no OpenRouter metadata matches
(handles Bailian/MiniMax and other non-OpenRouter providers).
Git stability fixes:
- SSH: add 5s timeout on stdin flush/shutdown in channel_eof and
cleanup_channel to prevent blocking the event loop on unresponsive git.
- SSH: remove dbg!() calls from production code paths.
- HTTP auth: pass proper Logger to SshAuthService instead of discarding
all auth events to slog::Discard.
Dependencies:
- reqwest: add native-tls feature for HTTPS on Windows/Linux/macOS.
All log calls in sync.rs now use slog macros:
- sync_once: uses logger passed as parameter (sourced from AppService.logs)
- sync_upstream_models (HTTP API): uses self.logs
- Remove use of tracing::warn/info/error entirely
The periodic background sync task and the HTTP API handler now
write to the same slog logger as the rest of the application.
Drop all hard-coded model-name lookup tables that hardcoded specific
model names and prices:
- infer_context_length: remove GPT-4o/o1/Claude/etc. fallback table
- infer_max_output: remove GPT-4o/o1/etc. output token limits
- infer_pricing_fallback: remove entire hardcoded pricing table
- infer_capability_list: derive from architecture.modality only,
no longer uses model name strings
Also fix stats: if upsert_version fails, skip counting and continue
rather than counting model but not version (which caused
versions_created=0 while pricing_created>0 inconsistency).
SSH is deployed inside Kubernetes cluster where rate limiting
at the application layer is unnecessary. Remove all SSH rate
limiter code:
- SshRateLimiter from SSHandle and SSHServer structs
- is_user_allowed checks in auth_publickey, auth_publickey_offered
- is_repo_access_allowed in exec_request
- is_ip_allowed in server::new_client
- rate_limiter module and start_cleanup
- Add `start_sync_task()` in agent/sync.rs: spawns a background task
that syncs immediately on app startup, then every 10 minutes.
- `sync_once()` performs a single pass; errors are logged and swallowed
so the periodic task never stops.
- Remove authentication requirement from OpenRouter API (no API key needed).
- Call `service.start_sync_task()` from main.rs after AppService init.
- Also update the existing `sync_upstream_models` (HTTP API) to remove
the now-unnecessary API key requirement for consistency.
Fail fast if repo storage_path does not exist on disk, before
blocking the thread pool with spawn_blocking. This prevents
invalid tasks from consuming blocking threads while failing.
- Fix match pattern: `Ok(Ok(Err(_)))` must be treated as NOT a panic,
since execute_task_body returns Err(()) on task failure (not a panic).
Previously the Err path incorrectly set panicked=true, which could
cause pool to appear unhealthy.
- Add "task failed" log before retry/discard decision so real error
is visible in logs (previously only last_error on exhausted-retries
path was logged, which required hitting 5 retries to see the cause).
- Convert 3 remaining pool/mod.rs shorthand logs to format!() pattern.
- Add retry_count to HookTask with serde(default) for backwards compat
- Limit hook task retries to MAX_RETRIES=5, discard after limit to prevent
infinite requeue loops that caused 'task nack'd and requeued' log spam
- Add nak_with_retry() in RedisConsumer to requeue with incremented count
- Standardize all slog logs: replace "info!(l, "msg"; "k" => v)" shorthand
with "info!(l, "{}", format!("msg k={}", v))" across ssh/authz.rs,
ssh/handle.rs, ssh/server.rs, hook/webhook_dispatch.rs, hook/pool/mod.rs