Multi-minute event-loop freeze in uv__fsevents_close (macOS) when indexing/file-watching is enabled #147

@inercia

Summary

On macOS, the agent's entire Node.js event loop freezes for several minutes (observed ~5–7 min) with no output, so it appears completely hung; it then self-recovers and the request completes normally. Stack sampling during the freeze shows the main thread blocked in uv__fsevents_close on libuv's internal FSEvents-thread handshake semaphore, while libuv's CFRunLoop/FSEvents thread is alive and idle. This suggests a lost-wakeup / handshake race in libuv's macOS FSEvents close path. The freeze occurs only with file-watching (indexing) enabled and correlates strongly with connecting a corporate VPN; it reproduced on multiple workspaces simultaneously.

Environment

  • auggie: 0.28.0 (commit 63537d73)
  • Node.js: v26.0.0 (bundled libuv 1.x; libnode.147)
  • macOS: 26.5 (build 25F71)
  • CPU: Apple Silicon (arm64)
  • Invocation: auggie --workspace-root=<workspace> --allow-indexing --acp
  • Workspaces: large, on local APFS volumes (NOT network mounts)

Symptoms

  • Agent stops streaming entirely mid-request; appears "still responding" with zero progress.
  • ~0% CPU during the freeze.
  • Freeze lasted several minutes, then the request completed normally (stop_reason=end_turn).
  • Occurred on 2+ workspaces at the same time, shortly after connecting a VPN.

Diagnosis (macOS sample)

Main thread during the freeze (≈100% of samples over multi-second windows):

node::SpinEventLoopInternal(node::Environment*)
  uv_run
    uv__fsevents_close            (libuv.1.0.0.dylib)
      <uv_sem_wait>
        semaphore_wait_trap       (libsystem_kernel.dylib)

libuv CFRunLoop / FSEvents thread at the same time — alive and healthy, parked on its mach port, occasionally still delivering FSEvents callbacks:

CFRunLoopRun
  _CFRunLoopRunSpecificWithOptions
    __CFRunLoopRun
      __CFRunLoopServiceMachPort
        mach_msg -> mach_msg2_trap
  (occasionally)
  __CFRunLoopDoSource1
    FSEventsClientProcessMessageCallback
      FSEventsD2F_server -> _Xcallback_rpc -> implementation_callback_rpc
        uv__fsevents_event_cb     (libuv.1.0.0.dylib)

After recovery, the main thread returns to the normal loop:

node::SpinEventLoopInternal -> uv_run -> uv__io_poll -> kevent

Interpretation

On macOS, uv__fsevents_close() hands stream teardown to libuv's dedicated CFRunLoop thread and blocks on a semaphore until that thread acknowledges. Here the main thread is stuck on that semaphore while the CFRunLoop thread is alive but idle in mach_msg — i.e. it is not (or no longer) processing the pending close signal, so the acknowledging post is delayed for minutes. This looks like a lost-wakeup / ordering race in the fsevents close handshake (uv__cf_loop_signal / signal_source / the sync semaphore), rather than the CFRunLoop thread being blocked on real work.

Likely trigger

  • Requires active FSEvents watchers (--allow-indexing). With watchers disabled the path is never taken.
  • Strongly correlated with connecting a corporate VPN. Plausible mechanism: connecting the VPN produces volume/mount changes (e.g. under /Volumes) that FSEvents delivers as events, prompting libuv to reschedule/close streams in a burst and hit the close-handshake race. The CFRunLoop thread is NOT blocked on any network/configd/DNS call, so the VPN's network stack is not directly involved.

Reproduction (observed; not yet minimized)

  1. macOS arm64, Node v26, with one or more recursive FSEvents watchers active on a large local workspace.
  2. Trigger a burst of FSEvents activity (connect a VPN that alters mounted volumes, or rapidly create/close many watchers).
  3. Observe the main thread freeze in uv__fsevents_close for minutes, then recover.

A minimal repro would likely be a Node script that creates and rapidly closes many recursive fs.watch watchers on macOS while volume-change events fire.
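
As a starting point, such a script might look like the sketch below. It is untested against this bug and makes several assumptions: the function names (churnOnce, runRounds) and the round/watcher counts are hypothetical, and it only exercises watcher churn, not the volume-change events that a VPN connect may add. It uses only fs.watch, FSWatcher.close, and the promise form of setImmediate from Node's standard library.

```javascript
const fs = require('node:fs');
const { setImmediate: yieldToLoop } = require('node:timers/promises');

// Open `count` watchers on `dir`, let the event loop turn once, close them
// all, let the loop turn again, and return the elapsed time in milliseconds.
// A hang in the close path would show up as one very long round.
async function churnOnce(dir, count, recursive = true) {
  const start = process.hrtime.bigint();
  const watchers = [];
  for (let i = 0; i < count; i++) {
    const watcher = fs.watch(dir, { recursive }, () => {});
    watcher.on('error', () => {}); // ignore watcher errors in this sketch
    watchers.push(watcher);
  }
  await yieldToLoop(); // give the loop a turn while the watchers are open
  for (const watcher of watchers) watcher.close();
  await yieldToLoop(); // give the loop a turn after the close calls
  return Number(process.hrtime.bigint() - start) / 1e6;
}

// Repeat the churn and report the slowest round.
async function runRounds(dir, rounds, count, recursive = true) {
  let worstMs = 0;
  for (let r = 0; r < rounds; r++) {
    worstMs = Math.max(worstMs, await churnOnce(dir, count, recursive));
  }
  return worstMs;
}
```

Calling runRounds('<workspace>', 50, 20) on an affected machine and printing the result would show whether any round takes far longer than the others; the counts are guesses and may need to be much higher to hit the race.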

Impact

Total event-loop freeze: the agent appears hung for minutes and all in-flight work stalls. From a host application's perspective it is indistinguishable from a permanent deadlock until it self-heals.
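
To put a number on a stall like this from inside the process, a timer-drift check is one option. The sketch below is an illustration, not part of auggie: startStallMonitor and its parameters are hypothetical names. Because the timer runs on the same event loop, it can only report a stall after the loop resumes; detecting the freeze while it is in progress would need an external watchdog.

```javascript
const { performance } = require('node:perf_hooks');

// Calls onStall(lateByMs) whenever the interval timer fires more than
// thresholdMs later than scheduled, which means the event loop was blocked.
// Returns a function that stops the monitor.
function startStallMonitor(onStall, intervalMs = 1000, thresholdMs = 5000) {
  let last = performance.now();
  const timer = setInterval(() => {
    const now = performance.now();
    const lateByMs = now - last - intervalMs;
    if (lateByMs > thresholdMs) onStall(lateByMs);
    last = now;
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return () => clearInterval(timer);
}
```

For example, startStallMonitor((ms) => console.error(`event loop stalled for ~${Math.round(ms)} ms`)) would log the approximate length of each freeze once the loop recovers.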

Workaround

Run without --allow-indexing for affected workspaces (removes FSEvents watchers, avoiding the close path entirely).

Possibly related

  • Node.js / libuv macOS fsevents reports of uv__fsevents_close semaphore handshakes hanging under high watcher churn.
