Implement the following plan:
# bin/plumb/main.ml Switchover

Steps 0–4 and 5a of PLMB-037 are complete and merged to main.
The fabric runs inline morphisms and agent subprocesses end-to-end
over ZMQ. The plumb binary still uses Exec.execute (pipe-based,
thread-based). This step switches it to use Fabric.execute +
Bridge.make_handler.
The binary goes from being "the executor" to being "the I/O adapter for the fabric" — a thin shell that reads stdin, pushes to ZMQ boundary sockets, pulls from ZMQ output sockets, and writes to stdout.
| File | Change |
|---|---|
| bin/plumb/main.ml | Rewrite to use Fabric + Bridge instead of Exec |
| bin/plumb/dune | Add fabric, spawn_lib, eio_main, eio.unix deps |
| lib/fabric/bridge.ml | Expand make_handler to accept tool dispatch params |
| lib/fabric/bridge.ml | Fix: MCP temp file cleanup, fork in run_in_systhread |
Reviews identified two blockers and several medium issues. All are addressed in this plan.
### Blocker: Sys.catch_break under Eio

Sys.catch_break true installs an async signal handler that raises
Sys.Break at arbitrary allocation points. Inside Eio_main.run,
this can fire in Eio scheduler internals and skip cleanup
(fabric.cancel(), Fabric.shutdown, IPC file cleanup).
Fix: Remove Sys.catch_break true. Instead, set a SIGINT
handler before Eio_main.run that sets a shared Atomic.t flag.
The reader loop checks this flag (or catches the exception if it
propagates). On SIGINT, call fabric.cancel() to trigger orderly
shutdown.
Concretely: use Sys.set_signal Sys.sigint (Signal_handle ...).
The handler sets an atomic flag and calls fabric.cancel() (which
is safe to call from a signal handler — it just iterates a hashtable
and closes ZMQ sockets via C FFI). The reader loop's T.recv will
fail with Transport_error once the socket is closed, causing the
switch to exit. Entry.main catches Sys.Break at line 73 for
exit code 130 — we raise Sys.Break after shutdown to preserve
this.
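The replacement pattern is easy to verify in isolation. A minimal stdlib-plus-unix sketch of the handler-sets-flag / loop-polls-flag shape — no Eio and no fabric here; the fabric_ref wiring comes later in this plan:

```ocaml
(* Sketch of the SIGINT fix in miniature: the handler only sets an
   atomic flag; the main loop polls it and takes the orderly-shutdown
   path itself. Compile with the unix library. *)
let interrupted = Atomic.make false

let () =
  Sys.set_signal Sys.sigint
    (Sys.Signal_handle (fun _ -> Atomic.set interrupted true));
  (* Simulate Ctrl-C by signalling ourselves. *)
  Unix.kill (Unix.getpid ()) Sys.sigint;
  (* Poll the flag, as the reader loop would between messages. *)
  let rec wait n =
    if Atomic.get interrupted then ()
    else if n = 0 then failwith "signal never delivered"
    else (Unix.sleepf 0.01; wait (n - 1))
  in
  wait 100;
  print_endline "interrupted: orderly shutdown path taken"
```

The handler does no I/O and raises nothing, so it cannot corrupt scheduler state; all cleanup happens in normal control flow.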
### Blocker: run_in_systhread(input_line) not cancellable

The bridge demux runs Eio_unix.run_in_systhread (fun () ->
input_line ic). This blocks in a thread-pool thread until the
child writes to stdout. fabric.cancel() closes ZMQ sockets but
not child stdout pipes. The switch cannot exit while a
run_in_systhread call is outstanding.
Fix: The bridge demux already handles this correctly in practice
— when the child process exits (from receiving EOF on stdin via the
mux path), its stdout pipe closes, and input_line raises
End_of_file. The key ordering: stdin writer sends EOF sentinel →
bridge mux writes __eof to child stdin → child processes and
exits → child stdout closes → demux input_line gets EOF →
demux sends EOF sentinels on output sockets → demux reaps child.
For the SIGINT path: fabric.cancel() closes ZMQ sockets. The mux
fibres' T.recv calls fail → mux fibres exit → mux_source_done
→ child stdin closes → child gets EOF → child exits → stdout
closes → demux unblocks. This chain works but may take a few
seconds. If the child hangs, waitpid_timed sends SIGTERM after
the timeout.
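The reaping step the chain ends on can be sketched independently. A stdlib-only approximation of a waitpid_timed-style helper — the real helper lives in Spawn_lib, and its exact signature and semantics are assumptions here:

```ocaml
(* Hedged sketch of waitpid_timed-style reaping: poll with WNOHANG
   until the deadline, then SIGTERM the child and wait for real.
   Compile with the unix library. *)
let waitpid_timed ~timeout_seconds pid =
  let deadline = Unix.gettimeofday () +. timeout_seconds in
  let rec loop () =
    match Unix.waitpid [ Unix.WNOHANG ] pid with
    | 0, _ when Unix.gettimeofday () < deadline ->
      Unix.sleepf 0.05; loop ()
    | 0, _ ->
      (* Timed out: escalate, then block until the child is gone. *)
      (try Unix.kill pid Sys.sigterm with Unix.Unix_error _ -> ());
      let _, status = Unix.waitpid [] pid in
      (status, true)
    | _, status -> (status, false)
  in
  loop ()

let () =
  (* A child that exits immediately is reaped with no timeout. *)
  let pid = Unix.create_process "true" [| "true" |]
      Unix.stdin Unix.stdout Unix.stderr in
  let status, timed_out = waitpid_timed ~timeout_seconds:5.0 pid in
  assert (status = Unix.WEXITED 0 && not timed_out);
  print_endline "reaped"
```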
No code change needed — the existing ordering is correct. But the
plumb binary's shutdown must not call Fabric.shutdown inside the
switch scope (the Unix.sleepf 1.5 would block the scheduler).
Call it after the switch exits.
Fork under Eio (M1): spawn_subprocess calls Unix.fork()
from an Eio fibre. Wrap the spawn call in run_in_systhread in
bridge.ml to avoid scheduler state corruption.
Tool dispatch thread not joined (M2): Bridge creates a dispatch
thread (Thread.create) that is never joined. Track it and join
in the demux cleanup block.
MCP temp file leak (M3): Bridge writes MCP tool schemas to a temp file but only cleans up the spec file. Add cleanup for the MCP file.
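The M3 fix is mechanical. A stdlib-only sketch of the track-then-remove pattern — the names here are illustrative, not the bridge's actual bindings:

```ocaml
(* Sketch of the M3 fix: record every temp file created (spec file
   and MCP schema file) and remove them all in the cleanup block,
   instead of only the spec file. *)
let temp_files : string list ref = ref []

let track path =
  temp_files := path :: !temp_files;
  path

let cleanup_temp_files () =
  List.iter
    (fun p -> try Sys.remove p with Sys_error _ -> ())
    !temp_files;
  temp_files := []

let () =
  let spec_path = track (Filename.temp_file "plumb_spec_" ".plumb") in
  let mcp_path = track (Filename.temp_file "agentpi_mcp_" ".json") in
  assert (Sys.file_exists spec_path && Sys.file_exists mcp_path);
  cleanup_temp_files ();
  assert (not (Sys.file_exists spec_path));
  assert (not (Sys.file_exists mcp_path));
  print_endline "temp files removed"
```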
Tool dispatch boundary (M4): The bridge's tool_dispatch type
takes only req_fd, resp_fd, agent_name. But
Tool_dispatch.tool_dispatch_loop needs search_path, tool_env,
on_oitl_request, call_pipeline, base_dir, etc. Two options:
(A) Expand make_handler to accept ~search_path, ~tool_env,
~on_oitl_request, and ~base_dir; the bridge constructs the full
dispatch loop internally. Cleaner boundary.

(B) Keep tool_dispatch_fn as-is — the caller captures
everything except the fds and agent name in a closure.

Decision: (A). The bridge already resolves tool bindings and MCP
connections per-agent. Adding search_path, tool_env,
on_oitl_request, and base_dir lets it construct the complete
dispatch loop. The tool_dispatch_fn parameter is removed.
For call_pipeline: stub with an error for now. Recursive pipeline
invocation through the fabric is a separate concern.
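A sketch of what the stub could look like, assuming the dispatch side consumes a result value (the actual error shape expected by Tool_dispatch may differ):

```ocaml
(* Sketch of the call_pipeline stub: recursive pipeline invocation
   through the fabric is out of scope for this step, so the bridge
   hands Tool_dispatch a function that always fails with a clear
   message. *)
let call_pipeline ~name _input =
  Error
    (Printf.sprintf
       "call_pipeline: recursive invocation of %S is not supported \
        through the fabric yet" name)

let () =
  match call_pipeline ~name:"summarise" `Null with
  | Error msg -> assert (String.length msg > 0); print_endline msg
  | Ok _ -> assert false
```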
Entry.main → load_main → validate → compile routing table
→ Eio_main.run → Eio.Switch.run
→ Install SIGINT handler (sets flag, calls fabric.cancel)
→ Build Bridge.make_handler (prog, type_env, search_path,
tool_env, on_oitl_request, bin_dir, resource_find)
→ Fabric.execute ~process_handler
→ Fork writer daemon: run_in_systhread(input_line stdin)
→ T.send boundary_input → on EOF: send_eof
→ Reader (main fibre): T.recv boundary_output
→ run_in_systhread(print_endline) → on EOF: break
→ fabric.cancel() → switch exits
→ Fabric.shutdown (outside switch scope)
Before calling Fabric.execute, compile the routing table:
let ports, pb = match main.impl with
| Plumb (ports, pb) -> (ports, pb)
| _ -> Error.raise_ Config "config_error" "main must be plumb"
in
let type_env = Validate.build_type_env prog.prog_types in
let routing_table = Routing.compile ~prog ~ports ~pb
~type_env ~main_binding:main ~seeded:pb.pb_seeded in
let handler = Fabric.Bridge.make_handler
~prog ~type_env
~bin_dir:(Lazy.force Prefix_path.bin_dir)
~resource_find:Resource_path.find
~search_path
~tool_env
~base_dir:(Filename.dirname spec_path)
~on_oitl_request
() in
(* Writer daemon: stdin → boundary input *)
Eio.Fiber.fork_daemon ~sw (fun () ->
let push = Fabric.boundary_input fabric input_port in
(try while true do
let line = Eio_unix.run_in_systhread (fun () ->
input_line stdin) in
T.send push ~topic:input_port line
done with End_of_file | Sys_error _ -> ());
Fabric.Bridge.send_eof push ~topic:input_port;
`Stop_daemon);
(* Reader: boundary output → stdout (main fibre) *)
let pull = Fabric.boundary_output fabric output_port in
(try while true do
let (_topic, payload) = T.recv pull in
if Fabric.Bridge.is_eof payload then raise Exit;
Eio_unix.run_in_systhread (fun () ->
print_endline payload)
done with
| Exit | Transport.Transport_error _ -> ());
fabric.cancel ()
Port names from the main binding:
- input_port = List.nth ports 0
- output_port = List.nth ports n_in
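Assuming the binding's port list orders inputs before outputs (which is what the indexing above relies on), the selection can be wrapped with a bounds check:

```ocaml
(* Sketch of boundary port selection from the main binding's port
   list: inputs come first, so index 0 is the boundary input and the
   first output port sits at index n_in. Fails loudly on malformed
   bindings rather than raising a bare List.nth exception. *)
let boundary_ports ~n_in ports =
  if List.length ports <= n_in then
    invalid_arg "main binding needs at least one input and one output port"
  else
    (List.nth ports 0, List.nth ports n_in)

let () =
  let input_port, output_port =
    boundary_ports ~n_in:2 [ "query"; "ctrl"; "response" ] in
  assert (input_port = "query");
  assert (output_port = "response");
  print_endline (input_port ^ " -> " ^ output_port)
```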
(* Before Eio_main.run — fabric ref for signal handler *)
let fabric_ref = ref None in
Sys.set_signal Sys.sigint (Sys.Signal_handle (fun _ ->
match !fabric_ref with
| Some f -> f.Fabric.cancel ()
| None -> raise Sys.Break));
Inside the switch, set fabric_ref := Some fabric after
Fabric.execute. On SIGINT, the handler calls fabric.cancel(),
which closes all sockets, causing the reader's T.recv to fail
with Transport_error, breaking the loop. After the switch exits,
call Fabric.shutdown. If no fabric is running yet, the handler
raises Sys.Break which Entry.main catches for exit code 130.
Eio_main.run (fun _env ->
  let fabric = ref None in
  Eio.Switch.run (fun sw ->
    let f = Fabric.execute ~sw ~routing_table ~type_env
        ~process_handler:handler () in
    fabric := Some f;
    fabric_ref := Some f;
    (* ... boundary I/O fibres ... *)
    f.cancel ());  (* reader loop exits here; switch closes *)
  (* Outside switch — safe to sleep/block. Note the shutdown stays
     inside the Eio_main.run closure, where [fabric] is in scope. *)
  match !fabric with
  | Some f -> Fabric.shutdown f
  | None -> ())
## bridge.ml Changes

make_handler signature:
- Add ~search_path, ~tool_env, ~base_dir, ~on_oitl_request parameters
- Remove the ~tool_dispatch_fn parameter
- Bridge constructs the tool_dispatch_loop closure internally

Fork safety (M1): Wrap the spawn_subprocess call in
Eio_unix.run_in_systhread.

Join dispatch thread (M2): Store the thread handle, join in demux cleanup.

MCP file cleanup (M3): Track and remove the MCP temp file.
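The M2 fix in miniature — keep the Thread.t instead of discarding it, and join it in cleanup; the thread body below is a stand-in for tool_dispatch_loop:

```ocaml
(* Sketch of the M2 fix: store the dispatch thread handle and join it
   in the demux cleanup block, so the bridge never leaks a running
   thread past shutdown. Compile with the threads library. *)
let () =
  let finished = Atomic.make false in
  let dispatch_thread =
    Thread.create
      (fun () ->
         (* stands in for tool_dispatch_loop draining req_fd to EOF *)
         Thread.delay 0.01;
         Atomic.set finished true)
      ()
  in
  (* demux cleanup: join the tracked handle instead of dropping it *)
  Thread.join dispatch_thread;
  assert (Atomic.get finished);
  print_endline "dispatch thread joined"
```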
## bin/plumb/dune Changes

(executable
 (name main)
 (public_name plumb)
 (libraries plumb fabric language runtime spawn_lib
  eio_main eio.unix unix threads.posix yojson))
## Verification

- nix develop --command dune build --root=. in worktree
- nix develop --command dune test --root=. — all tests pass
- echo '"hello"' | ./plumb test/fixtures/agent_eliza.plumb → ELIZA response
- echo '{"text":"hi"}' | ./plumb test/fixtures/identity.plumb → echo input (inline-only pipeline)
- test_plumb.ml integration tests still pass
- clean shutdown (no leftover /tmp/plumb-* directories)
- --auto-approve flag works through bridge

If you need specific details from before exiting plan mode (like exact code snippets, error messages, or content you generated), read the full transcript at: /Users/ww/.claude/projects/-Users-ww-src-plumbing/1c8917b0-bf01-4f55-891a-9c2f55a120ca.jsonl
Current bin/plumb/main.ml (reference):

(** Plumb runtime — general-purpose executor for plumb programs.

    Parses a plumb program, finds the main binding, creates channels,
    spawns processes, wires boundary ports, and forwards stdin/stdout
    through the pipeline. Concurrent streaming forwarding — handles
    arbitrary n:m message ratios (copy, merge, etc.). *)

let () =
  (* Default to Info; PIPELINE_DEBUG=1 wins (already set to Debug in Log
     module initialiser). --verbose lowers to Debug explicitly. *)
  let verbose = Array.exists (fun s -> s = "--verbose") Sys.argv in
  if verbose then Runtime.Log.set_level Runtime.Log.Debug
  else if Sys.getenv_opt "PIPELINE_DEBUG" <> Some "1" then
    Runtime.Log.set_level Runtime.Log.Info;
  (* Extract --provider/--model/--auto-approve from argv and set env vars.
     Entry.main strips these from argv before extracting the spec path. *)
  (match Runtime.Entry.resolve_flag Sys.argv "--provider" with
   | Some p -> Unix.putenv "PLUMB_PROVIDER" p | None -> ());
  (match Runtime.Entry.resolve_flag Sys.argv "--model" with
   | Some m -> Unix.putenv "PLUMB_MODEL" m | None -> ());
  let auto_approve = Array.exists (fun s -> s = "--auto-approve") Sys.argv in
  let docroot = Runtime.Entry.resolve_docroot Sys.argv in
  let search_path = Plumb.Stdlib_path.search_path () in
  let tool_env = { Plumb.Exec.docroot } in
  Sys.catch_break true;
  Runtime.Entry.main
    ~usage:"Usage: plumb [--verbose] [--docroot DIR] [--provider NAME] [--model NAME] [--auto-approve] <spec.plumb>"
    ~run:(fun spec_path ->
      let prog, main = Runtime.Spec.load_main_from_file ~search_path spec_path in
      let on_oitl_request =
        if auto_approve then
          fun (_req : Plumb.Exec.oitl_request) ->
            Plumb.Exec.{ oitl_approved = true; oitl_policy = Some "always" }
        else
          fun (req : Plumb.Exec.oitl_request) ->
            Printf.eprintf "plumb: denied tool call '%s' by agent '%s' \
                            (use --auto-approve)\n%!" req.oitl_tool req.oitl_agent;
            Plumb.Exec.{ oitl_approved = false; oitl_policy = None }
      in
      let running = Plumb.Exec.execute ~search_path ~tool_env
          ~on_oitl_request prog main spec_path in
      let oc = Plumb.Exec.input running in
      let ic = Plumb.Exec.output running in
      (* Writer thread: stdin → pipeline input.
         release_ctrl sends stop to agents and closes ctrl fds.
         On normal EOF the executor's mux thread does this too
         (after data processing) — double-close is idempotent.
         On Ctrl-C, release_ctrl is the only path that sends stop. *)
      let writer = Thread.create (fun () ->
          (try while true do
               let line = input_line stdin in
               Printf.fprintf oc "%s\n%!" line
             done with _ -> ());
          close_out_noerr oc;
          Plumb.Exec.release_ctrl running
        ) () in
      (* Main thread: pipeline output → stdout *)
      (try while true do
           let resp = input_line ic in
           print_endline resp
         done with
       | End_of_file -> ()
       | Sys.Break -> close_in_noerr stdin);
      Thread.join writer;
      ignore (Plumb.Exec.shutdown running))
Current bin/plumb/dune (reference):

(executable
 (name main)
 (public_name plumb)
 (libraries plumb language runtime unix threads.posix))
Current lib/fabric/bridge.ml (reference):

(** Bridge — stdio mux/demux adapter for agent subprocesses.

    Natural transformation between ZMQ topic-framed sockets and the
    agent binary's port-multiplexed stdin/stdout protocol. The agent
    sees the same [{"__port":...,"msg":...}] envelopes as under exec.ml.

    All pipe writes use [Mutex] + [Eio_unix.run_in_systhread] (R2).

    The [__port] envelope protocol is confined to this module and the
    agent binary — it must never appear in [fabric.ml] or [inline.ml]
    (R7: the bridge is NOT a morphism, it is a natural transformation
    between two concrete stream representations). *)

open Language

module T = Transport.Zmq

(** {1 EOF Sentinel}

    An empty-payload message on a PUSH socket signals end-of-stream.
    The receiver exits its recv loop on seeing this sentinel. *)

let send_eof sock ~topic =
  T.send sock ~topic ""

let is_eof payload =
  String.length payload = 0

(** {1 Process Handler Type}

    The fabric calls a process handler for each subprocess entry.
    The handler receives the process's ZMQ socket endpoints and starts
    the subprocess. The fabric does not know how. *)

type process_handler =
  sw:Eio.Switch.t ->
  proc_entry:Runtime.Routing.process_entry ->
  push_sockets:(string * [`Push] T.socket) list ->
  pull_sockets:(string * [`Pull] T.socket) list ->
  unit

(** {1 Tool Dispatch Callback}

    The bridge accepts an optional tool dispatch callback rather than
    depending on [Tool_dispatch] directly. The caller constructs
    this from [Tool_dispatch.tool_dispatch_loop]. *)

type tool_dispatch =
  req_fd:Unix.file_descr ->
  resp_fd:Unix.file_descr ->
  agent_name:string ->
  unit

(** {1 Observation Callback} *)

type observation = {
  agent : string;
  payload : Yojson.Safe.t;
}

(** {1 Pipe Write Helpers}

    All writes to child stdin go through mux_envelope / mux_eof,
    protected by an OCaml [Mutex]. These MUST be called from within
    [Eio_unix.run_in_systhread] — [Mutex.lock] is not safe from
    Eio fibres. *)

(** Write a port-tagged envelope to the child stdin.
    Caller must hold (or be inside run_in_systhread for) the mux mutex. *)
let write_envelope mux_m oc port json_str =
  match Yojson.Safe.from_string json_str with
  | _ ->
    let s = Printf.sprintf
        "{\"__port\":%S,\"msg\":%s}\n" port json_str in
    Mutex.lock mux_m;
    Fun.protect ~finally:(fun () -> Mutex.unlock mux_m) (fun () ->
        output_string oc s; flush oc)
  | exception Yojson.Json_error msg ->
    Runtime.Log.warn "bridge_mux"
      [("port", `String port); ("error", `String msg)]

(** Write an EOF marker for a port. *)
let write_eof mux_m oc port =
  let s = Printf.sprintf
      "{\"__port\":%S,\"__eof\":true}\n" port in
  Mutex.lock mux_m;
  Fun.protect ~finally:(fun () -> Mutex.unlock mux_m) (fun () ->
      output_string oc s; flush oc)

(** {1 Bridge Construction} *)

(** Construct a process handler that spawns agent subprocesses with
    stdio bridge. *)
let make_handler
    ~(prog : program)
    ~type_env
    ~(bin_dir : string option)
    ~(resource_find : string -> string)
    ?(resolve_bin : (string -> string) option)
    ?(tool_dispatch_fn : tool_dispatch option)
    ?(on_observe : (observation -> unit) option)
    ()
  : process_handler =
  fun ~sw ~proc_entry ~push_sockets ~pull_sockets ->
  let pe = proc_entry in
  let binding = pe.Runtime.Routing.proc_binding in
  let config = match binding.impl with
    | Agent config -> config
    | _ -> failwith "bridge: expected Agent impl"
  in

  (* Resolve prompt paths *)
  let config = Spawn_lib.resolve_prompts_config
      ~resource_find ~base_dir:(Sys.getcwd ()) config in
  let binding = { binding with impl = Agent config } in

  (* Decompose input/output types *)
  let decomp_in = Spawn_lib.agent_input_decomposition type_env binding.input in
  let decomp_out = Spawn_lib.agent_output_decomposition type_env binding.output in
  let has_control = decomp_in <> None in
  let has_telemetry = decomp_out <> None in

  (* Synthesize child spec with simple types (no products) *)
  let child_binding =
    let b = match decomp_in with
      | Some (data_t, _) -> { binding with input = Stream data_t }
      | None -> binding
    in
    match decomp_out with
    | Some (result_t, _) -> { b with output = Stream result_t }
    | None -> b
  in

  (* Find tool bindings *)
  let tool_names = match List.assoc_opt "tools" config with
    | Some (Arr items) ->
      List.filter_map (fun item ->
          match item with Ident name -> Some name | _ -> None
        ) items
    | _ -> []
  in
  let tool_bindings = List.filter_map (fun name ->
      List.find_opt (fun (b : binding) ->
          b.bd_name = name && Validate.is_tool_binding b
        ) prog.prog_bindings
    ) tool_names in
  let has_tools = tool_bindings <> [] || tool_dispatch_fn <> None in

  (* Synthesize spec *)
  let spec_source =
    if tool_bindings <> [] then
      Spawn_lib.synthesize_agent_spec prog child_binding tool_bindings
    else
      Spawn_lib.synthesize_spec prog child_binding
  in
  let spec_path = Spawn_lib.write_temp_spec spec_source in

  (* Resolve binary *)
  let resolve = match resolve_bin with
    | Some f -> f
    | None -> Spawn_lib.resolve_binary ~bin_dir
  in
  let bin_path = resolve "plumb-agent" in

  (* Create child stdin/stdout pipes *)
  let child_in_r, child_in_w = Unix.pipe ~cloexec:true () in
  let child_out_r, child_out_w = Unix.pipe ~cloexec:true () in
  let child_in_oc = Unix.out_channel_of_descr child_in_w in

  (* MCP connections and env *)
  let mcp_connections = match List.assoc_opt "mcp" config with
    | Some (Arr items) ->
      List.filter_map (fun item ->
          match item with
          | Obj kvs ->
            (match Spawn_lib.connect_mcp_server kvs with
             | Ok conn -> Some conn
             | Error msg ->
               Runtime.Log.warn "mcp_connect_failed"
                 [("detail", `String msg)];
               None)
          | _ -> None
        ) items
    | _ -> []
  in
  let mcp_env = if mcp_connections <> [] then begin
      let mcp_schemas = List.concat_map (fun conn ->
          List.map Mcp_client.tool_schema (Mcp_client.tools conn)
        ) mcp_connections in
      let mcp_path = Filename.temp_file "agentpi_mcp_" ".json" in
      let mcp_oc = open_out_gen
          [Open_wronly; Open_creat; Open_trunc] 0o600 mcp_path in
      Fun.protect ~finally:(fun () -> close_out mcp_oc) (fun () ->
          output_string mcp_oc
            (Yojson.Safe.to_string (`List mcp_schemas)));
      [("PLUMB_MCP_TOOLS", mcp_path)]
    end else [] in

  (* Agent children need LLM credentials, docroot, and name *)
  let env = Spawn_lib.credential_env () @ Spawn_lib.provider_env ()
            @ [("PLUMB_AGENT_NAME", binding.bd_name)]
            @ mcp_env in

  (* Spawn child — closes child_in_r and child_out_w in parent *)
  let pid = Spawn_lib.spawn_subprocess ~bin_path ~spec_path
      ~stdin_fd:child_in_r ~stdout_fd:child_out_w
      ~env () in
  Runtime.Log.info "bridge"
    ["event", `String "spawned";
     "agent", `String binding.bd_name;
     "pid", `Int pid;
     "has_control", `Bool has_control;
     "has_telemetry", `Bool has_telemetry;
     "has_tools", `Bool has_tools];

  (* Mux: OCaml Mutex for system-thread safety (R2) *)
  let mux_m = Mutex.create () in

  (* Track active mux sources; close child stdin when last exits *)
  let n_sources = 2 + (if has_tools && tool_dispatch_fn <> None then 1 else 0) in
  let mux_remaining = ref n_sources in
  let mux_rem_m = Mutex.create () in
  let mux_source_done () =
    Mutex.lock mux_rem_m;
    decr mux_remaining;
    let n = !mux_remaining in
    Mutex.unlock mux_rem_m;
    if n = 0 then close_out_noerr child_in_oc
  in

  (* Find sockets by port name *)
  let find_pull port = List.assoc_opt port pull_sockets in
  let find_push port = List.assoc_opt port push_sockets in

  (* === Tool dispatch setup === *)
  let tool_req_oc = ref None in
  let tool_resp_r = ref None in
  (match tool_dispatch_fn with
   | Some dispatch when has_tools ->
     let req_r, req_w = Unix.pipe ~cloexec:true () in
     let resp_r', resp_w = Unix.pipe ~cloexec:true () in
     tool_req_oc := Some (Unix.out_channel_of_descr req_w);
     tool_resp_r := Some resp_r';
     let _dispatch_thread = Thread.create (fun () ->
         dispatch ~req_fd:req_r ~resp_fd:resp_w
           ~agent_name:binding.bd_name
       ) () in
     ()
   | _ -> ());

  (* === Mux: data input (ZMQ PULL → child stdin) ===

     T.recv runs in Eio fibre context (uses Eio_unix.await_readable).
     Pipe writes use Eio_unix.run_in_systhread + Mutex. *)
  (match find_pull "data_in" with
   | Some data_in_sock ->
     Eio.Fiber.fork_daemon ~sw (fun () ->
         (try
            while true do
              let (_topic, payload) = T.recv data_in_sock in
              if is_eof payload then begin
                Eio_unix.run_in_systhread (fun () ->
                    (try write_eof mux_m child_in_oc "input"
                     with Sys_error _ -> ()));
                raise Exit
              end;
              Eio_unix.run_in_systhread (fun () ->
                  try write_envelope mux_m child_in_oc "input" payload
                  with Sys_error _ -> raise Exit)
            done
          with
          | Exit -> ()
          | Transport.Transport_error (_, _msg) -> ());
         (* After data EOF, send ctrl stop (R4) *)
         if has_control then
           Eio_unix.run_in_systhread (fun () ->
               try
                 let s = "{\"__port\":\"ctrl_in\",\"msg\":{\"stop\":true}}\n" in
                 Mutex.lock mux_m;
                 Fun.protect ~finally:(fun () -> Mutex.unlock mux_m) (fun () ->
                     output_string child_in_oc s; flush child_in_oc)
               with Sys_error _ -> ());
         Eio_unix.run_in_systhread (fun () -> mux_source_done ());
         `Stop_daemon)
   | None ->
     (* No data input socket — mux source done immediately *)
     mux_source_done ());

  (* === Mux: control input (ZMQ PULL → child stdin) === *)
  (match find_pull "ctrl_in" with
   | Some ctrl_in_sock ->
     Eio.Fiber.fork_daemon ~sw (fun () ->
         (try
            while true do
              let (_topic, payload) = T.recv ctrl_in_sock in
              if is_eof payload then begin
                Eio_unix.run_in_systhread (fun () ->
                    (try write_eof mux_m child_in_oc "ctrl_in"
                     with Sys_error _ -> ()));
                raise Exit
              end;
              Eio_unix.run_in_systhread (fun () ->
                  try write_envelope mux_m child_in_oc "ctrl_in" payload
                  with Sys_error _ -> raise Exit)
            done
          with
          | Exit -> ()
          | Transport.Transport_error (_, _msg) -> ());
         Eio_unix.run_in_systhread (fun () -> mux_source_done ());
         `Stop_daemon)
   | None ->
     (* No control socket — mux source done immediately *)
     mux_source_done ());

  (* === Mux: tool responses (pipe → child stdin) === *)
  (match !tool_resp_r with
   | Some resp_r' ->
     Eio.Fiber.fork_daemon ~sw (fun () ->
         Eio_unix.run_in_systhread (fun () ->
             let ic = Unix.in_channel_of_descr resp_r' in
             (try while true do
                  let line = input_line ic in
                  write_envelope mux_m child_in_oc "tool_resp" line
                done with End_of_file | Sys_error _ -> ());
             close_in_noerr ic;
             mux_source_done ());
         `Stop_daemon)
   | None -> ());

  (* === Demux: child stdout → ZMQ PUSH sockets ===

     Pipe reads use Eio_unix.run_in_systhread (blocking input_line).
     ZMQ sends run in Eio fibre context. *)
  let data_out_sock = find_push "data_out" in
  let ctrl_out_sock = find_push "ctrl_out" in
  let telemetry_sock = find_push "telemetry" in

  Eio.Fiber.fork_daemon ~sw (fun () ->
      let ic = Unix.in_channel_of_descr child_out_r in
      (try while true do
           let line = Eio_unix.run_in_systhread (fun () ->
               input_line ic) in
           if String.length (String.trim line) > 0 then
             match Yojson.Safe.from_string line with
             | `Assoc fields ->
               let port = match List.assoc_opt "__port" fields with
                 | Some (`String s) -> s | _ -> "output" in
               let msg = match List.assoc_opt "msg" fields with
                 | Some m -> Yojson.Safe.to_string m
                 | None -> line in
               (match port with
                | "output" ->
                  Option.iter (fun sock ->
                      T.send sock ~topic:binding.bd_name msg
                    ) data_out_sock
                | "ctrl_out" ->
                  Option.iter (fun sock ->
                      T.send sock ~topic:binding.bd_name msg
                    ) ctrl_out_sock
                | "telemetry" ->
                  Option.iter (fun sock ->
                      T.send sock ~topic:binding.bd_name msg
                    ) telemetry_sock;
                  (match on_observe with
                   | Some cb ->
                     (try
                        let json = Yojson.Safe.from_string msg in
                        cb { agent = binding.bd_name; payload = json }
                      with _ -> ())
                   | None -> ())
                | "tool_req" ->
                  Eio_unix.run_in_systhread (fun () ->
                      match !tool_req_oc with
                      | Some oc ->
                        output_string oc msg;
                        output_char oc '\n';
                        flush oc
                      | None ->
                        Runtime.Log.warn "bridge_demux"
                          [("event", `String "tool_req_no_dispatch");
                           ("agent", `String binding.bd_name)])
                | _ ->
                  Runtime.Log.warn "bridge_demux"
                    [("port", `String port);
                     ("agent", `String binding.bd_name)])
             | _ ->
               (* Not an envelope — treat as raw output *)
               Option.iter (fun sock ->
                   T.send sock ~topic:binding.bd_name line
                 ) data_out_sock
             | exception exn ->
               Runtime.Log.debug "bridge_demux"
                 ["agent", `String binding.bd_name;
                  "event", `String "json_parse_failed";
                  "error", `String (Printexc.to_string exn)];
               Option.iter (fun sock ->
                   T.send sock ~topic:binding.bd_name line
                 ) data_out_sock
         done with
       | End_of_file -> ()
       | Sys_error msg ->
         Runtime.Log.debug "bridge_demux"
           ["agent", `String binding.bd_name;
            "event", `String "pipe_closed";
            "detail", `String msg]);
      close_in_noerr ic;
      (* Send EOF sentinel on all output ZMQ sockets (R5) *)
      Option.iter (fun sock ->
          (try send_eof sock ~topic:binding.bd_name with _ -> ())
        ) data_out_sock;
      Option.iter (fun sock ->
          (try send_eof sock ~topic:binding.bd_name with _ -> ())
        ) ctrl_out_sock;
      Option.iter (fun sock ->
          (try send_eof sock ~topic:binding.bd_name with _ -> ())
        ) telemetry_sock;
      (* Close tool_req write-end to terminate dispatch loop *)
      Eio_unix.run_in_systhread (fun () ->
          (match !tool_req_oc with
           | Some oc -> close_out_noerr oc
           | None -> ());
          (* Reap child process (R6) *)
          let timeout_s = match Spawn_lib.timeout_of_binding binding with
            | Some t -> t | None -> 60.0 in
          let status, timed_out =
            Spawn_lib.waitpid_timed ~timeout_seconds:timeout_s pid in
          if timed_out then
            Runtime.Log.warn "bridge_timeout"
              ["agent", `String binding.bd_name;
               "timeout_seconds", `Float timeout_s];
          (match status with
           | Unix.WEXITED 0 -> ()
           | Unix.WEXITED code ->
             Runtime.Log.warn "bridge_exit"
               ["agent", `String binding.bd_name;
                "code", `Int code]
           | Unix.WSIGNALED sig_ ->
             Runtime.Log.warn "bridge_signal"
               ["agent", `String binding.bd_name;
                "signal", `Int sig_]
           | Unix.WSTOPPED _ -> ());
          (* Clean up temp files *)
          (try Sys.remove spec_path with Sys_error _ -> ()));
      `Stop_daemon);

  ignore has_telemetry
Current lib/fabric/fabric.ml (reference, truncated at the Phase 4 comment):

(** Fabric — wire ZMQ sockets from a routing table and run the pipeline.

    The fabric is the runtime core. It:
    1. Creates ZMQ sockets for all routes in the routing table
    2. Binds and connects sockets per the wiring instructions
    3. Seeds channels with initial tokens (D7)
    4. Starts inline morphism fibres
    5. Coordinates subprocess spawning via the supervisor

    Startup sequence (D7, bind-before-connect for inproc://):
    1. Create ZMQ context
    2. Bind phase: create and bind all sockets, seed channels
    3. Connect phase: connect all consumer sockets
    4. Start phase: start inline fibres, supervisor spawns subprocesses

    See PLMB-037 Step 4, design decisions D6–D9. *)

open Language

module T = Transport.Zmq

(** A running pipeline handle. *)
type t = {
  ctx : T.context;
  supervisor : Supervisor.t;
  push_sockets : (string, [`Push] T.socket) Hashtbl.t;
  pull_sockets : (string, [`Pull] T.socket) Hashtbl.t;
  pipeline_id : string;
  cancel : unit -> unit;
  (** Cancel function: fails the inner switch, causing all socket
      fibres and morphism fibres to terminate. *)
}

(** Endpoint naming for ipc:// transport.
    Format: ipc:///tmp/plumb-{pid}/{process}/{port} *)
let endpoint ~pipeline_id proc_name port_name =
  Printf.sprintf "ipc:///tmp/plumb-%s/%s/%s" pipeline_id proc_name port_name

(** Get the PUSH socket for writing to a boundary input port.
    The caller sends data on this socket to feed the pipeline. *)
let boundary_input t port_name =
  let ep = endpoint ~pipeline_id:t.pipeline_id port_name "out" in
  Hashtbl.find t.push_sockets ep

(** Get the PULL socket for reading from a boundary output port.
    The caller receives data from this socket. *)
let boundary_output t port_name =
  let ep = endpoint ~pipeline_id:t.pipeline_id port_name "in" in
  Hashtbl.find t.pull_sockets ep

(** Create the fabric directory for ipc:// endpoints. *)
let ensure_fabric_dir pipeline_id =
  let dir = Printf.sprintf "/tmp/plumb-%s" pipeline_id in
  (try Unix.mkdir dir 0o700 with Unix.Unix_error (Unix.EEXIST, _, _) -> ());
  dir

(** Clean up the fabric directory. *)
let cleanup_fabric_dir pipeline_id =
  let dir = Printf.sprintf "/tmp/plumb-%s" pipeline_id in
  let cleanup_dir d =
    try
      let entries = Sys.readdir d in
      Array.iter (fun f ->
          try Sys.remove (Filename.concat d f) with Sys_error _ -> ()
        ) entries;
      (try Unix.rmdir d with Unix.Unix_error _ -> ())
    with Sys_error _ -> ()
  in
  (* Remove per-process subdirectories *)
  (try
     let entries = Sys.readdir dir in
     Array.iter (fun sub ->
         let path = Filename.concat dir sub in
         if Sys.is_directory path then cleanup_dir path
       ) entries
   with Sys_error _ -> ());
  cleanup_dir dir

(** Execute a pipeline from a routing table.

    This is the main entry point. It replaces [Exec.execute]. *)
let execute ~sw ~routing_table ~type_env
    ?(pipeline_id = string_of_int (Unix.getpid ()))
    ?process_handler () =
  let ctx = T.create_context () in
  let _fabric_dir = ensure_fabric_dir pipeline_id in
  let sup = Supervisor.create () in

  (* Create subdirectories for all endpoints (processes and channels) *)
  let all_names = Hashtbl.create 16 in
  List.iter (fun (r : Runtime.Routing.route) ->
      Hashtbl.replace all_names r.src ();
      Hashtbl.replace all_names r.dst ()
    ) routing_table.Runtime.Routing.routes;
  Hashtbl.iter (fun name () ->
      let dir = Printf.sprintf "/tmp/plumb-%s/%s" pipeline_id name in
      (try Unix.mkdir dir 0o700 with Unix.Unix_error (Unix.EEXIST, _, _) -> ())
    ) all_names;

  (* === Phase 1: Create and bind sockets ===

     For each route, create a socket pair (sender, receiver).
     The sender binds; the receiver connects.
     Routes are keyed by (src, src_port) → sender socket,
     and (dst, dst_port) → receiver socket. *)

  (* Socket tables: endpoint → socket *)
  let push_sockets : (string, [`Push] T.socket) Hashtbl.t = Hashtbl.create 16 in
  let pull_sockets : (string, [`Pull] T.socket) Hashtbl.t = Hashtbl.create 16 in

  (* For each route, ensure sender and receiver sockets exist *)
  List.iter (fun (r : Runtime.Routing.route) ->
      let ep = endpoint ~pipeline_id r.src r.src_port in
      if not (Hashtbl.mem push_sockets ep) then begin
        let sock = match r.kind with
          | Runtime.Routing.Fan_out ->
            (* PUB/SUB for copy — but we use PUSH/PULL per-destination
               since our single-frame encoding doesn't support ZMQ topic
               filtering. The routing table handles fan-out by creating
               multiple routes from the same source. *)
            let s = T.push_socket ctx ~sw in
            T.bind s ep;
            s
          | _ ->
            let s = T.push_socket ctx ~sw in
            T.bind s ep;
            s
        in
        Hashtbl.replace push_sockets ep sock
      end;
      let ep_dst = endpoint ~pipeline_id r.dst r.dst_port in
      if not (Hashtbl.mem pull_sockets ep_dst) then begin
        let sock = T.pull_socket ctx ~sw in
        Hashtbl.replace pull_sockets ep_dst sock
      end
    ) routing_table.Runtime.Routing.routes;

  (* === Phase 2: Connect sockets === *)

  List.iter (fun (r : Runtime.Routing.route) ->
      let ep = endpoint ~pipeline_id r.src r.src_port in
      let ep_dst = endpoint ~pipeline_id r.dst r.dst_port in
      match Hashtbl.find_opt pull_sockets ep_dst with
      | Some pull_sock -> T.connect pull_sock ep
      | None ->
        Runtime.Log.warn "fabric"
          ["event", `String "no_pull_socket";
           "dst", `String r.dst;
           "dst_port", `String r.dst_port]
    ) routing_table.Runtime.Routing.routes;

  (* === Phase 3: Seed channels (D7) ===
     Send null token on seeded channels before starting fibres. *)

  List.iter (fun ch_name ->
      (* Find the PUSH socket that writes to this channel *)
      let matching_routes = List.filter (fun (r : Runtime.Routing.route) ->
          r.topic = ch_name && r.src <> ch_name
        ) routing_table.Runtime.Routing.routes in
      match matching_routes with
      | r :: _ ->
        let ep = endpoint ~pipeline_id r.src r.src_port in
        (match Hashtbl.find_opt push_sockets ep with
         | Some sock -> T.send sock ~topic:ch_name "null"
         | None ->
           Runtime.Log.warn "fabric"
             ["event", `String "seed_no_socket";
              "channel", `String ch_name])
      | [] ->
        Runtime.Log.warn "fabric"
          ["event", `String "seed_no_route";
           "channel", `String ch_name]
    ) routing_table.Runtime.Routing.seeded_channels;

  (* === Phase 4: Start inline morphism fibres ===

     Each inline morphism reads from its input PULL socket(s) and
     writes to its output PUSH socket(s). Subprocess-boundary
     processes are handled by the supervisor (deferred — agents
     need the agent binary rewrite from Step 5). *)
181→
182→ let find_push proc_name port =
183→ let ep = endpoint ~pipeline_id proc_name port in
184→ match Hashtbl.find_opt push_sockets ep with
185→ | Some s -> s
186→ | None -> failwith (Printf.sprintf "fabric: no PUSH socket for %s:%s" proc_name port)
187→ in
188→ let find_pull proc_name port =
189→ let ep = endpoint ~pipeline_id proc_name port in
190→ match Hashtbl.find_opt pull_sockets ep with
191→ | Some s -> s
192→ | None -> failwith (Printf.sprintf "fabric: no PULL socket for %s:%s" proc_name port)
193→ in
194→
195→ List.iter (fun (pe : Runtime.Routing.process_entry) ->
196→ match pe.proc_kind with
197→ | Runtime.Routing.Subprocess ->
198→ (match process_handler with
199→ | Some handler ->
200→ (* Collect sockets for this subprocess.
201→ PUSH sockets: routes where src = proc_name (process writes)
202→ PULL sockets: routes where dst = proc_name (process reads) *)
203→ let proc_push = List.filter_map (fun (r : Runtime.Routing.route) ->
204→ if r.src = pe.proc_name then
205→ let ep = endpoint ~pipeline_id r.src r.src_port in
206→ match Hashtbl.find_opt push_sockets ep with
207→ | Some s -> Some (r.src_port, s)
208→ | None -> None
209→ else None
210→ ) routing_table.Runtime.Routing.routes in
211→ let proc_pull = List.filter_map (fun (r : Runtime.Routing.route) ->
212→ if r.dst = pe.proc_name then
213→ let ep = endpoint ~pipeline_id r.dst r.dst_port in
214→ match Hashtbl.find_opt pull_sockets ep with
215→ | Some s -> Some (r.dst_port, s)
216→ | None -> None
217→ else None
218→ ) routing_table.Runtime.Routing.routes in
219→ handler ~sw ~proc_entry:pe
220→ ~push_sockets:proc_push ~pull_sockets:proc_pull
221→ | None ->
222→ Runtime.Log.info "fabric"
223→ ["event", `String "subprocess_deferred";
224→ "name", `String pe.proc_name])
225→ | Runtime.Routing.Inline ->
226→ Eio.Fiber.fork_daemon ~sw (fun () ->
227→ (try match pe.proc_impl with
228→ | Id ->
229→ let msg_type = Validate.unwrap_stream pe.proc_binding.input in
230→ Inline.forward_id ~name:pe.proc_name type_env msg_type
231→ (find_pull pe.proc_name "in")
232→ (find_push pe.proc_name "out")
233→ | Copy ->
234→ let msg_type = Validate.unwrap_stream pe.proc_binding.input in
235→ Inline.forward_copy ~name:pe.proc_name type_env msg_type
236→ (find_pull pe.proc_name "in")
237→ (find_push pe.proc_name "out0")
238→ (find_push pe.proc_name "out1")
239→ | Merge ->
240→ let inp = Validate.unwrap_stream pe.proc_binding.input in
241→ let in0_t, in1_t = match Validate.resolve type_env inp with
242→ | Product [t0; t1] -> (t0, t1)
243→ | t -> (t, t)
244→ in
245→ Inline.forward_merge ~name:pe.proc_name type_env in0_t in1_t
246→ (find_pull pe.proc_name "in0")
247→ (find_pull pe.proc_name "in1")
248→ (find_push pe.proc_name "out")
249→ | MergeAny ->
250→ let inp = Validate.unwrap_stream pe.proc_binding.input in
251→ let in0_t, in1_t = match Validate.resolve type_env inp with
252→ | Product [t0; t1] -> (t0, t1)
253→ | t -> (t, t)
254→ in
255→ Inline.forward_merge_any ~name:pe.proc_name type_env in0_t in1_t
256→ (find_pull pe.proc_name "in0")
257→ (find_pull pe.proc_name "in1")
258→ (find_push pe.proc_name "out")
259→ | Barrier ->
260→ let inp = Validate.unwrap_stream pe.proc_binding.input in
261→ let in_types = match Validate.resolve type_env inp with
262→ | Product ts -> ts
263→ | _ -> failwith "barrier: expected product input"
264→ in
265→ let n = List.length in_types in
266→ let recv_socks = List.init n (fun i ->
267→ find_pull pe.proc_name (Printf.sprintf "in%d" i)) in
268→ Inline.forward_barrier ~name:pe.proc_name type_env
269→ in_types recv_socks
270→ (find_push pe.proc_name "out")
271→ | Discard ->
272→ let msg_type = Validate.unwrap_stream pe.proc_binding.input in
273→ Inline.drain_discard ~name:pe.proc_name type_env msg_type
274→ (find_pull pe.proc_name "in")
275→ | Log label ->
276→ Inline.forward_log label
277→ (find_pull pe.proc_name "in")
278→ | FormatJson ->
279→ Inline.forward_format_json ~name:pe.proc_name
280→ (find_pull pe.proc_name "in")
281→ (find_push pe.proc_name "out")
282→ | ParseJson ->
283→ Inline.forward_parse_json ~name:pe.proc_name
284→ (find_pull pe.proc_name "in")
285→ (find_push pe.proc_name "out")
286→ | Map expr ->
287→ let in_type = Validate.unwrap_stream pe.proc_binding.input in
288→ let out_type = Validate.unwrap_stream pe.proc_binding.output in
289→ Inline.forward_map ~name:pe.proc_name type_env
290→ in_type out_type expr
291→ (find_pull pe.proc_name "in")
292→ (find_push pe.proc_name "out")
293→ | Filter expr ->
294→ let msg_type = Validate.unwrap_stream pe.proc_binding.input in
295→ Inline.forward_filter ~name:pe.proc_name type_env msg_type expr
296→ (find_pull pe.proc_name "in")
297→ (find_push pe.proc_name "out")
298→ | Project n ->
299→ Inline.forward_project ~name:pe.proc_name n
300→ (find_pull pe.proc_name "in")
301→ (find_push pe.proc_name "out")
302→ | Tagger (indices, label) ->
303→ Inline.forward_tagger ~name:pe.proc_name indices label
304→ (find_pull pe.proc_name "in")
305→ (find_push pe.proc_name "out")
306→ | Agent _ | Plumb _ ->
307→ (* Should not reach here — filtered by proc_kind *)
308→ failwith "unreachable: subprocess in inline dispatch"
309→ | Tool _ | Lower _ | Builtin _ ->
310→ failwith "unreachable: tool binding in inline dispatch"
311→ with
312→ | Inline.Morphism_fatal (label, _extras) ->
313→ Runtime.Log.error "fabric"
314→ ["event", `String "morphism_fatal";
315→ "morphism", `String label]
316→ | Transport.Transport_error (_, msg) ->
317→ Runtime.Log.debug "fabric"
318→ ["event", `String "transport_closed";
319→ "morphism", `String pe.proc_name;
320→ "detail", `String msg]);
321→ `Stop_daemon)
322→ ) routing_table.Runtime.Routing.processes;
323→
324→ (* The cancel function: force-close all sockets. This closes the
325→ underlying ZMQ sockets directly, bypassing zmq-eio's close
326→ protocol. The zmq-eio event loops will fail on their next
327→ operation and terminate, allowing the Eio switch to exit. *)
328→ let cancel () =
329→ Hashtbl.iter (fun _ s ->
330→ (try T.force_close s with _ -> ())) push_sockets;
331→ Hashtbl.iter (fun _ s ->
332→ (try T.force_close s with _ -> ())) pull_sockets
333→ in
334→ { ctx; supervisor = sup; push_sockets; pull_sockets; pipeline_id; cancel }
335→
336→(** Re-export Bridge for external access. *)
337→module Bridge = Bridge
338→
339→(** Shut down the pipeline.
340→ Closes all sockets (triggering recv timeout/failure in system
341→ threads), then destroys the ZMQ context and cleans up. *)
342→let shutdown t =
343→ let _ = Supervisor.shutdown t.supervisor in
344→ t.cancel ();
345→ (* Wait for system threads blocked in recv to notice the socket
346→ closure (up to the 1s receive timeout) *)
347→ Unix.sleepf 1.5;
348→ T.destroy_context t.ctx;
 349→  cleanup_fabric_dir t.pipeline_id
  1→(** Plumb execution — create channels, spawn processes.
2→
3→ The plumb interpreter: takes a program and a main binding (with Plumb impl),
4→ creates Unix pipes for boundary ports and declared channels, executes
5→ process bindings, and returns handles to the pipeline's boundary ports.
6→
7→ Structural morphisms (id, copy, merge, discard) execute inline as
8→ forwarding threads. Only agents and nested plumbs spawn subprocesses —
9→ they need isolation for API credentials, file permissions, and failure
10→ domains.
11→
12→ Structural morphisms can be spawned directly by keyword without a
13→ named binding: [spawn copy(ch_in, ch_a, ch_b)]. Types are inferred
14→ from the channel declarations and checked for consistency.
15→
16→ Each spawned child receives a synthesized minimal spec containing only its
17→ own binding and the type declarations it depends on. This gives proper
18→ lexical scoping — children see only their own binding, regardless of where
19→ it was defined (top-level or plumb-local). *)
20→
21→open Language
22→open Forward
23→open Spawn
24→
25→(** Re-export types for external callers (e.g. bin/plumb, bin/chat). *)
26→type tool_env = Tool_dispatch.tool_env = { docroot : string option }
27→let default_tool_env = Tool_dispatch.default_tool_env
28→type observation = Tool_dispatch.observation = { agent : string; payload : Yojson.Safe.t }
29→type oitl_request = Tool_dispatch.oitl_request = {
30→ oitl_id : string;
31→ oitl_tool : string;
32→ oitl_input : Yojson.Safe.t;
33→ oitl_agent : string;
34→}
35→type oitl_response = Tool_dispatch.oitl_response = {
36→ oitl_approved : bool;
37→ oitl_policy : string option;
38→}
39→
40→(** Re-export functions from Spawn and Tool_dispatch for backward compatibility. *)
41→let synthesize_agent_spec = Spawn.synthesize_agent_spec
42→let required_types_multi = Spawn.required_types_multi
43→let timeout_of_binding = Spawn.timeout_of_binding
44→let waitpid_timed = Spawn.waitpid_timed
45→let dispatch_tool = Tool_dispatch.dispatch_tool
46→let tool_dispatch_loop = Tool_dispatch.tool_dispatch_loop
47→let tool_error_response = Tool_dispatch.tool_error_response
48→
 49→(** {1 Running Pipeline} *)
  1→(** CLI envelope for processes. *)
2→
  3→let read_file path =
  4→  try
  5→    let ic = open_in path in
  6→    let n = in_channel_length ic in
  7→    (* Ensure [ic] is closed even if reading raises (e.g. [End_of_file]
  8→       on a file truncated between the length check and the read). *)
  9→    Fun.protect ~finally:(fun () -> close_in_noerr ic)
 10→      (fun () -> really_input_string ic n)
 11→  with Sys_error msg ->
 12→    Error.raise_ Config "config_error" msg
13→
14→(** Resolve docroot from CLI args or environment.
15→ [--docroot DIR] takes priority over [PLUMB_DOCROOT].
16→ Returns [None] if neither is set (runtime defaults to cwd). *)
17→let resolve_docroot args =
18→ let rec scan = function
19→ | "--docroot" :: dir :: _ -> Some dir
20→ | _ :: rest -> scan rest
21→ | [] -> None
22→ in
23→ match scan (Array.to_list args) with
24→ | Some _ as v -> v
25→ | None -> Sys.getenv_opt "PLUMB_DOCROOT"
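To make the precedence above concrete, here is an illustrative standalone sketch; [resolve_docroot_demo] and its [?env] argument are hypothetical stand-ins for [resolve_docroot] and the [Sys.getenv_opt] lookup, so the order is testable in isolation:

```ocaml
(* Hypothetical demo mirroring [resolve_docroot]: the first
   "--docroot DIR" pair wins; the environment fallback is only
   consulted when the flag is absent. *)
let resolve_docroot_demo ?env args =
  let rec scan = function
    | "--docroot" :: dir :: _ -> Some dir
    | _ :: rest -> scan rest
    | [] -> None
  in
  match scan (Array.to_list args) with
  | Some _ as v -> v
  | None -> env

let () =
  (* Flag takes priority over the environment fallback. *)
  assert (resolve_docroot_demo ~env:"/env" [| "plumb"; "--docroot"; "/d" |] = Some "/d");
  assert (resolve_docroot_demo ~env:"/env" [| "plumb"; "spec.plumb" |] = Some "/env");
  assert (resolve_docroot_demo [| "plumb"; "spec.plumb" |] = None)
```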
26→
27→(** Flags that take a value argument. *)
28→let flags_with_value = ["--docroot"; "--provider"; "--model"]
29→
30→(** Boolean flags (no value argument). *)
31→let bool_flags = ["--auto-approve"; "--verbose"]
32→
 33→(** Strip all known flags (and, for value-taking flags, their values) from argv; despite the historical name, this removes every flag listed above, not just [--docroot]. *)
34→let strip_docroot args =
35→ let rec go acc = function
36→ | x :: _ :: rest when List.mem x flags_with_value -> go acc rest
37→ | x :: rest when List.mem x bool_flags -> go acc rest
38→ | x :: rest -> go (x :: acc) rest
39→ | [] -> List.rev acc
40→ in
41→ go [] (Array.to_list args) |> Array.of_list
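The asymmetry between the two flag lists is easy to miss: a value-taking flag consumes two tokens, a boolean flag one. An illustrative standalone sketch (the [strip_demo] name and its explicit list arguments are hypothetical, mirroring the [go] loop above):

```ocaml
(* Hypothetical demo of the flag-stripping rule: a flag in
   [with_value] removes itself and its value token; a flag in
   [bools] removes only itself; everything else is kept in order. *)
let strip_demo ~with_value ~bools args =
  let rec go acc = function
    | x :: _ :: rest when List.mem x with_value -> go acc rest
    | x :: rest when List.mem x bools -> go acc rest
    | x :: rest -> go (x :: acc) rest
    | [] -> List.rev acc
  in
  go [] args

let () =
  assert (strip_demo ~with_value:["--docroot"] ~bools:["--verbose"]
            ["plumb"; "--docroot"; "/d"; "--verbose"; "spec.plumb"]
          = ["plumb"; "spec.plumb"])
```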
42→
43→(** Resolve a flag value from CLI args.
44→ Returns [Some value] if [--flag value] is found, [None] otherwise. *)
45→let resolve_flag args flag =
46→ let rec scan = function
47→ | x :: v :: _ when x = flag -> Some v
48→ | _ :: rest -> scan rest
49→ | [] -> None
50→ in
51→ scan (Array.to_list args)
52→
53→let main ~usage ~run =
54→ if Array.exists (fun s -> s = "--version") Sys.argv then begin
55→ print_endline Version.string;
56→ exit 0
57→ end;
58→ let args = strip_docroot Sys.argv in
59→ if Array.length args >= 2 && args.(1) = "--help" then begin
60→ print_endline usage;
61→ exit 0
62→ end;
63→ if Array.length args < 2 then begin
64→ Printf.eprintf "%s\n" usage;
65→ exit 2
66→ end;
67→ let spec_path = args.(1) in
68→ try run spec_path
69→ with
70→ | Error.Process_error err ->
71→ Printf.eprintf "%s\n" (Error.to_string err);
72→ exit (Error.exit_code err)
73→ | Sys.Break -> exit 130
74→ | exn ->
75→ Printf.eprintf "{\"error\":\"internal\",\"detail\":\"%s\"}\n"
76→ (String.escaped (Printexc.to_string exn));
 77→      exit 1
  1→(** Routing table — compiler output for the ZMQ fabric.
2→
3→ Two-phase compilation:
4→ 1. The compiler walks the plumb body's spawn list and emits
5→ intermediate routes that reference channel names as nodes.
6→ 2. [resolve_channels] eliminates channel intermediaries by
7→ cross-joining writer and reader routes into direct
8→ process-to-process routes. This is the operational realisation
9→ of ν-restriction in composition (f ; g = (νc)(f(in,c) | g(c,out))).
10→
11→ The fabric reads the resolved routing table to create ZMQ sockets.
12→
13→ Design decisions D1 (types in lib/runtime) and D10 (routing_table
14→ terminology). See PLMB-037. *)
15→
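The cross-join in step 2 can be sketched standalone. This is an illustrative miniature, not the module's API: [mini_route] and [resolve_one] are hypothetical simplifications of the [route] type and [resolve_channels] below, keeping only the fields the join touches:

```ocaml
(* Hypothetical miniature of channel resolution: for
   f ; g = (νc)(f(in,c) | g(c,out)), the writer route f→c and the
   reader route c→g are cross-joined into the direct route f→g,
   with the eliminated channel recorded in [via]. *)
type mini_route = { src : string; dst : string; via : string option }

let resolve_one ~channel routes =
  let writers = List.filter (fun r -> r.dst = channel) routes in
  let readers = List.filter (fun r -> r.src = channel) routes in
  let direct =
    List.filter (fun r -> r.src <> channel && r.dst <> channel) routes in
  direct
  @ List.concat_map
      (fun w ->
         List.map (fun rd -> { src = w.src; dst = rd.dst; via = Some channel })
           readers)
      writers

let () =
  let routes = [ { src = "f"; dst = "c"; via = None };
                 { src = "c"; dst = "g"; via = None } ] in
  assert (resolve_one ~channel:"c" routes
          = [ { src = "f"; dst = "g"; via = Some "c" } ])
```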
16→open Language
17→
18→(** Channel transport pattern, determined statically by the compiler. *)
19→type channel_kind =
20→ | Point_to_point
21→ (** PUSH/PULL — reliable, blocking backpressure. Default for all
22→ algebraic channels except copy fan-out. *)
23→ | Fan_out
24→ (** PUB/SUB — copy morphism only. HWM set to unlimited. *)
25→ | Bidirectional
26→ (** PAIR — control channels between supervisor and inline morphisms. *)
27→
28→(** Whether a process runs inline (Eio fibre) or as a subprocess. *)
29→type process_kind =
30→ | Inline
31→ (** Structural morphisms: id, copy, merge, barrier, map, filter, etc.
32→ Run as Eio fibres within the fabric. *)
33→ | Subprocess
34→ (** Agents and nested plumb pipelines. Fork/exec'd by the supervisor. *)
35→
36→(** A route: one directional connection between two processes. *)
37→type route = {
38→ src : string; (** source process name *)
39→ src_port : string; (** source port name *)
40→ dst : string; (** destination process name *)
41→ dst_port : string; (** destination port name *)
42→ kind : channel_kind; (** transport pattern *)
43→ topic : string; (** provenance tag for PUB/SUB filtering *)
44→ via : string option; (** original channel name, if resolved *)
45→}
46→
47→(** A process entry in the routing table. *)
48→type process_entry = {
49→ proc_name : string;
50→ proc_kind : process_kind;
51→ proc_impl : process_impl;
52→ proc_binding : binding;
53→}
54→
55→(** The routing table: compiler output consumed by the fabric.
56→
57→ Invariant: all process names are unique within a table.
58→ The compiler guarantees this via per-binding instance numbering
 59→    (e.g. [copy0], [copy1], [merge0]). *)
  1→(** Spec loading — parse a plumb program, find a binding by impl kind,
2→ build a type environment.
3→
4→ Spawned children receive synthesized minimal specs (one binding + its
5→ type dependencies), so the search here only needs to match by impl kind
6→ among top-level bindings. *)
7→
8→open Language
9→
10→type loaded_binding = {
11→ binding : binding;
12→ type_env : Validate.type_env;
13→}
14→
15→(** Structural morphism names are reserved — they cannot be used as
16→ binding names because their meaning in spawn position must be
17→ unambiguous. Use a different name: [let passthrough : A -> A = id]. *)
18→let reserved_names = ["id"; "copy"; "merge"; "discard"; "map"; "filter"; "barrier"; "tool"]
19→
20→let check_reserved_names bindings =
21→ List.iter (fun (b : binding) ->
22→ if List.mem b.bd_name reserved_names then
23→ Error.raise_ Config "config_error"
24→ (Printf.sprintf
25→ "%s is a structural morphism and cannot be used as a binding name"
26→ b.bd_name)
27→ ) bindings
28→
29→(** Totality determines which impl kinds admit bare types (base category).
30→ Total morphisms produce exactly one output per input — they can
31→ operate at either categorical level. Non-total morphisms (filter,
32→ merge, barrier) and stateful processes (agent, plumb) are Kleisli-only. *)
33→let admits_bare_type = function
34→ | Id | Map _ | Copy | Discard | Tool _ | Lower _ | Builtin _
35→ | FormatJson | ParseJson -> true
36→ | Filter _ | Merge | MergeAny | Barrier | Project _ | Agent _ | Plumb _
37→ | Tagger _ | Log _ -> false
38→
 39→(** Whether a stream morphism can be lowered to a tool via η ; f ; ε.
 40→    Requires totality: the process must always produce at least one output. *)
  1→(** Process spawning, spec synthesis, and subprocess infrastructure.
2→
3→ Extracted from [lib/plumb/spawn.ml] to break the dependency between
4→ the fabric and the plumb library. Dune-site paths (bin_dir,
5→ resource_find) are injected by callers — this library stays
6→ dune-site-free. *)
7→
8→open Language
9→
10→(** {1 Binary Resolution} *)
11→
12→let binary_of_impl = function
13→ | Agent _ -> "plumb-agent"
14→ | Plumb _ -> "plumb"
15→ | Tool _ | Lower _ | Builtin _ -> failwith "tools are not spawned as pipeline nodes"
16→ | Id | Copy | Merge | MergeAny | Discard | Barrier | Project _ | Map _
17→ | Filter _ | Tagger _ | Log _ | FormatJson | ParseJson ->
18→ failwith "structural morphisms do not need binaries"
19→
20→(** Resolve a sibling process binary. Tries two strategies:
21→ 1. Sibling in the same directory (works after [dune install])
22→ 2. Sibling in the dune _build tree (swaps the binary's subdirectory)
23→ Falls back to bare name for PATH lookup.
24→
25→ [~bin_dir] is the directory containing the current binary. Pass
26→ [Prefix_path.bin_dir] from the plumb library, or a test fixture. *)
27→let resolve_binary ~bin_dir name =
28→ let dir = match bin_dir with
29→ | Some d -> d
30→ | None -> Filename.dirname Sys.executable_name
31→ in
32→ let installed = Filename.concat dir name in
33→ if Sys.file_exists installed then installed
34→ else
35→ let parent = Filename.dirname dir in
36→ let base = Filename.basename Sys.executable_name in
37→ let build = Filename.concat (Filename.concat parent name) base in
38→ if Sys.file_exists build then build
 39→      else
 40→        (* In the dune build tree, directories use the short name (e.g. "agent") … *)
 60→type routing_table = {
61→ processes : process_entry list;
62→ (** All processes, in dependency order (leaves first). *)
63→ routes : route list;
64→ (** All wiring instructions. *)
65→ seeded_channels : string list;
66→ (** Channels pre-loaded with a null token (Petri net initial markings). *)
67→}
68→
69→(** Classify a process implementation as inline or subprocess. *)
70→let process_kind_of_impl = function
71→ | Agent _ | Plumb _ -> Subprocess
72→ | _ -> Inline
73→
74→(** Determine the channel kind for a (src, src_port) pair.
75→ If the same source port appears in multiple routes, it's Fan_out
76→ (copy morphism). Control channels are Bidirectional. Otherwise
77→ Point_to_point. *)
78→let classify_channel_kinds routes =
79→ (* Count how many routes share each (src, src_port) pair *)
80→ let tbl = Hashtbl.create 16 in
81→ List.iter (fun r ->
82→ let key = (r.src, r.src_port) in
83→ let n = try Hashtbl.find tbl key with Not_found -> 0 in
84→ Hashtbl.replace tbl key (n + 1)
85→ ) routes;
86→ List.map (fun r ->
87→ if r.kind = Bidirectional then r
88→ else
89→ let key = (r.src, r.src_port) in
90→ let n = try Hashtbl.find tbl key with Not_found -> 1 in
91→ if n > 1 then { r with kind = Fan_out }
92→ else r
93→ ) routes
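The counting step above has one subtlety: a (src, src_port) pair is only a copy fan-out when it occurs in more than one route. An illustrative standalone sketch of that rule ([shared_pairs] is a hypothetical name, operating on bare (src, port) pairs instead of full routes):

```ocaml
(* Hypothetical demo: a (src, src_port) pair shared by more than one
   route marks those routes as copy fan-out; unique pairs stay
   point-to-point. *)
let shared_pairs pairs =
  let tbl = Hashtbl.create 16 in
  List.iter (fun key ->
    let n = try Hashtbl.find tbl key with Not_found -> 0 in
    Hashtbl.replace tbl key (n + 1)) pairs;
  List.filter (fun key -> Hashtbl.find tbl key > 1) pairs

let () =
  let pairs = [ ("copy0", "out"); ("copy0", "out"); ("id0", "out") ] in
  assert (shared_pairs pairs = [ ("copy0", "out"); ("copy0", "out") ])
```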
94→
95→(** Look up the per-message type for a channel by name.
96→ Checks boundary ports first, then declared channels. *)
97→let find_channel_type ~ports ~boundary_in_types ~boundary_out_types
98→ ~n_in ~pb ch_name =
99→ match List.find_index (fun p -> p = ch_name) ports with
100→ | Some i when i < n_in -> Some (List.nth boundary_in_types i)
101→ | Some i -> Some (List.nth boundary_out_types (i - n_in))
102→ | None ->
103→ match List.find_opt (fun (ch : channel_decl) ->
104→ ch.ch_name = ch_name) pb.pb_channels with
105→ | Some ch -> Some ch.ch_type
106→ | None -> None
107→
108→(** Resolve channel intermediaries into direct process-to-process routes.
109→
110→ Channels are bound names (ν-restriction). In the intermediate routing
111→ table, they appear as route endpoints — but they are not processes and
112→ have no state. This pass eliminates them by cross-joining writers and
113→ readers through each channel, producing direct routes.
114→
115→ After resolution, no declared channel name appears as a route src or dst.
116→ The original channel name is preserved in the [via] field for debugging. *)
117→let resolve_channels ~processes ~boundary_ports routes ~channels =
118→ let channel_names =
119→ List.map (fun (ch : Language.channel_decl) -> ch.ch_name) channels in
120→ let is_channel name =
121→ List.mem name channel_names in
122→ let is_process name =
123→ List.exists (fun (pe : process_entry) -> pe.proc_name = name) processes in
124→ let is_boundary name =
125→ List.mem name boundary_ports in
126→ let is_real_node name =
127→ is_process name || is_boundary name in
128→
129→ (* Partition routes *)
130→ let direct = ref [] in
131→ let writers = Hashtbl.create 8 in (* channel → routes writing to it *)
132→ let readers = Hashtbl.create 8 in (* channel → routes reading from it *)
133→
134→ List.iter (fun r ->
135→ let dst_is_ch = is_channel r.dst in
136→ let src_is_ch = is_channel r.src in
137→ match dst_is_ch, src_is_ch with
138→ | false, false ->
139→ (* Both endpoints are real nodes — keep as-is *)
140→ direct := r :: !direct
141→ | true, false ->
142→ (* Process writes to channel *)
143→ let ch = r.dst in
144→ let existing = try Hashtbl.find writers ch with Not_found -> [] in
145→ Hashtbl.replace writers ch (r :: existing)
146→ | false, true ->
147→ (* Process reads from channel *)
148→ let ch = r.src in
149→ let existing = try Hashtbl.find readers ch with Not_found -> [] in
150→ Hashtbl.replace readers ch (r :: existing)
151→ | true, true ->
152→ (* Channel-to-channel: should not happen *)
153→ failwith (Printf.sprintf
154→ "resolve_channels: channel-to-channel route %s → %s" r.src r.dst)
155→ ) routes;
156→
157→ (* Cross-join writers and readers for each channel *)
158→ let resolved = ref (List.rev !direct) in
159→ List.iter (fun ch_name ->
160→ let ch_writers = try Hashtbl.find writers ch_name with Not_found -> [] in
161→ let ch_readers = try Hashtbl.find readers ch_name with Not_found -> [] in
162→ let n_writers = List.length ch_writers in
163→ let n_readers = List.length ch_readers in
164→
165→ match n_writers, n_readers with
166→ | 0, 0 -> () (* unused channel *)
167→ | _, 0 ->
168→ (* Writers with no readers — telemetry sinks. Re-emit unchanged. *)
169→ List.iter (fun w -> resolved := w :: !resolved) ch_writers
170→ | 0, _ ->
171→ (* Readers with no writers — likely compiler bug. *)
172→ Log.warn "routing"
173→ ["event", `String "channel_no_writers";
174→ "channel", `String ch_name]
175→ | _ when n_writers > 1 && n_readers > 1 ->
176→ failwith (Printf.sprintf
177→ "resolve_channels: channel %s has %d writers and %d readers \
178→ — no categorical meaning" ch_name n_writers n_readers)
179→ | _ ->
180→ List.iter (fun w ->
181→ List.iter (fun rd ->
182→ resolved := {
183→ src = w.src;
184→ src_port = w.src_port;
185→ dst = rd.dst;
186→ dst_port = rd.dst_port;
187→ kind = Point_to_point;
188→ topic = rd.topic;
189→ via = Some ch_name;
190→ } :: !resolved
191→ ) ch_readers
192→ ) ch_writers
193→ ) channel_names;
194→
195→ (* Validate: no channel names remain as src or dst *)
196→ List.iter (fun r ->
197→ if is_channel r.src && is_real_node r.dst then
198→ failwith (Printf.sprintf
199→ "resolve_channels: channel %s still appears as route src" r.src);
200→ if is_channel r.dst && is_real_node r.src then
201→ failwith (Printf.sprintf
202→ "resolve_channels: channel %s still appears as route dst" r.dst)
203→ ) !resolved;
204→
205→ List.rev !resolved
206→
207→let compile ~prog ~ports ~pb ~type_env ~main_binding ~seeded =
208→ let boundary_in_types = Validate.port_element_types type_env main_binding.input in
209→ let boundary_out_types = Validate.port_element_types type_env main_binding.output in
210→ let n_in = List.length boundary_in_types in
211→
212→ let ch_type name =
213→ match find_channel_type ~ports ~boundary_in_types ~boundary_out_types
214→ ~n_in ~pb name with
215→ | Some t -> t
216→ | None -> Unit
217→ in
218→
219→ let find_binding name =
220→ match List.find_opt (fun (b : binding) -> b.bd_name = name) pb.pb_bindings with
221→ | Some b -> Some b
222→ | None ->
223→ List.find_opt (fun (b : binding) -> b.bd_name = name) prog.prog_bindings
224→ in
225→
226→ (* Synthesise a binding for a direct structural morphism spawn
227→ with proper types inferred from channel declarations. *)
228→ let synthesise_structural sp_process channels =
229→ let impl = match sp_process with
230→ | "id" -> Id | "copy" -> Copy | "merge" -> Merge
231→ | "discard" -> Discard | "barrier" -> Barrier
232→ | "_format_json" -> FormatJson | "_parse_json" -> ParseJson
233→ | name -> failwith (Printf.sprintf "unknown structural morphism: %s" name)
234→ in
235→ match impl with
236→ | Id ->
237→ let t = ch_type (List.nth channels 0) in
238→ { bd_name = sp_process; bd_attrs = [];
239→ input = Stream t; output = Stream t; impl }
240→ | Copy ->
241→ let t = ch_type (List.nth channels 0) in
242→ { bd_name = sp_process; bd_attrs = [];
243→ input = Stream t;
244→ output = Stream (Product [t; t]); impl }
245→ | Merge ->
246→ let t0 = ch_type (List.nth channels 0) in
247→ let t1 = ch_type (List.nth channels 1) in
248→ let out_t = if Validate.type_eq type_env t0 t1 then t0 else Sum [t0; t1] in
249→ { bd_name = sp_process; bd_attrs = [];
250→ input = Stream (Product [t0; t1]);
251→ output = Stream out_t; impl }
252→ | Barrier ->
253→ let n = List.length channels - 1 in
254→ let in_types = List.init n (fun i -> ch_type (List.nth channels i)) in
255→ { bd_name = sp_process; bd_attrs = [];
256→ input = Stream (Product in_types);
257→ output = Stream (Product in_types); impl }
258→ | Discard ->
259→ let t = ch_type (List.nth channels 0) in
260→ { bd_name = sp_process; bd_attrs = [];
261→ input = Stream t; output = Stream Unit; impl }
262→ | FormatJson ->
263→ { bd_name = sp_process; bd_attrs = [];
264→ input = Stream (Primitive Json);
265→ output = Stream (Primitive String); impl }
266→ | ParseJson ->
267→ { bd_name = sp_process; bd_attrs = [];
268→ input = Stream (Primitive String);
269→ output = Stream (Primitive Json); impl }
270→ | _ ->
271→ { bd_name = sp_process; bd_attrs = [];
272→ input = Unit; output = Unit; impl }
273→ in
274→
275→ let processes = ref [] in
276→ let routes = ref [] in
277→
278→ (* Per-binding-name counter for unique instance names.
279→ Every spawn gets a numbered name: copy0, copy1, merge0, etc. *)
280→ let name_counters = Hashtbl.create 8 in
281→ let fresh_proc_name base =
282→ let n = try Hashtbl.find name_counters base with Not_found -> 0 in
283→ Hashtbl.replace name_counters base (n + 1);
284→ Printf.sprintf "%s%d" base n
285→ in
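This counter is what upholds the routing-table invariant that process names are unique. A standalone sketch of the numbering behaviour ([name_counters_demo] and [fresh_demo] are hypothetical copies of the table and function above):

```ocaml
(* Hypothetical demo of per-binding instance numbering: repeated spawns
   of the same binding get distinct names copy0, copy1, ..., so process
   names are unique within a routing table. *)
let name_counters_demo : (string, int) Hashtbl.t = Hashtbl.create 8

let fresh_demo base =
  let n = try Hashtbl.find name_counters_demo base with Not_found -> 0 in
  Hashtbl.replace name_counters_demo base (n + 1);
  Printf.sprintf "%s%d" base n

let () =
  assert (fresh_demo "copy" = "copy0");
  assert (fresh_demo "copy" = "copy1");
  assert (fresh_demo "merge" = "merge0")
```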
286→
287→ List.iter (fun (sp : spawn_stmt) ->
288→ let channels = List.map (function
289→ | Positional ch -> ch
290→ | Labeled (_, ch) -> ch
291→ ) sp.sp_args in
292→
293→ let binding = match find_binding sp.sp_process with
294→ | Some b -> b
295→ | None -> synthesise_structural sp.sp_process channels
296→ in
297→
298→ let proc_name = fresh_proc_name sp.sp_process in
299→ let binding = { binding with bd_name = proc_name } in
300→ let kind = process_kind_of_impl binding.impl in
301→
302→ processes := {
303→ proc_name;
304→ proc_kind = kind;
305→ proc_impl = binding.impl;
306→ proc_binding = binding;
307→ } :: !processes;
308→
309→ (* Determine port layout from the implementation kind.
310→ Each channel argument maps to a port. Convention:
311→ - Input ports: read from the named channel
312→ - Output ports: write to the named channel
313→
314→ A route is created for each channel connecting it to the
315→ process. The src/dst naming follows the data flow direction:
316→ the channel's writer is the src, the reader is the dst. *)
317→ let add_route ~src ~src_port ~dst ~dst_port ~channel_name =
318→ routes := {
319→ src; src_port; dst; dst_port;
320→ kind = Point_to_point;
321→ topic = channel_name;
322→ via = None;
323→ } :: !routes
324→ in
325→
326→ (* For inline morphisms, the process reads from input channels and
327→ writes to output channels. We model this as:
328→ - Input channel: route from channel → process (process reads)
329→ - Output channel: route from process → channel (process writes)
330→
331→ Since channels are named entities shared between processes, the
332→ route connects the producing process's output to the consuming
333→ process's input via the channel name. *)
334→ match binding.impl with
335→ | Id | FormatJson | ParseJson | Map _ | Filter _ | Tagger _ ->
336→ (* 2 channels: [input, output] *)
337→ let ch_in = List.nth channels 0 in
338→ let ch_out = List.nth channels 1 in
339→ add_route ~src:ch_in ~src_port:"out" ~dst:proc_name ~dst_port:"in"
340→ ~channel_name:ch_in;
341→ add_route ~src:proc_name ~src_port:"out" ~dst:ch_out ~dst_port:"in"
342→ ~channel_name:ch_out
343→
344→ | Discard ->
345→ (* 2 channels: [input, output(unit)] *)
346→ let ch_in = List.nth channels 0 in
347→ add_route ~src:ch_in ~src_port:"out" ~dst:proc_name ~dst_port:"in"
348→ ~channel_name:ch_in
349→
350→ | Log _ ->
351→ (* 1 channel: [input] *)
352→ let ch_in = List.nth channels 0 in
353→ add_route ~src:ch_in ~src_port:"out" ~dst:proc_name ~dst_port:"in"
354→ ~channel_name:ch_in
355→
356→ | Copy ->
357→ (* 3 channels: [input, output0, output1] *)
358→ let ch_in = List.nth channels 0 in
359→ let ch_out0 = List.nth channels 1 in
360→ let ch_out1 = List.nth channels 2 in
361→ add_route ~src:ch_in ~src_port:"out" ~dst:proc_name ~dst_port:"in"
362→ ~channel_name:ch_in;
363→ add_route ~src:proc_name ~src_port:"out0" ~dst:ch_out0 ~dst_port:"in"
364→ ~channel_name:ch_out0;
365→ add_route ~src:proc_name ~src_port:"out1" ~dst:ch_out1 ~dst_port:"in"
366→ ~channel_name:ch_out1
367→
368→ | Merge | MergeAny ->
369→ (* 3 channels: [input0, input1, output] *)
370→ let ch_in0 = List.nth channels 0 in
371→ let ch_in1 = List.nth channels 1 in
372→ let ch_out = List.nth channels 2 in
373→ add_route ~src:ch_in0 ~src_port:"out" ~dst:proc_name ~dst_port:"in0"
374→ ~channel_name:ch_in0;
375→ add_route ~src:ch_in1 ~src_port:"out" ~dst:proc_name ~dst_port:"in1"
376→ ~channel_name:ch_in1;
377→ add_route ~src:proc_name ~src_port:"out" ~dst:ch_out ~dst_port:"in"
378→ ~channel_name:ch_out
379→
380→ | Barrier ->
381→ (* N+1 channels: [input0..inputN-1, output] *)
382→ let n = List.length channels - 1 in
383→ for i = 0 to n - 1 do
384→ let ch = List.nth channels i in
385→ add_route ~src:ch ~src_port:"out" ~dst:proc_name
386→ ~dst_port:(Printf.sprintf "in%d" i)
387→ ~channel_name:ch
388→ done;
389→ let ch_out = List.nth channels n in
390→ add_route ~src:proc_name ~src_port:"out" ~dst:ch_out ~dst_port:"in"
391→ ~channel_name:ch_out
392→
393→ | Project n_proj ->
394→ (* 2 channels: [input, output] *)
395→ ignore n_proj;
396→ let ch_in = List.nth channels 0 in
397→ let ch_out = List.nth channels 1 in
398→ add_route ~src:ch_in ~src_port:"out" ~dst:proc_name ~dst_port:"in"
399→ ~channel_name:ch_in;
400→ add_route ~src:proc_name ~src_port:"out" ~dst:ch_out ~dst_port:"in"
401→ ~channel_name:ch_out
402→
403→ | Agent _ ->
404→ (* Variable channels: data_in, [ctrl_in], data_out, [ctrl_out], [telemetry] *)
405→ let n = List.length channels in
406→ (* Minimum: input, output (2 channels)
407→ With control: input, ctrl_in, output, ctrl_out (4)
408→ With telemetry: +1 *)
409→ if n >= 2 then begin
410→ let ch_in = List.nth channels 0 in
411→ add_route ~src:ch_in ~src_port:"out" ~dst:proc_name ~dst_port:"data_in"
412→ ~channel_name:ch_in;
413→ (* The output channel is at index 1 (no ctrl) or 2 (with ctrl) *)
414→ let out_idx = if n >= 4 then 2 else 1 in
415→ let ch_out = List.nth channels out_idx in
416→ add_route ~src:proc_name ~src_port:"data_out" ~dst:ch_out ~dst_port:"in"
417→ ~channel_name:ch_out;
418→ (* Control channels if present *)
419→ if n >= 4 then begin
420→ let ch_ctrl_in = List.nth channels 1 in
421→ let ch_ctrl_out = List.nth channels 3 in
422→ add_route ~src:ch_ctrl_in ~src_port:"out" ~dst:proc_name ~dst_port:"ctrl_in"
423→ ~channel_name:ch_ctrl_in;
424→ add_route ~src:proc_name ~src_port:"ctrl_out" ~dst:ch_ctrl_out ~dst_port:"in"
425→ ~channel_name:ch_ctrl_out
426→ end;
427→ (* Telemetry if present *)
428→ let telem_idx = if n >= 4 then 4 else 2 in
429→ if telem_idx < n then begin
430→ let ch_telem = List.nth channels telem_idx in
431→ add_route ~src:proc_name ~src_port:"telemetry" ~dst:ch_telem ~dst_port:"in"
432→ ~channel_name:ch_telem
433→ end
434→ end
435→
436→ | Plumb (plumb_ports, _) ->
437→ (* N port channels, matching the plumb body's port list *)
438→ List.iteri (fun i ch ->
439→ let port = List.nth plumb_ports i in
440→ (* Direction is not determined here; channel intermediaries are resolved into directed routes later *)
441→ add_route ~src:ch ~src_port:"out" ~dst:proc_name ~dst_port:port
442→ ~channel_name:ch
443→ ) channels
444→
445→ | Tool _ | Lower _ | Builtin _ ->
446→ (* These are not spawned directly — they're invoked via tool dispatch *)
447→ ()
448→ ) pb.pb_spawns;
449→
450→ (* Channel-node routes are intermediate — resolve them into direct
451→ process-to-process routes before the fabric sees them. *)
452→ let intermediate_routes = List.rev !routes in
453→ let resolved = resolve_channels
454→ ~processes:(List.rev !processes)
455→ ~boundary_ports:ports
456→ ~channels:pb.pb_channels
457→ intermediate_routes in
458→ let routes = classify_channel_kinds resolved in
459→ {
460→ processes = List.rev !processes;
461→ routes;
462→ seeded_channels = seeded;
463→ }
464→
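The Agent branch above encodes its channel layout purely by index: 2 channels = [data_in; data_out], 4 = [data_in; ctrl_in; data_out; ctrl_out], 5 adds telemetry at the end. A standalone sketch of those index helpers (mirroring the arithmetic in the code, not a project API):

```ocaml
(* Index helpers matching the Agent channel-layout convention:
   output at 1 (no ctrl) or 2 (with ctrl); telemetry, when present,
   at 2 (no ctrl) or 4 (with ctrl). *)
let out_idx n = if n >= 4 then 2 else 1
let telem_idx n = if n >= 4 then 4 else 2
let has_telemetry n = telem_idx n < n

let () =
  assert (out_idx 2 = 1);            (* [in; out] *)
  assert (out_idx 4 = 2);            (* [in; ctrl_in; out; ctrl_out] *)
  assert (not (has_telemetry 2));
  assert (has_telemetry 3);          (* [in; out; telemetry] *)
  assert (has_telemetry 5)
```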
465→(** Pretty-print a resolved routing table for debugging.
466→ Expects channel intermediaries to be resolved; prints the [via]
467→ field when present to show channel provenance. *)
468→let pp_routing_table rt =
469→ let buf = Buffer.create 256 in
470→ Buffer.add_string buf "Processes:\n";
471→ List.iter (fun pe ->
472→ let impl_str = match pe.proc_impl with
473→ | Id -> "id" | Copy -> "copy" | Merge -> "merge"
474→ | MergeAny -> "merge_any" | Discard -> "discard"
475→ | Barrier -> "barrier" | Map _ -> "map" | Filter _ -> "filter"
476→ | Tagger _ -> "tagger" | Log _ -> "log" | FormatJson -> "format_json"
477→ | ParseJson -> "parse_json" | Project n -> Printf.sprintf "project(%d)" n
478→ | Agent _ -> "agent" | Plumb _ -> "plumb" | Tool _ -> "tool"
479→ | Lower _ -> "lower" | Builtin _ -> "builtin"
480→ in
481→ Buffer.add_string buf (Printf.sprintf " %s (%s, %s)\n" pe.proc_name
482→ (match pe.proc_kind with Inline -> "inline" | Subprocess -> "subprocess")
483→ impl_str)
484→ ) rt.processes;
485→ Buffer.add_string buf "Routes:\n";
486→ List.iter (fun r ->
487→ let via_str = match r.via with
488→ | Some ch -> Printf.sprintf " via=%s" ch
489→ | None -> ""
490→ in
491→ Buffer.add_string buf (Printf.sprintf " %s:%s -> %s:%s [%s] topic=%s%s\n"
492→ r.src r.src_port r.dst r.dst_port
493→ (match r.kind with
494→ | Point_to_point -> "push/pull"
495→ | Fan_out -> "pub/sub"
496→ | Bidirectional -> "pair")
497→ r.topic via_str)
498→ ) rt.routes;
499→ if rt.seeded_channels <> [] then begin
500→ Buffer.add_string buf "Seeded: ";
501→ Buffer.add_string buf (String.concat ", " rt.seeded_channels);
502→ Buffer.add_char buf '\n'
503→ end;
504→ Buffer.contents buf
50→
51→type t = {
52→ pids : (int * string) list;
53→ threads : Thread.t list;
54→ input_oc : out_channel;
55→ output_ic : in_channel;
56→ boundary_fds : (string * Unix.file_descr * Unix.file_descr) list;
57→ (** All boundary ports: (port_name, read_fd, write_fd).
58→ Input ports: runtime writes to write_fd, morphisms read from read_fd.
59→ Output ports: morphisms write to write_fd, runtime reads from read_fd.
60→
61→ TODO: leaky abstraction — raw fds on a public type. Only needed
62→ by recursive N-port Plumb execution. Consider returning a richer
63→ internal type and projecting to a public surface that exposes only
64→ input_oc / output_ic + send_control / shutdown functions. *)
65→ stderr_reader : (Thread.t * Buffer.t) option;
66→ ctrl_write_fds : Unix.file_descr list;
67→ (** TODO: same concern — callers should use send_control_all / shutdown,
68→ not reach into the fd list directly. *)
69→ tmp_files : string list;
70→}
71→
72→let input t = t.input_oc
73→let output t = t.output_ic
74→
75→(** Send a control message to a single agent's control input fd.
76→ Silences EPIPE (agent already exited); logs other errors. *)
77→let send_control fd msg =
78→ let line = Yojson.Safe.to_string msg ^ "\n" in
79→ let b = Bytes.of_string line in
80→ let len = Bytes.length b in
81→ let rec write_all off =
82→ if off < len then
83→ let n = Unix.write fd b off (len - off) in
84→ write_all (off + n)
85→ in
86→ try write_all 0
87→ with
88→ | Unix.Unix_error ((Unix.EPIPE | Unix.EBADF), _, _) -> ()
89→ | Unix.Unix_error (e, fn, _) ->
90→ Runtime.Log.warn "send_control"
91→ [("error", `String (Unix.error_message e));
92→ ("fn", `String fn)]
93→
94→(** Send a control message to all agents in the pipeline. *)
...
  1→(** Tool dispatch: execution, OITL approval, and MCP connection management. *)
2→
3→open Language
4→open Spawn
5→
6→(** Runtime environment for tool dispatch. Threaded through
7→ execute → tool_dispatch_loop → dispatch_tool → execute_builtin. *)
8→type tool_env = {
9→ docroot : string option;
10→ (** Sandbox root for builtin document operations.
11→ [None] means use [Sys.getcwd ()] at dispatch time. *)
12→}
13→
14→let default_tool_env = { docroot = None }
15→
16→type observation = {
17→ agent : string;
18→ payload : Yojson.Safe.t;
19→}
20→
21→(** OITL (operator-in-the-loop) request/response types.
22→ Used for tool confirmation: the dispatch loop asks the operator
23→ before executing a tool call. Types match the OITL wire protocol
24→ JSON schema for future fd 8/9 migration. *)
25→type oitl_request = {
26→ oitl_id : string;
27→ oitl_tool : string;
28→ oitl_input : Yojson.Safe.t;
29→ oitl_agent : string;
30→}
31→
32→(* TODO: oitl_policy could be a variant [Once | Always] with [None] for
33→ denial, catching typos at compile time. Deferred because the string
34→ values match the wire protocol schema — changing to a variant now means
35→ adding serialisation when fd 8/9 lands. Do both together.
36→ Smart constructors (approve_once, approve_always, deny) would also
37→ collapse the ~6 repeated record literals across exec/chat/tests. *)
38→type oitl_response = {
39→ oitl_approved : bool;
40→ oitl_policy : string option; (** "once" | "always" | None=denied *)
41→}
42→
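The TODO above suggests smart constructors to collapse the repeated record literals. A minimal sketch against the record shape defined here (type redeclared locally so the sketch is standalone; not wired into the codebase):

```ocaml
(* Hypothetical smart constructors for oitl_response. The string values
   match the wire protocol ("once" | "always"; None = denied). *)
type oitl_response = {
  oitl_approved : bool;
  oitl_policy : string option;
}

let approve_once = { oitl_approved = true; oitl_policy = Some "once" }
let approve_always = { oitl_approved = true; oitl_policy = Some "always" }
let deny = { oitl_approved = false; oitl_policy = None }

let () =
  assert (approve_once.oitl_approved && approve_once.oitl_policy = Some "once");
  assert (approve_always.oitl_policy = Some "always");
  assert (not deny.oitl_approved && deny.oitl_policy = None)
```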
43→(** {1 Tool Dispatch}
44→
45→ The plumb runtime mediates tool calls between agents and their
46→ tool bindings. When an agent has tools, the runtime creates a
47→ tool dispatch pipe pair and runs a dispatch loop in a thread.
48→ The agent sends tool requests via __port envelopes on stdout;
49→ the demux routes them to the dispatch loop. Responses flow back
50→ through a mux forwarder as tool_resp envelopes on stdin.
51→
52→ Tool = lowering: [(A → B) ← (!A → !B)] via [η_A ; f ; ε_B].
53→ Any total morphism can serve as a tool — the discriminant is totality
54→ (exactly one output per input). See [Spec.is_lowerable]. *)
55→
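The request/response flow described above is newline-framed JSON. An illustrative round-trip, using the field names the dispatch loop reads ("id", "name", "input") and writes ("id", "content", "is_error") — the values themselves are invented for the example:

```ocaml
(* One tool request and its response, one JSON object per line
   (the dispatch loop frames messages with input_line). *)
let request =
  {|{"id":"tc-1","name":"doc_read","input":{"path":"notes.md"}}|}

let response =
  {|{"id":"tc-1","content":"{\"content\":\"...\"}","is_error":false}|}

let () =
  assert (not (String.contains request '\n'));
  assert (not (String.contains response '\n'))
```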
56→(** Fork a subprocess (agent or plumb) for tool execution: one message
57→ in, one message out. This is the Kleisli counit ε applied after
58→ the unit η — tool is the categorical lowering operation. *)
59→let fork_tool_subprocess ~tool_call_id ~base_dir type_env prog (tool_binding : binding)
60→ (process_binding : binding) (input : Yojson.Safe.t) =
61→ let impl_name = match process_binding.impl with
62→ | Agent _ -> "plumb-agent" | Plumb _ -> "plumb"
63→ | _ -> failwith "fork_tool_subprocess: not a subprocess impl"
64→ in
65→ (* Resolve prompt paths before spec synthesis *)
66→ let process_binding = match process_binding.impl with
67→ | Agent config ->
68→ { process_binding with impl = Agent (resolve_prompts_config ~base_dir config) }
69→ | _ -> process_binding
70→ in
71→ (* Handle product output: if agent has (!B, !T), use { output = Stream B } *)
72→ let child_binding =
73→ if impl_name = "plumb-agent" then
74→ match agent_output_decomposition type_env process_binding.output with
75→ | Some (result_t, _) -> { process_binding with output = Stream result_t }
76→ | None -> process_binding
77→ else process_binding
78→ in
79→ let spec_source = synthesize_spec prog child_binding in
 80→ let spec_path = write_temp_spec spec_source in
  1→(library
2→ (name fabric)
3→ (public_name plumbing.fabric)
4→ (libraries transport supervisor runtime language spawn_lib yojson eio eio.unix unix threads.posix))
  5→
81→ let bin_path = resolve_binary impl_name in
82→ (* Create pipes for child stdin/stdout/stderr *)
83→ let child_in_r, child_in_w = Unix.pipe ~cloexec:true () in
84→ let child_out_r, child_out_w = Unix.pipe ~cloexec:true () in
85→ let child_err_r, child_err_w = Unix.pipe ~cloexec:true () in
86→ let err_buf = Buffer.create 256 in
87→ let pid = spawn_subprocess ~bin_path ~spec_path
88→ ~stdin_fd:child_in_r ~stdout_fd:child_out_w ~stderr_fd:child_err_w
89→ ~env:(credential_env () @ provider_env ()) () in
90→ (* spawn_subprocess closes child_in_r, child_out_w, child_err_w in parent *)
91→ Runtime.Log.debug "fork_tool_subprocess"
92→ ["pid", `Int pid;
93→ "tool", `String tool_binding.bd_name;
94→ "process", `String process_binding.bd_name;
95→ "tool_call_id", `String tool_call_id];
96→ (* Start stderr reader thread to avoid pipe buffer deadlock *)
97→ let err_thread = Thread.create (fun () ->
98→ let ic = Unix.in_channel_of_descr child_err_r in
99→ (try while true do
100→ Buffer.add_string err_buf (input_line ic);
101→ Buffer.add_char err_buf '\n'
102→ done with End_of_file -> ());
103→ close_in_noerr ic
104→ ) () in
105→ (* Write input as JSON line, close stdin *)
106→ let oc = Unix.out_channel_of_descr child_in_w in
107→ output_string oc (Yojson.Safe.to_string input);
108→ output_char oc '\n';
109→ close_out oc;
110→ (* Read one JSON line from stdout *)
111→ let ic = Unix.in_channel_of_descr child_out_r in
112→ let result =
113→ try
114→ let line = input_line ic in
115→ close_in ic;
116→ let json = Yojson.Safe.from_string line in
117→ (* Validate output against tool's output type *)
118→ (match Validate.check type_env tool_binding.output json with
119→ | Ok cleaned -> Ok cleaned
120→ | Error errs ->
121→ Error (Printf.sprintf "output validation failed:\n%s"
122→ (Validate.error_string errs)))
123→ with
124→ | End_of_file ->
125→ close_in_noerr ic;
126→ Error "" (* placeholder — stderr content added below *)
127→ | exn ->
128→ close_in_noerr ic;
129→ Error (Printf.sprintf "tool %s (process %s): %s"
130→ tool_binding.bd_name process_binding.bd_name
131→ (Printexc.to_string exn))
132→ in
133→ (* Wait for child to exit (with optional timeout), then collect stderr *)
134→ let timeout_s = match Spawn.timeout_of_binding tool_binding with
135→ | Some t -> t | None -> 0.0 in
136→ let status, timed_out = Spawn.waitpid_timed ~timeout_seconds:timeout_s pid in
137→ Thread.join err_thread;
138→ let stderr_msg = String.trim (Buffer.contents err_buf) in
139→ (try Sys.remove spec_path with Sys_error _ -> ());
140→ if timed_out then
141→ Runtime.Log.warn "tool_timeout"
142→ ["tool", `String tool_binding.bd_name;
143→ "process", `String process_binding.bd_name;
144→ "timeout_seconds", `Float timeout_s;
145→ "tool_call_id", `String tool_call_id];
146→ (* Use stderr content for empty error messages *)
147→ let result = if timed_out then
148→ Error (Printf.sprintf "tool %s (process %s): timed out after %.0fs"
149→ tool_binding.bd_name process_binding.bd_name timeout_s)
150→ else match result with
151→ | Error "" ->
152→ let msg = if stderr_msg <> "" then
153→ Printf.sprintf "tool %s (process %s): %s"
154→ tool_binding.bd_name process_binding.bd_name stderr_msg
155→ else
156→ Printf.sprintf "tool %s (process %s): process produced no output"
157→ tool_binding.bd_name process_binding.bd_name
158→ in
159→ Error msg
160→ | other -> other
161→ in
162→ let is_error = match status with
163→ | Unix.WEXITED 0 -> Result.is_error result
164→ | _ -> true
165→ in
166→ let content = match result with
167→ | Ok json -> Yojson.Safe.to_string json
168→ | Error msg -> Runtime.Error.(to_string (make Unavailable "process_error" msg))
169→ in
170→ (content, is_error)
171→
172→(** Execute a builtin operation. Dispatches on the [builtin_op]
173→ variant. Document ops use a [Docstore.t] from the tool_env docroot.
174→ [call_pipeline] is a callback for the [Call] builtin — breaks the
175→ forward reference with [execute]. *)
176→let execute_builtin ~tool_call_id ~call_pipeline ~search_path tool_env type_env
177→ (tool_binding : binding) (op : builtin_op) (input : Yojson.Safe.t) =
178→ let open Yojson.Safe.Util in
179→ let t0 = Unix.gettimeofday () in
180→ let root = match tool_env.docroot with
181→ | Some r -> r
182→ | None -> Sys.getcwd ()
183→ in
184→ let store =
185→ try Docstore.create ~root
186→ with Invalid_argument msg ->
187→ Runtime.Log.error "builtin_tool"
188→ ["error", `String msg; "tool_call_id", `String tool_call_id];
189→ failwith msg
190→ in
191→ let docstore_error e =
192→ let open Runtime.Error in
193→ let category, code = match e with
194→ | Docstore.Path_traversal _ -> (Denied, "docstore_denied")
195→ | Docstore.Not_found _ -> (Not_found, "docstore_not_found")
196→ | Docstore.Permission_denied _ -> (Denied, "docstore_denied")
197→ | Docstore.Invalid_pattern _ -> (Invalid, "docstore_invalid")
198→ | Docstore.Io_error _ -> (Unavailable, "docstore_io_error")
199→ in
200→ (to_string (make category code (Docstore.error_to_string e)), true)
201→ in
202→ let result = match op with
203→ | Doc_read ->
204→ let path = input |> member "path" |> to_string in
205→ let offset = try Some (input |> member "offset" |> to_int)
206→ with Type_error _ -> None in
207→ let count = try Some (input |> member "count" |> to_int)
208→ with Type_error _ -> None in
209→ (match Docstore.read store ~path ?offset ?count () with
210→ | Ok content ->
211→ (Yojson.Safe.to_string (`Assoc [("content", `String content)]), false)
212→ | Error e ->
213→ docstore_error e)
214→ | Doc_write ->
215→ let path = input |> member "path" |> to_string in
216→ let content = input |> member "content" |> to_string in
217→ (match Docstore.write store ~path ~content with
218→ | Ok () ->
219→ (Yojson.Safe.to_string (`Assoc [("ok", `Bool true)]), false)
220→ | Error e ->
221→ docstore_error e)
222→ | Doc_edit ->
223→ let path = input |> member "path" |> to_string in
224→ let pattern = input |> member "pattern" |> to_string in
225→ let replacement = input |> member "replacement" |> to_string in
226→ (match Docstore.edit store ~path ~pattern ~replacement with
227→ | Ok n ->
228→ (Yojson.Safe.to_string (`Assoc [("replacements", `Int n)]), false)
229→ | Error e ->
230→ docstore_error e)
231→ | Doc_grep ->
232→ let pattern = input |> member "pattern" |> to_string in
233→ let path = try Some (input |> member "path" |> to_string)
234→ with Type_error _ -> None in
235→ let max_results = try Some (input |> member "max_results" |> to_int)
236→ with Type_error _ -> None in
237→ (match Docstore.grep store ~pattern ?path ?max_results () with
238→ | Ok matches ->
239→ let json_matches = List.map (fun (m : Docstore.match_result) ->
240→ `Assoc [
241→ ("file", `String m.file);
242→ ("line_number", `Int m.line_number);
243→ ("line", `String m.line);
244→ ]
245→ ) matches in
246→ (Yojson.Safe.to_string (`Assoc [("matches", `List json_matches)]), false)
247→ | Error e ->
248→ docstore_error e)
249→ | Doc_list ->
250→ let prefix = try Some (input |> member "prefix" |> to_string)
251→ with Type_error _ -> None in
252→ let max_results = try Some (input |> member "max_results" |> to_int)
253→ with Type_error _ -> None in
254→ (match Docstore.list store ?prefix ?max_results () with
255→ | Ok (paths, truncated) ->
256→ (Yojson.Safe.to_string (`Assoc [
257→ ("paths", `List (List.map (fun p -> `String p) paths));
258→ ("truncated", `Bool truncated);
259→ ]), false)
260→ | Error e ->
261→ docstore_error e)
262→ | Doc_delete ->
263→ let path = input |> member "path" |> to_string in
264→ Runtime.Log.info "builtin_tool"
265→ ["op", `String "delete"; "path", `String path;
266→ "tool_call_id", `String tool_call_id];
267→ (match Docstore.delete store ~path with
268→ | Ok () ->
269→ (Yojson.Safe.to_string (`Assoc [("ok", `Bool true)]), false)
270→ | Error e ->
271→ docstore_error e)
272→ | Exec ->
273→ let command = input |> member "command" |> to_string in
274→ Runtime.Log.info "builtin_tool"
275→ ["op", `String "exec"; "command", `String command;
276→ "tool_call_id", `String tool_call_id];
277→ (* Create pipes for stdout and stderr capture *)
278→ let out_r, out_w = Unix.pipe ~cloexec:true () in
279→ let err_r, err_w = Unix.pipe ~cloexec:true () in
280→ let env_array = build_env [] in
281→ let pid = Unix.fork () in
282→ if pid = 0 then begin
283→ (* Child: set cwd, wire fds, exec shell. _exit on all failure
284→ paths — never run at_exit handlers in forked child. *)
285→ (try Unix.chdir root
286→ with Unix.Unix_error (err, _, _) ->
287→ Runtime.Log.error "builtin_exec"
288→ ["event", `String "chdir_failed";
289→ "path", `String root;
290→ "detail", `String (Unix.error_message err)];
291→ Unix._exit 2);
292→ Unix.dup2 out_w Unix.stdout; Unix.close out_w; Unix.close out_r;
293→ Unix.dup2 err_w Unix.stderr; Unix.close err_w; Unix.close err_r;
294→ (* Close stdin — exec commands do not read from it *)
295→ Unix.close Unix.stdin;
296→ (try Unix.execve "/bin/sh" [| "/bin/sh"; "-c"; command |] env_array
297→ with exn ->
298→ Runtime.Log.error "builtin_exec"
299→ ["event", `String "execve_failed";
300→ "binary", `String "/bin/sh";
301→ "detail", `String (Printexc.to_string exn)];
302→ Unix._exit 127)
303→ end else begin
304→ (* Parent: close write ends, read with cap *)
305→ Unix.close out_w;
306→ Unix.close err_w;
307→ (* Two reader threads to avoid deadlock *)
308→ let out_result = ref ("", false) in
309→ let err_result = ref ("", false) in
310→ let out_thread = Thread.create (fun () ->
311→ out_result := read_capped out_r max_exec_output_bytes;
312→ Unix.close out_r
313→ ) () in
314→ let err_thread = Thread.create (fun () ->
315→ err_result := read_capped err_r max_exec_output_bytes;
316→ Unix.close err_r
317→ ) () in
318→ (* Timeout: wait with select on a timer *)
319→ let timed_out = ref false in
320→ let deadline = Unix.gettimeofday () +. exec_timeout_seconds in
321→ let rec wait_child () =
322→ let now = Unix.gettimeofday () in
323→ if now >= deadline then begin
324→ timed_out := true;
325→ (try Unix.kill pid Sys.sigkill with Unix.Unix_error _ -> ());
326→ snd (Unix.waitpid [] pid)
327→ end else
328→ match Unix.waitpid [Unix.WNOHANG] pid with
329→ | 0, _ ->
330→ ignore (Unix.select [] [] [] 0.1);
331→ wait_child ()
332→ | _, status -> status
333→ in
334→ let status = wait_child () in
335→ Thread.join out_thread;
336→ Thread.join err_thread;
337→ let stdout_s, stdout_trunc = !out_result in
338→ let stderr_s, stderr_trunc = !err_result in
339→ let exit_code = match status with
340→ | Unix.WEXITED c -> c
341→ | Unix.WSIGNALED s -> 128 + s
342→ | Unix.WSTOPPED s -> 128 + s
343→ in
344→ let fields = [
345→ ("stdout", `String stdout_s);
346→ ("stderr", `String stderr_s);
347→ ("exit_code", `Int exit_code);
348→ ] @ (if stdout_trunc || stderr_trunc then
349→ [("truncated", `Bool true)] else [])
350→ @ (if !timed_out then
351→ [("timed_out", `Bool true)] else [])
352→ in
353→ (Yojson.Safe.to_string (`Assoc fields), false)
354→ end
355→ | Check ->
356→ let source = input |> member "source" |> to_string in
357→ (try
358→ let prog = Language.Loader.load_string ~search_path
359→ ~importing_dir:root source in
360→ let _prog, _type_env =
361→ Runtime.Spec.validate_program prog in
362→ (Yojson.Safe.to_string (`Assoc [("ok", `Bool true)]), false)
363→ with
364→ | Language.Loader.Load_error msg ->
365→ (Yojson.Safe.to_string (`Assoc [
366→ ("ok", `Bool false); ("error", `String msg)
367→ ]), true)
368→ | Language.Parse_error msg ->
369→ (Yojson.Safe.to_string (`Assoc [
370→ ("ok", `Bool false); ("error", `String msg)
371→ ]), true)
372→ | Runtime.Error.Process_error e ->
373→ (Yojson.Safe.to_string (`Assoc [
374→ ("ok", `Bool false);
375→ ("error", `String (e.Runtime.Error.detail))
376→ ]), true)
377→ | exn ->
378→ (Yojson.Safe.to_string (`Assoc [
379→ ("ok", `Bool false);
380→ ("error", `String (Printexc.to_string exn))
381→ ]), true))
382→ | Call ->
383→ let source = input |> member "source" |> to_string in
384→ let call_input = input |> member "input" in
385→ call_pipeline source call_input
386→ in
387→ let duration_ms = (Unix.gettimeofday () -. t0) *. 1000.0 in
388→ let op_name = Language.Pp.string_of_builtin_op op in
389→ let _, is_error = result in
390→ Runtime.Log.info "builtin_tool"
391→ [ "op", `String op_name;
392→ "duration_ms", `Float duration_ms;
393→ "ok", `Bool (not is_error);
394→ "tool_call_id", `String tool_call_id ];
395→ ignore type_env; ignore tool_binding;
396→ result
397→
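The Exec builtin above waits on the child with a WNOHANG polling loop: check the deadline, SIGKILL past it, otherwise poll every 100ms. A standalone sketch of that pattern, using only the Unix module (the real code additionally coordinates capped reader threads):

```ocaml
(* Poll-with-deadline child wait: returns (status, timed_out).
   Sketch only; helper name and signature are illustrative. *)
let waitpid_deadline ~timeout pid =
  let deadline = Unix.gettimeofday () +. timeout in
  let timed_out = ref false in
  let rec loop () =
    if Unix.gettimeofday () >= deadline then begin
      timed_out := true;
      (try Unix.kill pid Sys.sigkill with Unix.Unix_error _ -> ());
      snd (Unix.waitpid [] pid)          (* blocking reap after kill *)
    end else
      match Unix.waitpid [Unix.WNOHANG] pid with
      | 0, _ -> ignore (Unix.select [] [] [] 0.1); loop ()  (* still running *)
      | _, status -> status
  in
  let status = loop () in
  (status, !timed_out)

let () =
  (* A short-lived child exits well before the deadline. *)
  let pid = Unix.create_process "/bin/sh" [| "/bin/sh"; "-c"; "exit 0" |]
              Unix.stdin Unix.stdout Unix.stderr in
  let status, timed_out = waitpid_deadline ~timeout:5.0 pid in
  assert (not timed_out);
  assert (status = Unix.WEXITED 0)
```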
398→(** Execute a tool by dispatching on the process binding's impl kind.
399→ Inline morphisms (id, map, copy, discard) execute directly.
400→ Agents and plumbs fork a subprocess. *)
401→let execute_tool ~tool_call_id ~base_dir type_env prog (tool_binding : binding)
402→ (process_binding : binding) (input : Yojson.Safe.t) =
403→ match process_binding.impl with
404→ | Agent _ | Plumb _ ->
405→ fork_tool_subprocess ~tool_call_id ~base_dir type_env prog tool_binding process_binding input
406→ | Id ->
407→ (Yojson.Safe.to_string input, false)
408→ | Map expr ->
409→ let result = Expr.eval input expr in
410→ (match Validate.check type_env tool_binding.output result with
411→ | Ok cleaned -> (Yojson.Safe.to_string cleaned, false)
412→ | Error errs ->
413→ (Runtime.Error.(to_string (make Invalid "validation_error"
414→ (Printf.sprintf "output validation failed:\n%s"
415→ (Validate.error_string errs)))), true))
416→ | Copy ->
417→ (Yojson.Safe.to_string (`List [input; input]), false)
418→ | Discard ->
419→ (Yojson.Safe.to_string `Null, false)
420→ | Filter _ | Merge | MergeAny | Barrier | Project _ ->
421→ let impl_name = match process_binding.impl with
422→ | Filter _ -> "filter" | Merge | MergeAny -> "merge"
423→ | Barrier -> "barrier" | Project _ -> "project"
424→ | _ -> "unknown"
425→ in
426→ (Runtime.Error.(to_string (make Invalid "tool_not_lowerable"
427→ (Printf.sprintf "cannot lower %s to a tool (not total)" impl_name))), true)
428→ | Builtin _ ->
429→ (* Builtins are handled by execute_builtin before reaching here *)
430→ failwith "unreachable: Builtin should be dispatched by execute_builtin"
431→ | Tool _ | Lower _ ->
432→ (Runtime.Error.(to_string (make Invalid "tool_nesting_error"
433→ "cannot nest tool inside tool")), true)
434→ | FormatJson ->
435→ (Yojson.Safe.to_string (`String (Yojson.Safe.to_string input)), false)
436→ | ParseJson ->
437→ let s = match input with
438→ | `String s -> s
439→ | _ -> Yojson.Safe.to_string input
440→ in
441→ (match
442→ try Ok (Yojson.Safe.from_string s)
443→ with Yojson.Json_error msg -> Error msg
444→ with
445→ | Ok json -> (Yojson.Safe.to_string json, false)
446→ | Error msg ->
447→ (Runtime.Error.(to_string (make Invalid "parse_error"
448→ (Printf.sprintf "parse_json: %s" msg))), true))
449→ | Tagger _ | Log _ ->
450→ (Runtime.Error.(to_string (make Internal "internal_error"
451→ "compiler-internal morphism, not a tool")), true)
452→
453→(** Build an error response for tool dispatch. *)
454→let tool_error_response ?source id category code msg =
455→ let extras = match source with
456→ | Some src -> [("source", `String src)]
457→ | None -> []
458→ in
459→ let error_json = Runtime.Error.(to_string (make ~extras category code msg)) in
460→ `Assoc [
461→ ("id", `String id);
462→ ("content", `String error_json);
463→ ("is_error", `Bool true);
464→ ]
465→
466→(** Raised when a tool binding has an MCP config — escapes the nested
467→ match to be caught by [dispatch_tool]'s top-level handler. *)
468→exception Mcp_tool_dispatch of (string * config_value) list * string * Yojson.Safe.t
469→
470→(** Normalise an MCP server config to a cache key string. *)
471→let mcp_config_cache_key config =
472→ let get_str key =
473→ List.assoc_opt key config |> Option.map (function
474→ | Str s -> s | _ -> "") |> Option.value ~default:""
475→ in
476→ match get_str "url" with
477→ | "" -> Printf.sprintf "stdio:%s" (get_str "command")
478→ | url -> Printf.sprintf "http:%s" url
479→
480→(** Normalise an MCP tool config value into a flat key-value list.
481→ Handles [Str] (URL shorthand) and [Obj] (full config). *)
482→let normalise_mcp_tool_config config =
483→ match List.assoc_opt "mcp" config with
484→ | Some (Str url) -> [("url", Str url)]
485→ | Some (Obj kvs) -> kvs
486→ | Some _ -> failwith "tool mcp config must be a string URL or an object"
487→ | None -> failwith "tool mcp config not found"
488→
489→(** Get or create a cached MCP connection for a tool's MCP config. *)
490→let get_or_create_mcp_connection mcp_cache config =
491→ let config = normalise_mcp_tool_config config in
492→ let key = mcp_config_cache_key config in
493→ match List.assoc_opt key !mcp_cache with
494→ | Some conn -> conn
495→ | None ->
496→ let conn = match Spawn.connect_mcp_server config with
497→ | Ok c -> c
498→ | Error msg -> failwith msg
499→ in
500→ mcp_cache := (key, conn) :: !mcp_cache;
501→ conn
502→
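The get-or-create pattern above memoises connections in a mutable assoc list keyed by the normalised config string. A dependency-free sketch of the same pattern (names are illustrative, not the Spawn/Mcp_client API):

```ocaml
(* Memoise a value in a ref'd assoc list: create at most once per key. *)
let get_or_create cache ~key ~create =
  match List.assoc_opt key !cache with
  | Some v -> v
  | None ->
      let v = create () in
      cache := (key, v) :: !cache;
      v

let () =
  let calls = ref 0 in
  let cache = ref [] in
  let create () = incr calls; "conn" in
  ignore (get_or_create cache ~key:"stdio:server" ~create);
  ignore (get_or_create cache ~key:"stdio:server" ~create);
  assert (!calls = 1)  (* second lookup hits the cache *)
```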
503→(** Dispatch a single tool request. Resolves the process binding
504→ (for [Tool {process: name}]) or uses the tool binding itself
505→ (for bare-typed morphisms like [map], [id]).
506→ [~mcp_cache] is a mutable assoc list of MCP connections keyed by
507→ normalised server config string, used for per-tool MCP dispatch. *)
508→let dispatch_tool ~tool_call_id ~base_dir ~call_pipeline ~search_path tool_env type_env prog
509→ tool_bindings ~mcp_cache id name (input : Yojson.Safe.t) =
510→ match List.find_opt (fun (b : binding) -> short_name b.bd_name = name) tool_bindings with
511→ | None ->
512→ tool_error_response id Not_found "unknown_tool" (Printf.sprintf "unknown tool: %s" name)
513→ | Some tool_binding ->
514→ try
515→ let validated_input = match Validate.check type_env tool_binding.input input with
516→ | Ok v -> v
517→ | Error errs ->
518→ failwith (Printf.sprintf "input validation failed: %s"
519→ (Validate.error_string errs))
520→ in
521→ let content, is_error = match tool_binding.impl with
522→ | Builtin op ->
523→ execute_builtin ~tool_call_id ~call_pipeline ~search_path tool_env type_env tool_binding op validated_input
524→ | _ ->
525→ let process_binding = match tool_binding.impl with
526→ | Lower process_name ->
527→ (match List.find_opt (fun b ->
528→ b.bd_name = process_name
529→ ) prog.prog_bindings with
530→ | Some pb -> pb
531→ | None ->
532→ failwith (Printf.sprintf "tool %s: process '%s' not found"
533→ tool_binding.bd_name process_name))
534→ | Tool config ->
535→ (match List.find_map (fun (k, v) ->
536→ if k = "process" then
537→ (match v with Ident pname -> Some pname | _ -> None)
538→ else None
539→ ) config with
540→ | Some process_name ->
541→ (match List.find_opt (fun b ->
542→ b.bd_name = process_name
543→ ) prog.prog_bindings with
544→ | Some pb -> pb
545→ | None ->
546→ failwith (Printf.sprintf "tool %s: process '%s' not found"
547→ tool_binding.bd_name process_name))
548→ | None ->
549→ (* Check for MCP-backed tool *)
550→ match List.assoc_opt "mcp" config, List.assoc_opt "name" config with
551→ | Some _, Some (Str tool_name) ->
552→ (* MCP dispatch handled below *)
553→ raise (Mcp_tool_dispatch (config, tool_name, validated_input))
554→ | Some _, None ->
555→ failwith (Printf.sprintf "tool %s: mcp config requires 'name' key"
556→ tool_binding.bd_name)
557→ | Some _, Some _ ->
558→ failwith (Printf.sprintf "tool %s: 'name' must be a string"
559→ tool_binding.bd_name)
560→ | None, _ ->
561→ failwith "tool requires 'process', 'mcp', or structural impl")
562→ | _ -> tool_binding (* self-backing: id, map, copy, discard *)
563→ in
564→ execute_tool ~tool_call_id ~base_dir type_env prog tool_binding process_binding validated_input
565→ in
566→ `Assoc [
567→ ("id", `String id);
568→ ("content", `String content);
569→ ("is_error", `Bool is_error);
570→ ]
571→ with
572→ | Mcp_tool_dispatch (config, tool_name, validated_input) ->
573→ (try
574→ let conn = get_or_create_mcp_connection mcp_cache config in
575→ let (content, is_error) =
576→ Mcp_client.call_tool conn ~name:tool_name ~input:validated_input in
577→ `Assoc [
578→ ("id", `String id);
579→ ("content", `String content);
580→ ("is_error", `Bool is_error);
581→ ]
582→ with
583→ | Failure msg -> tool_error_response ~source:name id Unavailable "mcp_error" msg
584→ | exn -> tool_error_response ~source:name id Unavailable "mcp_error" (Printexc.to_string exn))
585→ | Failure msg -> tool_error_response ~source:name id Unavailable "mcp_error" msg
586→ | exn -> tool_error_response ~source:name id Unavailable "mcp_error" (Printexc.to_string exn)
587→
588→(** Dispatch a tool call to an MCP server connection.
589→ Searches each connection's tool list for a matching name
590→ (qualified or raw). Strips the prefix before calling the server.
591→ Returns the standard response JSON. *)
592→let dispatch_mcp_tool ~tool_call_id mcp_connections id name input =
593→ let result = List.find_map (fun conn ->
594→ let tools = Mcp_client.tools conn in
595→ match List.find_opt (fun (t : Mcp_client.mcp_tool) ->
596→ t.qualified_name = name || t.name = name
597→ ) tools with
598→ | Some tool ->
599→ (* Warn if unqualified name is ambiguous across servers *)
600→ if tool.qualified_name <> name then begin
601→ let other_matches = List.exists (fun other_conn ->
602→ other_conn != conn &&
603→ List.exists (fun (t : Mcp_client.mcp_tool) ->
604→ t.name = name) (Mcp_client.tools other_conn)
605→ ) mcp_connections in
606→ if other_matches then
607→ Runtime.Log.warn "mcp_tool_ambiguous"
608→ [("name", `String name);
609→ ("matched", `String tool.qualified_name);
610→ ("tool_call_id", `String tool_call_id)]
611→ end;
612→ Some (Mcp_client.call_tool conn ~name:tool.name ~input)
613→ | None -> None
614→ ) mcp_connections in
615→ match result with
616→ | Some (content, is_error) ->
617→ `Assoc [
618→ ("id", `String id);
619→ ("content", `String content);
620→ ("is_error", `Bool is_error);
621→ ]
622→ | None ->
623→ tool_error_response ~source:name id Not_found "unknown_tool"
624→ (Printf.sprintf "unknown tool: %s (not found in any MCP server)" name)
625→
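The qualified-or-raw match used by dispatch_mcp_tool can be sketched standalone. The record and the "server__tool" qualified-name shape here are illustrative, not the real Mcp_client.mcp_tool definition:

```ocaml
(* Match a tool by qualified name first, falling back to the raw name. *)
type tool = { name : string; qualified_name : string }

let find_tool tools name =
  List.find_opt (fun t -> t.qualified_name = name || t.name = name) tools

let () =
  let tools = [ { name = "search"; qualified_name = "web__search" } ] in
  assert (find_tool tools "web__search" <> None);  (* qualified *)
  assert (find_tool tools "search" <> None);       (* raw *)
  assert (find_tool tools "fetch" = None)
```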
626→(** Tool dispatch loop: reads JSON requests from req_fd, writes
627→ responses to resp_fd. Runs in a thread. Exits on EOF.
628→
629→ Plumbing tool bindings are checked first; MCP connections are
630→ the fallback for dynamically discovered tools. *)
631→(* TODO: parameter sprawl — ~base_dir, ~call_pipeline, ~search_path,
632→ tool_env, type_env, prog, tool_bindings, ~mcp_connections are always
633→ passed together. Bundle into a dispatch_context record. Same for
634→ the callback params (?on_observe, ?on_oitl_request) on execute —
635→ group into a hooks record. *)
636→let tool_dispatch_loop ~base_dir ~call_pipeline ~search_path tool_env type_env prog
637→ tool_bindings ~mcp_connections ?on_oitl_request ~agent_name req_fd resp_fd =
638→ let ic = Unix.in_channel_of_descr req_fd in
639→ let oc = Unix.out_channel_of_descr resp_fd in
640→ let mcp_cache = ref [] in
641→ let session_approvals = Hashtbl.create 8 in
642→ let request_counter = ref 0 in
643→ (try
644→ while true do
645→ let line = input_line ic in
646→ let req = Yojson.Safe.from_string line in
647→ let open Yojson.Safe.Util in
648→ let id = req |> member "id" |> to_string in
649→ let name = req |> member "name" |> to_string in
650→ let input = req |> member "input" in
651→ (* OITL: check operator approval before dispatching *)
652→ let approved = match on_oitl_request with
653→ | None -> true
654→ | Some _ when Hashtbl.mem session_approvals name -> true
655→ | Some cb ->
656→ incr request_counter;
657→ let oitl_req = {
658→ oitl_id = Printf.sprintf "tc-%s-%d" agent_name !request_counter;
659→ oitl_tool = name;
660→ oitl_input = input;
661→ oitl_agent = agent_name;
662→ } in
663→ let oitl_resp = cb oitl_req in
664→ if oitl_resp.oitl_approved then begin
665→ (match oitl_resp.oitl_policy with
666→ | Some "always" -> Hashtbl.replace session_approvals name ()
667→ | _ -> ());
668→ true
669→ end else false
670→ in
671→ let resp =
672→ if not approved then
   673→            tool_error_response ~source:name id Denied "tool_denied" "tool call denied by operator"
674→ else if List.exists (fun (b : binding) ->
675→ short_name b.bd_name = name
676→ ) tool_bindings then
677→ dispatch_tool ~tool_call_id:id ~base_dir ~call_pipeline ~search_path tool_env type_env
678→ prog tool_bindings ~mcp_cache id name input
679→ else
680→ dispatch_mcp_tool ~tool_call_id:id mcp_connections id name input
681→ in
682→ output_string oc (Yojson.Safe.to_string resp);
683→ output_char oc '\n';
684→ flush oc
685→ done
686→ with End_of_file -> ());
687→ (* Shut down MCP connections when the dispatch loop ends. *)
688→ List.iter (fun c ->
689→ try Mcp_client.shutdown c
690→ with exn ->
691→ Runtime.Log.warn "mcp_shutdown"
692→ ["error", `String (Printexc.to_string exn)]
693→ ) mcp_connections;
694→ List.iter (fun (_, c) ->
695→ try Mcp_client.shutdown c
696→ with exn ->
697→ Runtime.Log.warn "mcp_shutdown"
698→ ["error", `String (Printexc.to_string exn)]
699→ ) !mcp_cache;
700→ close_in_noerr ic;
701→ close_out_noerr oc
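For reference, the loop above speaks newline-delimited JSON on both fds. A minimal sketch of one exchange — the field names come from the parsing code above; the concrete values are illustrative only:

```ocaml
(* One request line in on req_fd; one response line out on resp_fd.
   Field names ("id", "name", "input") match the parser above;
   the values here are illustrative only. *)
let request_line =
  {|{"id":"tc-1","name":"read_file","input":{"path":"README.md"}}|}

let response_line =
  {|{"id":"tc-1","content":"...","is_error":false}|}

let () =
  (* The framing invariant: exactly one JSON value per line. *)
  assert (not (String.contains request_line '\n'));
  assert (not (String.contains response_line '\n'))
```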
702→{
"pattern": "Prefix_path\\.bin_dir",
"path": "/Users/ww/src/plumbing/lib/plumb",
"output_mode": "content"
}
     1→(** Resource file search path assembly.
2→
3→ Builds the search path for runtime resource files (system prompts,
4→ grammar) from:
5→ 1. [PLUMB_RESOURCES] environment variable (colon-separated directories)
6→ 2. Binary-relative [../share/plumbing/resources] (tarball / install prefix)
7→ 3. Installed resource directories discovered via dune sites *)
8→
9→let search_path () =
10→ let from_env = match Sys.getenv_opt "PLUMB_RESOURCES" with
11→ | None -> []
12→ | Some s ->
13→ String.split_on_char ':' s
14→ |> List.filter (fun s -> s <> "")
15→ in
16→ from_env @ Prefix_path.share_subdir "resources" @ Sites.Sites.resources
17→
18→let find ?(paths = search_path ()) name =
19→ if String.length name = 0 then
20→ Runtime.Error.raise_ Config "config_error" "empty string in prompts array";
21→ match List.find_opt (fun dir ->
22→ Sys.file_exists (Filename.concat dir name)
23→ ) paths with
24→ | Some dir -> Filename.concat dir name
25→ | None ->
26→ let dirs = String.concat ":" paths in
27→ Runtime.Error.raise_ Config "config_error"
28→ (Printf.sprintf "cannot find resource file %s (searched: %s)" name dirs)
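A hypothetical call against an explicit path list, bypassing the env/site discovery (directory names are illustrative):

```ocaml
(* Hypothetical usage of [find] with explicit paths. Raises the
   Config error above, listing both directories in the message,
   if "system.md" is in neither. *)
let prompt_path =
  find ~paths:["/opt/plumbing/resources"; "./resources"] "system.md"
```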
    29→
{
"pattern": "val build_type_env|let build_type_env",
"path": "/Users/ww/src/plumbing/lib/runtime",
"output_mode": "content"
}
     1→(** Installation prefix discovery.
2→
3→ Given a binary at [prefix/bin/plumb], derives
4→ [prefix/share/plumbing/subdir]. Uses {!Unix.realpath} to resolve
5→ symlinks (Nix result links, user symlinks into [/usr/local/bin]). *)
6→
7→let bin_dir =
8→ lazy begin
9→ let exe =
10→ try Unix.realpath Sys.executable_name
11→ with Unix.Unix_error _ | Invalid_argument _ -> Sys.executable_name
12→ in
13→ if Filename.is_relative exe then None
14→ else Some (Filename.dirname exe)
15→ end
16→
17→let share_subdir subdir =
18→ match Lazy.force bin_dir with
19→ | None -> []
20→ | Some dir ->
21→ let prefix = Filename.dirname dir in
22→ let candidate =
23→ Filename.concat prefix
24→ (Filename.concat "share"
25→ (Filename.concat "plumbing" subdir))
26→ in
27→ if Sys.file_exists candidate
28→ && (try Sys.is_directory candidate with Sys_error _ -> false)
29→ then [candidate]
30→ else []
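A minimal sketch of the derivation above, without the symlink resolution and existence checks (pure stdlib, so it can be checked directly):

```ocaml
(* A binary at prefix/bin/plumb maps to prefix/share/plumbing/<subdir>.
   This omits the Unix.realpath and Sys.is_directory checks the real
   code performs. *)
let share_subdir_of_exe exe subdir =
  let bin_dir = Filename.dirname exe in      (* prefix/bin *)
  let prefix = Filename.dirname bin_dir in   (* prefix     *)
  Filename.concat prefix
    (Filename.concat "share" (Filename.concat "plumbing" subdir))

let () =
  assert (share_subdir_of_exe "/usr/local/bin/plumb" "resources"
          = "/usr/local/share/plumbing/resources")
```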
    31→
{
"file_path": "/Users/ww/src/plumbing/lib/plumb/exec.ml",
"offset": 50,
"limit": 50
}
    50→
51→type t = {
52→ pids : (int * string) list;
53→ threads : Thread.t list;
54→ input_oc : out_channel;
55→ output_ic : in_channel;
56→ boundary_fds : (string * Unix.file_descr * Unix.file_descr) list;
57→ (** All boundary ports: (port_name, read_fd, write_fd).
58→ Input ports: runtime writes to write_fd, morphisms read from read_fd.
59→ Output ports: morphisms write to write_fd, runtime reads from read_fd.
60→
61→ TODO: leaky abstraction — raw fds on a public type. Only needed
62→ by recursive N-port Plumb execution. Consider returning a richer
63→ internal type and projecting to a public surface that exposes only
64→ input_oc / output_ic + send_control / shutdown functions. *)
65→ stderr_reader : (Thread.t * Buffer.t) option;
66→ ctrl_write_fds : Unix.file_descr list;
67→ (** TODO: same concern — callers should use send_control_all / shutdown,
68→ not reach into the fd list directly. *)
69→ tmp_files : string list;
70→}
71→
72→let input t = t.input_oc
73→let output t = t.output_ic
74→
75→(** Send a control message to a single agent's control input fd.
76→ Silences EPIPE (agent already exited); logs other errors. *)
77→let send_control fd msg =
78→ let line = Yojson.Safe.to_string msg ^ "\n" in
79→ let b = Bytes.of_string line in
80→ let len = Bytes.length b in
81→ let rec write_all off =
82→ if off < len then
83→ let n = Unix.write fd b off (len - off) in
84→ write_all (off + n)
85→ in
86→ try write_all 0
87→ with
88→ | Unix.Unix_error ((Unix.EPIPE | Unix.EBADF), _, _) -> ()
89→ | Unix.Unix_error (e, fn, _) ->
90→ Runtime.Log.warn "send_control"
91→ [("error", `String (Unix.error_message e));
92→ ("fn", `String fn)]
93→
94→(** Send a control message to all agents in the pipeline. *)
95→let send_control_all t msg =
96→ List.iter (fun fd -> send_control fd msg) t.ctrl_write_fds
97→
98→let release_ctrl t =
    99→  (* Send stop to each agent then close executor's control write-fds.
{
"file_path": "/Users/ww/src/plumbing/lib/runtime/spec.ml",
"offset": 40,
"limit": 60
}
    40→    Requires totality: the process must always produce at least one output. *)
41→let is_lowerable = function
42→ | Agent _ | Plumb _ | Id | Map _ | Copy | Discard -> true
43→ | Filter _ -> false (* not total: may produce 0 outputs *)
44→ | Merge | MergeAny -> false (* requires interleaving *)
45→ | Barrier -> false (* requires synchronisation *)
46→ | Project _ -> false (* requires product stream *)
47→ | Tool _ -> false (* already base: nesting is nonsensical *)
48→ | Lower _ -> false (* already lowered: lowering a lowering is nonsensical *)
49→ | Builtin _ -> false (* already base: OS-level primitive *)
50→ | Tagger _ -> false (* stateful: protocol-boundary counter *)
51→ | Log _ -> false (* compiler-internal: debug tap *)
52→ | FormatJson -> true (* total: one output per input *)
53→ | ParseJson -> true (* total: one output per input *)
54→
55→(** Validate categorical level consistency.
56→ - Tool impl: must have bare types (A -> B)
57→ - Id, Map, Copy, Discard: may have bare or stream types
58→ - All other impls: must have stream types (!A -> !B) *)
59→let has_stream = function
60→ | Stream _ -> true
61→ | Product ts -> List.exists (function Stream _ -> true | _ -> false) ts
62→ | _ -> false
63→
64→let check_categorical_types bindings =
65→ List.iter (fun (b : binding) ->
66→ let is_bare = not (has_stream b.input) && not (has_stream b.output) in
67→ if is_bare && not (admits_bare_type b.impl) then
68→ Error.raise_ Config "config_error"
69→ (Printf.sprintf "%s: %s requires stream types !A -> !B"
70→ b.bd_name (match b.impl with
71→ | Agent _ -> "agent" | Plumb _ -> "plumb"
72→ | Filter _ -> "filter" | Merge | MergeAny -> "merge"
73→ | Barrier -> "barrier" | Project _ -> "project"
74→ | _ -> "this impl kind"));
75→ if not is_bare && (match b.impl with Tool _ | Lower _ | Builtin _ -> true | _ -> false) then
76→ Error.raise_ Config "config_error"
77→ (Printf.sprintf "%s: tool must use bare types A -> B, not streams"
78→ b.bd_name)
79→ ) bindings
80→
81→(** Validate that a named process is lowerable and its types match the
82→ tool binding's types (after stripping streams). *)
83→let check_lower_types b process_name bindings =
84→ match List.find_opt (fun pb -> pb.bd_name = process_name) bindings with
85→ | Some pb ->
86→ if not (is_lowerable pb.impl) then
87→ Error.raise_ Config "config_error"
88→ (Printf.sprintf
89→ "%s: cannot lower %s to a tool (%s is not total)"
90→ b.bd_name process_name process_name);
91→ let strip = function Stream t -> t | t -> t in
92→ let proc_in = strip pb.input in
93→ let proc_out = strip pb.output in
94→ if not (Validate.type_eq [] b.input proc_in) then
95→ Error.raise_ Config "config_error"
96→ (Printf.sprintf
97→ "%s: tool input type %s does not match process %s input %s"
98→ b.bd_name (pp_type_expr [] b.input)
    99→            process_name (pp_type_expr [] proc_in));
{
"pattern": "let build_type_env|build_type_env",
"path": "/Users/ww/src/plumbing",
"output_mode": "content",
"glob": "*.ml"
}<persisted-output> Output too large (44.5KB). Full output saved to: /Users/ww/.claude/projects/-Users-ww-src-plumbing/d438378e-8111-4168-abaf-ad446f02709d/tool-results/toolu_01Q7rTF2Phvd2uDNerS1KqwJ.txt Preview (first 2KB): bin/chat/main.ml:219: let type_env = Language.Validate.build_type_env prog.prog_types in test/test_routing.ml:13: let type_env = Validate.build_type_env prog.prog_types in test/test_routing.ml:103: let type_env = Validate.build_type_env prog.prog_types in test/test_routing.ml:219: let type_env = Validate.build_type_env prog.prog_types in test/test_session_compile.ml:518: let type_env = Validate.build_type_env prog.prog_types in test/test_session_compile.ml:615: let type_env = Validate.build_type_env prog.prog_types in test/test_session_compile.ml:683: let type_env = Validate.build_type_env prog.prog_types in parser.ml:132: let type_env = Validate.build_type_env (List.rev !types) in test/test_fabric.ml:22: try Validate.build_type_env prog.prog_types test/test_session_exec.ml:401: let type_env = Validate.build_type_env prog.prog_types in test/test_validate.ml:91: let env = build_type_env [ test/test_validate.ml:107: let _ = build_type_env [ test/test_validate.ml:148: let env = build_type_env [ test/test_validate.ml:171: let env = build_type_env [] in test/test_validate.ml:178: let env = build_type_env [] in test/test_validate.ml:188: let env = build_type_env [] in test/test_validate.ml:200: let env = build_type_env [] in test/test_validate.ml:210: let env = build_type_env [] in test/test_validate.ml:216: let env = build_type_env [] in test/test_validate.ml:227: let env = build_type_env [] in test/test_validate.ml:242: let env = build_type_env [] in test/test_validate.ml:254: let env = build_type_env [ test/test_validate.ml:269: let env = build_type_env [] in test/test_validate.ml:281: let env = build_type_env [] in test/test_validate.ml:299: let env = build_type_env [ test/test_validate.ml:308: let env = build_type_env [] in 
test/test_validate.ml:317: let env = build_type_env [] in test/test_validate.ml:325: let env = build_type_env [ test/test_validate.ml:360: let env = build_type_env [] in ... </persisted-output>
{
"file_path": "/Users/ww/src/plumbing/lib/runtime/spec.ml",
"offset": 98,
"limit": 60
}
    98→          b.bd_name (pp_type_expr [] b.input)
99→ process_name (pp_type_expr [] proc_in));
100→ if not (Validate.type_eq [] b.output proc_out) then
101→ Error.raise_ Config "config_error"
102→ (Printf.sprintf
103→ "%s: tool output type %s does not match process %s output %s"
104→ b.bd_name (pp_type_expr [] b.output)
105→ process_name (pp_type_expr [] proc_out))
106→ | None -> () (* process not in scope here; runtime will catch it *)
107→
108→let check_tool_process_types bindings =
109→ List.iter (fun (b : binding) ->
110→ match b.impl with
111→ | Lower process_name ->
112→ check_lower_types b process_name bindings
113→ | Tool config ->
114→ (match List.find_map (fun (k, v) ->
115→ if k = "process" then
116→ (match v with Ident name -> Some name | _ -> None)
117→ else None
118→ ) config with
119→ | Some process_name -> check_lower_types b process_name bindings
120→ | None ->
121→ (* Check MCP tool config: if mcp is present, name is required *)
122→ let has_mcp = List.exists (fun (k, _) -> k = "mcp") config in
123→ let has_name = List.exists (fun (k, _) -> k = "name") config in
124→ if has_mcp && not has_name then
125→ raise (Error.Process_error (Error.make Config "config_error"
126→ (Printf.sprintf "%s: tool with 'mcp' config requires 'name' key"
127→ b.bd_name)))
128→ )
129→ | _ -> ()
130→ ) bindings
131→
132→(** Validate filter/map expression types at declaration site.
133→ Inline filter/map in wiring bodies are checked in the desugarer;
134→ top-level bindings bypass it so need an explicit check here. *)
135→let check_expr_types type_env bindings =
136→ List.iter (fun (b : binding) ->
137→ let strip_stream = function Stream t -> t | t -> t in
138→ match b.impl with
139→ | Filter expr ->
140→ (* Filter requires stream types — check_categorical_types
141→ guarantees this, so unwrap_stream is safe. *)
142→ let inp_t = Validate.unwrap_stream b.input in
143→ (* Filter is an endomorphism: input = output. *)
144→ let out_t = Validate.unwrap_stream b.output in
145→ if not (Validate.type_eq type_env inp_t out_t) then
146→ Error.raise_ Config "config_error"
147→ (Printf.sprintf
148→ "%s: filter must have equal input and output types, \
149→ got %s -> %s"
150→ b.bd_name (pp_type_expr type_env inp_t)
151→ (pp_type_expr type_env out_t));
152→ let result_t =
153→ (try Validate.infer_type type_env inp_t expr
154→ with Validate.Schema_error msg ->
155→ Error.raise_ Config "config_error"
156→ (Printf.sprintf "%s: filter type error: %s"
   157→                 b.bd_name msg))
Now let me look at validate_program and load_main_from_file to understand the full loading pipeline.
{
"pattern": "let validate_program|let load_main_from_file",
"path": "/Users/ww/src/plumbing/lib/runtime/spec.ml",
"output_mode": "content",
"-A": 20
}
276:let validate_program prog =
277-  let type_env =
278-    try Validate.build_type_env prog.prog_types
279-    with Validate.Schema_error msg ->
280-      raise (Error.Process_error (Error.make Config "config_error" msg))
281-  in
282-  (try
283-    List.iter (fun td -> Validate.check_sums type_env td.td_type) prog.prog_types;
284-    List.iter (fun b ->
285-      Validate.check_sums type_env b.input;
286-      Validate.check_sums type_env b.output
287-    ) prog.prog_bindings
288-  with Validate.Schema_error msg ->
289-    raise (Error.Process_error (Error.make Config "config_error" msg)));
290-  check_reserved_names prog.prog_bindings;
291-  check_categorical_types prog.prog_bindings;
292-  check_tool_process_types prog.prog_bindings;
293-  check_expr_types type_env prog.prog_bindings;
294-  (prog, type_env)
295-
296-(** Load a plumb program from a file path, resolving [use] declarations,
--
298:let load_main_from_file ?(search_path = []) path =
299-  let prog =
300-    try Language.Loader.load_file ~search_path path
301-    with Language.Loader.Load_error msg ->
302-      raise (Error.Process_error (Error.make Config "config_error" msg))
303-  in
304-  let prog, type_env = validate_program prog in
305-  let prog = desugar_plumbs type_env prog in
306-  let main = find_main prog in
307-  (prog, main)
308-
309-(** Load a plumb program from source text, resolving [use] declarations,
310-    and find its [main] binding. *)
311-let load_main_from_string ?(search_path = []) ?(importing_dir = Sys.getcwd ())
312-    ?source_name source =
313-  let prog =
314-    try Language.Loader.load_string ~search_path ~importing_dir
315-        ?source_name source
316-    with Language.Loader.Load_error msg ->
317-      raise (Error.Process_error (Error.make Config "config_error" msg))
318-  in
Good, load_main_from_file returns (prog, main) but doesn't return type_env. I need to rebuild it. Let me also check the find_main function and what it returns.
{
"pattern": "let find_main",
"path": "/Users/ww/src/plumbing/lib/runtime/spec.ml",
"output_mode": "content",
"-A": 20
}
205:let find_main prog =
206-  match List.find_opt (fun b -> b.bd_name = "main") prog.prog_bindings with
207-  | Some b -> b
208-  | None -> Error.raise_ Config "config_error" "no main binding"
209-
210-(** Load a plumb program and find its [main] binding. *)
211-let load_main source =
212-  let prog =
213-    try parse source
214-    with Parse_error msg ->
215-      raise (Error.Process_error (Error.make Config "config_error" msg))
216-  in
217-  let type_env =
218-    try Validate.build_type_env prog.prog_types
219-    with Validate.Schema_error msg ->
220-      raise (Error.Process_error (Error.make Config "config_error" msg))
221-  in
222-  (* Check disjointness of sum types in declarations and bindings *)
223-  (try
224-    List.iter (fun td -> Validate.check_sums type_env td.td_type) prog.prog_types;
225-    List.iter (fun b ->
{
"pattern": "spawn_subprocess",
"path": "/Users/ww/src/plumbing/lib/spawn/spawn_lib.ml",
"output_mode": "content",
"-A": 30
}137:let spawn_subprocess ~bin_path ~spec_path
138- ~stdin_fd ~stdout_fd ?stderr_fd
139- ?(env=[]) ?cwd () =
140- let env_array = build_env env in
141- let pid = Unix.fork () in
142- if pid = 0 then begin
143- (* Child process: wire fds and exec. Use _exit (not exit) on all
144- failure paths to avoid running at_exit handlers in the child. *)
145- (match cwd with
146- | Some d ->
147- (try Unix.chdir d
148- with Unix.Unix_error (err, _, _) ->
149: Runtime.Log.error "spawn_subprocess"
150- ["event", `String "chdir_failed";
151- "path", `String d;
152- "detail", `String (Unix.error_message err)];
153- Unix._exit 2)
154- | None -> ());
155- Unix.dup2 stdin_fd Unix.stdin;
156- Unix.close stdin_fd;
157- Unix.dup2 stdout_fd Unix.stdout;
158- Unix.close stdout_fd;
159- (match stderr_fd with
160- | Some fd -> Unix.dup2 fd Unix.stderr; Unix.close fd
161- | None -> ());
162- (try Unix.execve bin_path [| bin_path; spec_path |] env_array
163- with exn ->
164: Runtime.Log.error "spawn_subprocess"
165- ["event", `String "execve_failed";
166- "binary", `String bin_path;
167- "detail", `String (Printexc.to_string exn)];
168- Unix._exit 2)
169- end else begin
170- Unix.close stdin_fd;
171- Unix.close stdout_fd;
172- (match stderr_fd with Some fd -> Unix.close fd | None -> ());
173- pid
174- end
175-
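A hypothetical caller sketch for the fd handoff convention above — spawn_subprocess takes ownership of the child-side fds (the parent branch closes them after forking), so the caller keeps only the far end of each pipe:

```ocaml
(* Hypothetical caller sketch; spawn_with_pipes is not part of the
   real API. The parent ends (in_w, out_r) are what an executor
   would wrap in channels. *)
let spawn_with_pipes ~bin_path ~spec_path =
  let in_r, in_w = Unix.pipe () in    (* parent writes in_w, child reads stdin  *)
  let out_r, out_w = Unix.pipe () in  (* child writes stdout, parent reads out_r *)
  let pid =
    spawn_subprocess ~bin_path ~spec_path
      ~stdin_fd:in_r ~stdout_fd:out_w ()
  in
  (pid, in_w, out_r)
```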
176-(** {1 Type Dependency Collection} *)
177-
178-module StringSet = Set.Make(String)
179-
180-(** Collect all Named type references from a type expression. *)
181-let rec collect_named acc = function
182- | Named name -> StringSet.add name acc
183- | Record fields -> List.fold_left (fun a f -> collect_named a f.typ) acc fields
184- | List t -> collect_named acc t
185- | Stream t -> collect_named acc t
186- | Sum variants -> List.fold_left collect_named acc variants
187- | Product components -> List.fold_left collect_named acc components
188- | Primitive _ | Unit -> acc
189-
190-(** Find the type declarations a binding depends on, transitively.
191- Walks the binding's input/output types, then chases Named references
192- through type declarations until reaching a fixed point. *)
193-let required_types (prog : program) (binding : binding) =
194- let all_type_decls = prog.prog_types ingit worktree add /tmp/plmb-037-step5b plmb-037-step5b 2>/dev/null || (git branch plmb-037-step5b main 2>/dev/null; git worktree add /tmp/plmb-037-step5b plmb-037-step5b)
Preparing worktree (checking out 'plmb-037-step5b') HEAD is now at f09eec8 Merge plmb-037-step5a: agent subprocess bridge (Step 5a)
Good. Now let me implement the changes. I'll start with all the files in parallel since I understand the full picture.
(executable
(name main)
(public_name plumb)
(libraries plumb fabric language runtime spawn_lib
eio_main eio.unix unix threads.posix yojson))
<tool_use_error>File has not been read yet. Read it first before writing to it.</tool_use_error>
(** Plumb runtime — fabric-based executor for plumb programs.
Parses a plumb program, compiles a routing table, starts the ZMQ
fabric, and bridges boundary ports to stdin/stdout. The binary is
a thin I/O adapter — all process orchestration happens in the
fabric via ZMQ sockets. *)
open Language
let () =
(* Default to Info; PIPELINE_DEBUG=1 wins (already set to Debug in Log
module initialiser). --verbose lowers to Debug explicitly. *)
let verbose = Array.exists (fun s -> s = "--verbose") Sys.argv in
if verbose then Runtime.Log.set_level Runtime.Log.Debug
else if Sys.getenv_opt "PIPELINE_DEBUG" <> Some "1" then
Runtime.Log.set_level Runtime.Log.Info;
(* Extract --provider/--model/--auto-approve from argv and set env vars.
Entry.main strips these from argv before extracting the spec path. *)
(match Runtime.Entry.resolve_flag Sys.argv "--provider" with
| Some p -> Unix.putenv "PLUMB_PROVIDER" p | None -> ());
(match Runtime.Entry.resolve_flag Sys.argv "--model" with
| Some m -> Unix.putenv "PLUMB_MODEL" m | None -> ());
let auto_approve = Array.exists (fun s -> s = "--auto-approve") Sys.argv in
let docroot = Runtime.Entry.resolve_docroot Sys.argv in
let search_path = Plumb.Stdlib_path.search_path () in
let tool_env = { Plumb.Tool_dispatch.docroot } in
  (* Signal handling: install a SIGINT handler instead of
     Sys.catch_break — Sys.Break raised at an arbitrary allocation
     point inside Eio_main.run can skip cleanup. If a fabric is
     running, the handler calls fabric.cancel () (safe from a signal
     handler: it only closes ZMQ sockets via C FFI) and the reader
     loop exits on the resulting Transport_error. Before the fabric
     starts, fall back to raising Sys.Break so Entry.main exits with
     code 130. *)
let fabric_ref : Fabric.t option ref = ref None in
Sys.set_signal Sys.sigint (Sys.Signal_handle (fun _signo ->
match !fabric_ref with
| Some f -> f.Fabric.cancel ()
| None -> raise Sys.Break));
Runtime.Entry.main
~usage:"Usage: plumb [--verbose] [--docroot DIR] [--provider NAME] \
[--model NAME] [--auto-approve] <spec.plumb>"
~run:(fun spec_path ->
let prog, main = Runtime.Spec.load_main_from_file ~search_path spec_path in
let on_oitl_request =
if auto_approve then
fun (_req : Plumb.Tool_dispatch.oitl_request) ->
Plumb.Tool_dispatch.{ oitl_approved = true; oitl_policy = Some "always" }
else
fun (req : Plumb.Tool_dispatch.oitl_request) ->
Printf.eprintf "plumb: denied tool call '%s' by agent '%s' \
(use --auto-approve)\n%!" req.oitl_tool req.oitl_agent;
Plumb.Tool_dispatch.{ oitl_approved = false; oitl_policy = None }
in
(* Extract plumb body from main binding *)
let ports, pb = match main.impl with
| Plumb (ports, pb) -> (ports, pb)
| _ -> Runtime.Error.raise_ Config "config_error"
"main binding must be a plumb"
in
let type_env = Validate.build_type_env prog.prog_types in
let routing_table = Runtime.Routing.compile ~prog ~ports ~pb
~type_env ~main_binding:main ~seeded:pb.pb_seeded in
Runtime.Log.debug "plumb"
["event", `String "routing_table";
"detail", `String (Runtime.Routing.pp_routing_table routing_table)];
(* Build the bridge process handler for agent subprocesses *)
let handler = Fabric.Bridge.make_handler
~prog ~type_env
~bin_dir:(Lazy.force Plumb.Prefix_path.bin_dir)
~resource_find:Plumb.Resource_path.find
~search_path
~tool_env
~base_dir:(Filename.dirname spec_path)
~on_oitl_request
() in
(* Determine boundary port names from the main binding's port list.
Convention: first n_in ports are inputs, rest are outputs. *)
let n_in = List.length
(Validate.port_element_types type_env main.input) in
let input_port = List.nth ports 0 in
let output_port = List.nth ports n_in in
(* Run the fabric under Eio *)
let fabric_handle = ref None in
Eio_main.run (fun _env ->
Eio.Switch.run (fun sw ->
let f = Fabric.execute ~sw ~routing_table ~type_env
~process_handler:handler () in
fabric_handle := Some f;
fabric_ref := Some f;
(* Writer daemon: stdin → boundary input socket *)
let push = Fabric.boundary_input f input_port in
Eio.Fiber.fork_daemon ~sw (fun () ->
(try while true do
let line = Eio_unix.run_in_systhread (fun () ->
input_line stdin) in
Transport.Zmq.send push ~topic:input_port line
done with End_of_file | Sys_error _ -> ());
Fabric.Bridge.send_eof push ~topic:input_port;
`Stop_daemon);
(* Reader: boundary output socket → stdout (main fibre) *)
let pull = Fabric.boundary_output f output_port in
(try while true do
let (_topic, payload) = Transport.Zmq.recv pull in
if Fabric.Bridge.is_eof payload then raise Exit;
Eio_unix.run_in_systhread (fun () ->
print_endline payload)
done with
| Exit -> ()
| Transport.Transport_error _ -> ());
(* Reader exited — cancel the fabric to unwind all fibres *)
f.cancel ()));
(* Outside switch scope — safe to call shutdown (may sleep) *)
match !fabric_handle with
| Some f -> Fabric.shutdown f
| None -> ())
<tool_use_error>File has not been read yet. Read it first before writing to it.</tool_use_error>
1→(** Plumb runtime — general-purpose executor for plumb programs.
2→
3→ Parses a plumb program, finds the main binding, creates channels,
4→ spawns processes, wires boundary ports, and forwards stdin/stdout
5→ through the pipeline. Concurrent streaming forwarding — handles
6→ arbitrary n:m message ratios (copy, merge, etc.). *)
7→
8→let () =
9→ (* Default to Info; PIPELINE_DEBUG=1 wins (already set to Debug in Log
10→ module initialiser). --verbose lowers to Debug explicitly. *)
11→ let verbose = Array.exists (fun s -> s = "--verbose") Sys.argv in
12→ if verbose then Runtime.Log.set_level Runtime.Log.Debug
13→ else if Sys.getenv_opt "PIPELINE_DEBUG" <> Some "1" then
14→ Runtime.Log.set_level Runtime.Log.Info;
15→ (* Extract --provider/--model/--auto-approve from argv and set env vars.
16→ Entry.main strips these from argv before extracting the spec path. *)
17→ (match Runtime.Entry.resolve_flag Sys.argv "--provider" with
18→ | Some p -> Unix.putenv "PLUMB_PROVIDER" p | None -> ());
19→ (match Runtime.Entry.resolve_flag Sys.argv "--model" with
20→ | Some m -> Unix.putenv "PLUMB_MODEL" m | None -> ());
21→ let auto_approve = Array.exists (fun s -> s = "--auto-approve") Sys.argv in
22→ let docroot = Runtime.Entry.resolve_docroot Sys.argv in
23→ let search_path = Plumb.Stdlib_path.search_path () in
24→ let tool_env = { Plumb.Exec.docroot } in
25→ Sys.catch_break true;
26→ Runtime.Entry.main
27→ ~usage:"Usage: plumb [--verbose] [--docroot DIR] [--provider NAME] [--model NAME] [--auto-approve] <spec.plumb>"
28→ ~run:(fun spec_path ->
29→ let prog, main = Runtime.Spec.load_main_from_file ~search_path spec_path in
30→ let on_oitl_request =
31→ if auto_approve then
32→ fun (_req : Plumb.Exec.oitl_request) ->
33→ Plumb.Exec.{ oitl_approved = true; oitl_policy = Some "always" }
34→ else
35→ fun (req : Plumb.Exec.oitl_request) ->
36→ Printf.eprintf "plumb: denied tool call '%s' by agent '%s' \
37→ (use --auto-approve)\n%!" req.oitl_tool req.oitl_agent;
38→ Plumb.Exec.{ oitl_approved = false; oitl_policy = None }
39→ in
40→ let running = Plumb.Exec.execute ~search_path ~tool_env
41→ ~on_oitl_request prog main spec_path in
42→ let oc = Plumb.Exec.input running in
43→ let ic = Plumb.Exec.output running in
44→ (* Writer thread: stdin → pipeline input.
45→ release_ctrl sends stop to agents and closes ctrl fds.
46→ On normal EOF the executor's mux thread does this too
47→ (after data processing) — double-close is idempotent.
48→ On Ctrl-C, release_ctrl is the only path that sends stop. *)
49→ let writer = Thread.create (fun () ->
50→ (try while true do
51→ let line = input_line stdin in
52→ Printf.fprintf oc "%s\n%!" line
53→ done with _ -> ());
54→ close_out_noerr oc;
55→ Plumb.Exec.release_ctrl running
56→ ) () in
57→ (* Main thread: pipeline output → stdout *)
58→ (try while true do
59→ let resp = input_line ic in
60→ print_endline resp
61→ done with
62→ | End_of_file -> ()
63→ | Sys.Break -> close_in_noerr stdin);
64→ Thread.join writer;
65→ ignore (Plumb.Exec.shutdown running))
    66→
     1→(executable
2→ (name main)
3→ (public_name plumb)
4→ (libraries plumb language runtime unix threads.posix))
     5→
     1→(** Bridge — stdio mux/demux adapter for agent subprocesses.
2→
3→ Natural transformation between ZMQ topic-framed sockets and the
4→ agent binary's port-multiplexed stdin/stdout protocol. The agent
5→ sees the same [{"__port":...,"msg":...}] envelopes as under exec.ml.
6→
7→ All pipe writes use [Mutex] + [Eio_unix.run_in_systhread] (R2).
8→
9→ The [__port] envelope protocol is confined to this module and the
10→ agent binary — it must never appear in [fabric.ml] or [inline.ml]
11→ (R7: the bridge is NOT a morphism, it is a natural transformation
12→ between two concrete stream representations). *)
13→
14→open Language
15→
16→module T = Transport.Zmq
17→
18→(** {1 EOF Sentinel}
19→
20→ An empty-payload message on a PUSH socket signals end-of-stream.
21→ The receiver exits its recv loop on seeing this sentinel. *)
22→
23→let send_eof sock ~topic =
24→ T.send sock ~topic ""
25→
26→let is_eof payload =
27→ String.length payload = 0
28→
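The sentinel convention can be exercised directly:

```ocaml
(* An empty payload is EOF; any non-empty payload is data. *)
let () =
  assert (is_eof "");
  assert (not (is_eof "{\"msg\":1}"))
```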
29→(** {1 Process Handler Type}
30→
31→ The fabric calls a process handler for each subprocess entry.
32→ The handler receives the process's ZMQ socket endpoints and starts
33→ the subprocess. The fabric does not know how. *)
34→
35→type process_handler =
36→ sw:Eio.Switch.t ->
37→ proc_entry:Runtime.Routing.process_entry ->
38→ push_sockets:(string * [`Push] T.socket) list ->
39→ pull_sockets:(string * [`Pull] T.socket) list ->
40→ unit
41→
42→(** {1 Tool Dispatch Callback}
43→
44→ The bridge accepts an optional tool dispatch callback rather than
45→ depending on [Tool_dispatch] directly. The caller constructs
46→ this from [Tool_dispatch.tool_dispatch_loop]. *)
47→
48→type tool_dispatch =
49→ req_fd:Unix.file_descr ->
50→ resp_fd:Unix.file_descr ->
51→ agent_name:string ->
52→ unit
53→
54→(** {1 Observation Callback} *)
55→
56→type observation = {
57→ agent : string;
58→ payload : Yojson.Safe.t;
59→}
60→
61→(** {1 Pipe Write Helpers}
62→
63→ All writes to child stdin go through mux_envelope / mux_eof,
64→ protected by an OCaml [Mutex]. These MUST be called from within
65→ [Eio_unix.run_in_systhread] — [Mutex.lock] is not safe from
66→ Eio fibres. *)
67→
68→(** Write a port-tagged envelope to the child stdin.
69→ Caller must hold (or be inside run_in_systhread for) the mux mutex. *)
70→let write_envelope mux_m oc port json_str =
71→ match Yojson.Safe.from_string json_str with
72→ | _ ->
73→ let s = Printf.sprintf
74→ "{\"__port\":%S,\"msg\":%s}\n" port json_str in
75→ Mutex.lock mux_m;
76→ Fun.protect ~finally:(fun () -> Mutex.unlock mux_m) (fun () ->
77→ output_string oc s; flush oc)
78→ | exception Yojson.Json_error msg ->
79→ Runtime.Log.warn "bridge_mux"
80→ [("port", `String port); ("error", `String msg)]
81→
82→(** Write an EOF marker for a port. *)
83→let write_eof mux_m oc port =
84→ let s = Printf.sprintf
85→ "{\"__port\":%S,\"__eof\":true}\n" port in
86→ Mutex.lock mux_m;
87→ Fun.protect ~finally:(fun () -> Mutex.unlock mux_m) (fun () ->
88→ output_string oc s; flush oc)
89→
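The exact byte strings the two writers above emit, rendered without the channel/mutex plumbing ("telemetry" is an illustrative port name):

```ocaml
(* %S quotes and escapes the port name; the payload is spliced in
   verbatim with %s. Each envelope is one newline-terminated line. *)
let () =
  let env =
    Printf.sprintf "{\"__port\":%S,\"msg\":%s}\n" "telemetry" "{\"tok\":3}" in
  assert (env = "{\"__port\":\"telemetry\",\"msg\":{\"tok\":3}}\n");
  let eof = Printf.sprintf "{\"__port\":%S,\"__eof\":true}\n" "telemetry" in
  assert (eof = "{\"__port\":\"telemetry\",\"__eof\":true}\n")
```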
90→(** {1 Bridge Construction} *)
91→
92→(** Construct a process handler that spawns agent subprocesses with
93→ stdio bridge. *)
94→let make_handler
95→ ~(prog : program)
96→ ~type_env
97→ ~(bin_dir : string option)
98→ ~(resource_find : string -> string)
99→ ?(resolve_bin : (string -> string) option)
100→ ?(tool_dispatch_fn : tool_dispatch option)
101→ ?(on_observe : (observation -> unit) option)
102→ ()
103→ : process_handler =
104→ fun ~sw ~proc_entry ~push_sockets ~pull_sockets ->
105→ let pe = proc_entry in
106→ let binding = pe.Runtime.Routing.proc_binding in
107→ let config = match binding.impl with
108→ | Agent config -> config
109→ | _ -> failwith "bridge: expected Agent impl"
110→ in
111→
112→ (* Resolve prompt paths *)
113→ let config = Spawn_lib.resolve_prompts_config
114→ ~resource_find ~base_dir:(Sys.getcwd ()) config in
115→ let binding = { binding with impl = Agent config } in
116→
117→ (* Decompose input/output types *)
118→ let decomp_in = Spawn_lib.agent_input_decomposition type_env binding.input in
119→ let decomp_out = Spawn_lib.agent_output_decomposition type_env binding.output in
120→ let has_control = decomp_in <> None in
121→ let has_telemetry = decomp_out <> None in
122→
123→ (* Synthesize child spec with simple types (no products) *)
124→ let child_binding =
125→ let b = match decomp_in with
126→ | Some (data_t, _) -> { binding with input = Stream data_t }
127→ | None -> binding
128→ in
129→ match decomp_out with
130→ | Some (result_t, _) -> { b with output = Stream result_t }
131→ | None -> b
132→ in
133→
134→ (* Find tool bindings *)
135→ let tool_names = match List.assoc_opt "tools" config with
136→ | Some (Arr items) ->
137→ List.filter_map (fun item ->
138→ match item with Ident name -> Some name | _ -> None
139→ ) items
140→ | _ -> []
141→ in
142→ let tool_bindings = List.filter_map (fun name ->
143→ List.find_opt (fun (b : binding) ->
144→ b.bd_name = name && Validate.is_tool_binding b
145→ ) prog.prog_bindings
146→ ) tool_names in
147→ let has_tools = tool_bindings <> [] || tool_dispatch_fn <> None in
148→
149→ (* Synthesize spec *)
150→ let spec_source =
151→ if tool_bindings <> [] then
152→ Spawn_lib.synthesize_agent_spec prog child_binding tool_bindings
153→ else
154→ Spawn_lib.synthesize_spec prog child_binding
155→ in
156→ let spec_path = Spawn_lib.write_temp_spec spec_source in
157→
158→ (* Resolve binary *)
159→ let resolve = match resolve_bin with
160→ | Some f -> f
161→ | None -> Spawn_lib.resolve_binary ~bin_dir
162→ in
163→ let bin_path = resolve "plumb-agent" in
164→
165→ (* Create child stdin/stdout pipes *)
166→ let child_in_r, child_in_w = Unix.pipe ~cloexec:true () in
167→ let child_out_r, child_out_w = Unix.pipe ~cloexec:true () in
168→ let child_in_oc = Unix.out_channel_of_descr child_in_w in
169→
170→ (* MCP connections and env *)
171→ let mcp_connections = match List.assoc_opt "mcp" config with
172→ | Some (Arr items) ->
173→ List.filter_map (fun item ->
174→ match item with
175→ | Obj kvs ->
176→ (match Spawn_lib.connect_mcp_server kvs with
177→ | Ok conn -> Some conn
178→ | Error msg ->
179→ Runtime.Log.warn "mcp_connect_failed"
180→ [("detail", `String msg)];
181→ None)
182→ | _ -> None
183→ ) items
184→ | _ -> []
185→ in
186→ let mcp_env = if mcp_connections <> [] then begin
187→ let mcp_schemas = List.concat_map (fun conn ->
188→ List.map Mcp_client.tool_schema (Mcp_client.tools conn)
189→ ) mcp_connections in
190→ let mcp_path = Filename.temp_file "agentpi_mcp_" ".json" in
191→ let mcp_oc = open_out_gen
192→ [Open_wronly; Open_creat; Open_trunc] 0o600 mcp_path in
193→ Fun.protect ~finally:(fun () -> close_out mcp_oc) (fun () ->
194→ output_string mcp_oc
195→ (Yojson.Safe.to_string (`List mcp_schemas)));
196→ [("PLUMB_MCP_TOOLS", mcp_path)]
197→ end else [] in
198→
199→ (* Agent children need LLM credentials, docroot, and name *)
200→ let env = Spawn_lib.credential_env () @ Spawn_lib.provider_env ()
201→ @ [("PLUMB_AGENT_NAME", binding.bd_name)]
202→ @ mcp_env in
203→
204→ (* Spawn child — closes child_in_r and child_out_w in parent *)
205→ let pid = Spawn_lib.spawn_subprocess ~bin_path ~spec_path
206→ ~stdin_fd:child_in_r ~stdout_fd:child_out_w
207→ ~env () in
208→ Runtime.Log.info "bridge"
209→ ["event", `String "spawned";
210→ "agent", `String binding.bd_name;
211→ "pid", `Int pid;
212→ "has_control", `Bool has_control;
213→ "has_telemetry", `Bool has_telemetry;
214→ "has_tools", `Bool has_tools];
215→
216→ (* Mux: OCaml Mutex for system-thread safety (R2) *)
217→ let mux_m = Mutex.create () in
218→
219→ (* Track active mux sources; close child stdin when last exits *)
220→ let n_sources = 2 + (if has_tools && tool_dispatch_fn <> None then 1 else 0) in
221→ let mux_remaining = ref n_sources in
222→ let mux_rem_m = Mutex.create () in
223→ let mux_source_done () =
224→ Mutex.lock mux_rem_m;
225→ decr mux_remaining;
226→ let n = !mux_remaining in
227→ Mutex.unlock mux_rem_m;
228→ if n = 0 then close_out_noerr child_in_oc
229→ in
230→
231→ (* Find sockets by port name *)
232→ let find_pull port = List.assoc_opt port pull_sockets in
233→ let find_push port = List.assoc_opt port push_sockets in
234→
235→ (* === Tool dispatch setup === *)
236→ let tool_req_oc = ref None in
237→ let tool_resp_r = ref None in
238→ (match tool_dispatch_fn with
239→ | Some dispatch when has_tools ->
240→ let req_r, req_w = Unix.pipe ~cloexec:true () in
241→ let resp_r', resp_w = Unix.pipe ~cloexec:true () in
242→ tool_req_oc := Some (Unix.out_channel_of_descr req_w);
243→ tool_resp_r := Some resp_r';
244→ let _dispatch_thread = Thread.create (fun () ->
245→ dispatch ~req_fd:req_r ~resp_fd:resp_w
246→ ~agent_name:binding.bd_name
247→ ) () in
248→ ()
249→ | _ -> ());
250→
251→ (* === Mux: data input (ZMQ PULL → child stdin) ===
252→
253→ T.recv runs in Eio fibre context (uses Eio_unix.await_readable).
254→ Pipe writes use Eio_unix.run_in_systhread + Mutex. *)
255→ (match find_pull "data_in" with
256→ | Some data_in_sock ->
257→ Eio.Fiber.fork_daemon ~sw (fun () ->
258→ (try
259→ while true do
260→ let (_topic, payload) = T.recv data_in_sock in
261→ if is_eof payload then begin
262→ Eio_unix.run_in_systhread (fun () ->
263→ (try write_eof mux_m child_in_oc "input"
264→ with Sys_error _ -> ()));
265→ raise Exit
266→ end;
267→ Eio_unix.run_in_systhread (fun () ->
268→ try write_envelope mux_m child_in_oc "input" payload
269→ with Sys_error _ -> raise Exit)
270→ done
271→ with
272→ | Exit -> ()
273→ | Transport.Transport_error (_, _msg) -> ());
274→ (* After data EOF, send ctrl stop (R4) *)
275→ if has_control then
276→ Eio_unix.run_in_systhread (fun () ->
277→ try
278→ let s = "{\"__port\":\"ctrl_in\",\"msg\":{\"stop\":true}}\n" in
279→ Mutex.lock mux_m;
280→ Fun.protect ~finally:(fun () -> Mutex.unlock mux_m) (fun () ->
281→ output_string child_in_oc s; flush child_in_oc)
282→ with Sys_error _ -> ());
283→ Eio_unix.run_in_systhread (fun () -> mux_source_done ());
284→ `Stop_daemon)
285→ | None ->
286→ (* No data input socket — mux source done immediately *)
287→ mux_source_done ());
288→
289→ (* === Mux: control input (ZMQ PULL → child stdin) === *)
290→ (match find_pull "ctrl_in" with
291→ | Some ctrl_in_sock ->
292→ Eio.Fiber.fork_daemon ~sw (fun () ->
293→ (try
294→ while true do
295→ let (_topic, payload) = T.recv ctrl_in_sock in
296→ if is_eof payload then begin
297→ Eio_unix.run_in_systhread (fun () ->
298→ (try write_eof mux_m child_in_oc "ctrl_in"
299→ with Sys_error _ -> ()));
300→ raise Exit
301→ end;
302→ Eio_unix.run_in_systhread (fun () ->
303→ try write_envelope mux_m child_in_oc "ctrl_in" payload
304→ with Sys_error _ -> raise Exit)
305→ done
306→ with
307→ | Exit -> ()
308→ | Transport.Transport_error (_, _msg) -> ());
309→ Eio_unix.run_in_systhread (fun () -> mux_source_done ());
310→ `Stop_daemon)
311→ | None ->
312→ (* No control socket — mux source done immediately *)
313→ mux_source_done ());
314→
315→ (* === Mux: tool responses (pipe → child stdin) === *)
316→ (match !tool_resp_r with
317→ | Some resp_r' ->
318→ Eio.Fiber.fork_daemon ~sw (fun () ->
319→ Eio_unix.run_in_systhread (fun () ->
320→ let ic = Unix.in_channel_of_descr resp_r' in
321→ (try while true do
322→ let line = input_line ic in
323→ write_envelope mux_m child_in_oc "tool_resp" line
324→ done with End_of_file | Sys_error _ -> ());
325→ close_in_noerr ic;
326→ mux_source_done ());
327→ `Stop_daemon)
328→ | None -> ());
329→
330→ (* === Demux: child stdout → ZMQ PUSH sockets ===
331→
332→ Pipe reads use Eio_unix.run_in_systhread (blocking input_line).
333→ ZMQ sends run in Eio fibre context. *)
334→ let data_out_sock = find_push "data_out" in
335→ let ctrl_out_sock = find_push "ctrl_out" in
336→ let telemetry_sock = find_push "telemetry" in
337→
338→ Eio.Fiber.fork_daemon ~sw (fun () ->
339→ let ic = Unix.in_channel_of_descr child_out_r in
340→ (try while true do
341→ let line = Eio_unix.run_in_systhread (fun () ->
342→ input_line ic) in
343→ if String.length (String.trim line) > 0 then
344→ match Yojson.Safe.from_string line with
345→ | `Assoc fields ->
346→ let port = match List.assoc_opt "__port" fields with
347→ | Some (`String s) -> s | _ -> "output" in
348→ let msg = match List.assoc_opt "msg" fields with
349→ | Some m -> Yojson.Safe.to_string m
350→ | None -> line in
351→ (match port with
352→ | "output" ->
353→ Option.iter (fun sock ->
354→ T.send sock ~topic:binding.bd_name msg
355→ ) data_out_sock
356→ | "ctrl_out" ->
357→ Option.iter (fun sock ->
358→ T.send sock ~topic:binding.bd_name msg
359→ ) ctrl_out_sock
360→ | "telemetry" ->
361→ Option.iter (fun sock ->
362→ T.send sock ~topic:binding.bd_name msg
363→ ) telemetry_sock;
364→ (match on_observe with
365→ | Some cb ->
366→ (try
367→ let json = Yojson.Safe.from_string msg in
368→ cb { agent = binding.bd_name; payload = json }
369→ with _ -> ())
370→ | None -> ())
371→ | "tool_req" ->
372→ Eio_unix.run_in_systhread (fun () ->
373→ match !tool_req_oc with
374→ | Some oc ->
375→ output_string oc msg;
376→ output_char oc '\n';
377→ flush oc
378→ | None ->
379→ Runtime.Log.warn "bridge_demux"
380→ [("event", `String "tool_req_no_dispatch");
381→ ("agent", `String binding.bd_name)])
382→ | _ ->
383→ Runtime.Log.warn "bridge_demux"
384→ [("port", `String port);
385→ ("agent", `String binding.bd_name)])
386→ | _ ->
387→ (* Not an envelope — treat as raw output *)
388→ Option.iter (fun sock ->
389→ T.send sock ~topic:binding.bd_name line
390→ ) data_out_sock
391→ | exception exn ->
392→ Runtime.Log.debug "bridge_demux"
393→ ["agent", `String binding.bd_name;
394→ "event", `String "json_parse_failed";
395→ "error", `String (Printexc.to_string exn)];
396→ Option.iter (fun sock ->
397→ T.send sock ~topic:binding.bd_name line
398→ ) data_out_sock
399→ done with
400→ | End_of_file -> ()
401→ | Sys_error msg ->
402→ Runtime.Log.debug "bridge_demux"
403→ ["agent", `String binding.bd_name;
404→ "event", `String "pipe_closed";
405→ "detail", `String msg]);
406→ close_in_noerr ic;
407→ (* Send EOF sentinel on all output ZMQ sockets (R5) *)
408→ Option.iter (fun sock ->
409→ (try send_eof sock ~topic:binding.bd_name with _ -> ())
410→ ) data_out_sock;
411→ Option.iter (fun sock ->
412→ (try send_eof sock ~topic:binding.bd_name with _ -> ())
413→ ) ctrl_out_sock;
414→ Option.iter (fun sock ->
415→ (try send_eof sock ~topic:binding.bd_name with _ -> ())
416→ ) telemetry_sock;
417→ (* Close tool_req write-end to terminate dispatch loop *)
418→ Eio_unix.run_in_systhread (fun () ->
419→ (match !tool_req_oc with
420→ | Some oc -> close_out_noerr oc
421→ | None -> ());
422→ (* Reap child process (R6) *)
423→ let timeout_s = match Spawn_lib.timeout_of_binding binding with
424→ | Some t -> t | None -> 60.0 in
425→ let status, timed_out =
426→ Spawn_lib.waitpid_timed ~timeout_seconds:timeout_s pid in
427→ if timed_out then
428→ Runtime.Log.warn "bridge_timeout"
429→ ["agent", `String binding.bd_name;
430→ "timeout_seconds", `Float timeout_s];
431→ (match status with
432→ | Unix.WEXITED 0 -> ()
433→ | Unix.WEXITED code ->
434→ Runtime.Log.warn "bridge_exit"
435→ ["agent", `String binding.bd_name;
436→ "code", `Int code]
437→ | Unix.WSIGNALED sig_ ->
438→ Runtime.Log.warn "bridge_signal"
439→ ["agent", `String binding.bd_name;
440→ "signal", `Int sig_]
441→ | Unix.WSTOPPED _ -> ());
442→ (* Clean up temp files *)
443→ (try Sys.remove spec_path with Sys_error _ -> ()));
444→ `Stop_daemon);
445→
446→ ignore has_telemetry
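As a standalone illustration of the stdio mux framing used above, here is a minimal sketch of the port-tagged envelope wire format that `write_envelope` and `write_eof` emit. This is only the format string, with no mutex or channel plumbing; the real bridge additionally validates the payload with Yojson before writing.

```ocaml
(* Minimal sketch of the port-tagged envelope framing on the child's
   stdin/stdout. [envelope] mirrors write_envelope's format string and
   [eof_marker] mirrors write_eof's. *)
let envelope ~port msg =
  Printf.sprintf "{\"__port\":%S,\"msg\":%s}\n" port msg

let eof_marker ~port =
  Printf.sprintf "{\"__port\":%S,\"__eof\":true}\n" port

let () =
  print_string (envelope ~port:"input" "{\"n\":1}");
  print_string (eof_marker ~port:"input")
```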
Now let me check how tool_dispatch_loop is used in the bridge versus what it needs from Plumb.Tool_dispatch — I need to understand whether the bridge can depend on Plumb.Tool_dispatch directly or whether it needs the callback approach. Let me check the dependency structure.
1→(library
2→ (name fabric)
3→ (public_name plumbing.fabric)
4→ (libraries transport supervisor runtime language spawn_lib yojson eio eio.unix unix threads.posix))
5→

1→(library
2→ (name plumb)
3→ (public_name plumbing)
4→ (libraries language runtime spawn_lib docstore mcp_client unix threads.posix dune-site))
5→
6→(generate_sites_module
7→ (module sites)
8→ (sites plumbing))
9→

1→(library
2→ (name spawn_lib)
3→ (public_name plumbing.spawn)
4→ (libraries language runtime mcp_client unix))
5→

{
"pattern": "Mcp_client",
"path": "/tmp/plmb-037-step5b/lib/fabric/bridge.ml",
"output_mode": "content"
}

(executable
(name main)
(public_name plumb)
(libraries plumb fabric language runtime spawn_lib
eio_main eio.unix unix threads.posix yojson))
Now the bridge — expand tool_dispatch type, fix M1 (fork safety), M2 (join dispatch thread), M3 (MCP temp file cleanup):
(** {1 Tool Dispatch Callback}
The bridge accepts an optional tool dispatch callback rather than
depending on [Tool_dispatch] directly. The caller constructs
this from [Tool_dispatch.tool_dispatch_loop]. *)
type tool_dispatch =
req_fd:Unix.file_descr ->
resp_fd:Unix.file_descr ->
agent_name:string ->
unit

(** {1 Tool Dispatch Callback}
The bridge accepts an optional tool dispatch callback rather than
depending on [Tool_dispatch] directly. The caller constructs
this from [Tool_dispatch.tool_dispatch_loop], capturing search_path,
tool_env, prog, type_env, etc. in the closure. The bridge passes
the per-agent [tool_bindings] and [mcp_connections] it resolves
internally. *)
type tool_dispatch =
req_fd:Unix.file_descr ->
resp_fd:Unix.file_descr ->
agent_name:string ->
tool_bindings:binding list ->
mcp_connections:Mcp_client.t list ->
unit

(* Spawn child — closes child_in_r and child_out_w in parent *)
let pid = Spawn_lib.spawn_subprocess ~bin_path ~spec_path
~stdin_fd:child_in_r ~stdout_fd:child_out_w
~env () in (* Spawn child — closes child_in_r and child_out_w in parent.
Wrapped in run_in_systhread to avoid forking from an Eio fibre
(M1: scheduler state corruption). *)
let pid = Eio_unix.run_in_systhread (fun () ->
Spawn_lib.spawn_subprocess ~bin_path ~spec_path
~stdin_fd:child_in_r ~stdout_fd:child_out_w
~env ()) in

Now fix the has_tools check and track the MCP temp file path for cleanup (M3), and update the dispatch thread to pass the new args and track it (M2):
let mcp_env = if mcp_connections <> [] then begin
let mcp_schemas = List.concat_map (fun conn ->
List.map Mcp_client.tool_schema (Mcp_client.tools conn)
) mcp_connections in
let mcp_path = Filename.temp_file "agentpi_mcp_" ".json" in
let mcp_oc = open_out_gen
[Open_wronly; Open_creat; Open_trunc] 0o600 mcp_path in
Fun.protect ~finally:(fun () -> close_out mcp_oc) (fun () ->
output_string mcp_oc
(Yojson.Safe.to_string (`List mcp_schemas)));
[("PLUMB_MCP_TOOLS", mcp_path)]
end else [] in

let mcp_temp_path = ref None in
let mcp_env = if mcp_connections <> [] then begin
let mcp_schemas = List.concat_map (fun conn ->
List.map Mcp_client.tool_schema (Mcp_client.tools conn)
) mcp_connections in
let mcp_path = Filename.temp_file "agentpi_mcp_" ".json" in
mcp_temp_path := Some mcp_path;
let mcp_oc = open_out_gen
[Open_wronly; Open_creat; Open_trunc] 0o600 mcp_path in
Fun.protect ~finally:(fun () -> close_out mcp_oc) (fun () ->
output_string mcp_oc
(Yojson.Safe.to_string (`List mcp_schemas)));
[("PLUMB_MCP_TOOLS", mcp_path)]
end else [] in

(* === Tool dispatch setup === *)
let tool_req_oc = ref None in
let tool_resp_r = ref None in
(match tool_dispatch_fn with
| Some dispatch when has_tools ->
let req_r, req_w = Unix.pipe ~cloexec:true () in
let resp_r', resp_w = Unix.pipe ~cloexec:true () in
tool_req_oc := Some (Unix.out_channel_of_descr req_w);
tool_resp_r := Some resp_r';
let _dispatch_thread = Thread.create (fun () ->
dispatch ~req_fd:req_r ~resp_fd:resp_w
~agent_name:binding.bd_name
) () in
()
| _ -> ());

(* === Tool dispatch setup === *)
let tool_req_oc = ref None in
let tool_resp_r = ref None in
let dispatch_thread = ref None in
(match tool_dispatch_fn with
| Some dispatch when has_tools ->
let req_r, req_w = Unix.pipe ~cloexec:true () in
let resp_r', resp_w = Unix.pipe ~cloexec:true () in
tool_req_oc := Some (Unix.out_channel_of_descr req_w);
tool_resp_r := Some resp_r';
dispatch_thread := Some (Thread.create (fun () ->
dispatch ~req_fd:req_r ~resp_fd:resp_w
~agent_name:binding.bd_name
~tool_bindings ~mcp_connections
) ())
| _ -> ());

Now update the cleanup section in the demux to join the dispatch thread (M2) and clean up the MCP temp file (M3):
(* Close tool_req write-end to terminate dispatch loop *)
Eio_unix.run_in_systhread (fun () ->
(match !tool_req_oc with
| Some oc -> close_out_noerr oc
| None -> ());
(* Reap child process (R6) *)
let timeout_s = match Spawn_lib.timeout_of_binding binding with
| Some t -> t | None -> 60.0 in
let status, timed_out =
Spawn_lib.waitpid_timed ~timeout_seconds:timeout_s pid in
if timed_out then
Runtime.Log.warn "bridge_timeout"
["agent", `String binding.bd_name;
"timeout_seconds", `Float timeout_s];
(match status with
| Unix.WEXITED 0 -> ()
| Unix.WEXITED code ->
Runtime.Log.warn "bridge_exit"
["agent", `String binding.bd_name;
"code", `Int code]
| Unix.WSIGNALED sig_ ->
Runtime.Log.warn "bridge_signal"
["agent", `String binding.bd_name;
"signal", `Int sig_]
| Unix.WSTOPPED _ -> ());
(* Clean up temp files *)
(try Sys.remove spec_path with Sys_error _ -> ()));
`Stop_daemon);

(* Close tool_req write-end to terminate dispatch loop *)
Eio_unix.run_in_systhread (fun () ->
(match !tool_req_oc with
| Some oc -> close_out_noerr oc
| None -> ());
(* Join dispatch thread (M2) *)
(match !dispatch_thread with
| Some t -> Thread.join t
| None -> ());
(* Reap child process (R6) *)
let timeout_s = match Spawn_lib.timeout_of_binding binding with
| Some t -> t | None -> 60.0 in
let status, timed_out =
Spawn_lib.waitpid_timed ~timeout_seconds:timeout_s pid in
if timed_out then
Runtime.Log.warn "bridge_timeout"
["agent", `String binding.bd_name;
"timeout_seconds", `Float timeout_s];
(match status with
| Unix.WEXITED 0 -> ()
| Unix.WEXITED code ->
Runtime.Log.warn "bridge_exit"
["agent", `String binding.bd_name;
"code", `Int code]
| Unix.WSIGNALED sig_ ->
Runtime.Log.warn "bridge_signal"
["agent", `String binding.bd_name;
"signal", `Int sig_]
| Unix.WSTOPPED _ -> ());
(* Clean up temp files (M3: also clean MCP temp file) *)
(try Sys.remove spec_path with Sys_error _ -> ());
(match !mcp_temp_path with
| Some p -> (try Sys.remove p with Sys_error _ -> ())
| None -> ()));
`Stop_daemon);(** Plumb runtime — fabric-based executor for plumb programs.
Parses a plumb program, compiles a routing table, starts the ZMQ
fabric, and bridges boundary ports to stdin/stdout. The binary is
a thin I/O adapter — all process orchestration happens in the
fabric via ZMQ sockets. *)
open Language
let () =
(* Default to Info; PIPELINE_DEBUG=1 wins (already set to Debug in Log
module initialiser). --verbose lowers to Debug explicitly. *)
let verbose = Array.exists (fun s -> s = "--verbose") Sys.argv in
if verbose then Runtime.Log.set_level Runtime.Log.Debug
else if Sys.getenv_opt "PIPELINE_DEBUG" <> Some "1" then
Runtime.Log.set_level Runtime.Log.Info;
(* Extract --provider/--model/--auto-approve from argv and set env vars.
Entry.main strips these from argv before extracting the spec path. *)
(match Runtime.Entry.resolve_flag Sys.argv "--provider" with
| Some p -> Unix.putenv "PLUMB_PROVIDER" p | None -> ());
(match Runtime.Entry.resolve_flag Sys.argv "--model" with
| Some m -> Unix.putenv "PLUMB_MODEL" m | None -> ());
let auto_approve = Array.exists (fun s -> s = "--auto-approve") Sys.argv in
let docroot = Runtime.Entry.resolve_docroot Sys.argv in
let search_path = Plumb.Stdlib_path.search_path () in
let tool_env = { Plumb.Tool_dispatch.docroot } in
(* Signal handling: install SIGINT handler before Eio_main.run.
When a fabric is running, the handler calls fabric.cancel() to
trigger orderly shutdown. When no fabric is running, it raises
Sys.Break for Entry.main to catch at exit code 130. *)
let fabric_ref : Fabric.t option ref = ref None in
Sys.set_signal Sys.sigint (Sys.Signal_handle (fun _signo ->
match !fabric_ref with
| Some f -> f.Fabric.cancel ()
| None -> raise Sys.Break));
Runtime.Entry.main
~usage:"Usage: plumb [--verbose] [--docroot DIR] [--provider NAME] \
[--model NAME] [--auto-approve] <spec.plumb>"
~run:(fun spec_path ->
let prog, main = Runtime.Spec.load_main_from_file ~search_path spec_path in
let on_oitl_request =
if auto_approve then
fun (_req : Plumb.Tool_dispatch.oitl_request) ->
Plumb.Tool_dispatch.{ oitl_approved = true; oitl_policy = Some "always" }
else
fun (req : Plumb.Tool_dispatch.oitl_request) ->
Printf.eprintf "plumb: denied tool call '%s' by agent '%s' \
(use --auto-approve)\n%!" req.oitl_tool req.oitl_agent;
Plumb.Tool_dispatch.{ oitl_approved = false; oitl_policy = None }
in
(* Extract plumb body from main binding *)
let ports, pb = match main.impl with
| Plumb (ports, pb) -> (ports, pb)
| _ -> Runtime.Error.raise_ Config "config_error"
"main binding must be a plumb"
in
let type_env = Validate.build_type_env prog.prog_types in
let routing_table = Runtime.Routing.compile ~prog ~ports ~pb
~type_env ~main_binding:main ~seeded:pb.pb_seeded in
Runtime.Log.debug "plumb"
["event", `String "routing_table";
"detail", `String (Runtime.Routing.pp_routing_table routing_table)];
(* Tool dispatch callback — captures search_path, tool_env, prog,
type_env, on_oitl_request, base_dir in the closure. The bridge
passes per-agent tool_bindings and mcp_connections. *)
let base_dir = Filename.dirname spec_path in
let tool_dispatch_fn
~req_fd ~resp_fd ~agent_name ~tool_bindings ~mcp_connections =
let call_pipeline _source _input =
(Runtime.Error.(to_string (make Unavailable "not_implemented"
"recursive pipeline invocation not available in fabric mode")),
true)
in
Plumb.Tool_dispatch.tool_dispatch_loop
~base_dir ~call_pipeline ~search_path tool_env type_env prog
tool_bindings ~mcp_connections ~on_oitl_request
~agent_name req_fd resp_fd
in
(* Build the bridge process handler for agent subprocesses *)
let handler = Fabric.Bridge.make_handler
~prog ~type_env
~bin_dir:(Lazy.force Plumb.Prefix_path.bin_dir)
~resource_find:Plumb.Resource_path.find
~tool_dispatch_fn:(Some tool_dispatch_fn)
() in
(* Determine boundary port names from the main binding's port list.
Convention: first n_in ports are inputs, rest are outputs. *)
let n_in = List.length
(Validate.port_element_types type_env main.input) in
let input_port = List.nth ports 0 in
let output_port = List.nth ports n_in in
(* Run the fabric under Eio *)
let fabric_handle = ref None in
Eio_main.run (fun _env ->
Eio.Switch.run (fun sw ->
let f = Fabric.execute ~sw ~routing_table ~type_env
~process_handler:handler () in
fabric_handle := Some f;
fabric_ref := Some f;
(* Writer daemon: stdin → boundary input socket *)
let push = Fabric.boundary_input f input_port in
Eio.Fiber.fork_daemon ~sw (fun () ->
(try while true do
let line = Eio_unix.run_in_systhread (fun () ->
input_line stdin) in
Transport.Zmq.send push ~topic:input_port line
done with End_of_file | Sys_error _ -> ());
Fabric.Bridge.send_eof push ~topic:input_port;
`Stop_daemon);
(* Reader: boundary output socket → stdout (main fibre) *)
let pull = Fabric.boundary_output f output_port in
(try while true do
let (_topic, payload) = Transport.Zmq.recv pull in
if Fabric.Bridge.is_eof payload then raise Exit;
Eio_unix.run_in_systhread (fun () ->
print_endline payload)
done with
| Exit -> ()
| Transport.Transport_error _ -> ());
(* Reader exited — cancel the fabric to unwind all fibres *)
f.cancel ()));
(* Outside switch scope — safe to call shutdown (may sleep) *)
(match !fabric_handle with
| Some f -> Fabric.shutdown f
| None -> ()))
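The SIGINT strategy in this rewrite replaces `Sys.catch_break` with a `Signal_handle` plus a shared flag, as the plan describes. A standalone sketch of that pattern follows; `cancel` here is a stand-in for `fabric.cancel ()`, and the atomic flag mirrors the plan's description (the main.ml above consults a `fabric_ref` instead).

```ocaml
(* Standalone sketch of the SIGINT fix: a Sys.Signal_handle that
   records the interrupt in an Atomic.t and invokes a cancel
   callback, instead of Sys.catch_break raising at arbitrary
   allocation points. *)
let interrupted = Atomic.make false

let sigint_behavior ~cancel =
  Sys.Signal_handle (fun _signo ->
      Atomic.set interrupted true;
      cancel ())

let install_sigint ~cancel =
  Sys.set_signal Sys.sigint (sigint_behavior ~cancel)

let () =
  let cancelled = ref false in
  install_sigint ~cancel:(fun () -> cancelled := true);
  (* Exercise the handler body directly instead of delivering a real
     signal, keeping the sketch deterministic. *)
  (match sigint_behavior ~cancel:(fun () -> cancelled := true) with
   | Sys.Signal_handle f -> f Sys.sigint
   | _ -> assert false);
  assert (Atomic.get interrupted);
  assert !cancelled;
  print_endline "ok"
```

Reader loops can then poll `interrupted` (or rely on the closed socket raising `Transport_error`) to exit cleanly.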
cd /tmp/plmb-037-step5b && nix develop --command dune build --root=. 2>&1 | head -80
warning: Git tree '/private/tmp/plmb-037-step5b' is dirty
Warning: 15 states have shift/reduce conflicts.
Warning: 16 shift/reduce conflicts were arbitrarily resolved.
File "bin/plumb/main.ml", line 26, characters 19-38:
26 | let tool_env = { Plumb.Tool_dispatch.docroot } in
^^^^^^^^^^^^^^^^^^^
Error: Unbound module Plumb.Tool_dispatch

{
"pattern": "module Tool_dispatch|Tool_dispatch",
"path": "/tmp/plmb-037-step5b/lib/plumb",
"glob": "*.ml",
"output_mode": "files_with_matches"
}

So Tool_dispatch is not directly exposed through Plumb — it's in lib/plumb/tool_dispatch.ml but the plumb library wraps it through exec.ml. Let me check what's re-exported.
{
"pattern": "^type|^let.*=.*Tool_dispatch",
"path": "/tmp/plmb-037-step5b/lib/plumb/exec.ml",
"output_mode": "content",
"limit": 20
}<tool_use_error>InputValidationError: Grep failed due to the following issue: An unexpected parameter `limit` was provided</tool_use_error>
{
"pattern": "Tool_dispatch",
"path": "/tmp/plmb-037-step5b/lib/plumb/exec.ml",
"output_mode": "content"
}

26:type tool_env = Tool_dispatch.tool_env = { docroot : string option }
27:let default_tool_env = Tool_dispatch.default_tool_env
28:type observation = Tool_dispatch.observation = { agent : string; payload : Yojson.Safe.t }
29:type oitl_request = Tool_dispatch.oitl_request = {
35:type oitl_response = Tool_dispatch.oitl_response = {
40:(** Re-export functions from Spawn and Tool_dispatch for backward compatibility. *)
45:let dispatch_tool = Tool_dispatch.dispatch_tool
46:let tool_dispatch_loop = Tool_dispatch.tool_dispatch_loop
47:let tool_error_response = Tool_dispatch.tool_error_response
1054: Some (fun (obs : Tool_dispatch.observation) ->
1118: Tool_dispatch.tool_dispatch_loop ~base_dir ~call_pipeline ~search_path tool_env type_env

So Exec re-exports these types. I need to use Plumb.Exec.tool_env, Plumb.Exec.oitl_request, etc. Or I could use Plumb.Exec.tool_dispatch_loop. Let me update the main.ml to use the correct module paths.
Plumb.Tool_dispatch
Plumb.Exec
The file /tmp/plmb-037-step5b/bin/plumb/main.ml has been updated. All occurrences of 'Plumb.Tool_dispatch' were successfully replaced with 'Plumb.Exec'.
cd /tmp/plmb-037-step5b && nix develop --command dune build --root=. 2>&1 | head -80
warning: Git tree '/private/tmp/plmb-037-step5b' is dirty
File "bin/plumb/main.ml", line 82, characters 29-46:
82 | ~bin_dir:(Lazy.force Plumb.Prefix_path.bin_dir)
^^^^^^^^^^^^^^^^^
Error: Unbound module Plumb.Prefix_path

{
"pattern": "Prefix_path|Resource_path",
"path": "/tmp/plmb-037-step5b/lib/plumb",
"glob": "*.mli",
"output_mode": "files_with_matches"
}

/tmp/plmb-037-step5b/lib/plumb/exec.ml
/tmp/plmb-037-step5b/lib/plumb/forward.ml
/tmp/plmb-037-step5b/lib/plumb/plumb.ml
/tmp/plmb-037-step5b/lib/plumb/prefix_path.ml
/tmp/plmb-037-step5b/lib/plumb/resource_path.ml
/tmp/plmb-037-step5b/lib/plumb/spawn.ml
/tmp/plmb-037-step5b/lib/plumb/stdlib_path.ml
/tmp/plmb-037-step5b/lib/plumb/tool_dispatch.ml
1→(** Plumb — interpreter for the plumbing language.
2→
3→ Creates channels, spawns processes, wires boundary ports. *)
4→
5→module Exec = Exec
6→module Stdlib_path = Stdlib_path
7→

Plumb only re-exports Exec and Stdlib_path. Prefix_path and Resource_path are in the plumb library but not re-exported through the wrapper module. I can access them as Prefix_path and Resource_path directly since they're in the same library, or use them without the Plumb. prefix since bin/plumb depends on the plumb library.
Wait — dune libraries wrap all modules under the library name unless (wrapped false) is set. The library name is plumb. So the modules are Plumb.Prefix_path, Plumb.Resource_path, etc. But the plumb.ml file acts as the library's main module and only re-exports Exec and Stdlib_path.
I need to add re-exports for Prefix_path and Resource_path too, or use a different approach.
(** Plumb — interpreter for the plumbing language.
Creates channels, spawns processes, wires boundary ports. *)
module Exec = Exec
module Stdlib_path = Stdlib_path

(** Plumb — interpreter for the plumbing language.
Creates channels, spawns processes, wires boundary ports. *)
module Exec = Exec
module Stdlib_path = Stdlib_path
module Prefix_path = Prefix_path
module Resource_path = Resource_path
module Tool_dispatch = Tool_dispatch

Now update main.ml to use Plumb.Tool_dispatch for the dispatch loop call (since it's now exposed), and keep Plumb.Exec for the types that are already re-exported through Exec:
1→(** Plumb runtime — fabric-based executor for plumb programs.
2→
3→ Parses a plumb program, compiles a routing table, starts the ZMQ
4→ fabric, and bridges boundary ports to stdin/stdout. The binary is
5→ a thin I/O adapter — all process orchestration happens in the
6→ fabric via ZMQ sockets. *)
7→
8→open Language
9→
10→let () =
11→ (* Default to Info; PIPELINE_DEBUG=1 wins (already set to Debug in Log
12→ module initialiser). --verbose lowers to Debug explicitly. *)
13→ let verbose = Array.exists (fun s -> s = "--verbose") Sys.argv in
14→ if verbose then Runtime.Log.set_level Runtime.Log.Debug
15→ else if Sys.getenv_opt "PIPELINE_DEBUG" <> Some "1" then
16→ Runtime.Log.set_level Runtime.Log.Info;
17→ (* Extract --provider/--model/--auto-approve from argv and set env vars.
18→ Entry.main strips these from argv before extracting the spec path. *)
19→ (match Runtime.Entry.resolve_flag Sys.argv "--provider" with
20→ | Some p -> Unix.putenv "PLUMB_PROVIDER" p | None -> ());
21→ (match Runtime.Entry.resolve_flag Sys.argv "--model" with
22→ | Some m -> Unix.putenv "PLUMB_MODEL" m | None -> ());
23→ let auto_approve = Array.exists (fun s -> s = "--auto-approve") Sys.argv in
24→ let docroot = Runtime.Entry.resolve_docroot Sys.argv in
25→ let search_path = Plumb.Stdlib_path.search_path () in
26→ let tool_env = { Plumb.Exec.docroot } in
27→ (* Signal handling: install SIGINT handler before Eio_main.run.
28→ When a fabric is running, the handler calls fabric.cancel() to
29→ trigger orderly shutdown. When no fabric is running, it raises
30→ Sys.Break for Entry.main to catch at exit code 130. *)
31→ let fabric_ref : Fabric.t option ref = ref None in
32→ Sys.set_signal Sys.sigint (Sys.Signal_handle (fun _signo ->
33→ match !fabric_ref with
34→ | Some f -> f.Fabric.cancel ()
35→ | None -> raise Sys.Break));
36→ Runtime.Entry.main
37→ ~usage:"Usage: plumb [--verbose] [--docroot DIR] [--provider NAME] \
38→ [--model NAME] [--auto-approve] <spec.plumb>"
39→ ~run:(fun spec_path ->
40→ let prog, main = Runtime.Spec.load_main_from_file ~search_path spec_path in
41→ let on_oitl_request =
42→ if auto_approve then
43→ fun (_req : Plumb.Exec.oitl_request) ->
44→ Plumb.Exec.{ oitl_approved = true; oitl_policy = Some "always" }
45→ else
46→ fun (req : Plumb.Exec.oitl_request) ->
47→ Printf.eprintf "plumb: denied tool call '%s' by agent '%s' \
48→ (use --auto-approve)\n%!" req.oitl_tool req.oitl_agent;
49→ Plumb.Exec.{ oitl_approved = false; oitl_policy = None }
50→ in
51→ (* Extract plumb body from main binding *)
52→ let ports, pb = match main.impl with
53→ | Plumb (ports, pb) -> (ports, pb)
54→ | _ -> Runtime.Error.raise_ Config "config_error"
55→ "main binding must be a plumb"
56→ in
57→ let type_env = Validate.build_type_env prog.prog_types in
58→ let routing_table = Runtime.Routing.compile ~prog ~ports ~pb
59→ ~type_env ~main_binding:main ~seeded:pb.pb_seeded in
60→ Runtime.Log.debug "plumb"
61→ ["event", `String "routing_table";
62→ "detail", `String (Runtime.Routing.pp_routing_table routing_table)];
63→ (* Tool dispatch callback — captures search_path, tool_env, prog,
64→ type_env, on_oitl_request, base_dir in the closure. The bridge
65→ passes per-agent tool_bindings and mcp_connections. *)
66→ let base_dir = Filename.dirname spec_path in
67→ let tool_dispatch_fn
68→ ~req_fd ~resp_fd ~agent_name ~tool_bindings ~mcp_connections =
69→ let call_pipeline _source _input =
70→ (Runtime.Error.(to_string (make Unavailable "not_implemented"
71→ "recursive pipeline invocation not available in fabric mode")),
72→ true)
73→ in
74→ Plumb.Exec.tool_dispatch_loop
75→ ~base_dir ~call_pipeline ~search_path tool_env type_env prog
76→ tool_bindings ~mcp_connections ~on_oitl_request
77→ ~agent_name req_fd resp_fd
78→ in
79→ (* Build the bridge process handler for agent subprocesses *)
80→ let handler = Fabric.Bridge.make_handler
81→ ~prog ~type_env
82→ ~bin_dir:(Lazy.force Plumb.Prefix_path.bin_dir)
83→ ~resource_find:Plumb.Resource_path.find
84→ ~tool_dispatch_fn:(Some tool_dispatch_fn)
85→ () in
86→ (* Determine boundary port names from the main binding's port list.
87→ Convention: first n_in ports are inputs, rest are outputs. *)
88→ let n_in = List.length
89→ (Validate.port_element_types type_env main.input) in
90→ let input_port = List.nth ports 0 in
91→ let output_port = List.nth ports n_in in
92→ (* Run the fabric under Eio *)
93→ let fabric_handle = ref None in
94→ Eio_main.run (fun _env ->
95→ Eio.Switch.run (fun sw ->
96→ let f = Fabric.execute ~sw ~routing_table ~type_env
97→ ~process_handler:handler () in
98→ fabric_handle := Some f;
99→ fabric_ref := Some f;
100→ (* Writer daemon: stdin → boundary input socket *)
101→ let push = Fabric.boundary_input f input_port in
102→ Eio.Fiber.fork_daemon ~sw (fun () ->
103→ (try while true do
104→ let line = Eio_unix.run_in_systhread (fun () ->
105→ input_line stdin) in
106→ Transport.Zmq.send push ~topic:input_port line
107→ done with End_of_file | Sys_error _ -> ());
108→ Fabric.Bridge.send_eof push ~topic:input_port;
109→ `Stop_daemon);
110→ (* Reader: boundary output socket → stdout (main fibre) *)
111→ let pull = Fabric.boundary_output f output_port in
112→ (try while true do
113→ let (_topic, payload) = Transport.Zmq.recv pull in
114→ if Fabric.Bridge.is_eof payload then raise Exit;
115→ Eio_unix.run_in_systhread (fun () ->
116→ print_endline payload)
117→ done with
118→ | Exit -> ()
119→ | Transport.Transport_error _ -> ());
120→ (* Reader exited — cancel the fabric to unwind all fibres *)
121→ f.cancel ()));
122→ (* Outside switch scope — safe to call shutdown (may sleep) *)
123→ (match !fabric_handle with
124→ | Some f -> Fabric.shutdown f
125→ | None -> ()))
126→cd /tmp/plmb-037-step5b && nix develop --command dune build --root=. 2>&1 | head -80
warning: Git tree '/private/tmp/plmb-037-step5b' is dirty
File "bin/plumb/main.ml", line 84, characters 26-49:
84 | ~tool_dispatch_fn:(Some tool_dispatch_fn)
^^^^^^^^^^^^^^^^^^^^^^^
Error: This expression should not be a constructor, the expected type is
         Fabric.Bridge.tool_dispatch

The tool_dispatch_fn parameter is ?(tool_dispatch_fn : tool_dispatch option), so I should pass the value directly (it's already optional):
~tool_dispatch_fn:(Some tool_dispatch_fn)
~tool_dispatch_fn
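As a side note, the root cause is the labelled-argument application form: with an optional parameter, `~label:v` passes the bare value while `?label:v` passes an `option`. A minimal self-contained illustration (the `greet` function is hypothetical, not from this codebase):

```ocaml
(* Hypothetical example, not from this codebase. Applying an optional
   parameter as [~name:v] passes the bare value; [?name:v] passes an
   [option]. Wrapping in [Some] under [~name:] is a type error, which
   is exactly the mismatch dune reported. *)
let greet ?name () =
  match name with
  | Some n -> "hi " ^ n
  | None -> "hi"

let () =
  assert (greet ~name:"ada" () = "hi ada");        (* bare value *)
  assert (greet ?name:(Some "ada") () = "hi ada"); (* wrapped value *)
  assert (greet () = "hi")
```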
cd /tmp/plmb-037-step5b && nix develop --command dune build --root=. 2>&1 | head -80
cd /tmp/plmb-037-step5b && nix develop --command dune test --root=. 2>&1 | tail -40
Command running in background with ID: be6qc8b25. Output is being written to: /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/be6qc8b25.output
cat /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/be6qc8b25.output 2>/dev/null | tail -40
Tests are still running. Let me also check if there are any test files that reference Fabric.Bridge.tool_dispatch type and might need updating.
{
"pattern": "tool_dispatch_fn|tool_dispatch",
"path": "/tmp/plmb-037-step5b/test",
"output_mode": "content"
}

/tmp/plmb-037-step5b/test/test_tool.ml:88:(* ── tool_dispatch_loop protocol tests ────────────────────── *)
/tmp/plmb-037-step5b/test/test_tool.ml:96:  Plumb.Exec.tool_dispatch_loop ~base_dir:"." ~call_pipeline:(fun _ _ -> failwith "no call") ~search_path:[] Plumb.Exec.default_tool_env [] prog [] ~mcp_connections:[] ~agent_name:"test" req_r resp_w
/tmp/plmb-037-step5b/test/test_tool.ml:128:  Plumb.Exec.tool_dispatch_loop ~base_dir:"." ~call_pipeline:(fun _ _ -> failwith "no call") ~search_path:[] Plumb.Exec.default_tool_env [] prog [] ~mcp_connections:[] ~agent_name:"test" req_r resp_w
/tmp/plmb-037-step5b/test/test_tool.ml:158:  Plumb.Exec.tool_dispatch_loop ~base_dir:"." ~call_pipeline:(fun _ _ -> failwith "no call") ~search_path:[] Plumb.Exec.default_tool_env [] prog [] ~mcp_connections:[] ~agent_name:"test" req_r resp_w
/tmp/plmb-037-step5b/test/test_tool.ml:523:(** Helper: run tool_dispatch_loop with on_oitl_request and send a request *)
/tmp/plmb-037-step5b/test/test_tool.ml:528:  Plumb.Exec.tool_dispatch_loop ~base_dir:"."
/tmp/plmb-037-step5b/test/test_tool.ml:781:  "tool_dispatch_loop", [
The test files use Plumb.Exec.tool_dispatch_loop directly — they don't use Fabric.Bridge.tool_dispatch type. Good, those should be unaffected.
Let me check the fabric test:
{
"pattern": "tool_dispatch|make_handler",
"path": "/tmp/plmb-037-step5b/test/test_fabric.ml",
"output_mode": "content"
}

{
"file_path": "/tmp/plmb-037-step5b/test/test_fabric.ml",
"offset": 55,
"limit": 20
} 55→(** Run a fabric pipeline with agent bridge, execute [f] with boundary sockets. *)
56→let with_fabric_agent src f =
57→ let prog, routing_table, type_env = parse_and_compile src in
58→ let pipeline_id = fresh_id () in
59→ Eio_main.run (fun _env ->
60→ Eio.Switch.run (fun sw ->
61→ let handler = Fabric.Bridge.make_handler
62→ ~prog ~type_env
63→ ~bin_dir:None
64→ ~resource_find:(fun name ->
65→ failwith ("resource_find not available in test: " ^ name))
66→ ~resolve_bin:find_agent_binary
67→ () in
68→ let fabric = Fabric.execute ~sw ~routing_table ~type_env
69→ ~pipeline_id ~process_handler:handler () in
70→ Unix.sleepf 0.2;
71→ let push_in = Fabric.boundary_input fabric "input" in
72→ let pull_out = Fabric.boundary_output fabric "output" in
73→ f push_in pull_out;
    74→        fabric.cancel ()))

Good — the test doesn't pass tool_dispatch_fn, so it uses the default None. This should still compile fine since we didn't change the parameter name or its optional nature.
Let me wait for the tests:
cat /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/be6qc8b25.output 2>/dev/null | tail -50
wc -l /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/be6qc8b25.output 2>/dev/null; tail -5 /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/be6qc8b25.output 2>/dev/null
0 /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/be6qc8b25.output
sleep 30 && cat /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/be6qc8b25.output 2>/dev/null | tail -50
ls -la /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/be6qc8b25.output 2>/dev/null
-rw-r--r--@ 1 ww wheel 0 Mar 14 09:23 /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/be6qc8b25.output
sleep 60 && wc -l /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/be6qc8b25.output 2>/dev/null && tail -60 /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/be6qc8b25.output 2>/dev/null
0 /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/be6qc8b25.output
cd /tmp/plmb-037-step5b && nix develop --command dune test --root=. 2>&1 | tail -60
warning: Git tree '/private/tmp/plmb-037-step5b' is dirty
Error: A running dune (pid: 3826) instance has locked the build directory.
If this is not the case, please delete "_build/.lock".
kill 3826 2>/dev/null; sleep 2; rm -f /tmp/plmb-037-step5b/_build/.lock 2>/dev/null; cd /tmp/plmb-037-step5b && nix develop --command dune test --root=. 2>&1 | tail -60
Command running in background with ID: b79q4s635. Output is being written to: /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/b79q4s635.output
sleep 90 && tail -60 /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/b79q4s635.output 2>/dev/null
ls -la /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/b79q4s635.output
-rw-r--r--@ 1 ww wheel 0 Mar 14 09:30 /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/b79q4s635.output
The test output file is empty after 90s. The test might still be compiling/running. Let me check what's happening:
ww 5502 0.0 0.0 435299568 1312 ?? S 9:37AM 0:00.00 grep -i dune
ww 5499 0.0 0.0 435307952 2560 ?? Ss 9:37AM 0:00.01 /bin/zsh -c source /Users/ww/.claude/shell-snapshots/snapshot-zsh-1773431411965-vq5jw6.sh && setopt NO_EXTENDED_GLOB 2>/dev/null || true && eval 'ps aux < /dev/null | grep -i dune | head -5' && pwd -P >| /var/folders/s3/yz2d7zf14wq_rf8sm0xg6mkm0000gn/T/claude-3e31-cwd
ww 5412 0.0 0.1 435620080 11952 ?? S 9:30AM 0:00.60 dune test --root=.
ww 5408 0.0 0.0 435307760 1184 ?? Ss 9:30AM 0:00.01 /bin/zsh -c source /Users/ww/.claude/shell-snapshots/snapshot-zsh-1773431411965-vq5jw6.sh && setopt NO_EXTENDED_GLOB 2>/dev/null || true && eval 'kill 3826 2>/dev/null ; sleep 2 ; rm -f /tmp/plmb-037-step5b/_build/.lock 2>/dev/null ; cd /tmp/plmb-037-step5b && nix develop --command dune test --root\=. 2>&1 < /dev/null | tail -60' && pwd -P >| /var/folders/s3/yz2d7zf14wq_rf8sm0xg6mkm0000gn/T/claude-1248-cwd
Dune test is still running (pid 5412). The tests are likely running fine but taking time (some tests involve subprocess spawning, ZMQ, etc.). Let me wait more:
sleep 120 && tail -60 /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/b79q4s635.output 2>/dev/null && echo "=== exit ===" && wc -l /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/b79q4s635.output 2>/dev/null
=== exit ===
0 /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/b79q4s635.output

ww 5412 0.0 0.1 435620080 11952 ?? S 9:30AM 0:00.64 dune test --root=.
ww 5408 0.0 0.0 435307760 1184 ?? Ss 9:30AM 0:00.01 /bin/zsh -c source /Users/ww/.claude/shell-snapshots/snapshot-zsh-1773431411965-vq5jw6.sh && setopt NO_EXTENDED_GLOB 2>/dev/null || true && eval 'kill 3826 2>/dev/null ; sleep 2 ; rm -f /tmp/plmb-037-step5b/_build/.lock 2>/dev/null ; cd /tmp/plmb-037-step5b && nix develop --command dune test --root\=. 2>&1 < /dev/null | tail -60' && pwd -P >| /var/folders/s3/yz2d7zf14wq_rf8sm0xg6mkm0000gn/T/claude-1248-cwd
Dune is still running but with only 0.64s CPU time after 7+ minutes — it's stuck. Likely waiting on a subprocess. Let me check for children:
ps -p 5420 -o command= 2>/dev/null; ps -p 5413 -o command= 2>/dev/null
test_plumb.exe is hanging. This is likely an integration test that uses the old Exec.execute path or the new fabric path. Let me kill it and check what's different:
kill 5420 5412 5408 2>/dev/null; sleep 1; rm -f /tmp/plmb-037-step5b/_build/.lock 2>/dev/null
/tmp/plmb-037-step5b/test/test_agent_config.ml
/tmp/plmb-037-step5b/test/test_agent_control.ml
/tmp/plmb-037-step5b/test/test_agent_eliza.ml
/tmp/plmb-037-step5b/test/test_docstore.ml
/tmp/plmb-037-step5b/test/test_error.ml
/tmp/plmb-037-step5b/test/test_expr.ml
/tmp/plmb-037-step5b/test/test_fabric.ml
/tmp/plmb-037-step5b/test/test_parser.ml
/tmp/plmb-037-step5b/test/test_pinning.ml
/tmp/plmb-037-step5b/test/test_plumb.ml
/tmp/plmb-037-step5b/test/test_render_group.ml
/tmp/plmb-037-step5b/test/test_routing.ml
/tmp/plmb-037-step5b/test/test_session_compile.ml
/tmp/plmb-037-step5b/test/test_session_exec.ml
/tmp/plmb-037-step5b/test/test_supervisor.ml
/tmp/plmb-037-step5b/test/test_tool.ml
/tmp/plmb-037-step5b/test/test_transport.ml
/tmp/plmb-037-step5b/test/test_validate.ml
1→(** Integration tests for the plumb runtime.
2→
3→ Runs the plumb binary as a subprocess, testing identity passthrough,
4→ copy (fan-out), merge (fan-in), and copy-merge composition. *)
5→
6→let plumb_bin = "../bin/plumb/main.exe"
7→let fixture path = "fixtures/" ^ path
8→
9→(** Run the plumb binary with given fixture and input,
10→ return (stdout, stderr, exit_code). *)
11→let run_plumb spec input =
12→ let spec_path = fixture spec in
13→ let stdout_read, stdout_write = Unix.pipe ~cloexec:true () in
14→ let stderr_read, stderr_write = Unix.pipe ~cloexec:true () in
15→ let stdin_read, stdin_write = Unix.pipe ~cloexec:true () in
16→ let pid = Unix.create_process plumb_bin
17→ [| plumb_bin; spec_path |]
18→ stdin_read stdout_write stderr_write in
19→ Unix.close stdin_read;
20→ Unix.close stdout_write;
21→ Unix.close stderr_write;
22→ (* Write input *)
23→ let oc = Unix.out_channel_of_descr stdin_write in
24→ output_string oc input;
25→ close_out oc;
26→ (* Read output *)
27→ let ic_out = Unix.in_channel_of_descr stdout_read in
28→ let ic_err = Unix.in_channel_of_descr stderr_read in
29→ let stdout = In_channel.input_all ic_out in
30→ let stderr = In_channel.input_all ic_err in
31→ close_in ic_out;
32→ close_in ic_err;
33→ let _, status = Unix.waitpid [] pid in
34→ let exit_code = match status with
35→ | Unix.WEXITED c -> c
36→ | Unix.WSIGNALED _ -> 128
37→ | Unix.WSTOPPED _ -> 128
38→ in
39→ (stdout, stderr, exit_code)
40→
41→(** Split output into non-empty lines. *)
42→let output_lines stdout =
43→ String.split_on_char '\n' stdout
44→ |> List.filter (fun s -> String.length (String.trim s) > 0)
45→
46→(** {1 Identity via plumb} *)
47→
48→let test_identity_passthrough () =
49→ let input = "{\"name\":\"Alice\",\"age\":30}\n" in
    50→  let stdout, _stderr, code = run_plumb "identity.plumb" input in

This test runs the plumb binary. The binary now uses the fabric. The test is hanging, which means there's likely a deadlock or the fabric isn't shutting down correctly. Let me check what the fixtures look like and trace the issue.
(* The standard identity spec — used by most tests. *)
type Person = { name: string, age: int }
let passthrough : !Person -> !Person = id
let main : !Person -> !Person = plumb(input, output) {
spawn passthrough(input, output)
}

This is an inline-only pipeline (id morphism, no agents). The fabric should handle this fine. Let me test it manually:
cd /tmp/plmb-037-step5b && echo '{"name":"Alice","age":30}' | timeout 10 nix develop --command ./_build/default/bin/plumb/main.exe test/fixtures/identity.plumb 2>&1; echo "exit: $?"
cd /tmp/plmb-037-step5b && echo '{"name":"Alice","age":30}' | nix develop --command gtimeout 10 ./_build/default/bin/plumb/main.exe test/fixtures/identity.plumb 2>&1; echo "exit: $?"
warning: Git tree '/private/tmp/plmb-037-step5b' is dirty
/tmp/nix-shell.h7ZPy1: line 2200: exec: gtimeout: not found
exit:
cd /tmp/plmb-037-step5b && (echo '{"name":"Alice","age":30}' | nix develop --command sh -c 'timeout 10 ./_build/default/bin/plumb/main.exe test/fixtures/identity.plumb' 2>/dev/null); echo "exit: $?"

The output is correct but let me check if the binary exits cleanly. Let me look at how the inline id morphism handles EOF:
{
"pattern": "forward_id",
"path": "/tmp/plmb-037-step5b/lib/fabric/inline.ml",
"output_mode": "content",
"-A": 25
}

67:let forward_id ~name type_env input_type recv_sock send_sock =
68-  let label = Printf.sprintf "id[%s]" name in
69-  recv_loop ~label recv_sock (fun json ->
70-    match Validate.check type_env input_type json with
71-    | Ok cleaned -> send_json send_sock ~topic:name cleaned
72-    | Error errs ->
73-      Runtime.Log.error label
74-        ["error", `String (Validate.error_string errs)];
75-      raise Exit)
76-
77-(** Discard: validate and drop. *)
78-let drain_discard ~name type_env input_type recv_sock =
79-  let label = Printf.sprintf "discard[%s]" name in
80-  recv_loop ~label recv_sock (fun json ->
81-    match Validate.check type_env input_type json with
82-    | Ok _ -> ()
83-    | Error errs ->
84-      Runtime.Log.error label
85-        ["error", `String (Validate.error_string errs)];
86-      raise Exit)
87-
88-(** Debug tap: log and drop. *)
89-let forward_log label recv_sock =
90-  let debug = Runtime.Log.get_level () = Runtime.Log.Debug in
91-  recv_loop ~label recv_sock (fun json ->
92-    if debug then
{
"pattern": "recv_loop",
"path": "/tmp/plmb-037-step5b/lib/fabric/inline.ml",
"output_mode": "content",
"-A": 25,
"head_limit": 30
}42:let recv_loop ~label sock f =
43- try while true do
44- let (_topic, payload) = T.recv sock in
45- if String.length payload > 0 then begin
46- let json =
47- try Yojson.Safe.from_string payload
48- with Yojson.Json_error msg ->
49- Runtime.Log.error label ["error", `String msg];
50- raise Exit
51- in
52- f json
53- end
54- done with
55- | Exit -> ()
56- | Transport.Transport_error (_, msg) ->
57- Runtime.Log.debug label
58- ["event", `String "transport_closed"; "detail", `String msg]
59-
60-(** Send a JSON value on a transport socket. *)
61-let send_json sock ~topic json =
62- T.send sock ~topic (Yojson.Safe.to_string json)
63-
64-(** {1 Simple Forwarding Morphisms} *)
65-
66-(** Identity: validate and forward. *)
67-let forward_id ~name type_env input_type recv_sock send_sock =
--
69: recv_loop ~label recv_sock (fun json ->
70- match Validate.check type_env input_type json with
71- | Ok cleaned -> send_json send_sock ~topic:name cleaned
[Showing results with pagination = limit: 30, offset: 0]

I see the issue. The recv_loop receives an empty payload (which is the EOF sentinel) and just ignores it (the String.length payload > 0 guard). It continues to loop, waiting for the next message. But there won't be any more messages — it should exit.
Wait, actually looking again: when an empty payload is received, recv_loop doesn't call f, but it doesn't raise Exit either — it just loops back to T.recv. Since no more messages come, it blocks on T.recv forever.
But this is existing code from before this step — the existing tests on main should have the same issue. Unless... the old plumb binary (using Exec) doesn't go through the fabric. The tests call run_plumb which runs the plumb binary. With my change, the binary now uses the fabric, so the inline morphisms need to handle EOF.
The inline morphism's recv_loop doesn't propagate EOF — it just ignores empty payloads. This means the id morphism never sends EOF downstream, so the reader in main.ml never gets EOF and never exits.
I need to fix the inline morphisms to propagate EOF. Let me check how send_eof works:
1→(** Inline morphism fibres — structural stream transformations over ZMQ.
2→
3→ Adapted from [Forward] (lib/plumb/forward.ml) to use the transport
4→ layer instead of Unix file descriptors. Each morphism runs as an
5→ Eio fibre, reading from PULL sockets and writing to PUSH sockets.
6→
7→ Fatal errors raise [Morphism_fatal] instead of calling [exit 1].
8→ The supervisor catches this at the fibre boundary. *)
9→
10→open Language
11→
12→exception Morphism_fatal of string * (string * Yojson.Safe.t) list
13→
14→(** Truncate a JSON value for logging. *)
15→let truncate_json ?(max_str=200) ?(max_items=5) json =
16→ let rec trunc = function
17→ | `String s when String.length s > max_str ->
18→ `String (String.sub s 0 max_str ^ "...")
19→ | `List items when List.length items > max_items ->
20→ let kept = List.filteri (fun i _ -> i < max_items) items in
21→ `List (List.map trunc kept @
22→ [`String (Printf.sprintf "...(%d more)" (List.length items - max_items))])
23→ | `List items -> `List (List.map trunc items)
24→ | `Assoc fields -> `Assoc (List.map (fun (k, v) -> (k, trunc v)) fields)
25→ | other -> other
26→ in
27→ trunc json
28→
29→let morphism_fatal label extras =
30→ Runtime.Log.error label extras;
31→ raise (Morphism_fatal (label, extras))
32→
33→(** {1 Transport-based message loop}
34→
35→ Common skeleton: receive JSON from a transport socket, parse,
36→ call handler, repeat until the socket closes or an error occurs. *)
37→
38→module T = Transport.Zmq
39→
40→(** Receive JSON messages from a socket, parse, and call [f].
41→ Exits on transport error (disconnect/close). *)
    42→let recv_loop ~label sock f =

The recv_loop ignores empty payloads but doesn't exit on them. For the fabric EOF protocol to work, inline morphisms need to:
1. Detect the EOF sentinel (empty payload)
2. Send EOF on their output socket(s)
3. Exit
Let me fix recv_loop to exit on EOF:
(** Receive JSON messages from a socket, parse, and call [f].
Exits on transport error (disconnect/close). *)
let recv_loop ~label sock f =
try while true do
let (_topic, payload) = T.recv sock in
if String.length payload > 0 then begin
let json =
try Yojson.Safe.from_string payload
with Yojson.Json_error msg ->
Runtime.Log.error label ["error", `String msg];
raise Exit
in
f json
end
done with
| Exit -> ()
| Transport.Transport_error (_, msg) ->
Runtime.Log.debug label
      ["event", `String "transport_closed"; "detail", `String msg]

(** Raised when an EOF sentinel (empty payload) is received. *)
exception Eof_received
(** Receive JSON messages from a socket, parse, and call [f].
Exits on transport error (disconnect/close) or EOF sentinel. *)
let recv_loop ~label sock f =
try while true do
let (_topic, payload) = T.recv sock in
if String.length payload = 0 then
raise Eof_received;
let json =
try Yojson.Safe.from_string payload
with Yojson.Json_error msg ->
Runtime.Log.error label ["error", `String msg];
raise Exit
in
f json
done with
| Exit | Eof_received -> ()
| Transport.Transport_error (_, msg) ->
Runtime.Log.debug label
      ["event", `String "transport_closed"; "detail", `String msg]

But the inline morphisms also need to send EOF on their output sockets after the recv_loop exits. Currently forward_id doesn't do this. The recv_loop just returns and the daemon fibre exits without sending EOF downstream.
Looking at the fabric's main loop in fabric.ml, the inline morphism fibres are forked with fork_daemon — they just return `Stop_daemon. They don't send EOF on their output sockets. The bridge's demux does send EOF, but inline morphisms don't.
I need to add EOF propagation to all inline morphisms. The simplest approach: change recv_loop to return whether it received EOF, and have each morphism send EOF on its output socket(s) after the loop.
Actually, a cleaner approach: make a wrapper that handles it. Let me refactor:
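A sketch of what such a wrapper could look like. The name with_eof_propagation is hypothetical, and the code assumes the empty-payload EOF convention used by Fabric.Bridge.send_eof/is_eof in main.ml:

```ocaml
(* Sketch only, under the assumption that an empty payload is the EOF
   sentinel (consistent with Fabric.Bridge.is_eof / send_eof usage in
   main.ml). [with_eof_propagation] is a hypothetical name: it runs a
   recv-driven morphism body, and once that body returns (EOF sentinel
   seen, or transport closed), it forwards EOF to every output socket
   so downstream consumers can unwind in turn. *)
let with_eof_propagation ~label ~outputs body =
  body ();
  List.iter
    (fun (sock, topic) ->
       try T.send sock ~topic ""   (* empty payload = EOF sentinel *)
       with Transport.Transport_error (_, msg) ->
         Runtime.Log.debug label
           ["event", `String "output_closed"; "detail", `String msg])
    outputs

(* e.g. forward_id's body would then be wrapped as:
   with_eof_propagation ~label ~outputs:[(send_sock, name)] (fun () ->
     recv_loop ~label recv_sock handle_one) *)
```

This keeps EOF handling in one place instead of repeating it in every morphism, at the cost of threading an explicit output list through each call site.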
1→(** Inline morphism fibres — structural stream transformations over ZMQ.
2→
3→ Adapted from [Forward] (lib/plumb/forward.ml) to use the transport
4→ layer instead of Unix file descriptors. Each morphism runs as an
5→ Eio fibre, reading from PULL sockets and writing to PUSH sockets.
6→
7→ Fatal errors raise [Morphism_fatal] instead of calling [exit 1].
8→ The supervisor catches this at the fibre boundary. *)
9→
10→open Language
11→
12→exception Morphism_fatal of string * (string * Yojson.Safe.t) list
13→
14→(** Truncate a JSON value for logging. *)
15→let truncate_json ?(max_str=200) ?(max_items=5) json =
16→ let rec trunc = function
17→ | `String s when String.length s > max_str ->
18→ `String (String.sub s 0 max_str ^ "...")
19→ | `List items when List.length items > max_items ->
20→ let kept = List.filteri (fun i _ -> i < max_items) items in
21→ `List (List.map trunc kept @
22→ [`String (Printf.sprintf "...(%d more)" (List.length items - max_items))])
23→ | `List items -> `List (List.map trunc items)
24→ | `Assoc fields -> `Assoc (List.map (fun (k, v) -> (k, trunc v)) fields)
25→ | other -> other
26→ in
27→ trunc json
28→
29→let morphism_fatal label extras =
30→ Runtime.Log.error label extras;
31→ raise (Morphism_fatal (label, extras))
32→
33→(** {1 Transport-based message loop}
34→
35→ Common skeleton: receive JSON from a transport socket, parse,
36→ call handler, repeat until the socket closes or an error occurs. *)
37→
38→module T = Transport.Zmq
39→
40→(** Raised when an EOF sentinel (empty payload) is received. *)
41→exception Eof_received
42→
43→(** Receive JSON messages from a socket, parse, and call [f].
44→ Exits on transport error (disconnect/close) or EOF sentinel. *)
45→let recv_loop ~label sock f =
46→ try while true do
47→ let (_topic, payload) = T.recv sock in
48→ if String.length payload = 0 then
49→ raise Eof_received;
50→ let json =
51→ try Yojson.Safe.from_string payload
52→ with Yojson.Json_error msg ->
53→ Runtime.Log.error label ["error", `String msg];
54→ raise Exit
55→ in
56→ f json
57→ done with
58→ | Exit | Eof_received -> ()
59→ | Transport.Transport_error (_, msg) ->
60→ Runtime.Log.debug label
61→ ["event", `String "transport_closed"; "detail", `String msg]
62→
63→(** Send a JSON value on a transport socket. *)
64→let send_json sock ~topic json =
65→ T.send sock ~topic (Yojson.Safe.to_string json)
66→
67→(** {1 Simple Forwarding Morphisms} *)
68→
69→(** Identity: validate and forward. *)
70→let forward_id ~name type_env input_type recv_sock send_sock =
71→ let label = Printf.sprintf "id[%s]" name in
72→ recv_loop ~label recv_sock (fun json ->
73→ match Validate.check type_env input_type json with
74→ | Ok cleaned -> send_json send_sock ~topic:name cleaned
75→ | Error errs ->
76→ Runtime.Log.error label
77→ ["error", `String (Validate.error_string errs)];
78→ raise Exit)
79→
80→(** Discard: validate and drop. *)
81→let drain_discard ~name type_env input_type recv_sock =
82→ let label = Printf.sprintf "discard[%s]" name in
83→ recv_loop ~label recv_sock (fun json ->
84→ match Validate.check type_env input_type json with
85→ | Ok _ -> ()
86→ | Error errs ->
87→ Runtime.Log.error label
88→ ["error", `String (Validate.error_string errs)];
89→ raise Exit)
90→
91→(** Debug tap: log and drop. *)
92→let forward_log label recv_sock =
93→ let debug = Runtime.Log.get_level () = Runtime.Log.Debug in
94→ recv_loop ~label recv_sock (fun json ->
95→ if debug then
96→ Runtime.Log.debug label ["value", truncate_json json])
97→
98→(** JSON serialisation: !json -> !string. *)
99→let forward_format_json ~name recv_sock send_sock =
100→ let label = Printf.sprintf "format_json[%s]" name in
101→ recv_loop ~label recv_sock (fun json ->
102→ let s = Yojson.Safe.to_string json in
103→ send_json send_sock ~topic:name (`String s))
104→
105→(** JSON deserialisation: !string -> !json. *)
106→let forward_parse_json ~name recv_sock send_sock =
107→ let label = Printf.sprintf "parse_json[%s]" name in
108→ recv_loop ~label recv_sock (fun json ->
109→ let s = match json with
110→ | `String s -> s
111→ | _ ->
112→ Runtime.Log.error label ["error", `String "expected JSON string"];
113→ raise Exit
114→ in
115→ let parsed =
116→ try Yojson.Safe.from_string s
117→ with Yojson.Json_error msg ->
118→ Runtime.Log.error label ["error", `String msg];
119→ raise Exit
120→ in
121→ send_json send_sock ~topic:name parsed)
122→
123→(** {1 Fan-out Morphisms} *)
124→
125→(** Copy: validate and send to two output sockets. *)
126→let forward_copy ~name type_env input_type recv_sock send_sock0 send_sock1 =
127→ let label = Printf.sprintf "copy[%s]" name in
128→ recv_loop ~label recv_sock (fun json ->
129→ match Validate.check type_env input_type json with
130→ | Ok cleaned ->
131→ let s = Yojson.Safe.to_string cleaned in
132→ (try T.send send_sock0 ~topic:name s with
133→ | Transport.Transport_error _ ->
134→ Runtime.Log.debug label ["event", `String "output0_closed"]);
135→ (try T.send send_sock1 ~topic:name s with
136→ | Transport.Transport_error _ ->
137→ Runtime.Log.debug label ["event", `String "output1_closed"])
138→ | Error errs ->
139→ Runtime.Log.error label
140→ ["error", `String (Validate.error_string errs)];
141→ raise Exit)
142→
143→(** {1 Fan-in Morphisms} *)
144→
145→(** Merge: read from two input sockets, validate per source, write to output.
146→ Uses Eio.Fiber.first to read from whichever socket has data. *)
147→let forward_merge ~name type_env input_type0 input_type1
148→ recv_sock0 recv_sock1 send_sock =
149→ let label = Printf.sprintf "merge[%s]" name in
150→ let reader recv_sock input_type sub_label =
151→ recv_loop ~label:sub_label recv_sock (fun json ->
152→ match Validate.check type_env input_type json with
153→ | Ok cleaned -> send_json send_sock ~topic:name cleaned
154→ | Error errs ->
155→ morphism_fatal sub_label
156→ ["error", `String (Validate.error_string errs);
157→ "value", truncate_json json])
158→ in
159→ Eio.Fiber.both
160→ (fun () -> reader recv_sock0 input_type0 (label ^ "/0"))
161→ (fun () -> reader recv_sock1 input_type1 (label ^ "/1"))
162→
163→(** Merge-any: same as merge but the output closes when the first
164→ input closes. Under ZMQ, this means cancelling the other fibre
165→ when one finishes. *)
166→let forward_merge_any ~name type_env input_type0 input_type1
167→ recv_sock0 recv_sock1 send_sock =
168→ let label = Printf.sprintf "merge_any[%s]" name in
169→ let reader recv_sock input_type sub_label () =
170→ recv_loop ~label:sub_label recv_sock (fun json ->
171→ match Validate.check type_env input_type json with
172→ | Ok cleaned -> send_json send_sock ~topic:name cleaned
173→ | Error errs ->
174→ morphism_fatal sub_label
175→ ["error", `String (Validate.error_string errs);
176→ "value", truncate_json json])
177→ in
178→ (* Eio.Fiber.first cancels the other fibre when one completes *)
179→ Eio.Fiber.first
180→ (reader recv_sock0 input_type0 (label ^ "/0"))
181→ (reader recv_sock1 input_type1 (label ^ "/1"))
182→
183→(** {1 Barrier (Synchronised N-ary Fan-in)} *)
184→
185→(** Barrier: wait for one message from each of N inputs, emit the
186→ product, repeat. Uses Eio.Mutex for synchronisation. *)
187→let forward_barrier ~name type_env input_types recv_socks send_sock =
188→ let n = List.length input_types in
189→ let barrier_label = Printf.sprintf "barrier[%s]" name in
190→ let m = Eio.Mutex.create () in
191→ let slots = Array.make n None in
192→ let filled = ref 0 in
193→ let eof = ref false in
194→ let cond = Eio.Condition.create () in
195→ let reader idx in_type recv_sock =
196→ let sub_label = Printf.sprintf "barrier[%s]/%d" name idx in
197→ recv_loop ~label:sub_label recv_sock (fun json ->
198→ match Validate.check type_env in_type json with
199→ | Error errs ->
200→ Runtime.Log.error sub_label
201→ ["error", `String (Validate.error_string errs)];
202→ raise Exit
203→ | Ok cleaned ->
204→ Eio.Mutex.use_rw ~protect:false m (fun () ->
205→ (* Wait if our slot is still occupied from previous round *)
206→ while slots.(idx) <> None do
207→ if !eof then raise Exit;
208→ Eio.Condition.await cond m
209→ done;
210→ slots.(idx) <- Some cleaned;
211→ filled := !filled + 1;
212→ if !filled = n then begin
213→ let values = Array.map
214→ (function Some v -> v | None -> assert false) slots in
215→ let product = `List (Array.to_list values) in
216→ Array.fill slots 0 n None;
217→ filled := 0;
218→ Eio.Condition.broadcast cond;
219→ (* Send outside the critical section would be cleaner but
220→ we need to hold the lock to prevent re-entry. The send
221→ yields to Eio which is fine with Eio.Mutex. *)
222→ (try send_json send_sock ~topic:barrier_label product
223→ with Transport.Transport_error (_, msg) ->
224→ Runtime.Log.debug barrier_label
225→ ["event", `String "output_closed"; "detail", `String msg])
226→ end));
227→ (* Mark EOF *)
228→ Eio.Mutex.use_rw ~protect:false m (fun () ->
229→ eof := true;
230→ Eio.Condition.broadcast cond)
231→ in
232→ let indexed = List.combine
233→ (List.combine input_types recv_socks)
234→ (List.init n Fun.id) in
235→ Eio.Fiber.all (List.map (fun ((in_type, recv_sock), idx) ->
236→ fun () -> reader idx in_type recv_sock
237→ ) indexed)
238→
239→(** {1 Expression Morphisms} *)
240→
241→(** Map: per-message pure transform. *)
242→let forward_map ~name type_env in_type out_type expr recv_sock send_sock =
243→ let label = Printf.sprintf "map[%s]" name in
244→ recv_loop ~label recv_sock (fun json ->
245→ let value = match Validate.check type_env in_type json with
246→ | Ok cleaned -> cleaned
247→ | Error errs ->
248→ Runtime.Log.warn label
249→ ["event", `String "input_type_mismatch";
250→ "error", `String (Validate.error_string errs);
251→ "value", truncate_json json];
252→ json
253→ in
254→ let result =
255→ try Expr.eval value expr
256→ with Expr.Eval_error msg ->
257→ morphism_fatal label
258→ ["error", `String msg; "value", truncate_json value]
259→ in
260→ match Validate.check type_env out_type result with
261→ | Ok validated -> send_json send_sock ~topic:name validated
262→ | Error errs ->
263→ morphism_fatal label
264→ ["error", `String (Validate.error_string errs);
265→ "value", truncate_json result])
266→
267→(** Filter: predicate-based stream filtering. *)
268→let forward_filter ~name type_env msg_type expr recv_sock send_sock =
269→ let label = Printf.sprintf "inline_filter[%s]" name in
270→ recv_loop ~label recv_sock (fun json ->
271→ let value = match Validate.check type_env msg_type json with
272→ | Ok cleaned -> cleaned
273→ | Error _ -> json
274→ in
275→ let result =
276→ try Expr.eval value expr
277→ with Expr.Eval_error _ -> `Bool false
278→ in
279→ match Expr.to_bool result with
280→ | b -> if b then send_json send_sock ~topic:name value
281→ | exception Expr.Eval_error _ -> ())
282→
283→(** Project: product projection. *)
284→let forward_project ~name n recv_sock send_sock =
285→ let label = Printf.sprintf "project[%s]" name in
286→ recv_loop ~label recv_sock (fun json ->
287→ match json with
288→ | `List items ->
289→ (match List.nth_opt items n with
290→ | Some v -> send_json send_sock ~topic:name v
291→ | None ->
292→ Runtime.Log.error label
293→ ["error", `String (Printf.sprintf
294→ "index %d out of bounds (length %d)" n (List.length items))];
295→ raise Exit)
296→ | _ ->
297→ Runtime.Log.error label ["error", `String "expected JSON array"];
298→ raise Exit)
299→
300→(** Tagger: stateful envelope tagger for session protocols. *)
301→let forward_tagger ~name indices label_opt recv_sock send_sock =
302→ let tag_label = Printf.sprintf "tagger[%s]" name in
303→ let step = ref 0 in
304→ recv_loop ~label:tag_label recv_sock (fun json ->
305→ let idx = match indices with
306→ | [] -> let s = !step in incr step; s
307→ | _ ->
308→ let s = List.nth indices (!step mod List.length indices) in
309→ incr step; s
310→ in
311→ let envelope = match label_opt with
312→ | None ->
313→ `Assoc [("__step", `Int idx); ("payload", json)]
314→ | Some lbl ->
315→ `Assoc [("__label", `String lbl); ("__step", `Int idx); ("payload", json)]
316→ in
317→ send_json send_sock ~topic:name envelope)
I need to add EOF propagation to inline morphisms. After recv_loop exits, each morphism should send an EOF sentinel (empty payload) on its output socket(s). Let me add this:
(** Send a JSON value on a transport socket. *)
let send_json sock ~topic json =
  T.send sock ~topic (Yojson.Safe.to_string json)

(** Send an EOF sentinel (empty payload) on a transport socket. *)
let send_eof sock ~topic =
  try T.send sock ~topic "" with _ -> ()

let forward_id ~name type_env input_type recv_sock send_sock =
  let label = Printf.sprintf "id[%s]" name in
  recv_loop ~label recv_sock (fun json ->
    match Validate.check type_env input_type json with
    | Ok cleaned -> send_json send_sock ~topic:name cleaned
    | Error errs ->
        Runtime.Log.error label
          ["error", `String (Validate.error_string errs)];
        raise Exit);
  send_eof send_sock ~topic:name

let forward_format_json ~name recv_sock send_sock =
  let label = Printf.sprintf "format_json[%s]" name in
  recv_loop ~label recv_sock (fun json ->
    let s = Yojson.Safe.to_string json in
    send_json send_sock ~topic:name (`String s));
  send_eof send_sock ~topic:name

let forward_parse_json ~name recv_sock send_sock =
  let label = Printf.sprintf "parse_json[%s]" name in
  recv_loop ~label recv_sock (fun json ->
    let s = match json with
      | `String s -> s
      | _ ->
          Runtime.Log.error label ["error", `String "expected JSON string"];
          raise Exit
    in
    let parsed =
      try Yojson.Safe.from_string s
      with Yojson.Json_error msg ->
        Runtime.Log.error label ["error", `String msg];
        raise Exit
    in
    send_json send_sock ~topic:name parsed);
  send_eof send_sock ~topic:name

let forward_copy ~name type_env input_type recv_sock send_sock0 send_sock1 =
  let label = Printf.sprintf "copy[%s]" name in
  recv_loop ~label recv_sock (fun json ->
    match Validate.check type_env input_type json with
    | Ok cleaned ->
        let s = Yojson.Safe.to_string cleaned in
        (try T.send send_sock0 ~topic:name s with
         | Transport.Transport_error _ ->
             Runtime.Log.debug label ["event", `String "output0_closed"]);
        (try T.send send_sock1 ~topic:name s with
         | Transport.Transport_error _ ->
             Runtime.Log.debug label ["event", `String "output1_closed"])
    | Error errs ->
        Runtime.Log.error label
          ["error", `String (Validate.error_string errs)];
        raise Exit);
  send_eof send_sock0 ~topic:name;
  send_eof send_sock1 ~topic:name

let forward_merge ~name type_env input_type0 input_type1
    recv_sock0 recv_sock1 send_sock =
  let label = Printf.sprintf "merge[%s]" name in
  let reader recv_sock input_type sub_label =
    recv_loop ~label:sub_label recv_sock (fun json ->
      match Validate.check type_env input_type json with
      | Ok cleaned -> send_json send_sock ~topic:name cleaned
      | Error errs ->
          morphism_fatal sub_label
            ["error", `String (Validate.error_string errs);
             "value", truncate_json json])
  in
  Eio.Fiber.both
    (fun () -> reader recv_sock0 input_type0 (label ^ "/0"))
    (fun () -> reader recv_sock1 input_type1 (label ^ "/1"));
  send_eof send_sock ~topic:name

(* Alternative merge tail using Eio.Fiber.first: *)
  (* Eio.Fiber.first cancels the other fibre when one completes *)
  Eio.Fiber.first
    (reader recv_sock0 input_type0 (label ^ "/0"))
    (reader recv_sock1 input_type1 (label ^ "/1"));
  send_eof send_sock ~topic:name

(* Tail of the barrier: *)
  Eio.Fiber.all (List.map (fun ((in_type, recv_sock), idx) ->
    fun () -> reader idx in_type recv_sock
  ) indexed);
  send_eof send_sock ~topic:(Printf.sprintf "barrier[%s]" name)

(* Tail of forward_map: *)
    match Validate.check type_env out_type result with
    | Ok validated -> send_json send_sock ~topic:name validated
    | Error errs ->
        morphism_fatal label
          ["error", `String (Validate.error_string errs);
           "value", truncate_json result]);
  send_eof send_sock ~topic:name

(* Tail of forward_filter: *)
    match Expr.to_bool result with
    | b -> if b then send_json send_sock ~topic:name value
    | exception Expr.Eval_error _ -> ());
  send_eof send_sock ~topic:name

(* Tail of forward_project: *)
    | _ ->
        Runtime.Log.error label ["error", `String "expected JSON array"];
        raise Exit);
  send_eof send_sock ~topic:name

(* Tail of forward_tagger: *)
    send_json send_sock ~topic:name envelope);
  send_eof send_sock ~topic:name
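The consumer side has to recognise the sentinel as well. A minimal sketch of that receive loop, assuming the transport delivers raw payload strings and the empty payload is reserved as end-of-stream (recv and handle here are stand-in closures, not fabric APIs):

```ocaml
(* Sketch only: [recv] stands in for a blocking T.recv yielding the raw
   payload string; an empty payload is treated as the EOF sentinel. *)
let consume_until_eof recv handle =
  let rec loop () =
    match recv () with
    | "" -> ()                          (* EOF sentinel: stop cleanly *)
    | payload -> handle payload; loop ()
  in
  loop ()
```

Reserving the empty frame this way assumes no morphism ever emits a legitimately empty payload; send_json always produces at least "null", so the assumption holds for JSON traffic.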
cd /tmp/plmb-037-step5b && nix develop --command dune build --root=. 2>&1 | head -20
cd /tmp/plmb-037-step5b && echo '{"name":"Alice","age":30}' | nix develop --command sh -c './_build/default/bin/plumb/main.exe test/fixtures/identity.plumb' 2>/dev/null; echo "exit: $?"

cd /tmp/plmb-037-step5b && echo '{"name":"Alice","age":30}' | nix develop --command sh -c './_build/default/bin/plumb/main.exe test/fixtures/identity.plumb; echo "code=$?"' 2>/dev/null

cd /tmp/plmb-037-step5b && nix develop --command dune test --root=. 2>&1
Command was manually backgrounded by user with ID: b4muvrx1s. Output is being written to: /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/b4muvrx1s.output
/tmp/plmb-037-step5b/test/fixtures/agent.plumb /tmp/plmb-037-step5b/test/fixtures/agent_control.plumb /tmp/plmb-037-step5b/test/fixtures/agent_control_arity.plumb /tmp/plmb-037-step5b/test/fixtures/agent_control_mismatch.plumb /tmp/plmb-037-step5b/test/fixtures/agent_control_telemetry.plumb /tmp/plmb-037-step5b/test/fixtures/agent_eliza.plumb /tmp/plmb-037-step5b/test/fixtures/agent_explicit_provider.plumb /tmp/plmb-037-step5b/test/fixtures/agent_no_provider.plumb /tmp/plmb-037-step5b/test/fixtures/agent_nonproduct_three.plumb /tmp/plmb-037-step5b/test/fixtures/agent_openai_provider.plumb
wc -l /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/b4muvrx1s.output 2>/dev/null && tail -20 /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/b4muvrx1s.output 2>/dev/null
381 /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/b4muvrx1s.output
[OK] serialisation 1 multi-trigger send/recv.
[OK] loop 0 send/recv loop exec.
[OK] <-> integration 0 full pipeline.
[OK] <-> integration 1 bare alias error.
[OK] <-> integration 2 multi-step channels.
[OK] select 0 select exec.
[OK] select 1 select + loop exec.
Full test results in `/private/tmp/plmb-037-step5b/_build/default/test/_build/_tests/session_exec'.
Test Successful in 0.010s. 10 tests run.
Testing `fabric'. This run has ID `V2YZBKHR'.
[OK] inline 0 identity.
[OK] inline 1 copy + merge.
[OK] agent 0 agent identity.
[OK] agent 1 agent EOF.
Full test results in `/private/tmp/plmb-037-step5b/_build/default/test/_build/_tests/fabric'.
Test Successful in 3.663s. 4 tests run.

ww 5797 0.0 0.1 435619408 11952 ?? S 9:50AM 0:00.50 dune test --root=.
ww 5795 0.0 0.0 435308672 1184 ?? Ss 9:50AM 0:00.00 /bin/zsh -c source /Users/ww/.claude/shell-snapshots/snapshot-zsh-1773431411965-vq5jw6.sh && setopt NO_EXTENDED_GLOB 2>/dev/null || true && eval 'cd /tmp/plmb-037-step5b && nix develop --command dune test --root=. 2>&1' \< /dev/null && pwd -P >| /var/folders/s3/yz2d7zf14wq_rf8sm0xg6mkm0000gn/T/claude-333d-cwd

pgrep -P 5797 2>/dev/null | xargs -I{} ps -p {} -o pid=,command= 2>/dev/null

It's been running for 8 minutes — it's hanging. The identity pipeline works manually, so it must be a different test case. Let me check what test_plumb tests:
kill 5806 5797 5795 2>/dev/null; sleep 1; rm -f /tmp/plmb-037-step5b/_build/.lock
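A guard like the following could keep a future hang from wedging the whole suite. This is a hypothetical helper, not from the repo: it runs a thunk in a worker thread and polls against a wall-clock deadline, turning a hung pipeline into a test failure instead of a stalled dune test.

```ocaml
(* Hypothetical guard (illustrative names, requires threads.posix and
   unix): run [f] in a worker thread; fail if it outlives [seconds]. *)
let run_with_deadline ~seconds f =
  let result = ref None in
  let t = Thread.create (fun () -> result := Some (f ())) () in
  let deadline = Unix.gettimeofday () +. seconds in
  let rec wait () =
    if !result <> None then Thread.join t
    else if Unix.gettimeofday () > deadline then
      failwith "deadline exceeded"           (* hung step: fail fast *)
    else (Thread.delay 0.05; wait ())
  in
  wait ();
  match !result with Some v -> v | None -> assert false
```

Polling instead of joining keeps the sketch portable; the leaked thread on timeout is acceptable in a test process that is about to exit anyway.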
40→
41→(** Split output into non-empty lines. *)
42→let output_lines stdout =
43→ String.split_on_char '\n' stdout
44→ |> List.filter (fun s -> String.length (String.trim s) > 0)
45→
46→(** {1 Identity via plumb} *)
47→
48→let test_identity_passthrough () =
49→ let input = "{\"name\":\"Alice\",\"age\":30}\n" in
50→ let stdout, _stderr, code = run_plumb "identity.plumb" input in
51→ Alcotest.(check int) "exit 0" 0 code;
52→ Alcotest.(check string) "output" "{\"name\":\"Alice\",\"age\":30}\n" stdout
53→
54→let test_identity_multiple () =
55→ let input = "{\"name\":\"Alice\",\"age\":30}\n{\"name\":\"Bob\",\"age\":25}\n" in
56→ let stdout, _stderr, code = run_plumb "identity.plumb" input in
57→ Alcotest.(check int) "exit 0" 0 code;
58→ let lines = output_lines stdout in
59→ Alcotest.(check int) "two output lines" 2 (List.length lines)
60→
61→let test_identity_empty () =
62→ let stdout, _stderr, code = run_plumb "identity.plumb" "" in
63→ Alcotest.(check int) "exit 0" 0 code;
64→ Alcotest.(check string) "no output" "" stdout
65→
66→(** {1 Copy-merge} *)
67→
68→let test_copy_merge_doubles () =
69→ let input = "{\"text\":\"hello\"}\n" in
70→ let stdout, _stderr, code = run_plumb "copy.plumb" input in
71→ Alcotest.(check int) "exit 0" 0 code;
72→ let lines = output_lines stdout in
73→ Alcotest.(check int) "two output lines" 2 (List.length lines)
74→
75→let test_copy_merge_content () =
76→ let input = "{\"text\":\"hello\"}\n" in
77→ let stdout, _stderr, code = run_plumb "copy.plumb" input in
78→ Alcotest.(check int) "exit 0" 0 code;
79→ let lines = output_lines stdout in
80→ List.iter (fun line ->
81→ Alcotest.(check string) "content preserved" "{\"text\":\"hello\"}" line
82→ ) lines
83→
84→let test_copy_merge_multiple () =
85→ let input = "{\"text\":\"a\"}\n{\"text\":\"b\"}\n{\"text\":\"c\"}\n" in
86→ let stdout, _stderr, code = run_plumb "copy.plumb" input in
87→ Alcotest.(check int) "exit 0" 0 code;
88→ let lines = output_lines stdout in
89→ Alcotest.(check int) "six output lines" 6 (List.length lines);
90→ (* Each message should appear exactly twice *)
91→ let count msg = List.length (List.filter (fun l -> l = msg) lines) in
92→ Alcotest.(check int) "a appears twice" 2 (count "{\"text\":\"a\"}");
93→ Alcotest.(check int) "b appears twice" 2 (count "{\"text\":\"b\"}");
94→ Alcotest.(check int) "c appears twice" 2 (count "{\"text\":\"c\"}")
95→
96→let test_copy_merge_empty () =
97→ let stdout, _stderr, code = run_plumb "copy.plumb" "" in
98→ Alcotest.(check int) "exit 0" 0 code;
99→ Alcotest.(check string) "no output" "" stdout
100→
101→(** {1 Barrier} *)
102→
103→let test_barrier_basic () =
104→ let input = "{\"text\":\"hello\"}\n" in
105→ let stdout, _stderr, code = run_plumb "barrier.plumb" input in
106→ Alcotest.(check int) "exit 0" 0 code;
107→ let lines = output_lines stdout in
108→ Alcotest.(check int) "one output line" 1 (List.length lines);
109→ Alcotest.(check string) "product"
110→ "[{\"text\":\"hello\"},{\"text\":\"hello\"}]" (List.hd lines)
111→
112→let test_barrier_multiple () =
113→ let input = "{\"text\":\"a\"}\n{\"text\":\"b\"}\n{\"text\":\"c\"}\n" in
114→ let stdout, _stderr, code = run_plumb "barrier.plumb" input in
115→ Alcotest.(check int) "exit 0" 0 code;
116→ let lines = output_lines stdout in
117→ Alcotest.(check int) "three output lines" 3 (List.length lines);
118→ List.iter (fun line ->
119→ let json = Yojson.Safe.from_string line in
120→ match json with
121→ | `List [a; b] ->
122→ Alcotest.(check string) "pair elements equal"
123→ (Yojson.Safe.to_string a) (Yojson.Safe.to_string b)
124→ | _ -> Alcotest.fail "expected JSON array of 2 elements"
125→ ) lines
126→
127→let test_barrier_empty () =
128→ let stdout, _stderr, code = run_plumb "barrier.plumb" "" in
129→ Alcotest.(check int) "exit 0" 0 code;
130→ Alcotest.(check string) "no output" "" stdout
131→
132→let test_barrier_named () =
133→ let input = "{\"text\":\"hello\"}\n" in
134→ let stdout, _stderr, code = run_plumb "barrier_named.plumb" input in
135→ Alcotest.(check int) "exit 0" 0 code;
136→ let lines = output_lines stdout in
137→ Alcotest.(check int) "one output line" 1 (List.length lines);
138→ Alcotest.(check string) "product"
139→ "[{\"text\":\"hello\"},{\"text\":\"hello\"}]" (List.hd lines)
140→
141→let test_barrier_type_mismatch () =
142→ let _stdout, _stderr, code = run_plumb "barrier_type_mismatch.plumb" "" in
143→ Alcotest.(check int) "exit 2" 2 code
144→
145→let test_barrier_homogeneous () =
146→ let input = "{\"text\":\"hello\"}\n" in
147→ let stdout, _stderr, code = run_plumb "barrier_homogeneous.plumb" input in
148→ Alcotest.(check int) "exit 0" 0 code;
149→ let lines = output_lines stdout in
150→ Alcotest.(check int) "one output line" 1 (List.length lines);
151→ Alcotest.(check string) "product"
152→ "[{\"text\":\"hello\"},{\"text\":\"hello\"}]" (List.hd lines)
153→
154→let test_barrier_3way () =
155→ let input = "{\"text\":\"hello\"}\n" in
156→ let stdout, _stderr, code = run_plumb "barrier_3way.plumb" input in
157→ Alcotest.(check int) "exit 0" 0 code;
158→ let lines = output_lines stdout in
159→ Alcotest.(check int) "one output line" 1 (List.length lines);
160→ (* 3-way barrier produces flat 3-element JSON array *)
161→ Alcotest.(check string) "3-way product"
162→ "[{\"text\":\"hello\"},{\"text\":\"hello\"},{\"text\":\"hello\"}]"
163→ (List.hd lines)
164→
165→let test_barrier_3way_multiple () =
166→ let input = "{\"text\":\"a\"}\n{\"text\":\"b\"}\n" in
167→ let stdout, _stderr, code = run_plumb "barrier_3way.plumb" input in
168→ Alcotest.(check int) "exit 0" 0 code;
169→ let lines = output_lines stdout in
170→ Alcotest.(check int) "two output lines" 2 (List.length lines);
171→ List.iter (fun line ->
172→ let json = Yojson.Safe.from_string line in
173→ match json with
174→ | `List [a; b; c] ->
175→ Alcotest.(check string) "triple elements equal"
176→ (Yojson.Safe.to_string a) (Yojson.Safe.to_string b);
177→ Alcotest.(check string) "triple elements equal"
178→ (Yojson.Safe.to_string b) (Yojson.Safe.to_string c)
179→ | _ -> Alcotest.fail "expected JSON array of 3 elements"
180→ ) lines
181→
182→(** {1 Project tests} *)
183→
184→let test_project_second () =
185→ let input = "{\"text\":\"hello\"}\n" in
186→ let stdout, _stderr, code = run_plumb "project.plumb" input in
187→ Alcotest.(check int) "exit 0" 0 code;
188→ let lines = output_lines stdout in
189→ Alcotest.(check int) "one output line" 1 (List.length lines);
190→ Alcotest.(check string) "project(1) extracts second element"
191→ "{\"text\":\"hello\"}" (List.hd lines)
192→
193→let test_project_first () =
194→ let input = "{\"text\":\"hello\"}\n" in
195→ let stdout, _stderr, code = run_plumb "project_first.plumb" input in
196→ Alcotest.(check int) "exit 0" 0 code;
197→ let lines = output_lines stdout in
198→ Alcotest.(check int) "one output line" 1 (List.length lines);
199→ Alcotest.(check string) "project(0) extracts first element"
200→ "{\"text\":\"hello\"}" (List.hd lines)
201→
202→let test_project_oob () =
203→ let _stdout, _stderr, code = run_plumb "project_oob.plumb" "" in
204→ Alcotest.(check int) "exit 2 (config error)" 2 code
205→
206→(** {1 Direct structural morphism spawn} *)
207→
208→let test_direct_copy_merge () =
209→ let input = "{\"text\":\"hello\"}\n" in
210→ let stdout, _stderr, code = run_plumb "direct.plumb" input in
211→ Alcotest.(check int) "exit 0" 0 code;
212→ let lines = output_lines stdout in
213→ Alcotest.(check int) "two output lines" 2 (List.length lines);
214→ List.iter (fun line ->
215→ Alcotest.(check string) "content preserved" "{\"text\":\"hello\"}" line
216→ ) lines
217→
218→let test_direct_identity () =
219→ let input = "{\"name\":\"Alice\",\"age\":30}\n" in
220→ let stdout, _stderr, code = run_plumb "direct_id.plumb" input in
221→ Alcotest.(check int) "exit 0" 0 code;
222→ Alcotest.(check string) "output" "{\"name\":\"Alice\",\"age\":30}\n" stdout
223→
224→(** {1 Type checking at spawn boundaries} *)
225→
226→let test_type_mismatch_direct () =
227→ let _stdout, _stderr, code = run_plumb "type_mismatch.plumb" "" in
228→ Alcotest.(check int) "exit 2" 2 code
229→
230→let test_type_mismatch_named () =
231→ let _stdout, _stderr, code = run_plumb "named_mismatch.plumb" "" in
232→ Alcotest.(check int) "exit 2" 2 code
233→
234→let test_reserved_name_rejected () =
235→ let _stdout, _stderr, code = run_plumb "reserved_name.plumb" "" in
236→ Alcotest.(check int) "exit 2" 2 code
237→
238→(** {1 Heterogeneous merge} *)
239→
240→let test_heterogeneous_merge () =
241→ (* Only one arm exercised — ch_b has no writer, so merge reader 1
242→ gets immediate EOF from fd cleanup. Testing both arms without
243→ an agent is impractical since structural morphisms can't
244→ transform types. *)
245→ let input = "{\"text\":\"hello\"}\n" in
246→ let stdout, _stderr, code = run_plumb "hetero_merge.plumb" input in
247→ Alcotest.(check int) "exit 0" 0 code;
248→ let lines = output_lines stdout in
249→ Alcotest.(check int) "one line" 1 (List.length lines);
250→ Alcotest.(check string) "content" "{\"text\":\"hello\"}" (List.hd lines)
251→
252→let test_heterogeneous_merge_mismatch () =
253→ let _stdout, _stderr, code = run_plumb "hetero_mismatch.plumb" "" in
254→ Alcotest.(check int) "exit 2" 2 code
255→
256→(** {1 Product type checking} *)
257→
258→let test_copy_product_mismatch () =
259→ let _stdout, _stderr, code = run_plumb "copy_mismatch.plumb" "" in
260→ Alcotest.(check int) "exit 2" 2 code
261→
262→let test_copy_product_arity () =
263→ let _stdout, _stderr, code = run_plumb "copy_arity.plumb" "" in
264→ Alcotest.(check int) "exit 2" 2 code
265→
266→(** {1 Named spawn} *)
267→
268→let test_named_spawn_identity () =
269→ let input = "{\"name\":\"Alice\",\"age\":30}\n" in
270→ let stdout, _stderr, code = run_plumb "named_identity.plumb" input in
271→ Alcotest.(check int) "exit 0" 0 code;
272→ Alcotest.(check string) "output" "{\"name\":\"Alice\",\"age\":30}\n" stdout
273→
274→let test_named_spawn_copy_merge () =
275→ let input = "{\"text\":\"hello\"}\n" in
276→ let stdout, _stderr, code = run_plumb "named_copy.plumb" input in
277→ Alcotest.(check int) "exit 0" 0 code;
278→ let lines = output_lines stdout in
279→ Alcotest.(check int) "two output lines" 2 (List.length lines);
280→ List.iter (fun line ->
281→ Alcotest.(check string) "content preserved" "{\"text\":\"hello\"}" line
282→ ) lines
283→
284→let test_named_spawn_unknown_port () =
285→ let _stdout, _stderr, code = run_plumb "named_unknown.plumb" "" in
286→ Alcotest.(check int) "exit 2" 2 code
287→
288→let test_named_spawn_missing_port () =
289→ let _stdout, _stderr, code = run_plumb "named_missing.plumb" "" in
290→ Alcotest.(check int) "exit 2" 2 code
291→
292→(** {1 Agent telemetry port type checking} *)
293→
294→let test_agent_nonproduct_rejects_three () =
295→ let _stdout, _stderr, code = run_plumb "agent_nonproduct_three.plumb" "" in
296→ Alcotest.(check int) "exit 2" 2 code
297→
298→let test_agent_product_type_mismatch () =
299→ let _stdout, _stderr, code = run_plumb "agent_product_mismatch.plumb" "" in
300→ Alcotest.(check int) "exit 2" 2 code
301→
302→let test_agent_product_arity () =
303→ let _stdout, _stderr, code = run_plumb "agent_product_arity.plumb" "" in
304→ Alcotest.(check int) "exit 2" 2 code
305→
306→(** {1 Agent control port type checking} *)
307→
308→let test_agent_control_accepted () =
309→ (* Product input (!A, !C) with correct types — passes type-checking.
310→ With empty stdin, agent exits cleanly (exit 0). *)
311→ let _stdout, _stderr, code = run_plumb "agent_control.plumb" "" in
312→ Alcotest.(check int) "exit 0" 0 code
313→
314→let test_agent_control_mismatch () =
315→ let _stdout, _stderr, code = run_plumb "agent_control_mismatch.plumb" "" in
316→ Alcotest.(check int) "exit 2" 2 code
317→
318→let test_agent_control_arity () =
319→ let _stdout, _stderr, code = run_plumb "agent_control_arity.plumb" "" in
320→ Alcotest.(check int) "exit 2" 2 code
321→
322→let test_agent_control_telemetry () =
323→ (* 4-port agent — both control and telemetry *)
324→ let _stdout, _stderr, code = run_plumb "agent_control_telemetry.plumb" "" in
325→ Alcotest.(check int) "exit 0" 0 code
326→
327→let test_wiring_control () =
328→ (* Wiring with control-input agent — desugarer creates implicit channel. *)
329→ let _stdout, _stderr, code = run_plumb "wiring_control.plumb" "" in
330→ Alcotest.(check int) "exit 0" 0 code
331→
332→let test_wiring_telemetry_source () =
333→ (* Regression for PLMB-040 follow-up: agent@telemetry as chain source.
334→ Before the fix, the pre-allocated wsp channel was treated as a phantom
335→ process and raised "has no input channel". *)
336→ let _stdout, _stderr, code = run_plumb "wiring_telemetry_source.plumb" "" in
337→ Alcotest.(check int) "exit 0" 0 code
338→
339→(** {1 Filter} *)
340→
341→let test_filter_passes () =
342→ let input = "{\"score\":90,\"text\":\"great\"}\n" in
343→ let stdout, _stderr, code = run_plumb "filter.plumb" input in
344→ Alcotest.(check int) "exit 0" 0 code;
345→ let lines = output_lines stdout in
346→ Alcotest.(check int) "one output" 1 (List.length lines);
347→ Alcotest.(check string) "content" "{\"score\":90,\"text\":\"great\"}" (List.hd lines)
348→
349→let test_filter_drops () =
350→ let input = "{\"score\":50,\"text\":\"poor\"}\n" in
351→ let stdout, _stderr, code = run_plumb "filter.plumb" input in
352→ Alcotest.(check int) "exit 0" 0 code;
353→ Alcotest.(check string) "no output" "" stdout
354→
355→let test_filter_mixed () =
356→ let input =
357→ "{\"score\":90,\"text\":\"a\"}\n\
358→ {\"score\":50,\"text\":\"b\"}\n\
359→ {\"score\":85,\"text\":\"c\"}\n\
360→ {\"score\":10,\"text\":\"d\"}\n" in
361→ let stdout, _stderr, code = run_plumb "filter.plumb" input in
362→ Alcotest.(check int) "exit 0" 0 code;
363→ let lines = output_lines stdout in
364→ Alcotest.(check int) "two pass" 2 (List.length lines)
365→
366→let test_filter_empty () =
367→ let stdout, _stderr, code = run_plumb "filter.plumb" "" in
368→ Alcotest.(check int) "exit 0" 0 code;
369→ Alcotest.(check string) "no output" "" stdout
370→
371→let test_filter_type_mismatch () =
372→ let _stdout, _stderr, code = run_plumb "filter_mismatch.plumb" "" in
373→ Alcotest.(check int) "exit 2" 2 code
374→
375→(** Regression tests for BUG-160: filter expressions must be type-checked
376→ at compile time. *)
377→
378→let test_filter_field_on_string () =
379→ (* Field access on string type must fail — strings have no fields. *)
380→ let _stdout, stderr, code =
381→ run_plumb "filter_field_on_string.plumb" "" in
382→ Alcotest.(check int) "exit 2" 2 code;
383→ Alcotest.(check bool) "mentions type error" true
384→ (String.length stderr > 0)
385→
386→let test_filter_nonbool_predicate () =
387→ (* filter(score + 1) returns int, not bool — must fail. *)
388→ let _stdout, stderr, code =
389→ run_plumb "filter_nonbool_predicate.plumb" "" in
390→ Alcotest.(check int) "exit 2" 2 code;
391→ Alcotest.(check bool) "mentions type error" true
392→ (String.length stderr > 0)
393→
394→let test_filter_json_field () =
395→ (* Field access on json type should compile — json is dynamically typed. *)
396→ let _stdout, _stderr, code = run_plumb "filter_json_field.plumb" "" in
397→ Alcotest.(check int) "exit 0" 0 code
398→
399→let test_filter_json_compound () =
400→ (* Compound boolean on json fields should compile. *)
401→ let _stdout, _stderr, code =
402→ run_plumb "filter_json_compound.plumb" "" in
403→ Alcotest.(check int) "exit 0" 0 code
404→
405→(** {1 Top-level filter/map expression type checking}
406→
407→ Regression tests for BUG-160: top-level [let f = filter(expr)]
408→ and [let f = map(expr)] must have their expressions type-checked
409→ at compile time, not just inline forms in wiring bodies. *)
410→
411→let test_toplevel_filter_field_on_string () =
412→ let _stdout, stderr, code =
413→ run_plumb "toplevel_filter_field_on_string.plumb" "" in
414→ Alcotest.(check int) "exit 2" 2 code;
415→ Alcotest.(check bool) "mentions type error" true
416→ (String.length stderr > 0)
417→
418→let test_toplevel_filter_nonbool_predicate () =
419→ let _stdout, stderr, code =
420→ run_plumb "toplevel_filter_nonbool_predicate.plumb" "" in
421→ Alcotest.(check int) "exit 2" 2 code;
422→ Alcotest.(check bool) "mentions type error" true
423→ (String.length stderr > 0)
424→
425→let test_toplevel_filter_json_field () =
426→ let _stdout, _stderr, code =
427→ run_plumb "toplevel_filter_json_field.plumb" "" in
428→ Alcotest.(check int) "exit 0" 0 code
429→
430→let test_toplevel_filter_json_compound () =
431→ let _stdout, _stderr, code =
432→ run_plumb "toplevel_filter_json_compound.plumb" "" in
433→ Alcotest.(check int) "exit 0" 0 code
434→
435→let test_toplevel_filter_valid_record () =
436→ let input =
437→ "{\"score\":80,\"text\":\"a\"}\n\
438→ {\"score\":30,\"text\":\"b\"}\n" in
439→ let stdout, _stderr, code =
440→ run_plumb "toplevel_filter_valid_record.plumb" input in
441→ Alcotest.(check int) "exit 0" 0 code;
442→ let lines = output_lines stdout in
443→ Alcotest.(check int) "one passes filter" 1 (List.length lines)
444→
445→let test_toplevel_map_field_on_string () =
446→ let _stdout, stderr, code =
447→ run_plumb "toplevel_map_field_on_string.plumb" "" in
448→ Alcotest.(check int) "exit 2" 2 code;
449→ Alcotest.(check bool) "mentions type error" true
450→ (String.length stderr > 0)
451→
452→let test_toplevel_map_output_type_mismatch () =
453→ let _stdout, stderr, code =
454→ run_plumb "toplevel_map_output_type_mismatch.plumb" "" in
455→ Alcotest.(check int) "exit 2" 2 code;
456→ Alcotest.(check bool) "mentions type error" true
457→ (String.length stderr > 0)
458→
459→let test_toplevel_map_valid_expr () =
460→ let input = "{\"score\":5,\"text\":\"hi\"}\n" in
461→ let stdout, _stderr, code =
462→ run_plumb "toplevel_map_valid_expr.plumb" input in
463→ Alcotest.(check int) "exit 0" 0 code;
464→ let lines = output_lines stdout in
465→ Alcotest.(check int) "one output" 1 (List.length lines);
466→ Alcotest.(check string) "doubled score" "10" (List.hd lines)
467→
468→let test_toplevel_map_json_field () =
469→ let _stdout, _stderr, code =
470→ run_plumb "toplevel_map_json_field.plumb" "" in
471→ Alcotest.(check int) "exit 0" 0 code
472→
473→(** {1 Map} *)
474→
475→let test_map_transform () =
476→ let input = "{\"score\":90,\"text\":\"great\"}\n" in
477→ let stdout, _stderr, code = run_plumb "map.plumb" input in
478→ Alcotest.(check int) "exit 0" 0 code;
479→ let lines = output_lines stdout in
480→ Alcotest.(check int) "one output" 1 (List.length lines);
481→ Alcotest.(check string) "content" "{\"doubled\":180,\"pass\":true}" (List.hd lines)
482→
483→let test_map_below_threshold () =
484→ let input = "{\"score\":40,\"text\":\"poor\"}\n" in
485→ let stdout, _stderr, code = run_plumb "map.plumb" input in
486→ Alcotest.(check int) "exit 0" 0 code;
487→ let lines = output_lines stdout in
488→ Alcotest.(check int) "one output" 1 (List.length lines);
489→ Alcotest.(check string) "content" "{\"doubled\":80,\"pass\":false}" (List.hd lines)
490→
491→let test_map_projection () =
492→ let input = "{\"draft\":\"hello\",\"verdict\":true}\n" in
493→ let stdout, _stderr, code = run_plumb "map_project.plumb" input in
494→ Alcotest.(check int) "exit 0" 0 code;
495→ let lines = output_lines stdout in
496→ Alcotest.(check int) "one output" 1 (List.length lines);
497→ Alcotest.(check string) "content" "true" (List.hd lines)
498→
499→let test_map_empty () =
500→ let stdout, _stderr, code = run_plumb "map.plumb" "" in
501→ Alcotest.(check int) "exit 0" 0 code;
502→ Alcotest.(check string) "no output" "" stdout
503→
504→(** {1 Wiring (static form)} *)
505→
506→let test_wiring_linear () =
507→ let input = "{\"text\":\"hello\"}\n" in
508→ let stdout, _stderr, code = run_plumb "wiring_linear.plumb" input in
509→ Alcotest.(check int) "exit 0" 0 code;
510→ Alcotest.(check string) "passthrough" "{\"text\":\"hello\"}\n" stdout
511→
512→let test_wiring_copy () =
513→ let input = "{\"text\":\"hello\"}\n" in
514→ let stdout, _stderr, code = run_plumb "wiring_copy.plumb" input in
515→ Alcotest.(check int) "exit 0" 0 code;
516→ let lines = output_lines stdout in
517→ Alcotest.(check int) "doubled output" 2 (List.length lines);
518→ List.iter (fun line ->
519→ Alcotest.(check string) "content" "{\"text\":\"hello\"}" line
520→ ) lines
521→
522→let test_wiring_filter () =
523→ let input =
524→ "{\"score\":80,\"text\":\"good\"}\n\
525→ {\"score\":30,\"text\":\"bad\"}\n\
526→ {\"score\":50,\"text\":\"ok\"}\n" in
527→ let stdout, _stderr, code = run_plumb "wiring_filter.plumb" input in
528→ Alcotest.(check int) "exit 0" 0 code;
529→ let lines = output_lines stdout in
530→ Alcotest.(check int) "two pass" 2 (List.length lines)
531→
532→let test_wiring_map_expr () =
533→ let input = "{\"x\":5,\"label\":\"hi\"}\n{\"x\":10,\"label\":\"bye\"}\n" in
534→ let stdout, _stderr, code = run_plumb "wiring_map_expr.plumb" input in
535→ Alcotest.(check int) "exit 0" 0 code;
536→ let lines = output_lines stdout in
537→ Alcotest.(check int) "two lines" 2 (List.length lines);
538→ Alcotest.(check string) "doubled 5" "10" (List.nth lines 0);
539→ Alcotest.(check string) "doubled 10" "20" (List.nth lines 1)
540→
541→let test_wiring_map_record () =
542→ let input = "{\"score\":5,\"text\":\"hi\"}\n" in
543→ let stdout, _stderr, code = run_plumb "wiring_map_record.plumb" input in
544→ Alcotest.(check int) "exit 0" 0 code;
545→ let lines = output_lines stdout in
546→ Alcotest.(check int) "one line" 1 (List.length lines);
547→ Alcotest.(check string) "record output"
548→ "{\"doubled\":10,\"original\":5}" (List.hd lines)
549→
550→(** {1 Wiring heterogeneous merge (static form)} *)
551→
552→let test_wiring_hetero_merge () =
553→  (* Two map branches produce different types, merged into sum output.
554→     Both branches are exercised because the input fans out via auto-copy. *)
555→ let input = "{\"text\":\"hello\",\"count\":42}\n" in
556→ let stdout, _stderr, code = run_plumb "wiring_hetero_merge.plumb" input in
557→ Alcotest.(check int) "exit 0" 0 code;
558→ let lines = output_lines stdout in
559→ Alcotest.(check int) "two output lines" 2 (List.length lines);
560→ (* Both variants should appear (order depends on scheduling) *)
561→ let has_text = List.exists (fun l -> l = "{\"text\":\"hello\"}") lines in
562→ let has_count = List.exists (fun l -> l = "{\"count\":42}") lines in
563→ Alcotest.(check bool) "text variant" true has_text;
564→ Alcotest.(check bool) "count variant" true has_count
565→
566→let test_wiring_hetero_err () =
567→ (* Target has non-sum input but receives heterogeneous edges — exit 2. *)
568→ let _stdout, _stderr, code = run_plumb "wiring_hetero_err.plumb" "" in
569→ Alcotest.(check int) "exit 2" 2 code
570→
571→(** {1 Tensor + barrier (static form)} *)
572→
573→let test_tensor_barrier () =
574→ let input = "{\"text\":\"hello\"}\n" in
575→ let stdout, _stderr, code = run_plumb "wiring_tensor_barrier.plumb" input in
576→ Alcotest.(check int) "exit 0" 0 code;
577→ let lines = output_lines stdout in
578→ Alcotest.(check int) "one output line" 1 (List.length lines);
579→ Alcotest.(check string) "product"
580→ "[{\"text\":\"hello\"},{\"text\":\"hello\"}]" (List.hd lines)
581→
582→let test_tensor_barrier_multiple () =
583→ let input = "{\"text\":\"a\"}\n{\"text\":\"b\"}\n{\"text\":\"c\"}\n" in
584→ let stdout, _stderr, code = run_plumb "wiring_tensor_barrier.plumb" input in
585→ Alcotest.(check int) "exit 0" 0 code;
586→ let lines = output_lines stdout in
587→ Alcotest.(check int) "three output lines" 3 (List.length lines);
588→ List.iter (fun line ->
589→ let json = Yojson.Safe.from_string line in
590→ match json with
591→ | `List [a; b] ->
592→ Alcotest.(check string) "pair elements equal"
593→ (Yojson.Safe.to_string a) (Yojson.Safe.to_string b)
594→ | _ -> Alcotest.fail "expected 2-element array"
595→ ) lines
596→
597→let test_tensor_merge () =
598→ let input = "{\"text\":\"hello\"}\n" in
599→ let stdout, _stderr, code = run_plumb "wiring_tensor_merge.plumb" input in
600→ Alcotest.(check int) "exit 0" 0 code;
601→ let lines = output_lines stdout in
602→ (* Tensor without barrier: auto-merge produces 2 copies *)
603→ Alcotest.(check int) "doubled output" 2 (List.length lines);
604→ List.iter (fun line ->
605→ Alcotest.(check string) "content" "{\"text\":\"hello\"}" line
606→ ) lines
607→
608→let test_tensor_named_barrier () =
609→ let input = "{\"text\":\"hello\"}\n" in
610→ let stdout, _stderr, code =
611→ run_plumb "wiring_tensor_named_barrier.plumb" input in
612→ Alcotest.(check int) "exit 0" 0 code;
613→ let lines = output_lines stdout in
614→ Alcotest.(check int) "one output line" 1 (List.length lines);
615→ Alcotest.(check string) "product"
616→ "[{\"text\":\"hello\"},{\"text\":\"hello\"}]" (List.hd lines)
617→
618→let test_tensor_barrier_3way () =
619→ let input = "{\"text\":\"hello\"}\n" in
620→ let stdout, _stderr, code =
621→ run_plumb "wiring_tensor_barrier_3way.plumb" input in
622→ Alcotest.(check int) "exit 0" 0 code;
623→ let lines = output_lines stdout in
624→ Alcotest.(check int) "one output line" 1 (List.length lines);
625→ Alcotest.(check string) "3-way product"
626→ "[{\"text\":\"hello\"},{\"text\":\"hello\"},{\"text\":\"hello\"}]"
627→ (List.hd lines)
628→
629→let test_tensor_barrier_3way_multiple () =
630→ let input = "{\"text\":\"a\"}\n{\"text\":\"b\"}\n" in
631→ let stdout, _stderr, code =
632→ run_plumb "wiring_tensor_barrier_3way.plumb" input in
633→ Alcotest.(check int) "exit 0" 0 code;
634→ let lines = output_lines stdout in
635→ Alcotest.(check int) "two output lines" 2 (List.length lines);
636→ List.iter (fun line ->
637→ let json = Yojson.Safe.from_string line in
638→ match json with
639→ | `List [a; b; c] ->
640→ Alcotest.(check string) "triple elements equal"
641→ (Yojson.Safe.to_string a) (Yojson.Safe.to_string b);
642→ Alcotest.(check string) "triple elements equal"
643→ (Yojson.Safe.to_string b) (Yojson.Safe.to_string c)
644→ | _ -> Alcotest.fail "expected 3-element array"
645→ ) lines
646→
647→(** {1 EOF propagation with control channels}
648→
649→ Regression test for the ctrl_out fd leak: agents with has_control=true
650→ had their ctrl_out write fd dup'd in the demux setup. The original
651→ stayed open (claimed, so unclaimed handler skipped it), preventing the
652→ drain_discard thread from getting EOF. Thread.join in shutdown blocked
653→ forever. Fix: use original write fds directly (no dup). *)
654→
655→(** Run plumb with a timeout. Returns (stdout, exit_code) or fails
656→ if the process does not exit within [timeout] seconds. *)
657→let run_plumb_timeout ?(timeout=10.0) spec input =
658→ let spec_path = fixture spec in
659→ let stdin_r, stdin_w = Unix.pipe ~cloexec:true () in
660→ let stdout_r, stdout_w = Unix.pipe ~cloexec:true () in
661→ let stderr_r, stderr_w = Unix.pipe ~cloexec:true () in
662→ let pid = Unix.create_process plumb_bin
663→ [| plumb_bin; spec_path |] stdin_r stdout_w stderr_w in
664→ Unix.close stdin_r; Unix.close stdout_w; Unix.close stderr_w;
665→ let oc = Unix.out_channel_of_descr stdin_w in
666→ output_string oc input; close_out oc;
667→ (* Read stdout with select-based timeout *)
668→ let buf = Buffer.create 256 in
669→ let chunk = Bytes.create 4096 in
670→ let deadline = Unix.gettimeofday () +. timeout in
671→ let rec drain () =
672→ let remaining = deadline -. Unix.gettimeofday () in
673→ if remaining <= 0.0 then ()
674→ else match Unix.select [stdout_r] [] [] remaining with
675→ | readable, _, _ when readable <> [] ->
676→ let n = Unix.read stdout_r chunk 0 4096 in
677→ if n > 0 then (Buffer.add_subbytes buf chunk 0 n; drain ())
678→ | _ -> ()
679→ in
680→ drain (); Unix.close stdout_r;
681→ (* Wait for process with timeout *)
682→ let deadline = Unix.gettimeofday () +. timeout in
683→ let rec wait () =
684→ match Unix.waitpid [Unix.WNOHANG] pid with
685→ | 0, _ ->
686→ if Unix.gettimeofday () >= deadline then begin
687→ Unix.kill pid Sys.sigkill;
688→ ignore (Unix.waitpid [] pid);
689→ (try Unix.close stderr_r with Unix.Unix_error _ -> ());
690→ Alcotest.fail
691→ "pipeline did not exit within timeout — EOF not propagated"
692→ end;
693→ Unix.sleepf 0.05; wait ()
694→ | _, status ->
695→ (try Unix.close stderr_r with Unix.Unix_error _ -> ());
696→ (match status with Unix.WEXITED c -> c | _ -> 128)
697→ in
698→ let code = wait () in
699→ (Buffer.contents buf, code)
700→
701→let test_eof_control_agent () =
702→ let stdout, code =
703→ run_plumb_timeout "eliza_session.plumb" "\"hello\"\n" in
704→ Alcotest.(check int) "exit 0" 0 code;
705→ let lines = output_lines stdout in
706→ Alcotest.(check bool) "got output" true (List.length lines >= 1);
707→ (* Verify output is valid JSON — catches truncated writes from fd issues *)
708→ List.iter (fun line ->
709→ match Yojson.Safe.from_string line with
710→ | _ -> ()
711→ | exception _ -> Alcotest.fail ("invalid JSON in output: " ^ line)
712→ ) lines
713→
714→let test_eof_control_agent_empty () =
715→ let _stdout, code =
716→ run_plumb_timeout "eliza_session.plumb" "" in
717→ Alcotest.(check int) "exit 0" 0 code
718→
719→(* ── _format_json ─────────────────────────────────────────────── *)
720→
721→let test_format_json_object () =
722→ let input = "{\"a\":1}\n" in
723→ let stdout, _stderr, code = run_plumb "format_json.plumb" input in
724→ Alcotest.(check int) "exit 0" 0 code;
725→ (* Output should be the JSON string containing the serialised object *)
726→ let lines = output_lines stdout in
727→ Alcotest.(check int) "one output line" 1 (List.length lines);
728→ let result = Yojson.Safe.from_string (List.hd lines) in
729→ (match result with
730→ | `String s ->
731→ let reparsed = Yojson.Safe.from_string s in
732→ Alcotest.(check string) "roundtrip" "{\"a\":1}"
733→ (Yojson.Safe.to_string reparsed)
734→ | _ -> Alcotest.fail "expected JSON string output")
735→
736→let test_format_json_multiple () =
737→ let input = "{\"a\":1}\n{\"b\":2}\n[3,4]\n" in
738→ let stdout, _stderr, code = run_plumb "format_json.plumb" input in
739→ Alcotest.(check int) "exit 0" 0 code;
740→ let lines = output_lines stdout in
741→ Alcotest.(check int) "three output lines" 3 (List.length lines)
742→
743→let test_format_json_empty () =
744→ let stdout, _stderr, code = run_plumb "format_json.plumb" "" in
745→ Alcotest.(check int) "exit 0" 0 code;
746→ Alcotest.(check string) "no output" "" stdout
747→
748→let test_format_json_wrong_type () =
749→ let _stdout, _stderr, code = run_plumb "format_json_wrong_type.plumb" "" in
750→ Alcotest.(check int) "exit 2 (config_error)" 2 code
751→
752→(* ── _parse_json ──────────────────────────────────────────────── *)
753→
754→let test_parse_json_string () =
755→ (* Input is a JSON string value containing serialised JSON *)
756→ let input = "\"\\\"hello\\\"\"\n" in
757→ let stdout, _stderr, code = run_plumb "parse_json.plumb" input in
758→ Alcotest.(check int) "exit 0" 0 code;
759→ let lines = output_lines stdout in
760→ Alcotest.(check int) "one output line" 1 (List.length lines);
761→ Alcotest.(check string) "parsed" "\"hello\"" (List.hd lines)
762→
763→let test_parse_json_multiple () =
764→ let input = "\"{\\\"a\\\":1}\"\n\"{\\\"b\\\":2}\"\n" in
765→ let stdout, _stderr, code = run_plumb "parse_json.plumb" input in
766→ Alcotest.(check int) "exit 0" 0 code;
767→ let lines = output_lines stdout in
768→ Alcotest.(check int) "two output lines" 2 (List.length lines)
769→
770→let test_parse_json_empty () =
771→ let stdout, _stderr, code = run_plumb "parse_json.plumb" "" in
772→ Alcotest.(check int) "exit 0" 0 code;
773→ Alcotest.(check string) "no output" "" stdout
774→
775→let test_parse_json_wrong_type () =
776→ let _stdout, _stderr, code = run_plumb "parse_json_wrong_type.plumb" "" in
777→ Alcotest.(check int) "exit 2 (config_error)" 2 code
778→
779→(* ── algebraic laws ───────────────────────────────────────────── *)
780→
781→let test_format_parse_identity () =
782→ (* Law: _format_json ; _parse_json = id on !json *)
783→ let input = "{\"a\":1}\n[1,2,3]\nnull\ntrue\n42\n\"hello\"\n" in
784→ let stdout, _stderr, code = run_plumb "format_parse_id.plumb" input in
785→ Alcotest.(check int) "exit 0" 0 code;
786→ let in_lines = output_lines input in
787→ let out_lines = output_lines stdout in
788→ Alcotest.(check int) "same count" (List.length in_lines) (List.length out_lines);
789→ List.iter2 (fun i o ->
790→ Alcotest.(check string) "identity" i o
791→ ) in_lines out_lines
792→
793→let test_parse_format_identity () =
794→ (* Law: _parse_json ; _format_json ≅ id on !string (up to iso) *)
795→ let input = "\"{\\\"a\\\":1}\"\n\"[1,2,3]\"\n\"null\"\n" in
796→ let stdout, _stderr, code = run_plumb "parse_format_id.plumb" input in
797→ Alcotest.(check int) "exit 0" 0 code;
798→ let in_lines = output_lines input in
799→ let out_lines = output_lines stdout in
800→ Alcotest.(check int) "same count" (List.length in_lines) (List.length out_lines);
801→ (* Compare structurally — whitespace may be normalised *)
802→ List.iter2 (fun i o ->
803→ let i_json = Yojson.Safe.from_string i in
804→ let o_json = Yojson.Safe.from_string o in
805→ let i_inner = match i_json with `String s -> s | _ -> Yojson.Safe.to_string i_json in
806→ let o_inner = match o_json with `String s -> s | _ -> Yojson.Safe.to_string o_json in
807→ let i_parsed = Yojson.Safe.from_string i_inner in
808→ let o_parsed = Yojson.Safe.from_string o_inner in
809→ Alcotest.(check string) "iso"
810→ (Yojson.Safe.to_string i_parsed)
811→ (Yojson.Safe.to_string o_parsed)
812→ ) in_lines out_lines
813→
814→let () =
815→ Alcotest.run "plumb" [
816→ "identity via plumb", [
817→ Alcotest.test_case "passthrough" `Quick test_identity_passthrough;
818→ Alcotest.test_case "multiple lines" `Quick test_identity_multiple;
819→ Alcotest.test_case "empty stdin" `Quick test_identity_empty;
820→ ];
821→ "copy-merge", [
822→ Alcotest.test_case "doubles output" `Quick test_copy_merge_doubles;
823→ Alcotest.test_case "preserves content" `Quick test_copy_merge_content;
824→ Alcotest.test_case "multiple messages" `Quick test_copy_merge_multiple;
825→ Alcotest.test_case "empty stdin" `Quick test_copy_merge_empty;
826→ ];
827→ "barrier", [
828→ Alcotest.test_case "basic product" `Quick test_barrier_basic;
829→ Alcotest.test_case "multiple messages" `Quick test_barrier_multiple;
830→ Alcotest.test_case "empty stdin" `Quick test_barrier_empty;
831→ Alcotest.test_case "named binding" `Quick test_barrier_named;
832→ Alcotest.test_case "type mismatch" `Quick test_barrier_type_mismatch;
833→ Alcotest.test_case "homogeneous" `Quick test_barrier_homogeneous;
834→ Alcotest.test_case "3-way product" `Quick test_barrier_3way;
835→ Alcotest.test_case "3-way multiple" `Quick test_barrier_3way_multiple;
836→ ];
837→ "project", [
838→ Alcotest.test_case "project(1) second element" `Quick test_project_second;
839→ Alcotest.test_case "project(0) first element" `Quick test_project_first;
840→ Alcotest.test_case "out of bounds" `Quick test_project_oob;
841→ ];
842→ "direct structural spawn", [
843→ Alcotest.test_case "direct copy-merge" `Quick test_direct_copy_merge;
844→ Alcotest.test_case "direct identity" `Quick test_direct_identity;
845→ ];
846→ "type checking", [
847→ Alcotest.test_case "direct type mismatch" `Quick test_type_mismatch_direct;
848→ Alcotest.test_case "named type mismatch" `Quick test_type_mismatch_named;
849→ Alcotest.test_case "reserved name rejected" `Quick test_reserved_name_rejected;
850→ ];
851→ "heterogeneous merge", [
852→ Alcotest.test_case "accepts (A, B) → A | B" `Quick test_heterogeneous_merge;
853→ Alcotest.test_case "rejects wrong output type" `Quick test_heterogeneous_merge_mismatch;
854→ ];
855→ "product type checking", [
856→ Alcotest.test_case "copy output mismatch" `Quick test_copy_product_mismatch;
857→ Alcotest.test_case "copy wrong arity" `Quick test_copy_product_arity;
858→ ];
859→ "named spawn", [
860→ Alcotest.test_case "identity passthrough" `Quick test_named_spawn_identity;
861→ Alcotest.test_case "copy-merge doubles" `Quick test_named_spawn_copy_merge;
862→ Alcotest.test_case "unknown port rejected" `Quick test_named_spawn_unknown_port;
863→ Alcotest.test_case "missing port rejected" `Quick test_named_spawn_missing_port;
864→ ];
865→ "agent telemetry port", [
866→ Alcotest.test_case "non-product rejects 3 channels" `Quick test_agent_nonproduct_rejects_three;
867→ Alcotest.test_case "product type mismatch" `Quick test_agent_product_type_mismatch;
868→ Alcotest.test_case "product wrong arity" `Quick test_agent_product_arity;
869→ ];
870→ "agent control port", [
871→ Alcotest.test_case "product input accepted" `Quick test_agent_control_accepted;
872→ Alcotest.test_case "control type mismatch" `Quick test_agent_control_mismatch;
873→ Alcotest.test_case "control wrong arity" `Quick test_agent_control_arity;
874→ Alcotest.test_case "control + telemetry" `Quick test_agent_control_telemetry;
875→ Alcotest.test_case "wiring implicit control" `Quick test_wiring_control;
876→ Alcotest.test_case "wiring telemetry as chain source" `Quick test_wiring_telemetry_source;
877→ ];
878→ "filter", [
879→ Alcotest.test_case "passes matching" `Quick test_filter_passes;
880→ Alcotest.test_case "drops non-matching" `Quick test_filter_drops;
881→ Alcotest.test_case "mixed pass/drop" `Quick test_filter_mixed;
882→ Alcotest.test_case "empty stdin" `Quick test_filter_empty;
883→ Alcotest.test_case "type mismatch exit 2" `Quick test_filter_type_mismatch;
884→ Alcotest.test_case "field on string rejected" `Quick test_filter_field_on_string;
885→ Alcotest.test_case "non-bool predicate rejected" `Quick test_filter_nonbool_predicate;
886→ Alcotest.test_case "json field access accepted" `Quick test_filter_json_field;
887→ Alcotest.test_case "json compound expr accepted" `Quick test_filter_json_compound;
888→ ];
889→ "toplevel filter", [
890→ Alcotest.test_case "field on string rejected" `Quick test_toplevel_filter_field_on_string;
891→ Alcotest.test_case "non-bool predicate rejected" `Quick test_toplevel_filter_nonbool_predicate;
892→ Alcotest.test_case "json field access accepted" `Quick test_toplevel_filter_json_field;
893→ Alcotest.test_case "json compound expr accepted" `Quick test_toplevel_filter_json_compound;
894→ Alcotest.test_case "valid record filter runs" `Quick test_toplevel_filter_valid_record;
895→ ];
896→ "toplevel map", [
897→ Alcotest.test_case "field on string rejected" `Quick test_toplevel_map_field_on_string;
898→ Alcotest.test_case "output type mismatch rejected" `Quick test_toplevel_map_output_type_mismatch;
899→ Alcotest.test_case "valid expr with matching output" `Quick test_toplevel_map_valid_expr;
900→ Alcotest.test_case "json field access accepted" `Quick test_toplevel_map_json_field;
901→ ];
902→ "map", [
903→ Alcotest.test_case "transforms record" `Quick test_map_transform;
904→ Alcotest.test_case "below threshold" `Quick test_map_below_threshold;
905→ Alcotest.test_case "field projection" `Quick test_map_projection;
906→ Alcotest.test_case "empty stdin" `Quick test_map_empty;
907→ ];
908→ "wiring", [
909→ Alcotest.test_case "linear chain" `Quick test_wiring_linear;
910→ Alcotest.test_case "auto copy-merge" `Quick test_wiring_copy;
911→ Alcotest.test_case "inline filter" `Quick test_wiring_filter;
912→ Alcotest.test_case "inline map expr" `Quick test_wiring_map_expr;
913→ Alcotest.test_case "inline map record" `Quick test_wiring_map_record;
914→ Alcotest.test_case "hetero merge sum output" `Quick test_wiring_hetero_merge;
915→ Alcotest.test_case "hetero merge non-sum error" `Quick test_wiring_hetero_err;
916→ ];
917→ "tensor", [
918→ Alcotest.test_case "tensor + barrier" `Quick test_tensor_barrier;
919→ Alcotest.test_case "tensor + barrier multiple" `Quick test_tensor_barrier_multiple;
920→ Alcotest.test_case "tensor + auto-merge" `Quick test_tensor_merge;
921→ Alcotest.test_case "tensor + named barrier" `Quick test_tensor_named_barrier;
922→ Alcotest.test_case "tensor + 3-way barrier" `Quick test_tensor_barrier_3way;
923→ Alcotest.test_case "tensor + 3-way barrier multiple" `Quick test_tensor_barrier_3way_multiple;
924→ ];
925→ "eof-control", [
926→ Alcotest.test_case "exits cleanly with input" `Quick test_eof_control_agent;
927→ Alcotest.test_case "exits cleanly on empty input" `Quick test_eof_control_agent_empty;
928→ ];
929→ "_format_json", [
930→ Alcotest.test_case "serialises object" `Quick test_format_json_object;
931→ Alcotest.test_case "multiple messages" `Quick test_format_json_multiple;
932→ Alcotest.test_case "empty stdin" `Quick test_format_json_empty;
933→ Alcotest.test_case "wrong type rejected" `Quick test_format_json_wrong_type;
934→ ];
935→ "_parse_json", [
936→ Alcotest.test_case "deserialises string" `Quick test_parse_json_string;
937→ Alcotest.test_case "multiple messages" `Quick test_parse_json_multiple;
938→ Alcotest.test_case "empty stdin" `Quick test_parse_json_empty;
939→ Alcotest.test_case "wrong type rejected" `Quick test_parse_json_wrong_type;
940→ ];
941→ "json algebraic laws", [
942→ Alcotest.test_case "format ; parse = id" `Quick test_format_parse_identity;
943→ Alcotest.test_case "parse ; format ≅ id" `Quick test_parse_format_identity;
944→ ];
945→ ]
946→cd /tmp/plmb-037-step5b && nix develop --command sh -c 'echo "{\"text\":\"hello\"}" | ./_build/default/bin/plumb/main.exe test/fixtures/copy.plumb; echo "exit=$?"' 2>/dev/null
cat /tmp/plmb-037-step5b/test/fixtures/eliza_session.plumb
(* Minimal agent-with-control-channels for EOF regression test.
Uses ELIZA provider — no API key required.
The (!string, !CtrlCmd) -> !string input signature triggers
has_control=true in the executor. Port layout with has_control:
[input, ctrl_in, output, ctrl_out]. The ctrl_out channel has no
downstream reader, so the unclaimed handler sets up drain_discard.
Before the fix, the dup'd write fd left the original open,
preventing drain_discard from ever getting EOF — hanging shutdown. *)
type Ping = { ping: bool }
type CtrlCmd = Ping
let echobot : (!string, !CtrlCmd) -> !string = agent {
provider: "eliza",
model: "doctor"
}
let main : !string -> !string = plumb(input, output) {
let ctrl : !CtrlCmd = channel
spawn echobot(input=input, ctrl_in=ctrl, output=output)
}This test spawns an agent subprocess. Under the new fabric, the binary resolves the agent via the bridge. The bridge spawns the plumb-agent binary. The test has a 10s timeout, so it should either pass or fail within 10s.
Let me run just the test_plumb binary directly with specific test to check timing:
cd /tmp/plmb-037-step5b && time nix develop --command sh -c 'echo "" | ./_build/default/bin/plumb/main.exe test/fixtures/identity.plumb; echo "exit=$?"' 2>/dev/null
exit=0
nix develop --command sh -c  2>/dev/null  0.15s user 0.10s system 11% cpu 2.203 total
2.2 seconds per invocation — 1.5s from Fabric.shutdown's sleep. With ~80 tests, that's ~176 seconds = ~3 minutes. But the test was hanging for 8+ minutes. Let me check if there's a specific test hanging:
cd /tmp/plmb-037-step5b && nix develop --command sh -c './_build/default/test/test_plumb.exe -v' 2>&1 | head -100 &
BGPID=$!
sleep 20
kill $BGPID 2>/dev/null
wait $BGPID 2>/dev/null
warning: Git tree '/private/tmp/plmb-037-step5b' is dirty
Testing `plumb'.
This run has ID `P8NUTHYO'.
[FAIL] identity via plumb 0 passthrough.
[FAIL] identity via plumb 1 multiple lines.
[FAIL] identity via plumb 2 empty stdin.
[FAIL] copy-merge 0 doubles output.
[FAIL] copy-merge 1 preserves content.
[FAIL] copy-merge 2 multiple messages.
[FAIL] copy-merge 3 empty stdin.
[FAIL] barrier 0 basic product.
[FAIL] barrier 1 multiple messages.
[FAIL] barrier 2 empty stdin.
[FAIL] barrier 3 named binding.
[FAIL] barrier 4 type mismatch.
[FAIL] barrier 5 homogeneous.
[FAIL] barrier 6 3-way product.
[FAIL] barrier 7 3-way multiple.
[FAIL] project 0 project(1) second element.
[FAIL] project 1 project(0) first element.
[FAIL] project 2 out of bounds.
[FAIL] direct structural spawn 0 direct copy-merge.
[FAIL] direct structural spawn 1 direct identity.
[FAIL] type checking 0 direct type mismatch.
[FAIL] type checking 1 named type mismatch.
[FAIL] type checking 2 reserved name rejected.
[FAIL] heterogeneous merge 0 accepts (A, B) → A | B.
[FAIL] heterogeneous merge 1 rejects wrong output type.
[FAIL] product type checking 0 copy output mismatch.
[FAIL] product type checking 1 copy wrong arity.
[FAIL] named spawn 0 identity passthrough.
[FAIL] named spawn 1 copy-merge doubles.
[FAIL] named spawn 2 unknown port rejected.
[FAIL] named spawn 3 missing port rejected.
[FAIL] agent telemetry port 0 non-product rejects 3 ch...
[FAIL] agent telemetry port 1 product type mismatch.
[FAIL] agent telemetry port 2 product wrong arity.
[FAIL] agent control port 0 product input accepted.
[FAIL] agent control port 1 control type mismatch.
[FAIL] agent control port 2 control wrong arity.
[FAIL] agent control port 3 control + telemetry.
[FAIL] agent control port 4 wiring implicit control.
[FAIL] agent control port 5 wiring telemetry as chai...
[FAIL] filter 0 passes matching.
[FAIL] filter 1 drops non-matching.
[FAIL] filter 2 mixed pass/drop.
[FAIL] filter 3 empty stdin.
[FAIL] filter 4 type mismatch exit 2.
[FAIL] filter 5 field on string rejected.
[FAIL] filter 6 non-bool predicate rejec...
[FAIL] filter 7 json field access accepted.
[FAIL] filter 8 json compound expr accep...
[FAIL] toplevel filter 0 field on string rejected.
[FAIL] toplevel filter 1 non-bool predicate rejec...
[FAIL] toplevel filter 2 json field access accepted.
[FAIL] toplevel filter 3 json compound expr accep...
[FAIL] toplevel filter 4 valid record filter runs.
[FAIL] toplevel map 0 field on string rejected.
[FAIL] toplevel map 1 output type mismatch rej...
[FAIL] toplevel map 2 valid expr with matching...
[FAIL] toplevel map 3 json field access accepted.
[FAIL] map 0 transforms record.
[FAIL] map 1 below threshold.
[FAIL] map 2 field projection.
[FAIL] map 3 empty stdin.
[FAIL] wiring 0 linear chain.
[FAIL] wiring 1 auto copy-merge.
[FAIL] wiring 2 inline filter.
[FAIL] wiring 3 inline map expr.
[FAIL] wiring 4 inline map record.
[FAIL] wiring 5 hetero merge sum output.
[FAIL] wiring 6 hetero merge non-sum error.
[FAIL] tensor 0 tensor + barrier.
[FAIL] tensor 1 tensor + barrier multiple.
[FAIL] tensor 2 tensor + auto-merge.
[FAIL] tensor 3 tensor + named barrier.
[FAIL] tensor 4 tensor + 3-way barrier.
[FAIL] tensor 5 tensor + 3-way barrier m...
[FAIL] eof-control 0 exits cleanly with input.
[FAIL] eof-control 1 exits cleanly on empty i...
[FAIL] _format_json 0 serialises object.
[FAIL] _format_json 1 multiple messages.
[FAIL] _format_json 2 empty stdin.
[FAIL] _format_json 3 wrong type rejected.
[FAIL] _parse_json 0 deserialises string.
[FAIL] _parse_json 1 multiple messages.
[FAIL] _parse_json 2 empty stdin.
[FAIL] _parse_json 3 wrong type rejected.
[FAIL] json algebraic laws 0 format ; parse = id.
[FAIL] json algebraic laws 1 parse ; format ≅ id.
┌──────────────────────────────────────────────────────────────────────────────┐
│ [FAIL] identity via plumb 0 passthrough.                                     │
└──────────────────────────────────────────────────────────────────────────────┘
Unix.Unix_error(Unix.ENOENT, "create_process", "../bin/plumb/main.exe")
Raised by primitive operation at Unix.create_process_gen.(fun) in file "unix.ml", line 923, characters 15-68
Called from Stdlib__Fun.protect in file "fun.ml", line 34, characters 8-15
Re-raised at Stdlib__Fun.protect in file "fun.ml", line 39, characters 6-52
Called from Unix.create_process in file "unix.ml" (inlined), line 926, characters 2-66
All tests are failing because they can't find ../bin/plumb/main.exe. This is a path resolution issue: the tests reference the binary by a relative path that only resolves from the working directory dune runs them in, and I invoked the test binary from somewhere else.
This isn't a real test failure — the tests need to be run from the right directory, or the test infrastructure handles this. Let me check how dune runs these:
1→(tests
2→ (names test_parser test_validate test_plumb test_expr
3→ test_session_compile)
4→ (libraries language runtime alcotest yojson unix)
5→ (deps
6→ (glob_files fixtures/*.plumb)
7→ ../bin/plumb/main.exe))
8→
9→(test
10→ (name test_session_exec)
11→ (libraries language runtime plumb alcotest yojson unix threads.posix)
12→ (deps
13→ ../bin/plumb/main.exe))
14→
15→(test
16→ (name test_tool)
17→ (libraries language runtime plumb alcotest yojson unix str threads.posix)
18→ (deps
19→ (glob_files fixtures/*.plumb)
20→ ../bin/agent/main.exe))
21→
22→(test
23→ (name test_agent_config)
24→ (libraries alcotest unix str)
25→ (deps
26→ (glob_files fixtures/*.plumb)
27→ ../bin/agent/main.exe))
28→
29→(test
30→ (name test_agent_control)
31→ (libraries alcotest unix str yojson)
32→ (deps
33→ (glob_files fixtures/*.plumb)
34→ ../bin/agent/main.exe))
35→
36→(test
37→ (name test_agent_eliza)
38→ (libraries alcotest unix str yojson)
39→ (deps
40→ (glob_files fixtures/*.plumb)
41→ ../bin/agent/main.exe))
42→
43→(test
44→ (name test_pinning)
45→ (libraries agent client alcotest yojson str))
46→
47→(test
48→ (name test_docstore)
49→ (libraries docstore alcotest unix))
50→
51→(test
52→ (name test_render_group)
53→ (libraries render alcotest))
54→
55→(test
56→ (name test_error)
57→ (libraries runtime alcotest yojson))
58→
59→(test
60→ (name test_transport)
61→ (libraries transport alcotest eio_main unix))
62→
63→(test
64→ (name test_routing)
65→ (libraries language runtime alcotest))
66→
67→(test
68→ (name test_supervisor)
69→ (libraries supervisor alcotest unix))
70→
71→(test
72→ (name test_fabric)
73→ (libraries fabric transport language runtime spawn_lib alcotest eio_main eio.unix unix yojson threads.posix)
74→ (deps
75→ ../bin/agent/main.exe))
The test depends on ../bin/plumb/main.exe and runs via dune test from the test directory. When I ran it directly from the command line, the relative path didn't resolve. A dune test run should handle this correctly — the earlier hang must have come from the version without EOF propagation.
Let me re-run the tests with dune properly:
cd /tmp/plmb-037-step5b && nix develop --command dune test --root=. 2>&1 | tail -40
Command was manually backgrounded by user with ID: b8hivsw7g. Output is being written to: /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/b8hivsw7g.output
sleep 30 && tail -30 /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/b8hivsw7g.output 2>/dev/null
sleep 60 && tail -40 /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/b8hivsw7g.output 2>/dev/null
Command was manually backgrounded by user with ID: b9kbq47mr. Output is being written to: /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/b9kbq47mr.output
You're right — each test case invocation takes ~2s because Fabric.shutdown calls Unix.sleepf 1.5. With ~80 test cases in test_plumb.ml alone, that's ~160s which is far too long. The old executor didn't have this sleep.
Let me check what the 1.5s sleep is for and whether we can reduce or eliminate it for the plumb binary:
kill $(pgrep -f "dune test") 2>/dev/null; sleep 1; rm -f /tmp/plmb-037-step5b/_build/.lock
The Unix.sleepf 1.5 in Fabric.shutdown is there to wait for system threads blocked in recv to notice socket closure. But for the plumb binary, by the time we call Fabric.shutdown, all fibres have already exited (the switch has already been left). The cancel() was already called, sockets are already closed.
Let me look at the shutdown flow more carefully and optimise it:
335→
336→(** Re-export Bridge for external access. *)
337→module Bridge = Bridge
338→
339→(** Shut down the pipeline.
340→    Closes all sockets (triggering recv timeout/failure in system
341→    threads), then destroys the ZMQ context and cleans up. *)
342→let shutdown t =
343→  let _ = Supervisor.shutdown t.supervisor in
344→  t.cancel ();
345→  (* Wait for system threads blocked in recv to notice the socket
346→     closure (up to the 1s receive timeout) *)
347→  Unix.sleepf 1.5;
348→  T.destroy_context t.ctx;
349→  cleanup_fabric_dir t.pipeline_id
350→
The problem: shutdown calls cancel() again (redundant since the plumb binary already called it), then sleeps 1.5s. Since all fibres have already exited by the time we leave the switch, there are no system threads to wait for.
I can skip the sleep if cancel was already called. Or better, reduce the sleep. But the simplest fix: in the plumb binary, skip the full shutdown since cancel was already called, and just do cleanup:
```ocaml
let input_port = List.nth ports 0 in
let output_port = List.nth ports n_in in
(* Run the fabric under Eio *)
let fabric_handle = ref None in
Eio_main.run (fun _env ->
  Eio.Switch.run (fun sw ->
    let f = Fabric.execute ~sw ~routing_table ~type_env
      ~process_handler:handler () in
    fabric_handle := Some f;
    fabric_ref := Some f;
    (* Writer daemon: stdin → boundary input socket *)
    let push = Fabric.boundary_input f input_port in
    Eio.Fiber.fork_daemon ~sw (fun () ->
      (try while true do
        let line = Eio_unix.run_in_systhread (fun () ->
          input_line stdin) in
        Transport.Zmq.send push ~topic:input_port line
      done with End_of_file | Sys_error _ -> ());
      Fabric.Bridge.send_eof push ~topic:input_port;
      `Stop_daemon);
    (* Reader: boundary output socket → stdout (main fibre) *)
    let pull = Fabric.boundary_output f output_port in
    (try while true do
      let (_topic, payload) = Transport.Zmq.recv pull in
      if Fabric.Bridge.is_eof payload then raise Exit;
      Eio_unix.run_in_systhread (fun () ->
        print_endline payload)
    done with
    | Exit -> ()
    | Transport.Transport_error _ -> ());
    (* Reader exited — cancel the fabric to unwind all fibres *)
    f.cancel ()));
  (* Outside switch scope — safe to call shutdown (may sleep) *)
  (match !fabric_handle with
   | Some f -> Fabric.shutdown f
   | None -> ()))
```

The issue is that Fabric.shutdown is unnecessarily slow. Let me add a lighter-weight cleanup function or make shutdown smarter. The simplest approach: add a cleanup function that just cleans up without the sleep, since we know all fibres are done:
Before, in `lib/fabric/fabric.ml`:

```ocaml
(** Shut down the pipeline.
    Closes all sockets (triggering recv timeout/failure in system
    threads), then destroys the ZMQ context and cleans up. *)
let shutdown t =
  let _ = Supervisor.shutdown t.supervisor in
  t.cancel ();
  (* Wait for system threads blocked in recv to notice the socket
     closure (up to the 1s receive timeout) *)
  Unix.sleepf 1.5;
  T.destroy_context t.ctx;
  cleanup_fabric_dir t.pipeline_id
```

After: `shutdown` stays unchanged, and a new `cleanup` is added:

```ocaml
(** Lightweight cleanup after orderly shutdown. Use this when
    [cancel()] was already called and all fibres have exited
    (e.g. after the Eio switch exits). Skips the 1.5s sleep
    since no system threads are outstanding. *)
let cleanup t =
  let _ = Supervisor.shutdown t.supervisor in
  T.destroy_context t.ctx;
  cleanup_fabric_dir t.pipeline_id
```

In `bin/plumb/main.ml`, before:

```ocaml
(* Outside switch scope — safe to call shutdown (may sleep) *)
(match !fabric_handle with
 | Some f -> Fabric.shutdown f
 | None -> ()))
```

after:

```ocaml
(* Outside switch scope — all fibres have exited, just clean up.
   Use cleanup (not shutdown) to avoid the 1.5s sleep — cancel()
   was already called and all sockets are closed. *)
(match !fabric_handle with
 | Some f -> Fabric.cleanup f
 | None -> ()))
```

cd /tmp/plmb-037-step5b && nix develop --command dune build --root=. 2>&1 | head -10
cd /tmp/plmb-037-step5b && time nix develop --command sh -c 'echo "{\"name\":\"Alice\",\"age\":30}" | ./_build/default/bin/plumb/main.exe test/fixtures/identity.plumb' 2>/dev/null
{"name":"Alice","age":30}
nix develop --command sh -c  2> /dev/null  0.84s user 0.19s system 41% cpu 2.474 total

cd /tmp/plmb-037-step5b && nix develop --command sh -c 'time echo "{\"name\":\"Alice\",\"age\":30}" | ./_build/default/bin/plumb/main.exe test/fixtures/identity.plumb 2>/dev/null'
warning: Git tree '/private/tmp/plmb-037-step5b' is dirty
{"name":"Alice","age":30}
real 0m0.015s
user 0m0.006s
sys 0m0.006s

cd /tmp/plmb-037-step5b && nix develop --command dune test --root=. 2>&1
[OK] protocol declarations, session type duality, protocol well-formedness, protocol pretty-print, protocol loader, _format_json/_parse_json, errors, compaction module, keyword regression, PLMB-040 product type chain regression.
Full test results in `/private/tmp/plmb-037-step5b/_build/default/test/_build/_tests/parser'.
Test Successful in 0.017s. 173 tests run.
Testing `session_exec'. This run has ID `S0P12JSK'.
[OK] executor, serialisation, loop, <-> integration, select.
Full test results in `/private/tmp/plmb-037-step5b/_build/default/test/_build/_tests/session_exec'.
Test Successful in 0.012s. 10 tests run.
Testing `fabric'. This run has ID `HIXKX1VA'.
[OK] inline (identity, copy + merge), agent (identity, EOF).
Full test results in `/private/tmp/plmb-037-step5b/_build/default/test/_build/_tests/fabric'.
Test Successful in 3.649s. 4 tests run.
ps aux | grep "dune test" | grep -v grep; echo "---"; ps aux | grep test_plumb | grep -v grep
ww 6273 0.0 0.1 435619280 11952 ?? S 10:06AM 0:00.43 dune test --root=.
ww 6271 0.0 0.0 435308128 1184 ?? Ss 10:06AM 0:00.00 /bin/zsh -c source /Users/ww/.claude/shell-snapshots/snapshot-zsh-1773431411965-vq5jw6.sh && setopt NO_EXTENDED_GLOB 2>/dev/null || true && eval 'cd /tmp/plmb-037-step5b && nix develop --command dune test --root=. 2>&1' \< /dev/null && pwd -P >| /var/folders/s3/yz2d7zf14wq_rf8sm0xg6mkm0000gn/T/claude-f7e1-cwd
---
ww 6282 0.0 0.0 435565552 2128 ?? S 10:06AM 0:00.02 ./test_plumb.exe
8.5 minutes and still running. It's hanging again. But the identity test worked fine (15ms). Let me check what test it's stuck on. Let me look at what subprocesses are running under test_plumb:
pgrep -P 6282 2>/dev/null | xargs -I{} ps -p {} -o pid=,command= 2>/dev/null
cat /tmp/plmb-037-step5b/test/fixtures/wiring_telemetry_source.plumb
(* Regression for PLMB-040 follow-up: agent@telemetry as chain source.
Before the fix, the pre-allocated wsp channel was treated as a phantom
process and raised "has no input channel". *)
use js
let smart : !string -> (!string, !json) = agent {
provider: "eliza"
model: "doctor"
prompt: "Answer"
}
let main : !string -> !string = plumb(input, output) {
input ; smart ; output
smart@telemetry ; Js.format ; output
}

This is an agent pipeline with telemetry. The test passes empty stdin and expects exit 0. The issue is likely that the agent subprocess doesn't exit because:
1. The agent's data input gets EOF
2. But the telemetry socket has no EOF sender (it's the agent's responsibility to send EOF on telemetry when done)
3. The Js.format morphism never gets EOF on its input
4. The format morphism never exits, so the merge to output never completes
5. The main reader never sees EOF
The telemetry port flow: agent produces telemetry → bridge demux sends on telemetry PUSH socket → format_json's PULL socket → format_json sends to output. The bridge's demux sends EOF on the telemetry socket after the agent exits. But the agent might not exit because it has no input...
Wait, this test passes empty stdin. So:
1. stdin EOF → writer daemon sends EOF on input port
2. Bridge mux gets EOF → sends __eof to agent stdin on input port
3. Agent gets __eof, processes, and exits
4. Bridge demux gets EOF on stdout → sends EOF on all output sockets (data_out, ctrl_out, telemetry)
5. format_json gets EOF → sends EOF on output
6. Reader sees EOF → exits
But there are TWO paths to output: agent → output and agent@telemetry → format → output. The merge morphism for output would use Fiber.both and wait for BOTH inputs to complete. But only ONE of them (agent data_out → output) directly gets EOF. The other path (telemetry → format → output) gets EOF from the telemetry socket → format → output. Both paths should eventually get EOF.
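The both-semantics can be sketched with a stdlib-only toy (the `__eof` sentinel string and the function names here are illustrative, not the fabric's actual API): a both-style merge only terminates once every input stream has delivered an EOF sentinel, which is exactly why a telemetry path that never sent EOF would hang the merge.

```ocaml
(* Hypothetical sentinel, mirroring the send_eof/is_eof pair. *)
let eof_sentinel = "__eof"
let is_eof p = String.equal p eof_sentinel

(* A both-style merge over finite streams: forward payloads from two
   inputs to one output. The assert models the blocking behaviour of
   the real merge: if either stream lacks a terminating EOF, the real
   Fiber.both-based merge would never complete. *)
let merge_both (a : string list) (b : string list) : string list =
  assert (List.exists is_eof a && List.exists is_eof b);
  let forward acc stream =
    List.fold_left
      (fun acc p -> if is_eof p then acc else p :: acc)
      acc stream
  in
  List.rev (forward (forward [] a) b)

let () =
  (* Empty-input case: the data path carries only EOF, the telemetry
     path carries one payload then EOF; the merge still terminates. *)
  assert (merge_both [ eof_sentinel ] [ "t1"; eof_sentinel ] = [ "t1" ])
```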
Actually, the merger here might be a simple merge (two inputs, one output). Let me check the routing table for this fixture:
kill 6326 6282 6273 6271 2>/dev/null; sleep 1; rm -f /tmp/plmb-037-step5b/_build/.lock
cd /tmp/plmb-037-step5b && echo '' | nix develop --command sh -c 'PIPELINE_DEBUG=1 ./_build/default/bin/plumb/main.exe test/fixtures/wiring_telemetry_source.plumb' 2>&1 | head -40 &
BGPID=$!
sleep 5
kill $BGPID 2>/dev/null
wait $BGPID 2>/dev/null
warning: Git tree '/private/tmp/plmb-037-step5b' is dirty
{"error":"config_error","category":"config","detail":"cannot find module js (searched test/fixtures and 0 library directories)"}

The test is actually failing with a config error (it can't find the js module), so it exits with code 2. But the test expects exit 0. The old Exec path must have handled this differently, or the test depends on the search path including the stdlib.
Wait, the test is run through run_plumb which calls the binary with just the fixture path. The binary sets up search_path from Plumb.Stdlib_path.search_path(). In the old Exec path this was the same. Let me check if this test was actually passing before on main:
cd /Users/ww/src/plumbing && echo '' | nix develop --command sh -c './_build/default/bin/plumb/main.exe test/fixtures/wiring_telemetry_source.plumb' 2>&1; echo "exit: $?"
warning: Git tree '/Users/ww/src/plumbing' is dirty
{"error":"config_error","category":"config","detail":"cannot find module js (searched test/fixtures and 0 library directories)"}
exit:
nix develop --command sh -c 'echo "" | ./_build/default/bin/plumb/main.exe test/fixtures/wiring_telemetry_source.plumb; echo "exit=$?"' 2>/dev/null
nix develop --command sh -c './_build/default/test/test_plumb.exe test "wiring telemetry"' 2>&1 | head -20
warning: Git tree '/Users/ww/src/plumbing' is dirty
Invalid request (no tests to run, filter skipped everything)!
Testing `plumb'. This run has ID `CO4ULYBB'.
nix develop --command sh -c 'time ./_build/default/test/test_plumb.exe' 2>&1 | tail -20
└──────────────────────────────────────────────────────────────────────────────┘
[exception] Unix.Unix_error(Unix.ENOENT, "create_process", "../bin/plumb/main.exe")
Raised by primitive operation at Unix.create_process_gen.(fun) in file "unix.ml", line 923, characters 15-68
Called from Stdlib__Fun.protect in file "fun.ml", line 34, characters 8-15
Re-raised at Stdlib__Fun.protect in file "fun.ml", line 39, characters 6-52
Called from Unix.create_process in file "unix.ml" (inlined), line 926, characters 2-66
Called from Dune__exe__Test_plumb.run_plumb in file "test/test_plumb.ml", lines 16-18, characters 12-40
Called from Dune__exe__Test_plumb.test_identity_passthrough in file "test/test_plumb.ml", line 50, characters 30-62
Called from Alcotest_engine__Core.Make.protect_test.(fun) in file "src/alcotest-engine/core.ml", line 186, characters 17-23
Called from Alcotest_engine__Monad.Identity.catch in file "src/alcotest-engine/monad.ml", line 24, characters 31-35
Logs saved to `~/src/plumbing/_build/_tests/plumb/identity via plumb.000.output'.
──────────────────────────────────────────────────────────────────────────────
Full test results in `~/src/plumbing/_build/_tests/plumb'.
87 failures! in 0.012s. 87 tests run.
real 0m0.022s
user 0m0.005s
sys 0m0.012s

The test_plumb test on main also doesn't work when run directly (same ENOENT for ../bin/plumb/main.exe). It only works through dune test. And it takes 0.022s (87 tests). So the test_plumb tests on main are instantaneous because the binary path is relative and they all fail with ENOENT.
Wait no — the full test suite passes via dune test. The ENOENT is only when running the binary directly from the wrong directory. Let me test through dune on main:
nix develop --command sh -c 'dune test --root=. -- test/test_plumb.exe' 2>&1 | tail -20
warning: Git tree '/Users/ww/src/plumbing' is dirty
Error: "test/test_plumb.exe" does not match any known test.
nix develop --command sh -c 'time dune test --root=.' 2>&1 | tail -20
[OK] stop 0 stop via control.
[OK] stop 1 stdin EOF sets stop.
[OK] memory 0 get_memory empty history.
[OK] memory 1 set_memory then get_memory.
[OK] memory 2 set_memory empty clears history.
[OK] memory 3 set_memory malformed rejected.
[OK] pause 0 pause get set resume flow.
[OK] pause 1 resume without pause is no-op.
[OK] pause 2 stop while paused.
[OK] pause 3 stdin EOF while paused.
[OK] drain 0 drain completes protocol.
[OK] drain 1 drain exits on stdin EOF.
[OK] drain 2 drain handles memory without pause.
Full test results in `~/src/plumbing/_build/default/test/_build/_tests/agent-control'.
Test Successful in 9.414s. 13 tests run.
real 0m14.365s
user 0m19.814s
sys 0m6.551s
14s on main for the full test suite. Now let me check if test_plumb runs in the worktree when the binary is in the right location. Let me run the tests via dune from the worktree:
cd /tmp/plmb-037-step5b && nix develop --command sh -c 'time dune test --root=.' 2>&1 | tail -30
ww 9289 0.0 0.0 435308192 4864 ?? S 10:17AM 0:00.20 sh -c time dune test --root=.
ww 9339 0.0 0.0 435578432 4656 ?? S 10:17AM 0:00.08 /private/tmp/plmb-037-step5b/_build/default/bin/agent/main.exe /tmp/nix-shell.WDvAEn/build_f362ee_dune/agentpi_6bee64.plumb
ww 9338 0.0 0.0 435582992 6080 ?? S 10:17AM 0:00.08 ../bin/plumb/main.exe fixtures/wiring_telemetry_source.plumb
ww 9298 0.0 0.0 435565360 2128 ?? S 10:17AM 0:00.02 ./test_plumb.exe
ww 9295 0.0 0.1 435620144 11952 ?? S 10:17AM 0:00.23 dune test --root=.
It's stuck on wiring_telemetry_source.plumb again. The binary is running and has spawned an agent subprocess (plumb-agent). The agent is alive.
The problem: this is an agent-based test and with the fabric, the agent subprocess hangs because it doesn't get properly shut down. Let me trace the issue:
Wait — the merge to output might use Fiber.both (regular merge) instead of first (merge_any). With regular merge, BOTH fibres must complete. But the data_out path and the telemetry → format → output path should both get EOF.
Actually, the test that hangs is wiring_telemetry_source.plumb which has empty input. With empty input, the agent gets EOF immediately and exits. The telemetry path might not produce any output but should still propagate EOF.
Let me check: does the bridge demux send EOF on the telemetry socket when the agent exits? Looking at the bridge code, the demux sends EOF on all output sockets after child stdout EOF:
Option.iter (fun sock ->
(try send_eof sock ~topic:binding.bd_name with _ -> ())
) telemetry_sock;
So yes, it sends EOF on telemetry. The format_json morphism should get this EOF and send EOF on its output. The merge to output should get EOF from both paths and exit.
But wait — let me check the routing table. The wiring smart@telemetry ; Js.format ; output creates a chain. The desugarer creates intermediate channels. Let me look at what process names get generated.
Actually, the issue might be that the agent subprocess is spawned but the bridge's stdin write thread is still blocking because stdin hasn't been closed. Let me look more carefully at what happens when empty input is provided.
With empty input:
- Writer daemon reads EOF from stdin → sends EOF on boundary input
- Bridge mux for data_in gets EOF → writes __eof to agent stdin on "input" port, sends ctrl stop
- Bridge mux_source_done decrements counter
- But: ctrl_in mux_source_done decrements counter (no ctrl socket → immediate)
Wait, the agent has has_control=true because the input type is (!string, !CtrlCmd). So there IS a ctrl_in socket. The routing table will have a ctrl_in route.
Actually no — looking at the wiring:
input ; smart ; output
smart@telemetry ; Js.format ; output
The desugarer creates: spawn smart(input, output) with telemetry. The smart agent has input type (!string, !CtrlCmd), so it expects 4+ channel arguments (input, ctrl_in, output, ctrl_out, telemetry). But the wiring only provides input and output plus smart@telemetry.
Hmm, the control channel is declared as let ctrl : !CtrlCmd = channel. Let me look at the actual main binding in the fixture:
let main : !string -> !string = plumb(input, output) {
input ; smart ; output
smart@telemetry ; Js.format ; output
}
This uses `use js`, which requires the stdlib. But we saw it fail with "cannot find module js". So the binary should exit with error code 2 immediately. Why is it hanging?
Wait — let me check again. Earlier when I ran it manually from the wrong directory, it failed with the js error. But when dune runs it, the working directory is set to the test directory where the fixtures are. And use js might resolve if the stdlib is available.
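The working-directory dependence can be illustrated with a stdlib-only sketch of module resolution (the function, its parameters, and the `.plumb` lookup convention are hypothetical, not the binary's actual resolver): with a relative base directory and zero library directories, whether the module file is found depends entirely on where the process was started.

```ocaml
(* Hypothetical module lookup: try the spec's own directory, then each
   library directory from the search path, in order. *)
let find_module ~base_dir ~search_path name =
  let candidates =
    List.map
      (fun dir -> Filename.concat dir (name ^ ".plumb"))
      (base_dir :: search_path)
  in
  List.find_opt Sys.file_exists candidates

let () =
  (* The "0 library directories" case from the error message: a relative
     base_dir that does not exist under the current cwd, and an empty
     search path, resolves nothing. *)
  match find_module ~base_dir:"test/fixtures" ~search_path:[] "js" with
  | None -> print_endline "cannot find module js"
  | Some path -> print_endline path
```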
kill $(pgrep -P 9298) 2>/dev/null; kill 9298 9295 9287 2>/dev/null; sleep 1; rm -f /tmp/plmb-037-step5b/_build/.lock
Let me check how this test runs under the old binary on main — it might also hang but we never noticed because the test suite happens to run other tests first:
nix develop --command sh -c 'cd _build/default/test && echo "" | ../bin/plumb/main.exe fixtures/wiring_telemetry_source.plumb; echo "exit=$?"' 2>/dev/null
On main, it exits with code 2 (error) immediately. Let me try the same with the worktree binary:
cd /tmp/plmb-037-step5b && nix develop --command sh -c 'cd _build/default/test && echo "" | ../bin/plumb/main.exe fixtures/wiring_telemetry_source.plumb 2>/dev/null; echo "exit=$?"'
It exits with code 2 immediately! So the test should fail rather than hang: it expects exit 0 but the actual exit is 2. Wait, let me re-read the test:
let test_wiring_telemetry_source () =
let _stdout, _stderr, code = run_plumb "wiring_telemetry_source.plumb" "" in
Alcotest.(check int) "exit 0" 0 code
This expects exit 0 but gets exit 2. So the test should FAIL (not hang). But earlier, the process was running with an agent child.
Oh wait — I think the issue is that when run through dune, the stdlib path IS available (dune-site finds the stdlib). So use js resolves, the program loads, and then it actually runs the agent. On main, the old Exec executor handles shutdown properly. On my branch, the fabric-based executor hangs.
Let me check by running with the full dune test context:
cd /tmp/plmb-037-step5b && nix develop --command sh -c 'cd _build/default/test && echo "" | timeout 10 ../bin/plumb/main.exe fixtures/wiring_telemetry_source.plumb 2>&1; echo "exit=$?"'
warning: Git tree '/private/tmp/plmb-037-step5b' is dirty
{"error":"config_error","category":"config","detail":"cannot find module js (searched fixtures and 0 library directories)"}
exit=2

Let me investigate the specific hanging test and check which tests pass before the hang:
cd /tmp/plmb-037-step5b && nix develop --command sh -c 'cd _build/default/test && timeout 30 ./test_plumb.exe -v --filter "identity|copy-merge|barrier|project|direct|type|hetero|product|named|filter|map|wiring|tensor|_format|_parse|json algebraic"' 2>&1 | tail -30
warning: Git tree '/private/tmp/plmb-037-step5b' is dirty
Usage: test_plumb.exe [--help] [COMMAND] …
test_plumb.exe: unknown option --filter
cd /tmp/plmb-037-step5b && nix develop --command sh -c 'cd _build/default/test && timeout 60 ./test_plumb.exe test "identity via plumb" -v' 2>&1 | tail -20
[SKIP] tensor 0 tensor + barrier.
[SKIP] tensor 1 tensor + barrier multiple.
[SKIP] tensor 2 tensor + auto-merge.
[SKIP] tensor 3 tensor + named barrier.
[SKIP] tensor 4 tensor + 3-way barrier.
[SKIP] tensor 5 tensor + 3-way barrier m...
[SKIP] eof-control 0 exits cleanly with input.
[SKIP] eof-control 1 exits cleanly on empty i...
[SKIP] _format_json 0 serialises object.
[SKIP] _format_json 1 multiple messages.
[SKIP] _format_json 2 empty stdin.
[SKIP] _format_json 3 wrong type rejected.
[SKIP] _parse_json 0 deserialises string.
[SKIP] _parse_json 1 multiple messages.
[SKIP] _parse_json 2 empty stdin.
[SKIP] _parse_json 3 wrong type rejected.
[SKIP] json algebraic laws 0 format ; parse = id.
[SKIP] json algebraic laws 1 parse ; format ≅ id.
Test Successful in 0.061s. 3 tests run.
cd /tmp/plmb-037-step5b && nix develop --command sh -c 'cd _build/default/test && timeout 60 ./test_plumb.exe test "identity via plumb" "copy-merge" "barrier" "project" "direct structural spawn" "type checking" "heterogeneous merge" "product type checking" "named spawn" "filter" "toplevel filter" "toplevel map" "map" "wiring" "tensor" "_format_json" "_parse_json" "json algebraic laws"' 2>&1 | tail -5
test_plumb.exe: too many arguments, don't know what to do with barrier, project, direct structural spawn, type checking, heterogeneous merge, product type checking, named spawn, filter, toplevel filter, toplevel map, map, wiring, tensor, _format_json, _parse_json, json algebraic laws

cd /tmp/plmb-037-step5b && nix develop --command sh -c 'cd _build/default/test && timeout 60 ./test_plumb.exe test -e "agent|eof-control"' 2>&1 | tail -10
Raised at Alcotest_engine__Test.check in file "src/alcotest-engine/test.ml", lines 216-226, characters 4-19
Called from Alcotest_engine__Core.Make.protect_test.(fun) in file "src/alcotest-engine/core.ml", line 186, characters 17-23
Called from Alcotest_engine__Monad.Identity.catch in file "src/alcotest-engine/monad.ml", line 24, characters 31-35
Logs saved to `/private/tmp/plmb-037-step5b/_build/default/test/_build/_tests/plumb/eof-control.001.output'.
──────────────────────────────────────────────────────────────────────────────
Full test results in `/private/tmp/plmb-037-step5b/_build/default/test/_build/_tests/plumb'.
9 failures! in 0.092s. 11 tests run.
I see — I've been put into plan mode. Let me read the existing plan and update it with the fix for the hanging agent tests.
1→# PLMB-037 Step 5b: `bin/plumb/main.ml` Switchover
2→
3→## Context
4→
5→Steps 0–4 and 5a of PLMB-037 are complete and merged to `main`.
6→The fabric runs inline morphisms and agent subprocesses end-to-end
7→over ZMQ. The plumb binary still uses `Exec.execute` (pipe-based,
8→thread-based). This step switches it to use `Fabric.execute` +
9→`Bridge.make_handler`.
10→
11→The binary goes from being "the executor" to being "the I/O adapter
12→for the fabric" — a thin shell that reads stdin, pushes to ZMQ
13→boundary sockets, pulls from ZMQ output sockets, and writes to
14→stdout.
15→
16→## Files to Modify
17→
18→| File | Change |
19→|------|--------|
20→| `bin/plumb/main.ml` | Rewrite to use Fabric + Bridge instead of Exec |
21→| `bin/plumb/dune` | Add `fabric`, `spawn_lib`, `eio_main`, `eio.unix` deps |
22→| `lib/fabric/bridge.ml` | Expand `make_handler` to accept tool dispatch params |
23→| `lib/fabric/bridge.ml` | Fix: MCP temp file cleanup, fork in `run_in_systhread` |
24→
25→## Review Issues
26→
27→Reviews identified two blockers and several medium issues. All
28→are addressed in this plan.
29→
30→### BLOCKER 1: Signal handling under Eio
31→
32→`Sys.catch_break true` installs an async signal handler that raises
33→`Sys.Break` at arbitrary allocation points. Inside `Eio_main.run`,
34→this can fire in Eio scheduler internals and skip cleanup
35→(`fabric.cancel()`, `Fabric.shutdown`, IPC file cleanup).
36→
37→**Fix:** Remove `Sys.catch_break true`. Instead, set a SIGINT
38→handler before `Eio_main.run` that sets a shared `Atomic.t` flag.
39→The reader loop checks this flag (or catches the exception if it
40→propagates). On SIGINT, call `fabric.cancel()` to trigger orderly
41→shutdown.
42→
43→Concretely: use `Sys.set_signal Sys.sigint (Signal_handle ...)`.
44→The handler sets an atomic flag and calls `fabric.cancel()` (which
45→is safe to call from a signal handler — it just iterates a hashtable
46→and closes ZMQ sockets via C FFI). The reader loop's `T.recv` will
47→fail with `Transport_error` once the socket is closed, causing the
48→switch to exit. `Entry.main` catches `Sys.Break` at line 73 for
49→exit code 130 — we raise `Sys.Break` after shutdown to preserve
50→this.
51→
52→### BLOCKER 2: `run_in_systhread(input_line)` not cancellable
53→
54→The bridge demux runs `Eio_unix.run_in_systhread (fun () ->
55→input_line ic)`. This blocks in a thread pool thread until the
56→child writes to stdout. `fabric.cancel()` closes ZMQ sockets but
57→not child stdout pipes. The switch cannot exit while a
58→`run_in_systhread` is outstanding.
59→
60→**Fix:** The bridge demux already handles this correctly in practice
61→— when the child process exits (from receiving EOF on stdin via the
62→mux path), its stdout pipe closes, and `input_line` raises
63→`End_of_file`. The key ordering: stdin writer sends EOF sentinel →
64→bridge mux writes `__eof` to child stdin → child processes and
65→exits → child stdout closes → demux `input_line` gets EOF →
66→demux sends EOF sentinels on output sockets → demux reaps child.
67→
68→For the SIGINT path: `fabric.cancel()` closes ZMQ sockets. The mux
69→fibres' `T.recv` calls fail → mux fibres exit → `mux_source_done`
70→→ child stdin closes → child gets EOF → child exits → stdout
71→closes → demux unblocks. This chain works but may take a few
72→seconds. If the child hangs, `waitpid_timed` sends SIGTERM after
73→the timeout.
74→
75→No code change needed — the existing ordering is correct. But the
76→plumb binary's shutdown must not call `Fabric.shutdown` inside the
77→switch scope (the `Unix.sleepf 1.5` would block the scheduler).
78→Call it after the switch exits.
79→
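For reference, a plausible shape for `waitpid_timed` (the plan names it but does not show its implementation, so this is an assumed sketch): poll with `WNOHANG`, and escalate to SIGTERM once the timeout elapses.

```ocaml
(* Assumed sketch of waitpid_timed: reap the child if it exits within
   [timeout] seconds; otherwise send SIGTERM and wait for real. *)
let waitpid_timed ~timeout pid =
  let deadline = Unix.gettimeofday () +. timeout in
  let rec poll () =
    match Unix.waitpid [ Unix.WNOHANG ] pid with
    | 0, _ ->
      (* Child still running. *)
      if Unix.gettimeofday () > deadline then begin
        Unix.kill pid Sys.sigterm;
        Unix.waitpid [] pid
      end
      else begin
        Unix.sleepf 0.05;
        poll ()
      end
    | result -> result
  in
  poll ()

let () =
  (* A child that would outlive the timeout gets SIGTERM'd. *)
  match Unix.fork () with
  | 0 -> Unix.sleep 30; exit 0
  | pid ->
    let reaped, status = waitpid_timed ~timeout:0.2 pid in
    assert (reaped = pid);
    (match status with
     | Unix.WSIGNALED s -> assert (s = Sys.sigterm)
     | _ -> ())
```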
80→### Medium issues (fix in this pass)
81→
82→**Fork under Eio (M1):** `spawn_subprocess` calls `Unix.fork()`
83→from an Eio fibre. Wrap the spawn call in `run_in_systhread` in
84→`bridge.ml` to avoid scheduler state corruption.
85→
86→**Tool dispatch thread not joined (M2):** Bridge creates a dispatch
87→thread (`Thread.create`) that is never joined. Track it and join
88→in the demux cleanup block.
89→
90→**MCP temp file leak (M3):** Bridge writes MCP tool schemas to a
91→temp file but only cleans up the spec file. Add cleanup for the
92→MCP file.
93→
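A minimal sketch of the M3 pattern (names here are illustrative, not the bridge's real API): register every temp file at creation and remove them all in one `finally`, so the MCP schema file cannot be missed the way a single hard-coded cleanup can miss it.

```ocaml
(* Track all temp files created inside [f] and remove them afterwards,
   even if [f] raises. *)
let with_temp_files f =
  let files = ref [] in
  let make_temp prefix =
    let path = Filename.temp_file prefix ".json" in
    files := path :: !files;
    path
  in
  Fun.protect
    ~finally:(fun () ->
      List.iter
        (fun p -> try Sys.remove p with Sys_error _ -> ())
        !files)
    (fun () -> f make_temp)

let () =
  let spec = ref "" and mcp = ref "" in
  with_temp_files (fun make_temp ->
    (* Hypothetical spec and MCP schema files the bridge would write. *)
    spec := make_temp "spec";
    mcp := make_temp "mcp";
    assert (Sys.file_exists !spec && Sys.file_exists !mcp));
  (* Both are gone after cleanup, including the MCP file. *)
  assert (not (Sys.file_exists !spec));
  assert (not (Sys.file_exists !mcp))
```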
94→**Tool dispatch boundary (M4):** The bridge's `tool_dispatch` type
95→takes only `req_fd`, `resp_fd`, `agent_name`. But
96→`Tool_dispatch.tool_dispatch_loop` needs `search_path`, `tool_env`,
97→`on_oitl_request`, `call_pipeline`, `base_dir`, etc. Two options:
98→
99→- **(A) Expand `make_handler`** to accept `~search_path`,
100→ `~tool_env`, `~on_oitl_request`, `~base_dir`. The bridge
101→ constructs the full dispatch loop internally. Cleaner boundary.
102→- **(B) Keep `tool_dispatch_fn` as-is** — the caller captures
103→ everything except fds and agent name in a closure.
104→
105→**Decision: (A).** The bridge already resolves tool bindings and MCP
106→connections per-agent. Adding `search_path`, `tool_env`,
107→`on_oitl_request`, and `base_dir` lets it construct the complete
108→dispatch loop. The `tool_dispatch_fn` parameter is removed.
109→
110→For `call_pipeline`: stub with an error for now. Recursive pipeline
111→invocation through the fabric is a separate concern.
112→
113→## Design
114→
115→### New Flow
116→
117→```
118→Entry.main → load_main → validate → compile routing table
119→ → Eio_main.run → Eio.Switch.run
120→ → Install SIGINT handler (sets flag, calls fabric.cancel)
121→ → Build Bridge.make_handler (prog, type_env, search_path,
122→ tool_env, on_oitl_request, bin_dir, resource_find)
123→ → Fabric.execute ~process_handler
124→ → Fork writer daemon: run_in_systhread(input_line stdin)
125→ → T.send boundary_input → on EOF: send_eof
126→ → Reader (main fibre): T.recv boundary_output
127→ → run_in_systhread(print_endline) → on EOF: break
128→ → fabric.cancel() → switch exits
129→ → Fabric.shutdown (outside switch scope)
130→```
131→
132→### Routing table compilation
133→
134→Before calling `Fabric.execute`, compile the routing table:
135→
136→```ocaml
137→let ports, pb = match main.impl with
138→ | Plumb (ports, pb) -> (ports, pb)
139→ | _ -> Error.raise_ Config "config_error" "main must be plumb"
140→in
141→let type_env = Validate.build_type_env prog.prog_types in
142→let routing_table = Routing.compile ~prog ~ports ~pb
143→ ~type_env ~main_binding:main ~seeded:pb.pb_seeded in
144→```
145→
146→### Bridge construction (expanded API)
147→
148→```ocaml
149→let handler = Fabric.Bridge.make_handler
150→ ~prog ~type_env
151→ ~bin_dir:(Lazy.force Prefix_path.bin_dir)
152→ ~resource_find:Resource_path.find
153→ ~search_path
154→ ~tool_env
155→ ~base_dir:(Filename.dirname spec_path)
156→ ~on_oitl_request
157→ () in
158→```
159→
### Boundary I/O fibres

```ocaml
(* Writer daemon: stdin → boundary input *)
Eio.Fiber.fork_daemon ~sw (fun () ->
  let push = Fabric.boundary_input fabric input_port in
  (try
     while true do
       let line = Eio_unix.run_in_systhread (fun () ->
         input_line stdin) in
       T.send push ~topic:input_port line
     done
   with End_of_file | Sys_error _ -> ());
  Fabric.Bridge.send_eof push ~topic:input_port;
  `Stop_daemon);

(* Reader: boundary output → stdout (main fibre) *)
let pull = Fabric.boundary_output fabric output_port in
(try
   while true do
     let (_topic, payload) = T.recv pull in
     if Fabric.Bridge.is_eof payload then raise Exit;
     Eio_unix.run_in_systhread (fun () ->
       print_endline payload)
   done
 with Exit | Transport.Transport_error _ -> ());
fabric.cancel ()
```

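The reader's sentinel check can be exercised in isolation. A minimal sketch in plain OCaml; the `eof_sentinel` value is a hypothetical stand-in for whatever marker `Fabric.Bridge.send_eof` and `is_eof` actually agree on, and a `Queue` stands in for the ZMQ pull socket:

```ocaml
(* Hypothetical EOF sentinel: a reserved payload marks end-of-stream
   so the reader knows when to stop draining. *)
let eof_sentinel = "\x04EOF"
let is_eof payload = String.equal payload eof_sentinel

(* Mirror of the reader loop: pull payloads until the sentinel. *)
let drain q =
  let out = ref [] in
  (try
     while true do
       let payload = Queue.pop q in
       if is_eof payload then raise Exit;
       out := payload :: !out
     done
   with Exit | Queue.Empty -> ());
  List.rev !out

let () =
  let q = Queue.create () in
  List.iter (fun s -> Queue.push s q) ["a"; "b"; eof_sentinel];
  assert (drain q = ["a"; "b"])
```

Anything queued after the sentinel is left unread, matching the reader's behaviour of exiting the loop as soon as EOF arrives.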
Port names from the main binding:
- `input_port = List.nth ports 0`
- `output_port = List.nth ports n_in`

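The positional convention can be captured in a tiny helper. A sketch; `boundary_ports` is a hypothetical name, and `n_in` is the number of input components of `main`'s type (as the desugarer computes it from `Validate.port_element_types`):

```ocaml
(* Hypothetical helper: the first port is the boundary input, and the
   first output port sits right after the n_in input ports. *)
let boundary_ports (ports : string list) (n_in : int) =
  (List.nth ports 0, List.nth ports n_in)

let () =
  (* e.g. main : !Query -> !Answer with ports [input; output], n_in = 1 *)
  assert (boundary_ports ["input"; "output"] 1 = ("input", "output"));
  (* a (!A, !B) -> !C main would have ports [a; b; out], n_in = 2 *)
  assert (boundary_ports ["a"; "b"; "out"] 2 = ("a", "out"))
```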
### Signal handling

```ocaml
(* Before Eio_main.run — interrupt flag and fabric ref shared with
   the signal handler *)
let interrupted = Atomic.make false in
let fabric_ref = ref None in
Sys.set_signal Sys.sigint (Sys.Signal_handle (fun _ ->
  Atomic.set interrupted true;
  match !fabric_ref with
  | Some f -> f.Fabric.cancel ()
  | None -> raise Sys.Break));
```

Inside the switch, set `fabric_ref := Some fabric` after
`Fabric.execute`. On SIGINT, the handler records the interrupt and
calls `fabric.cancel()`, which closes all sockets, causing the
reader's `T.recv` to fail with `Transport_error` and breaking the
loop. After the switch exits, call `Fabric.shutdown`; if SIGINT was
recorded, re-raise `Sys.Break` so `Entry.main` reports exit code
130. If no fabric is running yet, the handler raises `Sys.Break`
directly.

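The handler-side logic can be factored so it is testable without delivering a real signal. A sketch of the plan's atomic-flag pattern; `handle_sigint` is a hypothetical name, and the optional cancel callback stands in for `fabric.cancel`:

```ocaml
(* Hypothetical factoring of the handler body: set the flag, then
   cancel if a fabric is running, else raise Sys.Break immediately. *)
let interrupted = Atomic.make false

let handle_sigint ~cancel_opt =
  Atomic.set interrupted true;
  match cancel_opt with
  | Some cancel -> cancel ()
  | None -> raise Sys.Break

let () =
  let cancelled = ref false in
  handle_sigint ~cancel_opt:(Some (fun () -> cancelled := true));
  assert (Atomic.get interrupted && !cancelled);
  (* With no fabric yet, Sys.Break propagates for exit code 130. *)
  assert (try handle_sigint ~cancel_opt:None; false
          with Sys.Break -> true)
```

Keeping the handler body this small is what makes it async-signal-tolerant: it only flips an atomic and invokes the (FFI-backed) cancel, deferring all real cleanup to the main control flow.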
### Shutdown ordering

```ocaml
(* Declared before Eio_main.run so shutdown can run after it exits *)
let fabric = ref None in
Eio_main.run (fun _env ->
  Eio.Switch.run (fun sw ->
    let f = Fabric.execute ~sw ~routing_table ~type_env
        ~process_handler:handler () in
    fabric := Some f;
    fabric_ref := Some f;
    (* ... boundary I/O fibres ... *)
    f.cancel ()));  (* reader loop exits here *)
(* Outside switch — safe to sleep/block *)
match !fabric with
| Some f -> Fabric.shutdown f
| None -> ()
```

### `bridge.ml` Changes

1. **Expand `make_handler` signature:**
   - Add `~search_path`, `~tool_env`, `~base_dir`,
     `~on_oitl_request` parameters
   - Remove `~tool_dispatch_fn` parameter
   - Bridge constructs `tool_dispatch_loop` closure internally

2. **Fork safety (M1):** Wrap `spawn_subprocess` call in
   `Eio_unix.run_in_systhread`

3. **Join dispatch thread (M2):** Store thread handle, join
   in demux cleanup

4. **MCP file cleanup (M3):** Track and remove MCP temp file

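For M3, the cleanup shape is a tracked temp path removed exactly once. A sketch with hypothetical names (`mcp_tmp`, `write_mcp_config`, `cleanup_mcp`); the real bridge writes the MCP server config to such a file before spawning the agent:

```ocaml
(* Hypothetical tracking of the MCP config temp file (M3). *)
let mcp_tmp : string option ref = ref None

let write_mcp_config contents =
  let path = Filename.temp_file "plumb-mcp" ".json" in
  let oc = open_out path in
  output_string oc contents;
  close_out oc;
  mcp_tmp := Some path;
  path

(* Idempotent cleanup: remove the file and forget the path. *)
let cleanup_mcp () =
  match !mcp_tmp with
  | Some p ->
    (try Sys.remove p with Sys_error _ -> ());
    mcp_tmp := None
  | None -> ()

let () =
  let p = write_mcp_config "{\"servers\":{}}" in
  assert (Sys.file_exists p);
  cleanup_mcp ();
  assert (not (Sys.file_exists p));
  cleanup_mcp ()  (* second call is a no-op *)
```

Making `cleanup_mcp` idempotent matters because it may be reached from both the demux cleanup path and the SIGINT shutdown path.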
### `bin/plumb/dune` Changes

```
(executable
 (name main)
 (public_name plumb)
 (libraries plumb fabric language runtime spawn_lib
  eio_main eio.unix unix threads.posix yojson))
```

## Verification

1. `nix develop --command dune build --root=.` in worktree
2. `nix develop --command dune test --root=.` — all tests pass
3. Manual: `echo '"hello"' | ./plumb test/fixtures/agent_eliza.plumb`
   → ELIZA response
4. Manual: `echo '{"text":"hi"}' | ./plumb test/fixtures/identity.plumb`
   → echoes the input (inline-only pipeline)
5. Existing `test_plumb.ml` integration tests still pass
6. Ctrl-C during a pipeline exits cleanly (no zombies, no stale
   `/tmp/plumb-*` directories)
7. Agent with tools: `--auto-approve` flag works through the bridge
264→Now let me investigate the hanging agent test to understand the root cause before updating the plan.
{
"subagent_type": "Explore",
"prompt": "I need to understand why agent-based pipelines hang under the fabric executor but work under the old Exec executor. The specific hanging test is `wiring_telemetry_source.plumb` which contains an agent with `(!string, !CtrlCmd) -> (!string, !json)` type — product inputs (data + control) and product outputs (data + telemetry).\n\nThe issue: when the bridge creates mux fibres for the agent, it creates a ctrl_in mux fibre that blocks on `T.recv ctrl_in_sock`. If the routing table has a ctrl_in PULL socket for the agent but no process writes to it, that fibre blocks forever and `mux_source_done()` never reaches 0 — the child stdin pipe is never closed from the mux side.\n\nI need to understand:\n\n1. In the routing table compiler (`lib/runtime/routing.ml`), when an agent with control input is spawned via wiring syntax (e.g., `input ; smart ; output`), does the desugarer create a ctrl_in channel? If so, does it have a writer? Look at how the desugarer handles agents with product input types.\n\n2. In the old executor (`lib/plumb/exec.ml`), how does it handle agents with control input when there's no external ctrl_in writer? Does it handle EOF propagation differently?\n\n3. In the bridge (`lib/fabric/bridge.ml`), the `n_sources` calculation is `2 + (if has_tools ...)`. The \"2\" counts data_in and ctrl_in. But if ctrl_in has no upstream writer in the ZMQ fabric, the ctrl_in mux fibre blocks forever. How should this be fixed?\n\nSearch in:\n- `/tmp/plmb-037-step5b/lib/runtime/routing.ml` — routing compiler\n- `/tmp/plmb-037-step5b/lib/plumb/exec.ml` — old executor, especially the agent spawn path\n- `/tmp/plmb-037-step5b/lib/fabric/bridge.ml` — bridge mux/demux\n- `/tmp/plmb-037-step5b/lib/language/desugar.ml` or similar — how wiring desugars agent spawns\n\nBe thorough — this is a correctness issue."
}
Async agent launched successfully. agentId: a781af2a256ec871f. The agent is working in the background. You will be notified automatically when it completes. output_file: /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/a781af2a256ec871f.output
I've launched an Explore agent to investigate the root cause of the agent hang. Let me also read the key design documents and check the explore agent's progress.
{
"document_id": "notes/design/plumbing-operational-semantics.md"
}
{
"document_id": "notes/design/pipeline-language.md",
"max_chars": 5000
}{"success":true,"content":"# Plumbing: Operational Semantics\n\n**Status:** Draft — 2026-03-02\n**Context:** [Plumbing Language Design](https://leithdocs.com/ldc/documents/notes/design/pipeline-language.md)\n\n## Scope\n\nThe design sketch describes the algebra — what the language means categorically. This document describes how the runtime executes programs: what happens when an agent calls `check`, `call`, or interacts with service processes. The boundary between the two documents is: if it's about types and composition, it's in the design sketch. If it's about channels, file descriptors, process lifecycle, and runtime coordination, it's here.\n\n## Agent-available tools\n\nAn agent interacting with the plumbing runtime has access to these built-in tools:\n\n### check\n\n```\ncheck : Program → Result(TypedProgram, [TypeError])\n```\n\nPure, total, deterministic. Takes a plumbing program as text, parses it, type-checks it, validates budget constraints. Returns either a validated program or a list of type errors. No side effects — no money spent, no processes created. This is the feedback loop: an agent writes a program, calls `check`, reads errors, revises, calls `check` again, until the program is well-typed.\n\n### call\n\n```\ncall : TypedProgram → ???\n```\n\nThis is the open question. `call` takes a checked program and makes it run. But unlike a simple tool (`A → B`), a running pipeline needs channels — it has input and output ports that must be wired somewhere.\n\n**What needs to happen:**\n1. The agent has a checked program with declared boundary ports (input/output types).\n2. The agent calls `call` with the program.\n3. The runtime creates channels, spawns processes, wires the topology.\n4. The agent needs to send data into the pipeline and receive results.\n\n**The tension:** Tools are single-shot (`A → B`). But a called pipeline is a stream processor (`!A → !B`). The agent is itself a process on streams. 
So calling a pipeline isn't really a tool call — it's asking the runtime to extend the channel topology.\n\n**Options under consideration:**\n\n**Option A: call returns a handle.** `call : TypedProgram → Handle`. The handle gives the agent file descriptors (or channel identifiers) for the pipeline's boundary ports. The agent then reads/writes those channels as part of its own stream processing. This mixes tool-level and stream-level interaction — the tool call sets up topology, then the agent uses streams directly.\n\n**Option B: call is synchronous.** `call : (TypedProgram, Input) → Output`. The agent passes input data along with the program. The runtime spawns the pipeline, feeds the input, collects the output, tears everything down, returns the result. This is a function call — pipeline as a pure computation. Clean semantics but loses streaming. Only works for bounded input/output.\n\n**Option C: call wires to the agent's own ports.** The agent declares that certain ports should be connected to the new pipeline's boundary. The runtime handles the rewiring. This is the most faithful to the \"everything is topology\" principle but requires the agent to reason about its own port structure.\n\n**Option D: call is sugar for inline spawn.** The agent doesn't call a tool — it emits a spawn instruction on a control channel. The runtime interprets it. This treats pipeline construction as a stream protocol, not a tool call. The agent writes `{ \"spawn\": program, \"input\": \"fd3\", \"output\": \"fd4\" }` on its control channel, and the runtime acts on it.\n\n**Decision: Option B first.** Synchronous `call` is the simplest to implement and avoids deadlock. Agents have fixed ports (input, output, thinking, and soon control). Options A and D require the agent to manage runtime channel topology — easy deadlock territory unless session types enforce protocol discipline. The agent could wait on output from a pipeline that's waiting on input from the agent. 
Without session types, debugging means staring at file descriptor tables.\n\nOption D (tagged messages on output → demux → runtime acts on instructions) is the most elegant long-term path — the agent instructs the runtime how to rewire topology. But this requires practice with the system and session types for safety. Build B, develop experience, add session types, then revisit D.\n\n`call` under Option B:\n\n```\ncall : (TypedProgram, Input) → Output\n```\n\nThe agent passes a checked program and input data. The runtime spawns the pipeline, feeds the input, collects the output, tears everything down, returns the result. Pipeline as function call. Clean semantics, no new channels, no deadlock risk. Only works for bounded input/output — the streaming case (agent feeds data incrementally into a running pipeline) eventually forces beyond B, but that's not the first thing to build.\n\n### Filesystem service\n\n```\nfs : μX. !FsRequest. ?FsResponse. X\n```\n\nSession-typed service process. The agent gets a channel (file descriptor) to the filesystem process. Sends requests (read, write, edit, search, stat, list), receives responses. The session type constrains the protocol: one request, one response, repeat.\n\nThe filesystem process runs with the agent's UID. File access is gated by Unix permissions. The process is long-lived — persists for the duration of the agent's execution.\n\n**Open question:** How does the agent acquire the channel to the filesystem process? Is it pre-wired at agent spawn time (like stdin/stdout)? Does the agent request it via a tool call? Is it declared in the agent's process configuration?\n\n### MCP client\n\n```\nmcp : μX. !McpRequest. ?McpResponse. X\n```\n\nSame session type as the filesystem service, but the backing process speaks MCP over HTTP/SSE to an external server. Security is via tokens/OAuth rather than Unix permissions. 
The MCP server URL is part of the process configuration.\n\n**Open question:** Same channel acquisition question. One MCP client process per server, or one per agent?\n\n## Process lifecycle\n\n### Spawn\n\nWhen the runtime spawns a process:\n1. Create Unix pipes for all channels in the process's type signature.\n2. Fork/exec the process binary with pipes as file descriptors.\n3. The process reads its spec file (`.plumb`) to know its own type.\n4. Validation runs at channel boundaries — every message checked against the declared type.\n\n### Termination\n\n**Normal termination:** Input channel closes (EOF). Process finishes current work, closes output channel, exits 0.\n\n**Budget exhaustion:** Runtime sends SIGTERM. Process has a grace period to produce final output, then SIGKILL.\n\n**Validation failure:** Process receives invalid input. Logs structured error to stderr, exits 1. For agents: retries up to `max_retries`, producing corrected output.\n\n**Signal channels (future):** Cooperative termination via typed signal channels wired in the topology. \"Finish\" signal: agent wraps up gracefully. \"Kill\" signal: immediate exit.\n\n### Channel close propagation\n\n- **id:** Input closes → output closes.\n- **copy:** Input closes → both outputs close.\n- **merge:** Output closes only when *all* inputs have closed.\n- **discard:** Input closes → process exits.\n- **agent:** Input closes → agent produces final output → output closes.\n- **filter:** Input closes → output closes (some messages were dropped, that's fine).\n\n### Error propagation\n\nStructured errors go to stderr (fd 2), not to the output channel. The runtime collects stderr from all processes. 
A process failure does not automatically propagate to other processes — the runtime decides what to do (retry, kill pipeline, report to parent).\n\n## What remains\n\n- Resolve the `call` semantics (Options A–D above).\n- Channel acquisition for service processes (fs, MCP).\n- Budget tree mechanics: how budget is allocated, tracked, and enforced at each level.\n- Signal channel operational semantics: how signals propagate through copy, merge, feedback loops.\n- Interaction between `call` and budget: does calling a sub-pipeline deduct from the caller's budget?\n- Session type enforcement: does the runtime check session protocol compliance, or is it advisory?\n"}tail -200 /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/a781af2a256ec871f.output 2>/dev/null | head -200
{"success":true,"content":"# Plumbing Language: Design Sketch\n\n**Status:** Draft — 2026-03-03\n**Context:** [Agent Binary](https://leithdocs.com/ldc/documents/notes/design/agent-binary.md), [AgentOS Essentials](https://leithdocs.com/ldc/documents/notes/design/agentos-essentials.md)\n\n## What It Is\n\nA small language for describing agent pipelines. The abstract language — the categorical construction — may get a formal name later (possibly a Greek letter, pending clarification of its categorical structure). The implementation is called **plumbing**. The command is `plumb`. The file extension is `.plumb`.\n\nPrograms in this language are what `plumb` executes. A spawn request is a program. A pipeline definition is a program. An LLM designing its own sub-pipeline writes a program.\n\nHeritage: Plan 9 plumber (message routing), Solaris `ifconfig plumb` (wiring protocol streams through the kernel's STREAMS architecture), now plumbing (wiring agent processes). Same metaphor, same tradition.\n\nThere is one language with two surface forms. The static form (`plumb` declarations) is sugar over the dynamic form (explicit channels and spawns). The static form allows full type checking before execution. The dynamic form is the general case — agents can construct and spawn programs at runtime.\n\n`plumb` is the runtime — the interpreter for this language. It receives programs, type-checks them, validates resource constraints, and executes them by creating channels and spawning processes.\n\n## Motivating Example\n\nWe run a document composition pipeline. A composer drafts from source materials. A corroborator fact-checks the draft against those sources. A critic evaluates the result — but cannot see the source materials, only the draft. 
The topology has a review loop: if fact-checking fails, the draft goes back to the composer; if the critic says revise, it goes back too.\n\n```\ntype Sources = { topic: string, instructions: string, documents?: [string] }\ntype Draft = { text: string, title: string }\ntype Verdict = { substantiated: bool, explanation: string, sources?: [string] }\ntype CheckedDraft = { draft: Draft, verdict: Verdict }\ntype Review = { score: int, review: string }\ntype ReviewedDraft = { draft: Draft, review: Review }\n\nlet review : !Sources → !Draft = plumb(input, output) {\n let composer : !Sources → !Draft = agent { model: \"claude-sonnet-4-5\" }\n let corroborator : !Draft → !CheckedDraft = agent { model: \"claude-haiku-4-5\" }\n let critic : !CheckedDraft → !ReviewedDraft = agent { model: \"claude-sonnet-4-5\" }\n\n input ; composer ; corroborator\n corroborator ; filter(verdict.substantiated = false).verdict ; composer\n corroborator ; filter(verdict.substantiated = true) ; critic\n critic ; filter(review.score < 85).review ; composer\n critic ; filter(review.score >= 85).draft ; output\n}\n```\n\nThe declaration `let review : !Sources → !Draft = plumb(input, output) { ... }` defines a process. `input` and `output` in the wiring statements are the ports of that process — `input` carries a stream of `Sources` messages, `output` carries a stream of `Draft` messages.\n\nThe composer's input type is a union — it receives the initial sources, fact-check results, and review feedback on the same channel. The review loops feed back through merge at the composer's input. Budget annotations cap spending per agent and for the pipeline as a whole.\n\nThe rest of this document develops the language that describes pipelines like this one.\n\n## Foundations\n\nThe plumbing language describes a monoidal category.\n\n**Objects** are stream types. 
In the concrete syntax, `!A` denotes a stream of messages each of type A — the `!` is the linear logic \"of course\" modality, a comonad that lifts a bare type into a stream context (Kleisli lifting). A channel typed `!Draft` carries a sequence of Draft messages. A channel typed `!(Sources | Verdict | Review)` carries a sequence of messages, each of which is one of those three types. In the algebra, we write `A` and understand it as the stream type — the `!` is implicit at the categorical level.\n\n**Morphisms** are processes. A process consumes messages of one type and produces messages of another: `A → B`. What happens inside is opaque — it could be an LLM conversation, a pure function, a predicate, or an entire sub-pipeline. The plumbing language doesn't care; it cares about the type signature.\n\nEverything is a process. A pipeline has type `A → B`. A process has type `A → B`. A pipeline *is* a process. Composition doesn't produce something different from what it operates on — it's processes all the way down. When a pipeline appears as a node inside another pipeline, the outer pipeline sees only the type signature. Whether the node is a single LLM conversation or an entire sub-pipeline with its own internal topology is invisible at the boundary. This is closure under composition.\n\nThe category is a **copy-discard (CD) category**, not a Markov category. A CD category (also called a garbage-share categ","truncated":true,"total_bytes":51037,"returned_chars":5000}<persisted-output>
Output too large (461.2KB). Full output saved to: /Users/ww/.claude/projects/-Users-ww-src-plumbing/d438378e-8111-4168-abaf-ad446f02709d/tool-results/bz5opitus.txt
</persisted-output>
grep -A 200 "## Summary" /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/a781af2a256ec871f.output 2>/dev/null | head -200; echo "==="; tail -100 /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/a781af2a256ec871f.output 2>/dev/null
<persisted-output>
Output too large (490.3KB). Full output saved to: /Users/ww/.claude/projects/-Users-ww-src-plumbing/d438378e-8111-4168-abaf-ad446f02709d/tool-results/b0hydyfxp.txt
</persisted-output>
{
"file_path": "/Users/ww/.claude/projects/-Users-ww-src-plumbing/d438378e-8111-4168-abaf-ad446f02709d/tool-results/b0hydyfxp.txt",
"limit": 200
}
File content (183871 tokens) exceeds maximum allowed tokens (25000). Use offset and limit parameters to read specific portions of the file, or search for specific content instead of reading the whole file.
grep -o '"text":"[^"]*' /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/a781af2a256ec871f.output 2>/dev/null | tail -1 | sed 's/"text":"//' | head -c 10000
Perfect. Now I have enough information. Let me create a comprehensive analysis document.\n\nBased on my thorough analysis of the code, I've identified the root cause of the hanging issue with agent-based pipelines in the fabric executor. Here's my detailed findings:\n\n## Summary\n\nThe issue is a mismatch between how the desugarer/routing compiler handles agent control inputs and how the bridge counts mux sources. When an agent has control input (`Product [Stream data_t; Stream ctrl_t]`) but the wiring doesn't explicitly connect anything to `ctrl_in`, the routing table creates a PULL socket for `ctrl_in` with no upstream writer. The bridge then blocks forever trying to receive from this socket, preventing `mux_remaining` from reaching zero and child stdin from closing.\n\n## Root Causes\n\n### 1. **Desugarer's `patch_ctrl_out` Only Handles Output, Not Input**\n\nLocation: `/private/tmp/plmb-037-step5b/lib/language/desugar.ml` lines 31-55\n\nThe `patch_ctrl_out` function injects implicit `ctrl_out` channels for agents that have control output but don't explicitly wire it:\n\n```ocaml\nlet patch_ctrl_out type_env prog fresh (pb : plumb_body) =\n let patched_spawns = List.map (fun (sp : spawn_stmt) ->\n match List.assoc_opt sp.sp_process binding_map with\n | Some b when (match b.impl with Agent _ -> true | _ -> false) ->\n let has_named = List.exists (function Labeled _ -> true | _ -> false)\n sp.sp_args in\n let has_ctrl_out = List.exists (function\n | Labeled (\
grep -o '"text":"[^"]*' /private/tmp/claude-501/-Users-ww-src-plumbing/0e6b86b5-3d3c-4483-9c52-ce740324d2d9/tasks/a781af2a256ec871f.output 2>/dev/null | tail -1 | sed 's/"text":"//' | sed 's/\\n/\n/g' | tail -200
Perfect. Now I have enough information. Let me create a comprehensive analysis document.
Based on my thorough analysis of the code, I've identified the root cause of the hanging issue with agent-based pipelines in the fabric executor. Here's my detailed findings:
## Summary
The issue is a mismatch between how the desugarer/routing compiler handles agent control inputs and how the bridge counts mux sources. When an agent has control input (`Product [Stream data_t; Stream ctrl_t]`) but the wiring doesn't explicitly connect anything to `ctrl_in`, the routing table creates a PULL socket for `ctrl_in` with no upstream writer. The bridge then blocks forever trying to receive from this socket, preventing `mux_remaining` from reaching zero and child stdin from closing.
## Root Causes
### 1. **Desugarer's `patch_ctrl_out` Only Handles Output, Not Input**
Location: `/private/tmp/plmb-037-step5b/lib/language/desugar.ml` lines 31-55
The `patch_ctrl_out` function injects implicit `ctrl_out` channels for agents that have control output but don't explicitly wire it:
```ocaml
let patch_ctrl_out type_env prog fresh (pb : plumb_body) =
let patched_spawns = List.map (fun (sp : spawn_stmt) ->
match List.assoc_opt sp.sp_process binding_map with
| Some b when (match b.impl with Agent _ -> true | _ -> false) ->
let has_named = List.exists (function Labeled _ -> true | _ -> false)
sp.sp_args in
let has_ctrl_out = List.exists (function
  | Labeled (\
Good — I have the key finding. Let me also read the relevant bridge code to verify the mux source counting:
{
"file_path": "/tmp/plmb-037-step5b/lib/fabric/bridge.ml",
"offset": 216,
"limit": 40
}
216→      ~stdin_fd:child_in_r ~stdout_fd:child_out_w
217→      ~env ()) in
218→  Runtime.Log.info "bridge"
219→    ["event", `String "spawned";
220→     "agent", `String binding.bd_name;
221→     "pid", `Int pid;
222→     "has_control", `Bool has_control;
223→     "has_telemetry", `Bool has_telemetry;
224→     "has_tools", `Bool has_tools];
225→
226→  (* Mux: OCaml Mutex for system-thread safety (R2) *)
227→  let mux_m = Mutex.create () in
228→
229→  (* Track active mux sources; close child stdin when last exits *)
230→  let n_sources = 2 + (if has_tools && tool_dispatch_fn <> None then 1 else 0) in
231→  let mux_remaining = ref n_sources in
232→  let mux_rem_m = Mutex.create () in
233→  let mux_source_done () =
234→    Mutex.lock mux_rem_m;
235→    decr mux_remaining;
236→    let n = !mux_remaining in
237→    Mutex.unlock mux_rem_m;
238→    if n = 0 then close_out_noerr child_in_oc
239→  in
240→
241→  (* Find sockets by port name *)
242→  let find_pull port = List.assoc_opt port pull_sockets in
243→  let find_push port = List.assoc_opt port push_sockets in
244→
245→  (* === Tool dispatch setup === *)
246→  let tool_req_oc = ref None in
247→  let tool_resp_r = ref None in
248→  let dispatch_thread = ref None in
249→  (match tool_dispatch_fn with
250→   | Some dispatch when has_tools ->
251→     let req_r, req_w = Unix.pipe ~cloexec:true () in
252→     let resp_r', resp_w = Unix.pipe ~cloexec:true () in
253→     tool_req_oc := Some (Unix.out_channel_of_descr req_w);
254→     tool_resp_r := Some resp_r';
255→     dispatch_thread := Some (Thread.create (fun () ->
{
"file_path": "/tmp/plmb-037-step5b/lib/language/desugar.ml",
"offset": 28,
"limit": 60
}
 28→
29→(** Inject implicit ctrl_out channels into user spawns for agent bindings
30→ with control that don't explicitly wire ctrl_out. *)
31→let patch_ctrl_out type_env prog fresh (pb : plumb_body) =
32→ let binding_map =
33→ List.map (fun (b : binding) -> (b.bd_name, b)) pb.pb_bindings
34→ @ List.map (fun (b : binding) -> (b.bd_name, b)) prog.prog_bindings
35→ in
36→ let extra_channels = ref [] in
37→ let patched_spawns = List.map (fun (sp : spawn_stmt) ->
38→ match List.assoc_opt sp.sp_process binding_map with
39→ | Some b when (match b.impl with Agent _ -> true | _ -> false) ->
40→ let has_named = List.exists (function Labeled _ -> true | _ -> false)
41→ sp.sp_args in
42→ let has_ctrl_out = List.exists (function
43→ | Labeled ("ctrl_out", _) -> true | _ -> false) sp.sp_args in
44→ (match Validate.resolve type_env b.input with
45→ | Product [Stream _; Stream ctrl_t] when has_named && not has_ctrl_out ->
46→ let ch = fresh "wc" in
47→ extra_channels := { ch_name = ch; ch_type = ctrl_t }
48→ :: !extra_channels;
49→ { sp with sp_args = sp.sp_args @ [Labeled ("ctrl_out", ch)] }
50→ | _ -> sp)
51→ | _ -> sp
52→ ) pb.pb_spawns in
53→ { pb with
54→ pb_channels = pb.pb_channels @ List.rev !extra_channels;
55→ pb_spawns = patched_spawns }
56→
57→let desugar type_env (prog : program) (main : binding) (pb : plumb_body) : plumb_body =
58→ let session_fresh =
59→ let used_names =
60→ (match main.impl with Plumb (ps, _) -> ps | _ -> [])
61→ @ List.map (fun (b : binding) -> b.bd_name) pb.pb_bindings
62→ @ List.map (fun (b : binding) -> b.bd_name) prog.prog_bindings
63→ @ List.map (fun (ch : channel_decl) -> ch.ch_name) pb.pb_channels
64→ in
65→ make_fresh used_names
66→ in
67→ (* Build a lookup for boundary port types so session wire desugaring
68→ can resolve the actual type of ports like ctrl_out. *)
69→ let ports = match main.impl with
70→ | Plumb (ps, _) -> ps | _ -> []
71→ in
72→ (* Decompose input/output types into per-port types.
73→ Input product components → first N ports; output → remaining ports.
74→ Computed once — depends only on main.input/main.output. *)
75→ let input_port_types = Validate.port_element_types type_env main.input in
76→ let output_port_types = Validate.port_element_types type_env main.output in
77→ let n_in = List.length input_port_types in
78→ let find_port_type name =
79→ match List.find_index (fun p -> p = name) ports with
80→ | Some i when i < n_in -> Some (List.nth input_port_types i)
81→ | Some i -> Some (List.nth output_port_types (i - n_in))
82→ | None -> None
83→ in
84→ (* Save session wires before desugaring clears them *)
85→ let session_wires = pb.pb_session_wires in
86→ (* Desugar session wires first — they produce named channels that
87→     wiring statements may reference (e.g. session@send(Pause)). *)
cat /tmp/plmb-037-step5b/test/fixtures/agent_control.plumb
(* Agent with control channel — product input *)
type Query = { text: string }
type Answer = { text: string }
type Control = { set_temp: number }
let smart : (!Query, !Control) -> !Answer = agent {
model: "claude-sonnet-4-5-20250514"
prompt: "Answer"
}
let main : !Query -> !Answer = plumb(input, output) {
let ch_ctrl : !Control = channel
spawn smart(input=input, ctrl_in=ch_ctrl, output=output)
}
cat /tmp/plmb-037-step5b/test/fixtures/agent_control_telemetry.plumb
(* Agent with both control and telemetry — 5 ports *)
type Query = { text: string }
type Answer = { text: string }
type Control = { set_temp: number }
type Thought = { reasoning: string }
let smart : (!Query, !Control) -> (!Answer, !Thought) = agent {
model: "claude-sonnet-4-5-20250514"
prompt: "Think and answer"
}
let main : !Query -> !Answer = plumb(input, output) {
let ch_ctrl : !Control = channel
let ch_think : !Thought = channel
spawn smart(input=input, ctrl_in=ch_ctrl, output=output, telemetry=ch_think)
}
cat /tmp/plmb-037-step5b/test/fixtures/wiring_control.plumb
(* Wiring with control-input agent — desugarer creates implicit control channel *)
type Query = { text: string }
type Answer = { text: string }
type Control = { set_temp: number }
let smart : (!Query, !Control) -> !Answer = agent {
model: "claude-sonnet-4-5-20250514"
prompt: "Answer"
}
let main : !Query -> !Answer = plumb(input, output) {
input ; smart ; output
}
Now I see clearly. agent_control.plumb has (!Query, !Control) -> !Answer — product input. The ch_ctrl channel is declared but has no writer. The routing table creates a route for ctrl_in with a PULL socket, but nothing writes to it. The bridge's ctrl_in mux fibre blocks forever.
But wait — wiring_telemetry_source.plumb has !string -> (!string, !json) — no product input. The hang can't be from ctrl_in. Let me check what's happening there. Let me look at how the test actually runs — which specific test is the first to hang:
grep -n "wiring_telemetry_source\|agent_control\|wiring_control\|eof-control\|eliza" /tmp/plmb-037-step5b/test/test_plumb.ml | head -20
308:let test_agent_control_accepted () =
311:  let _stdout, _stderr, code = run_plumb "agent_control.plumb" "" in
314:let test_agent_control_mismatch () =
315:  let _stdout, _stderr, code = run_plumb "agent_control_mismatch.plumb" "" in
318:let test_agent_control_arity () =
319:  let _stdout, _stderr, code = run_plumb "agent_control_arity.plumb" "" in
322:let test_agent_control_telemetry () =
324:  let _stdout, _stderr, code = run_plumb "agent_control_telemetry.plumb" "" in
327:let test_wiring_control () =
329:  let _stdout, _stderr, code = run_plumb "wiring_control.plumb" "" in
332:let test_wiring_telemetry_source () =
336:  let _stdout, _stderr, code = run_plumb "wiring_telemetry_source.plumb" "" in
703:    run_plumb_timeout "eliza_session.plumb" "\"hello\"\n" in
716:    run_plumb_timeout "eliza_session.plumb" "" in
871:  Alcotest.test_case "product input accepted" `Quick test_agent_control_accepted;
872:  Alcotest.test_case "control type mismatch" `Quick test_agent_control_mismatch;
873:  Alcotest.test_case "control wrong arity" `Quick test_agent_control_arity;
874:  Alcotest.test_case "control + telemetry" `Quick test_agent_control_telemetry;
875:  Alcotest.test_case "wiring implicit control" `Quick test_wiring_control;
876:  Alcotest.test_case "wiring telemetry as chain source" `Quick test_wiring_telemetry_source;
The test order shows agent_control_accepted (line 871) runs before wiring_telemetry_source. agent_control_accepted uses agent_control.plumb which has product input (!Query, !Control) -> !Answer. THIS is the first agent test that would hang.
Let me verify by looking at which test was actually stuck. Earlier the process showed wiring_telemetry_source.plumb — that means the earlier tests (including agent_control_accepted) must have passed or skipped. Let me check: agent_control.plumb expects exit 0 — since the binary now uses the fabric, does this test case even invoke the agent? With empty stdin and an agent that needs API credentials...
Let me check the old exec path for how it handled agents with unconnected ctrl_in:
{
"pattern": "ctrl_write_fds|ctrl.*fd|has_control",
"path": "/tmp/plmb-037-step5b/lib/plumb/exec.ml",
"output_mode": "content",
"head_limit": 30
}
66:  ctrl_write_fds : Unix.file_descr list;
96: List.iter (fun fd -> send_control fd msg) t.ctrl_write_fds
100: The stop message propagates: ctrl_write_fd → pipe → ctrl mux →
107: output → shutdown never called → ctrl fds never closed. *)
111: ) t.ctrl_write_fds
428: let has_control = decomp_in <> None in
439: (* ctrl_out sits between output and telemetry when has_control *)
440: let telem_offset = if has_control then out_offset + 2 else out_offset + 1 in
680: let has_control = agent_input_decomposition type_env binding.input <> None in
684: (match has_control, has_telemetry with
734: let ctrl_write_fds = ref [] in
856: let has_control = decomp_in <> None in
859: let out_idx = if has_control then 2 else 1 in
980: When has_control, ctrl_in is a declared channel (session
983: my_ctrl_fd: per-agent ctrl write fd, closed by the data
986: let ctrl_mux_src, my_ctrl_fd =
987: if has_control then begin
991: ctrl_write_fds := w :: !ctrl_write_fds;
995: ctrl_write_fds := w :: !ctrl_write_fds;
1002: After data EOF, close the agent's ctrl write-fd. The mux
1015: (* For has_control agents, send explicit stop before closing.
1016: Closing my_ctrl_fd alone won't trigger ctrl EOF because other
1020: For has_control=false agents, closing my_ctrl_fd is the only
1022: Note: my_ctrl_fd is the same fd in ctrl_write_fds. If
1025: if has_control then
1026: (try send_control my_ctrl_fd (`Assoc [("stop", `Bool true)])
1028: (try Unix.close my_ctrl_fd with Unix.Unix_error _ -> ());
1159: if has_control then
1168: if has_control then out_idx + 2 else out_idx + 1 in
1409: ctrl_write_fds := child_result.ctrl_write_fds @ !ctrl_write_fds
[Showing results with pagination = limit: 30, offset: 0]
{
"file_path": "/tmp/plmb-037-step5b/lib/plumb/exec.ml",
"offset": 980,
"limit": 60
}
 980→      When has_control, ctrl_in is a declared channel (session
981→ protocol writes to it). Otherwise, create ad-hoc pipe.
982→ The executor retains a write-end for stop/interrupt.
983→ my_ctrl_fd: per-agent ctrl write fd, closed by the data
984→ input mux thread after data EOF to guarantee ordering —
985→ must be set up before the data mux thread starts. *)
986→ let ctrl_mux_src, my_ctrl_fd =
987→ if has_control then begin
988→ let r, w_orig = find_channel (List.nth channels 1) in
989→ let w = Unix.dup w_orig in
990→ Unix.set_close_on_exec w;
991→ ctrl_write_fds := w :: !ctrl_write_fds;
992→ (r, w)
993→ end else begin
994→ let r, w = Unix.pipe ~cloexec:true () in
995→ ctrl_write_fds := w :: !ctrl_write_fds;
996→ (r, w)
997→ end
998→ in
999→
1000→ (* === Mux forwarder: data input ===
1001→ Read from the parent's data channel, wrap as input envelopes.
1002→ After data EOF, close the agent's ctrl write-fd. The mux
1003→ mutex guarantees the ctrl_in EOF envelope arrives at the child
1004→ AFTER all data messages, preventing the stop-before-data race
1005→ where release_ctrl's stop signal could preempt unprocessed
1006→ input. *)
1007→ let t = Thread.create (fun () ->
1008→ let ic = Unix.in_channel_of_descr child_stdin in
1009→ (try while true do
1010→ let line = input_line ic in
1011→ mux_envelope "input" line
1012→ done with End_of_file | Sys_error _ -> ());
1013→ close_in_noerr ic;
1014→ (try mux_eof "input" with Sys_error _ -> ());
1015→ (* For has_control agents, send explicit stop before closing.
1016→ Closing my_ctrl_fd alone won't trigger ctrl EOF because other
1017→ processes (compact observer, cool map) hold the ctrl channel's
1018→ original write-fd. The stop propagates through the ctrl mux
1019→ and mux mutex, arriving after the data EOF envelope.
1020→ For has_control=false agents, closing my_ctrl_fd is the only
1021→ write reference — the ctrl mux gets EOF and the agent exits.
1022→ Note: my_ctrl_fd is the same fd in ctrl_write_fds. If
1023→ release_ctrl runs later (in shutdown), it gets EBADF on this
1024→ fd — already swallowed. Double stop is harmless (idempotent). *)
1025→ if has_control then
1026→ (try send_control my_ctrl_fd (`Assoc [("stop", `Bool true)])
1027→ with Unix.Unix_error _ | Sys_error _ -> ());
1028→ (try Unix.close my_ctrl_fd with Unix.Unix_error _ -> ());
1029→ mux_source_done ()
1030→ ) () in
1031→ threads := t :: !threads;
1032→ let t = Thread.create (fun () ->
1033→ let ic = Unix.in_channel_of_descr ctrl_mux_src in
1034→ (try while true do
1035→ let line = input_line ic in
1036→ mux_envelope "ctrl_in" line
1037→ done with End_of_file | Sys_error _ -> ());
1038→ close_in_noerr ic;
1039→ (try mux_eof "ctrl_in" with Sys_error _ -> ());Now I see the key difference. In the old Exec:
- has_control=true: ctrl_mux_src is the read-end of the ctrl channel's pipe. Since the channel has a declared write-end (w_orig), the ctrl mux thread reads from it. When my_ctrl_fd is closed (after data EOF), the ctrl mux thread gets EOF.
- has_control=false: ctrl_mux_src is a fresh pipe. The only write-end is my_ctrl_fd. When data EOF triggers Unix.close my_ctrl_fd, the ctrl mux thread gets EOF immediately.
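The pipe rule both cases rely on — the reader sees EOF only once every write-end, including dup'd copies, is closed — can be demonstrated with a stdlib-only sketch (illustrative, not project code):

```ocaml
(* Stdlib-only sketch (not project code) of the Unix-pipe EOF rule the
   old executor relies on: a reader sees EOF only after ALL write-ends,
   including dup'd copies like my_ctrl_fd, have been closed. *)
let () =
  let r, w = Unix.pipe () in
  let my_ctrl_fd = Unix.dup w in      (* analogous to the dup'd ctrl fd *)
  Unix.close w;                       (* original write-end closed... *)
  ignore (Unix.write_substring my_ctrl_fd "x" 0 1);
  let buf = Bytes.create 16 in
  assert (Unix.read r buf 0 16 = 1);  (* ...but the dup keeps the pipe open *)
  Unix.close my_ctrl_fd;              (* last write-end closed *)
  assert (Unix.read r buf 0 16 = 0);  (* read returns 0 bytes = EOF *)
  print_endline "eof-after-last-close"
```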
In the fabric/bridge, the equivalent doesn't happen. The bridge's data_in mux fibre sends ctrl stop directly to the child's stdin pipe (bypassing the ZMQ ctrl_in socket), but the ctrl_in mux fibre is still blocked on T.recv ctrl_in_sock. There's no mechanism to unblock it.
Now I have the full picture. Let me also check how the old Exec handles wiring_telemetry_source.plumb — does it even run the agent or does it fail before that?
cd /Users/ww/src/plumbing && nix develop --command sh -c 'cd _build/default/test && timeout 10 ./test_plumb.exe test "agent control port" -v' 2>&1 | tail -20
[SKIP] _parse_json 2 empty stdin.
[SKIP] _parse_json 3 wrong type rejected.
[SKIP] json algebraic laws 0 format ; parse = id.
[SKIP] json algebraic laws 1 parse ; format ≅ id.
┌──────────────────────────────────────────────────────────────────────────────┐
│ [FAIL] agent control port 5 wiring telemetry as c...                         │
└──────────────────────────────────────────────────────────────────────────────┘
FAIL exit 0
Expected: `0'
Received: `2'
Raised at Alcotest_engine__Test.check in file "src/alcotest-engine/test.ml", lines 216-226, characters 4-19
Called from Alcotest_engine__Core.Make.protect_test.(fun) in file "src/alcotest-engine/core.ml", line 186, characters 17-23
Called from Alcotest_engine__Monad.Identity.catch in file "src/alcotest-engine/monad.ml", line 24, characters 31-35
──────────────────────────────────────────────────────────────────────────────
1 failure! in 0.095s. 6 tests run.
Key finding: On main, wiring_telemetry_source ALREADY fails (exit 2 instead of 0). It's a pre-existing test failure. The other agent tests (control_accepted, control_mismatch, control_arity, control+telemetry, wiring_control) all pass on main in 0.095s.
Now let me check which test specifically hangs in my branch. The first agent tests would be at agent_control_accepted (expects exit 0 with empty stdin). In the old exec, the agent exits quickly because its input pipe closes. In the fabric, it should also work — unless the unconnected ctrl_in channel causes the hang.
Let me now launch the design agents with full context.
{
"subagent_type": "Plan",
"prompt": "## Task\n\nDesign a fix for the bridge mux hang when agent subprocesses have unconnected ctrl_in channels in the ZMQ fabric.\n\n## Context\n\nWe're switching the plumb binary from `Exec.execute` (pipe-based) to `Fabric.execute` (ZMQ-based) with `Bridge.make_handler`. The switch is mostly complete (bin/plumb/main.ml rewritten, inline morphism EOF propagation added, signal handling fixed). But agent-based test cases hang.\n\n## Root Cause\n\nThe bridge in `lib/fabric/bridge.ml` creates mux fibres for data_in and ctrl_in ports. It counts mux sources:\n```ocaml\nlet n_sources = 2 + (if has_tools && tool_dispatch_fn <> None then 1 else 0) in\n```\n\nThe \"2\" assumes both data_in and ctrl_in exist. Each mux fibre calls `mux_source_done()` when it exits, which decrements the counter. When the counter hits 0, child stdin is closed.\n\nThe problem: when `has_control = true`, the routing table creates a ctrl_in PULL socket for the agent. But if the ctrl_in channel has no upstream writer (e.g., `agent_control.plumb` declares `let ch_ctrl : !Control = channel` but nothing writes to it), the ctrl_in PULL socket never receives any data or EOF. The ctrl_in mux fibre blocks on `T.recv ctrl_in_sock` forever, `mux_source_done()` never fires for it, and child stdin never closes.\n\n### How the old Exec solved this\n\nIn `lib/plumb/exec.ml` (lines 986-998), the ctrl_in pipe uses a shared write-end (`my_ctrl_fd`). After data EOF, the data mux thread:\n1. Sends ctrl stop via `send_control my_ctrl_fd`\n2. Closes `my_ctrl_fd`\n\nWhen `has_control=false`, `my_ctrl_fd` is the ONLY write reference to the ctrl pipe, so closing it triggers EOF in the ctrl mux thread. When `has_control=true`, the ctrl channel's declared write-end is shared with other processes, but `my_ctrl_fd` is a `dup`'d copy that the data mux thread closes. 
However, the ctrl_in channel also has a declared write-end that other processes hold — the ctrl mux thread only gets EOF when ALL write-ends are closed.\n\nThe key: in the old Exec, ctrl_in is a Unix pipe. Multiple processes can hold write-ends. When all write-ends close, the reader gets EOF. The data mux thread handles the lifecycle explicitly.\n\n### Bridge mux mechanism\n\nIn the bridge (`lib/fabric/bridge.ml`), the ctrl_in mux fibre reads from a ZMQ PULL socket. The data_in mux fibre, after receiving EOF, writes ctrl stop directly to the child's stdin pipe (bypassing ZMQ):\n\n```ocaml\nif has_control then\n Eio_unix.run_in_systhread (fun () ->\n try\n let s = \"{\\\"__port\\\":\\\"ctrl_in\\\",\\\"msg\\\":{\\\"stop\\\":true}}\\n\" in\n ...\n```\n\nBut it doesn't send anything on the ZMQ ctrl_in socket. The ctrl_in mux fibre has no way to know that data has ended — it just blocks forever.\n\n## Constraints\n\n- The fix must work for agents with and without control channels\n- The fix must work for agents where ctrl_in HAS upstream writers (future: session protocols write to ctrl channels)\n- The bridge is in `lib/fabric/` which doesn't depend on `lib/plumb/` — keep the dependency boundary clean\n- EOF propagation through ZMQ must follow the fabric's convention (empty payload = EOF sentinel)\n- The existing tests on main must still pass\n\n## Design Documents\n\nFrom operational semantics (plumbing-operational-semantics.md):\n- \"Input closes → output closes\" for agents\n- Control channels are for cooperative termination\n- Data EOF should trigger orderly shutdown\n\nFrom the pipeline language design:\n- The bridge is a natural transformation between ZMQ topic-framed sockets and the agent's stdio protocol\n- The bridge is NOT a morphism — it has no categorical status\n\n## Files\n\n- `lib/fabric/bridge.ml` — bridge mux/demux, the code that needs fixing\n- `lib/fabric/fabric.ml` — fabric execute, socket creation\n- `lib/runtime/routing.ml` — routing table 
compiler\n- `lib/plumb/exec.ml` — old executor (reference for how it handles ctrl_in)\n\n## What I need\n\nA design for fixing the bridge so that:\n1. When the ctrl_in PULL socket has no upstream writer, the ctrl_in mux fibre doesn't block forever\n2. When the data_in mux fibre receives EOF, the ctrl_in mux fibre is notified to exit\n3. The mux_source_done counter correctly reflects the actual number of active mux sources\n\nConsider:\n- Should `n_sources` be dynamic based on whether sockets actually exist?\n- Should the data_in EOF handler send EOF on the ctrl_in socket?\n- Should the bridge skip the ctrl_in mux fibre when no ctrl_in socket exists?\n- How does the current code at lines 285-313 handle missing sockets? (It calls mux_source_done() if find_pull \"ctrl_in\" returns None)\n"
}
This is very informative. The `recv_raw` function uses `Eio_unix.await_readable`, which means it is **Eio-cancellable** — if you cancel the fibre, the `await_readable` call will raise `Eio.Cancel.Cancelled`. This is critical for the design.
Now I have enough information to design the fix. Let me verify one more thing — how fabric.ml passes sockets to the bridge, specifically whether ctrl_in PULL sockets are always created when routes exist.
The picture is clear. Here is the design.
---
## Design: Fix Bridge Mux Hang on Unconnected ctrl_in
### Problem Summary
When an agent has `has_control = true`, the routing table creates a `ctrl_in` route and the fabric creates a PULL socket for it. The bridge starts a ctrl_in mux fibre that blocks on `T.recv ctrl_in_sock`. If the ctrl_in channel has no upstream writer (declared but unconnected), this recv blocks forever. The `mux_source_done()` counter never reaches zero, child stdin never closes, and the pipeline hangs.
### Root Cause Analysis
The bridge counts `n_sources = 2` (data_in + ctrl_in) unconditionally, but the ctrl_in mux fibre can only terminate via:
1. Receiving an EOF sentinel on the ZMQ socket (requires an upstream writer to send one), or
2. A `Transport_error` (e.g. socket closed during shutdown).
Neither happens when ctrl_in has no writer. The data_in EOF handler (lines 274-283) writes a ctrl stop directly to child stdin via the pipe, but does not notify the ctrl_in mux fibre to exit.
### Design: Eio fibre cancellation via Promise
The cleanest fix uses Eio's cooperative cancellation. The data_in mux fibre, upon receiving data EOF, should cancel the ctrl_in mux fibre. This is idiomatic Eio and does not require injecting messages into ZMQ sockets (which would be architecturally questionable — the bridge should not be a writer on its own input sockets).
#### Mechanism
1. **Create an `Eio.Cancel.t` sub-context** (or use `Eio.Fiber.fork_sub` with a sub-switch) for the ctrl_in mux fibre. The data_in mux fibre holds a reference to cancel it.
2. **On data EOF**, after writing the ctrl stop to child stdin, the data_in mux fibre cancels the ctrl_in fibre's context. The `T.recv` call inside the ctrl_in fibre uses `Eio_unix.await_readable`, which is Eio-cancellable — it will raise `Eio.Cancel.Cancelled`, causing the fibre to exit cleanly.
3. **The ctrl_in fibre's cleanup path** calls `mux_source_done()` whether it exits via EOF sentinel, cancellation, or transport error.
However, `Eio.Cancel` contexts are not directly exposed as a simple "cancel this fibre" API. The simpler and more idiomatic approach is:
#### Recommended approach: Eio.Promise as a shutdown signal
Use an `Eio.Promise.t` as a shutdown latch. The ctrl_in mux fibre polls it alongside recv. But `T.recv` is blocking — we cannot easily select between a promise and a recv.
#### Best approach: fork_sub with Eio.Cancel
Actually, looking at `recv_raw` more carefully: it calls `Eio_unix.await_readable fd`, which **is cancellable by Eio's fibre cancellation mechanism**. So the right approach is:
1. Fork the ctrl_in mux fibre inside `Eio.Fiber.fork_sub ~sw:ctrl_sw` where `ctrl_sw` is a sub-switch.
2. When data_in EOF fires, call `Eio.Switch.fail ctrl_sw (Exit)` to cancel all fibres under that switch. The `await_readable` in `T.recv` will raise, the ctrl_in fibre exits, `mux_source_done()` fires.
3. If ctrl_in receives a legitimate EOF sentinel from an upstream writer, it exits normally.
Wait — `Eio.Fiber.fork_sub` creates a nested switch, but the sub-switch's failure propagates to the parent. We need the ctrl_in cancellation to be clean, not an error.
#### Simplest correct approach: Eio.Promise + non-blocking recv loop
Replace the blocking `T.recv` in the ctrl_in mux fibre with a polling loop that checks a shutdown promise:
```ocaml
let ctrl_shutdown, resolve_ctrl_shutdown = Eio.Promise.create () in
```
The ctrl_in mux fibre:
```ocaml
(try
while true do
(* Use Eio.Fiber.first to race between shutdown and recv *)
let result = Eio.Fiber.first
(fun () -> `Shutdown (Eio.Promise.await ctrl_shutdown))
(fun () -> `Msg (T.recv ctrl_in_sock))
in
match result with
| `Shutdown () -> raise Exit
| `Msg (_topic, payload) ->
if is_eof payload then begin ... raise Exit end;
...
done
with Exit | Transport.Transport_error _ -> ());
```
The data_in mux fibre, after data EOF:
```ocaml
Eio.Promise.resolve resolve_ctrl_shutdown ();
```
**This is the cleanest approach** because:
- `Eio.Fiber.first` runs two fibres concurrently and returns the first to complete, cancelling the other. When the promise resolves, `Fiber.first` cancels the `T.recv` call (which is cancellable via `await_readable`).
- No ZMQ protocol changes needed.
- No dependency boundary violations.
- When ctrl_in HAS a legitimate upstream writer that sends EOF, the `T.recv` path wins the race naturally.
- The shutdown promise is only resolved on data EOF — it doesn't interfere with normal ctrl_in operation.
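As a sanity check of the race mechanism in isolation — a minimal sketch assuming the `eio` and `eio_main` opam packages, with `Eio.Fiber.await_cancel` standing in for the blocking `T.recv` (names here are illustrative, not bridge code):

```ocaml
(* Minimal sketch (assumes eio + eio_main; await_cancel stands in for a
   recv that never completes on its own). Resolving the promise makes
   Fiber.first cancel the blocked branch, so the "ctrl_in" side exits
   instead of hanging. *)
let () =
  Eio_main.run @@ fun _env ->
  let shutdown, resolve_shutdown = Eio.Promise.create () in
  let outcome = ref `Pending in
  Eio.Fiber.both
    (fun () ->
       (* ctrl_in stand-in: race the shutdown latch against the recv *)
       outcome :=
         Eio.Fiber.first
           (fun () -> Eio.Promise.await shutdown; `Shutdown)
           (fun () -> Eio.Fiber.await_cancel ()))
    (fun () ->
       (* data_in stand-in: signal shutdown after "data EOF" *)
       Eio.Promise.resolve resolve_shutdown ());
  assert (!outcome = `Shutdown)
```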
### Detailed Implementation Plan
#### Step 1: Add a ctrl shutdown promise (bridge.ml, ~line 220)
After the `mux_source_done` declaration, add:
```ocaml
let ctrl_shutdown, resolve_ctrl_shutdown = Eio.Promise.create () in
```
#### Step 2: Modify the ctrl_in mux fibre (bridge.ml, lines 289-313)
Replace the `Some ctrl_in_sock` branch:
```ocaml
| Some ctrl_in_sock ->
Eio.Fiber.fork_daemon ~sw (fun () ->
(try
let rec loop () =
let result = Eio.Fiber.first
(fun () -> Eio.Promise.await ctrl_shutdown; `Shutdown)
(fun () -> `Msg (T.recv ctrl_in_sock))
in
match result with
| `Shutdown -> raise Exit
| `Msg (_topic, payload) ->
if is_eof payload then begin
Eio_unix.run_in_systhread (fun () ->
(try write_eof mux_m child_in_oc "ctrl_in"
with Sys_error _ -> ()));
raise Exit
end;
Eio_unix.run_in_systhread (fun () ->
try write_envelope mux_m child_in_oc "ctrl_in" payload
with Sys_error _ -> raise Exit);
loop ()
in
loop ()
with
| Exit -> ()
| Transport.Transport_error (_, _msg) -> ());
Eio_unix.run_in_systhread (fun () -> mux_source_done ());
`Stop_daemon)
```
#### Step 3: Resolve the promise on data EOF (bridge.ml, ~line 274)
After the data_in mux fibre sends the ctrl stop to child stdin (line 282), add:
```ocaml
Eio.Promise.resolve resolve_ctrl_shutdown ();
```
This goes after the ctrl stop write (line 282) and before `mux_source_done()` (line 283). The ordering matters: the ctrl stop must be written to child stdin before the ctrl_in fibre is cancelled, so the agent sees the stop message.
#### Step 4: No change to n_sources
The `n_sources = 2` count remains correct. When `find_pull "ctrl_in"` returns `Some`, we still have 2 mux sources (data_in + ctrl_in). The promise-based cancellation ensures the ctrl_in source terminates promptly after data EOF, rather than blocking forever.
When `find_pull "ctrl_in"` returns `None`, the existing code at line 312-313 already calls `mux_source_done()` immediately, which is correct.
### Edge Cases
1. **ctrl_in has a legitimate upstream writer**: The writer sends messages normally, eventually sends EOF. The `T.recv` path wins the `Fiber.first` race for each message. When the writer sends EOF sentinel, the ctrl_in fibre exits via the normal `is_eof` path. If data EOF also fires, the promise resolution races with the recv — either way the fibre exits cleanly.
2. **Data EOF arrives before any ctrl_in messages**: The promise resolves, `Fiber.first` cancels the `T.recv`, ctrl_in fibre exits. The ctrl stop was already written to child stdin. Correct.
3. **ctrl_in has upstream writer AND data EOF arrives**: The ctrl stop is written to child stdin. The promise resolves. If the ctrl_in fibre is mid-recv, it gets cancelled. The `mux_source_done()` fires. This is correct — the agent already received the stop signal via the pipe.
4. **No data_in socket** (line 285-287): The `mux_source_done()` is called but the promise is never resolved. This is fine — if there's no data, the ctrl_in fibre still blocks, but that is the correct behaviour (ctrl_in is the only input). If ctrl_in also has no writer, both `mux_source_done()` calls fire immediately (both sockets are `None`), and child stdin closes.
5. **Tool dispatch fibre**: Unaffected. It has its own lifecycle via pipe EOF.
### Testing
- The existing agent-based test cases that currently hang should pass after this fix.
- Add a test case for an agent with `has_control = true` but no ctrl_in writer — verify it terminates.
- Add a test case for an agent with `has_control = true` AND a ctrl_in writer — verify ctrl messages are delivered before shutdown.
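For the hang-detection side of these tests, a hedged sketch of a timeout wrapper (stdlib-only, relying on coreutils `timeout`; the project's actual `run_plumb_timeout` helper may differ):

```ocaml
(* Hypothetical stdlib-only sketch — NOT the project's run_plumb_timeout.
   Wrap a command in coreutils `timeout` so a hung pipeline fails the
   test with exit code 124 instead of wedging the whole suite. *)
let run_with_timeout ~seconds cmd =
  Sys.command (Printf.sprintf "timeout %d %s" seconds cmd)

let () =
  assert (run_with_timeout ~seconds:5 "true" = 0);        (* fast exit ok *)
  assert (run_with_timeout ~seconds:1 "sleep 5" = 124)    (* hang detected *)
```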
### Critical Files for Implementation
- `/Users/ww/src/plumbing/lib/fabric/bridge.ml` — The file to modify: add promise, modify ctrl_in fibre, resolve promise on data EOF
- `/Users/ww/src/plumbing/lib/transport/transport_zmq.ml` — Reference: confirms `recv_raw` uses `Eio_unix.await_readable` which is Eio-cancellable, making `Fiber.first` safe
- `/Users/ww/src/plumbing/lib/plumb/exec.ml` — Reference: lines 986-1041 show the old executor's ctrl_in lifecycle pattern (dup'd fd, explicit close, send_control)
- `/Users/ww/src/plumbing/lib/runtime/routing.ml` — Reference: lines 403-434 show how ctrl_in routes are created for agents (only when `n >= 4` channels)
- /Users/ww/src/plumbing/lib/fabric/fabric.ml — Reference: lines 195-224 show how sockets are collected and passed to the bridge handler
Excellent — the Plan agent produced a thorough design. Now let me send this to the review agents. Let me first write the draft plan, then launch reviews.
# PLMB-037 Step 5b: Fix Bridge Mux Hang + Complete Switchover
## Context
Steps 0–4 and 5a are merged to main. The plumb binary switchover
from `Exec.execute` to `Fabric.execute` + `Bridge.make_handler` is
in progress on branch `plmb-037-step5b` (worktree at
`/tmp/plmb-037-step5b`). The basic switchover is implemented:
- `bin/plumb/main.ml` rewritten to use Fabric
- `bin/plumb/dune` updated with new deps
- `lib/fabric/bridge.ml` updated (M1 fork safety, M2 dispatch
thread join, M3 MCP temp file cleanup, expanded tool_dispatch type)
- `lib/fabric/inline.ml` updated with EOF propagation
- `lib/fabric/fabric.ml` added `cleanup` function (no sleep)
- `lib/plumb/plumb.ml` re-exports `Prefix_path`, `Resource_path`,
`Tool_dispatch`
- Signal handling: `Sys.set_signal` instead of `Sys.catch_break`
**What works:** All inline-only pipeline tests pass (identity, copy,
merge, barrier, filter, map, wiring, tensor, format/parse JSON).
**What hangs:** Agent-based pipeline tests. Root cause identified.
## Root Cause: Bridge ctrl_in Mux Fibre Blocks Forever
When an agent has `has_control = true` (product input type like
`(!Query, !Control) → !Answer`), the routing table creates a
ctrl_in PULL socket. The bridge starts a ctrl_in mux fibre that
calls `T.recv ctrl_in_sock`. If the ctrl_in channel has no upstream
writer (declared but nothing writes to it), recv blocks forever.
`mux_source_done()` never fires for the ctrl_in source, child stdin
never closes, the agent never exits.
### How the old Exec solved this
In `lib/plumb/exec.ml` (lines 986–1028), the ctrl pipe uses a
shared write-end (`my_ctrl_fd`). After data EOF, the data mux thread
closes `my_ctrl_fd`, which triggers EOF in the ctrl mux thread (Unix
pipe semantics: reader gets EOF when all write-ends close). The old
exec explicitly manages this lifecycle.
### Why the bridge doesn't have this
The bridge's data_in mux fibre sends ctrl stop directly to the
child's stdin pipe (bypassing the ZMQ ctrl_in socket). The ctrl_in
mux fibre reads from ZMQ, not a pipe. Nothing sends EOF on the ZMQ
ctrl_in socket, so the fibre blocks forever.
## Fix: Eio.Promise as Shutdown Signal
Use an `Eio.Promise.t` to signal the ctrl_in mux fibre to exit when
data EOF arrives. The ctrl_in mux fibre races `T.recv` against
`Promise.await` using `Eio.Fiber.first`. This is safe because
`T.recv` uses `Eio_unix.await_readable` internally (confirmed in
`lib/transport/transport_zmq.ml`), which is Eio-cancellable.
### Design
From the operational semantics:
> **agent:** Input closes → agent produces final output → output
> closes.
Data EOF is the trigger for agent shutdown. The bridge already sends
ctrl stop to the child after data EOF (R4). The ctrl_in mux fibre
should exit at the same time — it has no more work to do.
The bridge is a natural transformation (not a morphism). Its internal
coordination mechanism (Eio promise) is private — it doesn't affect
the ZMQ protocol or the child's stdio protocol.
### Implementation
**File:** `lib/fabric/bridge.ml`
1. **Add shutdown promise** (after `mux_source_done` declaration):
```ocaml
let ctrl_shutdown, resolve_ctrl_shutdown = Eio.Promise.create () in
```
2. **Modify ctrl_in mux fibre** (lines 289–313): Race `T.recv`
against `Promise.await ctrl_shutdown` using `Eio.Fiber.first`:
```ocaml
| Some ctrl_in_sock ->
Eio.Fiber.fork_daemon ~sw (fun () ->
(try while true do
match Eio.Fiber.first
(fun () -> Eio.Promise.await ctrl_shutdown; `Shutdown)
(fun () -> `Msg (T.recv ctrl_in_sock))
with
| `Shutdown -> raise Exit
| `Msg (_topic, payload) ->
if is_eof payload then begin
Eio_unix.run_in_systhread (fun () ->
(try write_eof mux_m child_in_oc "ctrl_in"
with Sys_error _ -> ()));
raise Exit
end;
Eio_unix.run_in_systhread (fun () ->
try write_envelope mux_m child_in_oc "ctrl_in" payload
with Sys_error _ -> raise Exit)
done with Exit | Transport.Transport_error _ -> ());
Eio_unix.run_in_systhread (fun () -> mux_source_done ());
`Stop_daemon)
```
3. **Resolve promise on data EOF** (after ctrl stop write, ~line
282, before `mux_source_done()`):
```ocaml
Eio.Promise.resolve resolve_ctrl_shutdown ();
```
### Edge Cases
- **ctrl_in has upstream writer**: Normal recv wins the race for
each message. EOF sentinel exits normally. Promise resolution is
redundant but harmless.
- **Data EOF before any ctrl_in messages**: Promise resolves,
`Fiber.first` cancels recv, fibre exits. Correct — agent already
received stop via pipe.
- **No data_in socket**: Promise never resolves. If ctrl_in also
has no socket, both `mux_source_done()` fire immediately (existing
code). If ctrl_in HAS a socket with a writer, the ctrl_in fibre
runs normally.
- **No ctrl_in socket**: Existing code calls `mux_source_done()`
immediately. Promise is created but never used. No change needed.
### No change to n_sources
`n_sources = 2` remains correct. The promise-based cancellation
ensures the ctrl_in source terminates promptly after data EOF.
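The unchanged counting scheme itself is small enough to sketch (illustrative names, not bridge code): each mux source calls its done-callback exactly once, and the last one to finish closes child stdin.

```ocaml
(* Illustrative sketch of the mux-source counting scheme — names are
   hypothetical, not bridge code. Each source calls done_ exactly once;
   the last source to finish closes child stdin. *)
let make_source_done ~n_sources ~close_child_stdin =
  let remaining = Atomic.make n_sources in
  fun () ->
    (* fetch_and_add returns the OLD value: 1 means we were the last *)
    if Atomic.fetch_and_add remaining (-1) = 1 then close_child_stdin ()

let () =
  let closed = ref false in
  let done_ =
    make_source_done ~n_sources:2
      ~close_child_stdin:(fun () -> closed := true)
  in
  done_ ();              (* data_in mux finished *)
  assert (not !closed);  (* ctrl_in still counted *)
  done_ ();              (* ctrl_in mux finished (via the promise) *)
  assert !closed         (* child stdin now closed *)
```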
## Pre-existing Test Failure
`wiring_telemetry_source.plumb` fails on main (exit 2, not 0)
because `use js` can't find the js module. This is a pre-existing
issue — not caused by this switchover. The test expects exit 0 but
gets exit 2. Will investigate separately.
## Files to Modify
| File | Change |
|------|--------|
| `lib/fabric/bridge.ml` | Add Eio.Promise for ctrl_in shutdown |
All other changes from the initial implementation are already done
in the worktree.
## Verification
1. `dune build --root=.` — compiles
2. `dune test --root=.` — all tests pass (expect same pre-existing
failure on `wiring_telemetry_source` as main)
3. Manual: `echo '{"name":"Alice","age":30}' | ./plumb test/fixtures/identity.plumb` → echo
4. Manual: `echo '' | ./plumb test/fixtures/agent_control.plumb` → exit 0 (no hang)
5. Full test suite completes in ≤ 20s (no 1.5s shutdown sleep)
6. Ctrl-C during a pipeline exits cleanly
The file /Users/ww/.claude/plans/abundant-cuddling-oasis.md has been updated successfully.
{
"subagent_type": "theory-critic",
"prompt": "Review this design for algebraic/categorical coherence.\n\n## Context\n\nThe plumbing language describes a monoidal (CD) category. Morphisms are processes. The bridge is explicitly NOT a morphism — it is a natural transformation between two concrete stream representations (ZMQ topic-framed sockets and the agent binary's port-multiplexed stdin/stdout protocol). The bridge module docstring says: \"R7: the bridge is NOT a morphism, it is a natural transformation between two concrete stream representations.\"\n\n## The Fix\n\nWhen an agent has product input `(!A, !Control)`, the agent's ports include `data_in` and `ctrl_in`. The bridge creates mux fibres that read from ZMQ PULL sockets and write to the child's stdin pipe. When the ctrl_in PULL socket has no upstream writer, the ctrl_in mux fibre blocks forever.\n\nThe fix: use an `Eio.Promise` as a shutdown signal. The data_in mux fibre, after receiving data EOF, resolves the promise. The ctrl_in mux fibre races `T.recv` against `Promise.await` using `Eio.Fiber.first`. When the promise resolves, the recv is cancelled and the fibre exits.\n\n## Questions for Review\n\n1. Is it categorically sound for the bridge (as natural transformation) to have this internal coordination? The promise is purely internal — it doesn't affect the ZMQ protocol or the child's stdio protocol.\n\n2. The operational semantics say \"Input closes → agent produces final output → output closes.\" Data EOF IS the input close event. Control is a secondary input. Is it correct to terminate the ctrl_in listener on data EOF? The agent has already received a ctrl stop message via the pipe before the promise fires.\n\n3. The `Product [Stream A; Stream Control]` type on agent input means the agent receives both data and control messages. In the categorical model, this product type has two components — each with independent lifecycle. 
Does it violate the product semantics to shut down one component (ctrl_in) based on EOF from the other (data_in)?\n\n4. In the operational semantics, channel close propagation for merge says \"Output closes only when ALL inputs have closed.\" But the bridge's mux is not a merge — it's a natural transformation. The bridge's child stdin pipe is the codomain of the natural transformation, not a categorical merge. Is the mux-source-counting scheme (close child stdin when all sources are done) the right structural choice?\n\nPlease critique the design from a categorical/algebraic perspective. Focus on whether the internal coordination mechanism is principled or ad hoc."
}
{
"subagent_type": "architect",
"prompt": "Review the bridge mux hang fix for architectural coherence.\n\n## System Context\n\nThe plumbing system has these layers:\n- `lib/language/` — AST, parser, type system\n- `lib/runtime/` — routing table compiler, validation\n- `lib/fabric/` — ZMQ-based execution (inline morphisms, bridge for subprocesses)\n- `lib/plumb/` — old pipe-based executor (being replaced), tool dispatch, stdlib paths\n- `lib/spawn/` — subprocess spawning, spec synthesis\n- `bin/plumb/` — CLI binary (being switched from Exec to Fabric)\n\nKey dependency: fabric depends on spawn_lib, runtime, language, transport. It does NOT depend on plumb. The bridge is in fabric.\n\n## The Bug\n\nThe bridge in `lib/fabric/bridge.ml` creates mux fibres for agent data_in and ctrl_in ports. When ctrl_in has no upstream ZMQ writer, the ctrl_in mux fibre blocks forever on `T.recv`. The `mux_source_done()` counter never reaches 0, child stdin never closes, the agent never exits.\n\n## The Fix\n\nUse an `Eio.Promise.t` as a shutdown signal. The data_in mux fibre resolves the promise after data EOF. The ctrl_in mux fibre races `T.recv` against `Promise.await` using `Eio.Fiber.first`.\n\n## Questions\n\n1. **Boundary concern:** The bridge already sends ctrl stop directly to child stdin after data EOF (bypassing ZMQ). The promise adds another internal coordination mechanism. Is the bridge accumulating too much internal state? (It already has: mux_m mutex, mux_remaining counter, tool_req_oc ref, tool_resp_r ref, dispatch_thread ref, mcp_temp_path ref, plus the new promise.)\n\n2. **Alternative: Fix at the routing layer.** Instead of fixing the bridge, could the routing table compiler ensure that ctrl_in channels always have a writer? E.g., the compiler could emit a special \"null source\" process that immediately sends EOF on unconnected ctrl_in channels. This would make the bridge simpler (no promise needed) but adds complexity to the routing layer.\n\n3. 
**Alternative: Fix at the fabric layer.** When the fabric detects a PULL socket with no corresponding PUSH socket (no route writes to it), it could skip creating the socket entirely. Then `find_pull \"ctrl_in\"` returns None, and the bridge calls `mux_source_done()` immediately.\n\n4. **Eio.Fiber.first concern:** `Fiber.first` runs two concurrent fibres. When one completes, the other is cancelled. The cancellation of `T.recv` requires the underlying `Eio_unix.await_readable` to handle cancellation gracefully. Is this well-tested in the transport layer?\n\n5. **Lifecycle ordering:** The data_in EOF handler first writes ctrl stop to child stdin, THEN resolves the promise. This means the ctrl_in fibre might be processing a message when it gets cancelled. Is there a race where a ctrl_in message is partially written to child stdin?\n\nPlease evaluate the design against the system's architectural principles and suggest any concerns or alternatives."
}
{
"subagent_type": "engineer",
"prompt": "Review the bridge mux hang fix for implementation correctness.\n\n## The Bug\n\nIn `lib/fabric/bridge.ml`, the bridge creates mux fibres for agent subprocess ports. When the ctrl_in PULL socket has no upstream writer (common case: agent with product input where ctrl channel is declared but nothing writes to it), the ctrl_in mux fibre blocks forever on `T.recv ctrl_in_sock`. This prevents `mux_source_done()` from firing, which prevents child stdin from closing, which prevents the agent from exiting.\n\n## Current Code (bridge.ml, lines 289-313)\n\n```ocaml\n(* === Mux: control input (ZMQ PULL → child stdin) === *)\n(match find_pull \"ctrl_in\" with\n | Some ctrl_in_sock ->\n Eio.Fiber.fork_daemon ~sw (fun () ->\n (try\n while true do\n let (_topic, payload) = T.recv ctrl_in_sock in\n if is_eof payload then begin\n Eio_unix.run_in_systhread (fun () ->\n (try write_eof mux_m child_in_oc \"ctrl_in\"\n with Sys_error _ -> ()));\n raise Exit\n end;\n Eio_unix.run_in_systhread (fun () ->\n try write_envelope mux_m child_in_oc \"ctrl_in\" payload\n with Sys_error _ -> raise Exit)\n done\n with\n | Exit -> ()\n | Transport.Transport_error (_, _msg) -> ());\n Eio_unix.run_in_systhread (fun () -> mux_source_done ());\n `Stop_daemon)\n | None ->\n (* No control socket — mux source done immediately *)\n mux_source_done ());\n```\n\n## The Fix\n\nAdd `Eio.Promise.t` for shutdown coordination:\n\n```ocaml\nlet ctrl_shutdown, resolve_ctrl_shutdown = Eio.Promise.create () in\n```\n\nModify ctrl_in mux fibre to race recv against promise:\n\n```ocaml\n| Some ctrl_in_sock ->\n Eio.Fiber.fork_daemon ~sw (fun () ->\n (try while true do\n match Eio.Fiber.first\n (fun () -> Eio.Promise.await ctrl_shutdown; `Shutdown)\n (fun () -> `Msg (T.recv ctrl_in_sock))\n with\n | `Shutdown -> raise Exit\n | `Msg (_topic, payload) ->\n if is_eof payload then begin ... 
raise Exit end;\n ...\n done with Exit | Transport.Transport_error _ -> ());\n Eio_unix.run_in_systhread (fun () -> mux_source_done ());\n `Stop_daemon)\n```\n\nData_in EOF handler resolves promise (after ctrl stop write):\n\n```ocaml\nEio.Promise.resolve resolve_ctrl_shutdown ();\n```\n\n## Questions\n\n1. Does `Eio.Fiber.first` correctly cancel the `T.recv` call when the promise resolves? `T.recv` calls `recv_raw` which uses `Eio_unix.await_readable`. Is `await_readable` cancelled cleanly by Eio's cancellation mechanism?\n\n2. Race condition: what if the ctrl_in fibre is in the middle of `Eio_unix.run_in_systhread (write_envelope ...)` when the promise fires? `run_in_systhread` is not cancellable once the system thread is running. Does `Fiber.first` wait for the current operation to complete before cancelling?\n\n3. The promise is created unconditionally but only used when `find_pull \"ctrl_in\"` returns `Some`. Is there a cost to creating an unused promise?\n\n4. What happens if `resolve_ctrl_shutdown` is called twice? (E.g., data_in EOF fires, resolves promise, then cancel() closes sockets, causing transport error in data_in fibre which tries to resolve again.) Is `Eio.Promise.resolve` idempotent?\n\n5. Memory: does the promise keep references alive that would otherwise be GC'd?\n\n6. The `Eio.Fiber.first` pattern means each recv call spawns two fibres. For high-throughput ctrl_in (many messages), is this a performance concern? Or is ctrl_in typically low-volume (only stop/pause signals)?\n\nPlease focus on correctness, race conditions, and implementation details."
}
Now I understand the full picture. Here is my architectural review.
---
## Architectural Review: Bridge Mux Hang Fix
### The Bug, Precisely
The scenario: an agent's type signature includes a control port (input type is a product `data * ctrl`), so `has_control = true` and the routing compiler emits a `ctrl_in` route. But after channel resolution, the ctrl_in PULL socket has no corresponding PUSH socket connected to it -- nobody writes to that channel. The `T.recv` on that socket blocks forever, `mux_source_done()` never reaches 0, child stdin never closes, the agent hangs.
This is a **structural liveness bug**: the bridge's mux counter protocol assumes every PULL socket will eventually receive an EOF sentinel, but the routing table makes no such guarantee.
### Evaluating the Three Alternatives
**Alternative 1 (the proposed fix): Eio.Promise as shutdown signal in the bridge.**
This is a local, tactical fix. The data_in fibre resolves a promise on EOF; the ctrl_in fibre races `T.recv` against `Promise.await`. It works, but I have concerns:
- **Accumulating internal coordination state.** The bridge already has a mutex, a counter, four refs, and a thread. Adding a promise creates a second shutdown signalling path alongside `mux_source_done()`. Two shutdown mechanisms for what should be one lifecycle event is a design smell. The bridge is becoming a state machine without being modelled as one.
- **Implicit ordering dependency.** Data EOF must happen before ctrl_in can exit. This couples the two mux fibres in a way that isn't expressed in the types or the routing table. If someone later adds a third input port, they need to know about this promise. The coupling is hidden.
- **Race on question 5 is real but benign.** If ctrl_in is mid-write when the promise resolves, `Fiber.first` cancels it. The write is to child stdin via `run_in_systhread` with the mux mutex held. Cancellation of `run_in_systhread` waits for the systhread to finish (Eio semantics), so the write completes atomically. The mutex protects against interleaving with the subsequent ctrl stop write. No partial writes.
- **Question 4 (Fiber.first + T.recv cancellation):** `Fiber.first` cancels the losing fibre by raising `Eio.Cancel.Cancelled` into it. `T.recv` uses `Eio_unix.await_readable`, which is cancellation-aware -- it checks the cancellation context before blocking. This is standard Eio machinery and well-tested. Not a concern.
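The race under discussion has this shape (illustrative sketch; `recv` stands in for the transport's `T.recv`):

```ocaml
(* Sketch: race a blocking recv against a shutdown promise. Fiber.first
   cancels whichever branch loses; cancelling the recv branch relies on the
   underlying suspension (await_readable) being cancellation-aware. *)
let next_event ~shutdown ~(recv : unit -> 'msg) =
  Eio.Fiber.first
    (fun () -> Eio.Promise.await shutdown; `Shutdown)
    (fun () -> `Msg (recv ()))
```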
**Verdict on Alternative 1:** It works. It is the right short-term fix if you need to ship today. But it patches a symptom rather than fixing the structural problem.
**Alternative 2: Routing table compiler emits null sources for unconnected ports.**
This is architecturally cleaner in principle -- the routing layer guarantees that every PULL socket has a writer -- but it violates a boundary I care about. The routing table is a static, declarative wiring specification. It should not contain runtime behaviour ("send EOF then exit"). Introducing synthetic processes that exist solely to satisfy the bridge's internal protocol bleeds the bridge's lifecycle concerns into the compiler. The compiler should not need to know that the bridge counts mux sources.
**Verdict on Alternative 2:** Reject. Wrong layer.
**Alternative 3: Fabric skips socket creation for unconnected ports.**
This is the right fix, and it belongs at the right layer.
The insight: a PULL socket with no PUSH socket connected to it will never receive data. The fabric already knows this -- it creates sockets from routes, and if no route targets `(proc_name, "ctrl_in")`, no PULL socket should exist. Currently, `find_pull "ctrl_in"` returns `None` in this case and the bridge already handles it correctly (line 312: calls `mux_source_done()` immediately).
But the bug scenario is different: a route *does* target ctrl_in (from a channel), but after channel resolution, nobody writes to that channel. The PULL socket exists and is bound, but no PUSH socket connects to it. The fabric could detect this: after Phase 2 (connect), check which PULL sockets have zero connected peers. Strip them from the socket lists passed to the bridge. Then `find_pull "ctrl_in"` returns `None` and the existing code path handles it.
**However**, ZMQ doesn't expose a "connected peer count" API for PULL sockets. So this detection would need to happen at the routing table level: scan routes for PULL endpoints that have no corresponding PUSH route. This is a static analysis of the resolved routing table, which is a legitimate thing for the fabric to do before socket creation.
### My Recommendation
**Ship Alternative 1 now, but model it properly. Then fix at the fabric layer.**
For the immediate fix:
1. Use the promise approach, but reframe the bridge's lifecycle as an explicit state machine rather than a bag of refs. The bridge has three states: Running, Draining (data EOF received, waiting for ctrl/tools to finish), Stopped. The promise is the Running-to-Draining transition. This makes the shutdown protocol legible instead of implicit.
2. `n_sources` should not count ctrl_in if `find_pull "ctrl_in"` is `None`. Currently the code at line 220 hardcodes `n_sources = 2`. It should be `1 + (if has_ctrl_socket then 1 else 0) + (if has_tools ... then 1 else 0)`. This alone might fix the bug without needing the promise at all -- if `n_sources` correctly counts only the mux fibres that are actually started.
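The corrected count can be sketched in a few lines (illustrative; `find_pull` and `has_tools` stand in for the bridge's real lookups):

```ocaml
(* Sketch only: compute the mux-source count from the sockets that actually
   exist, instead of hardcoding 2. Names are illustrative. *)
let n_sources ~(find_pull : string -> 'a option) ~(has_tools : bool) : int =
  1                                                  (* data_in is always started *)
  + (if Option.is_some (find_pull "ctrl_in") then 1 else 0)
  + (if has_tools then 1 else 0)
```

Under this scheme the `None` branch would no longer need its compensating `mux_source_done ()` call.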
Wait. Let me re-read lines 220 and 289-313 together.
Line 220: `n_sources = 2` (always counts data_in and ctrl_in).
Line 289-313: If `find_pull "ctrl_in"` is `None`, it calls `mux_source_done()` immediately, decrementing the counter.
So if ctrl_in has no PULL socket, the counter works fine: it starts at 2, ctrl_in immediately decrements to 1, data_in eventually decrements to 0, stdin closes.
The bug is specifically when `find_pull "ctrl_in"` returns `Some sock` but nobody ever sends to that socket. The PULL socket exists (a route was created) but has no connected writer. That is the gap.
**The real fix** is at the routing table or fabric level: if a resolved route's source endpoint has no PUSH socket (because the source process doesn't exist or the channel was a dead end), don't create the PULL socket. This is a static analysis pass over the resolved routing table before socket creation.
Concretely, in `fabric.ml` Phase 1, the fabric creates both PUSH and PULL sockets for every route. But some routes may have sources that are boundary ports with no external writer. The fabric should filter out routes where the source endpoint will never produce data. This requires knowing which boundary ports are externally driven -- information the caller has but the fabric currently doesn't.
**Practical recommendation:**
1. **Immediate**: Fix `n_sources` to only count mux fibres that are actually started. Then, for the ctrl_in case specifically, if the PULL socket exists but the bridge knows (from `has_control` and the routing table) that no writer will connect, call `mux_source_done()` immediately instead of starting a recv loop. This is two lines of logic, no promise needed.
2. **Follow-up**: Add a routing table validation pass that flags PULL sockets with no connected PUSH source. The fabric can warn or skip socket creation for these. This prevents the entire class of bugs, not just ctrl_in.
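One possible shape for that validation pass (a sketch assuming resolved routes carry `src`/`dst` endpoint fields; the `route` record here is illustrative, not the real `Runtime.Routing.route`):

```ocaml
(* Sketch: flag PULL endpoints whose source endpoint no process or boundary
   port ever produces. [producers] is the set of (name, port) pairs known to
   emit data; everything here is illustrative. *)
type route = { src : string; src_port : string; dst : string; dst_port : string }

let dangling_pulls ~(producers : (string * string) list) (routes : route list)
    : (string * string) list =
  routes
  |> List.filter (fun r -> not (List.mem (r.src, r.src_port) producers))
  |> List.map (fun r -> (r.dst, r.dst_port))
```

The fabric could warn on, or skip socket creation for, each endpoint this returns.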
### Summary
The promise-based fix works but adds complexity to an already complex module. The simpler fix is: don't start a mux fibre for a socket you know will never receive data. The bridge already has the information it needs (`has_control`, the socket lists from the fabric). Use it.
The deeper issue is that the fabric creates sockets for routes that have dangling sources. That deserves a follow-up ticket, not a bridge-level workaround.
## Review: Bridge Internal Coordination for Product Input Shutdown
### Verdict
The proposed fix is **categorically sound**. The internal coordination mechanism is principled, not ad hoc. But I have a question about whether it is necessary at all. Let me take each question in order.
---
### 1. Internal coordination within a natural transformation: sound
A natural transformation `eta : F => G` is a family of morphisms satisfying naturality squares. What happens inside each component is unconstrained by naturality -- naturality only requires that the squares commute. The bridge's internal Promise is coordination *within* the codomain representation (the multiplexed stdio pipe). It does not appear at the categorical level, does not affect the ZMQ protocol, does not affect the child's stdin protocol. It is invisible to the algebra.
This is analogous to how `merge`'s internal mutex is not visible in the algebra. The algebra says `merge : A + B -> A | B`. The implementation uses a mutex to serialise writes. The mutex is not a morphism. Same principle here.
---
### 2. Terminating ctrl_in on data EOF: correct
The operational semantics are explicit: "Input closes -> agent produces final output -> output closes." Data EOF IS the input close event. The agent enters a **drain phase** (documented in `executor.md`) where it continues handling control commands until stop arrives or ctrl_in closes. The current bridge code already sends `{"stop": true}` on the child's ctrl_in port after data EOF (the R4 annotation at line 274-282 of `bridge.ml`). This is not just killing control -- it is *completing the control protocol* before shutdown.
The lifecycle asymmetry is deliberate and well-documented. Data drives the agent's lifecycle; control is subordinate. Treating them symmetrically would be wrong: if ctrl_in EOF killed the agent while data was still flowing, you would lose data.
---
### 3. Asymmetric product component shutdown: does not violate product semantics
`Product [Stream A; Stream Control]` is `!A x !Control`. The product structure governs the *interface* -- two input ports. It does not prescribe lifecycle policy. The two components are independent objects categorically, but nothing in the product structure *forbids* one component's lifecycle being tied to another. That is an operational concern, settled by the process lifecycle semantics, not by the product type.
The key point: `(!A, !Control)` is not a symmetric product operationally, even though it is categorically. This is fine. The category tells you *what* the ports are. The operational semantics tell you *how they behave at shutdown*. These are different layers.
---
### 4. Mux-source-counting for child stdin close: correct structural choice
The bridge's mux is not a categorical merge. It is a transport-level resource management scheme. The child process needs its stdin to close so it receives EOF and can exit. When is it safe to close? When all writers are done. This is Unix pipe semantics made explicit.
The source-counting scheme is the right choice because:
- Closing child stdin on data EOF alone would kill the tool dispatch channel and control channel prematurely
- The child may still be processing tool responses or drain-phase control messages
- The scheme matches merge's close semantics ("output closes only when ALL inputs have closed") at the transport level, even though it is not a morphism
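The counting scheme itself is small; a minimal stdlib sketch of the pattern (not the bridge's actual `mux_source_done`, which also involves its mux mutex):

```ocaml
(* Minimal sketch of the source-counting close protocol: [on_zero] (e.g.
   closing child stdin) runs exactly once, after every source reports done. *)
let make_source_counter ~(n : int) ~(on_zero : unit -> unit) : unit -> unit =
  let m = Mutex.create () in
  let remaining = ref n in
  fun () ->
    Mutex.lock m;
    decr remaining;
    let zero = !remaining = 0 in
    Mutex.unlock m;
    if zero then on_zero ()
```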
---
### The critical question: is the Promise fix necessary?
Before adding the Promise, I would verify two things.
**First**, looking at `bridge.ml` lines 290-313, the ctrl_in mux fibre is only created when `find_pull "ctrl_in"` returns `Some sock`. If the PULL socket has no upstream writer, does the socket exist at all? If the fabric does not create it, `find_pull` returns `None`, `mux_source_done()` is called immediately (line 313), and there is no blocking. The Promise would only be needed if the socket *exists* but has no upstream writer, which would be a fabric wiring issue.
**Second**, the ctrl_in mux fibre is launched with `Eio.Fiber.fork_daemon`. Daemon fibres are cancelled when their switch is released. If the switch is released when the child exits and the demux fibre completes, the blocked daemon fibre should be cancelled automatically by Eio. Verify whether the switch lifetime already handles this.
If daemon cancellation already cleans this up, no fix is needed. If the switch lifetime extends beyond the child's lifetime (e.g., it is the fabric's top-level switch), then the Promise is the right fix. But even then, consider whether narrowing the switch scope is cleaner than adding an inter-fibre coordination mechanism.
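If narrowing the switch scope turns out to be viable, the shape would be roughly as below (hedged sketch; `run_child` and `mux_loop` are placeholders for the bridge's real child-wait and recv loop, not its API):

```ocaml
(* Sketch: tie the mux daemon fibre to a switch scoped to the child's
   lifetime. When Switch.run returns (child exited, demux done), a daemon
   fibre still blocked in a recv is cancelled automatically. *)
let with_child_scope ~(run_child : unit -> unit) ~(mux_loop : unit -> unit) =
  Eio.Switch.run (fun sw ->
    Eio.Fiber.fork_daemon ~sw (fun () -> mux_loop (); `Stop_daemon);
    run_child ()
    (* leaving Switch.run cancels the daemon fibre if it is still blocked *))
```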
---
### Summary
| Question | Answer |
|----------|--------|
| Internal coordination in natural transformation | Sound. Implementation detail, invisible to algebra. |
| Terminate ctrl_in on data EOF | Correct. Matches documented lifecycle semantics. |
| Asymmetric product component shutdown | Does not violate product semantics. Asymmetry is operational, deliberate. |
| Mux-source-counting for stdin close | Correct. Transport-level merge semantics, not categorical merge. |
| Promise-based fix | Principled if needed, but verify daemon fibre cancellation and socket existence first. |
The design is clean. The bridge correctly separates categorical concerns (the natural transformation between stream representations) from operational concerns (lifecycle, shutdown ordering, resource management). The proposed Promise does not pollute the algebra. My only pushback is pragmatic: make sure the problem actually manifests before adding the mechanism.
I have enough information to answer all the questions thoroughly. The key facts from reading the source:
1. `T.recv` / `recv_raw` uses `Eio_unix.await_readable fd` — pure Eio, cancellable
2. `T.recv` does NOT use `run_in_systhread` — it's entirely in Eio fibre context
3. ZMQ sockets have a 1-second receive timeout set (`set_receive_timeout s 1000`)
4. `mux_source_done` uses a bare OCaml `Mutex` (not Eio-safe from fibres)
5. The ctrl_in fibre calls `Eio_unix.run_in_systhread` for writes
Now I can give a complete, precise review.
---
Here is the review, question by question.
## Q1: Does `Fiber.first` correctly cancel `T.recv`?
**Yes, with one subtlety.**
`recv_raw` is a pure Eio loop. Its blocking point is `Eio_unix.await_readable fd` — a standard Eio suspension that the scheduler cancels by raising `Eio.Cancel.Cancelled` into the fibre. When `Fiber.first` picks the winning branch it cancels the other by injecting that exception. `await_readable` is a scheduled sleep on an Eio effect, so it unblocks cleanly.
The subtlety: the loop in `recv_raw` polls `Zmq.Socket.events s` before waiting. If there happens to be data in the ZMQ receive buffer when the promise fires, the fibre will execute `Zmq.Socket.recv ~block:false s` synchronously (no yield point) before cancellation has a chance to land. This means `Fiber.first` may return `` `Msg `` instead of `` `Shutdown `` even when the promise has already been resolved, if a genuine message arrives at the same moment. That is correct behaviour — a real ctrl message should be forwarded — but you must be aware the race is decided by whoever gets scheduled first, not by promise resolution order.
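The loop shape being described is roughly as follows (a reconstruction from the calls named in this review — `Zmq.Socket.events`, `recv ~block:false`, `Eio_unix.await_readable`; the real `recv_raw` may differ in detail):

```ocaml
(* Reconstruction (assumed shape) of the poll-then-wait loop described above.
   The only suspension -- and therefore the only cancellation point -- is
   [await_readable]; buffered data is drained with a non-blocking recv. *)
let rec recv_raw fd sock =
  match Zmq.Socket.events sock with
  | Zmq.Socket.Poll_in | Zmq.Socket.Poll_in_out ->
      Zmq.Socket.recv ~block:false sock   (* no yield point here *)
  | _ ->
      Eio_unix.await_readable fd;         (* cancellable Eio suspension *)
      recv_raw fd sock
```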
**There is no incorrectness here, only a scheduling non-determinism that is benign.**
## Q2: What happens during `run_in_systhread` when the promise fires?
**This is the most important correctness question, and the fix has a real gap here.**
`Eio_unix.run_in_systhread f` submits `f` to a thread pool and suspends the Eio fibre until `f` completes. The Eio documentation states that `run_in_systhread` is not cancellable once the system thread has started — cancellation is only possible while queued. Once the systhread is executing `write_envelope`, the fibre is suspended at the `run_in_systhread` call and `Fiber.first` cannot interrupt it.
Here is the sequence that matters:
```
ctrl_in fibre: Fiber.first returns `Msg (...)
ctrl_in fibre: calls run_in_systhread(write_envelope ...) ← fibre suspended
data_in fibre: EOF received, resolves promise
Fiber.first in ctrl_in fibre: cannot fire yet — fibre is suspended in systhread
systhread: write_envelope completes
ctrl_in fibre: resumes, loops back to Fiber.first
Fiber.first: now sees promise resolved → returns `Shutdown → raises Exit
```
This is **correct**. The message in flight gets written (no partial write, no corruption), and the fibre then exits on the next loop iteration. `Fiber.first` is only evaluated at the top of `while true`, so the systhread-based write completes atomically from the loop's perspective.
However, there is a related hazard: the `Sys_error _ -> raise Exit` inside `run_in_systhread`. If the pipe is already closed (e.g., child exited), `write_envelope` raises `Sys_error`, which is raised *inside the systhread* and then re-raised in the Eio fibre when `run_in_systhread` completes. That exits the loop via `Exit`, the `try...with` catches it, and the code after the block still calls `mux_source_done`. That path is correct — `mux_source_done` fires regardless.
**No correctness issue here. The design is sound.**
## Q3: Cost of creating an unused promise
**Negligible. Not worth further thought.**
`Eio.Promise.create ()` allocates two small heap objects — a state cell and a resolver. If `find_pull "ctrl_in"` returns `None`, neither is ever used and both are collected at the next minor GC. There is no thread, socket, or kernel resource involved. The `None` branch calls `mux_source_done ()` immediately and ignores the promise. This is fine.
## Q4: Is `Eio.Promise.resolve` idempotent?
**No. Resolving an already-resolved Eio promise raises `Invalid_argument`.**
This is the one **genuine bug** in the proposed fix.
Eio promises are single-assignment. `Promise.resolve` raises `Invalid_argument "Promise already resolved"` if called twice. The scenario you described is real:
1. `data_in` EOF fires → calls `Eio.Promise.resolve resolve_ctrl_shutdown ()`
2. Some path (transport error, switch cancellation) triggers another code path that calls `resolve_ctrl_shutdown ()` again
The fix must guard the resolve. The idiomatic approach is one of:
**Option A — try/ignore:**
```ocaml
(try Eio.Promise.resolve resolve_ctrl_shutdown ()
with Invalid_argument _ -> ())
```
This is simple but swallows all `Invalid_argument`, which is too broad.
**Option B — atomic flag:**
```ocaml
let ctrl_shutdown_resolved = Atomic.make false in
let resolve_ctrl_shutdown_once () =
if Atomic.compare_and_set ctrl_shutdown_resolved false true then
Eio.Promise.resolve resolve_ctrl_shutdown ()
in
```
This is explicit, thread-safe (since `data_in` runs in an Eio fibre and `resolve` itself is not concurrent), and self-documenting. Use this.
**Option C — use `Promise.resolve_exn` (does not exist):**
There is no `resolve_opt` or `try_resolve` in Eio's public API as of 0.15/1.0. You must guard the call yourself.
Note: looking at the current code again, there is only one call site for `resolve_ctrl_shutdown` in the proposed fix (the `data_in` EOF handler). If that is the only caller and it is only reachable once (because `data_in` exits after EOF), there may be no double-resolve in practice. But this depends on the transport error path in `data_in`. If `Transport.Transport_error` is caught and the same code path runs, or if the switch cancellation causes `data_in` to fire its cleanup twice, you have a double-resolve. Guard it unconditionally.
## Q5: Memory / GC references
**Not a concern.**
The promise holds a reference to the callback list (fibres waiting on `await`), which in this case is zero or one fibre (the ctrl_in fibre). Once the promise resolves, the waiters are woken and the reference is dropped. The promise itself is referenced by the ctrl_in closure (through `ctrl_shutdown`) and by the `data_in` closure (through `resolve_ctrl_shutdown`). Both closures are owned by their fork_daemon fibres. When both fibres exit, the closures are collected. The promise does not pin anything that wouldn't already be live.
## Q6: `Fiber.first` overhead per recv call
**Not a concern for this use case, but worth understanding.**
`Fiber.first f g` runs both `f` and `g` concurrently in a local switch. The winning branch's result is returned; the losing branch is cancelled. The overhead is: two fibre context allocations, two effect dispatches (one per branch start), and one cancellation. On OCaml 5 with effects this is cheap — microseconds, not milliseconds.
For the ctrl_in channel, which carries stop/pause signals at human-event frequency (maybe tens of messages per session, not per second), this overhead is completely irrelevant.
If you were calling `Fiber.first` on a high-throughput data path (thousands of messages per second), the overhead would be measurable but probably still acceptable compared to ZMQ and pipe I/O. On the ctrl_in channel: not worth a second thought.
---
## Summary of findings
| # | Status | Finding |
|---|--------|---------|
| 1 | Correct | `await_readable` is cancellable by `Fiber.first`; race on buffered data is benign |
| 2 | Correct | `run_in_systhread` atomicity is not a problem; write completes before cancellation |
| 3 | Correct | Unused promise is trivially cheap |
| **4** | **Bug** | `Promise.resolve` is **not idempotent** — double-resolve raises `Invalid_argument` |
| 5 | Correct | No GC pinning issues |
| 6 | Correct | `Fiber.first` overhead is negligible on ctrl_in |
**The fix is sound except for Q4.** Before shipping, wrap `resolve_ctrl_shutdown ()` in an `Atomic.compare_and_set` guard (or equivalent) so double-resolution cannot raise. Everything else in the proposed approach is correct.
Excellent reviews. All three converge on key findings. Let me synthesise and update the plan.
Theory critic: Sound. The promise is internal to the natural transformation, invisible to the algebra. Lifecycle asymmetry (data drives shutdown, control is subordinate) is correct per operational semantics. Raises question: does daemon fibre cancellation already handle this?
Architect: The promise works but is a symptom-patch. Simpler fix: don't start a mux fibre for a socket that will never receive data. The bridge already has the information (`find_pull` returning `Some` vs `None`). The deeper issue: the fabric creates PULL sockets for routes with dangling sources. Recommends: fix `n_sources` to count only started fibres, and skip the recv loop when the socket has no writer.
Engineer: The `Fiber.first` + `T.recv` cancellation is correct. `run_in_systhread` atomicity is fine. One bug: `Eio.Promise.resolve` is NOT idempotent — double-resolve raises `Invalid_argument`. Must guard with `Atomic.compare_and_set`.
The architect's observation is the most incisive: the simpler fix is to not start the fibre at all when no writer exists. But detecting "no writer" requires knowing whether any PUSH socket connects to the PULL socket — information the fabric has but the bridge doesn't directly.
Let me check whether the fabric can provide this information:
{
"pattern": "proc_pull|pull_sockets.*filter_map",
"path": "/tmp/plmb-037-step5b/lib/fabric/fabric.ml",
"output_mode": "content",
"-A": 5
}
211: let proc_pull = List.filter_map (fun (r : Runtime.Routing.route) ->
212-   if r.dst = pe.proc_name then
213-     let ep = endpoint ~pipeline_id r.dst r.dst_port in
214-     match Hashtbl.find_opt pull_sockets ep with
215-     | Some s -> Some (r.dst_port, s)
216-     | None -> None
--
220:       ~push_sockets:proc_push ~pull_sockets:proc_pull
221- | None ->
222-   Runtime.Log.info "fabric"
223-     ["event", `String "subprocess_deferred";
224-      "name", `String pe.proc_name]
225- | Runtime.Routing.Inline ->
The fabric collects `proc_pull` by filtering routes where `r.dst = pe.proc_name`. A PULL socket for `ctrl_in` only appears if a route exists with `dst = proc_name` and `dst_port = "ctrl_in"`. But does such a route have a corresponding PUSH socket (a writer)?
Let me check: when `agent_control.plumb` declares `let ch_ctrl : !Control = channel` but nothing writes to it, does the routing table have a route to `ctrl_in`?
{
"file_path": "/tmp/plmb-037-step5b/lib/runtime/routing.ml",
"offset": 403,
"limit": 35
}
403→   | Agent _ ->
404→     (* Variable channels: data_in, [ctrl_in], data_out, [ctrl_out], [telemetry] *)
405→     let n = List.length channels in
406→     (* Minimum: input, output (2 channels)
407→        With control: input, ctrl_in, output, ctrl_out (4)
408→        With telemetry: +1 *)
409→     if n >= 2 then begin
410→       let ch_in = List.nth channels 0 in
411→       add_route ~src:ch_in ~src_port:"out" ~dst:proc_name ~dst_port:"data_in"
412→         ~channel_name:ch_in;
413→       (* The output channel is at index 1 (no ctrl) or 2 (with ctrl) *)
414→       let out_idx = if n >= 4 then 2 else 1 in
415→       let ch_out = List.nth channels out_idx in
416→       add_route ~src:proc_name ~src_port:"data_out" ~dst:ch_out ~dst_port:"in"
417→         ~channel_name:ch_out;
418→       (* Control channels if present *)
419→       if n >= 4 then begin
420→         let ch_ctrl_in = List.nth channels 1 in
421→         let ch_ctrl_out = List.nth channels 3 in
422→         add_route ~src:ch_ctrl_in ~src_port:"out" ~dst:proc_name ~dst_port:"ctrl_in"
423→           ~channel_name:ch_ctrl_in;
424→         add_route ~src:proc_name ~src_port:"ctrl_out" ~dst:ch_ctrl_out ~dst_port:"in"
425→           ~channel_name:ch_ctrl_out
426→       end;
427→       (* Telemetry if present *)
428→       let telem_idx = if n >= 4 then 4 else 2 in
429→       if telem_idx < n then begin
430→         let ch_telem = List.nth channels telem_idx in
431→         add_route ~src:proc_name ~src_port:"telemetry" ~dst:ch_telem ~dst_port:"in"
432→           ~channel_name:ch_telem
433→       end
434→     end
435→
436→   | Plumb (plumb_ports, _) ->
437→     (* N port channels, matching the plumb body's port list *)
So when n >= 4 (agent has control ports), a route is created: ch_ctrl_in:out → proc_name:ctrl_in. The source is the ctrl_in channel. After channel resolution (resolve_channels), this becomes a direct route from whatever writes to ch_ctrl_in to the agent's ctrl_in port.
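The index arithmetic in that routing code can be modeled standalone (a sketch mirroring the `out_idx`/`telem_idx` logic above, not the project code itself):

```ocaml
(* Standalone model of the agent channel layout from routing.ml:
   given n channels, compute the output-channel index, the optional
   (ctrl_in, ctrl_out) index pair, and the optional telemetry index. *)
let layout n =
  let out_idx = if n >= 4 then 2 else 1 in
  let ctrl = if n >= 4 then Some (1, 3) else None in
  let telem_idx = if n >= 4 then 4 else 2 in
  let telem = if telem_idx < n then Some telem_idx else None in
  (out_idx, ctrl, telem)

let () =
  assert (layout 2 = (1, None, None));          (* input, output *)
  assert (layout 3 = (1, None, Some 2));        (* + telemetry *)
  assert (layout 4 = (2, Some (1, 3), None));   (* + ctrl_in/ctrl_out *)
  assert (layout 5 = (2, Some (1, 3), Some 4)); (* all five channels *)
  print_endline "ok"
```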
If nothing writes to ch_ctrl_in, the channel has readers (the agent) but no writers. Looking at resolve_channels (line 117-205):
{
"file_path": "/tmp/plmb-037-step5b/lib/runtime/routing.ml",
"offset": 155,
"limit": 25
}

```ocaml
155→  ) routes;
156→
157→  (* Cross-join writers and readers for each channel *)
158→  let resolved = ref (List.rev !direct) in
159→  List.iter (fun ch_name ->
160→    let ch_writers = try Hashtbl.find writers ch_name with Not_found -> [] in
161→    let ch_readers = try Hashtbl.find readers ch_name with Not_found -> [] in
162→    let n_writers = List.length ch_writers in
163→    let n_readers = List.length ch_readers in
164→
165→    match n_writers, n_readers with
166→    | 0, 0 -> () (* unused channel *)
167→    | _, 0 ->
168→      (* Writers with no readers — telemetry sinks. Re-emit unchanged. *)
169→      List.iter (fun w -> resolved := w :: !resolved) ch_writers
170→    | 0, _ ->
171→      (* Readers with no writers — likely compiler bug. *)
172→      Log.warn "routing"
173→        ["event", `String "channel_no_writers";
174→         "channel", `String ch_name]
175→    | _ when n_writers > 1 && n_readers > 1 ->
176→      failwith (Printf.sprintf
177→        "resolve_channels: channel %s has %d writers and %d readers \
178→         — no categorical meaning" ch_name n_writers n_readers)
179→    | _ ->
```
Line 170-174: When a channel has readers but no writers (0, _), the routing compiler logs a warning and drops the route entirely — it's not added to resolved. So the resolved routing table should NOT contain a route to ctrl_in when ch_ctrl_in has no writer.
This means find_pull "ctrl_in" should return None for unconnected ctrl_in channels. The bridge's existing code (line 312-313) would call mux_source_done() immediately.
So the bug might be subtler than expected. Let me check whether the ctrl_in channel actually has a writer in the test case. The agent_control.plumb fixture declares let ch_ctrl : !Control = channel and spawns smart(input=input, ctrl_in=ch_ctrl, output=output). The route ch_ctrl:out → smart0:ctrl_in has ch_ctrl as source. But nothing writes to ch_ctrl. So ch_ctrl has readers (smart0) but no writers. The route should be dropped.
Unless... ch_ctrl is both read and written by the same agent? No — the agent reads from ctrl_in, it doesn't write to it.
Let me verify by actually running the routing table on this fixture:
cd /tmp/plmb-037-step5b && nix develop --command sh -c 'echo "" | PIPELINE_DEBUG=1 ./_build/default/bin/plumb/main.exe test/fixtures/agent_control.plumb 2>&1 | head -20'
warning: Git tree '/private/tmp/plmb-037-step5b' is dirty
{"log":"warn","event":"routing","event":"channel_no_writers","channel":"ch_ctrl"}
{"error":"internal","detail":"Failure(\"resolve_channels: channel __wc0 still appears as route dst\")"}

There's the crash. The routing compiler logs the channel_no_writers warning for ch_ctrl, drops the route, but then __wc0 (the implicit ctrl_out channel from patch_ctrl_out) still appears as a route dst — a dangling reference after resolution.
So this test case crashes (exit 1) rather than hanging. Let me check if there are other agent tests that actually hang. Let me test wiring_control.plumb:
cd /tmp/plmb-037-step5b && nix develop --command sh -c 'echo "" | PIPELINE_DEBUG=1 ./_build/default/bin/plumb/main.exe test/fixtures/wiring_control.plumb 2>&1 | head -20'
warning: Git tree '/private/tmp/plmb-037-step5b' is dirty
{"log":"warn","event":"routing","event":"channel_no_writers","channel":"__wc0"}
{"error":"internal","detail":"Failure(\"resolve_channels: channel __wc1 still appears as route dst\")"}

Same crash — resolution failure. Let me test the eliza_session.plumb which is the eof-control test:
cd /tmp/plmb-037-step5b && nix develop --command sh -c 'echo "\"hello\"" | PIPELINE_DEBUG=1 timeout 5 ./_build/default/bin/plumb/main.exe test/fixtures/eliza_session.plumb 2>&1 | head -30'
warning: Git tree '/private/tmp/plmb-037-step5b' is dirty
{"log":"warn","event":"routing","event":"channel_no_writers","channel":"ctrl"}
{"error":"internal","detail":"Failure(\"resolve_channels: channel __wc0 still appears as route dst\")"}

So ALL agent-with-control tests crash with a resolution error. Let me now test wiring_telemetry_source.plumb (which has no control, only telemetry):
cd /tmp/plmb-037-step5b && nix develop --command sh -c 'echo "" | PIPELINE_DEBUG=1 timeout 5 ./_build/default/bin/plumb/main.exe test/fixtures/wiring_telemetry_source.plumb 2>&1 | head -30'
warning: Git tree '/private/tmp/plmb-037-step5b' is dirty
{"error":"config_error","category":"config","detail":"cannot find module js (searched test/fixtures and 0 library directories)"}Right — this one fails because use js can't find the module (search path doesn't include stdlib). This is the same failure as on main.
So the agent tests that actually hang must be ones that DON'T have control channels and DO resolve successfully. Let me check which test the process was actually stuck on when I observed the hang. Earlier I saw pid 6326 running wiring_telemetry_source.plumb, but that should exit immediately with exit code 2. Unless the test framework itself is stuck reading stdout.
Wait — looking at run_plumb (lines 11-39), it reads stdout BEFORE stderr. If the binary writes to stderr first, stdout might block. But In_channel.input_all reads until EOF on the fd, so it should complete when the process exits.
Let me re-run the test and see which one hangs:
cd /tmp/plmb-037-step5b && nix develop --command sh -c 'cd _build/default/test && timeout 30 ./test_plumb.exe test "agent control port" -v' 2>&1 | tail -30
┌──────────────────────────────────────────────────────────────────────────────┐
│ [FAIL] agent control port 4 wiring implicit control. │
└──────────────────────────────────────────────────────────────────────────────┘
FAIL exit 0
Expected: `0'
Received: `1'
Raised at Alcotest_engine__Test.check in file "src/alcotest-engine/test.ml", lines 216-226, characters 4-19
Called from Alcotest_engine__Core.Make.protect_test.(fun) in file "src/alcotest-engine/core.ml", line 186, characters 17-23
Called from Alcotest_engine__Monad.Identity.catch in file "src/alcotest-engine/monad.ml", line 24, characters 31-35
──────────────────────────────────────────────────────────────────────────────
┌──────────────────────────────────────────────────────────────────────────────┐
│ [FAIL] agent control port 5 wiring telemetry as c... │
└──────────────────────────────────────────────────────────────────────────────┘
FAIL exit 0
Expected: `0'
Received: `2'
Raised at Alcotest_engine__Test.check in file "src/alcotest-engine/test.ml", lines 216-226, characters 4-19
Called from Alcotest_engine__Core.Make.protect_test.(fun) in file "src/alcotest-engine/core.ml", line 186, characters 17-23
Called from Alcotest_engine__Monad.Identity.catch in file "src/alcotest-engine/monad.ml", line 24, characters 31-35
──────────────────────────────────────────────────────────────────────────────
5 failures! in 0.071s. 6 tests run.
They all FAIL (not hang!) and complete in 0.071s. The agent control port tests exit 1 (internal error from the resolve_channels crash) or exit 2 (config error), but they don't hang.
So the hang is elsewhere. Let me run the full test_plumb with timeout to find what actually hangs:
cd /tmp/plmb-037-step5b && nix develop --command sh -c 'cd _build/default/test && timeout 30 ./test_plumb.exe -v' 2>&1 | tail -40
Expected: `0'
Received: `1'
Raised at Alcotest_engine__Test.check in file "src/alcotest-engine/test.ml", lines 216-226, characters 4-19
Called from Alcotest_engine__Core.Make.protect_test.(fun) in file "src/alcotest-engine/core.ml", line 186, characters 17-23
Called from Alcotest_engine__Monad.Identity.catch in file "src/alcotest-engine/monad.ml", line 24, characters 31-35
──────────────────────────────────────────────────────────────────────────────
┌──────────────────────────────────────────────────────────────────────────────┐
│ [FAIL] _format_json 3 wrong type rejected. │
└──────────────────────────────────────────────────────────────────────────────┘
FAIL exit 2 (config_error)
Expected: `2'
Received: `0'
Raised at Alcotest_engine__Test.check in file "src/alcotest-engine/test.ml", lines 216-226, characters 4-19
Called from Alcotest_engine__Core.Make.protect_test.(fun) in file "src/alcotest-engine/core.ml", line 186, characters 17-23
Called from Alcotest_engine__Monad.Identity.catch in file "src/alcotest-engine/monad.ml", line 24, characters 31-35
──────────────────────────────────────────────────────────────────────────────
┌──────────────────────────────────────────────────────────────────────────────┐
│ [FAIL] _parse_json 3 wrong type rejected. │
└──────────────────────────────────────────────────────────────────────────────┘
FAIL exit 2 (config_error)
Expected: `2'
Received: `0'
Raised at Alcotest_engine__Test.check in file "src/alcotest-engine/test.ml", lines 216-226, characters 4-19
Called from Alcotest_engine__Core.Make.protect_test.(fun) in file "src/alcotest-engine/core.ml", line 186, characters 17-23
Called from Alcotest_engine__Monad.Identity.catch in file "src/alcotest-engine/monad.ml", line 24, characters 31-35
──────────────────────────────────────────────────────────────────────────────
21 failures! in 0.912s. 87 tests run.
All 87 tests complete in 0.912s — no hanging. There are 21 failures but no hangs. The earlier hanging was from before I added EOF propagation to inline morphisms.
The failures are:
1. Agent control port tests: exit 1 instead of 0 (resolve_channels crash on ctrl_out channels)
2. _format_json wrong type / _parse_json wrong type: exit 0 instead of 2 (validation regression)
3. wiring_telemetry_source: exit 2 instead of 0 (pre-existing on main)
4. eof-control tests: exit 1 instead of 0 (same resolve_channels crash)
Let me check the type rejection tests and verify the agent failures are the same as on main:
cd /Users/ww/src/plumbing && nix develop --command sh -c 'cd _build/default/test && timeout 30 ./test_plumb.exe -v' 2>&1 | grep FAIL
[FAIL] agent control port 5 wiring telemetry as chai...
│ [FAIL] agent control port 5 wiring telemetry as c... │
│ FAIL exit 0
On main, only 1 failure (wiring_telemetry_source). On my branch, 21 failures. Let me identify the new failures:
cd /tmp/plmb-037-step5b && nix develop --command sh -c 'cd _build/default/test && ./test_plumb.exe -v' 2>&1 | grep FAIL | head -25
[FAIL] barrier 4 type mismatch.
[FAIL] project 2 out of bounds.
[FAIL] type checking 0 direct type mismatch.
[FAIL] type checking 1 named type mismatch.
[FAIL] heterogeneous merge 0 accepts (A, B) → A | B.
[FAIL] heterogeneous merge 1 rejects wrong output type.
[FAIL] product type checking 0 copy output mismatch.
[FAIL] product type checking 1 copy wrong arity.
[FAIL] named spawn 2 unknown port rejected.
[FAIL] named spawn 3 missing port rejected.
[FAIL] agent telemetry port 0 non-product rejects 3 ch...
[FAIL] agent telemetry port 1 product type mismatch.
[FAIL] agent control port 0 product input accepted.
[FAIL] agent control port 1 control type mismatch.
[FAIL] agent control port 3 control + telemetry.
[FAIL] agent control port 4 wiring implicit control.
[FAIL] agent control port 5 wiring telemetry as chai...
[FAIL] eof-control 0 exits cleanly with input.
[FAIL] eof-control 1 exits cleanly on empty i...
[FAIL] _format_json 3 wrong type rejected.
[FAIL] _parse_json 3 wrong type rejected.
│ [FAIL] barrier 4 type mismatch. │
│ FAIL exit 2
│ [FAIL] project 2 out of bounds. │
│ FAIL exit 2 (config error)
cd /tmp/plmb-037-step5b && nix develop --command sh -c 'cd _build/default/test && ./test_plumb.exe test "barrier" -v' 2>&1 | grep -A5 "type mismatch"
[FAIL] barrier 4 type mismatch.
ASSERT exit 0
ASSERT one output line
ASSERT product
[OK] barrier 5 homogeneous.
ASSERT exit 0
--
[SKIP] type checking 0 direct type mismatch.
[SKIP] type checking 1 named type mismatch.
[SKIP] type checking 2 reserved name rejected.
[SKIP] heterogeneous merge 0 accepts (A, B) → A | B.
[SKIP] heterogeneous merge 1 rejects wrong output type.
[SKIP] product type checking 0 copy output mismatch.
[SKIP] product type checking 1 copy wrong arity.
--
[SKIP] agent telemetry port 1 product type mismatch.
[SKIP] agent telemetry port 2 product wrong arity.
[SKIP] agent control port 0 product input accepted.
[SKIP] agent control port 1 control type mismatch.
[SKIP] agent control port 2 control wrong arity.
[SKIP] agent control port 3 control + telemetry.
[SKIP] agent control port 4 wiring implicit control.
[SKIP] agent control port 5 wiring telemetry as chai...
[SKIP] filter 0 passes matching.
--
[SKIP] filter 4 type mismatch exit 2.
[SKIP] filter 5 field on string rejected.
[SKIP] filter 6 non-bool predicate rejec...
[SKIP] filter 7 json field access accepted.
[SKIP] filter 8 json compound expr accep...
[SKIP] toplevel filter 0 field on string rejected.
--
[SKIP] toplevel map 1 output type mismatch rej...
[SKIP] toplevel map 2 valid expr with matching...
[SKIP] toplevel map 3 json field access accepted.
[SKIP] map 0 transforms record.
[SKIP] map 1 below threshold.
[SKIP] map 2 field projection.
--
│ [FAIL] barrier 4 type mismatch. │
└──────────────────────────────────────────────────────────────────────────────┘
FAIL exit 2
Expected: `2'
Received: `1'
The barrier type mismatch test expects exit 2 but gets exit 1. That means a config error that used to be caught at exit 2 now becomes an unhandled exception at exit 1. This is a different class of failure — likely related to how Routing.compile is now called directly in main.ml vs how Exec.execute called it internally with different error handling.
Let me categorise the failures:
Exit 1 instead of 2: barrier type mismatch, project OOB, type checking, heterogeneous merge, product checking, named spawn, agent port tests — these are routing/validation errors that Routing.compile raises as exceptions, not Error.Process_error. The old Exec caught these internally. The new main.ml needs to catch them.
Exit 0 instead of 2: _format_json wrong type, _parse_json wrong type — validation errors that used to be caught but now pass through.
Agent control crashes: resolve_channels failure on __wc0 channels — the patch_ctrl_out desugarer adds ctrl_out channels that become dangling after resolution.
Pre-existing: wiring_telemetry_source (exit 2, same as main).
Now I have a complete picture. Let me update the plan with all findings.
# PLMB-037 Step 5b: Complete Switchover — Fix Plan
## Context
The plumb binary switchover from `Exec.execute` to
`Fabric.execute` + `Bridge.make_handler` is in progress on branch
`plmb-037-step5b` (worktree `/tmp/plmb-037-step5b`). The core
switchover is implemented. All inline-only pipeline tests pass
(identity, copy, merge, barrier, filter, map, wiring, tensor,
format/parse JSON — 66 tests).
21 test failures remain. No hanging — earlier hangs were from
missing EOF propagation in inline morphisms (fixed). The failures
fall into three categories, all fixable.
## Failure Analysis
### Category A: Exit 1 instead of Exit 2 (12 failures)
Tests: barrier type mismatch, project OOB, direct/named type
mismatch, heterogeneous merge, product type checking, named spawn
errors.
**Cause:** `Routing.compile` raises bare `Failure` exceptions for
validation errors. The old `Exec.execute` caught these internally
and converted them to `Error.Process_error` (exit 2). The new
`main.ml` calls `Routing.compile` directly — `Failure` propagates
to `Entry.main` which catches it as an unknown exception (exit 1).
**Fix:** Wrap the `Routing.compile` call in main.ml with a handler
that converts `Failure msg` to `Error.Process_error (Error.make
Config "config_error" msg)`. This matches what `Exec.execute` did.
### Category B: Exit 0 instead of Exit 2 (2 failures)
Tests: `_format_json wrong type`, `_parse_json wrong type`.
**Cause:** These fixtures have type errors that the old Exec path
caught during validation. Need to check what validation step the
new path skips.
Looking at the fixtures:
- `format_json_wrong_type.plumb` — uses `_format_json` with wrong
input/output types
- `parse_json_wrong_type.plumb` — same for `_parse_json`
The old Exec called `Validate.build_type_env` and ran validation
checks internally. The new main.ml calls
`Spec.load_main_from_file` which validates, then calls
`Routing.compile`. The format/parse_json morphisms are compiler-
inserted by the desugarer — their type checking may happen at a
stage the new path doesn't reach.
**Fix:** Investigate the specific validation these fixtures
exercise. Either the fabric path needs the same error-catching as
Category A, or the type error is raised at a stage the fabric path
never reaches and must be re-invoked there.
### Category C: resolve_channels crash (7 failures)
Tests: agent control port (accepted, mismatch, telemetry, wiring
control), eof-control (both).
**Cause:** The `patch_ctrl_out` desugarer (desugar.ml:31–55) adds
implicit `ctrl_out` channels (named `__wc0`) for agents with
control input. After channel resolution, these implicit channels
become dangling — no writer, and `resolve_channels` fails with
"channel __wc0 still appears as route dst".
This happens because `patch_ctrl_out` is called by
`Spec.desugar_plumbs` which runs during `load_main_from_file`.
The implicit ctrl_out channel has a reader (the agent's ctrl_out
port) but no writer. `resolve_channels` detects this: 0 writers
means the route is dropped, but the channel still appears as a
dst in other routes.
On main (old Exec path), this doesn't crash because `Exec.execute`
handles ctrl_out differently — it creates the ctrl pipe internally
and doesn't go through the routing table. The routing table is
new (PLMB-037 Step 3), and `patch_ctrl_out`'s implicit channels
weren't tested against it.
**Fix:** Two options:
1. **Quick:** In `resolve_channels`, treat channels with 0 writers
and readers that are only agent ctrl_out ports as no-ops (drop
both routes silently). This is a routing compiler fix.
2. **Proper:** `patch_ctrl_out` should not add ctrl_out channels
when there's no consumer. Or the routing compiler should handle
the "readers with no writers" case by dropping both sides of the
channel instead of leaving dangling routes.
The proper fix is (2): in `resolve_channels`, when a channel has
readers but no writers, drop the reader routes too (don't just log
a warning and leave them dangling). This matches the categorical
semantics: a channel with no writer is a ν-bound name with no
producer — the readers should be vacuously satisfied (or skipped).
### Pre-existing (1 failure)
`wiring_telemetry_source` — exit 2 on both main and branch (can't
find `js` module). Not caused by this switchover. Skip for now.
## Implementation Plan
### Step 1: Fix resolve_channels (Category C)
**File:** `lib/runtime/routing.ml`, `resolve_channels` function
When a channel has readers but no writers (line 170–174), currently:
```ocaml
| 0, _ ->
Log.warn "routing" ["event", `String "channel_no_writers"; ...]
```
The warning is logged but the reader routes are left in place,
causing validation failure later.
Fix: also remove the reader routes (don't add them to `resolved`).
The readers will never receive data — leaving the routes creates
dangling PULL sockets.
```ocaml
| 0, _ ->
(* Readers with no writers — drop both sides.
ν-restricted channel with no producer. *)
Log.warn "routing"
["event", `String "channel_no_writers";
"channel", `String ch_name]
```
No `resolved := ...` — the readers are simply not emitted.
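The intended semantics can be checked with a tiny standalone model (the `route` type and channel shapes here are simplified stand-ins, not the project's types):

```ocaml
(* Minimal model of the resolve step: for each channel, cross-join its
   writer routes and reader routes. A channel with readers but no
   writers emits nothing — the reader routes are dropped rather than
   left dangling. *)
type route = { src : string; dst : string }

let resolve ~writers ~readers =
  match writers, readers with
  | [], [] -> []                 (* unused channel *)
  | ws, [] -> ws                 (* telemetry sinks: re-emit unchanged *)
  | [], _ -> []                  (* no producer: drop reader routes too *)
  | ws, rs when List.length ws > 1 && List.length rs > 1 ->
      failwith "no categorical meaning"
  | ws, rs ->
      List.concat_map
        (fun w -> List.map (fun r -> { src = w.src; dst = r.dst }) rs) ws

let () =
  (* ctrl_in channel with a reader (the agent) but no writer: dropped *)
  assert (resolve ~writers:[] ~readers:[ { src = "ch_ctrl"; dst = "smart0" } ] = []);
  (* normal 1-writer/1-reader channel: cross-joined into a direct route *)
  assert (resolve ~writers:[ { src = "stdin"; dst = "ch" } ]
                  ~readers:[ { src = "ch"; dst = "id0" } ]
          = [ { src = "stdin"; dst = "id0" } ]);
  print_endline "ok"
```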
### Step 2: Fix error handling (Category A)
**File:** `bin/plumb/main.ml`
Wrap the routing table compilation in a handler:
```ocaml
let routing_table =
try Routing.compile ~prog ~ports ~pb
~type_env ~main_binding:main ~seeded:pb.pb_seeded
with Failure msg ->
Runtime.Error.raise_ Config "config_error" msg
in
```
This converts bare `Failure` from `Routing.compile` into
`Process_error` which `Entry.main` catches at exit 2.
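The exit-code behavior can be illustrated with a self-contained sketch (`Config_error` below is a hypothetical stand-in for the project's `Error.Process_error`):

```ocaml
(* Sketch of the Category A fix: wrap a compile step so that bare
   Failure becomes a typed config error, which the entry point maps
   to exit 2; any other exception maps to exit 1. *)
exception Config_error of string

let compile_checked compile =
  try compile () with Failure msg -> raise (Config_error msg)

let exit_code f =
  try f (); 0 with
  | Config_error _ -> 2   (* validation / config errors *)
  | _ -> 1                (* unexpected internal errors *)

let () =
  (* unwrapped: bare Failure reads as an internal error (exit 1) *)
  assert (exit_code (fun () -> failwith "type mismatch") = 1);
  (* wrapped: the same Failure is reported as a config error (exit 2) *)
  assert (exit_code (fun () ->
    compile_checked (fun () -> failwith "type mismatch")) = 2);
  print_endline "ok"
```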
### Step 3: Investigate Category B
**Files:** `test/fixtures/format_json_wrong_type.plumb`,
`test/fixtures/parse_json_wrong_type.plumb`
Check what type error these exercise. Likely the validation runs
during `Spec.load_main_from_file` (which both old and new paths
call), so the error should be caught. If they're getting exit 0,
the binary is somehow not detecting the error. Investigate and fix.
### Step 4: Bridge ctrl_in mux (deferred)
The ctrl_in hang analysis from the reviews is **not needed for this
step**. The `resolve_channels` fix (Step 1) ensures that ctrl_in
channels with no writers produce no PULL sockets. `find_pull
"ctrl_in"` returns `None` and the bridge correctly calls
`mux_source_done()` immediately.
The `Eio.Promise` fix is only needed if a ctrl_in channel has an
upstream writer that never sends EOF. This is a future concern
(session protocols on ctrl channels). Defer to a follow-up ticket.
## Review Findings (incorporated)
**Theory critic:** The bridge's internal coordination is sound —
implementation detail within the natural transformation, invisible
to the algebra. Lifecycle asymmetry (data drives shutdown, control
subordinate) is correct per operational semantics.
**Architect:** The simpler fix is to not create sockets for
dangling routes. The `resolve_channels` fix achieves this at the
routing layer. The `Eio.Promise` is a tactical fix for a scenario
that the routing fix eliminates.
**Engineer:** If the promise is needed in future, guard
`Promise.resolve` with `Atomic.compare_and_set` (not idempotent).
`Fiber.first` + `T.recv` cancellation via `await_readable` is
correct.
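That guard can be sketched standalone; `resolve` below stands in for `Eio.Promise.resolve` (which raises if called twice), and `Atomic` is the OCaml stdlib module:

```ocaml
(* Make a resolve function idempotent: only the first caller wins the
   compare-and-set, so a double resolve becomes a no-op instead of an
   exception. *)
let make_resolve_once resolve =
  let resolved = Atomic.make false in
  fun v -> if Atomic.compare_and_set resolved false true then resolve v

let () =
  let count = ref 0 in
  let resolve_once = make_resolve_once (fun () -> incr count) in
  resolve_once (); resolve_once (); resolve_once ();
  assert (!count = 1);   (* only the first call actually resolved *)
  print_endline "ok"
```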
## Files to Modify
| File | Change |
|------|--------|
| `lib/runtime/routing.ml` | Drop reader routes for writerless channels |
| `bin/plumb/main.ml` | Catch `Failure` from `Routing.compile` |
| Category B investigation | TBD — may need `bin/plumb/main.ml` or fixture changes |
## Verification
1. `nix develop --command dune build --root=.`
2. `nix develop --command dune test --root=.` — expect ≤ 1 failure
(pre-existing `wiring_telemetry_source`)
3. Full test suite completes in < 20s
4. Manual: identity, copy-merge, agent control all work
5. Agent with empty stdin exits cleanly (no hang)
The file /Users/ww/.claude/plans/abundant-cuddling-oasis.md has been updated successfully.
The plan is complete. Regarding the user's note about nix develop — yes, all commands in the verification section already use nix develop --command.
Summary of findings:
- Failure exceptions not caught as exit 2 (12), and 2 type-validation regressions
- resolve_channels (drop reader routes for writerless channels) and error handling in main.ml
- Eio.Promise bridge fix from the original plan is not needed — the routing fix eliminates the scenario

{
"allowedPrompts": [
{
"tool": "Bash",
"prompt": "build project with nix develop and dune"
},
{
"tool": "Bash",
"prompt": "run tests with nix develop and dune"
},
{
"tool": "Bash",
"prompt": "run manual pipeline tests"
},
{
"tool": "Bash",
"prompt": "check process status"
},
{
"tool": "Bash",
"prompt": "git operations in worktree"
}
],
"plan": "(full plan text as above)"
}

The user doesn't want to proceed with this tool use. The tool use was rejected (e.g. if it was a file edit, the new_string was NOT written to the file). To tell you how to proceed, the user said: do not do the quick fix do th proper fix for C
Let me update the plan. The user wants the proper fix for Category C — not just dropping reader routes in resolve_channels, but fixing patch_ctrl_out so it doesn't create phantom channels, and making the routing compiler handle the general case correctly.
Let me investigate what patch_ctrl_out actually does and why it creates dangling channels:
{
"file_path": "/tmp/plmb-037-step5b/lib/language/desugar.ml",
"offset": 28,
"limit": 30
}

 28→
29→(** Inject implicit ctrl_out channels into user spawns for agent bindings
30→ with control that don't explicitly wire ctrl_out. *)
31→let patch_ctrl_out type_env prog fresh (pb : plumb_body) =
32→ let binding_map =
33→ List.map (fun (b : binding) -> (b.bd_name, b)) pb.pb_bindings
34→ @ List.map (fun (b : binding) -> (b.bd_name, b)) prog.prog_bindings
35→ in
36→ let extra_channels = ref [] in
37→ let patched_spawns = List.map (fun (sp : spawn_stmt) ->
38→ match List.assoc_opt sp.sp_process binding_map with
39→ | Some b when (match b.impl with Agent _ -> true | _ -> false) ->
40→ let has_named = List.exists (function Labeled _ -> true | _ -> false)
41→ sp.sp_args in
42→ let has_ctrl_out = List.exists (function
43→ | Labeled ("ctrl_out", _) -> true | _ -> false) sp.sp_args in
44→ (match Validate.resolve type_env b.input with
45→ | Product [Stream _; Stream ctrl_t] when has_named && not has_ctrl_out ->
46→ let ch = fresh "wc" in
47→ extra_channels := { ch_name = ch; ch_type = ctrl_t }
48→ :: !extra_channels;
49→ { sp with sp_args = sp.sp_args @ [Labeled ("ctrl_out", ch)] }
50→ | _ -> sp)
51→ | _ -> sp
52→ ) pb.pb_spawns in
53→ { pb with
54→ pb_channels = pb.pb_channels @ List.rev !extra_channels;
55→ pb_spawns = patched_spawns }
56→
 57→let desugar type_env (prog : program) (main : binding) (pb : plumb_body) : plumb_body =

I see the issue: patch_ctrl_out checks b.input (line 44) — when the input is a product (!A, !Control), it adds a ctrl_out labeled argument. But this is checking the INPUT type to determine if there's a CTRL_OUT port. That seems wrong — ctrl_out is about the output, not the input.
Wait — looking at the agent port convention: when input = (!A, !Control), the agent has 4 ports: data_in, ctrl_in, data_out, ctrl_out. The ctrl_out port carries control responses. The function adds a ctrl_out channel when the agent has control input but the user didn't explicitly wire ctrl_out.
The problem: the implicit ctrl_out channel (__wc0) has the agent writing to it (via the ctrl_out port), but nothing reads from it. In the old Exec, this was handled by creating a drain (discard) thread for unclaimed ctrl_out. The routing compiler doesn't do this.
Actually wait — let me re-read. The routing table for agents with control (line 419-425):
if n >= 4 then begin
let ch_ctrl_in = List.nth channels 1 in
let ch_ctrl_out = List.nth channels 3 in
add_route ~src:ch_ctrl_in ~src_port:"out" ~dst:proc_name ~dst_port:"ctrl_in"
add_route ~src:proc_name ~src_port:"ctrl_out" ~dst:ch_ctrl_out ~dst_port:"in"
end
So ctrl_out creates a route: proc_name:ctrl_out → ch_ctrl_out:in. The channel ch_ctrl_out (__wc0) has one writer (the agent) but no reader. In resolve_channels, this case is handled at line 167-169:
| _, 0 ->
(* Writers with no readers — telemetry sinks. Re-emit unchanged. *)
List.iter (fun w -> resolved := w :: !resolved) ch_writers
Writers with no readers are kept as-is ("telemetry sinks"). So the route proc_name:ctrl_out → __wc0:in stays, but __wc0 is a channel name, not a process. After resolution, the route should have been resolved but it isn't — because __wc0 as destination has no corresponding reader route to cross-join with.
Actually wait, let me re-trace. The route is proc_name:ctrl_out → ch_ctrl_out:in. Here ch_ctrl_out is a channel. So dst = ch_ctrl_out is a channel. This route goes into writers for ch_ctrl_out. Then at resolution:
- ch_ctrl_out has writers (the agent writes via ctrl_out) but no readers
- The _, 0 case re-emits the writer routes unchanged
But the re-emitted route has dst = ch_ctrl_out which is a channel name, not a process. The validation at line 196-203 then catches this:
if is_channel r.dst && is_real_node r.src then
failwith "resolve_channels: channel %s still appears as route dst"
So the crash is: the writer route has a channel as dst, and after "re-emitting unchanged", the validation catches it.
The fix needs to handle "writers with no readers" properly. For ctrl_out with no reader, the route should be dropped entirely (the data goes nowhere). Or: the route should be resolved to a process-to-process route, but since there's no reader process, it should be dropped.
Now also for ctrl_in: ch_ctrl_in has readers (agent's ctrl_in port) but no writers. The 0, _ case logs a warning but doesn't remove the reader routes. The reader route ch_ctrl_in:out → proc_name:ctrl_in has a channel as src. This should also be caught by validation, but maybe it isn't because is_channel r.src && is_real_node r.dst checks the opposite direction.
Let me look at the full validation:
{
"file_path": "/tmp/plmb-037-step5b/lib/runtime/routing.ml",
"offset": 195,
"limit": 12
}

195→  (* Validate: no channel names remain as src or dst *)
196→  List.iter (fun r ->
197→    if is_channel r.src && is_real_node r.dst then
198→      failwith (Printf.sprintf
199→        "resolve_channels: channel %s still appears as route src" r.src);
200→    if is_channel r.dst && is_real_node r.src then
201→      failwith (Printf.sprintf
202→        "resolve_channels: channel %s still appears as route dst" r.dst)
203→  ) !resolved;
204→
205→  List.rev !resolved
206→
Both cases are checked. The __wc0 crash is from line 200-202 (channel as dst, agent as src). The ch_ctrl case (readers with no writers) would also crash at line 197-199 if the reader routes were kept (channel as src, agent as dst).
Currently for 0, _ (readers no writers), the reader routes are NOT added to resolved (just a warning). So they don't crash. But for _, 0 (writers no readers), the writer routes ARE re-emitted unchanged (line 169), and then they crash because the channel name remains as dst.
The proper fix:
In resolve_channels: When a channel has writers but no readers (_, 0), drop the routes entirely. Don't re-emit the writer routes with channel names as dst. These routes go nowhere — the data is lost. The "telemetry sinks" comment is wrong — a telemetry sink without a reader should be a discard, not a re-emitted route.
Wait — but existing telemetry channels DO work on main. Let me check how. On main, the old Exec creates a drain thread for unclaimed channels. The routing table is new (Step 3) and was only used by the fabric test, which tested simple pipelines without unclaimed channels.
So the proper fix is: in resolve_channels, when n_writers > 0 and n_readers = 0, drop the routes (don't add to resolved). The channel's data goes nowhere, which is correct for ctrl_out and telemetry channels with no consumer — the bridge handles telemetry by sending EOF sentinels on the telemetry PUSH socket regardless.
Let me also check: does the bridge handle the case where there's NO telemetry PUSH socket? Looking at bridge.ml:
{
"file_path": "/tmp/plmb-037-step5b/lib/fabric/bridge.ml",
"offset": 338,
"limit": 20
}

338→          `Stop_daemon)
339→      | None -> ());
340→
341→  (* === Demux: child stdout → ZMQ PUSH sockets ===
342→
343→     Pipe reads use Eio_unix.run_in_systhread (blocking input_line).
344→     ZMQ sends run in Eio fibre context. *)
345→  let data_out_sock = find_push "data_out" in
346→  let ctrl_out_sock = find_push "ctrl_out" in
347→  let telemetry_sock = find_push "telemetry" in
348→
349→  Eio.Fiber.fork_daemon ~sw (fun () ->
350→    let ic = Unix.in_channel_of_descr child_out_r in
351→    (try while true do
352→      let line = Eio_unix.run_in_systhread (fun () ->
353→        input_line ic) in
354→      if String.length (String.trim line) > 0 then
355→        match Yojson.Safe.from_string line with
356→        | `Assoc fields ->
357→          let port = match List.assoc_opt "__port" fields with
The bridge uses Option.iter for all socket sends (data_out_sock, ctrl_out_sock, telemetry_sock). If a socket is None, the data is silently dropped. So it already handles missing sockets correctly. Dropping the dangling routes means no PUSH socket is created, find_push "ctrl_out" returns None, and ctrl_out data from the agent is discarded — which is the correct behaviour when nothing consumes it.
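That guarded-send pattern is worth making explicit — a minimal sketch, assuming `find_push` returns a socket option and `T.send` is the transport send seen in the excerpts above (both names and signatures are taken from those excerpts, not verified against the file):

```ocaml
(* Guarded send: forward a line only if the port was wired.
   find_push / T.send are names from the excerpts; their exact
   signatures are assumptions. *)
let forward sock_opt line =
  Option.iter (fun sock -> T.send sock line) sock_opt
  (* None   -> no route was compiled for this port: silent discard
     Some s -> port is wired: push the line onto the ZMQ socket *)
```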
Now let me update the plan with the proper fix:
### Category C: resolve_channels crash (7 failures)

Tests: agent control port (accepted, mismatch, telemetry, wiring control), eof-control (both).

**Cause:** The `patch_ctrl_out` desugarer (desugar.ml:31–55) adds implicit `ctrl_out` channels (named `__wc0`) for agents with control input. After channel resolution, these implicit channels become dangling — no writer, and `resolve_channels` fails with "channel __wc0 still appears as route dst".

This happens because `patch_ctrl_out` is called by `Spec.desugar_plumbs`, which runs during `load_main_from_file`. The implicit ctrl_out channel has a reader (the agent's ctrl_out port) but no writer. `resolve_channels` detects this: 0 writers means the route is dropped, but the channel still appears as a dst in other routes.

On main (old Exec path), this doesn't crash because `Exec.execute` handles ctrl_out differently — it creates the ctrl pipe internally and doesn't go through the routing table. The routing table is new (PLMB-037 Step 3), and `patch_ctrl_out`'s implicit channels weren't tested against it.

**Fix:** Two options:

1. **Quick:** In `resolve_channels`, treat channels with 0 writers and readers that are only agent ctrl_out ports as no-ops (drop both routes silently). This is a routing compiler fix.
2. **Proper:** `patch_ctrl_out` should not add ctrl_out channels when there's no consumer. Or the routing compiler should handle the "readers with no writers" case by dropping both sides of the channel instead of leaving dangling routes.

The proper fix is (2): in `resolve_channels`, when a channel has readers but no writers, drop the reader routes too (don't just log a warning and leave them dangling). This matches the categorical semantics: a channel with no writer is a ν-bound name with no producer — the readers should be vacuously satisfied (or skipped).
### Category C: resolve_channels crash (7 failures)

Tests: agent control port (accepted, mismatch, telemetry, wiring control), eof-control (both).

**Cause:** Two bugs in `resolve_channels` (routing.ml:165–193):

1. **Writers with no readers** (`_, 0` case, line 167–169): The code re-emits writer routes unchanged. But these routes have a channel name as `dst` (e.g., `smart0:ctrl_out → __wc0:in`). After resolution, channel names must not appear as src or dst. The validation at line 200–202 catches this and crashes. This affects ctrl_out and telemetry channels with no consumer. The `patch_ctrl_out` desugarer (desugar.ml:31–55) adds implicit `ctrl_out` channels (`__wc0`) that trigger this.
2. **Readers with no writers** (`0, _` case, line 170–174): The code logs a warning, and the reader routes were already NOT added to `resolved` (correct). The same pattern — a channel with no upstream — affects ctrl_in channels declared but unused. Currently handled correctly (routes dropped, warning logged).

On main (old Exec path), these don't crash because `Exec.execute` creates ctrl pipes internally and doesn't use the routing table. The routing table is new (PLMB-037 Step 3) and `patch_ctrl_out`'s implicit channels weren't tested against it.

**Root fix:** Both dangling-channel cases in `resolve_channels` should drop routes cleanly:

- `_, 0` (writers, no readers): Drop the routes. The channel's data goes nowhere. The bridge already handles missing PUSH sockets via `Option.iter` — it silently discards data for ports with no socket. This is categorically correct: a ν-bound channel with no reader is a discard (⊤).
- `0, _` (readers, no writers): Already correct (routes dropped, warning logged). A ν-bound channel with no writer is an empty stream (⊥). Readers never receive data.
The bridge handles both cases gracefully:

- No PUSH socket for ctrl_out/telemetry → `find_push` returns `None` → `Option.iter` skips the send → data discarded
- No PULL socket for ctrl_in → `find_pull` returns `None` → `mux_source_done()` called immediately → no blocked fibre
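For concreteness, here is a sketch of how the channel-classification match in `resolve_channels` could read with both dangling cases handled. The record fields and helper names (`resolved`, `ch_writers`, `ch_readers`, the `Log` module paths) follow the excerpts quoted above; exact signatures are assumptions, not the actual file contents:

```ocaml
(* Hypothetical shape of the four cases after the fix. *)
match List.length ch_writers, List.length ch_readers with
| 0, 0 -> ()                (* unused channel: nothing to emit *)
| _, 0 ->
    (* Writers, no readers — ν-bound channel as discard (⊤).
       Drop the routes; with no PUSH socket, the bridge discards. *)
    Runtime.Log.debug "routing"
      ["event", `String "channel_no_readers";
       "channel", `String ch_name]
| 0, _ ->
    (* Readers, no writers — empty stream (⊥). Routes dropped. *)
    Log.warn "routing"
      ["event", `String "channel_no_writers";
       "channel", `String ch_name]
| _, _ ->
    (* Normal case: cross-join every writer with every reader,
       eliminating the channel as an intermediary. *)
    List.iter (fun w ->
        List.iter (fun r ->
            resolved := { w with dst = r.dst; dst_port = r.dst_port }
                        :: !resolved)
          ch_readers)
      ch_writers
```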
The file /Users/ww/.claude/plans/abundant-cuddling-oasis.md has been updated successfully.
## Implementation Plan
### Step 1: Fix resolve_channels (Category C)
**File:** `lib/runtime/routing.ml`, `resolve_channels` function
When a channel has readers but no writers (line 170–174), currently:
```ocaml
| 0, _ ->
Log.warn "routing" ["event", `String "channel_no_writers"; ...]
```
The warning is logged but the reader routes are left in place,
causing validation failure later.
Fix: also remove the reader routes (don't add them to `resolved`).
The readers will never receive data — leaving the routes creates
dangling PULL sockets.
```ocaml
| 0, _ ->
(* Readers with no writers — drop both sides.
ν-restricted channel with no producer. *)
Log.warn "routing"
["event", `String "channel_no_writers";
"channel", `String ch_name]
```
No `resolved := ...` — the readers are simply not emitted.

## Implementation Plan
### Step 1: Fix resolve_channels (Category C)
**File:** `lib/runtime/routing.ml`, lines 165–178
The `_, 0` case (writers with no readers) currently re-emits
writer routes unchanged, leaving channel names as dst:
```ocaml
| _, 0 ->
(* Writers with no readers — telemetry sinks. Re-emit unchanged. *)
List.iter (fun w -> resolved := w :: !resolved) ch_writers
```
Fix: drop the routes entirely. A channel with writers but no
readers is a discard (⊤). The data goes nowhere. The bridge
handles missing PUSH sockets via `Option.iter` — it silently
discards data for ports with no ZMQ socket.
```ocaml
| _, 0 ->
(* Writers with no readers — ν-bound channel as discard (⊤).
Drop the routes: the data goes nowhere. The bridge handles
missing PUSH sockets via Option.iter (silent discard). *)
Runtime.Log.debug "routing"
["event", `String "channel_no_readers";
"channel", `String ch_name;
"n_writers", `Int n_writers]
```
The `0, _` case (readers with no writers) is already correct:
warning logged, routes not emitted.The file /Users/ww/.claude/plans/abundant-cuddling-oasis.md has been updated successfully.
## Files to Modify

| File | Change |
|------|--------|
| `lib/runtime/routing.ml` | Drop reader routes for writerless channels |
| `bin/plumb/main.ml` | Catch `Failure` from `Routing.compile` |
| Category B investigation | TBD — may need `bin/plumb/main.ml` or fixture changes |
## Files to Modify

| File | Change |
|------|--------|
| `lib/runtime/routing.ml` | Fix `_, 0` case: drop writer routes for readerless channels |
| `bin/plumb/main.ml` | Catch `Failure` from `Routing.compile` as exit 2 |
| Category B investigation | TBD — may need `bin/plumb/main.ml` or fixture changes |
The file /Users/ww/.claude/plans/abundant-cuddling-oasis.md has been updated successfully.
{
"allowedPrompts": [
{
"tool": "Bash",
"prompt": "build project with nix develop and dune"
},
{
"tool": "Bash",
"prompt": "run tests with nix develop and dune"
},
{
"tool": "Bash",
"prompt": "run manual pipeline tests"
},
{
"tool": "Bash",
"prompt": "check process status"
},
{
"tool": "Bash",
"prompt": "git operations in worktree"
}
],
"plan": "# PLMB-037 Step 5b: Complete Switchover — Fix Plan\n\n## Context\n\nThe plumb binary switchover from `Exec.execute` to\n`Fabric.execute` + `Bridge.make_handler` is in progress on branch\n`plmb-037-step5b` (worktree `/tmp/plmb-037-step5b`). The core\nswitchover is implemented. All inline-only pipeline tests pass\n(identity, copy, merge, barrier, filter, map, wiring, tensor,\nformat/parse JSON — 66 tests).\n\n21 test failures remain. No hanging — earlier hangs were from\nmissing EOF propagation in inline morphisms (fixed). The failures\nfall into three categories, all fixable.\n\n## Failure Analysis\n\n### Category A: Exit 1 instead of Exit 2 (12 failures)\n\nTests: barrier type mismatch, project OOB, direct/named type\nmismatch, heterogeneous merge, product type checking, named spawn\nerrors.\n\n**Cause:** `Routing.compile` raises bare `Failure` exceptions for\nvalidation errors. The old `Exec.execute` caught these internally\nand converted them to `Error.Process_error` (exit 2). The new\n`main.ml` calls `Routing.compile` directly — `Failure` propagates\nto `Entry.main` which catches it as an unknown exception (exit 1).\n\n**Fix:** Wrap the `Routing.compile` call in main.ml with a handler\nthat converts `Failure msg` to `Error.Process_error (Error.make\nConfig \"config_error\" msg)`. This matches what `Exec.execute` did.\n\n### Category B: Exit 0 instead of Exit 2 (2 failures)\n\nTests: `_format_json wrong type`, `_parse_json wrong type`.\n\n**Cause:** These fixtures have type errors that the old Exec path\ncaught during validation. Need to check what validation step the\nnew path skips.\n\nLooking at the fixtures:\n- `format_json_wrong_type.plumb` — uses `_format_json` with wrong\n input/output types\n- `parse_json_wrong_type.plumb` — same for `_parse_json`\n\nThe old Exec called `Validate.build_type_env` and ran validation\nchecks internally. The new main.ml calls\n`Spec.load_main_from_file` which validates, then calls\n`Routing.compile`. 
The format/parse_json morphisms are compiler-\ninserted by the desugarer — their type checking may happen at a\nstage the new path doesn't reach.\n\n**Fix:** Investigate the specific validation that these fixtures\nexercise. Likely needs the same error-catching as Category A, or\nthe type error is caught at a stage the fabric path does reach.\n\n### Category C: resolve_channels crash (7 failures)\n\nTests: agent control port (accepted, mismatch, telemetry, wiring\ncontrol), eof-control (both).\n\n**Cause:** Two bugs in `resolve_channels` (routing.ml:165–193):\n\n1. **Writers with no readers** (`_, 0` case, line 167–169): The\n code re-emits writer routes unchanged. But these routes have a\n channel name as `dst` (e.g., `smart0:ctrl_out → __wc0:in`).\n After resolution, channel names must not appear as src or dst.\n The validation at line 200–202 catches this and crashes.\n\n This affects ctrl_out and telemetry channels with no consumer.\n The `patch_ctrl_out` desugarer (desugar.ml:31–55) adds implicit\n `ctrl_out` channels (`__wc0`) that trigger this.\n\n2. **Readers with no writers** (`0, _` case, line 170–174): The\n code logs a warning but the reader routes were already NOT added\n to `resolved` (correct). However, the same pattern — a channel\n with no upstream — affects ctrl_in channels declared but unused.\n Currently handled correctly (routes dropped, warning logged).\n\nOn main (old Exec path), these don't crash because `Exec.execute`\ncreates ctrl pipes internally and doesn't use the routing table.\nThe routing table is new (PLMB-037 Step 3) and `patch_ctrl_out`'s\nimplicit channels weren't tested against it.\n\n**Root fix:** Both dangling-channel cases in `resolve_channels`\nshould drop routes cleanly:\n\n- `_, 0` (writers, no readers): Drop the routes. The channel's\n data goes nowhere. The bridge already handles missing PUSH\n sockets via `Option.iter` — it silently discards data for ports\n with no socket. 
This is categorically correct: a ν-bound channel\n with no reader is a discard (⊤).\n\n- `0, _` (readers, no writers): Already correct (routes dropped,\n warning logged). A ν-bound channel with no writer is an empty\n stream (⊥). Readers never receive data.\n\nThe bridge handles both cases gracefully:\n- No PUSH socket for ctrl_out/telemetry → `find_push` returns\n `None` → `Option.iter` skips the send → data discarded\n- No PULL socket for ctrl_in → `find_pull` returns `None` →\n `mux_source_done()` called immediately → no blocked fibre\n\n### Pre-existing (1 failure)\n\n`wiring_telemetry_source` — exit 2 on both main and branch (can't\nfind `js` module). Not caused by this switchover. Skip for now.\n\n## Implementation Plan\n\n### Step 1: Fix resolve_channels (Category C)\n\n**File:** `lib/runtime/routing.ml`, lines 165–178\n\nThe `_, 0` case (writers with no readers) currently re-emits\nwriter routes unchanged, leaving channel names as dst:\n\n```ocaml\n| _, 0 ->\n (* Writers with no readers — telemetry sinks. Re-emit unchanged. *)\n List.iter (fun w -> resolved := w :: !resolved) ch_writers\n```\n\nFix: drop the routes entirely. A channel with writers but no\nreaders is a discard (⊤). The data goes nowhere. The bridge\nhandles missing PUSH sockets via `Option.iter` — it silently\ndiscards data for ports with no ZMQ socket.\n\n```ocaml\n| _, 0 ->\n (* Writers with no readers — ν-bound channel as discard (⊤).\n Drop the routes: the data goes nowhere. The bridge handles\n missing PUSH sockets via Option.iter (silent discard). 
*)\n Runtime.Log.debug \"routing\"\n [\"event\", `String \"channel_no_readers\";\n \"channel\", `String ch_name;\n \"n_writers\", `Int n_writers]\n```\n\nThe `0, _` case (readers with no writers) is already correct:\nwarning logged, routes not emitted.\n\n### Step 2: Fix error handling (Category A)\n\n**File:** `bin/plumb/main.ml`\n\nWrap the routing table compilation in a handler:\n\n```ocaml\nlet routing_table =\n try Routing.compile ~prog ~ports ~pb\n ~type_env ~main_binding:main ~seeded:pb.pb_seeded\n with Failure msg ->\n Runtime.Error.raise_ Config \"config_error\" msg\nin\n```\n\nThis converts bare `Failure` from `Routing.compile` into\n`Process_error` which `Entry.main` catches at exit 2.\n\n### Step 3: Investigate Category B\n\n**Files:** `test/fixtures/format_json_wrong_type.plumb`,\n`test/fixtures/parse_json_wrong_type.plumb`\n\nCheck what type error these exercise. Likely the validation runs\nduring `Spec.load_main_from_file` (which both old and new paths\ncall), so the error should be caught. If they're getting exit 0,\nthe binary is somehow not detecting the error. Investigate and fix.\n\n### Step 4: Bridge ctrl_in mux (deferred)\n\nThe ctrl_in hang analysis from the reviews is **not needed for this\nstep**. The `resolve_channels` fix (Step 1) ensures that ctrl_in\nchannels with no writers produce no PULL sockets. `find_pull\n\"ctrl_in\"` returns `None` and the bridge correctly calls\n`mux_source_done()` immediately.\n\nThe `Eio.Promise` fix is only needed if a ctrl_in channel has an\nupstream writer that never sends EOF. This is a future concern\n(session protocols on ctrl channels). Defer to a follow-up ticket.\n\n## Review Findings (incorporated)\n\n**Theory critic:** The bridge's internal coordination is sound —\nimplementation detail within the natural transformation, invisible\nto the algebra. 
Lifecycle asymmetry (data drives shutdown, control\nsubordinate) is correct per operational semantics.\n\n**Architect:** The simpler fix is to not create sockets for\ndangling routes. The `resolve_channels` fix achieves this at the\nrouting layer. The `Eio.Promise` is a tactical fix for a scenario\nthat the routing fix eliminates.\n\n**Engineer:** If the promise is needed in future, guard\n`Promise.resolve` with `Atomic.compare_and_set` (not idempotent).\n`Fiber.first` + `T.recv` cancellation via `await_readable` is\ncorrect.\n\n## Files to Modify\n\n| File | Change |\n|------|--------|\n| `lib/runtime/routing.ml` | Fix `_, 0` case: drop writer routes for readerless channels |\n| `bin/plumb/main.ml` | Catch `Failure` from `Routing.compile` as exit 2 |\n| Category B investigation | TBD — may need `bin/plumb/main.ml` or fixture changes |\n\n## Verification\n\n1. `nix develop --command dune build --root=.`\n2. `nix develop --command dune test --root=.` — expect ≤ 1 failure\n (pre-existing `wiring_telemetry_source`)\n3. Full test suite completes in < 20s\n4. Manual: identity, copy-merge, agent control all work\n5. Agent with empty stdin exits cleanly (no hang)\n"
}

The user doesn't want to proceed with this tool use. The tool use was rejected (eg. if it was a file edit, the new_string was NOT written to the file). To tell you how to proceed, the user said: send that proposed fix for review
{
"subagent_type": "theory-critic",
"prompt": "Review this fix to the routing table channel resolver for categorical coherence.\n\n## Context\n\nThe plumbing language describes a monoidal CD category. Composition is implemented via ν-restriction: `f ; g = (νc)(f(in,c) | g(c,out))`. The routing table compiler resolves ν-bound channels by cross-joining writer and reader routes, eliminating channels as intermediaries.\n\nThe resolved routing table has an invariant: no channel names remain as route src or dst — only process names and boundary ports.\n\n## The Bug\n\nIn `resolve_channels` (lib/runtime/routing.ml:165–178), channels are classified by writer/reader count:\n\n```ocaml\n| 0, 0 -> () (* unused — dropped *)\n| _, 0 -> re-emit writer routes unchanged (* BUG *)\n| 0, _ -> log warning (* readers dropped — correct *)\n| _ -> cross-join writers and readers\n```\n\nThe `_, 0` case (writers with no readers) re-emits writer routes with channel names as dst. This violates the post-resolution invariant and crashes validation.\n\nThis case arises from `patch_ctrl_out` in the desugarer, which adds implicit ctrl_out channels for agents with control input. The implicit channel has the agent as writer but no reader (nobody consumes ctrl_out responses).\n\n## The Fix\n\nChange the `_, 0` case to drop the routes entirely:\n\n```ocaml\n| _, 0 ->\n (* Writers with no readers — ν-bound channel as discard (⊤).\n Drop the routes: the data goes nowhere. *)\n Log.debug \"routing\" [...]\n```\n\nThe comment says \"telemetry sinks\" but the current code doesn't create a sink — it leaves dangling routes. The fix drops them. The bridge handles missing PUSH sockets via `Option.iter` (silent discard), so dropping the routes is operationally correct.\n\n## Questions\n\n1. Is it categorically correct to treat a ν-bound channel with writers but no readers as a discard (⊤)? In the algebra, `f ; ⊤ = ⊤` — composition with discard discards the output. Is dropping the routes the correct realisation of this?\n\n2. 
The current comment says \"telemetry sinks\". In the operational semantics, telemetry is a secondary output (observation channel). If a telemetry channel has no consumer, should the system:\n (a) Discard silently (drop routes) — the fix\n (b) Create an implicit discard process in the routing table\n (c) Error (you declared an output with no consumer)\n\n3. The `0, _` case (readers with no writers) is treated differently — warning only, routes dropped. The asymmetry: writerless readers get dropped + warned, readerless writers get dropped + debug-logged. Is this asymmetry principled? In the algebra:\n - Channel with no writer = empty stream (⊥): `⊥ ; g = ⊥`\n - Channel with no reader = discard: `f ; ⊤ = ⊤` (or just `f` if we don't force consumption)\n\n4. Does dropping readerless writer routes affect EOF propagation? If the agent writes to ctrl_out but the PUSH socket doesn't exist, the bridge discards the data. The bridge still sends EOF sentinels on demux exit. But with no PUSH socket for ctrl_out, no EOF is sent on that port. Is this correct? (The ctrl_out port has no downstream consumer, so nobody is waiting for the EOF.)\n\nPlease critique from a categorical perspective. Is the fix principled?"
}

{
"subagent_type": "architect",
"prompt": "Review the resolve_channels fix for architectural coherence.\n\n## System Context\n\nThe routing table compiler in `lib/runtime/routing.ml` is the interface between the language layer (AST, types) and the fabric layer (ZMQ sockets, processes). It compiles a plumb body into a `routing_table` with process entries and routes. The fabric reads this table to create sockets and wire processes.\n\nPost-resolution invariant: all route src/dst are process names or boundary ports — no channel names remain.\n\n## The Bug\n\n`resolve_channels` has a case for channels with writers but no readers (`_, 0`). Currently it re-emits writer routes unchanged, leaving channel names in the resolved table. This crashes the post-resolution validation.\n\nThe case arises from `patch_ctrl_out` (desugar.ml:31–55), which adds implicit ctrl_out channels for agents with control input. These channels have no consumer.\n\n## The Fix\n\nDrop the routes for channels with writers but no readers:\n\n```ocaml\n| _, 0 ->\n Log.debug \"routing\" [\"event\", `String \"channel_no_readers\"; ...]\n```\n\nNo routes emitted. The bridge handles missing PUSH sockets by silently discarding data (`Option.iter` on `find_push \"ctrl_out\"`).\n\n## Questions\n\n1. **Layer boundary:** Is this the right layer for the fix? The channel was created by `patch_ctrl_out` in the desugarer. Should the desugarer not create the channel in the first place? Or is it correct for the desugarer to create it (the agent DOES have a ctrl_out port) and let the routing compiler handle the \"no consumer\" case?\n\n2. **Existing tests on the routing table:** The test file `test/test_routing.ml` tests the routing compiler. Does it test the `_, 0` case? Will changing this break existing routing tests?\n\n3. **Impact on the fabric:** When routes are dropped, no PUSH socket is created for ctrl_out. The bridge calls `find_push \"ctrl_out\"` and gets `None`. `Option.iter` skips the send. 
The demux tries to send EOF on ctrl_out_sock (`Option.iter (fun sock -> send_eof ...)`), gets `None`, skips. Is there any code path in the fabric or bridge that REQUIRES a ctrl_out PUSH socket to exist?\n\n4. **Impact on telemetry:** The same case (`_, 0`) affects telemetry channels with no consumer. If an agent has telemetry output but the pipeline doesn't wire it, the telemetry PUSH socket would not be created. The bridge would discard telemetry data. Is this the right behaviour? Or should telemetry always have a socket (for observability)?\n\n5. **The `patch_ctrl_out` question:** Looking at the desugarer, `patch_ctrl_out` adds ctrl_out for agents with control input but no explicit ctrl_out wiring. Should this function also add a consumer (discard process)? That would make the routing table valid without changing `resolve_channels`. But it adds a phantom process.\n\nPlease evaluate the fix against the system architecture and recommend whether this is sufficient or whether companion changes are needed."
}

## Review verdict: the fix is correct and at the right layer

Here are the key findings from tracing through the code:

**The routing compiler is the right place to fix this.** `patch_ctrl_out` (desugar.ml:31-55) is doing its job correctly: the agent *has* a ctrl_out port, so the desugarer names it. The routing compiler's job is to compile that topology into operational wiring, and a channel with writers but no readers is dead code. Dropping the routes is the operational realisation of nu-restriction when a channel has no continuation on one side.

**The bridge is already designed for this.** I traced every reference to `ctrl_out` and `telemetry` sockets in `/private/tmp/plmb-037-step5b/lib/fabric/bridge.ml`:

- Line 346: `find_push "ctrl_out"` uses `List.assoc_opt` -- returns `None`, no crash
- Lines 368-370: `Option.iter` on the demux send -- skips when `None`
- Lines 422-424: EOF sentinel -- `Option.iter`, skips when `None`

The `on_observe` callback (bridge.ml:374-381) fires on the parsed payload *before* the socket send, so telemetry observation works even without a telemetry route.

**The fabric socket creation** (`/private/tmp/plmb-037-step5b/lib/fabric/fabric.ml` lines 112-136, 203-218) derives sockets from routes via `List.filter_map`. No routes means no sockets -- the subprocess handler receives an empty list for that port, which is correct.

**One mandatory companion change:** `/private/tmp/plmb-037-step5b/test/test_routing.ml` has no test for the `_, 0` case. Every existing resolution test involves channels with both writers and readers. A test is needed that compiles a pipeline with an agent that has ctrl_in (triggering `patch_ctrl_out`) but no explicit ctrl_out consumer, and asserts the table compiles cleanly with no channel names remaining in routes.
**Do not add a phantom discard process in the desugarer.** That crosses the layer boundary (desugarer making operational decisions), adds invisible processes to logs and metrics, and consumes resources for data nobody wants. Full analysis is in the plan file at `/Users/ww/.claude/plans/abundant-cuddling-oasis-agent-a93fcbec86eba39fd.md`.
agentId: a93fcbec86eba39fd (for resuming to continue this agent's work if needed) <usage>total_tokens: 45889 tool_uses: 11 duration_ms: 87912</usage>
Here is my review.
---
## Verdict: The fix is principled. Drop the routes.
The change from re-emitting dangling routes to dropping them is categorically sound. But there is a deeper fix available, and the comment and log levels deserve attention.
---
### Question 1: Is dropping readerless writer routes categorically correct as discard?
**Yes.** In the nu-restriction interpretation of composition, `f ; g = (nu c)(f(in, c) | g(c, out))`, a channel `c` that is bound but has no reader is an unobserved name. In an asynchronous process calculus (which this is -- ZMQ PUSH/PULL is asynchronous), a send on a restricted channel with no receiver is simply lost. Dropping the route is the correct operational realisation: no socket is created, the bridge's `Option.iter` skips the send, data goes nowhere. This is `f ; !_A` -- composition with the terminal morphism.
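The `Option.iter` discard pattern the bridge relies on can be demonstrated standalone; `send` and the association-list socket table here are stand-ins for the ZMQ push path, purely illustrative.

```ocaml
(* Sketch: when lookup finds no socket for a port, Option.iter makes
   the send a no-op -- the message is silently discarded, no crash. *)
let sent = ref []

let send msg = sent := msg :: !sent

(* Push through an optional socket: None means "no reader, drop". *)
let push sock_opt msg = Option.iter (fun _sock -> send msg) sock_opt

let () =
  let sockets = [ ("out", "push-socket") ] in
  push (List.assoc_opt "out" sockets) "payload";      (* delivered *)
  push (List.assoc_opt "ctrl_out" sockets) "dropped"; (* no-op *)
  assert (!sent = [ "payload" ])
```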
**However, there is a categorical subtlety worth confronting.** The design sketch is explicit: "Every output channel must go somewhere -- discard is how you explicitly throw away a stream you don't need. The static form inserts it automatically on unconnected channels." Implicit discard at the routing compiler level is not the same as explicit discard in the algebra.
The desugarer (`patch_ctrl_out`) creates an implicit `ctrl_out` channel with no consumer. The principled long-term fix is for `patch_ctrl_out` to also emit `spawn discard(ctrl_out_channel)`. Then the routing table sees a writer (agent) and a reader (discard process), the cross-join works normally, and no special case is needed in `resolve_channels`. The `(_, 0)` route-drop becomes a safety net for unanticipated edge cases, not the primary mechanism for handling ctrl_out.
**Recommendation:** Accept the route-dropping fix as an interim measure. File a follow-up for the desugarer to emit explicit discard spawns on unconnected secondary outputs.
### Question 2: Telemetry channels with no consumer
**(a) is correct: silent discard.** Telemetry is a secondary output -- an observation channel. If an agent has type `A -> B`, telemetry extends this to `A -> B x T`. When nobody observes, the correct move is to post-compose with `id x !_T`: keep the primary output, discard telemetry. This is the default when no observer is wired.
Option (b) (explicit discard process) is categorically identical but wastes an inline fibre for no semantic gain. Option (c) (error) is wrong -- it would force every telemetry-capable agent to have a consumer, coupling infrastructure concerns to pipeline topology. Telemetry is opt-in observation, not a mandatory output.
The comment should say "unobserved secondary outputs (ctrl_out, telemetry)" rather than just "telemetry sinks", since the actual trigger is ctrl_out.
### Question 3: Is the asymmetry between `(_, 0)` and `(0, _)` principled?
**Yes, the asymmetry is principled. The severity levels are directionally correct.**
- `(0, _)` -- readers with no writers: This is `bot ; g`. The readers will block forever waiting for data that never arrives. This is a **liveness violation** -- a deadlock risk. The downstream processes are still spawned and will hang on recv. A warning is the minimum; this case is arguably a compiler bug and should be promoted to an error.
- `(_, 0)` -- writers with no readers: This is `f ; top`. The writer sends into the void. No deadlock risk in an async model. The data is lost, which is fine when intentional. `Log.debug` is appropriate.
The asymmetry reflects the asymmetry between liveness and safety: an unread channel is safe (data is lost, nothing blocks); an unwritten channel is a liveness hazard (readers block forever). Consider promoting the `(0, _)` case from warning to error -- if a channel has readers but no writer after desugaring, that is almost certainly a compiler bug.
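The asymmetry can be captured as a pure decision function. The names here (`fate`, `channel_fate`) are illustrative, not the routing compiler's actual code; the `Error_no_writer` arm reflects the suggested promotion of `(0, _)` to an error.

```ocaml
(* Sketch: classify a channel by its (writers, readers) counts.
   Readerless writers are a safe discard; writerless readers are a
   liveness hazard and arguably a compile error. *)
type fate =
  | Wire            (* writers and readers: cross-join the routes *)
  | Drop_discard    (* writers, no readers: data lost, nothing blocks *)
  | Error_no_writer (* readers, no writers: readers block forever *)

let channel_fate ~n_writers ~n_readers =
  match n_writers, n_readers with
  | _, 0 -> Drop_discard
  | 0, _ -> Error_no_writer
  | _, _ -> Wire

let () =
  assert (channel_fate ~n_writers:1 ~n_readers:0 = Drop_discard);
  assert (channel_fate ~n_writers:0 ~n_readers:2 = Error_no_writer);
  assert (channel_fate ~n_writers:1 ~n_readers:1 = Wire)
```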
### Question 4: Does dropping routes affect EOF propagation?
**No issue.** The reasoning is:
1. The bridge sends EOF sentinels on all output sockets when the demux loop exits (lines 419-427 of `bridge.ml`).
2. With no route for `ctrl_out`, the fabric creates no PUSH socket. The bridge holds `ctrl_out_sock = None`.
3. `Option.iter` on `None` is a no-op -- no EOF is sent.
4. No downstream process exists to receive EOF on ctrl_out, so nobody hangs.
The EOF propagation invariant is: "every process that has a reader receives an EOF when its writer terminates." With no reader, the invariant holds vacuously. The supervisor tracks process exit via waitpid, not EOF on individual ports, so missing EOF on an unwired port has no shutdown consequences.
---
### Recommended implementation
For `routing.ml` lines 167-169:
```ocaml
| _, 0 ->
    (* Writers with no readers: unobserved secondary output (ctrl_out,
       telemetry). Drop routes -- equivalent to implicit discard (!).
       The bridge handles missing PUSH sockets via Option.iter. *)
    Log.debug "routing"
      ["event", `String "channel_no_readers";
       "channel", `String ch_name;
       "n_writers", `Int n_writers]
```
**Follow-up for the desugarer** (separate change): in `patch_ctrl_out`, after adding the implicit ctrl_out channel, also emit `spawn discard(ctrl_out_channel)`. This makes discard explicit in the algebra, and the `(_, 0)` case in `resolve_channels` becomes a pure safety net rather than the mechanism for handling unobserved outputs.

agentId: a18bfaf491488c94a (for resuming to continue this agent's work if needed) <usage>total_tokens: 39770 tool_uses: 13 duration_ms: 121045</usage>
Both reviewers agree the fix is principled. Let me summarise and update the plan:
Theory critic: Dropping routes is categorically sound — f ; ⊤ = ⊤. Recommends a follow-up for the desugarer to emit explicit discard spawns, making this a safety net. The (0, _) case (readers no writers) should be promoted from warning to error (liveness violation).
Architect: The routing compiler is the right layer. The bridge already handles missing sockets. Mandatory companion change: add a routing test for the _, 0 case. Do NOT add phantom discard processes in the desugarer (crosses layer boundary).
The two reviewers disagree on the follow-up: theory critic wants the desugarer to emit explicit discard, architect says don't. Both agree the routing fix is correct for now. I'll note both views and defer the follow-up.
## Review Findings (incorporated)

**Theory critic:** The bridge's internal coordination is sound — implementation detail within the natural transformation, invisible to the algebra. Lifecycle asymmetry (data drives shutdown, control subordinate) is correct per operational semantics.

**Architect:** The simpler fix is to not create sockets for dangling routes. The `resolve_channels` fix achieves this at the routing layer. The `Eio.Promise` is a tactical fix for a scenario that the routing fix eliminates.

**Engineer:** If the promise is needed in future, guard `Promise.resolve` with `Atomic.compare_and_set` (not idempotent). `Fiber.first` + `T.recv` cancellation via `await_readable` is correct.
## Review Findings

### Round 1 (bridge mux hang — no longer needed)

**Theory critic:** Bridge internal coordination is sound. Lifecycle asymmetry correct. Promise-based fix principled but check if needed.

**Architect:** Simpler fix is routing-level. Don't create sockets for dangling routes. Promise is a tactical fix for a scenario the routing fix eliminates.

**Engineer:** `Fiber.first` + `T.recv` cancellation correct. `Promise.resolve` not idempotent — must guard. Defer to follow-up.

### Round 2 (resolve_channels fix)

**Theory critic:** Dropping readerless writer routes is categorically sound: `f ; ⊤ = ⊤` (composition with terminal morphism). The `(_, 0)` / `(0, _)` asymmetry is principled: unread channel is safe (data lost, nothing blocks); unwritten channel is a liveness hazard (readers block). Suggests promoting `(0, _)` from warning to error. Recommends follow-up: desugarer emits explicit discard spawns for unconnected secondary outputs.

**Architect:** Routing compiler is the right layer. Bridge already handles missing sockets (confirmed by code trace). Mandatory: add a routing test for the `(_, 0)` case. Do NOT add phantom discard in the desugarer — crosses layer boundary, consumes resources for data nobody wants.

**Disagreement on follow-up:** Theory wants explicit discard in algebra; architect says it crosses layer boundaries. Defer to follow-up ticket — both agree the routing fix is correct now.
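The engineer's point about guarding a non-idempotent `Promise.resolve` with `Atomic.compare_and_set` can be sketched generically. `make_resolve_once` and the callback shape are assumptions for illustration, not Eio's actual API.

```ocaml
(* Sketch: wrap a resolve-once callback so repeated (or concurrent)
   calls fire it at most once. compare_and_set flips the flag exactly
   once; every later caller sees [true] and becomes a no-op. *)
let make_resolve_once resolve =
  let resolved = Atomic.make false in
  fun v ->
    if Atomic.compare_and_set resolved false true then resolve v

let () =
  let hits = ref 0 in
  let resolve_once = make_resolve_once (fun () -> incr hits) in
  resolve_once ();
  resolve_once ();  (* second call is a no-op *)
  assert (!hits = 1)
```

With `Eio.Promise` the wrapped callback would be `Promise.resolve resolver`, since resolving an already-resolved promise raises.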
The file /Users/ww/.claude/plans/abundant-cuddling-oasis.md has been updated successfully.
## Files to Modify

| File | Change |
|------|--------|
| `lib/runtime/routing.ml` | Fix `_, 0` case: drop writer routes for readerless channels |
| `bin/plumb/main.ml` | Catch `Failure` from `Routing.compile` as exit 2 |
| Category B investigation | TBD — may need `bin/plumb/main.ml` or fixture changes |
## Files to Modify

| File | Change |
|------|--------|
| `lib/runtime/routing.ml` | Fix `_, 0` case: drop writer routes for readerless channels |
| `bin/plumb/main.ml` | Catch `Failure` from `Routing.compile` as exit 2 |
| `test/test_routing.ml` | Add test: agent with ctrl_out but no consumer compiles cleanly |
| Category B investigation | TBD — may need `bin/plumb/main.ml` or fixture changes |
The file /Users/ww/.claude/plans/abundant-cuddling-oasis.md has been updated successfully.
{
"allowedPrompts": [
{
"tool": "Bash",
"prompt": "build project with nix develop and dune"
},
{
"tool": "Bash",
"prompt": "run tests with nix develop and dune"
},
{
"tool": "Bash",
"prompt": "run manual pipeline tests"
},
{
"tool": "Bash",
"prompt": "check process status"
},
{
"tool": "Bash",
"prompt": "git operations in worktree"
}
],
"plan": "# PLMB-037 Step 5b: Complete Switchover — Fix Plan\n\n## Context\n\nThe plumb binary switchover from `Exec.execute` to\n`Fabric.execute` + `Bridge.make_handler` is in progress on branch\n`plmb-037-step5b` (worktree `/tmp/plmb-037-step5b`). The core\nswitchover is implemented. All inline-only pipeline tests pass\n(identity, copy, merge, barrier, filter, map, wiring, tensor,\nformat/parse JSON — 66 tests).\n\n21 test failures remain. No hanging — earlier hangs were from\nmissing EOF propagation in inline morphisms (fixed). The failures\nfall into three categories, all fixable.\n\n## Failure Analysis\n\n### Category A: Exit 1 instead of Exit 2 (12 failures)\n\nTests: barrier type mismatch, project OOB, direct/named type\nmismatch, heterogeneous merge, product type checking, named spawn\nerrors.\n\n**Cause:** `Routing.compile` raises bare `Failure` exceptions for\nvalidation errors. The old `Exec.execute` caught these internally\nand converted them to `Error.Process_error` (exit 2). The new\n`main.ml` calls `Routing.compile` directly — `Failure` propagates\nto `Entry.main` which catches it as an unknown exception (exit 1).\n\n**Fix:** Wrap the `Routing.compile` call in main.ml with a handler\nthat converts `Failure msg` to `Error.Process_error (Error.make\nConfig \"config_error\" msg)`. This matches what `Exec.execute` did.\n\n### Category B: Exit 0 instead of Exit 2 (2 failures)\n\nTests: `_format_json wrong type`, `_parse_json wrong type`.\n\n**Cause:** These fixtures have type errors that the old Exec path\ncaught during validation. Need to check what validation step the\nnew path skips.\n\nLooking at the fixtures:\n- `format_json_wrong_type.plumb` — uses `_format_json` with wrong\n input/output types\n- `parse_json_wrong_type.plumb` — same for `_parse_json`\n\nThe old Exec called `Validate.build_type_env` and ran validation\nchecks internally. The new main.ml calls\n`Spec.load_main_from_file` which validates, then calls\n`Routing.compile`. 
The format/parse_json morphisms are compiler-\ninserted by the desugarer — their type checking may happen at a\nstage the new path doesn't reach.\n\n**Fix:** Investigate the specific validation that these fixtures\nexercise. Likely needs the same error-catching as Category A, or\nthe type error is caught at a stage the fabric path does reach.\n\n### Category C: resolve_channels crash (7 failures)\n\nTests: agent control port (accepted, mismatch, telemetry, wiring\ncontrol), eof-control (both).\n\n**Cause:** Two bugs in `resolve_channels` (routing.ml:165–193):\n\n1. **Writers with no readers** (`_, 0` case, line 167–169): The\n code re-emits writer routes unchanged. But these routes have a\n channel name as `dst` (e.g., `smart0:ctrl_out → __wc0:in`).\n After resolution, channel names must not appear as src or dst.\n The validation at line 200–202 catches this and crashes.\n\n This affects ctrl_out and telemetry channels with no consumer.\n The `patch_ctrl_out` desugarer (desugar.ml:31–55) adds implicit\n `ctrl_out` channels (`__wc0`) that trigger this.\n\n2. **Readers with no writers** (`0, _` case, line 170–174): The\n code logs a warning but the reader routes were already NOT added\n to `resolved` (correct). However, the same pattern — a channel\n with no upstream — affects ctrl_in channels declared but unused.\n Currently handled correctly (routes dropped, warning logged).\n\nOn main (old Exec path), these don't crash because `Exec.execute`\ncreates ctrl pipes internally and doesn't use the routing table.\nThe routing table is new (PLMB-037 Step 3) and `patch_ctrl_out`'s\nimplicit channels weren't tested against it.\n\n**Root fix:** Both dangling-channel cases in `resolve_channels`\nshould drop routes cleanly:\n\n- `_, 0` (writers, no readers): Drop the routes. The channel's\n data goes nowhere. The bridge already handles missing PUSH\n sockets via `Option.iter` — it silently discards data for ports\n with no socket. 
This is categorically correct: a ν-bound channel\n with no reader is a discard (⊤).\n\n- `0, _` (readers, no writers): Already correct (routes dropped,\n warning logged). A ν-bound channel with no writer is an empty\n stream (⊥). Readers never receive data.\n\nThe bridge handles both cases gracefully:\n- No PUSH socket for ctrl_out/telemetry → `find_push` returns\n `None` → `Option.iter` skips the send → data discarded\n- No PULL socket for ctrl_in → `find_pull` returns `None` →\n `mux_source_done()` called immediately → no blocked fibre\n\n### Pre-existing (1 failure)\n\n`wiring_telemetry_source` — exit 2 on both main and branch (can't\nfind `js` module). Not caused by this switchover. Skip for now.\n\n## Implementation Plan\n\n### Step 1: Fix resolve_channels (Category C)\n\n**File:** `lib/runtime/routing.ml`, lines 165–178\n\nThe `_, 0` case (writers with no readers) currently re-emits\nwriter routes unchanged, leaving channel names as dst:\n\n```ocaml\n| _, 0 ->\n (* Writers with no readers — telemetry sinks. Re-emit unchanged. *)\n List.iter (fun w -> resolved := w :: !resolved) ch_writers\n```\n\nFix: drop the routes entirely. A channel with writers but no\nreaders is a discard (⊤). The data goes nowhere. The bridge\nhandles missing PUSH sockets via `Option.iter` — it silently\ndiscards data for ports with no ZMQ socket.\n\n```ocaml\n| _, 0 ->\n (* Writers with no readers — ν-bound channel as discard (⊤).\n Drop the routes: the data goes nowhere. The bridge handles\n missing PUSH sockets via Option.iter (silent discard). 
*)\n Runtime.Log.debug \"routing\"\n [\"event\", `String \"channel_no_readers\";\n \"channel\", `String ch_name;\n \"n_writers\", `Int n_writers]\n```\n\nThe `0, _` case (readers with no writers) is already correct:\nwarning logged, routes not emitted.\n\n### Step 2: Fix error handling (Category A)\n\n**File:** `bin/plumb/main.ml`\n\nWrap the routing table compilation in a handler:\n\n```ocaml\nlet routing_table =\n try Routing.compile ~prog ~ports ~pb\n ~type_env ~main_binding:main ~seeded:pb.pb_seeded\n with Failure msg ->\n Runtime.Error.raise_ Config \"config_error\" msg\nin\n```\n\nThis converts bare `Failure` from `Routing.compile` into\n`Process_error` which `Entry.main` catches at exit 2.\n\n### Step 3: Investigate Category B\n\n**Files:** `test/fixtures/format_json_wrong_type.plumb`,\n`test/fixtures/parse_json_wrong_type.plumb`\n\nCheck what type error these exercise. Likely the validation runs\nduring `Spec.load_main_from_file` (which both old and new paths\ncall), so the error should be caught. If they're getting exit 0,\nthe binary is somehow not detecting the error. Investigate and fix.\n\n### Step 4: Bridge ctrl_in mux (deferred)\n\nThe ctrl_in hang analysis from the reviews is **not needed for this\nstep**. The `resolve_channels` fix (Step 1) ensures that ctrl_in\nchannels with no writers produce no PULL sockets. `find_pull\n\"ctrl_in\"` returns `None` and the bridge correctly calls\n`mux_source_done()` immediately.\n\nThe `Eio.Promise` fix is only needed if a ctrl_in channel has an\nupstream writer that never sends EOF. This is a future concern\n(session protocols on ctrl channels). Defer to a follow-up ticket.\n\n## Review Findings\n\n### Round 1 (bridge mux hang — no longer needed)\n\n**Theory critic:** Bridge internal coordination is sound. Lifecycle\nasymmetry correct. Promise-based fix principled but check if\nneeded.\n\n**Architect:** Simpler fix is routing-level. Don't create sockets\nfor dangling routes. 
Promise is a tactical fix for a scenario the\nrouting fix eliminates.\n\n**Engineer:** `Fiber.first` + `T.recv` cancellation correct.\n`Promise.resolve` not idempotent — must guard. Defer to follow-up.\n\n### Round 2 (resolve_channels fix)\n\n**Theory critic:** Dropping readerless writer routes is\ncategorically sound: `f ; ⊤ = ⊤` (composition with terminal\nmorphism). The `(_, 0)` / `(0, _)` asymmetry is principled:\nunread channel is safe (data lost, nothing blocks); unwritten\nchannel is a liveness hazard (readers block). Suggests promoting\n`(0, _)` from warning to error. Recommends follow-up: desugarer\nemits explicit discard spawns for unconnected secondary outputs.\n\n**Architect:** Routing compiler is the right layer. Bridge already\nhandles missing sockets (confirmed by code trace). Mandatory:\nadd a routing test for the `(_, 0)` case. Do NOT add phantom\ndiscard in the desugarer — crosses layer boundary, consumes\nresources for data nobody wants.\n\n**Disagreement on follow-up:** Theory wants explicit discard in\nalgebra; architect says it crosses layer boundaries. Defer to\nfollow-up ticket — both agree the routing fix is correct now.\n\n## Files to Modify\n\n| File | Change |\n|------|--------|\n| `lib/runtime/routing.ml` | Fix `_, 0` case: drop writer routes for readerless channels |\n| `bin/plumb/main.ml` | Catch `Failure` from `Routing.compile` as exit 2 |\n| `test/test_routing.ml` | Add test: agent with ctrl_out but no consumer compiles cleanly |\n| Category B investigation | TBD — may need `bin/plumb/main.ml` or fixture changes |\n\n## Verification\n\n1. `nix develop --command dune build --root=.`\n2. `nix develop --command dune test --root=.` — expect ≤ 1 failure\n (pre-existing `wiring_telemetry_source`)\n3. Full test suite completes in < 20s\n4. Manual: identity, copy-merge, agent control all work\n5. Agent with empty stdin exits cleanly (no hang)\n"
}

The user doesn't want to proceed with this tool use. The tool use was rejected (eg. if it was a file edit, the new_string was NOT written to the file). To tell you how to proceed, the user said: explicit discards in the algebra are correct. optimising them away in the runtime is correct. without explicit discards the program's boundary is wonky.
Good point — the algebra requires explicit discards for unconnected secondary outputs. Without them, the program has an output port that goes nowhere, which is a malformed boundary. The correct approach is:
- `spawn discard(ctrl_out_channel)` when adding an implicit ctrl_out channel — makes the algebra explicit
- `drain_discard` fibre that just reads and drops messages

Let me update the plan:
### Step 1: Fix resolve_channels (Category C)
**File:** `lib/runtime/routing.ml`, lines 165–178
The `_, 0` case (writers with no readers) currently re-emits
writer routes unchanged, leaving channel names as dst:
```ocaml
| _, 0 ->
    (* Writers with no readers — telemetry sinks. Re-emit unchanged. *)
    List.iter (fun w -> resolved := w :: !resolved) ch_writers
```
Fix: drop the routes entirely. A channel with writers but no
readers is a discard (⊤). The data goes nowhere. The bridge
handles missing PUSH sockets via `Option.iter` — it silently
discards data for ports with no ZMQ socket.
```ocaml
| _, 0 ->
    (* Writers with no readers: unobserved secondary output (ctrl_out,
       telemetry). Drop routes -- equivalent to implicit discard (!).
       The bridge handles missing PUSH sockets via Option.iter. *)
    Runtime.Log.debug "routing"
      ["event", `String "channel_no_readers";
       "channel", `String ch_name;
       "n_writers", `Int n_writers]
```
The `0, _` case (readers with no writers) is already correct:
warning logged, routes not emitted.

### Step 1: Explicit discards for unconnected secondary outputs (Category C)
The algebra requires explicit discards. Without them, the
program's boundary has an output port that goes nowhere — a
malformed boundary. The desugarer must emit `spawn discard(ch)`
for unconnected secondary outputs, making the discard explicit
in the program. The routing compiler then resolves the channel
normally (writer = agent, reader = discard process). The fabric
runs the discard as an inline `drain_discard` fibre.
**File:** `lib/language/desugar.ml`, `patch_ctrl_out` (lines 31–55)
After adding the implicit ctrl_out channel, also emit a discard
spawn so the channel has a reader:
```ocaml
| Product [Stream _; Stream ctrl_t] when has_named && not has_ctrl_out ->
    let ch = fresh "wc" in
    extra_channels := { ch_name = ch; ch_type = ctrl_t }
                      :: !extra_channels;
    extra_spawns := { sp_process = "discard";
                      sp_args = [Positional ch] }
                    :: !extra_spawns;
    { sp with sp_args = sp.sp_args @ [Labeled ("ctrl_out", ch)] }
```
This makes the program algebraically well-formed: every output
goes somewhere (discard is ⊤ — the terminal morphism). The
routing compiler sees a writer (agent:ctrl_out) and a reader
(discard process), cross-joins them, and produces a valid table.
The fabric's `drain_discard` inline morphism reads and drops
messages — the runtime optimisation of the algebraic discard.
**File:** `lib/runtime/routing.ml`, lines 165–178
Also fix the `_, 0` case as a safety net. Currently it re-emits
writer routes unchanged, leaving channel names as dst (crashes
validation). Change to drop routes:
```ocaml
| _, 0 ->
    (* Writers with no readers: safety net for unresolved discards.
       Categorically this is f ; ⊤ — drop the routes. The bridge
       handles missing PUSH sockets via Option.iter. *)
    Log.debug "routing"
      ["event", `String "channel_no_readers";
       "channel", `String ch_name;
       "n_writers", `Int n_writers]
```
The `0, _` case (readers with no writers) is already correct:
warning logged, routes not emitted.

<tool_use_error>String to replace not found in file.
String: ### Step 1: Fix resolve_channels (Category C)
**File:** `lib/runtime/routing.ml`, lines 165–178
The `_, 0` case (writers with no readers) currently re-emits
writer routes unchanged, leaving channel names as dst:
```ocaml
| _, 0 ->
(* Writers with no readers — telemetry sinks. Re-emit unchanged. *)
List.iter (fun w -> resolved := w :: !resolved) ch_writers
```
Fix: drop the routes entirely. A channel with writers but no
readers is a discard (⊤). The data goes nowhere. The bridge
handles missing PUSH sockets via `Option.iter` — it silently
discards data for ports with no ZMQ socket.
```ocaml
| _, 0 ->
(* Writers with no readers: unobserved secondary output (ctrl_out,
telemetry). Drop routes -- equivalent to implicit discard (!).
The bridge handles missing PUSH sockets via Option.iter. *)
Runtime.Log.debug "routing"
["event", `String "channel_no_readers";
"channel", `String ch_name;
"n_writers", `Int n_writers]
```
The `0, _` case (readers with no writers) is already correct:
warning logged, routes not emitted.</tool_use_error>

{
"file_path": "/Users/ww/.claude/plans/abundant-cuddling-oasis.md",
"offset": 75,
"limit": 50
}

 75→2. **Readers with no writers** (`0, _` case, line 170–174): The
76→ code logs a warning but the reader routes were already NOT added
77→ to `resolved` (correct). However, the same pattern — a channel
78→ with no upstream — affects ctrl_in channels declared but unused.
79→ Currently handled correctly (routes dropped, warning logged).
80→
81→On main (old Exec path), these don't crash because `Exec.execute`
82→creates ctrl pipes internally and doesn't use the routing table.
83→The routing table is new (PLMB-037 Step 3) and `patch_ctrl_out`'s
84→implicit channels weren't tested against it.
85→
86→**Root fix:** Both dangling-channel cases in `resolve_channels`
87→should drop routes cleanly:
88→
89→- `_, 0` (writers, no readers): Drop the routes. The channel's
90→ data goes nowhere. The bridge already handles missing PUSH
91→ sockets via `Option.iter` — it silently discards data for ports
92→ with no socket. This is categorically correct: a ν-bound channel
93→ with no reader is a discard (⊤).
94→
95→- `0, _` (readers, no writers): Already correct (routes dropped,
96→ warning logged). A ν-bound channel with no writer is an empty
97→ stream (⊥). Readers never receive data.
98→
99→The bridge handles both cases gracefully:
100→- No PUSH socket for ctrl_out/telemetry → `find_push` returns
101→ `None` → `Option.iter` skips the send → data discarded
102→- No PULL socket for ctrl_in → `find_pull` returns `None` →
103→ `mux_source_done()` called immediately → no blocked fibre
104→
105→### Pre-existing (1 failure)
106→
107→`wiring_telemetry_source` — exit 2 on both main and branch (can't
108→find `js` module). Not caused by this switchover. Skip for now.
109→
110→## Implementation Plan
111→
112→### Step 1: Fix resolve_channels (Category C)
113→
114→**File:** `lib/runtime/routing.ml`, lines 165–178
115→
116→The `_, 0` case (writers with no readers) currently re-emits
117→writer routes unchanged, leaving channel names as dst:
118→
119→```ocaml
120→| _, 0 ->
121→ (* Writers with no readers — telemetry sinks. Re-emit unchanged. *)
122→ List.iter (fun w -> resolved := w :: !resolved) ch_writers
123→```
124→
{
"file_path": "/Users/ww/.claude/plans/abundant-cuddling-oasis.md",
"offset": 124,
"limit": 40
}

124→
125→Fix: drop the routes entirely. A channel with writers but no
126→readers is a discard (⊤). The data goes nowhere. The bridge
127→handles missing PUSH sockets via `Option.iter` — it silently
128→discards data for ports with no ZMQ socket.
129→
130→```ocaml
131→| _, 0 ->
132→  (* Writers with no readers — ν-bound channel as discard (⊤).
133→     Drop the routes: the data goes nowhere. The bridge handles
134→     missing PUSH sockets via Option.iter (silent discard). *)
135→  Runtime.Log.debug "routing"
136→    ["event", `String "channel_no_readers";
137→     "channel", `String ch_name;
138→     "n_writers", `Int n_writers]
139→```
140→
141→The `0, _` case (readers with no writers) is already correct:
142→warning logged, routes not emitted.
143→
144→### Step 2: Fix error handling (Category A)
145→
146→**File:** `bin/plumb/main.ml`
147→
148→Wrap the routing table compilation in a handler:
149→
150→```ocaml
151→let routing_table =
152→  try Routing.compile ~prog ~ports ~pb
153→        ~type_env ~main_binding:main ~seeded:pb.pb_seeded
154→  with Failure msg ->
155→    Runtime.Error.raise_ Config "config_error" msg
156→in
157→```
158→
159→This converts bare `Failure` from `Routing.compile` into
160→`Process_error` which `Entry.main` catches at exit 2.
161→
162→### Step 3: Investigate Category B
163→
**Root fix:** Both dangling-channel cases in `resolve_channels` should drop routes cleanly:

- `_, 0` (writers, no readers): Drop the routes. The channel's data goes nowhere. The bridge already handles missing PUSH sockets via `Option.iter` — it silently discards data for ports with no socket. This is categorically correct: a ν-bound channel with no reader is a discard (⊤).
- `0, _` (readers, no writers): Already correct (routes dropped, warning logged). A ν-bound channel with no writer is an empty stream (⊥). Readers never receive data.

The bridge handles both cases gracefully:

- No PUSH socket for ctrl_out/telemetry → `find_push` returns `None` → `Option.iter` skips the send → data discarded
- No PULL socket for ctrl_in → `find_pull` returns `None` → `mux_source_done()` called immediately → no blocked fibre
**Root fix:** The algebra requires explicit discards for unconnected secondary outputs. Without them the program's boundary is malformed — an output port that goes nowhere.

- **Desugarer** must emit `spawn discard(ctrl_out_channel)` when adding implicit ctrl_out channels. This makes the discard explicit in the algebra (`f ; ⊤`). The routing compiler then resolves the channel normally (writer = agent, reader = discard).
- **Routing compiler** (`resolve_channels`): also fix the `_, 0` case as a safety net. Drop routes instead of re-emitting with dangling channel names.
- `0, _` (readers, no writers): Already correct (routes dropped, warning logged). Semantics: empty stream (⊥).

The bridge handles both cases gracefully at runtime:

- No PUSH socket → `Option.iter` skips send → data discarded
- No PULL socket → `mux_source_done()` immediate → no block
The file /Users/ww/.claude/plans/abundant-cuddling-oasis.md has been updated successfully.
### Step 1: Fix resolve_channels (Category C)
**File:** `lib/runtime/routing.ml`, lines 165–178
The `_, 0` case (writers with no readers) currently re-emits
writer routes unchanged, leaving channel names as dst:
```ocaml
| _, 0 ->
    (* Writers with no readers — telemetry sinks. Re-emit unchanged. *)
    List.iter (fun w -> resolved := w :: !resolved) ch_writers
```
Fix: drop the routes entirely. A channel with writers but no
readers is a discard (⊤). The data goes nowhere. The bridge
handles missing PUSH sockets via `Option.iter` — it silently
discards data for ports with no ZMQ socket.
```ocaml
| _, 0 ->
    (* Writers with no readers — ν-bound channel as discard (⊤).
       Drop the routes: the data goes nowhere. The bridge handles
       missing PUSH sockets via Option.iter (silent discard). *)
    Runtime.Log.debug "routing"
      ["event", `String "channel_no_readers";
       "channel", `String ch_name;
       "n_writers", `Int n_writers]
```
The `0, _` case (readers with no writers) is already correct:
warning logged, routes not emitted.

### Step 1: Explicit discards in the desugarer (Category C)
**File:** `lib/language/desugar.ml`, `patch_ctrl_out` (lines 31–55)
The algebra requires every output port to have a consumer.
`patch_ctrl_out` adds implicit ctrl_out channels for agents with
control input, but doesn't add a consumer. Fix: also emit
`spawn discard(ctrl_out_channel)`.
Add an `extra_spawns` ref alongside `extra_channels`:
```ocaml
let extra_spawns = ref [] in
```
In the `Product [Stream _; Stream ctrl_t]` branch, after creating
the channel, add a discard spawn:
```ocaml
extra_spawns := { sp_process = "discard";
                  sp_args = [Positional ch] }
                :: !extra_spawns;
```
Return the patched plumb body with both extra channels and spawns:
```ocaml
{ pb with
  pb_channels = pb.pb_channels @ List.rev !extra_channels;
  pb_spawns = patched_spawns @ List.rev !extra_spawns }
```
The routing compiler resolves the channel normally: agent writes
to ctrl_out, discard reads from it. The fabric runs discard as
an inline `drain_discard` fibre (already exists in inline.ml).
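For orientation, a drain fibre of this shape just reads until the end-of-stream sentinel and drops everything. This is NOT the actual inline.ml code — `recv` is abstracted here as a thunk returning `Some msg` or `None` at EOF, purely to show the shape.

```ocaml
(* Sketch: a discard sink -- pull messages until EOF, drop each one.
   [recv] stands in for the inline morphism's receive operation. *)
let rec drain_discard recv =
  match recv () with
  | None -> ()                    (* EOF: sink terminates cleanly *)
  | Some _ -> drain_discard recv  (* drop the message, continue *)

let () =
  (* simulate a 3-message stream followed by EOF *)
  let remaining = ref 3 in
  let recv () =
    if !remaining > 0 then (decr remaining; Some "msg") else None
  in
  drain_discard recv;
  assert (!remaining = 0)
```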
### Step 1b: Safety net in resolve_channels
**File:** `lib/runtime/routing.ml`, lines 167–169
Change the `_, 0` case from re-emitting routes to dropping them:
```ocaml
| _, 0 ->
    (* Writers with no readers: safety net. Algebraically this is
       f ; ⊤ — the data goes nowhere. The desugarer should have
       emitted an explicit discard; if it didn't, drop silently. *)
    Log.debug "routing"
      ["event", `String "channel_no_readers";
       "channel", `String ch_name;
       "n_writers", `Int n_writers]
```

The file /Users/ww/.claude/plans/abundant-cuddling-oasis.md has been updated successfully.
## Files to Modify

| File | Change |
|------|--------|
| `lib/runtime/routing.ml` | Fix `_, 0` case: drop writer routes for readerless channels |
| `bin/plumb/main.ml` | Catch `Failure` from `Routing.compile` as exit 2 |
| `test/test_routing.ml` | Add test: agent with ctrl_out but no consumer compiles cleanly |
| Category B investigation | TBD — may need `bin/plumb/main.ml` or fixture changes |
## Files to Modify

| File | Change |
|------|--------|
| `lib/language/desugar.ml` | `patch_ctrl_out`: emit `spawn discard(ch)` for implicit ctrl_out |
| `lib/runtime/routing.ml` | Safety net: `_, 0` drops routes instead of re-emitting |
| `bin/plumb/main.ml` | Catch `Failure` from `Routing.compile` as exit 2 |
| `test/test_routing.ml` | Add test: agent with ctrl_out compiles with discard in table |
| Category B investigation | TBD — may need `bin/plumb/main.ml` or fixture changes |
**Disagreement on follow-up:** Theory wants explicit discard in algebra; architect says it crosses layer boundaries. Defer to follow-up ticket — both agree the routing fix is correct now.
**Resolution:** Explicit discards in the algebra are correct — without them the program's boundary is malformed. Optimising them away in the runtime (e.g., dropping routes as a safety net) is a valid runtime optimisation. Both layers are needed: desugarer emits explicit discard (algebra), routing compiler drops dangling routes (safety net).
**User feedback (plan rejected as written):** routing compiler dropping dangling routes should at least emit a warning, probably an error.
The theory critic said the same: `(_, 0)` — writers with no readers after the desugarer has had its chance to emit discards — likely indicates a compiler bug. A warning at minimum; an error is better.
### Step 1b: Warn on dangling routes in resolve_channels
**File:** `lib/runtime/routing.ml`, lines 167–169
Change the `_, 0` case from re-emitting routes (crash) to
dropping them with a warning. After desugaring, every output
should have a consumer — if it doesn't, the compiler missed
an explicit discard. This is at least a warning, possibly an
error in future (once all desugarer paths emit discards):
```ocaml
| _, 0 ->
(* Writers with no readers after desugaring — the desugarer
should have emitted an explicit discard. Drop the routes
(they would leave channel names in the resolved table) and
warn. Promote to error once all desugar paths are covered. *)
Log.warn "routing"
["event", `String "channel_no_readers";
"channel", `String ch_name;
"n_writers", `Int n_writers]
```
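The `(n_writers, n_readers)` asymmetry described above can be sketched as a pure classification function. `verdict` and `classify` are hypothetical names for illustration, not the real `routing.ml` code:

```ocaml
(* Illustrative sketch of the channel classification described in
   Step 1b and the review findings. *)
type verdict =
  | Resolve        (* normal: pair writers with readers *)
  | Drop_and_warn  (* writers, no readers: desugarer missed a discard *)
  | Empty_stream   (* readers, no writers: reader sees immediate EOF *)
  | Unused         (* channel declared but never referenced *)

let classify ~n_writers ~n_readers =
  match n_writers, n_readers with
  | 0, 0 -> Unused
  | _, 0 -> Drop_and_warn   (* safe: data lost, nothing blocks *)
  | 0, _ -> Empty_stream    (* liveness hazard if readers block *)
  | _, _ -> Resolve

let () =
  assert (classify ~n_writers:1 ~n_readers:0 = Drop_and_warn);
  assert (classify ~n_writers:0 ~n_readers:1 = Empty_stream);
  assert (classify ~n_writers:2 ~n_readers:1 = Resolve)
```

The asymmetry is the point: an unread channel only loses data, while an unwritten channel can block its readers forever, which is why the `(0, _)` case is the stronger candidate for promotion to an error.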
# PLMB-037 Step 5b: Complete Switchover — Fix Plan

## Context

The plumb binary switchover from `Exec.execute` to `Fabric.execute` +
`Bridge.make_handler` is in progress on branch `plmb-037-step5b`
(worktree `/tmp/plmb-037-step5b`). The core switchover is implemented.
All inline-only pipeline tests pass (identity, copy, merge, barrier,
filter, map, wiring, tensor, format/parse JSON — 66 tests).

21 test failures remain. No hanging — earlier hangs were from missing
EOF propagation in inline morphisms (fixed). The failures fall into
three categories, all fixable.

## Failure Analysis

### Category A: Exit 1 instead of Exit 2 (12 failures)

Tests: barrier type mismatch, project OOB, direct/named type mismatch,
heterogeneous merge, product type checking, named spawn errors.

**Cause:** `Routing.compile` raises bare `Failure` exceptions for
validation errors. The old `Exec.execute` caught these internally and
converted them to `Error.Process_error` (exit 2). The new `main.ml`
calls `Routing.compile` directly — `Failure` propagates to
`Entry.main` which catches it as an unknown exception (exit 1).

**Fix:** Wrap the `Routing.compile` call in main.ml with a handler
that converts `Failure msg` to `Error.Process_error (Error.make
Config "config_error" msg)`. This matches what `Exec.execute` did.

### Category B: Exit 0 instead of Exit 2 (2 failures)

Tests: `_format_json wrong type`, `_parse_json wrong type`.

**Cause:** These fixtures have type errors that the old Exec path
caught during validation. Need to check what validation step the new
path skips.

Looking at the fixtures:
- `format_json_wrong_type.plumb` — uses `_format_json` with wrong
  input/output types
- `parse_json_wrong_type.plumb` — same for `_parse_json`

The old Exec called `Validate.build_type_env` and ran validation
checks internally. The new main.ml calls `Spec.load_main_from_file`
which validates, then calls `Routing.compile`. The format/parse_json
morphisms are compiler-inserted by the desugarer — their type checking
may happen at a stage the new path doesn't reach.

**Fix:** Investigate the specific validation that these fixtures
exercise. Likely needs the same error-catching as Category A, or the
type error is caught at a stage the fabric path does reach.

### Category C: resolve_channels crash (7 failures)

Tests: agent control port (accepted, mismatch, telemetry, wiring
control), eof-control (both).

**Cause:** Two bugs in `resolve_channels` (routing.ml:165–193):

1. **Writers with no readers** (`_, 0` case, lines 167–169): The code
   re-emits writer routes unchanged. But these routes have a channel
   name as `dst` (e.g., `smart0:ctrl_out → __wc0:in`). After
   resolution, channel names must not appear as src or dst. The
   validation at lines 200–202 catches this and crashes.

   This affects ctrl_out and telemetry channels with no consumer. The
   `patch_ctrl_out` desugarer (desugar.ml:31–55) adds implicit
   `ctrl_out` channels (`__wc0`) that trigger this.

2. **Readers with no writers** (`0, _` case, lines 170–174): The code
   logs a warning but the reader routes were already NOT added to
   `resolved` (correct). However, the same pattern — a channel with no
   upstream — affects ctrl_in channels declared but unused. Currently
   handled correctly (routes dropped, warning logged).

On main (old Exec path), these don't crash because `Exec.execute`
creates ctrl pipes internally and doesn't use the routing table. The
routing table is new (PLMB-037 Step 3) and `patch_ctrl_out`'s implicit
channels weren't tested against it.

**Root fix:** The algebra requires explicit discards for unconnected
secondary outputs. Without them the program's boundary is malformed —
an output port that goes nowhere.

- **Desugarer** must emit `spawn discard(ctrl_out_channel)` when
  adding implicit ctrl_out channels. This makes the discard explicit
  in the algebra (`f ; ⊤`). The routing compiler then resolves the
  channel normally (writer = agent, reader = discard).
- **Routing compiler** (`resolve_channels`): also fix the `_, 0` case
  as a safety net. Drop routes instead of re-emitting with dangling
  channel names.
- `0, _` (readers, no writers): Already correct (routes dropped,
  warning logged). Semantics: empty stream (⊥).

The bridge handles both cases gracefully at runtime:
- No PUSH socket → `Option.iter` skips send → data discarded
- No PULL socket → `mux_source_done()` immediate → no block

### Pre-existing (1 failure)

`wiring_telemetry_source` — exit 2 on both main and branch (can't find
`js` module). Not caused by this switchover. Skip for now.

## Implementation Plan

### Step 1: Explicit discards in the desugarer (Category C)

**File:** `lib/language/desugar.ml`, `patch_ctrl_out` (lines 31–55)

The algebra requires every output port to have a consumer.
`patch_ctrl_out` adds implicit ctrl_out channels for agents with
control input, but doesn't add a consumer. Fix: also emit
`spawn discard(ctrl_out_channel)`.

Add an `extra_spawns` ref alongside `extra_channels`:

```ocaml
let extra_spawns = ref [] in
```

In the `Product [Stream _; Stream ctrl_t]` branch, after creating the
channel, add a discard spawn:

```ocaml
extra_spawns := { sp_process = "discard";
                  sp_args = [Positional ch] }
                :: !extra_spawns;
```

Return the patched plumb body with both extra channels and spawns:

```ocaml
{ pb with
  pb_channels = pb.pb_channels @ List.rev !extra_channels;
  pb_spawns = patched_spawns @ List.rev !extra_spawns }
```

The routing compiler resolves the channel normally: agent writes to
ctrl_out, discard reads from it. The fabric runs discard as an inline
`drain_discard` fibre (already exists in inline.ml).

### Step 1b: Warn on dangling routes in resolve_channels

**File:** `lib/runtime/routing.ml`, lines 167–169

Change the `_, 0` case from re-emitting routes (crash) to dropping
them with a warning. After desugaring, every output should have a
consumer — if it doesn't, the compiler missed an explicit discard.
This is at least a warning, possibly an error in future (once all
desugarer paths emit discards):

```ocaml
| _, 0 ->
  (* Writers with no readers after desugaring — the desugarer
     should have emitted an explicit discard. Drop the routes
     (they would leave channel names in the resolved table) and
     warn. Promote to error once all desugar paths are covered. *)
  Log.warn "routing"
    ["event", `String "channel_no_readers";
     "channel", `String ch_name;
     "n_writers", `Int n_writers]
```

### Step 2: Fix error handling (Category A)

**File:** `bin/plumb/main.ml`

Wrap the routing table compilation in a handler:

```ocaml
let routing_table =
  try Routing.compile ~prog ~ports ~pb
        ~type_env ~main_binding:main ~seeded:pb.pb_seeded
  with Failure msg ->
    Runtime.Error.raise_ Config "config_error" msg
in
```

This converts bare `Failure` from `Routing.compile` into
`Process_error` which `Entry.main` catches at exit 2.

### Step 3: Investigate Category B

**Files:** `test/fixtures/format_json_wrong_type.plumb`,
`test/fixtures/parse_json_wrong_type.plumb`

Check what type error these exercise. Likely the validation runs
during `Spec.load_main_from_file` (which both old and new paths call),
so the error should be caught. If they're getting exit 0, the binary
is somehow not detecting the error. Investigate and fix.

### Step 4: Bridge ctrl_in mux (deferred)

The ctrl_in hang analysis from the reviews is **not needed for this
step**. The `resolve_channels` fix (Step 1) ensures that ctrl_in
channels with no writers produce no PULL sockets. `find_pull
"ctrl_in"` returns `None` and the bridge correctly calls
`mux_source_done()` immediately.

The `Eio.Promise` fix is only needed if a ctrl_in channel has an
upstream writer that never sends EOF. This is a future concern
(session protocols on ctrl channels). Defer to a follow-up ticket.

## Review Findings

### Round 1 (bridge mux hang — no longer needed)

**Theory critic:** Bridge internal coordination is sound. Lifecycle
asymmetry correct. Promise-based fix principled but check if needed.

**Architect:** Simpler fix is routing-level. Don't create sockets for
dangling routes. Promise is a tactical fix for a scenario the routing
fix eliminates.

**Engineer:** `Fiber.first` + `T.recv` cancellation correct.
`Promise.resolve` not idempotent — must guard. Defer to follow-up.

### Round 2 (resolve_channels fix)

**Theory critic:** Dropping readerless writer routes is categorically
sound: `f ; ⊤ = ⊤` (composition with terminal morphism). The
`(_, 0)` / `(0, _)` asymmetry is principled: an unread channel is safe
(data lost, nothing blocks); an unwritten channel is a liveness hazard
(readers block). Suggests promoting `(0, _)` from warning to error.
Recommends follow-up: desugarer emits explicit discard spawns for
unconnected secondary outputs.

**Architect:** Routing compiler is the right layer. Bridge already
handles missing sockets (confirmed by code trace). Mandatory: add a
routing test for the `(_, 0)` case. Do NOT add phantom discard in the
desugarer — crosses layer boundary, consumes resources for data nobody
wants.

**Resolution:** Explicit discards in the algebra are correct — without
them the program's boundary is malformed. Optimising them away in the
runtime (e.g., dropping routes as a safety net) is a valid runtime
optimisation. Both layers are needed: desugarer emits explicit discard
(algebra), routing compiler drops dangling routes (safety net).

## Files to Modify

| File | Change |
|------|--------|
| `lib/language/desugar.ml` | `patch_ctrl_out`: emit `spawn discard(ch)` for implicit ctrl_out |
| `lib/runtime/routing.ml` | Safety net: `_, 0` drops routes instead of re-emitting |
| `bin/plumb/main.ml` | Catch `Failure` from `Routing.compile` as exit 2 |
| `test/test_routing.ml` | Add test: agent with ctrl_out compiles with discard in table |
| Category B investigation | TBD — may need `bin/plumb/main.ml` or fixture changes |

## Verification

1. `nix develop --command dune build --root=.`
2. `nix develop --command dune test --root=.` — expect ≤ 1 failure
   (pre-existing `wiring_telemetry_source`)
3. Full test suite completes in < 20s
4. Manual: identity, copy-merge, agent control all work
5. Agent with empty stdin exits cleanly (no hang)
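The exit-code mapping that Category A and Step 2 rely on can be sketched with stand-in types. `Process_error` below is a placeholder for the real `Runtime.Error` machinery, and `exit_code_of_run` is a hypothetical stand-in for what `Entry.main` does; only the catch-and-convert pattern itself is from the plan:

```ocaml
(* Stand-in for Error.Process_error; illustration only. *)
exception Process_error of string

(* Hypothetical model of Entry.main's exception-to-exit-code mapping. *)
let exit_code_of_run (f : unit -> unit) : int =
  match f () with
  | () -> 0
  | exception Process_error _ -> 2  (* validation/config errors *)
  | exception Sys.Break -> 130      (* SIGINT after orderly shutdown *)
  | exception _ -> 1                (* unknown exception *)

(* The Step 2 pattern: convert bare Failure from the routing compiler
   into the typed error that maps to exit 2, as Exec.execute did. *)
let compile_guarded compile =
  try compile () with Failure msg -> raise (Process_error msg)

let () =
  assert (exit_code_of_run (fun () ->
    compile_guarded (fun () -> failwith "bad route")) = 2);
  assert (exit_code_of_run (fun () -> failwith "bad route") = 1)
```

Without the guard, a bare `Failure` falls through to the wildcard case (exit 1), which is exactly the Category A symptom.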