Case study

When Agentic Retrieval Needed Determinism

Architecture work on the live portfolio RAG demo after an abstract engineering question failed, even though the evidence existed in the corpus.

AI Workflows, Platform Performance / Reliability

The challenge

The live portfolio demo failed a question it should have answered: "Is there an example of race condition handling, and what is its relation to idempotency?" The evidence already existed in the anti-overbooking project, but the retrieval path stayed stuck in abstract engineering language and never crossed into the project terms underneath it.

Architecture approach

  • Reproduced the failure against the live test backend instead of reasoning from the code alone. The agent spent all three retrieval turns on abstract rewrites and still surfaced only about-arup, not the anti-overbooking project.
  • Separated the problem into two parts: the model was still useful for reading and explaining evidence, but the retrieval path had no reliable bridge from abstract concepts to corpus-native language like double-bookings, retries, double-clicks, Redis locks, replay, and dedup.
  • Designed a deterministic concept-routing layer for a narrow class of questions: example-of, relation-between, where-did-you-use, and difference-between. For those classes, the system would expand concepts into project terms and likely project candidates before the bounded loop starts.
  • Added a companion guard at the design level: if the answer cites only routing documents like about-arup and no real project evidence, the system should not be allowed to conclude that the corpus lacks the answer.
  • Documented the trade-off openly and tied it back to public guidance from Anthropic and OpenAI, both of which recommend routing, guardrails, and deterministic control around the parts of agent systems that need to stay reliable.
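The concept-routing step described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the implemented design: the type names, the regex patterns, the `routeQuestion` function, and the grouping of corpus terms under each concept are all hypothetical. Only the question classes, the corpus terms, and the anti-overbooking project name come from the case study.

```typescript
// Hypothetical sketch of a deterministic concept-routing layer.
// Names, patterns, and the concept-to-term grouping are illustrative assumptions.

type QuestionClass =
  | "example-of"
  | "relation-between"
  | "where-did-you-use"
  | "difference-between";

interface ConceptEntry {
  concept: string;    // abstract engineering term as a user might phrase it
  terms: string[];    // corpus-native vocabulary to add to the retrieval query
  projects: string[]; // likely project candidates
}

// Assumed mapping: which term belongs to which concept is a guess for illustration.
const CONCEPT_MAP: ConceptEntry[] = [
  {
    concept: "race condition",
    terms: ["double-bookings", "double-clicks", "Redis locks"],
    projects: ["anti-overbooking"],
  },
  {
    concept: "idempotency",
    terms: ["retries", "replay", "dedup"],
    projects: ["anti-overbooking"],
  },
];

// First matching pattern wins; the patterns are deliberately narrow.
const CLASS_PATTERNS: Array<[QuestionClass, RegExp]> = [
  ["relation-between", /\brelat(?:ion|ed|es)\b/i],
  ["difference-between", /\bdifference between\b/i],
  ["where-did-you-use", /\bwhere (?:did|have) you used?\b/i],
  ["example-of", /\bexample of\b/i],
];

interface RoutingPlan {
  questionClass: QuestionClass;
  concepts: string[];
  expandedTerms: string[];
  candidateProjects: string[];
}

function unique(values: string[]): string[] {
  return values.filter((v, i) => values.indexOf(v) === i);
}

function classify(question: string): QuestionClass | null {
  for (const [cls, pattern] of CLASS_PATTERNS) {
    if (pattern.test(question)) return cls;
  }
  return null;
}

// Returns a plan for routed question classes, or null to fall through
// to the ordinary bounded agentic loop.
function routeQuestion(question: string): RoutingPlan | null {
  const questionClass = classify(question);
  if (questionClass === null) return null;

  const lower = question.toLowerCase();
  const matched = CONCEPT_MAP.filter((e) => lower.indexOf(e.concept) !== -1);
  if (matched.length === 0) return null;

  return {
    questionClass,
    concepts: matched.map((e) => e.concept),
    expandedTerms: unique(
      matched.reduce<string[]>((acc, e) => acc.concat(e.terms), [])
    ),
    candidateProjects: unique(
      matched.reduce<string[]>((acc, e) => acc.concat(e.projects), [])
    ),
  };
}
```

In this sketch the router returns `null` for anything outside the narrow question classes, so the model keeps full control of retrieval everywhere the deterministic bridge is not needed.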

Tech stack

Bedrock, Pinecone, Nova Pro, TypeScript, Eval harness

Results

  • The root cause is now clear: this was an abstract-to-concrete retrieval gap, not a missing-context problem.
  • The architecture direction is set before implementation: deterministic routing for this question class, bounded agentic synthesis after retrieval, and an anti-shortcut evidence guard.
  • The trade-off is now explicit in the portfolio instead of hidden in private notes: less autonomy in the retrieval control plane, more reliability in the public demo.
  • The decision is backed by current public guidance from Anthropic and OpenAI, so the reasoning is inspectable, not just asserted.
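The anti-shortcut evidence guard can be illustrated with a small check on the draft answer. This is a minimal sketch under assumptions: the `Citation` and `DraftAnswer` shapes, the `kind` label on each cited document, and the `claimsNoEvidence` flag are hypothetical, and how a real system would detect a "the corpus lacks this" conclusion is left open.

```typescript
// Hypothetical sketch of the evidence guard. All types and field names are assumptions.

interface Citation {
  docId: string;
  kind: "routing" | "project"; // assumed label: routing docs (e.g. about-arup) vs. project evidence
}

interface DraftAnswer {
  text: string;
  citations: Citation[];
  claimsNoEvidence: boolean; // assumed flag: the draft concludes the corpus lacks the answer
}

type GuardVerdict =
  | { action: "accept" }
  | { action: "retry-retrieval"; reason: string };

// Blocks a "not in the corpus" conclusion that rests only on routing documents.
function evidenceGuard(draft: DraftAnswer): GuardVerdict {
  const hasProjectEvidence = draft.citations.some((c) => c.kind === "project");
  if (draft.claimsNoEvidence && !hasProjectEvidence) {
    return {
      action: "retry-retrieval",
      reason: "negative conclusion cites no project evidence",
    };
  }
  return { action: "accept" };
}
```

The guard only intervenes on negative conclusions; a draft that cites real project evidence passes through unchanged.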

What I'd do differently at production scale

  • This is architecture work ahead of implementation, not a shipped retrieval fix. I wanted the trade-off recorded before the code changed, because that is the honest order: first understand the failure, then decide what should become deterministic.
  • Hybrid retrieval still matters, and I still want it. I just do not think hybrid alone solves this class of question. Relation-style prompts also need a bridge into the right projects.
  • The risk with deterministic routing is overfitting the concept map to today's corpus. The upside is that the map is small, visible, and testable, which is a better failure mode than letting the model improvise the bridge in public.
  • The point is not to make the system less agentic. The point is to make it selective about where autonomy helps and where it creates noise.
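One way to keep a small concept map testable is an audit that runs whenever the corpus changes. The sketch below assumes a hypothetical `ConceptEntry` shape and a list of current corpus project IDs; neither is taken from the actual codebase.

```typescript
// Hypothetical audit for a concept map; shapes and names are assumptions.

interface ConceptEntry {
  concept: string;
  terms: string[];
  projects: string[];
}

// Returns human-readable problems; an empty array means the map
// is consistent with the given corpus project IDs.
function auditConceptMap(
  map: ConceptEntry[],
  corpusProjectIds: string[]
): string[] {
  const problems: string[] = [];
  for (const entry of map) {
    if (entry.terms.length === 0) {
      problems.push(`"${entry.concept}" has no corpus-native terms`);
    }
    for (const project of entry.projects) {
      if (corpusProjectIds.indexOf(project) === -1) {
        problems.push(
          `"${entry.concept}" points at unknown project "${project}"`
        );
      }
    }
  }
  return problems;
}
```

A check like this catches the stale-map case, where a project is renamed or removed but the routing entry still points at it. It does not catch the opposite gap, where a new project covers a concept the map has never heard of.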

References