Reflecting on Agentic Workflows

A hybrid system

Reflecting on my work from the past year, I have achieved the primary goal: a modular system with support for human-in-the-loop steering. This allows for a hybrid workflow: autonomous for some tasks, but steerable when necessary.

My experience with fully autonomous modes highlights the following challenges:

Review Complexity: A large diff for a finished product is difficult to review. I find incremental code reviews to be a much safer and more effective approach for important projects.
Specification Drift: The original task must be extremely well specified. Otherwise, the agent might spend a long time building an incorrect implementation based on a minor misunderstanding.
Cost: Unsupervised trial-and-error is expensive. Long-running tasks that rely on inference to self-correct often cost more than a steered session.

Robots that use odometry encoders for localization accrue small errors until they add up to significant drift. LLM planning behaves similarly: an early divergence leads to dead ends. Without the ability to intervene, the only recovery mechanism is often a brute-force restart. By sticking to my principles of Manual Steering and Observability, I can correct this drift in real time.
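The odometry analogy can be made concrete with a toy simulation. This is only a sketch under stated assumptions, not a measurement: the robot model, the bias value, the step count, and the `correctEvery` parameter are all illustrative.

```typescript
// Toy dead-reckoning model (all numbers are illustrative assumptions).
// The robot intends to drive straight along the x-axis, but every step adds
// a small constant heading error, the way encoder error accumulates.
function lateralDrift(steps: number, biasPerStep: number, correctEvery?: number): number {
  let heading = 0; // radians; 0 means exactly on course
  let offset = 0;  // lateral distance from the intended line
  for (let i = 1; i <= steps; i++) {
    heading += biasPerStep;      // the error accrues silently
    offset += Math.sin(heading); // unit step length
    if (correctEvery !== undefined && i % correctEvery === 0) {
      heading = 0; // an intervention fixes the heading; the offset so far remains
    }
  }
  return offset;
}

const unsupervised = lateralDrift(100, 0.002); // never corrected
const steered = lateralDrift(100, 0.002, 10);  // corrected every 10 steps
```

With these assumed values, the uncorrected run ends about ten step lengths off course, while the run corrected every ten steps ends about one step length off. The per-step error is identical in both runs; only the time it is allowed to compound differs.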

Workflow

I will illustrate my workflow with a very simple example to give a flavor of it. Later I will provide more complex examples to show how the process scales.

Here is our task:

Move src/app/home/workspaceConfig.ts to src/lib/stores/

Note: In this example I will not use auto-approval (the Auto Critic is unchecked); instead, I will go step-by-step to show the process in detail.

The Planner generates:

const sourceFile = '/source/fe/src/app/home/workspaceConfig.ts';
const targetDir = '/source/fe/src/lib/stores/';

// 1. Get the content and exported symbols of the source file.
const symbols = await extractRelevantTextLocation(context, "Find all exported symbols", sourceFile);
await memento(symbols, '#exported_symbols');

await setCodingGoal(`Identify all references to symbols in ${sourceFile} and prepare for moving it to ${targetDir}`);

After that executes, we will have in the Memento the location of the symbol definitions that need to be updated.
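To make the role of the Memento concrete, here is a minimal sketch of a tagged scratchpad whose contents can be rendered into the next Planner prompt. It is an assumption about the general shape, not the actual implementation: `MementoStore`, `recall`, and `render` are hypothetical names, and only the `memento(value, tag)` call shape comes from the plan above.

```typescript
// Hypothetical sketch of a tagged scratchpad; the real Memento may work differently.
class MementoStore {
  private entries = new Map<string, unknown>();

  // Store a value under a tag such as '#exported_symbols'.
  // Storing under an existing tag overwrites the previous value.
  memento(value: unknown, tag: string): void {
    this.entries.set(tag, value);
  }

  // Look up a previously stored value, or undefined if the tag is unknown.
  recall(tag: string): unknown {
    return this.entries.get(tag);
  }

  // Render every entry as one line of text, suitable for inclusion in a prompt.
  render(): string {
    const lines: string[] = [];
    this.entries.forEach((value, tag) => {
      lines.push(`${tag}: ${JSON.stringify(value)}`);
    });
    return lines.join('\n');
  }
}

const store = new MementoStore();
store.memento(['useWorkspaceConfigStore'], '#exported_symbols');
const promptSection = store.render();
```

The point of the design is that each turn leaves behind small, named results, so the next turn starts from those pointers instead of rereading whole files.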

Because the workflow keeps a human in the loop, it is resilient to logical flaws in an LLM response that would otherwise become permanent: they can be caught and corrected before the next step runs.
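One way to picture this step-by-step mode is as a gate between plan generation and execution. The sketch below is hypothetical: the `Decision` type, the `gate` function, and the feedback strings are my illustration of the idea, not the tool's actual API.

```typescript
// Hypothetical review gate: a generated plan runs only after explicit approval.
type Decision =
  | { kind: 'approve' }
  | { kind: 'revise'; todos: string[] } // the human adds TODOs for the Planner
  | { kind: 'reject' };

interface StepOutcome {
  executed: boolean;
  feedback: string[]; // notes passed back to the Planner for the next turn
}

function gate(plan: string, decision: Decision, execute: (plan: string) => void): StepOutcome {
  switch (decision.kind) {
    case 'approve':
      execute(plan);
      return { executed: true, feedback: [] };
    case 'revise':
      // Nothing runs; the TODOs steer the next plan instead.
      return { executed: false, feedback: decision.todos.map((t) => `TODO: ${t}`) };
    case 'reject':
      return { executed: false, feedback: ['plan rejected'] };
  }
}

const ran: string[] = [];
const revised = gate('move the file', { kind: 'revise', todos: ['find references first'] }, (p) => ran.push(p));
const approved = gate('find references, then move the file', { kind: 'approve' }, (p) => ran.push(p));
```

In this sketch a flawed plan never executes: it only produces feedback, so the correction happens before any state changes.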

In the next turn the Planner can see what symbols are exported from that file and it generates:

const sourceFile = '/source/fe/src/app/home/workspaceConfig.ts';
const targetDir = '/source/fe/src/lib/stores/';

// 1. Get definitions for the known exported symbol to get its precise location for reference searching
const symbolDefs = await findSymbolDefinition(context, ['useWorkspaceConfigStore']);
await memento(symbolDefs, '#useWorkspaceConfigStore_def');

// 2. Find all references to these symbols to identify files that need import updates
for (const def of symbolDefs) {
    const refs = await findSymbolReference(context, def);
    await memento(refs, `#refs_${def.symbolName}`);
}

// 3. Move the file
await executeShellCommand(context, `mkdir -p ${targetDir} && mv ${sourceFile} ${targetDir}`);

await setCodingGoal(`Move ${sourceFile} to ${targetDir} and update imports in referenced files`);

Observations:

The plan is not a black box: we can inspect it and, if necessary, intervene by adding TODOs.
Because the Planner outputs code, it can chain multiple operations together, which requires fewer LLM prompts.
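Once the file has moved, every relative import of it is stale, which is why the coding goal asks for import updates. The sketch below shows the path arithmetic involved. `relativeImport` is a hypothetical helper written for illustration; it assumes POSIX-style paths relative to the project root and does not handle absolute or aliased imports.

```typescript
// Hypothetical helper: compute the relative import specifier that `fromFile`
// needs in order to import `targetFile`. Both paths are POSIX-style and
// relative to the project root.
function relativeImport(fromFile: string, targetFile: string): string {
  const fromDir = fromFile.split('/').slice(0, -1);
  const target = targetFile.replace(/\.tsx?$/, '').split('/');

  // Count the leading directories that both locations share.
  let common = 0;
  while (
    common < fromDir.length &&
    common < target.length - 1 &&
    fromDir[common] === target[common]
  ) {
    common++;
  }

  const up = fromDir.slice(common).map(() => '..'); // climb out of fromDir
  const down = target.slice(common);                // descend to the target
  const spec = up.concat(down).join('/');
  return up.length === 0 ? `./${spec}` : spec;
}

// The specifier page.tsx needs before and after the move:
const before = relativeImport('src/app/home/page.tsx', 'src/app/home/workspaceConfig.ts');
const after = relativeImport('src/app/home/page.tsx', 'src/lib/stores/workspaceConfig.ts');
// before === './workspaceConfig'
// after  === '../../lib/stores/workspaceConfig'
```

Every file found by the reference search needs this recomputation from its own location, which is why the plan stores the reference locations before moving anything.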

The Memento now contains pointers to all the file locations that are relevant for our task:

Coder output: Note how cost-effective this is. It costs only $0.03 because:

it is a single turn prompt
we send only relevant file Fragments (instead of full files)
the output uses vim commands.
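To illustrate the second point, here is a minimal sketch of what sending a fragment instead of a full file could look like. The `Fragment` shape, the `extractFragment` function, and the `context` window size are assumptions for illustration; the real tool may select fragments differently.

```typescript
// Hypothetical fragment extraction: keep only the lines around a reference.
interface Fragment {
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
  text: string;
}

// Return `context` lines on each side of the 1-based `line`, clamped to the file.
function extractFragment(source: string, line: number, context = 2): Fragment {
  const lines = source.split('\n');
  const startLine = Math.max(1, line - context);
  const endLine = Math.min(lines.length, line + context);
  return {
    startLine,
    endLine,
    text: lines.slice(startLine - 1, endLine).join('\n'),
  };
}

// A made-up six-line file; only the import on line 2 matters for the task.
const file = [
  "import { a } from './a';",
  "import { b } from './b';",
  '',
  'export function run() {',
  '  return a() + b();',
  '}',
].join('\n');

const fragment = extractFragment(file, 2, 1); // lines 1 to 3 only
```

For an import update, the Coder needs the import line and little else, so the prompt size depends on the number of references rather than on the size of the files that contain them.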

All diffs:

Diff applied:

At the end of the loop the Planner evaluates the task as follows:

%%{init: {'theme':'dark'}}%%
flowchart LR
    VO[Verify Objective - compile, test, etc] --> S[Summarize]
    DIFF[Diff all changes] --> S
    S --> |Are we done?| D[Decider - yes/no]

    style VO fill:#4a9eff,stroke:#2d7ad4,stroke-width:2px,color:#fff
    style DIFF fill:#9b59b6,stroke:#7d3c98,stroke-width:2px,color:#fff
    classDef front fill:#2ecc71,stroke:#27ae60,stroke-width:2px,color:#fff
    class D front
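The flow in the diagram can be sketched as a single function. This is a hedged illustration rather than the real implementation: `evaluateTask`, the two result types, and the four callbacks are hypothetical names, and the stub values in the usage example are made up.

```typescript
// Hypothetical sketch of the end-of-loop evaluation shown in the diagram.
interface VerifyResult {
  ok: boolean;
  log: string;
}

interface Evaluation {
  summary: string;
  done: boolean;
}

// All four callbacks are stand-ins: `verify` would run the build or tests,
// `diffAll` would collect the changes, and `summarize` and `decide` would
// each be a single-turn LLM call.
function evaluateTask(steps: {
  verify: () => VerifyResult;
  diffAll: () => string;
  summarize: (verification: VerifyResult, diff: string) => string;
  decide: (summary: string) => boolean;
}): Evaluation {
  const verification = steps.verify();                 // Verify Objective
  const diff = steps.diffAll();                        // Diff all changes
  const summary = steps.summarize(verification, diff); // Summarize
  return { summary, done: steps.decide(summary) };     // Decider: yes/no
}

// Usage with stubs in place of the real build and LLM calls:
const result = evaluateTask({
  verify: () => ({ ok: true, log: 'build succeeded' }),
  diffAll: () => 'moved workspaceConfig.ts',
  summarize: (v, diff) => `verified=${v.ok}; changes: ${diff}`,
  decide: (summary) => summary.indexOf('verified=true') === 0,
});
```

Keeping the Decider separate from the Summarizer means the yes/no answer is based on a short summary rather than on the full build log and diff.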

The output of the Summarizer:

The coding task involved moving the `workspaceConfig.ts` file from `src/app/home/` to `src/lib/stores/` and updating all dependent references throughout the codebase.

### Summary of Accomplishments
- *File Migration*: Successfully moved the logic for `useWorkspaceConfigStore` and its associated interfaces and logic to `src/lib/stores/workspaceConfig.ts`.
- *Import Updates*: Identified and updated multiple files that imported `useWorkspaceConfigStore`.
    - Updated relative and absolute imports in UI components: `commandBox.tsx`, `pinboard.tsx`, and the pinboard sub-components (`objectiveDiagnostic.tsx`, `objectives.tsx`, `recentlyClosedFiles.tsx`, `symbolDefinitions.tsx`).
    - Updated the main entry point at `src/app/home/page.tsx`.
    - Updated a dynamic `require` call in the backend-related logic in `src/lib/solver/solver.ts` to point to the new location.

### Evaluation
- *Do the code changes solve the problem?*: *YES*. The file was moved, and all identified references were updated to reflect the new path.
- *Are the code changes acceptable?*: *YES*.
    - The `next build` completed successfully, confirming that the main application and its dependency graph are intact.
    - While there are pre-existing linting warnings and a TypeScript error in `terminalStore.ts`, these are unrelated to the changes made for this specific task (moving `workspaceConfig.ts`).
    - The compilation of the service worker was successful.

The migration is complete and the application remains functional.

Note how cost-effective the single-turn evaluations are; this one costs $0.0036:

As I mentioned before, observability is key for me, and the LLM log tab allows me to see the raw interactions with the LLMs, the costs, and even to modify conversations (e.g. drop messages) or edit messages. This is useful when designing new interactions where the LLM is almost right, but needs minor tweaks.
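The two edits mentioned here, dropping a message and editing one, are simple to picture as operations on a message list. The sketch below is hypothetical: the `Message` shape and both helper functions are assumptions for illustration, not the log tab's actual data model.

```typescript
// Hypothetical conversation log and the two edits described above.
type Role = 'system' | 'user' | 'assistant';

interface Message {
  role: Role;
  content: string;
}

// Return a copy of the log without the message at `index`.
function dropMessage(log: readonly Message[], index: number): Message[] {
  return log.filter((_, i) => i !== index);
}

// Return a copy of the log with the content at `index` replaced.
function editMessage(log: readonly Message[], index: number, content: string): Message[] {
  return log.map((m, i) => (i === index ? { role: m.role, content } : m));
}

// Made-up conversation in which the assistant's answer is almost right.
const log: Message[] = [
  { role: 'system', content: 'You are a planner.' },
  { role: 'user', content: 'Move the file.' },
  { role: 'assistant', content: 'Moving the file without checking references.' },
];

const tweaked = editMessage(log, 2, 'Finding references, then moving the file.');
const trimmed = dropMessage(log, 2);
```

Both helpers return new arrays and leave the original log untouched, so an experiment with a tweaked conversation can always be compared against the raw interaction.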
