Why Do AI Agents Keep Making the Same Mistakes?

· 8 min read
Vadim Nicolai
Senior Software Engineer

Every Claude Code session leaves a trace — tool calls made, files read, edits applied, errors encountered, and ultimately a score reflecting how well the task was completed. Most systems discard this history. We built an agent that mines it.

The Trajectory Miner is the first agent in our six-agent autonomous self-improvement pipeline for nomadically.work, a remote EU job board aggregator. Its job: analyze past sessions, extract recurring patterns and reusable skills, and feed structured intelligence to the rest of the team. It writes no code. It produces raw material that other agents — the Codebase Auditor, Skill Evolver, and Code Improver — consume.
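The teaser doesn't show the miner's internals, but the core move — counting recurring failures across scored sessions — can be sketched in a few lines. Everything here is hypothetical (field names, error strings, the 0.5 score threshold); the real agent reads Claude Code session traces.

```python
from collections import Counter

# Hypothetical session records; the real miner reads Claude Code
# session traces (tool calls, errors, scores) from disk.
sessions = [
    {"score": 0.4, "errors": ["ENOENT: jobs/feed.ts", "type error in parser"]},
    {"score": 0.9, "errors": []},
    {"score": 0.3, "errors": ["ENOENT: jobs/feed.ts"]},
]

def mine_recurring_errors(sessions, min_count=2):
    """Count error messages across low-scoring sessions and surface
    any that recur -- raw material for downstream agents."""
    counts = Counter(
        err
        for s in sessions
        if s["score"] < 0.5
        for err in s["errors"]
    )
    return [(err, n) for err, n in counts.most_common() if n >= min_count]

print(mine_recurring_errors(sessions))  # [('ENOENT: jobs/feed.ts', 2)]
```

A recurring `ENOENT` like this is exactly the kind of pattern the Codebase Auditor and Code Improver would pick up downstream.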

The design draws from four research papers, curated from the VoltAgent/awesome-ai-agent-papers collection. Here is what each paper contributes and how we translated academic ideas into a working system.

The Agent That Says No: Why Verification Beats Generation

· 8 min read
Vadim Nicolai
Senior Software Engineer

An autonomous improvement system without verification is just autonomous damage. The Code Improver can write fixes. The Skill Evolver can edit prompts. But neither should be trusted to judge its own work. That's why the Verification Gate exists.

The Verification Gate is the fifth agent in our six-agent autonomous self-improvement pipeline for nomadically.work. It validates every change made by the Skill Evolver and Code Improver before those changes are accepted. It never modifies code or skills — it only reads, checks, and reports a verdict.
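The read-only contract — check, never modify, report a verdict — can be sketched as a pure function. The specific checks below are illustrative stand-ins, not the gate's actual rules:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    accepted: bool
    reasons: list

def verify(change):
    """Read-only checks over a proposed change; never mutates it.
    Both checks are hypothetical stand-ins for the real gate."""
    reasons = []
    if not change.get("tests_passed", False):
        reasons.append("test suite failed")
    if change.get("diff_lines", 0) > 500:
        reasons.append("diff too large to review confidently")
    return Verdict(accepted=not reasons, reasons=reasons)

verdict = verify({"tests_passed": True, "diff_lines": 42})
print(verdict.accepted)  # True
```

The key design point survives the simplification: `verify` takes the change as input and returns only a verdict, so the gate cannot "fix" anything itself.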

Five research papers, curated from the VoltAgent/awesome-ai-agent-papers collection, shaped its design. The common thread: autonomous systems need calibrated self-awareness about the quality of their own outputs.

How I Built a UX Team with Claude Code Agent Teams

· 16 min read
Vadim Nicolai
Senior Software Engineer
TL;DR

Set CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 in .claude/settings.json. Write a command file in .claude/commands/ and spawn prompts in .claude/team-roles/. Type /ux-team and three agents — UX Lead, UX Researcher, UI Designer — run in parallel: researcher defines personas and journeys, designer builds the component system, lead synthesizes into a spec. File ownership is enforced by persona, not by filesystem. BMAD Method v6 provides the Sally persona and a quality-gate checklist that runs before the spec is marked complete.
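Assuming the standard Claude Code settings schema (an `env` key for environment variables — check the settings reference for your version), the flag from the TL;DR would live in `.claude/settings.json` roughly like this:

```json
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
```

With that in place, the `/ux-team` command file in `.claude/commands/` and the spawn prompts in `.claude/team-roles/` do the rest.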

BMAD Method + Langfuse + Claude Code Agent Teams in Production

· 16 min read
Vadim Nicolai
Senior Software Engineer

Running AI agents in a real codebase means solving three intertwined problems at once: planning and quality gates (so agents don't drift), observability (so you know what's working), and orchestration (so multiple agents divide work without clobbering each other). In nomadically.work — a remote EU job board with an AI classification and skill-extraction pipeline — these problems are solved by three complementary systems: BMAD v6, Langfuse, and Claude Code Agent Teams. This article explains how each works and how they compose.

Trigger.dev Deep Dive: Background Jobs, Queue Fan-Out, MCP, and Agent Skills

· 14 min read
Vadim Nicolai
Senior Software Engineer

Trigger.dev is a serverless background job platform that lets you run long-running tasks with no timeouts, automatic retries, queue-based concurrency control, and full observability. Unlike traditional job queues (BullMQ, Celery, Sidekiq), Trigger.dev manages the infrastructure — you write TypeScript tasks and deploy them like functions.

This article covers the platform end-to-end: architecture, task authoring, the queue fan-out pattern, MCP server integration for AI assistants, agent skills/rules, and a production case study of a TTS audio pipeline.
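Trigger.dev tasks themselves are authored in TypeScript, but the queue fan-out shape the article refers to — one parent job spawning many children under a concurrency cap — is language-agnostic. A minimal Python sketch (all names hypothetical, the semaphore standing in for a queue's concurrency limit):

```python
import asyncio

async def process_item(item, sem):
    # The semaphore caps in-flight work, like a queue concurrency limit.
    async with sem:
        await asyncio.sleep(0)  # placeholder for real work
        return item * 2

async def fan_out(items, limit=5):
    """One parent job fans out to many child tasks with
    bounded concurrency -- the shape of the fan-out pattern."""
    sem = asyncio.Semaphore(limit)
    return await asyncio.gather(*(process_item(i, sem) for i in items))

results = asyncio.run(fan_out(range(10)))
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

In Trigger.dev the platform supplies the retries, observability, and per-queue concurrency that this sketch fakes with a semaphore.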

Building an Automated Architecture Reviewer with Claude Opus 4.6

· 9 min read
Vadim Nicolai
Senior Software Engineer

We built an Architect agent — a fully autonomous code reviewer powered by Claude Opus 4.6 — that explores a repository, runs audits, and produces a comprehensive architecture report. One command, zero human intervention, a professional-grade review in under 10 minutes.

This article covers how the agent is structured, how it leverages Anthropic's agentic tool-use loop, and what we learned shipping it.
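The agentic tool-use loop mentioned above has a simple skeleton: ask the model, execute any tool it requests, feed the result back, repeat until it stops. A minimal sketch with the model stubbed out (the real agent calls Claude via the Messages API; all names here are hypothetical):

```python
# Stub model: asks for one tool call, then stops.
def stub_model(history):
    if not any(m["role"] == "tool" for m in history):
        return {"stop": False, "tool": "list_files", "args": {"path": "."}}
    return {"stop": True, "text": "report complete"}

TOOLS = {"list_files": lambda path: ["README.md", "src/"]}

def run_agent(prompt, model=stub_model, max_turns=10):
    """Drive the model/tool loop until the model stops or the
    turn budget runs out."""
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = model(history)
        if reply["stop"]:
            return reply["text"]
        result = TOOLS[reply["tool"]](**reply["args"])
        history.append({"role": "tool", "content": result})
    return "max turns reached"

print(run_agent("audit this repo"))  # report complete
```

The `max_turns` budget matters in practice: it is what keeps "fully autonomous" from meaning "runs forever."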

Production-Ready AI Job Classification in Python with LangChain and Cloudflare Workers AI

· 10 min read
Vadim Nicolai
Senior Software Engineer

We needed a pipeline that ingests hundreds of job postings from ATS platforms (Greenhouse, Lever, Ashby), enriches each posting with structured data from their public APIs, and then classifies whether a job is a fully remote EU position — all running on Cloudflare's edge with zero GPU costs.

This article walks through the architecture and implementation of process-jobs, a Cloudflare Python Worker that combines langchain-cloudflare with Cloudflare Workers AI, D1, and Queues to build a production classification pipeline.
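Purely to illustrate the classification step's shape — the production pipeline uses an LLM via langchain-cloudflare, not keyword matching — here is a self-contained heuristic sketch with hypothetical field names:

```python
# Illustrative only: keyword heuristics standing in for the
# LLM classifier so the sketch runs anywhere.
EU_HINTS = {"emea", "european union", "eu remote", "cet", "cest"}

def classify_remote_eu(job):
    """Return True when a posting looks like a fully remote EU role."""
    text = " ".join([job.get("title", ""), job.get("location", ""),
                     job.get("description", "")]).lower()
    is_remote = "remote" in text
    is_eu = any(hint in text for hint in EU_HINTS)
    return is_remote and is_eu

job = {
    "title": "Senior Backend Engineer",
    "location": "Remote - EMEA",
    "description": "Fully remote within the European Union.",
}
print(classify_remote_eu(job))  # True
```

The real worker replaces the boolean heuristics with a structured LLM call, but the enrich-then-classify flow over ATS payloads is the same.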