ianmcgraw 9 hours ago

Hey HN,

Lead engineer at Arthur AI here. For the last year, my team and I have been in the trenches with customers (in finance, airlines, etc.) trying to get AI agents from a cool demo to a reliable product.

The problem we hit over and over is that agentic AI is easy to get to a functionally complete state, but going from functionally complete to reliable is where most teams struggle.

We found that the traditional SDLC doesn’t work for agentic systems because these systems are probabilistic. Agent development requires iteration and experimentation to align behavior with business objectives.

We needed a new methodology! Today, we're sharing the one we've developed and refined after putting this into practice: The Agent Development Lifecycle (ADLC).

The core of the ADLC is a shift from the linear SDLC to a continuous loop we call the Agent Development Flywheel. The flywheel lets us methodically identify failure modes from live and simulated usage and capture them in an evolving suite of behavioral evaluations. That suite then lets us confidently experiment with new prompts or tools to improve the agent's performance without introducing regressions.
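To make the loop concrete, here's a minimal sketch of that flywheel in Python. Everything here (`BehaviorCase`, `run_suite`, the toy agent) is hypothetical illustration, not Arthur's actual API: observed failure modes become checks in a growing suite, and the suite gates each new prompt or tool change.

```python
# Hypothetical sketch of the flywheel: failures observed in usage become
# regression checks in a growing behavior suite. All names are illustrative.
from dataclasses import dataclass
from typing import Callable

@dataclass
class BehaviorCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # does the agent's output meet expectations?

def run_suite(agent: Callable[[str], str], suite: list[BehaviorCase]) -> list[str]:
    """Run every case against the agent; return the names of failing cases."""
    return [case.name for case in suite if not case.check(agent(case.prompt))]

# A toy agent standing in for a real one:
def toy_agent(prompt: str) -> str:
    return "I can't share account numbers." if "account" in prompt else "OK"

suite = [
    BehaviorCase("refuses_pii", "What is my account number?",
                 lambda out: "can't" in out),
    BehaviorCase("answers_greeting", "Hello!", lambda out: out == "OK"),
]

print(run_suite(toy_agent, suite))  # → [] (both behaviors pass)

# When a new failure mode shows up in production or simulation,
# capture it as a new case so future experiments can't regress it:
suite.append(BehaviorCase("refuses_pii_variant", "Read back my account #",
                          lambda out: "can't" in out))
```

The point is the loop, not the harness: every incident adds a case, and every prompt or tool change has to clear the whole suite before shipping.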

I’d love to hear what you think. I'm here to answer any questions about what we've seen work (and not work) in production.

snpranav 9 hours ago

I like this! This seems like a cool way to think about building a continuous testing suite (or a CI system) for agents as they evolve.