AI
Collaborative AI Engineering: One Dev, Two Dozen Agents, Zero Alignment
After watching the video, I don’t know how the title applies. Most of the talk is about a beta application (ACE) that GitHub is developing to reduce friction in an AI-enabled SDLC. It currently only runs in a micro VM hosted in the cloud. There were some other ideas I jotted down:
- Premise: One person with a fleet of a dozen agents will do the work of an entire team.
- Existing tools scale up the value of one individual. But software is a team sport.
- Implementation is rapidly becoming a solved problem. Agreeing on what to build is the new bottleneck.
- AI has made the cost of not being aligned as a team higher.
- Code is so cheap that we don’t properly stop to think before prompting.
- Plans are usually local and unshared with other people.
- Going fast without alignment -> wasted work (building the wrong thing) and coordination debt (cleanup cost of unaligned efforts, tons of PRs with little context for reviewers)
- Planning and building are no longer separate phases; they are separate cycles. Human context (business context, financial resources, political dynamics, product vision, user research insights, org history) is not in the codebase.
- The speed and volume of work makes it hard to keep up with what your coworkers are doing.
- Premise: Take the time given back to you by AI to focus on design, planning, and quality.
- Quality is the new differentiator. “Craftsmanship will set you apart from vibe-coded slop.”
Building pi in a World of Slop (tagline: slow the fuck down)
- Acts I and II (first 12 minutes) are about why he built his own AI harness (pi).
- There has been noticeable decline in software quality (e.g., outages, bugs).
- Agents are emitting code with compounding mistakes, zero learning, no bottlenecks, and delayed pain for actual humans. AI generates far more code than anyone can reasonably review.
- AI was trained on “bad architecture decisions and cargo cult best practices” from code on the Internet.
- Every agent decision is local (i.e., it doesn’t see the full system or others’ work very well).
- A sufficiently detailed spec is called a program. If your spec has holes, the agent fills them with what it learned on the web.
- “But humans also…” make mistakes! Yes, but they learn. They screw up less often because they’re inherently slower than AI. Humans also feel pain and fix things. Your markdown files and complex memory system (and bigger context windows) will not save you.
- “I don’t even read the code anymore.” This means you don’t understand it, and because there’s so much of it, the agent isn’t much better at helping you.
- Properties of good agent tasks
- Scoped for minimal code consumption (modular codebases help)
- Closed loop: the agent evaluates its own work (see the sketch at the end of this section)
- Not mission critical
- Boring stuff, or things you don’t have time to try
- Reproduction cases from user issues
- Rubber ducking
- Learn to say no — fewer features, but the right ones, polished.
- What’s critical? Write architecture and APIs by hand, and pair with agents instead of outsourcing to them; friction builds understanding and taste, and it’s where you learn.
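To make the closed-loop property concrete, here is a minimal sketch in Python. The `run_agent` CLI, the task shape, and the file paths are hypothetical (none of this comes from the talk); the point is that a good agent task ships with its own check, so the agent’s work is evaluated automatically instead of landing on a human reviewer.

```python
import subprocess

# Hypothetical shape for a well-scoped agent task: a narrow prompt, a small
# file scope, and a check command that closes the loop.
TASK = {
    "prompt": "Add a reproduction test for issue #123 in tests/test_repro.py",
    "scope": ["tests/test_repro.py"],            # minimal code consumption
    "check": ["pytest", "tests/test_repro.py"],  # the task evaluates itself
}

def run_task(task: dict, max_attempts: int = 3) -> bool:
    """Run the agent, then verify its output with the task's own check."""
    for _ in range(max_attempts):
        # `run_agent` is a stand-in for whatever harness you use.
        subprocess.run(["run_agent", task["prompt"], *task["scope"]], check=False)
        if subprocess.run(task["check"], check=False).returncode == 0:
            return True  # the check passed; the loop is closed
    return False  # escalate to a human after repeated failures

if __name__ == "__main__":
    print("done" if run_task(TASK) else "needs human attention")
```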
Software Engineering
It Doesn’t Help to Push AI into a Crappy Process
- The teams that are getting the most out of AI already have a solid engineering process.
- You’re not likely to simply assign AI to complete a ticket and have it be successful. Usually the ticket isn’t defined well enough, or the codebase is large and unwieldy.
- You can’t set up a good agentic loop if you can’t specify the outcome before the code exists. Defining what you want before you build it is a big part of TDD.
- The kind of tests AI writes is different from the tests you write with TDD. Also, AI’s tests are likely tied to the implementation, which makes them brittle.
- When applying AI without intention, you’re likely still doing code-driven development, just faster.
- You need to understand the problem and come up with concrete examples. AI can help here (e.g., prototyping ideas for stakeholders). Next, map out a development plan that has a test list. Then pick one slice/scenario and have AI apply a TDD workflow; see the sketch below. See if there are opportunities to refactor, then commit. (Aim for pushes every 10 minutes.)
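Here is a minimal sketch of that workflow for a single slice. The discount example and all names are made up for illustration; the point is that the test list and a failing, behavior-focused test exist before any implementation code does.

```python
# Test list for one slice (order discounts), written before any code:
#   [x] order over $100 gets a 10% discount
#   [ ] cheap orders get no discount
#   [ ] discount never exceeds $50
# Pick one scenario, write the failing test, then write the minimal code.

import unittest

def discount_for(total: float) -> float:
    """Minimal implementation, written only after the first test failed.
    In a real project this would live in its own module; it's inlined
    here to keep the sketch self-contained."""
    return round(total * 0.10, 2) if total > 100 else 0.0

class DiscountTest(unittest.TestCase):
    # Behavior-focused assertions: they state the outcome, not the steps
    # used to compute it, so they survive refactoring (unlike the
    # implementation-coupled tests AI tends to generate).
    def test_order_over_100_gets_10_percent_discount(self):
        self.assertEqual(discount_for(total=200.00), 20.00)

    def test_cheap_order_gets_no_discount(self):
        self.assertEqual(discount_for(total=40.00), 0.0)

if __name__ == "__main__":
    unittest.main()
```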