avoiding hype-trainwrecks

An admission, and a significant one given my employment: I've been lowkey dubious about the "use AI for coding" trend that has swept the industry.

Why wouldn't I be? I mean, it's just multi-vendor multi-backend high-performance network storage, often at petabyte scale, on Kubernetes, Slurm, and who-knows-what else, with filesystems and object storage alike, in new clusters with new partners (internal and external) and new and pending acquisitions, in new clouds and neoclouds, on short notice... and it needs to work, consistently, or no one's compute job is computing and the customer's capital is sitting there depreciating, which is incredibly expensive. Just throw the AI at it and it should be easy, right? Yeah, no.

I am personally incredibly competent in a bunch of different areas, I do a great job of coding, and I have never needed AI to do my work. Moreover, I've been on the receiving end of some low-quality AI code -- not often, but it was quite maddening -- and I am as professionally, aesthetically, and existentially threatened by Slop™ as just about anyone in this field. I never felt much interest in assisting a struggling tool when I could do the work myself. Oh, I wasn't completely dead-set against it; I earnestly tried the officially supported integrations, but one-shot code generators were weak, and spicy autocomplete was more of a distraction than a help.

so why are you writing code with AI now?

My employer is, as you might imagine, somewhat invested in the idea that AI is a tool people will use because it is better than other tools.

being serious

When my boss said we should be serious about using AI on a regular basis on our team, I gave it the old college try once more. As it turns out, code generation in a loop -- particularly a structured loop that can load additional information, take notes, invoke sub-agents, and run tests -- is substantially better, and the tools are actually at a point where I can take their output seriously.

It's not that big of a surprise. If Adrian Thompson could produce "An evolved circuit, intrinsic in silicon, entwined with physics" in 1996, surely the option to evolve software is on the table.

But if it works well now, and will work better in the future, as people seem to expect, that raises a lot of questions, which I must also take seriously.

Because I'm taking it seriously, I note that this doesn't necessarily mean that the critics are all wrong about everything. Some criticism is, in fact, incredibly valuable.

"code was never the hard part," so what is?

"Code was never the hard part," the AI critics say, and rightly so. "Okay, so it writes code fast — who cares? Measuring lines of code was always a terrible metric."

Because we are taking AI seriously, we should consider this objection as well, rather than brushing it off. What is the hard part, then, if not code? A few ideas to start:

  • Quality and security (these two go hand in hand)
  • Predictability and consistency (in terms of operations, in terms of performance characteristics)
  • Graceful failure, and resilience
  • End-to-end system performance
  • Systems that are easy to reason about
  • Systems that are elegant and extensible
  • Applying engineering standards consistently
  • Ensuring humans have a deep understanding of the systems they build

I could probably go on, but these should make an excellent starting point.

okay, so what does this actually mean in practice?

For my current role, I think the answer is clear: double down on design, on best practices, and on all of the things you can't rely on a coding agent to do for you.

And from my experience thus far, all of this really helps the AI when it is doing the coding, too! The AI can code faster than you for longer, and can probably fit several more things in its working memory at a time -- but once it runs out of room in a session, all the knowledge it gained falls right back out, and it is blind and lost again. (Even when its context is merely getting crowded, it reportedly has some trouble, which is why much of context engineering is finding ways to keep a hierarchy of context where each piece is digestible.) It's a savant with a bad case of anterograde amnesia, and it's prone to tunnel vision. It can break tasks into smaller pieces and take notes as it goes along to mitigate this... but none of that means it can get through a pile of mud without a long slog. It really benefits from your structure being idiomatic, elegant, and predictable. Also, much like a human developer, it is often at its best when it can get meaningful feedback as it goes along.

concrete recommendations thus far

  • Put your security first. Sandboxes, VMs, locked-down accounts, network namespaces, limited permissions, dependency pinning, workload authentication. Don't tell me any of that is too much effort! Ask the AI to set it up!

    • There are a lot of third-party tools out there, and using them is often quite reasonable -- but running something via npx packagename is simply begging to be exploited.
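      On the limited-permissions point, here's a tiny sketch of the mindset in Rust, using nothing but the standard library (the tool name and scratch directory are illustrative, and this is one small layer, not a substitute for real sandboxing): when your tooling launches a third-party helper, scrub the environment it inherits.

      ```rust
      use std::process::Command;

      fn main() -> std::io::Result<()> {
          // Launch a third-party tool with a scrubbed environment and a
          // scratch working directory, so a compromised dependency inherits
          // as little as possible. Pair this with real isolation: VMs,
          // containers, network namespaces.
          let status = Command::new("some-third-party-tool") // illustrative name
              .env_clear()                       // drop inherited secrets/tokens
              .env("PATH", "/usr/bin:/bin")      // only what it actually needs
              .current_dir("/tmp/agent-scratch") // illustrative; must exist
              .status()?;
          println!("tool exited with {status}");
          Ok(())
      }
      ```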
  • Feed your enterprise context into the bots. I mean: give them access to systems like Jira, GitHub/GitLab, etc. (With limited permissions, of course, unless you're foolish and want to be hacked; lock down those systems with your strongest locks or expect the hackers to someday start exfiltrating everything.) Have your bots read through your current Jira initiatives, the epics underneath them, and the stories you've written about what's going down, while they are in their planning stages. Link them to the design documents and roadmaps. If you don't trust them with access to these systems, use a tool that you do trust to mirror the content selectively and feed it to them.

    • Corollary: You need to actually write good Jira stories and design documents, not just slop them out. Consult the AI regularly if you wish (especially to check that your view of reality is consistent across documents), but strongly consider producing the design document without having the AI generate anything at all.
      • Likewise, READMEs are not generally something I ask my bot to write -- but whether or not you use a bot for this or other documents, actually think about the information architecture of what you are trying to write. A README for humans puts the most important and interesting parts up front, lays the foundation for understanding, points out surprises and pitfalls, and then gives you a quickstart. A README generated by a bot, with today's default settings and no additional instruction, will typically read like a low-effort regurgitation of what might otherwise be API documentation, mixed with a blog about all the minutiae of the development process, including all the things you never asked it to do -- and then present a power set of all the combinations of settings that literally no one will ever read. (It'll have overproduced Markdown, though.)
      • I've asked my bots not to bury their remarks on design in commit messages, and instead to draw attention to certain details -- key decisions, surprises, or anything potentially questionable -- by posting comments on the merge requests they open.
      • I've also sometimes asked my bots to mark their contributions to code comments and the like with a symbol (e.g. ˚) to help indicate that they are AI-generated and don't necessarily reflect a conscious human design decision. In other places, like when they are helping move information between Jira, design documents, and a separate bug-tracking system, I've instructed them to label their analysis as "BOT ANALYSIS:" to make that clear as well.
  • It is as important as ever that the systems you design should be simple and robust. (This is, of course, much harder than designing systems that are complex and fragile.)

  • Besides the design of systems, invest in the structure of the code itself. Apply SOLID principles and other design philosophies that you feel are appropriate for your field. Write the key code yourself, demonstrating how the pieces should fit together (see the sketch after this item), and let the AI run from there. (This supports any number of other best practices around quality.)

    • And again, good engineering definitely isn't about adding more abstractions: it's about using the right abstractions, and making the abstractions you do use light. Simplicity beats out complexity any day of the week.
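    To make "write the key code yourself" concrete, here is a minimal Rust sketch of the pattern (all names are illustrative): a human defines the seam as a small trait plus one reference implementation, and the agent writes the remaining backends against the same interface and the same test suite.

    ```rust
    use std::collections::HashMap;
    use std::io::{Error, ErrorKind, Result};

    /// The seam, written by a human: every backend must fit this shape.
    pub trait ArtifactStore {
        fn put(&mut self, key: &str, bytes: &[u8]) -> Result<()>;
        fn get(&self, key: &str) -> Result<Vec<u8>>;
    }

    /// Human-written reference implementation; agent-written backends
    /// (S3, NFS, whatever) are held to the same trait and the same tests.
    #[derive(Default)]
    pub struct InMemoryStore(HashMap<String, Vec<u8>>);

    impl ArtifactStore for InMemoryStore {
        fn put(&mut self, key: &str, bytes: &[u8]) -> Result<()> {
            self.0.insert(key.to_owned(), bytes.to_vec());
            Ok(())
        }
        fn get(&self, key: &str) -> Result<Vec<u8>> {
            self.0
                .get(key)
                .cloned()
                .ok_or_else(|| Error::new(ErrorKind::NotFound, key.to_owned()))
        }
    }
    ```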
  • Invest in tools. Start with the existing tools: linters, code formatters, vulnerability scanners, static analysis, duplication finders, fuzzers, test coverage tools, the adversarial tools that try to break your code in a way that leaves the tests passing... if a tool exists in your language's ecosystem, you should be running it in your build pipelines. Enforce limits on cyclomatic complexity! Expose it all to your agent's coding loop. Have the AI run fuzzers on your code regularly (a sketch of what that looks like follows below). The excuse that doing things right is "too expensive" is gone, right?
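    In Rust, for instance, a cargo-fuzz target is only a few lines. A hedged sketch, where my_crate::parse_config stands in for whatever input-handling code you actually ship:

    ```rust
    // fuzz/fuzz_targets/parse_config.rs -- run with `cargo fuzz run parse_config`
    #![no_main]
    use libfuzzer_sys::fuzz_target;

    fuzz_target!(|data: &[u8]| {
        // The property under test: no input, however malformed, may cause
        // a panic, overflow, or hang. `parse_config` is a stand-in name.
        if let Ok(text) = std::str::from_utf8(data) {
            let _ = my_crate::parse_config(text);
        }
    });
    ```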

    • In some cases, your tools will work against you and you'll be stuck with them. Helm, in particular, is built all wrong (text templating?! text templating, of such lovely well-structured schema-oriented APIs!?!? Philistines!) and is unsound (it can produce output that is structurally invalid -- it's quite easy, actually). The primary value it delivers is its ubiquity -- and that's the stinger, because if your ops team or customers have processes built around Helm charts, you might not have great options to ship alternatives. This is a grave misfortune, and as much as I appreciate being at a powerful company, I don't think I'm personally in a position to dictate new tools to all the other teams here, let alone to the whole industry.

      What I might be able to do instead is to design tools to mitigate these flaws. Is there some new process I could imagine that lets me enforce the quality I want, while still dealing with Helm charts and upstream-vendor Helm charts? Because if I can imagine it, writing it is cheaper than ever.
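      As a sketch of that direction (assuming helm is on PATH and the serde_yaml crate is available; the release name and chart path are illustrative): render the chart and verify that every emitted document at least parses as YAML, as a floor to build real schema checks on.

      ```rust
      use serde::Deserialize;
      use std::process::Command;

      fn main() -> Result<(), Box<dyn std::error::Error>> {
          // Render the chart the same way the cluster would see it.
          let out = Command::new("helm")
              .args(["template", "my-release", "./chart"]) // illustrative args
              .output()?;
          if !out.status.success() {
              return Err(String::from_utf8_lossy(&out.stderr).into());
          }
          // `helm template` emits a multi-document YAML stream; insist that
          // every document parses. A real gate would go further and validate
          // against the Kubernetes OpenAPI schemas.
          let rendered = String::from_utf8(out.stdout)?;
          for (i, doc) in serde_yaml::Deserializer::from_str(&rendered).enumerate() {
              serde_yaml::Value::deserialize(doc)
                  .map_err(|e| format!("document {i} is not valid YAML: {e}"))?;
          }
          println!("all rendered documents parsed");
          Ok(())
      }
      ```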

  • Switch to programming languages that offer better tools to enforce quality. Type systems are entry-level stakes now, as are schemas for your data interchange formats. Unit-testing your code is an absolute must. If your blind savant AI happens to break something, you'll both be happier learning what that something is before it breaks production.

    • Have you considered Haskell? ... Actually, I hear from the people who actively use it (and like it) that the tooling for running it in production is somewhat limited and behind where they'd like it to be, and if you want to avoid it for that reason, fair. Nevertheless, consider: a programming language that can provide language-level guarantees that an arbitrary method call does not erase the production database.

      Could you use its power to help your AI deliver better code?
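      That flavor of guarantee isn't exclusive to Haskell, either. Here's the same idea sketched in Rust terms (all names illustrative): make destructive access a capability type that most code cannot even obtain a value of, so an arbitrary helper function simply cannot call the dangerous thing.

      ```rust
      /// Zero-sized capability token; the private field means only this
      /// module can mint one.
      pub struct WriteCap(());

      impl WriteCap {
          /// Called once, at the composition root, under human-reviewed wiring.
          pub fn acquire_at_startup() -> WriteCap {
              WriteCap(())
          }
      }

      /// Reads need no special capability.
      pub fn read_row(key: &str) -> Option<String> {
          let _ = key; // ... read-only storage access elided ...
          None
      }

      /// Destructive operations are unreachable without a WriteCap in hand;
      /// the compiler enforces it.
      pub fn drop_table(_cap: &mut WriteCap, table: &str) {
          let _ = table; // ... destructive storage access elided ...
      }
      ```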

    • Have you heard about a programming language called Rust that is really fast, and also makes it really hard to write several important forms of invalid code, like memory errors? Probably you have -- but maybe there's a bit of a learning curve, you have an existing software base, and your engineers may be less familiar with it.

      If you're using AI to write your code, is adopting it feasible now? What does the team's adoption look like now? Can you figure out how to get AI to rewrite your existing codebase, in a manner that you'd trust?

    • How about Erlang? OCaml? Have you heard of SPARK, a dialect of Ada? Lean, Rocq/Coq, Idris? If you're sticking to something more conventional, how about Prusti and Flux with Rust? I have never in my career worked for an employer that does formal verification for anything more than a very small subset of their code, if that.

      Maybe now is the time for people to start.
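      To give a taste of the conventional end of that spectrum (a sketch assuming the prusti_contracts crate and the Prusti toolchain; the function is illustrative): contracts like these are proven statically for every possible input, not spot-checked by tests.

      ```rust
      use prusti_contracts::*;

      // Prusti attempts to prove this pre/postcondition pair holds for all
      // inputs; a violation is a compile-time verification error.
      #[requires(amount <= balance)]
      #[ensures(result == balance - amount)]
      fn withdraw(balance: u64, amount: u64) -> u64 {
          balance - amount
      }
      ```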

    • If you're doing web development, or have an existing NodeJS codebase, and you're using plain ES6 instead of TypeScript: I'm just gonna come out and say that it's probably time to change that.

    • For heaven's sake, stop having your AIs write so many shell scripts. You can do better. You're working on your build system and you want to do SemVer tagging of the repository? Yes, you can in fact write that in shell. But you could just as well have your AI do it all in Rust, or Go, or what-have-you, use an in-language Git API instead of shelling out, and have really, really nice unit tests. (A sketch follows below.)
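      A hedged sketch of that shape in Rust, using the git2 crate (the v-prefixed tag scheme and the always-bump-patch policy are illustrative):

      ```rust
      use git2::Repository;

      fn main() -> Result<(), git2::Error> {
          let repo = Repository::open(".")?;
          // Find the highest existing vMAJOR.MINOR.PATCH tag.
          let tags = repo.tag_names(Some("v*"))?;
          let mut latest = (0u64, 0u64, 0u64);
          for name in tags.iter().flatten() {
              if let Some(rest) = name.strip_prefix('v') {
                  let parts: Vec<u64> = rest
                      .splitn(3, '.')
                      .filter_map(|p| p.parse().ok())
                      .collect();
                  if let [maj, min, pat] = parts[..] {
                      latest = latest.max((maj, min, pat));
                  }
              }
          }
          // Tag HEAD with the next patch version; the parsing and bumping
          // logic above is exactly the kind of thing you can unit-test,
          // which shell makes painful.
          let next = format!("v{}.{}.{}", latest.0, latest.1, latest.2 + 1);
          let head = repo.head()?.peel_to_commit()?;
          repo.tag_lightweight(&next, head.as_object(), false)?;
          println!("tagged {next}");
          Ok(())
      }
      ```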

      • Were you going for Bash as the least-common-denominator because it's hard to distribute this tool to your development environments? Okay, I understand that pain, but... Nix fixes this. Speaking of which...
  • Switch to better build systems. Did you want to switch to Nix or Bazel and immutable builds, but you were worried it would take too long and be too hard? You have an AI for this now. You don't need a month of an engineer's time to do the busywork. Give your agents, deep in their VMs and sandboxes, the same tools you'd give your developers and your build pipeline.

  • Think really, really hard about your people-driven processes, and the incentives your people are facing. If your firm rewards people who foist slop off on the rest of the team, then you're asking everyone to play a game where the winner is whoever makes the sloppiest slop the fastest, and you will be competing against other firms that do the same -- and the next thing you know, people are making fun of your code for always crashing, or you're looking at a single nine of uptime if you're lucky.

    • In the past, some companies have hired smart engineers, and frustrated them by rewarding the mavericks and giving short shrift to the people who actually kept things running after they left. Everything bad about this dynamic is going to be worse with AI. You can't expect to have the engineers save management from all their own worst instincts anymore. You will get what you pay for; you will get it good and hard. If you will not listen to your engineers, you will hear the news from the market.

    • If you do not focus like a laser on a culture of excellence, you will end up with a culture of slop. The drift towards slop will only accelerate -- at least until such time as the AI is doing absolutely everything without you anyway. Today, either quality comes first, every time, or slop and indifference will win, and eventually everything will fall apart.

      Say no to cutting corners and shortcuts. After all, you have AI now; you should have the velocity to achieve this without gross compromises.

      • The true mark of excellence in your engineers will be that they care about understanding the systems and getting things right.

        Your firm will either have capable leadership that rewards them, or it will let them go for not being sloppy enough, and reap the consequences.

    • Normalizing deviance was always bad, and now it is worse than ever. Build failures on main and test failures should be rare, and always an occasion to take action. Warning outputs? Likewise. It is hard enough for humans to reason about the state of a system that's half-broken.

      Use your AI to clean up those warnings so that it's not sifting through them later while looking for something else under time pressure.

    • Code review, design review, and quality assurance are the defining parts of your team's processes going forward. Design everything your team does around these parts; throw out and remake any process that gives them short shrift.

  • Broadly speaking: think about the big picture, and be ambitious. A lot of people are approaching AI merely as "a tool that your developers interact with in their IDE." This is tunnel vision; this is yesterday's model. Think autonomous. You may well have a devil of a time making it all happen, but keep it on your roadmap and let it inform your plans.

    If you're having trouble imagining it, start with a vision of an agent that watches your build pipelines, your alert systems, your logs (if they're trustworthy, i.e., attackers can't inject strings into them), anything automated -- and as soon as there's a problem, it starts work on a bugfix and opens an MR. That is table stakes for the future.

contraindications

  • A caveat for my list of recommendations: I currently work on software that is distributed to customers via binaries, containers, Helm charts, and the like, where you can't upgrade it easily, and which (generally speaking) does not run on systems owned by my team. Failure is costly.

    • If you are working on software that is hosted, running as a service, there may be other valid approaches to all this; I am given to understand (from reading articles online) that Anthropic hooks up some pretty autonomous agents to a variety of operational metrics, dashboards, and alerts -- all of which are their own form of best practice -- and possibly cares a good bit less than I need to about the specifics of the code in question.

    • I can reason a little about how quality would work in the world of services, but I'm not in the thick of things right now. Staged rollouts, blue-green deploys, canary environments, qualification of new code on mirrors of production data-streams, AI chaos-monkeys trying to hack your site...

  • There remains significant economic merit in simply being first to market and growing the fastest, even if corners are cut -- and cleaning up afterwards. It's a lot harder to clean up a system than to keep a system clean, but success means you can afford it, and maybe AI makes that cheaper too.

    On the other hand, it's really, really hard to make that kind of transformation culturally, and really hard to have humans understand systems that no human understands. As long as we have people, the people-problems will always be the biggest challenge. (And if AIs someday have this discipline themselves, and make software virtually free, you won't have any margins anyway.)

in summary

Code is cheap now? Act like it, and focus on other things that matter more -- like team culture, QA, and real engineering.