Tools that Work Together
This is an attempt at an optimistic story of how, to my own surprise, the current AI push might lead to better software. Before we get to the optimistic part, I will start by outlining some of my frustrations with the software ecosystem.
In science fiction, many things are possible that technology has not yet enabled in real life. We have all learned to suspend our disbelief when the story calls for faster-than-light travel, anti-gravity fields, or inertial dampeners. But one little thing keeps breaking the immersion for me: computers that just seamlessly work together. Just throw that video onto the big screen. Cross-reference the data from the alien transponder with the sensor array in orbit. Hold a video conference between a military spaceship and a civilian academic sitting in his office.
Of course it would distract from the main story if the characters first had to figure out how to export the bad guys’ financial transaction records as a CSV and then had to mull over how to clean up the data in pandas because the bank had funny ideas about how to deal with missing fields. But whenever I see depictions of technology that just seamlessly works together, I get a little sad about the state of software today.
The dream of interoperability is almost as old as computers. On a computer running Unix you could glue programs together into sophisticated pipelines and automations with a little scripting knowledge. In the Unix philosophy, software is supposed to consist of minimal, composable, and reusable pieces. You can still see this philosophy in action today if you are a Linux power user or a developer.
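To make the idea concrete, here is a small sketch of that philosophy in Python: each function does one thing, consumes and produces plain lines of text, and the pieces compose into a pipeline. The function names and the sample log are invented for illustration, loosely mirroring their Unix namesakes.

```python
from collections import Counter

def grep(pattern, lines):
    """Keep only lines containing the pattern (like `grep`)."""
    return [line for line in lines if pattern in line]

def cut(field, lines):
    """Extract one whitespace-separated field per line (like `cut`)."""
    return [line.split()[field] for line in lines]

def count_sorted(lines):
    """Count occurrences, most frequent first (like `sort | uniq -c | sort -rn`)."""
    return Counter(lines).most_common()

log = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "GET /missing 404",
    "GET /other 404",
]

# Equivalent in spirit to: grep 404 access.log | cut -d' ' -f2 | sort | uniq -c
failed_paths = count_sorted(cut(1, grep("404", log)))
```

Each step is useful on its own and knows nothing about the others; the composition, not the parts, encodes the task.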
But most modern computer users interact with software in a very different way: computers run isolated applications with graphical user interfaces that operate on documents. Sometimes documents aren't even presented as files in a shared file system but exist purely as a concept inside the application, out of reach of other software. Export functions aren't always present, and when they are, their use is typically inconvenient or the exported data incomplete.
This makes sense as a business model: locking your users' data into one platform increases switching costs. If you provide a suite of applications that work together but cannot be integrated with third-party software, you can nicely (ab)use your market dominance to protect your individual tools from competition. When you haven't quite established a sufficient monopoly to pull this off, you can selectively engage in partnerships with other companies to create an in-group of interoperability.
At first it appeared as if AI would make this situation even worse: platforms feared losing their data moat and were incentivised to remove, restrict, or paywall their APIs. Every software company rushed to introduce AI into its product, but you couldn't bring your own AI subscription and had to use theirs. At least this provided an opportunity for some startups to get funding to build nice open-source software, since they could always point at optional integrated AI subscriptions as a monetisation strategy.
I promised in the beginning that this story was a hopeful one. That hopeful part begins here. Recently the tech hype cycle has moved in the direction of AI agents, as people have started to use AI coding platforms to coordinate tools via MCP. These tools are no longer restricted to coding, but now include graphic design tools and office suites. I have seen traditional GUI applications glued together in one pipeline. An even more recent trend ditches MCP and emphasises command line interfaces. An industry and user base that until very recently dismissed the Unix philosophy now seems to be rushing to implement it. I was very surprised at first when Google published a CLI tool to access data in Google Workspace, but in light of this trend it made total sense.
If this trend continues, the incentive structures will be aligned more with interoperability and automatability. Even if your software lacks some features in comparison with your competitors', if it can work with my AI agent I may be able to supply the missing features through integration with other tools. If you are quick enough now, you can get funding from your favourite VC to reimplement any traditional application to be “agent first”. This will hopefully create sufficient zugzwang for incumbents to open up their APIs as well.
So why is this happening now? Easy-to-use APIs and CLI tools were valuable long before AI came along, and everyone with a bit of scripting ability was able to capitalise on this. I have a bunch of non-technical friends who automated their entire tedious data-entry jobs ten years ago after watching a few Python tutorials. The cynical answer is that money and clout currently stick to everything AI, and this phase of open interoperability will inevitably be enshittified or reversed outright once the spotlight of attention moves on to the next thing.
The optimistic answer is that AI makes integrating tools easier to an extent that triggers a phase shift in usability and public adoption. While I was certainly able to produce scripts that coordinated various tools, I often didn't, because it was still rather tedious and not worth the time for small jobs. This cost/benefit calculation is even less favourable for the majority of the population that cannot program. Now that you can vibe code small scripts easily, the user base has expanded enough that your software being used as part of a script has become a realistic usage scenario. If everything goes well, the AI hype will shift the market into a new stable equilibrium in which you have to enable integrations to compete.
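The kind of small job I have in mind is glue like the following: take one application's CSV export and reshape it into JSON that another (hypothetical) tool can import, papering over the exporter's quirks along the way. Everything here, the field names, the missing-value convention, the sample export, is invented for illustration; the point is that a script this size is now cheap to produce even for non-programmers.

```python
import csv
import io
import json

def csv_export_to_json(csv_text):
    """Normalise a messy CSV export: skip blank rows, trim whitespace,
    and treat a missing amount as 0 instead of crashing."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not any(value.strip() for value in row.values()):
            continue  # the exporter emits trailing empty rows
        rows.append({
            "name": row["name"].strip(),
            "amount": float(row["amount"] or 0),
        })
    return json.dumps(rows)

# A typical export with stray whitespace, a missing field, and an empty row.
export = "name,amount\nAlice, 12.50\nBob,\n,\n"
result = csv_export_to_json(export)
```

The interesting part is not the code but the economics: a script like this used to cost an hour of a programmer's attention, and now it costs a prompt.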
This is not the first time it has looked like the industry was moving towards better software integration. The semantic web hype promised very similar things but vanished into obscurity as quickly as it came. I can't predict whether this time will be different, and if you are reading this article in the future you might laugh at how wrong it turned out to be. What speaks for a more durable change is that LLMs can blur the boundaries between automation and ordinary usage in a novel way. Most users won't really notice if you stop offering SOAP endpoints or if RSS feeds silently fade away, but they will notice when they can no longer tell the computer in plain English to “produce a slide deck from the conversation I had with John via email last Tuesday”.
Some AI companies have pushed the idea that the user interface of the future is a single textbox (ideally the textbox they provide), while the user experience crowd has rightfully pushed back against this. The scenario in which we retain individual applications but can glue them together with AI could address those user experience concerns. It also has a more realistic migration story: adding an MCP server or a CLI to existing software is a simple incremental change compared to the fundamental redesign of computer interfaces required by the single-textbox vision. For companies that develop applications this could be a worthwhile compromise, preferable to obsolescence.
A second difference from earlier forays into interoperable software lies in the potential robustness of LLMs in the face of format changes. While LLMs are certainly not paragons of reliability, they could fare much better with changing protocols than explicitly codified clients do. Renaming a field in your JSON HTTP API is a breaking change, traditionally requiring manual intervention. Using AI to interpret changing schemas reduces the cost of providing APIs, as you can ease off backwards compatibility a little and make client migrations easier if not entirely transparent. This robustness towards schema change could be achieved in two ways: if the AI agent interprets the tool responses directly, it is not bound to parsing a strict grammar but can semantically interpolate between the provided data and the request. Alternatively, when AI is used to generate traditional client code from a natural language use case, it can regenerate the code from the updated documentation and the original underspecified user intent when the API version changes. I would favour the latter approach, as having agents generate code appears to be more effective than having them handle data directly, but it remains to be seen how it can be made transparent to the non-technical user.
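A toy sketch of the brittleness in question: the strict client below breaks the moment the field is renamed, while the tolerant lookup (a crude, purely mechanical stand-in for what an LLM could do semantically) survives the rename. All schemas and field names here are invented.

```python
def strict_total(order):
    # The traditional client: hard-codes the schema, so a rename
    # in the API response is an immediate KeyError.
    return order["total_price"]

def tolerant_total(order):
    # A crude stand-in for semantic interpretation: accept any key
    # that plausibly means "total" instead of parsing a fixed schema.
    for key, value in order.items():
        if "total" in key.lower():
            return value
    raise KeyError("no total-like field found")

v1_response = {"total_price": 42.0}   # original schema
v2_response = {"grandTotal": 42.0}    # the field renamed in v2
```

An actual LLM-mediated client would match on meaning rather than substrings, but the failure mode it removes is exactly this one.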
It was a nice experience to try optimism for a change. There are still many ways this could go wrong. When Anthropic released MCP as an open protocol, they commoditised their complement. For now it is advantageous for the big AI companies to use their market power to increase the usefulness of AI in general, and that entails that everyone else makes their tools available to AI agents. In the future we could see the ecosystem closing again in two ways: if a monopoly emerges among AI providers, the interface used for integration could become closed to everyone but the remaining AI provider. This could also happen locally in platform silos, such as Apple's or Google's. In an alternative scenario, if application companies find their footing again, they could double down on their previous integration-as-the-moat strategy. This time the moat could be even deeper if all tools by one provider can work together via their AI agent while other vendors' tools can't cross into the walled garden. Given that Apple and Google provide both platforms and applications, and given that antitrust enforcement is essentially non-existent, we could even get both of these failure modes simultaneously.
To leave you on a positive note, I believe that the campaign for AI agents has brought some quality-of-life improvements for us developers, even if we never actually use AI. CLI tools can be called from manually written software. And to enable AI agents to work on code bases or to use APIs effectively, developers have started to provide nice self-contained usage instructions and design documents. Because of the current inability of AI to deal with large contexts without deteriorating, these descriptions tend to be very concise. If you provided a contemporary AI with the same convoluted and incomplete instructions as are typically produced for human consumption, nothing would get done, and so the AGENTS.md files in the wild are surprisingly helpful.