A new piece of agent workflow research argues that HTML may be more than a web format. It could become a practical creative medium for AI systems that generate graphics and other visual media.
The experiment comes from 12 Grams of Carbon, which previously described how coding agents could automate slide deck creation. In that earlier work, the authors said agents helped reduce the time needed to build presentations by taking over repetitive chores such as layout adjustments, text alignment, and figure creation. They said the resulting decks were often more polished and accurate than what they could produce manually in the same amount of time.
Building on that approach, the latest post asks how far the same method can be pushed. The authors set out to test whether Claude Code could create an entire video, using HTML and related tools as the foundation for the workflow.
The project is framed as an experiment in pushing coding agents to produce visual media, rather than as a finished product or product announcement. The authors say they were impressed by how the test turned out and note that they have documented a skill for others who want to try a similar process.
Their broader point is that agents may be useful not just for writing code or answering questions, but for handling the labor that often slows down creative production. In their view, many of the most time-consuming parts of making presentations or visuals are not the core creative decisions. Instead, they are the mechanical tasks, such as nudging elements into place, collecting source material, and formatting content into a coherent layout.
The authors suggest that shifting those tasks to an agent lets the human creator focus more on the message and less on the production details. They already applied that idea to presentation workflows in their earlier writing, and the new experiment extends the same logic to richer media.
The research note stands out for its claim that HTML can serve as the central medium for agentic creative work. Rather than treating HTML only as a language for web pages, the authors present it as a flexible format that agents can use to assemble graphics and structured visual outputs.
That framing may matter because HTML is widely supported, easy to inspect, and familiar to many developers. It also gives agents a structured way to create content that can be refined by humans after the fact. The post does not claim HTML is the only possible option, but it does argue that it is sufficient for a surprisingly wide range of visual tasks.
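To make the point about inspectability concrete, here is a minimal sketch of the kind of artifact an agent might emit: a single fixed-size title card written as plain HTML and CSS. It is an illustration only, not the authors' method. The function name `render_title_card`, the 1280 by 720 canvas, and the styling are all assumptions introduced for this example.

```python
from html import escape


def render_title_card(title: str, subtitle: str,
                      width: int = 1280, height: int = 720) -> str:
    """Return a self-contained HTML document for one fixed-size title card.

    Hypothetical helper for illustration; not taken from the original post.
    """
    # Escape the text so characters like < and & render literally.
    safe_title = escape(title)
    safe_subtitle = escape(subtitle)
    # Doubled braces are literal CSS braces inside the f-string.
    return f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>{safe_title}</title>
<style>
  body {{ margin: 0; }}
  .card {{
    width: {width}px; height: {height}px;
    display: flex; flex-direction: column;
    align-items: center; justify-content: center;
    font-family: sans-serif; background: #111; color: #eee;
  }}
  h1 {{ font-size: 64px; margin: 0; }}
  p {{ font-size: 28px; margin-top: 16px; }}
</style>
</head>
<body>
<div class="card">
  <h1>{safe_title}</h1>
  <p>{safe_subtitle}</p>
</div>
</body>
</html>
"""
```

Writing the returned string to a file such as `title_card.html` and opening it in a browser shows the card. Because the output is ordinary markup, a person can adjust a font size or color by hand without rerunning the agent, which is the kind of after-the-fact refinement the post describes.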
The experiment adds to a growing set of examples showing how AI agents are being used in workflow automation beyond traditional coding assistance. As agents become more capable, some practitioners are exploring whether they can take on more of the creative production pipeline, from slides to video.
For now, the post reads less like a formal benchmark and more like a proof of concept. Even so, it reflects a broader shift in how some users are thinking about agent tools, not as assistants for isolated prompts, but as systems that can help build complete media assets from a lightweight, code-based foundation.