Why HTML Beats Images for AI-Generated Slides
A technical case for HTML as the source format for AI-generated slides: precise edits, accessibility, diffs, search, exports, and where images still fit.
Author: Variant Team. Variant is built by a small team working on HTML-native presentation tools, MCP workflows, and agent-editable decks.
Short answer: images are fine as assets, but they are a poor source format for AI-generated slides. Once a slide is flattened into pixels, small edits become regeneration, masking, or manual reconstruction. HTML keeps the text, layout, and styles addressable.
That addressability is the whole advantage. A model can patch a heading. A human can fix a sentence. Git can show a diff. A browser can render the result. The slide remains a document instead of becoming a screenshot of a document.
This article makes the technical case with concrete tradeoffs: editability, accessibility, file size, version control, search, export, and the situations where raster images still belong inside the deck. Here's the companion example: download the HTML vs image slide format deck.
#Quick answer
- HTML slides are editable documents; image slides are flattened renderings.
- AI agents can patch HTML precisely because text, layout, and styles are addressable.
- HTML is better for accessibility, search, version control, and find-and-replace.
- Images are still useful for photos, generated artwork, backgrounds, and final raster exports.
- For AI-generated decks, use HTML as the source and images as assets inside the deck.
#What the two formats actually are
An image slide is a rendered bitmap: PNG, JPEG, or WebP. (An SVG export can behave the same way when its text has been converted to outlines, although SVG itself is a vector format.) The model decides what should be on the slide, hands the prompt to a diffusion model or a layout model, and returns pixels. The pixels look like a slide. They are not a slide.
An HTML slide is a document. A <section> or <div> with text nodes, headings, lists, images, SVGs, and CSS that arranges them. The model writes markup the same way it writes any other code. The slide is composed of named, addressable parts.
The difference shows up the moment you want to change one word.
#Editability: the one that matters most
Suppose a teammate sends you a deck whose title slide says "Q3 Roadmap", but you're presenting in Q4. With image slides, your options are:
- Regenerate the slide and hope nothing else moves.
- Open it in Photoshop or Figma and try to match the font.
- Recreate the slide from scratch.
None of those scale. Regeneration is the worst of the three because diffusion-style outputs are non-deterministic. Change "Q3" to "Q4" and the chart may shift, the headline font may swap, and your color accent may drift one shade. You can easily spend more time fixing the regeneration than you saved by generating the deck in the first place.
With HTML, you open the file and change Q3 to Q4. You're done. If you want the agent to do it, you say "change Q3 to Q4 on the title slide" and it makes a one-character edit. The rest of the slide is byte-for-byte identical because nothing else was touched.
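Here is a minimal sketch of that kind of targeted edit. The slide markup and the `patch_text` helper are invented for illustration and are not Variant's implementation; the point is only that the change is a guarded string replacement and everything else stays identical.

```python
def patch_text(html: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old`; refuse ambiguous edits."""
    count = html.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match for {old!r}, found {count}")
    return html.replace(old, new)


# Hypothetical slide markup, for illustration only.
slide = """<section class="slide">
  <h1>Q3 Roadmap</h1>
  <p>Priorities for the next quarter</p>
</section>"""

patched = patch_text(slide, "Q3 Roadmap", "Q4 Roadmap")

# Count the character positions that differ between the two versions.
changed = sum(1 for a, b in zip(slide, patched) if a != b)
print(changed)  # 1
```

Refusing to patch when the target text appears zero or several times is a deliberate choice here: an agent should fail loudly rather than edit the wrong heading.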

This is the strongest argument for HTML and it's not close. AI-generated content is only as good as how easy it is to fix. A format that punishes small edits punishes the whole workflow.
#Accessibility
Screen readers don't read images. They read alt text, and alt text is a poor proxy for the ten lines of content actually on the slide. Image-based decks are essentially opaque to assistive tech.
HTML slides ship semantic markup for free. Headings are headings. Lists are lists. Tables have row and column headers. A blind colleague using NVDA or VoiceOver can navigate the deck the same way they navigate a webpage. You can add ARIA roles where you need them. You can specify reading order without redrawing anything.
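To make "headings are headings" concrete, here is a rough sketch that pulls a navigable outline out of slide markup using only Python's standard `html.parser`. The `OutlineParser` class and the sample slide are assumptions for illustration; real assistive technology works from the browser's accessibility tree, not from a script like this.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect (tag, text) pairs for headings and list items."""

    TRACKED = {"h1", "h2", "h3", "li"}

    def __init__(self):
        super().__init__()
        self.outline = []
        self._current = None   # tag currently being captured, if any
        self._buffer = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.outline.append((tag, "".join(self._buffer).strip()))
            self._current = None


# Hypothetical slide markup, for illustration only.
slide = """<section>
  <h2>Traction</h2>
  <ul><li>Revenue up</li><li>Churn down</li></ul>
</section>"""

parser = OutlineParser()
parser.feed(slide)
print(parser.outline)
# [('h2', 'Traction'), ('li', 'Revenue up'), ('li', 'Churn down')]
```

The same structure is simply not present in a PNG; the best a tool can do there is read one alt string.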
If you ever need to publish a deck publicly (conference talks, internal training, customer-facing pitches), this matters. If your company has any accessibility commitments at all, HTML is the only honest choice.
#File size and rendering cost
A reasonably designed HTML slide is small. A few KB of markup, a few KB of CSS, maybe a referenced font and an SVG icon. A whole 30-slide deck can come in under a megabyte if you're careful with images.
A 30-slide image deck at 1920x1080 PNG is tens to hundreds of megabytes. Email it to someone and they'll notice. Commit it to a git repo and every revised slide adds roughly another full-size binary to the history. Load it over slow conference Wi-Fi and you'll watch the spinner instead of your slides.
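A quick back-of-the-envelope check on that range, assuming 8-bit RGBA pixels (4 bytes each). This computes the uncompressed size, which is an upper bound; PNG compression brings it down, and by how much depends on how busy the slides are.

```python
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 4  # assumption: 8-bit RGBA
SLIDES = 30

raw_per_slide = WIDTH * HEIGHT * BYTES_PER_PIXEL
raw_deck = raw_per_slide * SLIDES

print(raw_per_slide)             # 8294400 bytes per slide
print(round(raw_deck / 1e6, 1))  # 248.8 MB for the deck, before compression
```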
There's also the rendering side. HTML renders in any browser at any resolution. The same deck looks crisp on a phone, on a 4K display, and on the projector that's always set to a slightly weird aspect ratio. Image decks are baked at one resolution. Resize them and you get either pixelation or blur.
This compounds with retina displays, multi-monitor setups, and anyone using zoom for accessibility. HTML scales. Images don't.
#Version control
Drop a folder of PNG slides into git and git diff shows you "binary files differ." That's it. You cannot review what changed. You cannot blame a line. You cannot merge two people's edits. The history is a row of opaque blobs with timestamps.
HTML diffs cleanly. You can see that someone changed a heading, added a bullet, swapped a color. A pull request review for a deck change actually means something. You can revert one slide without touching the rest. You can branch, merge, and rebase the deck the same way you do any other code.
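As a small illustration, here is the kind of diff a reviewer would see when someone adds a bullet. It uses Python's standard `difflib` rather than git itself, but the unified format is the same one `git diff` prints. The slide content and file names are made up for the example.

```python
import difflib

before = """<section>
  <h2>Roadmap</h2>
  <ul>
    <li>Ship the editor</li>
  </ul>
</section>
"""

after = """<section>
  <h2>Roadmap</h2>
  <ul>
    <li>Ship the editor</li>
    <li>Add PPTX export</li>
  </ul>
</section>
"""

diff = list(difflib.unified_diff(
    before.splitlines(keepends=True),
    after.splitlines(keepends=True),
    fromfile="a/slide-03.html",   # hypothetical file names
    tofile="b/slide-03.html",
))

# Separate real changes from the "+++" / "---" file header lines.
added = [line for line in diff if line.startswith("+") and not line.startswith("+++")]
removed = [line for line in diff if line.startswith("-") and not line.startswith("---")]

print("".join(diff), end="")
```

One added line, nothing removed: that is a change a reviewer can approve in seconds.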

If your team treats decks as artifacts of work (and most engineering and product teams do), this is huge. The deck becomes part of the same review and history surface as the rest of the codebase. No more "Pitch_v3_FINAL_FINAL_actually_final.pptx".
#Search, find-and-replace, and grep
You can run grep -r "Q3" decks/ across an HTML deck repo and find every reference in seconds. You can run a script that updates every footer year on January 1st. You can search the deck for typos. You can extract all speaker notes into a transcript.
Image decks make all of this impossible without OCR, and OCR is wrong often enough to be untrustworthy at scale.
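The footer-year script could look roughly like this. It is a sketch under assumptions: decks live as `.html` files in a folder, and each footer is a `<footer>` element containing a four-digit year as plain text. The `bump_footer_year` name and the sample slide are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Assumption: footers look like <footer>Copyright 2024 Example Co</footer>.
FOOTER_YEAR = re.compile(r"(<footer[^>]*>[^<]*?)\b(20\d{2})\b")


def bump_footer_year(deck_dir: Path, new_year: int) -> int:
    """Rewrite the first year inside each <footer>; return the number of files changed."""
    changed = 0
    for path in sorted(deck_dir.rglob("*.html")):
        text = path.read_text(encoding="utf-8")
        updated = FOOTER_YEAR.sub(lambda m: f"{m.group(1)}{new_year}", text)
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed


# Demo against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    deck = Path(tmp)
    (deck / "slide-01.html").write_text(
        "<section><p>Founded in 2019</p>"
        "<footer>Copyright 2024 Example Co</footer></section>",
        encoding="utf-8",
    )
    changed_files = bump_footer_year(deck, 2025)
    result = (deck / "slide-01.html").read_text(encoding="utf-8")

print(changed_files)  # 1
print(result)
```

Note that the year in the body copy ("Founded in 2019") is left alone, because the pattern only matches text inside a footer. A regex is good enough for a sketch; for anything less uniform, parse the HTML properly.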
#Exports and interop
Here's the part that surprises people: HTML is the easier format to export from, not the harder one. Once you have a structured document, you can render it to anything.
In Variant the pipeline looks roughly like this:
- HTML stays as HTML for web sharing and the editor.
- Headless Chromium renders to PDF for printing.
- A PPTX exporter walks the DOM and emits OOXML for PowerPoint compatibility.
- A bundler inlines all assets to produce a single self-contained .html file you can email or host anywhere.
- The deck JSON is the canonical format for agents and APIs.
Going the other way is a nightmare. Once you have an image, you can convert it to a different image format, and that's about it. You cannot reliably extract text. You cannot reflow it. You cannot promote it back to a PPTX with editable shapes.
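To show why export from a structured document is tractable, here is a toy version of the asset-inlining step: it rewrites local `<img src>` paths as base64 data URIs. This is not Variant's bundler. The `inline_images` helper is hypothetical, and the regex approach is a simplification that ignores CSS backgrounds, fonts, and unusual attribute quoting.

```python
import base64
import mimetypes
import re
import tempfile
from pathlib import Path

IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")')


def inline_images(html: str, base_dir: Path) -> str:
    """Replace local <img src="..."> paths with base64 data URIs."""

    def to_data_uri(match):
        src = match.group(2)
        if src.startswith(("data:", "http://", "https://")):
            return match.group(0)  # already inline or remote; leave as-is
        payload = (base_dir / src).read_bytes()
        mime = mimetypes.guess_type(src)[0] or "application/octet-stream"
        encoded = base64.b64encode(payload).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{encoded}{match.group(3)}"

    return IMG_SRC.sub(to_data_uri, html)


# Demo against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Placeholder bytes (just the PNG signature), not a complete image.
    (base / "logo.png").write_bytes(b"\x89PNG\r\n\x1a\n")
    bundled = inline_images('<section><img src="logo.png" alt="Logo"></section>', base)

print(bundled)
```

There is no equivalent operation in the other direction: nothing in a flattened PNG tells you which pixels were the logo.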
#Animation and interactivity
HTML slides can use CSS transitions, keyframe animations, and a sprinkle of JavaScript when you need it. You can fade in bullets, animate a chart, embed a live iframe with a demo, or include an interactive code snippet that the audience can play with after.
Image decks can do none of this. The closest you get is a video file, which is even less editable than an image.
I'm not saying every deck needs animation. Most don't. But the ceiling matters. With HTML you can stay simple and add interaction when the moment calls for it. With images you are stuck at "static picture" forever.
#How AI agents work with HTML slides
This is the part that drove the design decision for Variant. When an agent like Claude Code is editing a slide, it needs a representation it can reason about. Markup is exactly that. The model already speaks HTML and CSS fluently. It can:
- Add a slide by writing a new section.
- Change a heading by editing one line.
- Restyle the deck by updating CSS variables.
- Insert a chart by emitting SVG or a div with computed bars.
- Read the current slide back and reason about layout.
If the slide were an image, the agent would have to call a vision model just to know what's on it, then call an image generator to make a new one, then hope the new one matches the old one. That's a lot of round trips and a lot of randomness.
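"Emitting SVG with computed bars" is less exotic than it sounds. A minimal sketch, with a hypothetical `bar_chart_svg` helper and made-up data; it assumes at least one value and a positive maximum, and it does no styling.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(values: dict, width: int = 400, height: int = 200) -> str:
    """Render labelled values as vertical bars in a standalone SVG string."""
    peak = max(values.values())
    slot = width / len(values)      # horizontal space allotted to each bar
    bar_width = slot * 0.6
    plot_height = height - 20       # reserve 20 units at the bottom for labels
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {width} {height}" role="img">'
    ]
    for i, (label, value) in enumerate(values.items()):
        bar_height = (value / peak) * plot_height
        x = i * slot + (slot - bar_width) / 2
        y = plot_height - bar_height
        parts.append(
            f'<rect x="{x:.1f}" y="{y:.1f}" width="{bar_width:.1f}" '
            f'height="{bar_height:.1f}"><title>{escape(label)}: {value}</title></rect>'
        )
        parts.append(
            f'<text x="{x + bar_width / 2:.1f}" y="{height - 5}" '
            f'text-anchor="middle">{escape(label)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical data, for illustration only.
svg = bar_chart_svg({"2022": 10, "2023": 20, "2024": 40})
print(svg)
```

Because every bar is a `<rect>` with its own attributes and label, changing one value later is an edit to one element, not a new chart.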
A typical Variant MCP config for Claude Code looks like this:
{
  "mcpServers": {
    "variant": {
      "transport": "http",
      "url": "https://app.variant.art/mcp",
      "auth": {
        "type": "oauth",
        "authorization_server": "https://app.variant.art/oauth"
      }
    }
  }
}
Once it's wired up, the agent can call slide.edit with a targeted change. "On slide 4, replace the second bullet with X." It edits the markup and the canvas updates. No regeneration. No drift.
If you'd rather use a scoped token instead of OAuth:
{
  "mcpServers": {
    "variant": {
      "transport": "http",
      "url": "https://app.variant.art/mcp",
      "headers": {
        "Authorization": "Bearer ss_..."
      }
    }
  }
}
The MCP surface includes tools like deck.create, slide.edit, slide.preview, slide.duplicate, deck.export, and a handful of others. They all assume the slide is structured, not pixelated.
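Under the hood, an MCP tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds one for slide.edit. The envelope follows the MCP specification, but the argument names (`deck_id`, `slide`, `instruction`) are my assumptions for illustration; the real schema is whatever the server advertises in its `tools/list` response. In practice the MCP client constructs this message for you.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message, indent=2)


# Hypothetical argument names and values, for illustration only.
request = tool_call(
    1,
    "slide.edit",
    {
        "deck_id": "deck_123",
        "slide": 4,
        "instruction": "Replace the second bullet with X.",
    },
)
print(request)
```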
#A concrete workflow
Here's what a real session can look like, end to end:
- In Claude Code, you say: "Make a 6-slide investor update for Q4: hook, traction, product, team, ask, thanks. Use our brand colors."
- The agent calls deck.create with a generated outline. Variant returns slide previews.
- You open the deck in the canvas. The hero slide looks fine. The traction chart needs a different baseline year.
- You click the chart and adjust the year inline, or you tell the agent "change the chart baseline to 2023."
- The agent calls slide.edit on slide 2. The chart updates. Nothing else moves.
- You ask for a code-tab tweak: "make the headline font slightly tighter on slide 1." You can do it visually or in the code tab. Both stay in sync.
- You export. PDF for the email. PPTX for the colleague who lives in PowerPoint. A single-file HTML for the link share.
You never regenerate a slide to fix a single thing. That's the workflow HTML enables and images can't.
#When images are actually the right call
I want to be honest about this because the post otherwise reads like HTML is always better. It mostly is, but not always.
- Hero imagery and photography. A photo is a photo. Don't try to recreate a beach sunset in CSS. Embed the image inside the HTML slide.
- Generated illustrations. If you want a stylized graphic to anchor a slide, generate the image and place it. The slide is still HTML; the image is one element inside it.
- Charts you genuinely don't need to edit. A static export from a BI tool may be fine as a PNG, especially if the source of truth lives elsewhere.
- Slides screenshotted from somewhere else. A screenshot of a tweet or a UI mock is naturally an image. Fine. Embed it.
The pattern: images belong inside HTML slides as content, not as the slide format itself. You want a structured document that can hold images, not a stack of images pretending to be a document.
#Tradeoffs of HTML you should know about
Nothing is free. A few honest caveats:
- Pixel-perfect typography across renderers takes care. A slide that looks great in Chrome may look slightly different in Safari or in PowerPoint after PPTX export. Not catastrophic, but worth checking.
- Custom fonts need to be embedded or hosted. Otherwise your single-file HTML export will fall back to a system font.
- Some legacy environments only accept PPTX or PDF. Plan for export. Variant gives you both.
- Complex animations can hurt performance on low-end machines. Keep them tasteful.
These are real, but they're all manageable. The image format has its own caveats and most of them are unfixable.
#Related reading
- How to use Codex to generate editable presentation decks
- Building agent-editable presentation decks with MCP
#FAQ
Will HTML slides look as good as image slides from a diffusion model?
For typography-heavy and content-heavy decks, HTML usually looks better, because the type renders crisply at any zoom and the layout responds to content length. For dreamlike hero images, diffusion still wins. The good news is you can use both: structured HTML slides with generated imagery embedded.
Can I still use PowerPoint after editing in an HTML-based tool?
Yes. Variant exports to PPTX. The export is a compatibility format, not a perfect mirror of the browser: simple text, images, and shapes translate best; complex CSS may be approximated or flattened. That's normal. If the deck needs to keep living in PowerPoint, check the PPTX before you send it.
How does this play with version control?
HTML decks diff cleanly in git. You can review changes in pull requests, blame specific lines, and merge edits from multiple people. Image decks are binary blobs and behave the same way any binary asset does.
Doesn't HTML mean I have to know HTML?
No. Variant has a visual canvas. You can drag, click, type, and never touch the markup. The HTML is there when you want it (or when an agent wants it). Most users edit visually most of the time.
What about accessibility for the audience watching the presentation?
HTML slides can include proper headings, alt text on images, semantic lists, and ARIA roles. Screen readers and high-contrast modes work the way they would on a webpage. Image-based decks lose all of this.
Can the agent really edit one word without redoing the slide?
Yes. With Variant's MCP slide.edit tool, the agent applies a targeted patch to the slide markup. The rest of the slide is untouched. That's the whole point of using a structured format.
#Wrapping up
The image-vs-HTML debate isn't really about file formats. It's about whether your slides are documents or pictures. Documents you can edit, search, version, export, and reason about. Pictures you can only replace.
If you're working with AI agents on decks, especially with Claude Code or any MCP-capable assistant, you want documents. That's what Variant is built around: every slide is real HTML and CSS, the canvas and the code stay in sync, and the agent edits the actual structure instead of regenerating pixels.
If you want to try the workflow described above, you can wire Variant into Claude Code with the MCP config snippet earlier in the post and ask it to build you a deck. It's the fastest way to feel the difference between editing a structured slide and regenerating a flat one.
Whatever tool you use, ask it one question: when I want to change a single word, what does that take? The answer tells you everything about the format underneath.