We worked with Toonsutra and Google to help bring comics to life, literally.
This project introduces an automated technical approach to generate immersive audio for digital comics. A multi-agent LLM architecture, built on Gemini 2.5 Pro, comprising specialized agents, collaboratively processes comic images and transcripts. This system analyzes context, aligns dialogue, composes music, maps voiceover audio to panels, and generates a final synchronized spatial metadata file to build immersive experiences.