Audio-to-Video with LTX-2

by runshouse · Mar 2, 2026 · 23 points · 2 comments

AI Analysis

● Mid · Crowd Pleaser · Eye Candy

Audio-to-video is solved by Runway, Synthesia, and D-ID; this adds no clear differentiation.

Strengths

• Foley sound generation works reliably and maps to a genuine creator pain (quick B-roll from sound).
• Free tier with no signup lowers friction for trial; clean, friendly landing page.

Weaknesses

• The open-weights LTX-2 model produces visually inferior output to Sora 2, Seedance 2.0, and Veo 3.1, with no compensating advantage.
• Gemini prompt enhancement is a wrapper around an existing model; audio-to-video as a category is crowded and well-solved.

Post Description

LTX-2 is an open-source diffusion model that combines video and audio.

Visually it's not at the level of Seedance 2.0, Veo 3.1, or Sora 2, but it’s open-weights, so anyone can play with it.

I wanted to see how good it is at generating video from just audio.

Off the shelf, it's not very good. But I found that if you run the audio through Gemini to generate a prompt, then feed that prompt into LTX-2 along with the audio, the output matches the audio much more often.
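
The two-step flow above (caption the audio first, then condition the video model on both the audio and the caption) can be sketched roughly as follows. This is a minimal Python sketch, not the project's actual code: `describe_audio` and `generate_video` are hypothetical callables standing in for the Gemini call and the LTX-2 call, and their signatures are assumptions rather than either tool's real API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins; neither signature is taken from the Gemini or LTX-2 APIs.
DescribeAudio = Callable[[bytes], str]         # audio bytes -> text prompt
GenerateVideo = Callable[[bytes, str], bytes]  # (audio bytes, prompt) -> video bytes


@dataclass
class Result:
    prompt: str   # the text prompt derived from the audio
    video: bytes  # whatever the video model returned


def audio_to_video(
    audio: bytes,
    describe_audio: DescribeAudio,
    generate_video: GenerateVideo,
) -> Result:
    """Caption the audio, then generate video from the audio plus the caption."""
    if not audio:
        raise ValueError("audio is empty")
    # Step 1: turn the audio into a text prompt (Gemini in the post).
    prompt = describe_audio(audio).strip()
    # Step 2: pass BOTH the original audio and the prompt to the video model
    # (LTX-2 in the post), rather than the audio alone.
    video = generate_video(audio, prompt)
    return Result(prompt=prompt, video=video)
```

Passing the two model calls in as functions keeps the sketch independent of any particular SDK; real client code would go inside those two callables.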

Foley sounds work particularly well, and one fun use case is uploading audio of yourself to see what AI thinks you look like.

Limitations:

- Doesn't know real people, so a famous person's voice just gets a generic person

- Sometimes gets gender wrong if the voice is more androgynous

- In dialogue with similar voices, it can render the same person saying both lines