

[AUD+VID] 🎧 LTX-2.3: From Audio + Image to Long-Form, Lip-Synced Full Video 🎬


  • March 27, 2026

Hi folks, this is CCS.

First of all — thank you. Genuinely. Every share and subscription helps me keep pushing this work forward. You give me real reasons to keep going.

As you already know, each of my posts introduces something new from a node standpoint — not just a workflow to copy-paste, but an actual technical advancement designed for filmmakers first, and content creators in general. This one is no different.

In this post’s introduction video, you’ve already seen a video generated with these workflows — a musical number famous in the world of ideas and projects that were never truly born 😉



What this is for

This pipeline (and the new nodes ;)) is designed for anyone who wants a complete, end-to-end production workflow for music videos and short films — and needs longer clips to actually work with.

More room to edit. More room to grade. More room to play.

With a 3-segment setup like this — which is exactly what both workflows shared here implement — you can reach up to 60 seconds of generated video in a single run, fully audio-synchronized, in one shot. And yes: depending on your VRAM and RAM, you can push it to 60 seconds at 1920×1080 with enough headroom.

That said — and I mean this — start at 1280×720.

Understand how the segments connect, how the audio math works, how the extensions chain together before pushing the resolution. This pipeline rewards understanding before brute force.

First of all: update IAMCCS-nodes: https://github.com/IAMCCS/IAMCCS-nodes.git


Here’s a short demo (kept brief for clarity):

The workflow: LTX 2.3 — 3-Segment Audio+Video Extension

The base workflow is a 3-segment LTX 2.3 image-to-video pipeline with full audio integration. You feed in a starting image, a full audio track, and the workflow generates three consecutive video segments that are stitched together temporally — all while keeping the audio correctly sliced, positioned, and reassembled for each generation pass.

Each segment knows where it sits on the audio timeline. Each segment knows exactly how much of the track it needs to condition on. And at the end, the three audio slices are assembled back into a single coherent track that matches the video timeline precisely.

This is not audio simply concatenated at the end. The audio drives the generation from within.
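To make the slicing concrete, here is a toy sketch of the per-segment audio math. Every constant and the function itself are illustrative assumptions, not the actual AudioExtensionMath implementation:

```python
# Toy sketch of per-segment audio slicing. All constants are assumptions
# chosen for illustration, not the defaults of the actual nodes.
SAMPLE_RATE = 48000  # audio samples per second
FPS = 25             # video frames per second
SEG_FRAMES = 121     # frames generated per segment
OVERLAP = 8          # tail frames reused to seed the next segment

def segment_audio_slice(seg_index):
    """Return (start_sample, num_samples) for one segment's audio slice."""
    # Each segment advances the cursor by its non-overlapping frames,
    # so its slice starts exactly where it sits on the video timeline.
    start_frame = seg_index * (SEG_FRAMES - OVERLAP)
    start_sample = round(start_frame / FPS * SAMPLE_RATE)
    num_samples = round(SEG_FRAMES / FPS * SAMPLE_RATE)
    return start_sample, num_samples
```

Under these assumed numbers, segment 0 starts at sample 0 and each later segment's slice begins 113 frames (4.52 s) further along, overlapping the previous one by the 8 reused frames.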

For a quick overview of the new nodes (IAMCCS-nodes version 1.4.0), I’ll be publishing a dedicated post as soon as possible.

And of course, for a complete understanding of how they work, I highly recommend checking out the supporters post! 😉


The pipeline — quick walkthrough

The structure is readable and intentional:

  1. Load your LTX 2.3 model, VAE (video + audio), and CLIP — once, shared across all three segments

  2. Resize your starting image once via ResizeImagesByLongerEdge

  3. Segment 0: feed the starting image into LTXVPreprocess → LTXVImgToVideoInplace, encode audio with the correct audio slice for this segment (from AudioExtensionMath + AudioExtender), set noise mask, and sample

  4. Separate audio and video latents (LTXVSeparateAVLatent), decode the video latent with IAMCCS_VAEDecodeTiledSafe

  5. Repeat for Segments 1 and 2 — each receiving the tail frames of the previous segment as its new starting image via LTXVConcatAVLatent, with the audio cursor tracking forward automatically

  6. Final assembly: the three decoded video pieces and the assembled audio are combined into the final output video
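The six steps above can be sketched as a loop. All names here are hypothetical stand-ins (the real workflow wires ComfyUI nodes together, not Python functions), and plain lists stand in for frame and audio tensors:

```python
# Toy, runnable sketch of the 3-segment chaining logic. The helper names
# are hypothetical; the real graph wires ComfyUI nodes, not functions.
def generate_segment(start_frame, audio_slice, length):
    # Stand-in for preprocess + sample + separate + decode: produce
    # `length` frames that continue from the given starting frame.
    return [f"{start_frame}->f{i}" for i in range(length)]

def run_pipeline(start_image, audio, n_segments=3, seg_len=5):
    clips, cursor, current = [], 0, start_image
    for _ in range(n_segments):
        audio_slice = audio[cursor:cursor + seg_len]  # this segment's slice
        frames = generate_segment(current, audio_slice, seg_len)
        clips.append(frames)
        current = frames[-1]   # tail frame seeds the next segment
        cursor += seg_len      # audio cursor tracks forward
    # Final assembly: stitched video plus the reassembled audio track.
    return [f for clip in clips for f in clip], audio[:cursor]
```

The key property is that each segment both inherits its starting frame from the previous segment's tail and reads its audio from the matching position on the timeline, which is why the result plays back in sync.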


The Low RAM variant

There’s a second workflow included: LOW_RAM. The logic is identical to the standard version, but it adds one key difference in the decode step: instead of keeping all decoded frames in memory, it uses IAMCCS_VAEDecodeToDisk to write frames directly to disk as they are decoded, chunk by chunk.

This means the full image tensor never needs to exist in RAM all at once. For long segments at higher resolutions on machines with limited system RAM, this is what makes the pipeline actually runnable.

I won’t go into the internal mechanics here — tiling strategy, seam handling, chunk sizing — that’s covered in the supporter post. But if you’re on a machine with 16–24 GB of RAM and you see memory errors during the decode phase, the Low RAM workflow is where to start.


What you need installed

  • IAMCCS-nodes (updated) — for all the nodes described above

  • ComfyUI-LTXVideo — for LTXVPreprocess, LTXVImgToVideoInplace, LTXVAudioVAEEncode, and the rest of the LTX 2.3 native nodes

  • ComfyUI-KJNodes — for VAELoaderKJ and a few helpers used in the graph


Where to go from here

This post gives you everything you need to load the workflow, understand each block, and start generating.

Start at 1280×720, learn how the segments stitch together, then scale up.

If you want the full technical breakdown — how VAEDecodeToDisk and the GlobalPlanner node automatically configure clip settings based on your input, how they handle seams between latent chunks, how to tune overlap and frame math for different FPS configurations, and the reasoning behind every key widget — that’s in the supporter post.

Supporting is also the most direct way to keep this work moving — it funds the time to build, test, document, and share real production-tested pipelines back to the open-source community.

More soon.

Categories

  • Cinema

  • Digital

  • Music

  • Web design


