Hi folks, this is CCS.
First of all — thank you. Genuinely. Every share and subscription helps me keep pushing this work forward. You give me real reasons to keep going.
As you already know, each of my posts introduces something new from a node standpoint — not just a workflow to copy-paste, but an actual technical advancement designed for filmmakers first, and for content creators in general. This one is no different.
In the introduction video of this post, you've already seen a video generated with these workflows — a famous musical from the world of ideas and projects that were never truly born 😉
What this is for
This pipeline (and the new nodes ;)) is designed for anyone who wants a complete, end-to-end production workflow for music videos and short films — and needs longer clips to actually work with.
More room to edit. More room to grade. More room to play.
With a 3-segment setup like this — which is exactly what both workflows shared here implement — you can reach up to 60 seconds of fully audio-synchronized video in a single run. And yes: depending on your VRAM and RAM, you can push that to 60 seconds at 1920×1080 with enough headroom.
That said — and I mean this — start at 1280×720.
Understand how the segments connect, how the audio math works, how the extensions chain together before pushing the resolution. This pipeline rewards understanding before brute force.
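To make the "audio math" concrete, here is a back-of-the-envelope sketch with assumed numbers — 25 fps, 44.1 kHz, 20 s per segment. The real values come from the workflow's own widgets, so treat these as placeholders:

```python
# Back-of-the-envelope timing math with assumed numbers; the actual values
# are set by the workflow widgets (fps, frames per segment, sample rate).
FPS = 25             # assumed output frame rate
SAMPLE_RATE = 44100  # assumed audio sample rate
SEG_SECONDS = 20     # 3 segments x 20 s = 60 s total

frames_per_segment = FPS * SEG_SECONDS           # 500 frames per segment
samples_per_segment = SAMPLE_RATE * SEG_SECONDS  # 882,000 samples per segment

# Each segment's audio window starts exactly where the previous one ended,
# so the reassembled track lines up with the video timeline sample-for-sample.
windows = [(i * samples_per_segment, (i + 1) * samples_per_segment)
           for i in range(3)]
```

The point of the exercise: once you know your fps and sample rate, every segment boundary is just multiplication, and that is what keeps audio and video locked together across generation passes.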
Here’s a short demo (kept brief for clarity):
The workflow: LTX 2.3 — 3-Segment Audio+Video Extension
The base workflow is a 3-segment LTX 2.3 image-to-video pipeline with full audio integration. You feed in a starting image, a full audio track, and the workflow generates three consecutive video segments that are stitched together temporally — all while keeping the audio correctly sliced, positioned, and reassembled for each generation pass.
Each segment knows where it sits on the audio timeline. Each segment knows exactly how much of the track it needs to condition on. And at the end, the three audio slices are assembled back into a single coherent track that matches the video timeline precisely.
This is not audio simply concatenated at the end. The audio drives the generation from within.
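As a rough illustration of what "sliced, positioned, and reassembled" means — a toy sketch of the core idea, not the actual AudioExtensionMath/AudioExtender code:

```python
# Toy sketch of per-segment audio slicing and reassembly. NOT the node
# implementation: each segment conditions on its own window of the track,
# and the windows concatenate back into the original track losslessly.
def slice_for_segment(track, seg_index, samples_per_segment):
    """Return the audio window that segment `seg_index` conditions on."""
    start = seg_index * samples_per_segment
    return track[start : start + samples_per_segment]

def reassemble(slices):
    """Concatenate the per-segment slices back into one continuous track."""
    out = []
    for s in slices:
        out.extend(s)
    return out

samples_per_segment = 8                        # tiny placeholder value
track = list(range(3 * samples_per_segment))   # placeholder 3-segment track
slices = [slice_for_segment(track, i, samples_per_segment) for i in range(3)]
rebuilt = reassemble(slices)                   # identical to the input track
```

Because each slice is cut at an exact segment boundary, reassembly is lossless — no resampling, no drift between the audio and the stitched video.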
For a quick overview of the new nodes (IAMCCS-nodes version 1.4.0), I'll be publishing a dedicated post as soon as possible.
And of course, for a complete understanding of how they work, I highly recommend checking out the supporters post! 😉
The pipeline — quick walkthrough
The structure is readable and intentional:
- Load your LTX 2.3 model, VAE (video + audio), and CLIP — once, shared across all three segments
- Resize your starting image once via ResizeImagesByLongerEdge
- Segment 0: feed the starting image into LTXVPreprocess → LTXVImgToVideoInplace, encode audio with the correct audio slice for this segment (from AudioExtensionMath + AudioExtender), set the noise mask, and sample
- Separate the audio and video latents (LTXVSeparateAVLatent), then decode the video latent with IAMCCS_VAEDecodeTiledSafe
- Repeat for Segments 1 and 2 — each receives the tail frames of the previous segment as its new starting image via LTXVConcatAVLatent, with the audio cursor tracking forward automatically
- Final assembly: the three decoded video pieces and the assembled audio are combined into the final output video
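The walkthrough above can be condensed into a small runnable toy. Everything here is a placeholder — the helper names are hypothetical stand-ins for the real nodes, "frames" are just integers, and "audio" is a flat list — but the chaining and cursor logic mirror the graph:

```python
# Conceptual toy of the 3-segment chaining logic. The functions below are
# hypothetical stand-ins, NOT the real ComfyUI nodes.
def slice_audio(audio, i, seg_samples):
    # stand-in for AudioExtensionMath + AudioExtender
    return audio[i * seg_samples : (i + 1) * seg_samples]

def generate_segment(start_frame, audio_slice, seg_frames):
    # stand-in for preprocess -> img2video -> sample -> separate -> decode
    return [start_frame + k for k in range(seg_frames)]

def run_pipeline(start_frame, audio, n_segments=3, seg_frames=4, seg_samples=8):
    video, audio_out = [], []
    frame = start_frame
    for i in range(n_segments):
        a = slice_audio(audio, i, seg_samples)   # audio cursor moves forward
        seg = generate_segment(frame, a, seg_frames)
        video.extend(seg)
        audio_out.extend(a)
        frame = seg[-1]  # tail frame of this segment seeds the next one
    return video, audio_out

video, audio_out = run_pipeline(0, list(range(24)))
```

Note how each segment starts from the tail of the previous one while the audio cursor advances in lockstep — that is the whole trick that keeps a 3-segment run feeling like one continuous shot.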
The Low RAM variant
There’s a second workflow included: LOW_RAM. The logic is identical to the standard version, but it adds one key difference in the decode step: instead of keeping all decoded frames in memory, it uses IAMCCS_VAEDecodeToDisk to write frames directly to disk as they are decoded, chunk by chunk.
This means the full image tensor never needs to exist in RAM all at once. For long segments at higher resolutions on machines with limited system RAM, this is what makes the pipeline actually runnable.
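A minimal sketch of that idea — assumed behavior for illustration, not IAMCCS_VAEDecodeToDisk's actual implementation:

```python
# Sketch of chunked decode-to-disk: decode the latent a chunk at a time and
# stream each chunk straight to a file, so only one chunk is ever in memory.
import os
import tempfile

def decode_chunk(latent_chunk):
    # stand-in for the real VAE decode of one chunk of latent frames
    return bytes(latent_chunk)

def decode_to_disk(latent, chunk_size, out_path):
    with open(out_path, "wb") as f:
        for start in range(0, len(latent), chunk_size):
            chunk = latent[start : start + chunk_size]
            f.write(decode_chunk(chunk))  # written and freed before the next chunk

latent = list(range(100))  # placeholder "latent" of 100 tiny frames
path = os.path.join(tempfile.gettempdir(), "decoded_frames.bin")
decode_to_disk(latent, chunk_size=16, out_path=path)
```

Peak memory is bounded by the chunk size rather than the clip length, which is exactly why this variant stays runnable on machines where the standard decode would not.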
I won’t go into the internal mechanics here — tiling strategy, seam handling, chunk sizing — that’s covered in the supporter post. But if you’re on a machine with 16–24 GB of RAM and you see memory errors during the decode phase, the Low RAM workflow is where to start.
What you need installed
- IAMCCS_nodes (updated) — for all the nodes described above
- ComfyUI-LTXVideo — for LTXVPreprocess, LTXVImgToVideoInplace, LTXVAudioVAEEncode, and the rest of the LTX 2.3 native nodes
- ComfyUI-KJNodes — for VAELoaderKJ and a few helpers used in the graph
Where to go from here
This post gives you everything you need to load the workflow, understand each block, and start generating.
Start at 1280×720, learn how the segments stitch together, then scale up.
If you want the full technical breakdown — how IAMCCS_VAEDecodeToDisk and the GlobalPlanner node automatically configure clip settings based on your input, how seams between latent chunks are handled, how to tune overlap and frame math for different FPS configurations, and the reasoning behind every key widget — that's in the supporter post.
Supporting is also the most direct way to keep this work moving — it funds the time to build, test, document, and share real production-tested pipelines back to the open-source community.
More soon.