Hi folks, this is CCS. Some news from the world of open-source video generation models.
Lightricks released the open weights of LTX-2, and this is a meaningful moment for anyone working seriously with AI video inside ComfyUI.
LTX-2 is not a showcase model but a real frontier system: a unified audio-video model capable of generating clips at up to 4K resolution and 50 fps, up to 20 seconds in length. It is heavy, demanding in VRAM, and clearly designed for research and advanced pipelines rather than casual use.
LTX-2 comes as a family of models rather than a single checkpoint. The base version supports both text-to-video and image-to-video and is the one intended for customization and training.
A distilled 8-step variant exists for fast iteration, while camera control LoRAs introduce explicit control over camera movement.
Latent upsamplers and depth, canny, and pose LoRAs extend the model toward more structured, compositional workflows. In practice, this makes LTX-2 one of the first open systems that can realistically be approached with a cinematic mindset instead of a purely illustrative one.
ControlNet and LTX-2 integration will be covered in a dedicated follow-up post.
Gemma is used for the encoding stage, but it is currently a frequent source of out-of-memory errors. For this reason, many of us rely on a more stable FP8 Gemma variant.
The models are available in BF16, but the FP8 (NVFP8) versions are the most interesting in real-world use, reducing model size significantly and offering much faster performance on modern RTX GPUs.
Official LTX-2 FP8 checkpoint: https://huggingface.co/Lightricks/LTX-2/blob/main/ltx-2-19b-dev-fp8.safetensors
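For a rough sense of what those precisions mean in practice, here is a back-of-envelope sketch of the weight sizes alone; real checkpoints differ somewhat, since some layers usually stay in higher precision:

```python
# Back-of-envelope weight sizes per precision. Real files differ a bit
# (embeddings and norm layers often stay in higher precision).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "4-bit": 0.5}

MODELS = {
    "LTX-2 19B dev": 19e9,
    "Gemma 3 12B (text encoder)": 12e9,
}

for name, params in MODELS.items():
    sizes = ", ".join(
        f"{fmt}: ~{params * b / 1e9:.0f} GB" for fmt, b in BYTES_PER_PARAM.items()
    )
    print(f"{name} -> {sizes}")
```

Roughly 38 GB versus 19 GB for the 19B transformer is the difference between a model that barely loads and one that leaves headroom for latents and the text encoder.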
Even so, VRAM usage scales quickly with resolution, frame rate, clip length, and steps. Weight streaming in ComfyUI allows partial offloading to system memory when VRAM is exceeded, but with a clear performance penalty. The recommended approach is to iterate at short durations and moderate resolutions, then increase quality only once motion and structure are working.
For users with limited VRAM, dedicated low-VRAM loader nodes are available, ensuring correct execution order and aggressive offloading so generation can fit within a 32 GB VRAM budget.
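To build intuition for how fast usage scales, here is a rough sketch of latent token growth. The compression factors are assumptions borrowed from the published LTX-Video VAE (32x spatial, 8x temporal), so treat the numbers as illustrative rather than exact for LTX-2:

```python
# Illustrative token-count growth for the denoiser's attention. The
# compression factors below are assumptions taken from the LTX-Video
# VAE (32x spatial, 8x temporal); LTX-2's exact values may differ.
SPATIAL_DOWN, TEMPORAL_DOWN = 32, 8

def latent_tokens(width, height, fps, seconds):
    frames = fps * seconds
    return (width // SPATIAL_DOWN) * (height // SPATIAL_DOWN) * max(frames // TEMPORAL_DOWN, 1)

baseline = latent_tokens(1280, 704, 25, 5)
for w, h, fps, s in [(1280, 704, 25, 5), (1920, 1088, 50, 10), (3840, 2160, 50, 20)]:
    t = latent_tokens(w, h, fps, s)
    print(f"{w}x{h}, {fps} fps, {s}s -> {t:,} tokens ({t / baseline:.1f}x baseline)")
```

Because attention cost grows faster than linearly with token count, a jump in resolution, fps, or duration compounds far more than it first appears, which is exactly why iterating small pays off.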
LTX-2 is not effortless, but it is an important step toward open, high-end generative cinema tools that reward understanding and intentional use rather than presets.
The IAMCCS workflows
Before diving into the workflows, make sure to download IAMCCS_annotate v2.0.0 to properly view and toggle all in-graph annotations, and update IAMCCS_nodes to access the new LTX-2 utility nodes designed to support and extend these pipelines.
github.com/IAMCCS/IAMCCS_annotate
github.com/IAMCCS/IAMCCS_nodes
You can refer to the previous post for node instructions and additional details.
https://www.patreon.com/posts/update-iamccs-2-148353829?utm_medium=clipboard_copy&utm_source=copyLink&utm_campaign=postshare_creator&utm_content=join_link
IAMCCS_LTX-2_TEXT 2 VIDEO V-1
This workflow is built as a clear, modular T2V pipeline for LTX-2 inside ComfyUI.
You first choose the VRAM path (high or low) to match your hardware and avoid unnecessary bottlenecks.
Prompting can happen in two ways: Qwen-VL instructions or a custom manual prompt, depending on control needs.
The reference image is uploaded once and used as the visual anchor for motion consistency.
Video parameters (duration, fps, resolution) are handled in a dedicated settings block.
The core LTX-2 nodes translate image + prompt into coherent motion over time.
Annotations guide each step but can be hidden instantly with ALT+S for a clean canvas.
The result is a flexible, readable workflow designed for real experimentation, not black-box generation.
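For those who want to drive the workflow headless, here is a minimal sketch that queues it through ComfyUI's HTTP API. It assumes a local instance on the default port (8188) and a workflow saved via the API-format export; the filename is just a placeholder:

```python
# Minimal sketch: queue an exported workflow through ComfyUI's HTTP API.
# Assumes ComfyUI is running locally on the default port and the graph
# was saved with the API-format export; the filename is a placeholder.
import json
import urllib.request

WORKFLOW_PATH = "IAMCCS_LTX-2_TEXT2VIDEO_V-1_api.json"  # hypothetical filename

with open(WORKFLOW_PATH, encoding="utf-8") as f:
    workflow = json.load(f)

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # the response includes the queued prompt_id
```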
Example 1:
Example 2:
IAMCCS_LTX-2_IMAGE 2 VIDEO V-1
This workflow follows the same core pipeline as the previous one, with a similar modular structure.
The first step is selecting the appropriate VRAM path (high or low) based on your hardware.
Prompting is handled either through Qwen-VL instructions or a custom manual prompt, just like before.
A single reference image is uploaded and used to anchor visual continuity across the clip.
Video parameters are isolated in a dedicated block for timing, resolution, and fps control (a small scripting sketch for this block follows below).
The LTX-2 nodes convert the image-prompt pair into structured motion over time.
Annotations explain each section of the graph and the logic behind it.
Press ALT+S at any time to hide the notes and work on a clean canvas.
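Since the video parameters live in their own settings block, you can also patch them from code before queueing. A minimal sketch follows; the node id and input names are hypothetical placeholders, so check your own API-format export for the real ones:

```python
# Sketch: tweak the video-settings block before queueing. The node id
# ("57") and the input names are hypothetical placeholders; open your
# own API-format export to find the real ids used by this workflow.
import json

with open("IAMCCS_LTX-2_IMAGE2VIDEO_V-1_api.json", encoding="utf-8") as f:  # placeholder filename
    workflow = json.load(f)

settings = workflow["57"]["inputs"]  # hypothetical settings node
settings.update({"width": 1280, "height": 704, "fps": 25, "length": 121})

with open("patched_workflow.json", "w", encoding="utf-8") as f:
    json.dump(workflow, f, indent=2)
```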
Example 1:
Example 2:
Final Notes
LTX-2 is opening a new phase for controllable image-to-video and text-to-video generation.
Its current limitations are expected to be smoothed out through upcoming updates and model refinements.
As the model becomes better understood, workflows and custom nodes will evolve into more precise and sophisticated tools.
What really matters is the direction: structured motion, temporal logic, and cinematic control.
The links:
LTX-Video
https://github.com/Lightricks/LTX-Video
GGUF LTX-2
https://huggingface.co/unsloth/LTX-2-GGUF
LTX-2: the full list of files you need, with all links:
https://github.com/Lightricks/LTX-2
More info about the model files is available on Hugging Face:
https://huggingface.co/collections/Lightricks/ltx-2
ComfyUI-LTXVideo Custom Node
https://github.com/Lightricks/ComfyUI-LTXVideo
🅛🅣🅧 Gemma 3 Model Loader
https://github.com/Lightricks/ComfyUI-LTXVideo?tab=readme-ov-file#required-models
Comfy-Org/ltx-2 – gemma_3_12B_it.safetensors
https://huggingface.co/Comfy-Org/ltx-2/tree/main/split_files/text_encoders
or the 4-bit version:
https://huggingface.co/unsloth/gemma-3-12b-it-unsloth-bnb-4bit
For Res_2s sampling
https://github.com/ClownsharkBatwing/RES4LYF
A tip for low-VRAM users (even 12 GB of VRAM, as long as you have at least 32 GB of system RAM): launch ComfyUI with the arguments:
--lowvram --cache-none --reserve-vram 4
and download the Gemma-3 fp8 model:
https://huggingface.co/GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn
Grab the workflows and enjoy!
In the upcoming posts, I'll explore LTX-2 workflows with ControlNet and image-driven lip-sync pipelines. Until then, I warmly invite you, my dear supporters, to experiment, take risks, and share your generations — this is where the language of the model really starts to take shape.