ComfyUI Custom Nodes for "TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching". This generates high-quality 44.1kHz audio up to 30 seconds using just a text prompt.
Models can be downloaded using the install.py script.
Manual Download:
Download TangoFlux from here into models/tangoflux
Download text encoders from here into models/text_encoders/google-flan-t5-large
(Include everything as shown in the screenshot above. Do not rename anything.)
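As a quick sanity check after a manual download, a small script along these lines could confirm that both folders exist and are not empty. This is only a sketch: the function name `missing_model_dirs` is hypothetical, the two relative paths are taken from the steps above, and the ComfyUI root directory is assumed to be passed in by the caller. It does not verify that every required file is present.

```python
from pathlib import Path

# Relative paths copied from the manual-download steps above.
EXPECTED_DIRS = (
    Path("models") / "tangoflux",
    Path("models") / "text_encoders" / "google-flan-t5-large",
)


def missing_model_dirs(comfy_root):
    """Return the expected model folders that are absent or empty.

    comfy_root is assumed to be the ComfyUI installation directory
    (hypothetical helper, not part of the custom nodes).
    """
    root = Path(comfy_root)
    missing = []
    for rel in EXPECTED_DIRS:
        target = root / rel
        # A folder counts as missing if it does not exist or holds no entries.
        if not target.is_dir() or not any(target.iterdir()):
            missing.append(rel.as_posix())
    return missing
```

Calling `missing_model_dirs("path/to/ComfyUI")` returns an empty list when both folders are populated, and otherwise lists the folders that still need files.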
The nodes can be found in "TangoFlux" category as TangoFluxLoader, TangoFluxSampler, TangoFluxVAEDecodeAndPlay.
If you are on low VRAM, try enabling offload_model_to_cpu in TangoFluxSampler.
The audio output of TangoFluxVAEDecodeAndPlay can be used as the audio input for the ComfyUI-VideoHelperSuite VideoCombine node. (This will not sync the audio to the video.)
TeaCache can speed up TangoFlux by roughly 2x without much loss in audio quality, and it requires no additional training.
📈 Inference Latency Comparisons on a Single A800
| TangoFlux | TeaCache (0.25) | TeaCache (0.4) |
| --- | --- | --- |
| ~4.08 s | ~2.42 s | ~1.95 s |
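To relate these latencies to the "roughly 2x" figure, the speedup of each setting is the baseline latency divided by that setting's latency. The short sketch below does that arithmetic with the approximate numbers reported above; the `speedups` helper is purely illustrative and not part of the nodes.

```python
# Approximate latencies in seconds, copied from the comparison above.
latency_s = {
    "TangoFlux": 4.08,
    "TeaCache (0.25)": 2.42,
    "TeaCache (0.4)": 1.95,
}


def speedups(latencies, baseline="TangoFlux"):
    """Return baseline latency divided by each configuration's latency."""
    base = latencies[baseline]
    return {name: base / seconds for name, seconds in latencies.items()}


for name, factor in speedups(latency_s).items():
    print(f"{name}: {factor:.2f}x")
# TangoFlux: 1.00x
# TeaCache (0.25): 1.69x
# TeaCache (0.4): 2.09x
```

With these numbers, the 0.25 setting gives about a 1.7x speedup and the 0.4 setting is the one that reaches about 2x.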
Citation
@misc{hung2024tangofluxsuperfastfaithful,
title={TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization},
author={Chia-Yu Hung and Navonil Majumder and Zhifeng Kong and Ambuj Mehrish and Rafael Valle and Bryan Catanzaro and Soujanya Poria},
year={2024},
eprint={2412.21037},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2412.21037},
}
@article{liu2024timestep,
title={Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model},
author={Liu, Feng and Zhang, Shiwei and Wang, Xiaofeng and Wei, Yujie and Qiu, Haonan and Zhao, Yuzhong and Zhang, Yingya and Ye, Qixiang and Wan, Fang},
journal={arXiv preprint arXiv:2411.19108},
year={2024}
}