🚀🚀 Welcome to the SALMONN repo! The SALMONN model family is a series of advanced multi-modal large language models. For details on each model, please refer to its corresponding branch.
🔥 FAR leverages clean visual context without additional image-to-video fine-tuning: Unconditional pretraining on UCF-101 achieves state-of-the-art results in both video generation (context frame = 0) ...