How to use a different text encoder from the default config? #1088 (Closed; opened by Edwardmark on Jun 24 ...)
Thanks for your great project. You use a BERT encoder rather than a CLIP encoder for text. Is there a specific reason for this choice? Thanks.
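The issue asks how to pick a text encoder other than the config default. The project's actual config keys and model classes are not shown here, so the following is only a minimal sketch of one common pattern, a name-to-factory registry selected by a config field. Everything in it is hypothetical: `TEXT_ENCODERS`, `register_text_encoder`, `ModelConfig`, `build_text_encoder`, and the toy `"bert"`/`"clip"` encoders, which return dummy vectors instead of running real models. The dimensions use BERT-base's 768-wide hidden state and the 512-wide text embedding of CLIP ViT-B.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class TextEncoder:
    """Hypothetical minimal interface: map strings to fixed-size vectors."""
    dim: int

    def encode(self, texts: List[str]) -> List[List[float]]:
        raise NotImplementedError


# Hypothetical registry: config name -> zero-argument factory.
TEXT_ENCODERS: Dict[str, Callable[[], TextEncoder]] = {}


def register_text_encoder(name: str):
    """Decorator that records an encoder class/factory under `name`."""
    def wrap(factory):
        TEXT_ENCODERS[name] = factory
        return factory
    return wrap


@register_text_encoder("bert")
class ToyBertEncoder(TextEncoder):
    dim = 768  # BERT-base hidden size

    def encode(self, texts):
        # Placeholder only: a real implementation would run a BERT model.
        return [[float(len(t))] * self.dim for t in texts]


@register_text_encoder("clip")
class ToyClipEncoder(TextEncoder):
    dim = 512  # CLIP ViT-B text embedding size

    def encode(self, texts):
        # Placeholder only: a real implementation would run CLIP's text tower.
        return [[float(len(t))] * self.dim for t in texts]


@dataclass
class ModelConfig:
    text_encoder: str = "bert"  # hypothetical default, mirroring the issue


def build_text_encoder(cfg: ModelConfig) -> TextEncoder:
    """Instantiate whichever encoder the config names."""
    try:
        factory = TEXT_ENCODERS[cfg.text_encoder]
    except KeyError:
        raise ValueError(
            f"unknown text encoder {cfg.text_encoder!r}; "
            f"choose from {sorted(TEXT_ENCODERS)}"
        ) from None
    return factory()


if __name__ == "__main__":
    enc = build_text_encoder(ModelConfig(text_encoder="clip"))
    print(type(enc).__name__, enc.dim)
```

One consequence of switching encoders is that the embedding width changes (768 vs 512 here), so any downstream projection layer sized for the old encoder would need to read `enc.dim` rather than a hard-coded value.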
Multimodal large language models (MLLMs) have advanced rapidly in recent work. By feeding images into large language models (LLMs) and leveraging the LLMs' capabilities, MLLMs demonstrate exceptional ...