Abstract: Natural language video description (NLVD) has recently received strong interest in the computer vision, natural language processing (NLP), multimedia, and autonomous robotics communities.
Abstract: I welcome you to the fourth issue of the IEEE Communications Surveys and Tutorials in 2021. This issue includes 23 papers covering different aspects of communication networks. In particular, ...
The official implementation of NarVid, a framework that enhances text-video retrieval by leveraging frame-level captions (narration) to improve semantic understanding and retrieval accuracy. NarVid ...
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
We are creating multimedia content every day and everywhere. While automatic content generation has been a fundamental challenge for the multimedia community for decades, recent advances in deep learning ...
Automatic generation of video captions is a challenging task, as video is an information-intensive medium with complex variations. Most existing methods, either based on language templates or sequence ...