ECCV2022 SuperRetina, end-to-end trained keypoint detection and description for retinal image matching.
Cross-modal video retrieval
ECCV2022 LAFF, lightweight attentional feature fusion for text-to-video retrieval.
ACMMM2022 Negation Learning, learning to answer video queries with negative cues.
TPAMI2021 Dual Encoding, dual encoding of video and text for text-video matching.
TMM2021 SEA, sentence encoder assembly for better query representation.
ACMMM2019 W2VV++, winning solution for the TRECVID 2018 ad-hoc video search task.
Data
AI for health
JBHI2022 MMC-AMD for multi-modal categorization of age-related macular degeneration.
ICPR2020 Retinal-Lesions for retinal lesion segmentation and DR grading.
ACCV2018 Fundus10K for training and evaluating laser scar detection algorithms.
Cross-lingual / multi-lingual multimedia tasks
ACMMM2023 ChinaOpen, a new video dataset targeted at open-world multimodal learning, with raw data gathered from Bilibili.
TMM2019 COCO-CN, a bilingual image description dataset enriching MS-COCO with manually written Chinese sentences and tags, used for multiple tasks including image tagging, captioning and retrieval, all in a cross-lingual setting.
Cross-modal video retrieval
TV22 V3C1-PC, an auto-generated video description dataset for pre-training text-video retrieval models.