Code

Retinal image matching

  • ECCV2022 SuperRetina, end-to-end trained keypoint detection and description for retinal image matching.

Cross-modal video retrieval

  • ECCV2022 LAFF, lightweight attentional feature fusion for text-to-video retrieval.
  • ACMMM2022 Negation Learning, learning to answer video queries with negative cues.
  • TPAMI2021 Dual Encoding, dual encoding of video and text for text-video matching.
  • TMM2021 SEA, sentence encoder assembly for better query representation.
  • ACMMM2019 W2VV++, winning solution for the TRECVID 2018 ad-hoc video search task.

Data

AI for health

  • JBHI2022 MMC-AMD for multi-modal categorization of age-related macular degeneration.
  • ICPR2020 Retinal-Lesions for retinal lesion segmentation and DR grading.
  • ACCV2018 Fundus10K for training and evaluating laser scar detection algorithms.

Cross-lingual / multi-lingual multimedia tasks

  • ACMMM2023 ChinaOpen, a new video dataset targeted at open-world multimodal learning, with raw data gathered from Bilibili.
  • TMM2019 COCO-CN, a bilingual image description dataset enriching MS-COCO with manually written Chinese sentences and tags, used for multiple tasks including image tagging, captioning and retrieval, all in a cross-lingual setting.

Cross-modal video retrieval

  • TV22 V3C1-PC, an auto-generated video description dataset for pre-training text-video retrieval models.