ChinaOpen Dataset

Overview

ChinaOpen is a new video dataset targeted at open-world multimodal learning, with raw data gathered from Bilibili, a popular Chinese video-sharing website. The dataset has a large webly annotated training set of videos (associated with user-generated titles and tags) and a smaller manually annotated test set of videos (with manually checked user titles and tags, manually written captions, and manual labels describing the visual objects, actions, and scenes shown in the visual content).

[Research paper] [Source code]

Milestones

[2023/12/08] ChinaOpen-1k (videos + annotations) is available at Hugging Face

[2023/07/27] The ChinaOpen paper was accepted to the main track of ACMMM 2023!

[2023/05/31] Released the Generative Video-to-text Transformer (GVT) demo (Colab | GitHub) and checkpoints.

[2023/05/09] ChinaOpen-v1.0: ChinaOpen-50k for training + ChinaOpen-1k for test.

Examples

ChinaOpen-50K

User title:
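西瓜快乐桶清凉解暑幸福感爆棚的夏日小冰桶
Watermelon happy bucket: a cooling little summer ice bucket bursting with happiness

User tags:
美食(food), 制作教程(how-to tutorial), 美食vlog(food vlog), 初夏美味发现家(early-summer delicacy discoverer)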

User title:
幼儿园小孩，的日常生活
The daily life of kindergarten kids

User tags:
生活记录(life record), 可爱(cute), 萌娃(adorable kid), 幼儿园(kindergarten), 小朋友(children), 宝宝(baby)

User title:
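小海豹冲我爬过来了
The little seal crawled toward me

User tags:
可爱(cute), 萌(adorable), 卖萌(acting cute), 日常(daily life), 动物圈(animal circle), 小海豹(little seal)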

User title:
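事实证明，泡面可以修复一切
It turns out that instant noodles can fix everything

User tags:
搞笑(funny), 创意(creative), 西瓜(watermelon), 魔性(bizarrely catchy), 有趣(interesting), 水果(fruit), 逗比(goofball)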

User title:
柯基之歌，感觉被洗脑了
The corgi song: I feel brainwashed

User tags:
萌宠(cute pet), 汪星人(doggo), 柯基(corgi), 动物圈(animal circle)

User title:
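胖橘被卡在床沿，看了下自己肚子后，竟选择原路返回，笑喷了
A chubby orange cat got stuck at the edge of the bed, glanced at its belly, and then chose to go back the way it came. I burst out laughing.

User tags:
萌宠(cute pet), 喵星人(kitty), 猫咪(kitty), 日常(daily life), 猫(cat), 萌宠vlog(cute pet vlog)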

ChinaOpen-1K

User title:
新孩子摸弓的第二天。今天，咱们练三指体系的开弓延迟滞停。
The second day the new kid has handled a bow. Today, we are practicing the three-finger draw with a delayed hold.

Manual caption:
一个男生正在练习射箭。
A boy is practicing archery.

Labels:
Object: 弓箭(bow and arrow)，靶子(target)，男人(man)
Action: 射箭(archery)
Scene: 射箭场(archery range)
User-tag: 弓(bow)，弓箭(bow and arrow)，训练(training)，射箭(archery)

User title:
小鹿太治愈了，被一群小鹿包围了
The fawns are so soothing: I got surrounded by a group of fawns

Manual caption:
一个人正在森林里喂鹿。
A person is feeding deer in the forest.

Labels:
Object: 鹿(deer)，男人(man)
Action: 喂食(feeding)
Scene: 动物园(zoo)
User-tag: 动物(animal)，可爱(lovely)，小鹿(fawn)

User title:
训练腹肌时为什么会腰痛？
Why does it cause back pain when training abdominal muscles?

Manual caption:
一个戴着水手帽的男人躺在斜板凳上训练腹肌。
A man wearing a sailor's hat is lying on an inclined bench, training his abdominal muscles.

Labels:
Object: 男人(man)，凳子(stool)
Action: 仰卧起坐(sit-up)
Scene: 健身房(gym)
User-tag: 健身(bodybuilding)，运动(exercise)

User title:
老外作死“用电钻吃玉米”结果悲剧了, 他的牙齿还好吗-_超清
The foreigner made a fool of himself by "eating corn with an electric drill" and it turned out to be a tragedy. Are his teeth okay?-_ Ultraclear

Manual caption:
一个男人用电钻吃玉米棒子。
A man eats corn cobs with an electric drill.

Labels:
Object: 玉米(corn)，男人(man)，摄像机(video camera)，番茄酱(ketchup)
Action: 吃玉米(eating corn)
Scene: 室外(outdoor)
User-tag: 搞笑(funny)

User title:
好久没见过这么敬业的猫了，好可怜的老鼠
I haven't seen such a dedicated cat in a long time, what a pitiful mouse

Manual caption:
一只橙色猫爬上树，抓老鼠。
An orange cat climbs a tree to catch a mouse.

Labels:
Object: 猫(cat)，树(tree)，老鼠(mouse)
Action: 爬树(climbing tree)
Scene: 野外(field)
User-tag: 猫(cat)，老鼠(mouse)

User title:
大象的饭量确实不小
Elephants do have big appetites.

Manual caption:
一个人正在给一头大象喂香蕉。
A person is feeding bananas to an elephant.

Labels:
Object: 大象(elephant)，香蕉(banana)，饲养员(keeper)
Action: 喂大象(feeding the elephant)，吃香蕉(eating bananas)
Scene: 动物园(zoo)
User-tag: 大象(elephant)，动物(animal)
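
The ChinaOpen-1K entries above all share one shape: a user title, a manual caption, and four label lines (Object, Action, Scene, User-tag) whose items are written as a Chinese term followed by an English gloss in parentheses. The sketch below parses label lines in that displayed form into a small record. It is only an illustration of the layout shown on this page: the class names, field names, and the `parse_label_line` helper are hypothetical, and the released annotation files may be organized differently.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Matches one "term(gloss)" item as displayed on this page, e.g. "猫(cat)".
_ITEM = re.compile(r"^(?P<zh>[^()]+)\((?P<en>[^()]*)\)$")


@dataclass
class Label:
    zh: str  # Chinese term as displayed
    en: str  # English gloss taken from the parentheses


@dataclass
class Annotation:
    # Hypothetical field names, chosen to mirror the headings on this page.
    user_title: str
    manual_caption: str
    labels: dict[str, list[Label]] = field(default_factory=dict)


def parse_label_line(line: str) -> tuple[str, list[Label]]:
    """Split 'Kind: a(x)，b(y)' into ('Kind', [Label('a', 'x'), Label('b', 'y')])."""
    kind, _, rest = line.partition(":")
    items = []
    for raw in rest.split("，"):  # items are separated by a full-width comma
        match = _ITEM.match(raw.strip())
        if match is None:
            raise ValueError(f"unrecognized label item: {raw!r}")
        items.append(Label(match["zh"].strip(), match["en"].strip()))
    return kind.strip(), items


# Build a record from one of the ChinaOpen-1K examples shown above.
record = Annotation(
    user_title="好久没见过这么敬业的猫了，好可怜的老鼠",
    manual_caption="一只橙色猫爬上树，抓老鼠。",
)
for line in [
    "Object: 猫(cat)，树(tree)，老鼠(mouse)",
    "Action: 爬树(climbing tree)",
    "Scene: 野外(field)",
    "User-tag: 猫(cat)，老鼠(mouse)",
]:
    kind, items = parse_label_line(line)
    record.labels[kind] = items

print([label.en for label in record.labels["Object"]])  # prints ['cat', 'tree', 'mouse']
```

Keeping the Chinese term and the English gloss as separate fields makes it easy to evaluate against the original Chinese labels while still showing the gloss to readers who do not read Chinese.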
