ACL 2023 Multi‑lingual Document‑grounded Dialogue Competition Overview
The ACL 2023 Multi‑lingual Document‑grounded Dialogue Competition, hosted by Alibaba DAMO Academy and Nanjing University, introduces the first multilingual document‑dialogue dataset, provides a baseline system, offers a $7,000 prize pool, and invites participants to submit papers to the Doc2dial Workshop for Best Paper awards.
The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023) will be held in Toronto, Canada from July 9‑14, 2023.
The competition is organized by Alibaba DAMO Academy's Dialogue Intelligence Team and co‑hosted by Nanjing University. It releases the first multilingual document‑grounded dialogue dataset, provides a baseline model, and offers a $7,000 prize pool. The top teams will submit papers to the ACL 2023 Doc2dial Workshop and compete for the Best Paper and Best Student Paper awards.
Competition details can be found at: https://tianchi.aliyun.com/competition/entrance/532063/information
According to Gartner (2020), over 80% of enterprise data is unstructured, with documents (e.g., manuals, specifications, policies, regulations) being the most prevalent form. Enabling dialogue systems to effectively retrieve and use knowledge from such documents is a critical challenge for intelligent information services.
Existing document‑dialogue research has focused mainly on English (EMNLP 2020, 2021) and Chinese (EMNLP 2022), leaving other languages under‑explored. This competition addresses the gap by releasing Vietnamese and French document‑dialogue data (6,954 dialogue turns) and aggregating existing Chinese and English data (32,266 turns), encouraging participants to leverage cross‑lingual similarities.
Dataset and Baseline: The baseline method splits the task into three stages: retrieval, ranking, and generation. The retrieval module selects the top‑N candidate documents based on dialogue history; the ranking module picks the K most relevant of these; the generation module produces the response. Pre‑trained models for each module are provided for four languages (Chinese, English, French, Vietnamese). Evaluation uses the sum of token‑level F1, SacreBLEU, and ROUGE‑L (max 300 points); the baseline scores 156, leaving ample room for improvement.
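The official scoring script is not reproduced in the announcement; as a rough illustration of one of the three metrics, a whitespace‑based token‑level F1 might look like the sketch below (the actual tokenization, especially for Chinese and Vietnamese, may differ):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference response.

    Illustrative only: splits on whitespace, which is a simplification;
    the competition's scorer may tokenize differently per language.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # Count tokens appearing in both strings (with multiplicity).
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Summing this F1 (scaled to 100) with SacreBLEU and ROUGE‑L would yield the 300‑point scale described above.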
Prizes:
1st place: $3,000
2nd place: $1,600
3rd place: $1,000
4th place: $800
5th place: $600
The top five teams will submit papers to the Doc2dial Workshop and be eligible for the workshop’s Best Paper and Best Student Paper awards.
Contact:
DingTalk group (QR code provided in original announcement)
WeChat group (QR code provided)
Google Group: https://groups.google.com/g/dialdoc
Workshop website: https://doc2dial.github.io/workshop2023/#shared-task
Organizers:
Yu Haiyang, Algorithm Expert, Alibaba DAMO Academy
Cam‑Tu Nguyen, Associate Professor, Nanjing University
Yu Bowen, Algorithm Expert, Alibaba DAMO Academy
Li Yongbin, Senior Algorithm Expert, Alibaba DAMO Academy
Huang Fei, Researcher, Alibaba DAMO Academy
Sponsor: ModelScope Community (https://modelscope.cn/), China's first open‑source community for AI models, jointly launched by DAMO Academy and the CCF Open‑Source Development Committee, offering over 535 state‑of‑the‑art models and datasets for AI research.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.