ACM SIGCAS Conference on Computing and Sustainable Societies | 2021

mTransDial: Multilingual Dataset for Transport Domain Dialog Systems (Poster)

 
 

Abstract


Task-oriented virtual assistants, or dialogue systems, are becoming popular in domains such as restaurant booking, weather updates, and flight booking. These efforts are supported by the availability of large-scale annotated conversational datasets for such domains. However, the same is not true for transport-domain dialogue systems. Moreover, for such systems to be useful, they should be able to handle natural queries submitted by users. In countries like India, where most people communicate in regional languages, it is important that such systems support those languages. The existing transport-domain datasets are mostly monolingual and support only English. Because people in such countries tend to speak multiple languages and hold code-mixed conversations, the existing systems and datasets are of limited use. To the best of our knowledge, no multilingual code-mixed dataset is available for designing conversational systems for public transport. In this paper, we propose a code-mixed English-Hindi dataset to accelerate the development of transport-domain conversational systems suitable for countries like India. Our dataset covers multiple intents, including route finding, bus/train/cab finding, nearby place search, traffic alert queries, and out-of-domain queries. We also provide initial baseline results for user intent identification using existing state-of-the-art models on our dataset, along with a prototype that demonstrates the usability of the work. An extended version of this paper is available at https://iith.ac.in/~maunendra/papers/COMPASS21-mTransDial.pdf

DOI 10.1145/3460112.3471977
Language English
Journal ACM SIGCAS Conference on Computing and Sustainable Societies
