Innovations in Computer Science and Engineering | 2021

Multilingual Crawling Strategies for Information Retrieval from BRICS Academic Websites

Abstract


This paper proposes a web crawler for finding details of academicians of Indian origin working in foreign academic institutions. While collecting these data, we encountered institutions in the BRICS nations. Within BRICS, every country except South Africa hosts its university websites in a native language. Even where an English version is available, it contains less data and is insufficient to decide whether an academician is of Indian origin. This paper therefore proposes a method for translating the data on the main, native-language website into English. Note that Google Translate, applied to such websites, does not produce output in the desired form. We explore translation using various APIs as well as other available techniques such as UNL, NER (which plays a supporting role in translation), and NMT. We also explore the Stanford NER and the Stanford segmenter for these operations.
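
The routing step described above (translate a native-language page into English before deciding whether a listed academician is of Indian origin) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the page language is read from the `<html lang>` attribute, `translate` is a hypothetical stand-in for whichever translation API or NMT system is plugged in, and a simple lookup against a hypothetical surname set is used in place of the NER step the paper discusses.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Set


class PageTextExtractor(HTMLParser):
    """Collects the <html lang="..."> value and the visible text of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None
        self.chunks: List[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            self.lang = dict(attrs).get("lang")
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def process_page(
    html: str,
    translate: Callable[[str, str], str],  # hypothetical: (text, source_lang) -> English text
    surnames: Set[str],                    # hypothetical lowercase surname list (stand-in for NER)
) -> Dict[str, object]:
    parser = PageTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Reduce tags like "en-US" to the primary language subtag; "und" if absent.
    lang = (parser.lang or "und").split("-")[0].lower()
    # Only non-English pages are routed through the translation step.
    if lang != "en":
        text = translate(text, lang)
    tokens = [t.strip(".,;:()") for t in text.split()]
    matches = [t for t in tokens if t.lower() in surnames]
    return {"source_lang": lang, "text": text, "candidate_names": matches}


# Usage with a toy, hypothetical word-for-word "translator" for a Russian page.
def fake_translate(text: str, lang: str) -> str:
    glossary = {"Профессор": "Professor", "Анил": "Anil", "Шарма": "Sharma"}
    return " ".join(glossary.get(w, w) for w in text.split())


page = '<html lang="ru"><body><h1>Профессор Анил Шарма</h1><script>x=1</script></body></html>'
print(process_page(page, fake_translate, {"sharma"})["candidate_names"])  # ['Sharma']
```

The example illustrates why translation must come first: the Cyrillic name cannot be matched against an English-script name list until it has been rendered in English. A real pipeline would replace the toy glossary with a translation or transliteration service and the surname lookup with an NER model.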

DOI 10.1007/978-981-33-4543-0_17
Language English
Journal Innovations in Computer Science and Engineering

Full Text