Publication


Featured research published by Marco Lui.


Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM) | 2014

Accurate Language Identification of Twitter Messages

Marco Lui; Timothy Baldwin

We present an evaluation of “off-the-shelf” language identification systems as applied to microblog messages from Twitter. A key challenge is the lack of an adequate corpus of messages annotated for language that reflects the linguistic diversity present on Twitter. We overcome this through a “mostly-automated” approach to gathering language-labeled Twitter messages for evaluating language identification. We present the method to construct this dataset, as well as empirical results over existing datasets and off-the-shelf language identifiers. We also test techniques that have been proposed in the literature to boost language identification performance over Twitter messages. We find that simple voting over three specific systems consistently outperforms any specific system, and achieves state-of-the-art accuracy on the task.
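As a rough illustration of the "simple voting over three systems" idea mentioned in the abstract, the sketch below combines the labels returned by several language identifiers. It is not taken from the paper: the function name `majority_vote` and the tie-breaking rule (fall back to the first system's prediction when no label has a strict majority) are assumptions made for this example only.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Combine language labels from several identifiers by majority vote.

    `predictions` holds one label per system, e.g. ["en", "en", "de"].
    Assumption (not from the paper): when no label has a strict
    majority, the first system's prediction is returned.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    # Most frequent label and how many systems proposed it.
    label, count = Counter(predictions).most_common(1)[0]
    if count > len(predictions) / 2:
        return label
    # No strict majority: fall back to the first system (assumed rule).
    return predictions[0]
```

With three systems, a label wins as soon as two of them agree; the fallback only applies when all three disagree.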


International Conference on Computational Linguistics | 2014

Exploring Methods and Resources for Discriminating Similar Languages

Marco Lui; Ned Letcher; Oliver Adams; Long Duong; Paul Cook; Timothy Baldwin

The Discriminating between Similar Languages (DSL) shared task at VarDial challenged participants to build an automatic language identification system to discriminate between 13 languages in 6 groups of highly-similar languages (or national varieties of the same language). In this paper, we describe the submissions made by team UniMelb-NLP, which took part in both the closed and open categories. We present the text representations and modeling techniques used, including cross-lingual POS tagging as well as fine-grained tags extracted from a deep grammar of English, and discuss additional data we collected for the open submissions, utilizing custom-built web corpora based on top-level domains as well as existing corpora.


Meeting of the Association for Computational Linguistics | 2012

langid.py: An Off-the-shelf Language Identification Tool

Marco Lui; Timothy Baldwin


International Joint Conference on Natural Language Processing | 2013

How Noisy Social Media Text, How Diffrnt Social Media Sources?

Timothy Baldwin; Paul Cook; Marco Lui; Andrew MacKinlay; Li Wang


North American Chapter of the Association for Computational Linguistics | 2010

Language Identification: The Long and the Short of the Matter

Timothy Baldwin; Marco Lui


International Joint Conference on Natural Language Processing | 2011

Cross-domain Feature Selection for Language Identification

Marco Lui; Timothy Baldwin


Transactions of the Association for Computational Linguistics | 2014

Automatic Detection and Language Identification of Multilingual Documents

Marco Lui; Jey Han Lau; Timothy Baldwin


Empirical Methods in Natural Language Processing | 2011

Predicting Thread Discourse Structure over Technical Web Forums

Li Wang; Marco Lui; Su Nam Kim; Joakim Nivre; Timothy Baldwin


Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013) | 2013

Classifying English Documents by National Dialect

Marco Lui; Paul Cook


North American Chapter of the Association for Computational Linguistics | 2010

Intelligent Linux Information Access by Data Mining: the ILIAD Project

Timothy Baldwin; David Martinez; Richard B. Penman; Su Nam Kim; Marco Lui; Li Wang; Andrew MacKinlay

Collaboration


Dive into Marco Lui's collaborations.

Top Co-Authors

Paul Cook

University of Melbourne

Li Wang

University of Melbourne

Su Nam Kim

University of Melbourne

Bahar Salehi

University of Melbourne

Bo Han

University of Melbourne

Jey Han Lau

University of Melbourne

Karl Grieser

University of Melbourne

Long Duong

University of Melbourne
