Publication


Featured research published by Long Duong.


International Joint Conference on Natural Language Processing | 2015

Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser

Long Duong; Trevor Cohn; Steven Bird; Paul Cook

Training a high-accuracy dependency parser requires a large treebank, which is costly and time-consuming to build. We propose a learning method that needs less data, based on the observation that languages share underlying structures. We exploit cues from a different source language to guide the learning process. Our model reaches the same accuracy as a purely supervised method with at most half the annotation effort.
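The core idea above, hard parameter sharing between a source- and a target-language parser, can be sketched in a few lines. The following PyTorch snippet is a minimal illustration, not the paper's exact architecture; the class name, feature layout and dimensions are all hypothetical.

```python
import torch
import torch.nn as nn

class SharedParser(nn.Module):  # hypothetical name, not from the paper
    def __init__(self, vocab_sizes, emb_dim=50, hidden_dim=200, n_actions=3):
        super().__init__()
        # language-specific lookup tables
        self.embeddings = nn.ModuleDict(
            {lang: nn.Embedding(v, emb_dim) for lang, v in vocab_sizes.items()}
        )
        # everything below is shared across languages
        self.hidden = nn.Linear(emb_dim * 4, hidden_dim)  # 4 stack/buffer features
        self.out = nn.Linear(hidden_dim, n_actions)       # shift / left-arc / right-arc

    def forward(self, lang, feature_ids):
        emb = self.embeddings[lang](feature_ids).view(feature_ids.size(0), -1)
        return self.out(torch.tanh(self.hidden(emb)))

# Training would interleave batches from the resource-rich source treebank and
# the small target treebank; only self.embeddings[lang] differs per language.
parser = SharedParser({"en": 10000, "tgt": 2000})
logits = parser("tgt", torch.randint(0, 2000, (8, 4)))  # 8 parser configurations
```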


North American Chapter of the Association for Computational Linguistics | 2016

An Attentional Model for Speech Translation Without Transcription

Long Duong; Antonios Anastasopoulos; David Chiang; Steven Bird; Trevor Cohn

For many low-resource languages, spoken language resources are more likely to be annotated with translations than with transcriptions. This bilingual speech data can be used for word spotting, spoken document retrieval, and even documentation of endangered languages. We apply a neural attentional model to this data. On phone-to-word alignment and translation reranking tasks, we achieve large improvements over several baselines. On the more challenging speech-to-word alignment task, our model nearly matches GIZA++'s performance on gold transcriptions, but without recourse to transcriptions or a lexicon.
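The heart of such a model is soft attention that distributes each translation word's probability mass over speech frames, which is what yields the alignments. Below is a minimal NumPy sketch of that attention computation; the array names and dimensions are illustrative only.

```python
import numpy as np

def soft_alignment(frame_states, word_states):
    """frame_states: (T, d) encoded speech frames; word_states: (N, d) decoder
    states, one per translation word. Returns an (N, T) attention matrix whose
    rows can be read off as word-to-frame alignments."""
    scores = word_states @ frame_states.T                # (N, T) similarities
    scores -= scores.max(axis=1, keepdims=True)          # for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax

rng = np.random.default_rng(0)
A = soft_alignment(rng.normal(size=(120, 64)), rng.normal(size=(7, 64)))
print(A.argmax(axis=1))  # most-attended frame for each of the 7 words
```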


Empirical Methods in Natural Language Processing | 2016

Learning Crosslingual Word Embeddings without Bilingual Corpora

Long Duong; Hiroshi Kanayama; Tengfei Ma; Steven Bird; Trevor Cohn

Crosslingual word embeddings represent lexical items from different languages in the same vector space, enabling the transfer of NLP tools across languages. However, previous approaches required expensive resources, had difficulty incorporating monolingual data, or were unable to handle polysemy. Our method addresses these drawbacks by exploiting a high-coverage dictionary in an EM-style training algorithm run over monolingual corpora in the two languages. The model achieves state-of-the-art performance on the bilingual lexicon induction task, exceeding models that use large bilingual corpora, and competitive results on monolingual word similarity and cross-lingual document classification tasks.
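A heavily simplified sketch of the EM-style idea follows: when a polysemous source word has several dictionary translations, an E-step picks the translation whose current vector best fits the monolingual context, and an M-step pulls that vector and the context together. The dictionary, corpus and update rule below are toy stand-ins, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
dictionary = {"bank": ["banque", "rive"]}            # one polysemous entry
vocab = ["bank", "banque", "rive", "money", "river"]
vec = {w: rng.normal(scale=0.1, size=16) for w in vocab}

corpus = [("bank", ["money"]), ("bank", ["river"])]  # (centre, context) pairs
for epoch in range(50):
    for centre, context in corpus:
        ctx = np.mean([vec[c] for c in context], axis=0)
        # E-step: pick the translation that best explains this context
        best = max(dictionary[centre], key=lambda t: vec[t] @ ctx)
        # M-step: pull the chosen translation and the context together
        vec[best] += 0.1 * (ctx - vec[best])
        for c in context:
            vec[c] += 0.1 * (vec[best] - vec[c])

# Each sense tends to drift toward its own contexts:
print("banque . money =", vec["banque"] @ vec["money"])
print("rive   . river =", vec["rive"] @ vec["river"])
```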


Empirical Methods in Natural Language Processing | 2014

What Can We Get From 1000 Tokens? A Case Study of Multilingual POS Tagging For Resource-Poor Languages

Long Duong; Trevor Cohn; Karin Verspoor; Steven Bird; Paul Cook

In this paper we address the problem of multilingual part-of-speech tagging for resource-poor languages. We use parallel data to transfer part-of-speech information from resource-rich to resource-poor languages. Additionally, we use a small amount of annotated data to learn to “correct” errors of the projection approach, such as tagset mismatches between languages, achieving state-of-the-art performance (91.3%) across 8 languages. Our approach has modest data requirements and uses minimum divergence classification. For situations where no universal tagset mapping is available, we propose an alternative method, achieving state-of-the-art accuracy of 85.6% on the resource-poor language Malagasy.
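A toy sketch of the projection-plus-correction pipeline is given below: tags are first copied across word alignments from the source side, then a classifier trained on a small amount of gold data learns to fix systematic projection errors. The data, features and choice of classifier here are illustrative, not the paper's.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def project(source_tags, alignment, target_len):
    """Copy POS tags across (src_idx, tgt_idx) word-alignment pairs."""
    projected = ["NOUN"] * target_len        # crude default for unaligned words
    for s, t in alignment:
        projected[t] = source_tags[s]
    return projected

print(project(["PRON", "VERB", "DET", "NOUN"], [(0, 0), (1, 1), (3, 2)], 3))

# Correction step: learn the gold tag from (word, projected tag) features on a
# small annotated sample (~1000 tokens in the paper; two toy tokens here).
X = [{"word": "perro", "proj": "NOUN"}, {"word": "corre", "proj": "NOUN"}]
y = ["NOUN", "VERB"]
vec = DictVectorizer()
clf = LogisticRegression(max_iter=200).fit(vec.fit_transform(X), y)
print(clf.predict(vec.transform([{"word": "corre", "proj": "NOUN"}])))
```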


Empirical Methods in Natural Language Processing | 2015

A Neural Network Model for Low-Resource Universal Dependency Parsing

Long Duong; Trevor Cohn; Steven Bird; Paul Cook

Accurate dependency parsing requires large treebanks, which are only available for a few languages. We propose a method that takes advantage of shared structure across languages to build a mature parser from less training data. Our model learns a shared “universal” parser that operates over an interlingual continuous representation of language, combined with language-specific mapping components. In low-resource simulations, our methods give a consistent 8-10% improvement over supervised learning across several treebanks.
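The decomposition into a shared body plus language-specific mappings might look like the following minimal PyTorch sketch, in which each language owns only a linear map into the interlingual space; all names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class UniversalParser(nn.Module):  # hypothetical name, not from the paper
    def __init__(self, langs, emb_dim=64, inter_dim=64, hidden=128, actions=3):
        super().__init__()
        # each language owns only a linear map into the interlingual space
        self.to_interlingua = nn.ModuleDict(
            {l: nn.Linear(emb_dim, inter_dim, bias=False) for l in langs}
        )
        # a single shared parser body serves every language
        self.shared = nn.Sequential(
            nn.Linear(inter_dim, hidden), nn.Tanh(), nn.Linear(hidden, actions)
        )

    def forward(self, lang, word_vectors):
        return self.shared(self.to_interlingua[lang](word_vectors))

model = UniversalParser(["en", "fi"])
scores = model("fi", torch.randn(8, 64))  # 8 parser configurations
```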


Empirical Methods in Natural Language Processing | 2016

An Unsupervised Probability Model for Speech-to-Translation Alignment of Low-Resource Languages

Antonios Anastasopoulos; David Chiang; Long Duong

For many low-resource languages, spoken language resources are more likely to be annotated with translations than with transcriptions. Translated speech data is potentially valuable for documenting endangered languages and for training speech translation systems. A first step towards making use of such data is to automatically align spoken words with their translations. We present a model that combines Dyer et al.'s reparameterization of IBM Model 2 (fast_align) with k-means clustering using Dynamic Time Warping as the distance metric. The two components are trained jointly using expectation-maximization. In an extremely low-resource scenario, our model performs significantly better than both a neural model and a strong baseline.
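The clustering half of this model is easy to sketch: Dynamic Time Warping (DTW) as the distance between variable-length speech segments, with a k-means-style hard assignment to prototypes. The NumPy code below shows just that piece; the full model ties these assignments to the alignment probabilities inside a single EM loop.

```python
import numpy as np

def dtw(a, b):
    """DTW distance between two (length, dim) feature sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def assign(segments, prototypes):
    """Hard-assign each speech segment to its nearest prototype (E-step-like)."""
    return [min(range(len(prototypes)), key=lambda k: dtw(s, prototypes[k]))
            for s in segments]

rng = np.random.default_rng(1)
segs = [rng.normal(size=(rng.integers(20, 40), 13)) for _ in range(5)]  # MFCC-like
print(assign(segs, segs[:2]))  # cluster 5 segments against 2 prototypes
```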


Conference on Computational Natural Language Learning | 2015

Cross-lingual Transfer for Unsupervised Dependency Parsing Without Parallel Data

Long Duong; Trevor Cohn; Steven Bird; Paul Cook

Cross-lingual transfer has been shown to produce good results for dependency parsing of resource-poor languages. Although this avoids the need for a target-language treebank, most approaches still require large parallel corpora, which are themselves scarce for low-resource languages. We present a new method that needs no parallel data: it learns syntactic word embeddings that generalise over the syntactic contexts of a bilingual vocabulary, and incorporates them into a neural network parser. We show empirical improvements over a baseline delexicalised parser on both the CoNLL and Universal Dependency Treebank datasets. We analyse the importance of the source languages, and show that combining multiple source languages leads to a substantial improvement.
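The notion of a "syntactic" word embedding can be illustrated by representing each word through the POS tags that surround it and compressing the counts with SVD, as in the toy NumPy sketch below. In the paper it is the bilingual vocabulary that ties the two languages' spaces together; this sketch shows only the context-counting idea on a single toy corpus.

```python
import numpy as np

tagged = [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"), ("fast", "ADV")]
tags = sorted({t for _, t in tagged})
words = sorted({w for w, _ in tagged})

# Count the POS tags appearing immediately left and right of each word.
C = np.zeros((len(words), len(tags)))
for i, (w, _) in enumerate(tagged):
    for j in (i - 1, i + 1):
        if 0 <= j < len(tagged):
            C[words.index(w), tags.index(tagged[j][1])] += 1

U, S, _ = np.linalg.svd(C, full_matrices=False)
emb = U * S  # low-rank "syntactic" embeddings
print({w: np.round(emb[k, :2], 2) for k, w in enumerate(words)})
```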


International Conference on Computational Linguistics | 2014

Exploring Methods and Resources for Discriminating Similar Languages

Marco Lui; Ned Letcher; Oliver Adams; Long Duong; Paul Cook; Timothy Baldwin

The Discriminating between Similar Languages (DSL) shared task at VarDial challenged participants to build an automatic language identification system to discriminate between 13 languages in 6 groups of highly similar languages (or national varieties of the same language). In this paper, we describe the submissions made by team UniMelb-NLP, which took part in both the closed and open categories. We present the text representations and modelling techniques used, including cross-lingual POS tagging as well as fine-grained tags extracted from a deep grammar of English, and discuss the additional data we collected for the open submissions, using custom-built web corpora based on top-level domains as well as existing corpora.
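As a point of reference, a standard baseline for this kind of task is a character n-gram model with a linear classifier; the sketch below (with toy data) shows that baseline, whereas the team's actual systems added cross-lingual POS tags and deep-grammar features on top.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["obrigado pelo seu tempo", "gracias por su tiempo"]  # toy data
train_langs = ["pt", "es"]

clf = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 4)),  # char 1-4 grams
    MultinomialNB(),
)
clf.fit(train_texts, train_langs)
print(clf.predict(["muito obrigado"]))  # expected: ['pt']
```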


Journal of Ecohydraulics | 2016

Two decades of ecohydraulics: trends of an emerging interdiscipline

Roser Casas-Mulet; Elise King; Doris Hoogeveen; Long Duong; Garima Lakhanpal; Timothy Baldwin; Michael J. Stewardson; J. Angus Webb

We assessed how the emerging field of ecohydraulics research has changed over two decades by examining the proceedings of the biennial International Symposium on Ecohydraulics. By applying Natural Language Processing (NLP) to word usage, this paper provides a deep analysis of a longitudinal dataset and lets us test more detailed questions than previous snapshots of the ecohydraulics literature. We formulated three main hypotheses concerning the degree of multidisciplinarity, interdisciplinarity and transdisciplinarity within ecohydraulics, investigated temporal changes in author affiliation patterns, and identified the dominant topics of research. The total number of proceedings papers has increased over time and the field is becoming increasingly global; this, together with the identification of 10 distinctive macro-topics, suggests well-developed multidisciplinarity in ecohydraulics. Individual topics have been reasonably stable across time, apart from 11 (out of 51) significant trends within the macro-topics of Fish responses, Hydraulic modelling, Water quality, Physical habitat modelling and Social responses, suggesting some increase in interdisciplinarity. Surprisingly, the proportion of practitioners collaborating with researchers has not changed greatly over time, indicating that ecohydraulics has been transdisciplinary to some extent from its inception. Our results arguably provide an opportunity to better integrate fundamental understanding into practical applications in water management.
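The kind of topic analysis described here can be approximated with off-the-shelf tools: fit a topic model over a document-term matrix of proceedings papers, then aggregate the per-paper topic mixtures by year and test for trends. The sketch below uses scikit-learn's LDA with a toy corpus and topic count; it is not the paper's actual pipeline.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

papers = [  # toy stand-ins for proceedings papers
    "fish passage velocity barrier swimming",
    "hydraulic model flow simulation turbulence",
    "fish habitat flow velocity",
]
X = CountVectorizer().fit_transform(papers)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
print(lda.transform(X))  # per-paper topic mixtures; aggregate by year for trends
```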


North American Chapter of the Association for Computational Linguistics | 2016

UniMelb at SemEval-2016 Task 3: Identifying Similar Questions by Combining a CNN with String Similarity Measures

Timothy Baldwin; Huizhi Liang; Bahar Salehi; Doris Hoogeveen; Yitong Li; Long Duong

This paper describes the participation of The University of Melbourne in the community question-answering (CQA) task of SemEval 2016 (Task 3-B). We obtained a MAP score of 70.2% on the test set by combining three classifiers: a Naive Bayes classifier and a support vector machine (SVM), each trained over lexical similarity features, and a convolutional neural network (CNN). The CNN uses word embeddings and machine translation evaluation scores as features.
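The system combination amounts to averaging relevance probabilities from the three classifiers and ranking candidate questions by the result. The sketch below shows that combination with scikit-learn stand-ins for two of the classifiers and a placeholder probability for the CNN; the features and data are illustrative only.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Each row holds similarity features for one (question, candidate) pair.
rng = np.random.default_rng(0)
X = np.vstack([rng.uniform(0.6, 1.0, (5, 2)),   # relevant pairs
               rng.uniform(0.0, 0.4, (5, 2))])  # irrelevant pairs
y = np.array([1] * 5 + [0] * 5)

nb = GaussianNB().fit(X, y)
svm = SVC(probability=True).fit(X, y)

def combined_score(x, cnn_prob):
    """Average the three relevance probabilities; rank candidates by this."""
    probs = [nb.predict_proba([x])[0, 1], svm.predict_proba([x])[0, 1], cnn_prob]
    return float(np.mean(probs))

print(combined_score([0.8, 0.7], cnn_prob=0.75))
```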

Collaboration


Dive into Long Duong's collaborations.

Top Co-Authors

Steven Bird, University of Melbourne
Paul Cook, University of Melbourne
Trevor Cohn, University of Melbourne
Bahar Salehi, University of Melbourne
David Chiang, University of Notre Dame
Pavel Pecina, Charles University in Prague