Lance De Vine

Queensland University of Technology

Publication

Featured research published by Lance De Vine.

conference on information and knowledge management | 2014

Medical Semantic Similarity with a Neural Language Model

Lance De Vine; Guido Zuccon; Bevan Koopman; Laurianne Sitbon; Peter D. Bruza

Advances in neural network language models have demonstrated that these models can effectively learn representations of word meaning. In this paper, we explore a variation of neural language models that can learn on concepts taken from structured ontologies and extracted from free-text, rather than directly from terms in free-text. This model is employed for the task of measuring semantic similarity between medical concepts, a task that is central to a number of techniques in medical informatics and information retrieval. The model is built with two medical corpora (journal abstracts and patient records) and empirically validated on two ground-truth datasets of human-judged concept pairs assessed by medical professionals. Empirically, our approach correlates closely with expert human assessors (≈0.9) and outperforms a number of state-of-the-art benchmarks for medical semantic similarity. The demonstrated superiority of this model for providing an effective semantic similarity measure is promising in that this may translate into effectiveness gains for techniques in medical information retrieval and medical informatics (e.g., query expansion and literature-based discovery).

QI'12 Proceedings of the 6th international conference on Quantum Interaction | 2012

Many paths lead to discovery: analogical retrieval of cancer therapies

Trevor Cohen; Dominic Widdows; Lance De Vine; Roger W. Schvaneveldt; Thomas C. Rindflesch

This paper addresses the issue of analogical inference, and its potential role as the mediator of new therapeutic discoveries, by using disjunction operators based on quantum connectives to combine many potential reasoning pathways into a single search expression. In it, we extend our previous work in which we developed an approach to analogical retrieval using the Predication-based Semantic Indexing (PSI) model, which encodes both concepts and the relationships between them in high-dimensional vector space. As in our previous work, we leverage the ability of PSI to infer predicate pathways connecting two example concepts, in this case comprising known therapeutic relationships. For example, given that drug x TREATS disease z, we might infer the predicate pathway drug x INTERACTS_WITH gene y ASSOCIATED_WITH disease z, and use this pathway to search for drugs related to another disease in similar ways. As biological systems tend to be characterized by networks of relationships, we evaluate the ability of quantum-inspired operators to mediate inference and retrieval across multiple relations, by testing the ability of different approaches to recover known therapeutic relationships. In addition, we introduce a novel complex vector based implementation of PSI, based on Plate's Circular Holographic Reduced Representations, which we utilize for all experiments in addition to the binary vector based approach we have applied in our previous research.
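
The binding idea behind circular holographic reduced representations can be illustrated with a small sketch. It assumes vectors of unit-magnitude complex numbers, where binding is elementwise multiplication and release is multiplication by the complex conjugate. The function names, the dimensionality, and the toy predications are hypothetical and are not taken from the paper; this is not the authors' PSI implementation.

```python
import cmath
import math
import random


def random_phasor_vector(dim, rng):
    """Random vector of unit-magnitude complex numbers (random phases)."""
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(dim)]


def bind(a, b):
    """Bind two vectors by elementwise complex multiplication."""
    return [x * y for x, y in zip(a, b)]


def release(bound, a):
    """Undo a binding with `a` by multiplying with its complex conjugate."""
    return [x * y.conjugate() for x, y in zip(bound, a)]


def superpose(*vectors):
    """Add several vectors elementwise to store them in one trace."""
    return [sum(column) for column in zip(*vectors)]


def similarity(a, b):
    """Normalised real part of the Hermitian inner product."""
    num = sum((x * y.conjugate()).real for x, y in zip(a, b))
    norm_a = math.sqrt(sum(abs(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(abs(y) ** 2 for y in b))
    return num / (norm_a * norm_b)


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 1024  # assumed dimensionality, chosen only for illustration
    # Hypothetical elemental vectors for two predicates and two concepts.
    TREATS, INTERACTS_WITH, disease_z, gene_y = (
        random_phasor_vector(dim, rng) for _ in range(4)
    )
    # Semantic vector for drug x: a superposition of its bound predications.
    drug_x = superpose(bind(TREATS, disease_z), bind(INTERACTS_WITH, gene_y))
    # Releasing TREATS should yield a noisy copy of disease_z.
    probe = release(drug_x, TREATS)
    print("similarity to disease_z:", similarity(probe, disease_z))
    print("similarity to gene_y:", similarity(probe, gene_y))
```

Releasing a predicate from a superposed trace returns the bound concept plus noise from the other predications, so retrieval relies on a nearest-neighbour comparison rather than exact recovery.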

international world wide web conferences | 2015

Parallel Streaming Signature EM-tree: A Clustering Algorithm for Web Scale Applications

Christopher M. De Vries; Lance De Vine; Shlomo Geva; Richi Nayak

The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages, respectively, and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine-grained clustering has not been previously demonstrated. Previous approaches clustered a sample that limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine-grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and unfeasible.

Journal of the Association for Information Science and Technology | 2017

Clinical Information Extraction Using Small Data: An Active Learning Approach Based on Sequence Representations and Word Embeddings

Mahnoosh Kholghi; Lance De Vine; Laurianne Sitbon; Guido Zuccon; Anthony Nguyen

This article demonstrates the benefits of using sequence representations based on word embeddings to inform the seed selection and sample selection processes in an active learning pipeline for clinical information extraction. Seed selection refers to choosing an initial sample set to label to form an initial learning model. Sample selection refers to selecting informative samples to update the model at each iteration of the active learning process. Compared to supervised machine learning approaches, active learning offers the opportunity to build statistical classifiers with a reduced amount of training samples that require manual annotation. Reducing the manual annotation effort can support automating the clinical information extraction process. This is particularly beneficial in the clinical domain, where manual annotation is a time‐consuming and costly task, as it requires extensive labor from clinical experts. Our empirical findings demonstrate that (a) using sequence representations along with the length of sequence for seed selection shows potential towards more effective initial models, and (b) using sequence representations for sample selection leads to significantly lower manual annotation efforts, with up to 3% and 6% fewer tokens and concepts requiring annotation, respectively, compared to state‐of‐the‐art query strategies.

australasian document computing symposium | 2015

A Study on the Use of Word Embeddings and PageRank for Vietnamese Text Summarization

Viet Phung; Lance De Vine

Automatic text summarization is the process of automatically reducing the length of documents without losing the primary ideas. Due to the flood of digital text-based information, there is a great demand for summarization systems. In this paper, we investigate a number of word-embedding based approaches for sentence representation which are combined with the PageRank algorithm to select sentences for summary construction. We compare these new methods with a range of other current approaches to summarization. While the same summarization approaches can generally be applied across different languages, we target Vietnamese because of the relative lack of previous work in this space and also because it provides a good example of a language which generally requires word segmentation. Our experiments find that a word-embedding and graph based approach is an effective strategy for Vietnamese summarization and that word segmentation is not necessary for achieving good summarization results.
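
A minimal sketch of the general idea follows: sentences are represented by averaged word embeddings, linked in a cosine-similarity graph, and ranked with PageRank. The averaging scheme, whitespace tokenisation, damping factor, and all function names are assumptions made for illustration; the paper compares several sentence representations and this is not its exact method.

```python
import math


def mean_vector(tokens, embeddings):
    """Average the embeddings of the known tokens in a sentence."""
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over an adjacency matrix."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            incoming = sum(
                scores[j] * weights[j][i] / out_sums[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(sentences, embeddings, k=1):
    """Return the k highest-ranked sentences in their original order."""
    # Whitespace splitting is an assumption; no word segmentation is applied.
    vecs = [mean_vector(s.split(), embeddings) for s in sentences]
    n = len(vecs)
    weights = [
        [max(cosine(vecs[i], vecs[j]), 0.0) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In practice `embeddings` would be a dictionary of pre-trained word vectors; any mapping from token to a list of floats works with this sketch.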

INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval | 2009

Clustering with random indexing K-tree and XML structure

Christopher M. De Vries; Shlomo Geva; Lance De Vine

This paper describes the approach taken to the clustering task at INEX 2009 by a group at the Queensland University of Technology. The Random Indexing (RI) K-tree has been used with a representation that is based on the semantic markup available in the INEX 2009 Wikipedia collection. The RI K-tree is a scalable approach to clustering large document collections. This approach has produced quality clustering when evaluated using two different methodologies.

australasian document computing symposium | 2017

Efficient Analogy Completion with Word Embedding Clusters

Lance De Vine; Shlomo Geva; Peter D. Bruza

Word embeddings have attracted much attention in recent years and have been heavily applied to many tasks in information retrieval, natural language processing and knowledge base construction. One of the most well noted aspects of word embeddings is their ability to capture relations between terms via simple vector offsets. This ability is often examined via the use of proportional analogy completion tasks. This task requires that the correct single term be returned by a system when prompted with the three other terms of a proportional analogy. This task usually involves a scan of all stored word embeddings which may be a relatively expensive operation when used as part of a larger system. In some preliminary experiments we show that it is quite easy to achieve significant speedups for analogy completion, with little loss of accuracy, via the use of a simple clustering solution of word vectors. Cluster trees have long been used to improve the efficiency of nearest neighbour retrieval but analogy completion has some additional structure due to the use of analogy completion formulas. We evaluate several configurations of word embedding clusters. Due to its relationship to tasks such as link prediction and knowledge base completion we believe that these results may be of interest to a wider group of people.
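
The following sketch illustrates how clustering can prune an analogy search. It assumes the common vector-offset formula (the target is b - a + c on unit vectors) and a flat set of centroids, and it scans only the clusters nearest the target. The formula choice, the `probes` parameter, and the function names are illustrative assumptions; the paper evaluates several cluster configurations that are not reproduced here.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def assign_clusters(vectors, centroids):
    """Map each centroid index to the words most similar to that centroid."""
    units = [normalize(c) for c in centroids]
    clusters = {i: [] for i in range(len(units))}
    for word, vec in vectors.items():
        u = normalize(vec)
        best = max(range(len(units)), key=lambda i: dot(u, units[i]))
        clusters[best].append(word)
    return clusters


def complete_analogy(a, b, c, vectors, centroids=None, clusters=None, probes=1):
    """Answer a : b :: c : ? using the vector offset b - a + c.

    With centroids and clusters supplied, only the `probes` clusters closest
    to the target are scanned; otherwise every stored word is scanned.
    """
    ua, ub, uc = (normalize(vectors[w]) for w in (a, b, c))
    target = normalize([y - x + z for x, y, z in zip(ua, ub, uc)])
    if centroids is None or clusters is None:
        candidates = list(vectors)
    else:
        ranked = sorted(
            range(len(centroids)),
            key=lambda i: dot(target, normalize(centroids[i])),
            reverse=True,
        )
        candidates = [w for i in ranked[:probes] for w in clusters[i]]
    # The three prompt terms are excluded, as is usual for this task.
    candidates = [w for w in candidates if w not in (a, b, c)]
    if not candidates:
        return None
    return max(candidates, key=lambda w: dot(target, normalize(vectors[w])))
```

Raising `probes` trades speed for accuracy: more clusters are scanned, so the pruned search is less likely to miss the answer that a full scan would find.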

international conference on computational linguistics | 2014

Predicting sense convergence with distributional semantics: an application to the CogaLex 2014 shared task

Laurianne Sitbon; Lance De Vine

This paper presents our system to address the CogALex-IV 2014 shared task of identifying a single word most semantically related to a group of 5 words (queries). Our system uses an implementation of a neural language model and identifies the answer word by finding the most semantically similar word representation to the sum of the query representations. It is a fully unsupervised system which learns on around 20% of the UkWaC corpus. Out of 2,000 queries, it identifies 85 exact targets and 285 approximate targets in lists of 5 suggestions.
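
The retrieval step described above can be sketched as follows: sum the query word vectors, then return the vocabulary words whose vectors are most similar to that sum by cosine similarity. Excluding the query words from the candidates, the `top_n` parameter, and the function names are assumptions made for illustration, not details taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def suggest_targets(queries, vectors, top_n=5):
    """Rank vocabulary words by similarity to the summed query vectors."""
    known = [vectors[q] for q in queries if q in vectors]
    if not known:
        return []
    summed = [sum(column) for column in zip(*known)]
    # Assumption: the target is never one of the query words themselves.
    candidates = [w for w in vectors if w not in queries]
    ranked = sorted(candidates, key=lambda w: cosine(summed, vectors[w]), reverse=True)
    return ranked[:top_n]
```

The first element of the returned list corresponds to the single suggested word, and the full list corresponds to a list of 5 suggestions when `top_n` is left at its default.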

australasian document computing symposium | 2013

ADCS reaches adulthood: an analysis of the conference and its community over the last eighteen years

Bevan Koopman; Guido Zuccon; Lance De Vine; Aneesha Bakharia; Peter D. Bruza; Laurianne Sitbon; Andrew Gibson

How influential is the Australasian Document Computing Symposium (ADCS)? What do ADCS articles speak about and who cites them? Who is the ADCS community and how has it evolved? This paper considers eighteen years of ADCS, investigating both the conference and its community. A content analysis of the proceedings uncovers the diversity of topics covered in ADCS and how these have changed over the years. Citation analysis reveals the impact of the papers. The number of authors and where they originate from reveal who has contributed to the conference. Finally, we generate co-author networks which reveal the collaborations within the community. These networks show how clusters of researchers form, the effect geographic location has on collaboration, and how these have evolved over time.

national conference on artificial intelligence | 2010