Trong Nhan Phan

Johannes Kepler University of Linz

Publication

Featured research published by Trong Nhan Phan.

International Conference on Data Management in Grid and P2P Systems | 2014

An Elastic Approximate Similarity Search in Very Large Datasets with MapReduce

Trong Nhan Phan; Josef Küng; Tran Khanh Dang

The explosion of data has ushered in the era of big data and poses greater challenges than ever to traditional similarity search, which is used in a wide range of applications. Processing data at an unprecedented scale may be infeasible, or may paralyse systems through slow performance and high overheads. Coping with this unstoppable data growth motivates both the consolidation of similarity search and new kinds of data-intensive applications. Aiming at scalability, we propose an elastic approximate similarity search that works efficiently on very large datasets. Our scheme also adapts to the well-known similarity search cases: pairwise documents, pivot document, range query, and k-nearest neighbour query. Finally, these methods, together with our filtering strategies, are implemented and verified by experiments on large real-world data collections in Hadoop, which show promising effectiveness and efficiency.

International Conference on Ubiquitous Information Management and Communication | 2012

An Open Design Privacy-Enhancing Platform Supporting Location-Based Applications

Tran Khanh Dang; Chan Nam Ngo; Trong Nhan Phan; Nguyen Nhat Minh Ngo

The world of location-based services (LBS) has become increasingly diverse with its rapid growth in recent years. This development has spread to many aspects of life, driven by advances in information and communication technologies. Its pervasiveness, however, raises serious concerns that could hinder its progress. Three of them, heterogeneity, user privacy, and context-awareness, have attracted much attention and investigation in both the research and industry communities worldwide. In response, we propose an elastic, open design platform named OpenLS Privacy-aware Middleware (OPM) for location-based applications as a unified solution to these issues.

Advanced Information Networking and Applications | 2016

eHSim: An Efficient Hybrid Similarity Search with MapReduce

Trong Nhan Phan; Josef Küng; Tran Khanh Dang

In this paper, we study the problems of scalability and performance in similarity search by proposing eHSim, an efficient hybrid similarity search with MapReduce. More specifically, we introduce clustering schemes that partition objects into groups by their length. We equip these schemes with pruning strategies that quickly discard irrelevant objects before their similarity is actually computed. We also design a hybrid MapReduce architecture that addresses the challenges of big data, implement our methods with MapReduce, and make them compatible with that architecture. Finally, we evaluate the proposed methods on real datasets. Empirical experiments show that our approach is considerably more efficient than state-of-the-art methods in terms of query processing, batch processing, and data storage.
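
The idea of grouping objects by length and pruning before computing similarity can be illustrated with a small in-memory sketch. It is not the eHSim implementation: the abstract does not name the similarity measure or the pruning bounds, so the sketch assumes Jaccard similarity over token sets and uses the standard length bound (two sets cannot reach a Jaccard score above the ratio of the smaller size to the larger size). All function names are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure, not stated in the abstract)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def length_upper_bound(len_a, len_b):
    """Largest Jaccard score two sets of these sizes could possibly reach."""
    longer = max(len_a, len_b)
    if longer == 0:
        return 1.0
    return min(len_a, len_b) / longer


def group_by_length(docs):
    """Partition document ids into groups keyed by set size."""
    groups = defaultdict(list)
    for doc_id, tokens in docs.items():
        groups[len(tokens)].append(doc_id)
    return groups


def search_with_length_filter(query, docs, threshold):
    """Return ({doc_id: score}, number of similarity computations performed)."""
    results = {}
    computed = 0
    for length, doc_ids in group_by_length(docs).items():
        # Prune the whole group when its length alone rules out a match.
        if length_upper_bound(length, len(query)) < threshold:
            continue
        for doc_id in doc_ids:
            computed += 1
            score = jaccard(query, docs[doc_id])
            if score >= threshold:
                results[doc_id] = score
    return results, computed
```

Because the bound is checked once per length group, a whole group is skipped without touching any of its members, which is where the saving comes from when many objects are much shorter or longer than the query.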

Database and Expert Systems Applications | 2015

Range-Based Clustering Supporting Similarity Search in Big Data

Trong Nhan Phan; Markus Jäger; Stefan Nadschläger; Josef Küng

Thanks to state-of-the-art technologies, more and more modern infrastructures and automated processes support the agricultural domain. Data collected from parcels by these systems and by remote sensors for further analysis raise the three main challenges of the big data era: volume, variety, and velocity. For similarity search, we propose a range-based clustering method that finds the objects most similar to a given object in large-scale computing with MapReduce. The method groups objects into clusters, which serve as pivots for pre-checking before similarity is computed. Furthermore, we conduct basic experiments to evaluate the performance of the proposed method and to observe the influence of the clusters on similarity search.
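
To make the notion of pivot-based pre-checking concrete, here is a minimal sketch of one common form of it. It is an assumption, not the paper's range-based clustering: it uses a single pivot, the Jaccard distance (a metric), and the triangle inequality, which guarantees that an object whose distance to the pivot differs from the query's distance to the pivot by more than the search radius cannot lie within that radius of the query. All names are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets; satisfies the triangle inequality."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def precompute_pivot_distances(docs, pivot):
    """Distance from every object to the pivot, computed once ahead of queries."""
    return {doc_id: jaccard_distance(tokens, pivot) for doc_id, tokens in docs.items()}


def range_search_with_pivot(query, docs, pivot, pivot_dist, radius):
    """Return ({doc_id: distance}, number of objects that passed the pre-check)."""
    d_query_pivot = jaccard_distance(query, pivot)
    results = {}
    checked = 0
    for doc_id, tokens in docs.items():
        # Pre-check: |d(q, p) - d(o, p)| is a lower bound on d(q, o).
        # The small tolerance guards against floating-point rounding.
        if abs(d_query_pivot - pivot_dist[doc_id]) > radius + 1e-12:
            continue
        checked += 1
        distance = jaccard_distance(query, tokens)
        if distance <= radius:
            results[doc_id] = distance
    return results, checked
```

The pre-check costs one subtraction per object, so it pays off whenever the full distance computation is expensive relative to a lookup of the stored pivot distance.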

FDSE 2015 Proceedings of the Second International Conference on Future Data and Security Engineering - Volume 9446 | 2015

An Efficient Document Indexing-Based Similarity Search in Large Datasets

Trong Nhan Phan; Markus Jäger; Stefan Nadschläger; Josef Küng; Tran Khanh Dang

In this paper, we propose a novel MapReduce-based approach for efficient similarity search in big data. Specifically, we address the drawbacks of using an inverted index in similarity search with MapReduce and propose a simple yet efficient redundancy-free MapReduce scheme, which not only has advantages over the baseline inverted-index-based procedures but also adapts to various similarity measures and similarity searches. Additionally, we present further strategies that help eliminate unnecessary data and computations. Finally, we verify the proposed methods through extensive empirical evaluations on massive real datasets with the Hadoop framework on a cluster of commodity machines; the promising results show how beneficial the methods are when dealing with big data.
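
For context, the kind of baseline inverted-index procedure that the abstract contrasts against can be sketched as two map/reduce-style phases, simulated here in a single process. This is a generic textbook-style baseline under assumed details (Jaccard similarity, token lists as input), not the redundancy-free scheme the paper proposes; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_terms(doc_id, tokens):
    """Map step of phase 1: emit one (term, doc_id) pair per distinct term."""
    for term in set(tokens):
        yield term, doc_id


def build_inverted_index(docs):
    """Reduce step of phase 1: group document ids by term into posting lists."""
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for term, emitted_id in map_terms(doc_id, tokens):
            index[term].append(emitted_id)
    return index


def pairwise_jaccard(docs, threshold):
    """Phase 2: count shared terms per document pair, then score each pair."""
    index = build_inverted_index(docs)
    overlap = defaultdict(int)
    for postings in index.values():
        # Every pair in a posting list shares this term; this is the step
        # that produces many intermediate records for frequent terms.
        for a, b in combinations(sorted(postings), 2):
            overlap[(a, b)] += 1
    sizes = {doc_id: len(set(tokens)) for doc_id, tokens in docs.items()}
    results = {}
    for (a, b), common in overlap.items():
        score = common / (sizes[a] + sizes[b] - common)
        if score >= threshold:
            results[(a, b)] = score
    return results
```

Pairs that share no term never appear in any posting list, so they are never scored; the cost instead concentrates in long posting lists, which is one plausible source of the redundancy the abstract refers to.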

International Conference on Future Data and Security Engineering | 2014

An Efficient Similarity Search in Large Data Collections with MapReduce

Trong Nhan Phan; Josef Küng; Tran Khanh Dang

The era of big data calls for many innovations in similarity search computing. Such unstoppable growth in data threatens both the processing capacity and the performance of existing information systems. Addressing the scalability challenge, we propose an efficient similarity search in large data collections with MapReduce. In addition, we apply the proposed scheme to widespread similarity search cases including pairwise similarity, search by example, range query, and k-Nearest Neighbor query. Moreover, collaborative strategic refinements are used to eliminate unnecessary computations and speed up the whole process. Finally, our methods are evaluated experimentally, alongside a previous work, on large real datasets, and the results verify these methods.
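
The difference between a range query and a k-Nearest Neighbor query can be shown with a brute-force sketch. It assumes Jaccard similarity over token sets, which the abstract does not specify, and it makes no attempt at the MapReduce scheme or the refinements; the function names are hypothetical.

```python
import heapq


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def range_query(query, docs, threshold):
    """All documents whose similarity to the query is at least the threshold."""
    results = {}
    for doc_id, tokens in docs.items():
        score = jaccard(query, tokens)
        if score >= threshold:
            results[doc_id] = score
    return results


def knn_query(query, docs, k):
    """The k documents most similar to the query, best first."""
    scored = ((jaccard(query, tokens), doc_id) for doc_id, tokens in docs.items())
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]
```

A range query fixes the similarity threshold and lets the result size vary, while a k-NN query fixes the result size and lets the effective threshold vary; search by example is the same computation with an existing document used as the query.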

International Conference on Future Data and Security Engineering | 2016

Incorporating Trust, Certainty and Importance of Information into Knowledge Processing Systems – An Approach

Markus Jäger; Trong Nhan Phan; Christian Huber; Josef Küng

The origin of data (data provenance) should always be measured or categorized in the context of trusting the source of the data. Can we be sure that the information we receive is trustworthy and reliable? Is the source trustable? Is the data certain? And how important is the received data to our current and next step of processing? We address these questions in the context of knowledge processing systems by developing a convenient approach that brings these questions and values (trustability, certainty, importance) into a computable, measurable, and comparable form of expression. We do not yet address the question “How to compute trust or certainty?”, but rather how to incorporate and process their measured values in knowledge processing systems to obtain a representative view of the whole environment and its output.

Database and Expert Systems Applications | 2015

Data, Information & Knowledge Sources in the Agricultural Domain

Markus Jäger; Stefan Nadschläger; Trong Nhan Phan; Josef Küng

We take a first step towards merging sources in the agricultural domain with experts and methods from the IT sector. The result should help people in this domain profit from a better and more productive use of existing experience by sharing it and making it more easily accessible. After briefly defining several knowledge-related terms, we present existing and potentially useful standards for sources in the agricultural domain. Based on these standards, we give a short overview of existing sources and present a way to automatically extract information and knowledge from selected sources. Finally, we show how some of these sources are used in our current research work.

Transactions on Large-Scale Data- and Knowledge-Centered Systems XXIII - Volume 9480 | 2015

An Adaptive Similarity Search in Massive Datasets

Trong Nhan Phan; Josef Küng; Tran Khanh Dang

Similarity search is an important task in many fields of study and application domains. The era of big data, however, poses challenges to existing information systems in general and to similarity search in particular. Aiming at large-scale data processing, we propose an adaptive similarity search in massive datasets with MapReduce. Our scheme is both applicable and adaptable to popular similarity search cases such as pairwise similarity, search-by-example, range queries, and k-Nearest Neighbour queries. Moreover, we embed our collaborative refinements to minimize irrelevant data objects and unnecessary computations. Furthermore, we test our proposed methods with two different document models, shingles and terms. Finally, we conduct extensive empirical experiments on real datasets, not only to verify these methods but also to compare them with a previous related work. The results confirm the effectiveness of our proposed methods and show that they outperform the previous work in terms of query processing.
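
The two document models mentioned above, terms and shingles, can be contrasted with a short sketch. The abstract does not say whether shingles are built from words or characters, nor which shingle size is used, so the sketch assumes word-level shingles with a hypothetical default of three words; the function names are also hypothetical.

```python
def term_set(text):
    """Term model: the set of distinct lowercase words, ignoring order."""
    return set(text.lower().split())


def shingle_set(text, k=3):
    """Shingle model: the set of all runs of k consecutive words (assumed word-level)."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        # A text shorter than k words forms a single, shorter shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

For example, "big data similarity search" and "search similarity data big" have identical term sets but no three-word shingle in common, so the term model scores them as identical while the shingle model scores them as unrelated. The same similarity measure therefore behaves quite differently depending on the document model.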

Archive | 2018

An Efficient Batch Similarity Processing with MapReduce

Trong Nhan Phan; Tran Khanh Dang

In this paper, we study an efficient method for batch similarity processing with MapReduce. With the inverted index as a backbone, we embed metadata inside the indexes to minimize redundant data and so build lightweight indexes from the data sources. In addition, we propose a general query batch processing scheme that handles not only a single query but also sets of queries in an incremental manner. Moreover, we build the indexes in an ordered fashion so that we can prune quickly, discarding unnecessary objects and improving the performance of similarity search. Finally, we evaluate our proposed solution through empirical experiments on real datasets. The results verify the efficiency of our method for similarity search with query batches, especially when both the query sets and the datasets are large.
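
A rough single-process sketch of the ingredients named above (metadata embedded in the index, ordered postings that allow early pruning, and a batch of queries answered against one index) might look as follows. The concrete choices are assumptions, not the paper's design: the embedded metadata is the document's set size, the ordering is by that size, the measure is Jaccard similarity, and the incremental aspect of the scheme is not modelled. All names are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Inverted index whose postings carry the document size and are sorted by it."""
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        terms = set(tokens)
        for term in terms:
            index[term].append((len(terms), doc_id))
    for postings in index.values():
        postings.sort()
    return index


def process_batch(index, queries, threshold):
    """Answer every query in the batch; returns {query_id: {doc_id: score}}."""
    answers = {}
    for query_id, tokens in queries.items():
        q = set(tokens)
        overlap = defaultdict(int)
        lengths = {}
        for term in q:
            for length, doc_id in index.get(term, ()):
                # Postings are sorted by size, so once a document is too long
                # to reach the threshold, every later one is too long as well.
                if length > len(q) and len(q) / length < threshold:
                    break
                # Too short to reach the threshold; longer ones may still qualify.
                if length < len(q) and length / len(q) < threshold:
                    continue
                overlap[doc_id] += 1
                lengths[doc_id] = length
        hits = {}
        for doc_id, common in overlap.items():
            # The size stored in the index is enough to score the pair,
            # so the original documents are not read again.
            score = common / (len(q) + lengths[doc_id] - common)
            if score >= threshold:
                hits[doc_id] = score
        answers[query_id] = hits
    return answers
```

The index is built once and reused for every query in the batch, and scoring relies only on what the postings carry, which reflects the lightweight-index idea in the abstract under the assumptions stated above.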
