Vo Thi Ngoc Chau

Ho Chi Minh City University of Technology

Publication

Featured research published by Vo Thi Ngoc Chau.

Applied Intelligence | 2017

Fuzzy C-means for English sentiment classification in a distributed system

Vo Ngoc Phu; Nguyen Duy Dat; Vo Thi Ngoc Tran; Vo Thi Ngoc Chau; Tuan A. Nguyen

Sentiment classification plays a significant role in everyday life, in political activities, in activities relating to commodity production, and commercial activities. Finding a solution for the accurate and timely classification of emotions is a challenging task. In this research, we propose a new model for big data sentiment classification in the parallel network environment. Our proposed model uses the Fuzzy C-Means (FCM) method for English sentiment classification with Hadoop Map (M)/Reduce (R) in Cloudera. Cloudera is a parallel network environment. Our proposed model can classify the sentiments of millions of English documents in the parallel network environment. We tested our model using the testing data set (which comprised 25,000 English reviews, 12,500 being positive and 12,500 negative) and achieved 60.2% accuracy. Our English training data set has 60,000 English sentences, comprising 30,000 positive English sentences and 30,000 negative English sentences.
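
The abstract above names Fuzzy C-Means but does not give its update rules. The following is a minimal sketch of the two standard FCM steps (membership update and center update), assuming documents have already been turned into numeric feature vectors. The function names, the fuzzifier default m=2.0, and the list-based vectors are illustrative assumptions, not the authors' Hadoop implementation.

```python
def fcm_memberships(point, centers, m=2.0):
    """Return the fuzzy membership of `point` in each cluster center."""
    # Euclidean distance from the point to every center.
    dists = [sum((p - c) ** 2 for p, c in zip(point, ctr)) ** 0.5 for ctr in centers]
    # A point that coincides with one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    # Standard FCM rule: u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)).
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


def fcm_update_centers(points, memberships, m=2.0):
    """Recompute each center as the mean of the points weighted by u ** m."""
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centers = []
    for j in range(n_clusters):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)  # assumed non-zero: some point has weight in cluster j
        centers.append(
            [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
        )
    return centers
```

Alternating these two steps until the centers stop moving gives the usual FCM loop. With two clusters, one could read the memberships as soft positive and negative scores, though how the paper maps clusters to sentiment labels is not stated here.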

International Journal of Pattern Recognition and Artificial Intelligence | 2017

STING Algorithm Used English Sentiment Classification in a Parallel Environment

Nguyen Duy Dat; Vo Ngoc Phu; Vo Thi Ngoc Tran; Vo Thi Ngoc Chau; Tuan A. Nguyen

Sentiment classification is significant in everyday life, in political activities, in commodity production, and in commercial activities. In this research, we propose a new model for big data sentiment classification in a parallel network environment. Our model uses the STING algorithm (SA), from the data mining field, for English document-level sentiment classification with Hadoop Map (M)/Reduce (R), based on the 90,000 English sentences of the training data set, in a Cloudera parallel network environment, which is a distributed system. There is no other scientific study similar to this work. Our model can classify the sentiment of millions of English documents with the shortest execution time in the parallel network environment. We tested our model on the 25,000 English documents of the testing data set and achieved 61.2% accuracy. Our English training data set includes 45,000 positive English sentences and 45,000 negative English sentences.

Artificial Intelligence Review | 2018

A Vietnamese adjective emotion dictionary based on exploitation of Vietnamese language characteristics

Vo Ngoc Phu; Vo Thi Ngoc Chau; Vo Thi Ngoc Tran; Nguyen Duy Dat

Emotion classification is used in many commercial and research applications. Semantic classification models (or sentiment classification methods) based on the vocabulary of an emotion dictionary have been studied and remain widely used to this day. In this study, a Vietnamese sentiment dictionary is built that includes Vietnamese terms (nouns, verbs, adjectives, etc.) whose valences (and polarities) are calculated using the Ochiai measure through the Google search engine, together with many Vietnamese adjective phrases whose valences (and polarities) are identified based on Vietnamese language characteristics. Vietnamese adjectives often bear emotion whose values (or semantic scores) are not fixed but change when they appear in the different contexts of these phrases. Therefore, if the semantic values (or sentiment scores) of sentiment-bearing Vietnamese adjectives are kept unchanged in every context, the accuracy of emotion classification is not high. We propose many rules based on Vietnamese language characteristics to determine the emotional values of sentiment-bearing Vietnamese adjective phrases in specific contexts. Our Vietnamese sentiment adjective dictionary is widely used in applications and research on Vietnamese semantic classification.
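
The Ochiai measure mentioned above can be computed directly from co-occurrence counts. The sketch below assumes search-engine hit counts are already available as plain integers. Scoring a term as its similarity to a positive seed minus its similarity to a negative seed is an illustrative assumption, since the abstract does not spell out the exact valence formula, and the function and parameter names are hypothetical.

```python
import math


def ochiai(n_a, n_b, n_ab):
    """Ochiai measure: n(a AND b) / sqrt(n(a) * n(b)), from raw hit counts."""
    if n_a <= 0 or n_b <= 0:
        return 0.0
    return n_ab / math.sqrt(n_a * n_b)


def valence(term_hits, pos_hits, neg_hits, with_pos, with_neg):
    """Assumed valence: similarity to a positive seed minus similarity to a negative seed."""
    return ochiai(term_hits, pos_hits, with_pos) - ochiai(term_hits, neg_hits, with_neg)
```

Under this assumption, a positive result would mark the term as positive, a negative result as negative, and the magnitude would serve as its valence.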

Knowledge and Information Systems | 2017

A valences-totaling model for English sentiment classification

Vo Ngoc Phu; Vo Thi Ngoc Chau; Nguyen Duy Dat; Vo Thi Ngoc Tran; Tuan A. Nguyen

Sentiment classification plays an important role in everyday life, in political activities, activities of commodity production and commercial activities. Finding a time-effective and highly accurate solution to the classification of emotions is challenging. Today, there are many models (or methods) to classify the sentiment of documents. Sentiment classification has been studied for many years and is used widely in many different fields. We propose a new model, which is called the valences-totaling model (VTM), by using cosine measure (CM) to classify the sentiment of English documents. VTM is a new model for English sentiment classification. In this study, CM is a measure of similarity between two words and is used to calculate the valence (and polarity) of English semantic lexicons. We prove that CM is able to identify the sentiment valence and the sentiment polarity of the English sentiment lexicons online in combination with the Google search engine with AND operator and OR operator. VTM uses many English semantic lexicons. These English sentiment lexicons are calculated online and are based on the Internet. We present a full range of English sentences; thus, the emotion expressed in the English text is classified with more precision. Our new model is not dependent on a special domain and training data set—it is a domain-independent classifier. We test our new model on the Internet data in English. The calculated valence (and polarity) of English semantic words in this model is based on many documents on millions of English Web sites and English social networks.
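
The totaling step implied by the model's name can be sketched as follows, assuming a lexicon of per-word valences has already been computed. The toy lexicon values, the tokenizer, and the sign-based decision rule are all illustrative assumptions rather than details taken from the paper.

```python
import re

# Hypothetical valences; in the paper these come from the cosine measure
# applied to search-engine results, not from a hand-written table.
TOY_LEXICON = {"good": 0.8, "great": 0.9, "bad": -0.7, "awful": -0.9}


def total_valence(text, lexicon):
    """Sum the valences of all known words in the text (unknown words count as 0)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(w, 0.0) for w in words)


def classify(text, lexicon):
    """Label a document by the sign of its total valence."""
    score = total_valence(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Because the decision depends only on the lexicon, a classifier of this shape needs no labeled training documents, which matches the abstract's description of a domain-independent classifier.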

International Journal of Speech Technology | 2017

SVM for English semantic classification in parallel environment

Vo Ngoc Phu; Vo Thi Ngoc Chau; Vo Thi Ngoc Tran

Semantic analysis has long been important and helpful for many research efforts and applications. SVM is a well-known algorithm used in research and applications across many different fields. In this study, we propose a new model using an SVM algorithm with Hadoop Map (M)/Reduce (R) for English document-level emotional classification in the Cloudera parallel network environment. Cloudera is also a distributed system. Our English testing data set has 25,000 English documents, including 12,500 positive reviews and 12,500 negative reviews. Our English training data set has 90,000 English sentences, including 45,000 positive sentences and 45,000 negative sentences. Tested on the English testing data set, our model achieves 63.7% sentiment classification accuracy.

International Journal of Speech Technology | 2017

A decision tree using ID3 algorithm for English semantic analysis

Vo Ngoc Phu; Vo Thi Ngoc Tran; Vo Thi Ngoc Chau; Nguyen Duy Dat; Khanh Ly Doan Duy

Natural language processing has been studied for many years and has been applied in many research and commercial applications. In this paper, we propose a new model for English document-level emotional classification that uses the ID3 decision tree algorithm to classify the semantics (positive, negative, or neutral) of English documents. The semantic classification of our model is based on many rules generated by applying the ID3 algorithm to the 115,000 English sentences of our English training data set. We test our model on an English testing data set of 25,000 English documents and achieve 63.6% sentiment classification accuracy.
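
ID3 grows its tree by repeatedly splitting on the attribute with the highest information gain. The sketch below shows that selection step only, assuming each sentence is represented as a dictionary of discrete features. The feature names used in the usage note are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """The attribute ID3 would split on at this node."""
    return max(features, key=lambda f: information_gain(rows, labels, f))
```

For example, with rows such as {"has_good": True, "is_long": False} and labels "pos" or "neg", best_feature returns the feature whose values separate the labels most cleanly. Each root-to-leaf path of the finished tree then reads as one classification rule.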

International Journal of Speech Technology | 2017

Shifting semantic values of English phrases for classification

Vo Ngoc Phu; Vo Thi Ngoc Chau; Vo Thi Ngoc Tran

Research on semantics (positive, negative, neutral) has been carried out for a long time and is very important for many commercial applications and scientific works. In this paper, we propose a new model to calculate the emotional values (or semantic scores) of English terms (verbs, nouns, adjectives, adverbs, etc.) as follows. First, we create our basis English emotional dictionary (called bEED) by using the Sorensen measure (Sorensen coefficient, called SM) through the Google search engine with the AND and OR operators. Second, many English adjective phrases, adverb phrases, and verb phrases are created based on English grammar (the characteristics of English) by combining English adverbs of degree with English adjectives, adverbs, and verbs. Finally, the valences of the English adverb phrases are identified from their specific contexts. English phrases often carry semantics whose values (or emotional scores) are not fixed but change when they appear in different contexts. Therefore, the accuracy of sentiment classification is not high if the semantic values (or sentiment scores) of emotion-bearing English phrases are kept unchanged in every context. For those reasons, we propose many rules based on English grammar to calculate the sentimental values of emotion-bearing English phrases in their specific contexts. The results of this work are widely used in applications and research on English semantic classification.
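
The two ingredients described above, a Sorensen score for base terms and a context-dependent shift for phrases, can be sketched as follows. The Sorensen coefficient itself is standard. The multiplicative degree weights and the rule of applying modifiers from the innermost outward are hypothetical stand-ins for the paper's grammar-based rules, which the abstract does not list.

```python
def sorensen(n_a, n_b, n_ab):
    """Sorensen coefficient: 2 * n(a AND b) / (n(a) + n(b)), from raw hit counts."""
    if n_a + n_b == 0:
        return 0.0
    return 2.0 * n_ab / (n_a + n_b)


# Hypothetical multipliers for adverbs of degree (illustrative values only).
DEGREE_WEIGHTS = {"extremely": 2.0, "very": 1.5, "slightly": 0.5}


def phrase_valence(phrase, base_lexicon):
    """Scale the head word's base valence by each preceding degree adverb."""
    words = phrase.lower().split()
    if not words:
        return 0.0
    *modifiers, head = words
    score = base_lexicon.get(head, 0.0)
    for mod in reversed(modifiers):  # apply the modifier nearest the head first
        score *= DEGREE_WEIGHTS.get(mod, 1.0)
    return score
```

With a base valence of 0.5 for "good", this gives 0.75 for "very good" and 0.25 for "slightly good", so the same adjective contributes different scores in different phrases.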

2016 IEEE RIVF International Conference on Computing & Communication Technologies, Research, Innovation, and Vision for the Future (RIVF) | 2016

Automatic de-identification of medical records with a multilevel hybrid semi-supervised learning approach

Nguyen Dong Phuong; Vo Thi Ngoc Chau

In recent years, sharing electronic medical records (EMRs) with more researchers outside the associated institutions has become significant. To preserve the privacy of the corresponding patients and the associated institutions, de-identification of the EMRs to be shared is a must. Although the de-identification task has been studied with positive research outcomes worldwide, especially in the i2b2 (Informatics for Integrating Biology and the Bedside) shared tasks in 2006 and 2014, it is not yet a solved problem and still needs more investigation in realistic settings. In this paper, we propose an automatic de-identification solution in a multilevel hybrid semi-supervised learning paradigm with a key focus on correctly identifying protected health information (PHI) in the EMRs. Similar to existing works, our work defines a hybrid approach by combining a machine learning-based method using a conditional random fields model with a rule-based method in a post-processing phase to handle the ambiguous PHI types. Nevertheless, our work is more general and practical. First, it considers the structure complexity of each EMR so that each section can be treated properly, for more correct PHI identification, according to its structure complexity: structured, semi-structured, or unstructured. Second, each EMR is examined at three different levels of granularity: a token level in the supervised learning phase, an entity level in the rule-based post-processing phase, and a section level, along with the structure complexity, in the semi-supervised learning phase. These multiple levels of detail give our approach a deeper look at each EMR for more effectiveness. Third, our solution is conducted in a self-training manner so that it can get started with a small annotated data set in practice and become more effective with new EMRs over time. Evaluated on the i2b2 data set in comparison with related works, our solution is effective, with better F-measure values for the AGE, LOCATION, and PHONE PHI types and comparable values for the other PHI types.

National Foundation for Science and Technology Development Conference on Information and Computer Science | 2015

A robust and effective algorithmic framework for incomplete educational data clustering

Vo Thi Ngoc Chau; Nguyen Hua Phung; Vo Thi Ngoc Tran

Data clustering is one of the popular tasks recently used in educational data mining for grouping similar students by aspects such as study performance, behavior, and skill. Many well-known clustering algorithms, such as k-means, expectation-maximization, and spectral clustering, were employed in related works. None of them takes into consideration the incompleteness of the educational data gathered in an academic credit system. If just a few records have missing values, we might ignore them in the mining task. However, when there are a large number of missing values, ignoring them may lead to data insufficiency and an ineffective mining task. Hence, we define a robust and effective algorithmic framework for incomplete educational data clustering using the nearest prototype strategy. Within the framework, we propose two novel incomplete educational data clustering algorithms, K_nps and S_nps, based on the k-means algorithm and the self-organizing map, respectively. Experimental results show that the clusters from our proposed algorithms have better cluster quality than those from the existing approaches.
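
One common way to realize a nearest prototype strategy on incomplete records is to measure distance over the observed attributes only and then fill the gaps from the closest prototype. The sketch below takes that route as an assumption. It is not the K_nps or S_nps algorithm from the paper, whose details are not given in the abstract, and the function names and the use of None for missing grades are illustrative.

```python
def partial_distance(record, prototype):
    """Squared Euclidean distance over observed attributes, rescaled to full length."""
    observed = [(x, p) for x, p in zip(record, prototype) if x is not None]
    if not observed:
        return float("inf")  # nothing observed: no basis for comparison
    squared = sum((x - p) ** 2 for x, p in observed)
    return squared * len(record) / len(observed)


def nearest_prototype(record, prototypes):
    """Index of the prototype closest to the record under the partial distance."""
    return min(range(len(prototypes)), key=lambda j: partial_distance(record, prototypes[j]))


def impute(record, prototype):
    """Fill each missing value with the corresponding prototype value."""
    return [p if x is None else x for x, p in zip(record, prototype)]
```

In a k-means-style loop, the prototypes would be the current cluster centers, and records imputed this way can take part in the next center update instead of being discarded.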

Knowledge and Systems Engineering | 2014

Frequent Temporal Inter-object Pattern Mining in Time Series

Nguyen Thanh Vu; Vo Thi Ngoc Chau

Nowadays, time series are present in many domains such as finance, medicine, geology, and meteorology. Mining time series for useful hidden knowledge is very significant in those domains, helping users gain insights into important temporal relationships of objects/phenomena over time. Hence, in this paper, we introduce the notion of a frequent temporal inter-object pattern and accordingly propose two frequent temporal pattern mining algorithms on a set of different time series. Compared to frequent sequential patterns, frequent temporal inter-object patterns are more informative, with explicit and exact temporal information automatically discovered from many different time series. The two proposed algorithms, one brute-force and one tree-based, are defined in a level-wise bottom-up approach to deal with the combinatorial explosion problem. As shown in experiments on real financial time series, our work can be further used to efficiently enhance the temporal rule mining process on time series.
