Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Hoang-Quoc Nguyen-Son is active.

Publication


Featured research published by Hoang-Quoc Nguyen-Son.


availability, reliability and security | 2012

Automatic Anonymization of Natural Languages Texts Posted on Social Networking Services and Automatic Detection of Disclosure

Hoang-Quoc Nguyen-Son; Quoc-Binh Nguyen; Minh-Triet Tran; Dinh-Thuc Nguyen; Hiroshi Yoshiura; Isao Echizen

One approach to overcoming the problem of too much information about a user being disclosed on social networking services (by the user or by the user's friends) through natural language texts (blogs, comments, status updates, etc.) is to anonymize the texts. However, determining which information is sensitive and should thus be anonymized is a challenging problem. Sensitive information is any information about a user that could be used to identify the user. We have developed an algorithm that anonymizes sensitive information in text to be posted by generalization. Synonyms for the anonymized information are used as fingerprints for detecting a discloser of the information. The fingerprints are quantified using a modified discernability metric so that an appropriate level of anonymity is used for each group of the user's friends. A fingerprint cannot be converted into another one, so it cannot be used to incorrectly identify a person as having revealed sensitive information. Use of the algorithm to control the disclosure of information on Facebook demonstrated that it works well not only for social networking but also for other areas that store sensitive information (health, religion, politics, military, etc.).
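
The core pipeline, generalizing a sensitive phrase and then handing each friend group a distinct synonym of the generalized phrase as a fingerprint, can be sketched in a few lines. The hierarchy, synonym table, and example phrase below are illustrative assumptions, not the paper's actual resources:

```python
# Toy sketch of generalization plus synonym fingerprinting.
# The hierarchy and synonym table are illustrative assumptions.

# Generalization hierarchy: each phrase maps to a broader parent.
HIERARCHY = {
    "Hanoi": "a city in Vietnam",
    "a city in Vietnam": "a city in Asia",
}

# Synonyms of a generalized phrase serve as per-group fingerprints.
SYNONYMS = {
    "a city in Vietnam": ["a Vietnamese city", "a city of Vietnam"],
}

def generalize(phrase: str, levels: int = 1) -> str:
    """Climb the hierarchy `levels` steps to anonymize a phrase."""
    for _ in range(levels):
        phrase = HIERARCHY.get(phrase, phrase)
    return phrase

def fingerprint(text: str, phrase: str, group_id: int) -> str:
    """Replace a sensitive phrase with a group-specific synonym of its
    generalization, so a leaked copy points back to the group."""
    general = generalize(phrase)
    variants = [general] + SYNONYMS.get(general, [])
    return text.replace(phrase, variants[group_id % len(variants)])

if __name__ == "__main__":
    post = "I am flying to Hanoi tomorrow."
    for gid in range(3):
        print(gid, fingerprint(post, "Hanoi", gid))
```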


international conference on digital forensics | 2012

Automatic anonymous fingerprinting of text posted on social networking services

Hoang-Quoc Nguyen-Son; Minh-Triet Tran; Dung Tran Tien; Hiroshi Yoshiura; Noboru Sonehara; Isao Echizen

Social networking services (SNSs) support communication among people via the Internet. However, sensitive information about a user can be disclosed by the user's SNS friends. This makes it unsafe for a user to share information with friends in different groups. Moreover, a friend who has disclosed a user's information is difficult to identify. One approach to overcoming this problem is to anonymize the sensitive information in text to be posted by generalization, but most methods proposed for this approach are for information in a database. Another approach is to create different fingerprints for certain sensitive information by using various synonyms. However, the methods proposed for doing this do not anonymize the information. We have developed an algorithm for automatically creating enough anonymous fingerprints to cover most cases of SNS messages containing sensitive phrases. The fingerprints are created using both generalization and synonymization. A different fingerprinted version of the sensitive information is created for each friend that will receive the posted text. The fingerprints not only anonymize a user's sensitive information but can also be used to identify a person who has disclosed sensitive information about the user. Fingerprints are quantified using a modified discernability metric to ensure that an appropriate level of privacy is used for each group receiving the posted text. Moreover, a fingerprint cannot be converted by an attacker into one that causes the algorithm to incorrectly identify a person who has revealed sensitive information. The algorithm was demonstrated by using it in an application for controlling the disclosure of information on Facebook.
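
The abstract does not define the modification to the discernability metric, so the sketch below shows the classic discernability metric (the sum of squared equivalence-class sizes), assuming each record is a single phrase; a lower score means the distributed copies stay more distinguishable:

```python
from collections import Counter

def discernability(values: list[str]) -> int:
    """Classic discernability metric: each record is charged the size of
    its equivalence class, giving the sum of squared class sizes.
    Lower scores mean records stay more distinguishable."""
    counts = Counter(values)
    return sum(n * n for n in counts.values())

# More aggressive generalization merges classes and raises the cost.
raw = ["Hanoi", "Hue", "Hanoi", "Da Nang"]
generalized = ["a city in Vietnam"] * 4
print(discernability(raw))          # 2^2 + 1 + 1 = 6
print(discernability(generalized))  # 4^2 = 16
```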


international workshop on digital watermarking | 2015

Discriminating Between Computer-Generated Facial Images and Natural Ones Using Smoothness Property and Local Entropy

Huy H. Nguyen; Hoang-Quoc Nguyen-Son; Thuc Dinh Nguyen; Isao Echizen

Discriminating between computer-generated images and natural ones is a crucial problem in digital image forensics, and facial images are a special case of this problem. Advances in technology have made it possible for computers to generate realistic multimedia content that is very difficult to distinguish from non-computer-generated content. This could enable undesired applications such as face spoofing to bypass authentication systems and distribution of harmful unreal images or videos on social media. We have created a method for identifying computer-generated facial images that works effectively for both frontal and angled images and can also be applied to extracted video frames. The method is based on the smoothness property of faces, represented by edges, and on the characteristics of human skin, captured via local entropy. Experiments demonstrated that the proposed method outperforms state-of-the-art approaches.
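
Local entropy, one of the two cues the method relies on, is straightforward to compute per patch. This is a minimal sketch assuming 8x8 patches and a 32-bin intensity histogram, not the authors' exact feature extraction:

```python
import numpy as np

def local_entropy(gray: np.ndarray, patch: int = 8) -> np.ndarray:
    """Shannon entropy of the intensity histogram in each patch.
    Smooth, computer-rendered skin tends to yield lower entropy
    than natural skin texture."""
    h, w = gray.shape
    out = np.zeros((h // patch, w // patch))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            block = gray[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            hist, _ = np.histogram(block, bins=32, range=(0, 256))
            p = hist / hist.sum()
            p = p[p > 0]
            out[i, j] = -(p * np.log2(p)).sum()
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.integers(0, 256, (64, 64)).astype(float)  # texture-like
    flat = np.full((64, 64), 128.0)                       # render-like
    print(local_entropy(noisy).mean(), local_entropy(flat).mean())
```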


international workshop on digital watermarking | 2013

Anonymizing Temporal Phrases in Natural Language Text to be Posted on Social Networking Services

Hoang-Quoc Nguyen-Son; Anh-Tu Hoang; Minh-Triet Tran; Hiroshi Yoshiura; Noboru Sonehara; Isao Echizen

Time-related information in text posted online is one type of personal information targeted by attackers and one reason that sharing information online can be risky. Therefore, time information should be anonymized before it is posted on social networking services. One approach to anonymizing information is to replace sensitive phrases with anonymous phrases, but attackers can usually spot such anonymization due to its unnaturalness. Another approach is to detect temporal passages in the text, but removal of these passages can make the meaning of the text unnatural. We have developed an algorithm that anonymizes time-related personal information by removing temporal passages when doing so will not change the natural meaning of the message. The temporal phrases are detected using machine-learned patterns, each represented by a subtree of the sentence parse tree. The temporal phrases in the parse tree are distinguished from other parts of the tree by temporal taggers integrated into the algorithm. In an experiment with 4008 sentences posted on a social network, 84.53% of them were anonymized without changing their intended meaning, significantly better than the 72.88% rate of the best previous temporal phrase detection algorithm. The top ten most common learned patterns detected 87.78% of the temporal phrases, meaning that only a few of the most common patterns suffice to anonymize the temporal phrases in most messages to be posted on an SNS. The algorithm works well not only for temporal phrases in text posted on social networks but also for other types of phrases (such as location and objective ones), other areas (religion, politics, military, etc.), and other languages.
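
A toy stand-in illustrates the removal step: the paper detects temporal phrases with machine-learned parse-subtree patterns and integrated temporal taggers, while the sketch below substitutes a few hand-written regular expressions (an assumption, purely for illustration) and tidies the sentence after removal:

```python
import re

# Toy stand-in for a temporal tagger: the paper learns subtree patterns
# over a sentence parse; here a few regexes flag common temporal phrases.
TEMPORAL = re.compile(
    r"\b(?:tomorrow|today|tonight|yesterday|next (?:week|month|year)"
    r"|at \d{1,2}(?::\d{2})?\s?(?:am|pm)"
    r"|on (?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day)\b",
    re.IGNORECASE,
)

def anonymize_temporal(sentence: str) -> str:
    """Remove temporal phrases, then tidy leftover whitespace so the
    remaining sentence still reads naturally."""
    stripped = TEMPORAL.sub("", sentence)
    stripped = re.sub(r"\s{2,}", " ", stripped)
    return re.sub(r"\s+([.,!?])", r"\1", stripped).strip()

print(anonymize_temporal("I will see the doctor at 9 am tomorrow."))
# -> "I will see the doctor."
```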


Archive | 2012

New Approach to Anonymity of User Information on Social Networking Services

Hoang-Quoc Nguyen-Son; Quoc-Binh Nguyen; Minh-Triet Tran; Dinh-Thuc Nguyen; Hiroshi Yoshiura; Isao Echizen

Users often share the same text with friends in groups at different privacy levels on social networking services (SNSs), and a person who has revealed the text cannot be identified. Some approaches overcome this problem by anonymizing the text, but most methods for doing so have focused on databases, whereas information about a user on an SNS is generally conveyed in sensitive phrases. Therefore, we developed an algorithm for automatically generalizing sensitive phrases. The generalized phrases are quantified by a precision metric to ensure that an appropriate level of privacy is used for each group. The algorithm then automatically creates synonyms for the generalized phrases for use in detecting disclosure. An application using the algorithm was implemented for controlling the posting of information on Facebook.
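
The precision metric can be sketched along the lines of Sweeney's Prec measure: the more levels a phrase is generalized within its hierarchy, the more information is lost. The hierarchy depths below are assumptions, and the paper's exact formulation may differ:

```python
# Sketch of a precision-style metric: 1 means no generalization,
# 0 means every phrase was pushed to the top of its hierarchy.

def precision(levels_used: list[int], hierarchy_depths: list[int]) -> float:
    """Average retained specificity across the generalized phrases."""
    loss = sum(l / d for l, d in zip(levels_used, hierarchy_depths))
    return 1.0 - loss / len(levels_used)

# Two phrases with 3-level hierarchies, generalized 1 and 2 steps.
print(precision([1, 2], [3, 3]))  # 0.5
```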


availability, reliability and security | 2018

Modular Convolutional Neural Network for Discriminating between Computer-Generated Images and Photographic Images

Huy H. Nguyen; Ngoc-Dung T. Tieu; Hoang-Quoc Nguyen-Son; Vincent Nozick; Junichi Yamagishi; Isao Echizen

Discriminating between computer-generated images (CGIs) and photographic images (PIs) is not a new problem in digital image forensics. However, with advances in rendering techniques supported by strong hardware and in generative adversarial networks, CGIs are becoming indistinguishable from PIs in both human and computer perception. This means that malicious actors can use CGIs for spoofing facial authentication systems, impersonating other people, and creating fake news to be spread on social networks. The methods developed for discriminating between CGIs and PIs quickly become outdated and must be regularly enhanced to reduce these attack surfaces. Leveraging recent advances in deep convolutional networks, we have built a modular CGI-PI discriminator with a customized VGG-19 network as the feature extractor, statistical convolutional neural networks as the feature transformers, and a discriminator. We also devised a probabilistic patch aggregation strategy to deal with high-resolution images. The proposed method outperformed a state-of-the-art method and achieved accuracy up to 100%.
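
A probabilistic patch aggregation strategy can be sketched as follows: score each patch independently, then fuse the per-patch probabilities in log-odds space so that a few confident patches can dominate. The fusion rule and the stand-in patch classifier are assumptions, not the paper's implementation:

```python
import numpy as np

def patch_probs(image: np.ndarray, patch: int, classify) -> np.ndarray:
    """Score every non-overlapping patch with a per-patch classifier
    returning P(computer-generated)."""
    h, w = image.shape[:2]
    return np.array([
        classify(image[i:i+patch, j:j+patch])
        for i in range(0, h - patch + 1, patch)
        for j in range(0, w - patch + 1, patch)
    ])

def aggregate(probs: np.ndarray) -> float:
    """Combine per-patch log-odds instead of majority voting, so a few
    confident patches can dominate the image-level decision."""
    eps = 1e-6
    logit = np.log((probs + eps) / (1 - probs + eps)).mean()
    return 1 / (1 + np.exp(-logit))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((256, 256))
    dummy = lambda p: float(p.mean())  # stand-in patch classifier
    print(aggregate(patch_probs(img, 64, dummy)))
```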


Archive | 2018

Vietnamese Paraphrase Identification Using Matching Duplicate Phrases and Similar Words

Hoang-Quoc Nguyen-Son; Nam-Phong Tran; Ngoc-Vien Pham; Minh-Triet Tran; Isao Echizen

Paraphrase identification is a core component of many significant tasks in natural language processing (e.g., text summarization and headline generation). Bach et al. suggested a method for detecting paraphrased Vietnamese text using nine similarity metrics and state that it is the first such method for Vietnamese. They evaluated it on the vnPara corpus of 3000 sentence pairs; however, this corpus is limited because it was collected from only a few Vietnamese websites. Most other methods have focused on English text. For instance, our previous method detected paraphrased sentences by matching identical phrases and close words using WordNet similarity. That method is unsuitable for Vietnamese because of the limited WordNet resources for Vietnamese and the language's morphology. Therefore, we extend the method by proposing a SimVN metric that measures the similarity of two Vietnamese words. We evaluated the proposed method on the vnPara corpus; the results show that it achieves better accuracy (97.78%) than the state-of-the-art method (89.10%). We then used the proposed method to create a highly diverse paraphrase corpus of 3134 sentence pairs covering eight main topics from the fifteen most popular Vietnamese news websites.
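
A minimal sketch of the overall scoring idea: count duplicate words across the two sentences, then soften the remainder with a word-level similarity. The character-overlap stand-in for SimVN below is an assumption; the actual metric is defined in the paper:

```python
# Toy paraphrase score: exact word overlap plus a word-level
# similarity for the leftovers.

def sim_word(a: str, b: str) -> float:
    """Hypothetical SimVN stand-in: Dice coefficient over characters."""
    sa, sb = set(a.lower()), set(b.lower())
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if sa or sb else 0.0

def paraphrase_score(s1: str, s2: str) -> float:
    w1, w2 = s1.lower().split(), s2.lower().split()
    dup = [w for w in w1 if w in w2]              # duplicate words
    rest1 = [w for w in w1 if w not in dup]
    rest2 = [w for w in w2 if w not in dup]
    sims = ([max(sim_word(a, b) for b in rest2) for a in rest1]
            if rest1 and rest2 else [])
    matched = len(dup) + sum(sims)
    return 2 * matched / (len(w1) + len(w2))

print(paraphrase_score("toi di hoc hom nay", "hom nay toi den truong"))
```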


International Conference of the Pacific Association for Computational Linguistics | 2017

Detecting Computer-Generated Text Using Fluency and Noise Features

Hoang-Quoc Nguyen-Son; Isao Echizen

Computer-generated text plays a pivotal role in various applications, but the quality of the generated text is much lower than that of human-generated text. The use of artificially generated "machine text" can thus negatively affect such practical applications as website generation and text corpora collection. A method for distinguishing computer- and human-generated text is thus needed. Previous methods extract fluency features from a limited internal corpus and use them to identify generated text. We have extended this approach to also estimate fluency using an enormous external corpus. We have also developed a method for extracting and distinguishing the noise characteristically created by a person or a machine. For example, people frequently use spoken noise words (2morrow, wanna, etc.) and misspelled ones (comin, hapy, etc.), while machines frequently generate incorrect expressions (such as untranslated phrases). A method combining these fluency and noise features was evaluated using 1000 original English messages and 1000 artificial English ones translated from Spanish. The results show that this combined method had higher accuracy (80.35%) and a lower equal error rate (19.44%) than a state-of-the-art method that uses a syntactic parser. Moreover, experiments using texts in other languages produced similar results, demonstrating that the proposed method works consistently across languages.
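
The noise features can be sketched as simple per-message rates. The tiny word lists and the out-of-vocabulary proxy for untranslated phrases below are illustrative assumptions, not the paper's feature set:

```python
import re

# Toy noise features: people tend to leave spoken-style and misspelled
# tokens, while machine translation tends to leave untranslated tokens.
SPOKEN = {"2morrow", "wanna", "gonna", "u", "gr8"}
MISSPELLED = {"comin", "hapy", "definately"}

def noise_features(text: str, vocab: set[str]) -> dict[str, float]:
    """Rates of spoken, misspelled, and out-of-vocabulary tokens."""
    words = re.findall(r"[^\s.,!?]+", text.lower())
    n = max(len(words), 1)
    return {
        "spoken": sum(w in SPOKEN for w in words) / n,
        "misspelled": sum(w in MISSPELLED for w in words) / n,
        # Out-of-vocabulary tokens approximate untranslated phrases.
        "oov": sum(w not in vocab and w not in SPOKEN | MISSPELLED
                   for w in words) / n,
    }

vocab = {"see", "you", "i", "am", "tired"}
print(noise_features("c u 2morrow, I am comin", vocab))
```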


business information systems | 2015

A Rule-Based Approach for Detecting Location Leaks of Short Text Messages

Hoang-Quoc Nguyen-Son; Minh-Triet Tran; Hiroshi Yoshiura; Noboru Sonehara; Isao Echizen

Today, millions of people share messages via online social networks, some of which probably contain sensitive information. An adversary can collect these freely available messages and analyze them specifically for privacy leaks, such as the user's location. Unlike other approaches that try to detect these leaks using complete message streams, we put forward a rule-based approach that works on single, very short messages to detect location leaks. We evaluated our approach on 2817 tweets from the Tweets2011 data set. It scores significantly better (accuracy 84.95%) at detecting whether a message reveals the user's location than a machine-learning baseline and three heuristic extensions. Our approach applies not only to online social network messages but can also be extended to other areas (such as email, military, and health) and to other languages.
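
A rule-based detector of this kind reduces to a small set of patterns applied to a single message. The two rules below are illustrative assumptions, not the paper's actual rule set:

```python
import re

# Sketch of a rule-based location-leak detector for single short
# messages: a first-person movement verb followed by "to", or an
# explicit "at/in <Capitalized place>" pattern.
RULES = [
    re.compile(r"\bI(?:'m| am)?\s+(?:heading|going|driving|flying)\s+to\b",
               re.IGNORECASE),
    re.compile(r"\b(?:at|in)\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b"),
]

def leaks_location(message: str) -> bool:
    """A message is flagged if any rule fires."""
    return any(rule.search(message) for rule in RULES)

for msg in ["I'm heading to the airport now!",
            "Great coffee at Starbucks Downtown",
            "so tired today"]:
    print(leaks_location(msg), "|", msg)
```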


availability, reliability and security | 2014

A System for Anonymizing Temporal Phrases of Message Posted in Online Social Networks and for Detecting Disclosure

Hoang-Quoc Nguyen-Son; Minh-Triet Tran; Hiroshi Yoshiura; Noboru Sonehara; Isao Echizen

Time-related information in messages posted online is one type of sensitive information targeted by attackers and one reason that sharing information online can be risky. Therefore, time information should be anonymized before it is posted on online social networks (OSNs). One approach to reducing the risk is to anonymize the personal information by removing temporal phrases, but this causes the anonymized message to lose too much information. We have proposed a system for creating anonymous fingerprints of temporal phrases that cover most potential cases of OSN disclosure. The fingerprints not only anonymize time-related information but can also be used to identify a person who has disclosed information about the user. In an experiment with 16,647 different temporal phrases extracted from about 16 million tweets, the average number of fingerprints created for an OSN message was 526.05, significantly better than the 409.58 fingerprints of the previous state-of-the-art temporal phrase detection algorithm. Fingerprints are quantified using a modified normalized certainty penalty metric to ensure that an appropriate level of anonymity is used for each of the user's friends. The system works well not only for temporal phrases in messages posted on social networks but also for other types of phrases (such as location and objective ones) and other areas (religion, politics, military, etc.).
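
The normalized certainty penalty (NCP) for a temporal phrase can be sketched by treating each phrase as a time interval: the wider the interval relative to the whole domain, the greater the information loss. Mapping phrases to minutes within a day is an assumption; the paper's modified metric may differ:

```python
# Sketch of a normalized certainty penalty (NCP) for temporal phrases
# generalized to time intervals, measured in minutes within one day.

DAY = 24 * 60  # domain size in minutes

def ncp(interval: tuple[int, int]) -> float:
    """0 for an exact time, 1 for 'some time today'."""
    lo, hi = interval
    return (hi - lo) / DAY

print(ncp((540, 540)))   # "at 9 am"        -> 0.0
print(ncp((540, 720)))   # "in the morning" -> 0.125
print(ncp((0, 1440)))    # "today"          -> 1.0
```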

Collaboration


Dive into Hoang-Quoc Nguyen-Son's collaborations.

Top Co-Authors

Isao Echizen (National Institute of Informatics)
Minh-Triet Tran (Information Technology University)
Huy H. Nguyen (Graduate University for Advanced Studies)
Noboru Sonehara (Graduate University for Advanced Studies)
Junichi Yamagishi (National Institute of Informatics)
Ngoc-Dung T. Tieu (Graduate University for Advanced Studies)
Yusuke Miyao (National Institute of Informatics)
Thuc Dinh Nguyen (Information Technology University)