Fenglong Ma
University at Buffalo
Publication
Featured research published by Fenglong Ma.
knowledge discovery and data mining | 2016
Houping Xiao; Jing Gao; Qi Li; Fenglong Ma; Lu Su; Yunlong Feng; Aidong Zhang
The demand for automatic extraction of true information (i.e., truths) from conflicting multi-source data has soared recently. A variety of truth discovery methods have achieved great success by jointly estimating source reliability and truths. All existing truth discovery methods focus on providing a point estimator for each object's truth, but in many real-world applications, confidence interval estimation of truths is more desirable, since a confidence interval contains richer information. To address this challenge, in this paper, we propose a novel truth discovery method (ETCIBoot) to construct confidence interval estimates as well as identify truths, where bootstrapping techniques are integrated into the truth discovery procedure. Due to the properties of bootstrapping, the estimators obtained by ETCIBoot are more accurate and robust compared with the state-of-the-art truth discovery approaches. Theoretically, we prove the asymptotic consistency of the confidence interval obtained by ETCIBoot. Experimentally, we demonstrate that ETCIBoot is not only effective in constructing confidence intervals but also able to obtain better truth estimates.
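As a rough illustration of the general idea in the abstract above (not the ETCIBoot algorithm itself), the sketch below computes a reliability-weighted truth estimate for a single object and a percentile bootstrap interval around it. The function names, the data, and the fixed source weights are all hypothetical; an actual truth discovery method would estimate the weights jointly with the truths.

```python
import random


def weighted_truth(claims, weights):
    """Weighted average of the sources' claimed values for one object."""
    total = sum(weights)
    return sum(c * w for c, w in zip(claims, weights)) / total


def bootstrap_interval(claims, weights, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the weighted truth estimate."""
    rng = random.Random(seed)
    n = len(claims)
    estimates = []
    for _ in range(n_boot):
        # Resample (claim, weight) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            weighted_truth([claims[i] for i in idx], [weights[i] for i in idx])
        )
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical data: six sources report one temperature. The weights stand in
# for source reliabilities (assumed here, not learned).
claims = [71.0, 70.5, 72.0, 69.5, 80.0, 70.0]
weights = [1.0, 1.2, 0.9, 1.1, 0.2, 1.0]

point = weighted_truth(claims, weights)
lo, hi = bootstrap_interval(claims, weights)
print(f"point estimate: {point:.2f}, 95% interval: [{lo:.2f}, {hi:.2f}]")
```

The interval width gives a sense of how much the estimate depends on which sources happen to be observed, which a point estimate alone does not convey.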
knowledge discovery and data mining | 2017
Fenglong Ma; Chuishi Meng; Houping Xiao; Qi Li; Jing Gao; Lu Su; Aidong Zhang
Drug side-effects, the fourth leading cause of death in the United States, have become a worldwide public health concern. The pharmaceutical industry has made tremendous efforts to identify drug side-effects during drug development. However, it is impossible and impractical to identify all of them. Fortunately, drug side-effects can also be reported on heterogeneous platforms (i.e., data sources), such as the FDA Adverse Event Reporting System and various online communities. However, existing supervised and semi-supervised approaches are not practical, as annotating labels is expensive in the medical field. In this paper, we propose a novel and effective unsupervised model, Sifter, to automatically discover drug side-effects. Sifter enhances the estimation of drug side-effects by learning from various online platforms and measuring platform-level and user-level quality simultaneously. In this way, Sifter demonstrates better performance compared with existing approaches in terms of correctly identifying drug side-effects. Experimental results on five real-world datasets show that Sifter can significantly improve the performance of identifying side-effects compared with the state-of-the-art approaches.
international conference on data mining | 2014
Xinyue Liu; Hua Shen; Fenglong Ma; Wenxin Liang
Topical Influential User Analysis (TIUA) is an important technique in Twitter. Existing techniques neglect relationship strength between users, which is a crucial aspect of TIUA. For modeling relationship strength, interaction frequency between users has not been considered in previous works. In this paper, we first introduce a Poisson regression-based latent variable model to estimate relationship strength by utilizing interaction frequency. We then propose a novel TIUA framework which uses not only the retweeting relationship but also relationship strength. Experimental results show that the proposed TIUA algorithm can greatly improve the precision and relevance in finding topical influential users in Twitter.
web search and data mining | 2016
Qi Li; Fenglong Ma; Jing Gao; Lu Su; Christopher J. Quinn
In the past decade, commercial crowdsourcing platforms have revolutionized the ways of classifying and annotating data, especially for large datasets. Obtaining labels for a single instance can be inexpensive, but for large datasets, it is important to allocate budgets wisely. With limited budgets, requesters must trade off between the quantity of labeled instances and the quality of the final results. Existing budget allocation methods can achieve good quantity but cannot guarantee high quality of individual instances under a tight budget. However, in some scenarios, requesters may be willing to label fewer instances but of higher quality. Moreover, they may have different requirements on quality for different tasks. To address these challenges, we propose a flexible budget allocation framework called Requallo. Requallo allows requesters to set their specific requirements on the labeling quality and maximizes the number of labeled instances that achieve the quality requirement under a tight budget. The budget allocation problem is modeled as a Markov decision process and a sequential labeling policy is produced. The proposed policy greedily searches for the instance to query next as the one that can provide the maximum reward for the goal. The Requallo framework is further extended to consider worker reliability so that the budget can be better allocated. Experiments on two real-world crowdsourcing tasks as well as a simulated task demonstrate that when the budget is tight, the proposed Requallo framework outperforms existing state-of-the-art budget allocation methods from both quantity and quality aspects.
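To make the quantity-versus-quality trade-off in the abstract above concrete, here is a toy greedy allocation loop. It is only a sketch under stated assumptions, not the Requallo policy: the quality requirement used here (the majority label must lead by `margin` votes), the simulated worker with a fixed `accuracy`, and all function names are hypothetical stand-ins.

```python
import random


def labels_needed(pos, neg, margin):
    """Extra agreeing labels needed for the majority to lead by `margin`."""
    return max(0, margin - abs(pos - neg))


def greedy_allocate(true_labels, budget, margin=3, accuracy=0.8, seed=0):
    """Spend up to `budget` queries, always on the unmet instance closest to the margin."""
    rng = random.Random(seed)
    counts = [[0, 0] for _ in true_labels]  # [positive votes, negative votes]
    spent = 0
    while spent < budget:
        unmet = [i for i, (p, n) in enumerate(counts)
                 if labels_needed(p, n, margin) > 0]
        if not unmet:
            break  # every instance already meets the requirement
        # Greedy step: query the instance that needs the fewest more labels.
        i = min(unmet, key=lambda j: labels_needed(*counts[j], margin))
        # Simulated worker: answers correctly with probability `accuracy`.
        correct = rng.random() < accuracy
        vote = true_labels[i] if correct else 1 - true_labels[i]
        counts[i][0 if vote == 1 else 1] += 1
        spent += 1
    satisfied = sum(labels_needed(p, n, margin) == 0 for p, n in counts)
    return counts, satisfied, spent


# Hypothetical task: eight binary instances, but only twelve label queries.
true_labels = [1, 0, 1, 1, 0, 0, 1, 0]
counts, satisfied, spent = greedy_allocate(true_labels, budget=12)
print(f"{satisfied} of {len(true_labels)} instances met the requirement "
      f"using {spent} queries")
```

Because the loop concentrates queries on instances that are nearly done, it finishes a few instances at the required quality instead of spreading the budget thinly across all of them.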
international conference on data mining | 2016
Guangxu Xun; Vishrawas Gopalakrishnan; Fenglong Ma; Yaliang Li; Jing Gao; Aidong Zhang
Discovering topics in short texts, such as news titles and tweets, has become an important task for many content analysis applications. However, due to the lack of rich context information in short texts, the performance of conventional topic models on short texts is usually unsatisfactory. In this paper, we propose a novel topic model for short text corpora using word embeddings. Continuous space word embeddings, which have proven effective at capturing regularities in language, are incorporated into our model to provide additional semantics. Thus we model each short document as a Gaussian topic over word embeddings in the vector space. In addition, considering that background words in a short text are usually not semantically related, we introduce a discrete background mode over word types to complement the continuous Gaussian topics. We evaluate our model on news titles from data sources like abcnews, showing that our model is able to extract more coherent topics from short texts compared with the baseline methods and learn better topic representation for each short document.
knowledge discovery and data mining | 2018
Fenglong Ma; Jing Gao; Qiuling Suo; Quanzeng You; Jing Zhou; Aidong Zhang
Predicting the risk of potential diseases from Electronic Health Records (EHR) has attracted considerable attention in recent years, especially with the development of deep learning techniques. Compared with traditional machine learning models, deep learning based approaches achieve superior performance on the risk prediction task. However, none of the existing work explicitly takes prior medical knowledge (such as the relationships between diseases and corresponding risk factors) into account. In the medical domain, knowledge is usually represented by discrete and arbitrary rules. Thus, how to integrate such medical rules into existing risk prediction models to improve the performance is a challenge. To tackle this challenge, we propose a novel and general framework called PRIME for the risk prediction task, which can successfully incorporate discrete prior medical knowledge into all of the state-of-the-art predictive models using the posterior regularization technique. Different from traditional posterior regularization, we do not need to manually set a bound for each piece of prior medical knowledge when modeling the desired distribution of the target disease on patients. Moreover, the proposed PRIME can automatically learn the importance of different prior knowledge with a log-linear model. Experimental results on three real medical datasets demonstrate the effectiveness of the proposed framework for the task of risk prediction.
knowledge discovery and data mining | 2018
Yaqing Wang; Fenglong Ma; Zhiwei Jin; Ye Yuan; Guangxu Xun; Kishlay Jha; Lu Su; Jing Gao
As news reading on social media becomes more and more popular, fake news becomes a major issue concerning the public and government. Fake news can take advantage of multimedia content to mislead readers and spread widely, which can cause negative effects or even manipulate public events. One of the unique challenges for fake news detection on social media is how to identify fake news on newly emerged events. Unfortunately, most of the existing approaches can hardly handle this challenge, since they tend to learn event-specific features that cannot be transferred to unseen events. In order to address this issue, we propose an end-to-end framework named Event Adversarial Neural Network (EANN), which can derive event-invariant features and thus benefit the detection of fake news on newly arrived events. It consists of three main components: the multi-modal feature extractor, the fake news detector, and the event discriminator. The multi-modal feature extractor is responsible for extracting the textual and visual features from posts. It cooperates with the fake news detector to learn the discriminable representation for the detection of fake news. The role of the event discriminator is to remove the event-specific features and keep shared features among events. Extensive experiments are conducted on multimedia datasets collected from Weibo and Twitter. The experimental results show our proposed EANN model can outperform the state-of-the-art methods, and learn transferable feature representations.
acm/ieee international conference on mobile computing and networking | 2018
Wenjun Jiang; Dimitrios Koutsonikolas; Wenyao Xu; Lu Su; Chenglin Miao; Fenglong Ma; Shuochao Yao; Yaqing Wang; Ye Yuan; Hongfei Xue; Chen Song; Xin Ma
Driven by a wide range of real-world applications, significant efforts have recently been made to explore device-free human activity recognition techniques that utilize the information collected by various wireless infrastructures to infer human activities without the need for the monitored subject to carry a dedicated device. Existing device-free human activity recognition approaches and systems, though yielding reasonably good performance in certain cases, are faced with a major challenge. The wireless signals arriving at the receiving devices usually carry substantial information that is specific to the environment where the activities are recorded and the human subject who conducts the activities. For this reason, an activity recognition model that is trained on a specific subject in a specific environment typically does not work well when applied to predict another subject's activities that are recorded in a different environment. To address this challenge, in this paper, we propose EI, a deep-learning-based device-free activity recognition framework that can remove the environment- and subject-specific information contained in the activity data and extract environment/subject-independent features shared by the data collected on different subjects under different environments. We conduct extensive experiments on four different device-free activity recognition testbeds: WiFi, ultrasound, 60 GHz mmWave, and visible light. The experimental results demonstrate the superior effectiveness and generalizability of the proposed EI framework.
knowledge discovery and data mining | 2015
Fenglong Ma; Yaliang Li; Qi Li; Minghui Qiu; Jing Gao; Shi Zhi; Lu Su; Bo Zhao; Heng Ji; Jiawei Han
knowledge discovery and data mining | 2017
Fenglong Ma; Radha Chitta; Jing Zhou; Quanzeng You; Tong Sun; Jing Gao