Sara Javanmardi | Researchain

Publication

Featured research published by Sara Javanmardi.

international symposium on wikis and open collaboration | 2011

Vandalism detection in Wikipedia: a high-performing, feature-rich model and its reduction through Lasso

Sara Javanmardi; David W. McDonald; Cristina Videira Lopes

User generated content (UGC) constitutes a significant fraction of the Web. However, some wiki-based sites, such as Wikipedia, are so popular that they have become a favorite target of spammers and other vandals. In such popular sites, human vigilance is not enough to combat vandalism, and tools that detect possible vandalism and poor-quality contributions become a necessity. The application of machine learning techniques holds promise for developing efficient online algorithms for better tools to assist users in vandalism detection. We describe an efficient and accurate classifier that performs vandalism detection in UGC sites. We show the results of our classifier on the PAN Wikipedia dataset. We explore the effectiveness of a combination of 66 individual features that produce an AUC of 0.9553 on a test dataset, the best result to our knowledge. Using Lasso optimization we then reduce our feature-rich model to a much smaller and more efficient model of 28 features that performs almost as well, with a drop in AUC of only 0.005. We describe how this approach can be generalized to other user generated content systems and describe several applications of this classifier to help users identify potential vandalism.
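As a rough illustration of how a Lasso penalty can prune features, the sketch below fits an L1-regularized least-squares model by coordinate descent and keeps only the features whose weights remain nonzero. It is not the paper's classifier: the toy data, the feature names `f0` to `f2`, and the penalty value are all assumptions chosen for the example.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside the dead zone."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(rows, targets, penalty, sweeps=200):
    """Minimize (1/2n) * sum((y - x.w)^2) + penalty * sum(|w_j|)."""
    n = len(rows)
    d = len(rows[0])
    weights = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for x, y in zip(rows, targets):
                pred_without_j = sum(
                    w * v for k, (w, v) in enumerate(zip(weights, x)) if k != j
                )
                rho += x[j] * (y - pred_without_j)
                norm += x[j] * x[j]
            weights[j] = soft_threshold(rho / n, penalty) / (norm / n) if norm else 0.0
    return weights


# Hypothetical toy data: the target depends on f0 and f1 only; f2 is small noise.
names = ["f0", "f1", "f2"]
rows = [
    [1, 0, 0.1], [0, 1, -0.1], [1, 1, 0.1],
    [2, 1, -0.1], [1, 2, 0.1], [2, 2, -0.1],
]
targets = [3 * x[0] + 2 * x[1] for x in rows]

weights = lasso_coordinate_descent(rows, targets, penalty=0.1)
selected = [name for name, w in zip(names, weights) if w != 0.0]
print(selected, weights)
```

The L1 penalty drives the weight of the uninformative feature to exactly zero, which is what makes Lasso usable as a feature selector rather than only as a regularizer.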

knowledge discovery and data mining | 2010

Statistical measure of quality in Wikipedia

Sara Javanmardi; Cristina Videira Lopes

Wikipedia is commonly viewed as the main online encyclopedia. Its content quality, however, has often been questioned due to the open nature of its editing model. A high-quality contribution by an expert may be followed by a low-quality contribution made by an amateur or a vandal; therefore the quality of each article may fluctuate over time as it goes through iterations of edits by different users. With the increasing use of Wikipedia, the need for a reliable assessment of the quality of the content is also rising. In this study, we model the evolution of content quality in Wikipedia articles in order to estimate the fraction of time during which articles retain high-quality status. To evaluate the model, we assess the quality of Wikipedia's featured and non-featured articles. We show that the model produces results consistent with what is expected. As a case study, we use the model in a CalSWIM mashup, the content of which is taken from both highly reliable sources and Wikipedia, which may be less so. Integrating CalSWIM with a trust management system enables it to use not only recency but also quality as its criteria, and thus filter out vandalized or poor-quality content.
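The quantity being estimated, the fraction of time an article spends in a high-quality state, can be sketched as follows. This is only the time-weighted bookkeeping, assuming each revision has already been labeled high or low quality; the abstract does not describe the statistical model that produces those labels, and the `history` data is hypothetical.

```python
def high_quality_fraction(revisions, end_time):
    """Fraction of [first revision, end_time] spent in a high-quality state.

    revisions: list of (timestamp, is_high_quality), sorted by timestamp.
    Each revision's state is assumed to hold until the next revision.
    """
    total = end_time - revisions[0][0]
    boundaries = revisions[1:] + [(end_time, None)]
    good = 0
    for (start, is_good), (next_start, _) in zip(revisions, boundaries):
        if is_good:
            good += next_start - start
    return good / total


# Hypothetical revision history: timestamps in days since article creation.
history = [(0, True), (10, False), (12, True), (40, False)]
fraction = high_quality_fraction(history, end_time=50)
print(fraction)
```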

collaborative computing | 2007

Modeling trust in collaborative information systems

Sara Javanmardi; Cristina Videira Lopes

Collaborative systems available on the Web allow millions of users to share information through a growing collection of tools and platforms such as wikis, blogs and shared forums. All of these systems contain information and resources with different degrees of sensitivity. However, the open nature of such infrastructures makes it difficult for users to determine the reliability of the available information and trustworthiness of information providers. Hence, integrating trust management systems into open collaborative systems can play a crucial role in the growth and popularity of open information repositories. In this paper, we present a trust model for collaborative systems, namely for platforms based on Wiki technology. This model, based on hidden Markov models, estimates the reputation of the contributors and the reliability of the content dynamically. The focus of this paper is on reputation estimation. Evaluation results based on a subset of Wikipedia show that the model can effectively be used for identifying vandals and users with high-quality contributions.
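To make the idea of HMM-based reputation estimation concrete, here is a minimal forward-filtering sketch. The two hidden states, the `kept`/`reverted` observations, and every probability below are assumptions for illustration; the abstract does not give the paper's actual state space or parameters.

```python
STATES = ("reliable", "vandal")

# Hypothetical model parameters, not taken from the paper.
START = {"reliable": 0.5, "vandal": 0.5}
TRANSITION = {
    "reliable": {"reliable": 0.9, "vandal": 0.1},
    "vandal": {"reliable": 0.1, "vandal": 0.9},
}
EMISSION = {
    "reliable": {"kept": 0.9, "reverted": 0.1},
    "vandal": {"kept": 0.2, "reverted": 0.8},
}


def filter_reputation(observations):
    """Return P(hidden state | edits seen so far) using the forward recursion."""
    prior = dict(START)
    belief = dict(START)
    for obs in observations:
        # Weight the prior by how well each state explains the observed edit.
        unnormalized = {s: prior[s] * EMISSION[s][obs] for s in STATES}
        total = sum(unnormalized.values())
        belief = {s: unnormalized[s] / total for s in STATES}
        # Propagate the belief one step to get the prior for the next edit.
        prior = {
            s: sum(belief[p] * TRANSITION[p][s] for p in STATES) for s in STATES
        }
    return belief


good_user = filter_reputation(["kept", "kept", "kept"])
bad_user = filter_reputation(["reverted", "reverted", "reverted"])
print(good_user, bad_user)
```

Because the belief is updated after every edit, the reputation estimate changes dynamically as a contributor's history grows.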

Proceedings of the Third Workshop on Large Scale Data Mining | 2011

Distributed tuning of machine learning algorithms using MapReduce Clusters

Yasser Ganjisaffar; Thomas Debeauvais; Sara Javanmardi; Rich Caruana; Cristina Videira Lopes

Obtaining the best accuracy in machine learning usually requires carefully tuning learning algorithm parameters for each problem. Parameter optimization is computationally challenging for learning methods with many hyperparameters. In this paper we show that MapReduce Clusters are particularly well suited for parallel parameter optimization. We use MapReduce to optimize regularization parameters for boosted trees and random forests on several text problems: three retrieval ranking problems and a Wikipedia vandalism problem. We show how model accuracy improves as a function of the percent of parameter space explored, that accuracy can be hurt by exploring parameter space too aggressively, and that there can be significant interaction between parameters that appear to be independent. Our results suggest that MapReduce is a two-edged sword: it makes parameter optimization feasible on a massive scale that would have been unimaginable just a few years ago, but also creates a new opportunity for overfitting that can reduce accuracy and lead to inferior learning parameters.
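The map and reduce roles in parallel parameter tuning can be sketched in a few lines. This runs serially in one process and uses a stand-in `validation_score` function with a made-up optimum; on a real cluster each map task would train and validate one model on a separate worker. All names and the parameter grid are assumptions for illustration.

```python
from functools import reduce
from itertools import product


def validation_score(params):
    """Stand-in for training and validating one model (hypothetical surface).

    The score peaks at depth=4, rate=0.1.
    """
    depth, rate = params
    return -((depth - 4) ** 2) - 100 * (rate - 0.1) ** 2


def map_task(params):
    """Map phase: evaluate one parameter combination independently."""
    return (params, validation_score(params))


def reduce_task(best, candidate):
    """Reduce phase: keep the combination with the higher validation score."""
    return candidate if candidate[1] > best[1] else best


grid = list(product([2, 4, 8], [0.01, 0.1, 1.0]))
best_params, best_score = reduce(reduce_task, map(map_task, grid))
print(best_params, best_score)
```

Since the winner is chosen on validation data, a very large grid can overfit that data, which is the risk the abstract points to; a separate held-out test set is the usual safeguard.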

international world wide web conferences | 2010

Optimizing two stage bigram language models for IR

Sara Javanmardi; Jianfeng Gao; Kuansan Wang

Although higher order language models (LMs) have shown the benefit of capturing word dependencies for information retrieval (IR), the tuning of the increased number of free parameters remains a formidable engineering challenge. Consequently, in many real world retrieval systems, applying higher order LMs is an exception rather than the rule. In this study, we address the parameter tuning problem using a framework based on a linear ranking model in which different component models are incorporated as features. Using unigram and bigram LMs with two-stage smoothing as examples, we show that our method leads to a bigram LM that significantly outperforms its unigram counterpart and the well-tuned BM25 model.
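The sketch below shows the two ingredients in their simplest form: a unigram query-likelihood score with two-stage smoothing (Dirichlet prior smoothing followed by linear interpolation with a background model), and a linear ranking function that combines component scores as weighted features. It is not the paper's bigram model; the collection probabilities, the `mu` and `lam` values, and the feature weights are assumptions, and the collection model doubles as the background model here.

```python
import math
from collections import Counter


def two_stage_log_prob(query, doc, collection_prob, mu=10.0, lam=0.1):
    """Log query likelihood under two-stage smoothing (unigram case).

    Stage 1: Dirichlet smoothing of document counts with prior mass mu.
    Stage 2: interpolation with a background model, weight lam.
    """
    counts = Counter(doc)
    length = len(doc)
    score = 0.0
    for term in query:
        background = collection_prob[term]
        dirichlet = (counts[term] + mu * background) / (length + mu)
        score += math.log((1 - lam) * dirichlet + lam * background)
    return score


def linear_rank_score(features, weights):
    """Linear ranking model: weighted sum of component model scores."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical collection statistics and documents.
collection_prob = {"wiki": 0.01, "trust": 0.02}
doc_a = ["wiki", "trust", "model"]
doc_b = ["search", "engine", "rank"]
query = ["wiki", "trust"]

score_a = two_stage_log_prob(query, doc_a, collection_prob)
score_b = two_stage_log_prob(query, doc_b, collection_prob)
combined = linear_rank_score(
    {"unigram_lm": -2.0, "bigram_lm": -1.0},
    {"unigram_lm": 0.5, "bigram_lm": 0.5},
)
print(score_a, score_b, combined)
```

Treating each language model score as a feature means the smoothing parameters and feature weights can be tuned with the same machinery used for any linear ranker.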

international symposium on wikis and open collaboration | 2009

Leveraging crowdsourcing heuristics to improve search in Wikipedia

Yasser Ganjisaffar; Sara Javanmardi; Cristina Videira Lopes

Wikipedia articles are usually accompanied by history pages, categories and talk pages. The metadata available in these pages can be analyzed to gain a better understanding of the content and quality of the articles. We analyze the quality of search results of the current major Web search engines (Google, Yahoo! and Live) in Wikipedia. We discuss how the rich metadata available in wiki pages can be used to provide better search results in Wikipedia. We investigate the effect of incorporating the extent of review of an article into ranking of search results. The extent of review is measured by the number of distinct editors who have contributed to the articles and is extracted by processing Wikipedia's history pages. Our experimental results show that re-ranking search results of the three major Web search engines, using the review feature, improves the quality of their rankings for Wikipedia-specific searches.
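A minimal sketch of re-ranking by extent of review follows. Counting distinct editors from a revision history matches the measure described above, but the way it is blended with the original rank (reciprocal rank plus a weighted log of the editor count) is an assumption made for this example, as are the article names and histories.

```python
import math


def count_distinct_editors(history):
    """Extent of review: number of distinct editors in a revision history."""
    return len({revision["editor"] for revision in history})


def rerank(results, histories, weight=1.0):
    """Re-rank results by a hypothetical blend of original rank and review.

    results: article titles in the search engine's original order.
    histories: title -> list of revisions, each a dict with an "editor" key.
    """
    def combined_score(item):
        rank, title = item
        review = count_distinct_editors(histories[title])
        return 1.0 / (rank + 1) + weight * math.log(1 + review)

    ranked = sorted(enumerate(results), key=combined_score, reverse=True)
    return [title for _, title in ranked]


# Hypothetical search results and revision histories.
results = ["A", "B", "C"]
histories = {
    "A": [{"editor": "u1"}, {"editor": "u1"}],
    "B": [{"editor": "u1"}, {"editor": "u2"}, {"editor": "u3"}],
    "C": [{"editor": "u1"}, {"editor": "u2"}],
}
reranked = rerank(results, histories)
print(reranked)
```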

Community-Built Databases | 2011

Trust in Online Collaborative IS

Sara Javanmardi; Cristina Videira Lopes

Collaborative systems available on the Web allow millions of users to share information through a growing collection of tools and platforms such as wikis, blogs, and shared forums. Simple editing interfaces encourage users to create and maintain repositories of shared content. Online information repositories such as wikis, forums, and blogs have increased the participation of the general public in the production of Web content through the notion of social software [1–3]. The open nature of these systems, however, makes it difficult for users to trust the quality of the available information and the reputation of its providers. Online information repositories, especially in the form of wikis, are widely used on the Web. Wikis were originally designed to hide the association between a wiki page and the authors who have produced it [4]. The main advantages of this feature are as follows: (a) it eliminates the social biases associated with group deliberation, thus contributing to the diversity of opinions and to the collective intelligence of the group, and (b) it directs authors toward group goals, rather than individual benefits [5]. In addition, one of the key characteristics of wiki software is its very low-cost collective content creation, requiring only a regular Web browser and a simple markup language. This feature makes wiki software a popular choice for content creation projects where minimizing overhead is of high priority, especially in creating new or editing already existing content. “Wikinomics” is a recent term that denotes the art and science of peer production when masses of people collaborate to create innovative knowledge resources [6].

Archive | 2013

Learning to Detect Vandalism in Social Content Systems: A Study on Wikipedia

Sara Javanmardi; David W. McDonald; Rich Caruana; Sholeh Forouzan; Cristina Videira Lopes

A challenge facing user generated content systems is vandalism, i.e. edits that damage content quality. The high visibility of and easy access to social networks make them popular targets for vandals. Detecting and removing vandalism is critical for these user generated content systems. Because vandalism can take many forms, there are many different kinds of features that are potentially useful for detecting it. The complex nature of vandalism, and the large number of potential features, make vandalism detection difficult and time consuming for human editors. Machine learning techniques hold promise for developing accurate, tunable, and maintainable models that can be incorporated into vandalism detection tools. We describe a method for training classifiers for vandalism detection that yields classifiers that are more accurate on the PAN 2010 corpus than others previously developed. Because of the high turnaround in social network systems, it is important for vandalism detection tools to run in real-time. To this aim, we use feature selection to find the minimal set of features consistent with high accuracy. In addition, because some features are more costly to compute than others, we use cost-sensitive feature selection to reduce the total computational cost of executing our models. In addition to the features previously used for spam detection, we introduce new features based on user action histories. The user history features contribute significantly to classifier performance. The approach we use is general and can easily be applied to other user generated content systems.
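One simple way to picture cost-sensitive feature selection is a greedy pass that favors features with the best estimated gain per unit of compute cost until a budget is exhausted. This is an illustrative heuristic, not the method used in the paper; the feature groups, gains, costs, and budget below are all hypothetical.

```python
def select_features(candidates, budget):
    """Greedily pick features by gain-to-cost ratio within a compute budget.

    candidates: name -> (estimated_gain, compute_cost); values are assumed
    to have been measured beforehand, for example on a validation set.
    """
    by_ratio = sorted(
        candidates.items(),
        key=lambda item: item[1][0] / item[1][1],
        reverse=True,
    )
    selected = []
    remaining = budget
    for name, (_gain, cost) in by_ratio:
        if cost <= remaining:
            selected.append(name)
            remaining -= cost
    return selected


# Hypothetical feature groups with (estimated gain, compute cost).
candidates = {
    "user_history": (0.30, 5.0),
    "text_stats": (0.20, 1.0),
    "language_model": (0.10, 4.0),
    "metadata": (0.05, 0.5),
}
chosen = select_features(candidates, budget=7.0)
print(chosen)
```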

collaborative computing | 2008

CalSWIM: A Wiki–Based Data Sharing Platform

Yasser Ganjisaffar; Sara Javanmardi; Stanley B. Grant; Cristina Videira Lopes

Organizations increasingly create massive internal digital data repositories and are looking for technical advances in managing, exchanging and integrating explicit knowledge. While most of the enabling technologies for knowledge management have been around for several years, the ability to combine cost-effective data sharing, integration and analysis into a cohesive infrastructure evaded organizations until the advent of Web 2.0 applications. In this paper, we discuss our investigations into using a Wiki as a web-based interactive knowledge management system, which is integrated with some features for easy data access, data integration and analysis. Using the enhanced wiki, it is possible to make organizational knowledge sustainable, expandable, outreaching and continually up-to-date. The wiki is currently in use as the California Sustainable Watershed Information Manager. We evaluate our work according to the requirements of knowledge management systems. The result shows that our solution satisfies more requirements compared to other tools.

collaborative computing | 2009