Publication


Featured research published by Maria Soledad Pera.


conference on information and knowledge management | 2007

Using word similarity to eradicate junk emails

Maria Soledad Pera; Yiu-Kai Ng

Email is one of the most commonly used communication media today; however, unsolicited emails obstruct this otherwise fast and convenient technology for information exchange and jeopardize the continuity of this popular communication tool. Wasted resources and time and exposure to offensive content are only a few of the problems that arise as a result of junk emails. In addition, the monetary cost of processing junk emails reaches billions of dollars per year and is absorbed by public users and Internet service providers. Even though there has been extensive work in the past dedicated to eradicating junk emails, none of the existing junk email detection approaches has been highly successful in solving these problems, since spammers have been able to circumvent existing detection techniques. In this paper, we present a new tool, JunEX, which relies on the content similarity of emails to eradicate junk emails. JunEX compares each incoming email to a core of emails marked as junk by each individual user to identify unwanted emails while reducing the number of legitimate emails treated as junk, which is critical. Experiments conducted on JunEX verify its high accuracy.
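The idea of comparing an incoming email against a per-user core of junk emails can be illustrated with a minimal sketch. The abstract does not specify the similarity measure JunEX uses, so plain Jaccard word overlap stands in for it here; the function names and the 0.5 threshold are likewise assumptions made only for illustration.

```python
import re

def tokens(text):
    """Lowercase word set of an email body."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard(a, b):
    """Jaccard overlap of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def is_junk(incoming, junk_core, threshold=0.5):
    """Flag `incoming` if it is close enough to any email the user marked as junk.

    Assumption: Jaccard overlap and the threshold are stand-ins, not the
    measure described in the paper.
    """
    incoming_tokens = tokens(incoming)
    return any(jaccard(incoming_tokens, tokens(j)) >= threshold for j in junk_core)

# Hypothetical per-user core of emails previously marked as junk.
junk_core = ["Win a free cruise now, click here to claim your prize"]
```

A higher threshold makes the filter more conservative, which matters when the goal is to reduce the number of legitimate emails treated as junk.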


intelligent information systems | 2014

Exploiting the wisdom of social connections to make personalized recommendations on scholarly articles

Maria Soledad Pera; Yiu-Kai Ng

Existing scholarly publication recommenders were designed to aid researchers, as well as ordinary users, in discovering pertinent literature in diverse academic fields. These recommenders, however, often (i) depend on the availability of users’ historical data in the form of ratings or access patterns, (ii) generate recommendations pertaining to (articles included in) users’ profiles, as opposed to their current research interests, or (iii) fail to analyze valuable user-generated data at social sites that can enhance their performance. To address these design issues, we propose PReSA, a personalized recommender on scholarly articles. PReSA recommends articles bookmarked by the connections of a user U on a social bookmarking site that are not only similar in content to a target publication P currently of interest to U but are also popular among U’s connections. PReSA (i) relies on the content-similarity measure to identify potential academic publications to be recommended and (ii) uses only information readily available on popular social bookmarking sites to make recommendations. Empirical studies conducted using data from CiteULike have verified the efficiency and effectiveness of (the recommendation and ranking strategies of) PReSA, which outperforms a number of existing (scholarly publication) recommenders.
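A rough sketch of ranking by both content similarity and popularity among connections follows. The abstract does not say how PReSA combines the two signals, so multiplying a Jaccard title similarity by the share of connections who bookmarked an article is purely an assumption, as are all names and the sample data.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercase word set of a title or abstract."""
    return set(re.findall(r"[a-z]+", text.lower()))

def similarity(a, b):
    """Jaccard overlap between two texts (stand-in for the paper's measure)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def recommend(target, bookmarks_by_connection, top_k=2):
    """Rank articles bookmarked by connections by similarity times popularity."""
    # Count each article once per connection that bookmarked it.
    counts = Counter(
        title for titles in bookmarks_by_connection.values() for title in set(titles)
    )
    n = len(bookmarks_by_connection)
    scored = [
        (similarity(target, title) * counts[title] / n, title)
        for title in counts
        if title != target
    ]
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scored[:top_k] if score > 0]

# Hypothetical bookmarks of user U's connections.
bookmarks = {
    "ann": ["book recommendation for children", "spam filtering"],
    "bob": ["book recommendation for children", "personalized news recommendation"],
    "cat": ["spam filtering"],
}
```

Calling `recommend("personalized book recommendation", bookmarks)` ranks the article bookmarked by two connections ahead of a slightly more similar one bookmarked by a single connection, and drops the unrelated article entirely.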


Web Intelligence and Agent Systems: An International Journal | 2011

SimPaD: A word-similarity sentence-based plagiarism detection tool on Web documents

Maria Soledad Pera; Yiu-Kai Ng

Plagiarism is a serious problem that infringes copyrighted documents/materials, which is an unethical practice and decreases the economic incentive received by their legal owners. Unfortunately, plagiarism is getting worse due to the increasing number of on-line publications and easy access on the Web, which facilitates locating and paraphrasing information. In solving this problem, we propose a novel plagiarism-detection method, called SimPaD, which (i) establishes the degree of resemblance between any two documents D1 and D2 based on their sentence-to-sentence similarity computed by using pre-defined word-correlation factors, and (ii) generates a graphical view of sentences that are similar (or the same) in D1 and D2. Experimental results verify that SimPaD is highly accurate in detecting (non-)plagiarized documents and outperforms existing plagiarism-detection approaches.
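The sentence-to-sentence comparison can be sketched as follows. The pre-defined word-correlation factors are not given in the abstract, so the tiny `CORRELATION` table, the averaging scheme, and the 0.7 threshold are hypothetical choices made only to show the shape of the computation.

```python
import re

# Hypothetical word-correlation factors (symmetric, in [0, 1]); the paper's
# pre-defined factors are not given in the abstract.
CORRELATION = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("fast", "quick")): 0.8,
}

def word_sim(w1, w2):
    """1.0 for identical words, else the table value (0.0 if absent)."""
    if w1 == w2:
        return 1.0
    return CORRELATION.get(frozenset((w1, w2)), 0.0)

def words(sentence):
    """Lowercase alphabetic words of a sentence."""
    return re.findall(r"[a-z]+", sentence.lower())

def sentence_sim(s1, s2):
    """Average, over words of s1, of the best correlation with any word of s2."""
    w1, w2 = words(s1), words(s2)
    if not w1 or not w2:
        return 0.0
    return sum(max(word_sim(a, b) for b in w2) for a in w1) / len(w1)

def resemblance(d1, d2, threshold=0.7):
    """Fraction of sentences in d1 that have a similar sentence in d2."""
    s1 = [s for s in re.split(r"[.!?]+", d1) if s.strip()]
    s2 = [s for s in re.split(r"[.!?]+", d2) if s.strip()]
    if not s1 or not s2:
        return 0.0
    matched = sum(1 for a in s1 if max(sentence_sim(a, b) for b in s2) >= threshold)
    return matched / len(s1)
```

Because correlated words count almost as much as identical ones, a paraphrase such as "The automobile is quick" still matches "The car is fast", which exact string matching would miss.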


web intelligence | 2008

Nowhere to Hide: Finding Plagiarized Documents Based on Sentence Similarity

Nathaniel Gustafson; Maria Soledad Pera; Yiu-Kai Ng

Plagiarism is a serious problem that infringes copyrighted documents/materials, which is an unethical practice and decreases the economic incentive received by authors (owners) of the original copies. Unfortunately, plagiarism is getting worse due to the increasing number of on-line publications on the Web, which facilitates locating and paraphrasing information. In solving this problem, we propose a novel plagiarism-detection method, called SimPaD, which (i) establishes the degree of resemblance between any two documents D1 and D2 based on their sentence-to-sentence similarity computed by using pre-defined word-correlation factors, and (ii) generates a graphical view of sentences that are similar (or the same) in D1 and D2. Experimental results verify that SimPaD is highly accurate in detecting (non-)plagiarized documents and outperforms existing plagiarism-detection approaches.


conference on recommender systems | 2013

What to read next?: making personalized book recommendations for K-12 users

Maria Soledad Pera; Yiu-Kai Ng

Finding books that children/teenagers are interested in these days is a non-trivial task due to the diversity of topics covered in huge volumes of books with varied readability levels. Even though K-12 readers can turn to book recommenders to look for books, the recommended books may not satisfy their personal needs, since they could be beyond/below their readability levels or fail to match their topics of interest. To address these problems, we introduce BReK12, a book recommender that makes personalized suggestions tailored to each K-12 user U based on books available on a social bookmarking site that (i) are similar in content to the ones that are known to be of interest to U, (ii) have been bookmarked by users with reading patterns similar to U's, and (iii) can be comprehended by U. BReK12 is an asset to its users, since it suggests books that are appealing to its users and at grade levels that they can cope with, which can increase their reading selection choices and motivate them to read. We have also developed ReLAT, the readability analysis tool employed by BReK12 to determine the grade level of books. ReLAT is novel, compared with existing readability formulas, since it can predict the grade level of a book even if an excerpt of the book is not available. We have conducted empirical studies which have verified the accuracy of ReLAT in predicting the grade level of a book and the effectiveness of BReK12 over existing baseline recommendation systems.


International Journal on Artificial Intelligence Tools | 2010

A Naïve Bayes Classifier for Web Document Summaries Created by Using Word Similarity and Significant Factors

Maria Soledad Pera; Yiu-Kai Ng

Text classification categorizes web documents in large collections into predefined classes based on their contents. Unfortunately, the classification process can be time-consuming and users are still required to spend a considerable amount of time scanning through the classified web documents to identify the ones with contents that satisfy their information needs. In solving this problem, we first introduce CorSum, an extractive single-document summarization approach, which is simple and effective in performing the summarization task, since it only relies on word similarity to generate high-quality summaries. We further enhance CorSum by considering the significance factor of sentences in documents, in addition to using word-correlation factors, for document summarization. We denote the enhanced approach CorSum-SF and use the summaries generated by CorSum-SF to train a Multinomial Naive Bayes classifier for categorizing web document summaries into predefined classes. Experimental results on the DUC-2002 and 20 Newsgroups datasets show that CorSum-SF outperforms other extractive summarization methods, and that classification time is significantly reduced, while accuracy remains comparable, when using CorSum-SF generated summaries instead of the entire documents. More importantly, browsing summaries, instead of entire documents, which are assigned to predefined categories, facilitates the information search process on the Web.
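Extractive summarization driven only by word similarity can be sketched in a few lines. This is not CorSum itself: the abstract does not describe its scoring, so each sentence is scored here by its summed Jaccard word overlap with the other sentences, and the function names are hypothetical.

```python
import re

def sentence_tokens(sentence):
    """Lowercase word set of a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def overlap(a, b):
    """Jaccard overlap of two token sets (stand-in for word-correlation factors)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def summarize(document, k=1):
    """Return the k sentences most similar to the rest, in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    token_sets = [sentence_tokens(s) for s in sentences]
    # Score each sentence by its total similarity to every other sentence.
    scores = [
        sum(overlap(t, other) for j, other in enumerate(token_sets) if j != i)
        for i, t in enumerate(token_sets)
    ]
    # Highest score first; earlier sentences win ties.
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    return [sentences[i] for i in sorted(top)]

doc = "Cats chase mice. Cats chase birds and mice. Stocks fell today."
```

The resulting summaries, rather than the full documents, would then be the input to a classifier, which is where the reduction in classification time reported above would come from.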


acm conference on hypertext | 2015

Analyzing Book-Related Features to Recommend Books for Emergent Readers

Maria Soledad Pera; Yiu-Kai Ng

We recognize that emergent literacy forms a foundation upon which children will gauge their future reading. It is imperative to motivate young readers to read by offering them appealing books so that they can enjoy reading and gradually establish good reading habits during their formative years. However, with the huge volume of existing and newly-published books, it is a challenge for parents/educators (young readers, respectively) to find the right ones that match children's interests and their readability levels. In response to these needs, we have developed K3Rec, a recommender which applies a multi-dimensional approach to suggest books that simultaneously match the interests/preferences and reading abilities of emergent (i.e., K-3) readers. K3Rec considers the grade levels, contents, illustrations, and topics, besides using special properties, such as length and writing style, to distinguish K-3 books from other books targeting more mature readers. K3Rec is novel, since it adopts an unsupervised strategy to suggest books for K-3 readers which does not rely on the existence of personal social media data, such as personal tags and ratings, that are seldom, if ever, created by emergent readers. Furthermore, unlike existing book recommenders, K3Rec explicitly analyzes book illustrations, which is of special significance for emergent readers, since illustrations assist these readers in understanding the contents of books. K3Rec focuses on a niche group of readers that has not been explicitly targeted by existing book recommenders. Empirical studies conducted using data from BiblioNasium.com and Amazon's Mechanical Turk have verified the effectiveness of K3Rec in making book recommendations for emergent readers.


international acm sigir conference on research and development in information retrieval | 2012

BReK12: a book recommender for K-12 users

Maria Soledad Pera; Yiu-Kai Ng

Ideally, students in K-12 grade levels can turn to book recommenders to locate books that match their interests. Existing book recommenders, however, fail to take into account the readability levels of their users, and hence their recommendations may be unsuitable for the users. To address this issue, we introduce BReK12, a recommender that targets K-12 users and prioritizes the reading level of its users in suggesting books of interest. Empirical studies conducted using the Bookcrossing dataset show that BReK12 outperforms a number of existing recommenders (developed for general users) in identifying books appealing to K-12 users.


international conference on advanced learning technologies | 2014

SOLE-R: A Semantic and Linguistic Approach for Book Recommendations

Angel Luis Garrido; Maria Soledad Pera; Sergio Ilarri

Reading is a fundamental skill that each person needs to develop during early childhood and continue to enhance into adulthood. While children/teenagers depend on this skill to advance academically and become educated individuals, adults are expected to acquire a certain level of proficiency in reading so that they can engage in social/civic activities and successfully participate in the workforce. A step towards assisting individuals to become lifelong readers is to provide them with adequate reading selections which can cultivate their intellectual and emotional growth. With that in mind, we have developed SOLE-R, a topic map-based tool that yields book recommendations. SOLE-R takes advantage of lexical and semantic resources to infer the likes/dislikes of a reader and thus is not restricted by the syntactic constraints imposed on existing recommenders. Furthermore, SOLE-R relies on publicly-accessible data on books to perform an in-depth analysis of the preferences of a reader that goes beyond book content or reading patterns explored by existing recommenders. We have verified the correctness of SOLE-R using a popular benchmark dataset. In addition, we have compared its performance with (state-of-the-art) recommendation strategies to further demonstrate the effectiveness of SOLE-R.


Information Systems | 2013

Web-based closed-domain data extraction on online advertisements

Maria Soledad Pera; Rani Qumsiyeh; Yiu-Kai Ng

Taking advantage of the popularity of the web, online marketplaces such as Ebay(.com), advertisements (ads for short) websites such as Craigslist(.org), and commercial websites such as Carmax(.com) (allow users to) post ads on a variety of products and services. Instead of browsing through numerous websites to locate ads of interest, web users would benefit from the existence of a single, fully integrated database (DB) with ads in multiple domains, such as Cars-for-Sale and Job-Postings, populated from various online sources so that ads of interest could be retrieved at a centralized site. Since existing ads websites impose their own structures and formats for storing and accessing ads, generating a uniform, integrated ads repository is not a trivial task. The challenges include (i) identifying ads domains, (ii) dealing with the diversity in structures of ads in various ads domains, and (iii) analyzing data with different meanings in each ads domain. To handle these problems, we introduce ADEx, a tool that relies on various machine learning approaches to automate the process of extracting (un-/semi-/fully-structured) data from online ads to create ads records archived in an underlying DB through domain classification, keyword tagging, and identification of valid attribute values. Experimental results generated using a dataset of 18,000 online ads originating from Craigslist, Ebay, and KSL(.com) show that ADEx is superior in performance compared with existing text classification, keyword labeling, and data extraction approaches. Further evaluations verify that ADEx either outperforms or performs at least as well as current state-of-the-art information extractors in mapping data from unstructured or (semi-)structured sources into DB records.

Collaboration


Dive into Maria Soledad Pera's collaborations.

Top Co-Authors

Yiu-Kai Ng

Brigham Young University

Rani Qumsiyeh

Brigham Young University

Mucun Tian

Boise State University
