Suleyman Cetintas | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Suleyman Cetintas is active.

Explore More

Publication

Featured researches published by Suleyman Cetintas.

international acm sigir conference on research and development in information retrieval | 2011

Identifying similar people in professional social networks with discriminative probabilistic models

Suleyman Cetintas; Monica Rogati; Luo Si; Yi Fang

Identifying similar professionals is an important task for many core services in professional social networks. Information about users can be obtained from heterogeneous information sources, and different sources provide different insights on user similarity. This paper proposes a discriminative probabilistic model that identifies latent content and graph classes for people with similar profile content and social graph similarity patterns, and learns a specialized similarity model for each latent class. To the best of our knowledge, this is the first work on identifying similar professionals in professional social networks, and the first work that identifies latent classes to learn a separate similarity model for each latent class. Experiments on a real-world dataset demonstrate the effectiveness of the proposed discriminative learning model.

conference on information and knowledge management | 2009

Learning from past queries for resource selection

Suleyman Cetintas; Luo Si; Hao Yuan

Federated text search provides a unified search interface for multiple search engines of distributed text information sources. Resource selection is an important component for federated text search, which selects a small number of information sources that contain the largest number of relevant documents for a user query. Most prior research of resource selection focused on selecting information sources by analyzing static information of available information sources that is sampled in the offline manner. On the other hand, most prior research ignored a large amount of valuable information like the results from past queries. This paper proposes a new resource selection technique (which is called qSim) that utilizes the search results of past queries for estimating the utilities of available information sources for a specific user query. Experiment results demonstrate the effectiveness of the new resource selection algorithm.

Journal of the Association for Information Science and Technology | 2012

Effective query generation and postprocessing strategies for prior art patent search

Suleyman Cetintas; Luo Si

Rapid increase in global competition demands increased protection of intellectual property rights and underlines the importance of patents as major intellectual property documents. Prior art patent search is the task of identifying related patents for a given patent file, and is an essential step in judging the validity of a patent application. This article proposes an automated query generation and postprocessing method for prior art patent search. The proposed approach first constructs structured queries by combining terms extracted from different fields of a query patent and then reranks the retrieved patents by utilizing the International Patent Classification (IPC) code similarities between the query patent and the retrieved patents along with the retrieval score. An extensive set of empirical results carried out on a large-scale, real-world dataset shows that utilizing 20 or 30 query terms extracted from all fields of an original query patent according to their log(tf)idf values helps form a representative search query out of the query patent and is found to be more effective than is using any number of query terms from any single field. It is shown that combining terms extracted from different fields of the query patent by giving higher importance to terms extracted from the abstract, claims, and description fields than to terms extracted from the title field is more effective than treating all extracted terms equally while forming the search query. Finally, utilizing the similarities between the IPC codes of the query patent and retrieved patents is shown to be beneficial to improve the effectiveness of the prior art search.

Information Retrieval | 2008

An effective and efficient results merging strategy for multilingual information retrieval in federated search environments

Luo Si; Jamie Callan; Suleyman Cetintas; Hao Yuan

Multilingual information retrieval is generally understood to mean the retrieval of relevant information in multiple target languages in response to a user query in a single source language. In a multilingual federated search environment, different information sources contain documents in different languages. A general search strategy in multilingual federated search environments is to translate the user query to each language of the information sources and run a monolingual search in each information source. It is then necessary to obtain a single ranked document list by merging the individual ranked lists from the information sources that are in different languages. This is known as the results merging problem for multilingual information retrieval. Previous research has shown that the simple approach of normalizing source-specific document scores is not effective. On the other side, a more effective merging method was proposed to download and translate all retrieved documents into the source language and generate the final ranked list by running a monolingual search in the search client. The latter method is more effective but is associated with a large amount of online communication and computation costs. This paper proposes an effective and efficient approach for the results merging task of multilingual ranked lists. Particularly, it downloads only a small number of documents from the individual ranked lists of each user query to calculate comparable document scores by utilizing both the query-based translation method and the document-based translation method. Then, query-specific and source-specific transformation models can be trained for individual ranked lists by using the information of these downloaded documents. These transformation models are used to estimate comparable document scores for all retrieved documents and thus the documents can be sorted into a final ranked list. This merging approach is efficient as only a subset of the retrieved documents are downloaded and translated online. Furthermore, an extensive set of experiments on the Cross-Language Evaluation Forum (CLEF) (http://www.clef-campaign.org/) data has demonstrated the effectiveness of the query-specific and source-specific results merging algorithm against other alternatives. The new research in this paper proposes different variants of the query-specific and source-specific results merging algorithm with different transformation models. This paper also provides thorough experimental results as well as detailed analysis. All of the work substantially extends the preliminary research in (Si and Callan, in: Peters (ed.) Results of the cross-language evaluation forum-CLEF 2005, 2005).

conference on information and knowledge management | 2015

Tumblr Blog Recommendation with Boosted Inductive Matrix Completion

Donghyuk Shin; Suleyman Cetintas; Kuang-Chih Lee; Inderjit S. Dhillon

Popular microblogging sites such as Tumblr have attracted hundreds of millions of users as a content sharing platform, where users can create rich content in the form of posts that are shared with other users who follow them. Due to the sheer amount of posts created on such services, an important task is to make quality recommendations of blogs for users to follow. Apart from traditional recommender system settings where the follower graph is the main data source, additional side-information of users and blogs such as user activity (e.g., like and reblog) and rich content (e.g., text and images) are also available to be exploited for enhanced recommendation performance. In this paper, we propose a novel boosted inductive matrix completion method (BIMC) for blog recommendation. BIMC is an additive low-rank model for user-blog preferences consisting of two components; one component captures the low-rank structure of follow relationships and the other captures the latent structure using side-information. Our model formulation combines the power of the recently proposed inductive matrix completion (IMC) model (for side-information) together with a standard matrix completion (MC) model (for low-rank structure). Furthermore, we utilize recently developed deep learning techniques to obtain semantically rich feature representations of text and images that are incorporated in BIMC. Experiments on a large-scale real-world dataset from Tumblr illustrate the effectiveness of the proposed BIMC method.

Information Retrieval | 2013

Forecasting user visits for online display advertising

Suleyman Cetintas; Datong Chen; Luo Si

Online display advertising is a multi-billion dollar industry where advertisers promote their products to users by having publishers display their advertisements on popular Web pages. An important problem in online advertising is how to forecast the number of user visits for a Web page during a particular period of time. Prior research addressed the problem by using traditional time-series forecasting techniques on historical data of user visits; (e.g., via a single regression model built for forecasting based on historical data for all Web pages) and did not fully explore the fact that different types of Web pages and different time stamps have different patterns of user visits. In this paper, we propose a series of probabilistic latent class models to automatically learn the underlying user visit patterns among multiple Web pages and multiple time stamps. The last (and the most effective) proposed model identifies latent groups/classes of (i) Web pages and (ii) time stamps with similar user visit patterns, and learns a specialized forecast model for each latent Web page and time stamp class. Compared with a single regression model as well as several other baselines, the proposed latent class model approach has the capability of differentiating the importance of different types of information across different classes of Web pages and time stamps, and therefore has much better modeling flexibility. An extensive set of experiments along with detailed analysis carried out on real-world data from Yahoo! demonstrates the advantage of the proposed latent class models in forecasting online user visits in online display advertising.

conference on information and knowledge management | 2013

Probabilistic latent class models for predicting student performance

Suleyman Cetintas; Luo Si; Yan Ping Xin; Ron Tzur

Predicting student performance is an important task for many core problems in intelligent tutoring systems. This paper proposes a set of novel probabilistic latent class models for the task. The most effective probabilistic model utilizes all available information about the educational content and users/students to jointly identify hidden classes of students and educational content that share similar characteristics, and to learn a specialized and fine-grained regression model for each latent educational content and student class. Experiments carried out on large-scale real-world datasets demonstrate the advantages of the proposed probabilistic latent class models.

international acm sigir conference on research and development in information retrieval | 2007

Exploration of the tradeoff between effectiveness and efficiency for results merging in federated search

Suleyman Cetintas; Luo Si

Federated search is the task of retrieving relevant documents from different information resources. One of the main research problems in federated search is to combine the results from different sources into a single ranked list. Recent work proposed a regression based method to download some documents from each ranked list of the different sources, calculated comparable scores for the documents and estimated mapping functions that transform source-specific scores into comparable scores. Experiments have shown that downloading more documents improves the accuracy of results merging. However downloading more documents increases the computation and communication costs. This paper proposes a utility based optimization method that enables the system to automatically decide on the desired number of training documents to download according to the users need for effectiveness and efficiency.

intelligent tutoring systems | 2010

Learning to identify students' relevant and irrelevant questions in a micro-blogging supported classroom

Suleyman Cetintas; Luo Si; Sugato Chakravarty; Hans Peter Aagard; Kyle Bowen

This paper proposes a novel application of text categorization for two types questions asked in a micro-blogging supported classroom, namely relevant and irrelevant questions Empirical results and analysis show that utilizing the correlation between questions and available lecture materials in a lecture along with personalization and question text leads to significantly higher categorization accuracy than i) using personalization along with question text and ii) using question text alone.

international acm sigir conference on research and development in information retrieval | 2011

Forecasting counts of user visits for online display advertising with probabilistic latent class models

Suleyman Cetintas; Datong Chen; Luo Si; Bin Shen; Zhanibek Datbayev

Display advertising is a multi-billion dollar industry where advertisers promote their products to users by having publishers display their advertisements on popular Web pages. An important problem in online advertising is how to forecast the number of user visits for a Web page during a particular period of time. Prior research addressed the problem by using traditional time-series forecasting techniques on historical data of user visits; (e.g., via a single regression model built for forecasting based on historical data for all Web pages) and did not fully explore the fact that different types of Web pages have different patterns of user visits. In this paper we propose a probabilistic latent class model to automatically learn the underlying user visit patterns among multiple Web pages. Experiments carried out on real-world data demonstrate the advantage of using latent classes in forecasting online user visits.

Explore More