Fazli Can
Bilkent University
Publications
Featured research published by Fazli Can.
ACM Transactions on Information Systems | 1993
Fazli Can
Clustering of very large document databases is useful for both searching and browsing. The periodic updating of clusters is required due to the dynamic nature of databases. An algorithm for incremental clustering is introduced. The complexity and cost analysis of the algorithm together with an investigation of its expected behavior are presented. Through empirical testing it is shown that the algorithm achieves cost effectiveness and generates statistically valid clusters that are compatible with those of reclustering. The experimental evidence shows that the algorithm creates an effective and efficient retrieval environment.
ACM Transactions on Database Systems | 1990
Fazli Can; Esen A. Ozkarahan
A new algorithm for document clustering is introduced. The base concept of the algorithm, the cover coefficient (CC) concept, provides a means of estimating the number of clusters within a document database and relates indexing and clustering analytically. The CC concept is also used to identify the cluster seeds and to form clusters with these seeds. It is shown that the complexity of the clustering process is very low. The retrieval experiments show that the information-retrieval effectiveness of the algorithm is compatible with a very demanding complete linkage clustering method that is known to have good retrieval performance. The experiments also show that the algorithm is 15.1 to 63.5 (with an average of 47.5) percent better than four other clustering algorithms in cluster-based information retrieval. The experiments have validated the indexing-clustering relationships and the complexity of the algorithm and have shown improvements in retrieval effectiveness. In the experiments two document databases are used: TODS214 and INSPEC. The latter is a common database with 12,684 documents.
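To make the idea of estimating the number of clusters analytically more concrete, here is a minimal sketch. The abstract does not state the formula; the definition used below (each document's coverage of another computed from row-normalized and column-normalized term weights, with the cluster count taken as the sum of the diagonal) is an assumption based on how the cover coefficient is usually described in the C3M literature. The function names and the toy matrix are hypothetical.

```python
def cover_coefficients(D):
    """Return the document-by-document cover coefficient matrix C.

    D is a list of rows (documents) over term columns with nonnegative
    weights. Assumed formula (not given in the abstract):
        c_ij = alpha_i * sum_k d_ik * beta_k * d_jk
    where alpha_i = 1 / (row sum of i) and beta_k = 1 / (column sum of k).
    Assumes every document and every term has at least one nonzero weight.
    """
    m, n = len(D), len(D[0])
    alpha = [1.0 / sum(row) for row in D]
    beta = [1.0 / sum(D[i][k] for i in range(m)) for k in range(n)]
    return [
        [alpha[i] * sum(D[i][k] * beta[k] * D[j][k] for k in range(n))
         for j in range(m)]
        for i in range(m)
    ]


def estimated_cluster_count(D):
    """Sum of the diagonal of C, read as an estimate of the cluster count."""
    C = cover_coefficients(D)
    return sum(C[i][i] for i in range(len(D)))


# Toy matrix: documents 0 and 1 share the same two terms, document 2 is alone.
D = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
n_clusters = estimated_cluster_count(D)
```

In this toy case the two identical documents each cover themselves to degree 0.5 and the isolated document to degree 1, so the estimate matches the intuitive grouping into two clusters.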
Computers and The Humanities | 2004
Fazli Can; Jon Patton
This study investigates the writing style change of two Turkish authors, Çetin Altan and Yaşar Kemal, in their old and new works using, respectively, their newspaper columns and novels. The style markers are the frequencies of word lengths in both text and vocabulary, and the rate of usage of the most frequent words. For both authors, t-tests and logistic regressions show that the words in new works are significantly longer than those in the old. The principal component analyses graphically illustrate the separation between old and new texts. The works are correctly categorized as old or new with 75 to 100% accuracy and 92% average accuracy using discriminant analysis-based cross validation. The results imply that a larger time gap may have a positive impact on separation and categorization. For Altan, a regression analysis demonstrates a decrease in average word length as the age of his column increases. One interesting observation is that for one word each author has similar preference changes over time.
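The word-length style markers can be illustrated with a short sketch. It computes the relative frequency of each word length twice: over all running words (the text) and over distinct words (the vocabulary). The tokenization rule, the `max_len` cap, and the function name are assumptions made for illustration, not details taken from the study.

```python
import re
from collections import Counter


def word_length_profile(text, max_len=10):
    """Return (token_profile, vocabulary_profile) of word-length frequencies.

    Each profile maps a word length to its relative frequency. Lengths
    above max_len are pooled into max_len (an illustrative choice).
    """
    words = re.findall(r"\w+", text.lower())

    def profile(items):
        counts = Counter(min(len(w), max_len) for w in items)
        total = sum(counts.values())
        return {length: counts[length] / total for length in sorted(counts)}

    # Tokens count every occurrence; the vocabulary counts each word once.
    return profile(words), profile(set(words))


token_profile, vocab_profile = word_length_profile("a bb bb ccc")
```

Profiles computed this way for an author's older and newer texts could then be compared with whatever statistical test is appropriate; the sketch stops at feature extraction.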
Information Processing and Management | 2004
Fazli Can; Rabia Nuray; Ayisigi B. Sevdik
Measuring the information retrieval effectiveness of World Wide Web search engines is costly because of the human relevance judgments involved. However, for both business enterprises and individuals it is important to know the most effective Web search engines, since such search engines help their users find a higher number of relevant Web pages with less effort. Furthermore, this information can be used for several practical purposes. In this study we introduce an automatic Web search engine evaluation method as an efficient and effective assessment tool for such systems. The experiments based on eight Web search engines, 25 queries, and binary user relevance judgments show that our method provides results consistent with human-based evaluations. It is shown that the observed consistencies are statistically significant. This indicates that the new method can be successfully used in the evaluation of Web search engines.
Information Systems | 2004
Fazli Can; Ismail Sengor Altingovde; Engin Demir
Our research shows that for large databases, without considerable additional storage overhead, cluster-based retrieval (CBR) can compete with the time efficiency and effectiveness of the inverted index-based full search (FS). The proposed CBR method employs a storage structure that blends the cluster membership information into the inverted file posting lists. This approach significantly reduces the cost of similarity calculations for document ranking during query processing and improves efficiency. For example, in terms of in-memory computations, our new approach can reduce query processing time to 39% of FS. The experiments confirm that the approach is scalable and system performance improves with increasing database size. In the experiments, we use the cover coefficient-based clustering methodology (C3M), and the Financial Times database of TREC containing 210 158 documents of size 564 MB defined by 229 748 terms with a total of 29 545 234 inverted index elements. This study provides CBR efficiency and effectiveness experiments using the largest corpus in an environment that employs no user interaction or user behavior assumption for clustering.
Lecture Notes in Computer Science | 2006
Tayfun Kucukyilmaz; Berkant Barla Cambazoglu; Cevdet Aykanat; Fazli Can
The aim of this paper is to investigate the feasibility of predicting the gender of a text document's author using linguistic evidence. For this purpose, term- and style-based classification techniques are evaluated over a large collection of chat messages. Prediction accuracies up to 84.2% are achieved, illustrating the applicability of these techniques to gender prediction. Moreover, the reverse problem is explored, and the effect of gender on the writing style is discussed.
Information Sciences | 1995
Fazli Can; Edward A. Fox; Cory D. Snavely
Clustering of document databases is useful for both browsing and searching purposes; however, this can be a prohibitively expensive computational process for large collections. This problem is compounded when the clustering structure must reflect a constantly changing database. Therefore, efficient algorithms which maintain an existing clustering structure are desirable. This study provides the details of a large-scale implementation of the Cover-Coefficient-based Incremental Clustering Methodology (C2ICM). The experiments performed on a sample of the MARIAN database show that its resource requirements are within practical bounds for most platforms. Furthermore, C2ICM offers considerable savings over reclustering. The results of this study will lead to an additional type of browsing and/or searching facility on the Virginia Tech-based MARIAN large online public access library catalog (OPAC) project.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 1987
Fazli Can; Esen A. Ozkarahan
Partitioning by clustering of very large databases is a necessity to reduce the space/time complexity of retrieval operations. However, the contemporary and modern retrieval environments demand dynamic maintenance of clusters. A new cluster maintenance strategy is proposed and its similarity/stability characteristics, cost analysis, and retrieval behavior in comparison with unclustered and completely reclustered database environments have been examined by means of a series of experiments.
ACM Transactions on Information Systems | 2008
Ismail Sengor Altingovde; Engin Demir; Fazli Can; Özgür Ulusoy
We propose a unique cluster-based retrieval (CBR) strategy using a new cluster-skipping inverted file for improving query processing efficiency. The new inverted file incorporates cluster membership and centroid information along with the usual document information into a single structure. In our incremental-CBR strategy, during query evaluation, both best(-matching) clusters and the best(-matching) documents of such clusters are computed together with a single posting-list access per query term. As we switch from term to term, the best clusters are recomputed and can dynamically change. During query-document matching, only relevant portions of the posting lists corresponding to the best clusters are considered and the rest are skipped. The proposed approach is essentially tailored for environments where inverted files are compressed, and provides substantial efficiency improvement while yielding comparable, or sometimes better, effectiveness figures. Our experiments with various collections show that the incremental-CBR strategy using a compressed cluster-skipping inverted file significantly improves CPU time efficiency, regardless of query length. The new compressed inverted file imposes an acceptable storage overhead in comparison to a typical inverted file. We also show that our approach scales well with the collection size.
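The skipping step during query-document matching can be sketched as follows. This is a simplified illustration, not the paper's data structure: it assumes each posting list is stored as blocks grouped by cluster, that the set of best clusters is already known (the abstract describes them being recomputed term by term, which is omitted here), and that a document's score is a plain sum of term frequencies. The index layout and function name are hypothetical.

```python
from collections import defaultdict


def score_with_cluster_skipping(index, query_terms, best_clusters):
    """Rank documents using only posting-list blocks of the best clusters.

    index maps a term to a list of (cluster_id, postings) blocks, where
    postings is a list of (doc_id, term_frequency) pairs.
    """
    scores = defaultdict(float)
    for term in query_terms:
        for cluster_id, postings in index.get(term, []):
            if cluster_id not in best_clusters:
                continue  # skip the whole block without touching its postings
            for doc_id, tf in postings:
                scores[doc_id] += tf
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


index = {
    "cat": [(0, [(1, 2), (2, 1)]), (1, [(5, 3)])],
    "dog": [(0, [(2, 4)]), (1, [(5, 1), (6, 2)])],
}
ranking = score_with_cluster_skipping(index, ["cat", "dog"], best_clusters={0})
```

With only cluster 0 selected, the blocks for cluster 1 are never scanned, so documents 5 and 6 receive no score at all.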
Semantics and Digital Media Technologies | 2008
Onur Küçüktunç; Sare Gul Sevil; A. Burak Tosun; Hilal Zitouni; Pinar Duygulu; Fazli Can
In this paper, we propose an automatic photo tag expansion system for community photo collections, such as Flickr. Our aim is to suggest relevant tags for a target photograph uploaded to the system by a user, by incorporating the visual and textual cues from other related photographs. As the first step, the system requires the user to add only a few initial tags for each uploaded photo. These initial tags are used to retrieve related photos that include the same tags in their tag lists. Then the set of candidate tags collected from a large pool of photos is weighted according to the similarity of the target photo to the retrieved photo that includes the tag. Finally, the highest-ranked tags are used to automatically expand the tags of the target photo. The experimental results on Flickr photos show that using the visual similarity of semantically relevant photos to recommend tags improves the quality of suggested tags compared to text-only systems.
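The weighting and ranking steps described above can be sketched roughly as follows. The abstract does not say how similarities are combined, so summing the visual similarity of every retrieved photo that carries a tag is an assumption, as are the function name, the input format, and the example similarity values.

```python
from collections import defaultdict


def rank_candidate_tags(initial_tags, retrieved, top_k=5):
    """Suggest up to top_k new tags for a target photo.

    retrieved is a list of (similarity_to_target, tags) pairs, one per
    related photo. Each candidate tag accumulates the similarity of the
    photos that carry it (an assumed aggregation rule).
    """
    weights = defaultdict(float)
    for similarity, tags in retrieved:
        for tag in set(tags):          # count each tag once per photo
            if tag not in initial_tags:  # do not re-suggest the user's tags
                weights[tag] += similarity
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_k]]


retrieved = [
    (0.9, ["beach", "sea", "sun"]),
    (0.5, ["beach", "sand", "sea"]),
    (0.2, ["beach", "city"]),
]
suggested = rank_candidate_tags({"beach"}, retrieved, top_k=3)
```

Here "sea" ranks first because it appears on the two most similar photos, while "city" is dropped since it only comes from a visually dissimilar one.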