Publications


Featured research published by Seyda Ertekin.


Conference on Information and Knowledge Management | 2007

Learning on the Border: Active Learning in Imbalanced Data Classification

Seyda Ertekin; Jian Huang; Léon Bottou; C. Lee Giles

This paper is concerned with the class imbalance problem, which is known to hinder the learning performance of classification algorithms. The problem occurs when there are significantly fewer observations of the target concept. Various real-world classification tasks, such as medical diagnosis, text categorization, and fraud detection, suffer from this phenomenon. Standard machine learning algorithms yield better prediction performance with balanced datasets. In this paper, we demonstrate that active learning is capable of solving the class imbalance problem by providing the learner with more balanced classes. We also propose an efficient way of selecting informative instances from a smaller pool of samples for active learning, which does not necessitate a search through the entire dataset. The proposed method yields an efficient querying system and allows active learning to be applied to very large datasets. Our experimental results show that, with an early stopping criterion, active learning achieves a fast solution with competitive prediction performance in imbalanced data classification.
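The small-pool selection idea in this abstract can be illustrated with a minimal sketch. It assumes a linear classifier with weight vector `w` and bias `b` standing in for the SVM, and a pool size of 59; the function names and the pool size are illustrative assumptions, not the paper's code. Instead of scanning every unlabeled instance, the learner draws a random pool and queries the instance closest to the decision boundary. The helper `prob_pool_hits_top_fraction` shows why a small pool can suffice: treating draws as independent, a pool of 59 contains at least one of the closest 5% of instances with probability 1 - 0.95^59, roughly 0.95.

```python
import random

def margin(x, w, b):
    """Absolute value of the linear decision function |w . x + b|."""
    return abs(sum(wj * xj for wj, xj in zip(w, x)) + b)

def select_closest_from_pool(unlabeled, w, b, pool_size=59, rng=None):
    """Query the instance nearest the decision boundary, searching only a
    random pool of `pool_size` instances (assumed value) instead of the
    whole unlabeled set."""
    rng = rng or random.Random()
    pool = rng.sample(range(len(unlabeled)), min(pool_size, len(unlabeled)))
    return min(pool, key=lambda i: margin(unlabeled[i], w, b))

def prob_pool_hits_top_fraction(pool_size, fraction):
    """Chance that a pool contains at least one of the `fraction` closest
    instances, treating draws as independent (a close approximation when
    the unlabeled set is much larger than the pool)."""
    return 1.0 - (1.0 - fraction) ** pool_size

if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(10_000)]
    idx = select_closest_from_pool(data, w=[1.0, -1.0], b=0.0, rng=rng)
    print(idx, margin(data[idx], [1.0, -1.0], 0.0))
    print(round(prob_pool_hits_top_fraction(59, 0.05), 2))  # 0.95
```

The pool is resampled at every query, so the cost per query is fixed by `pool_size` rather than by the size of the dataset.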


European Conference on Principles of Data Mining and Knowledge Discovery | 2006

Efficient Name Disambiguation for Large-Scale Databases

Jian Huang; Seyda Ertekin; C. Lee Giles

The need for name disambiguation arises both when one is seeking the publications of an author who has used different name variations and when multiple other authors share the same name. We present an efficient integrative framework for solving the name disambiguation problem: a blocking method retrieves candidate classes of authors with similar names, and a clustering method, DBSCAN, clusters papers by author. The distance metric between papers used in DBSCAN is calculated by an online active selection support vector machine algorithm (LASVM), yielding a simpler model, lower test errors, and faster prediction time than a standard SVM. We prove that by recasting transitivity as density reachability in DBSCAN, transitivity is guaranteed for core points. For evaluation, we manually annotated 3,355 papers, yielding 490 authors, and achieved 90.6% pairwise F1. To demonstrate scalability, we readily disambiguated the authors in the entire CiteSeer dataset of over 700,000 papers.
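The clustering stage described above can be sketched as a plain DBSCAN over a precomputed pairwise distance matrix. In the paper the distances come from an LASVM model; here the matrix is simply assumed to be given, and the one-dimensional toy distances in the usage example are hypothetical. A point with at least `min_pts` neighbors within `eps` (counting itself) is a core point, and clusters grow only by expanding core points, so core points joined by a chain of neighboring core points always end up in the same cluster.

```python
def dbscan(dist, eps, min_pts):
    """Cluster n items from an n x n distance matrix.
    Returns one label per item; -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1

    def neighbors(i):
        # Includes i itself, since dist[i][i] == 0.
        return [j for j in range(n) if dist[i][j] <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in nbrs if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # only core points extend the cluster
                seeds.extend(j_nbrs)
    return labels

if __name__ == "__main__":
    positions = [0.0, 0.1, 0.2, 5.0, 5.1, 20.0]  # hypothetical toy data
    dist = [[abs(a - b) for b in positions] for a in positions]
    print(dbscan(dist, eps=0.5, min_pts=2))  # [0, 0, 0, 1, 1, -1]
```

Working from a precomputed matrix keeps the clustering independent of how the pairwise distances are learned.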


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

Active Learning for Class Imbalance Problem

Seyda Ertekin; Jian Huang; C. Lee Giles

The class imbalance problem is known to hinder the learning performance of classification algorithms. Various real-world classification tasks, such as text categorization, suffer from this phenomenon. We demonstrate that active learning is capable of solving the problem.


ACM/IEEE Joint Conference on Digital Libraries | 2009

Finding Topic Trends in Digital Libraries

Levent Bolelli; Seyda Ertekin; Ding Zhou; C. Lee Giles

We propose a generative model based on latent Dirichlet allocation for mining distinct topics in document collections by integrating the temporal ordering of documents into the generative process. The document collection is divided into time segments, and the topics discovered in each segment are propagated to influence topic discovery in the subsequent time segments. We conduct experiments on a collection of academic papers from the CiteSeer repository. We augment the text corpus with user queries and tags and integrate the citation graph to boost the weight of topical terms. The experimental results show that the segmented topic model can effectively detect distinct topics and their evolution over time.


European Conference on Principles of Data Mining and Knowledge Discovery | 2006

Clustering Scientific Literature Using Sparse Citation Graph Analysis

Levent Bolelli; Seyda Ertekin; C. Lee Giles

It is well known that connectivity analysis of linked documents provides significant information about the structure of the document space for unsupervised learning tasks. However, the ability to identify distinct clusters of documents based on link graph analysis is proportional to the density of the graph and depends on the availability of the linking and/or linked documents in the collection. In this paper, we present an information-theoretic approach to measuring the significance of individual words based on the underlying link structure of the document collection. This enables us to generate a non-uniform weight distribution over the feature space, which is used to augment the original corpus-based document similarities. The experimental results on a collection of scientific literature show that our method achieves better separation of distinct groups of documents, yielding improved clustering solutions.


International Symposium on Computer and Information Sciences | 2013

Adaptive Oversampling for Imbalanced Data Classification

Seyda Ertekin

Data imbalance is known to significantly hinder the generalization performance of supervised learning algorithms. A common strategy to overcome this challenge is synthetic oversampling, where synthetic minority class examples are generated to balance the distribution between the examples of the majority and minority classes. We present a novel adaptive oversampling algorithm, Virtual, that combines the benefits of oversampling and active learning. Unlike traditional resampling methods, which require preprocessing of the data, Virtual generates synthetic examples for the minority class during the training process and therefore removes the need for an extra preprocessing stage. In the context of learning with Support Vector Machines, we demonstrate that Virtual outperforms competitive oversampling techniques in terms of both generalization performance and computational complexity.
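The abstract does not say how Virtual constructs its synthetic examples, so the sketch below rests on an assumption: it uses SMOTE-style interpolation between a minority instance and one of its `k` nearest minority-class neighbors. The hook `augment_on_query` is hypothetical and only shows where such generation could sit inside an active-learning loop, during training rather than in a separate preprocessing pass.

```python
import math
import random

def k_nearest(x, candidates, k):
    """Return the k candidates closest to x in Euclidean distance."""
    return sorted(candidates, key=lambda c: math.dist(x, c))[:k]

def virtual_example(i, minority, k=5, rng=None):
    """Create one synthetic point between minority[i] and a randomly chosen
    one of its k nearest minority-class neighbors (assumed SMOTE-style rule)."""
    rng = rng or random.Random()
    x = minority[i]
    others = minority[:i] + minority[i + 1:]
    neighbor = rng.choice(k_nearest(x, others, k))
    lam = rng.random()  # interpolation factor in [0, 1)
    return [xj + lam * (nj - xj) for xj, nj in zip(x, neighbor)]

def augment_on_query(i, minority, train_x, train_y, minority_label=1, k=5, rng=None):
    """Hypothetical hook: when the active learner queries minority[i], add it
    to the training set together with one synthetic neighbor."""
    train_x.append(minority[i])
    train_y.append(minority_label)
    train_x.append(virtual_example(i, minority, k, rng))
    train_y.append(minority_label)

if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]  # hypothetical data
    xs, ys = [], []
    augment_on_query(0, minority, xs, ys, k=1, rng=random.Random(0))
    print(xs, ys)
```

Because synthetic points are created only for instances the learner actually queries, oversampling concentrates near the region the active learner is exploring instead of being spread uniformly in advance.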


Journal of Machine Learning Research | 2005

Fast Kernel Classifiers with Online and Active Learning

Antoine Bordes; Seyda Ertekin; Jason Weston; Léon Bottou


Archive | 2000

The Shape of the Web and Its Implications for Searching the Web

Kemal Efe; Vijay V. Raghavan; Chee-Hung Henry Chu; Adrienne L. Broadwater; Levent Bolelli; Seyda Ertekin


SIAM International Conference on Data Mining | 2007

Efficient Multiclass Boosting Classification with Active Learning

Jian Huang; Seyda Ertekin; Yang Song; Hongyuan Zha; C. Lee Giles


arXiv: Social and Information Networks | 2012

Learning to Predict the Wisdom of Crowds

Seyda Ertekin; Haym Hirsh; Cynthia Rudin

Collaboration



Top Co-Authors


C. Lee Giles

Pennsylvania State University


Levent Bolelli

Pennsylvania State University


Jian Huang

Pennsylvania State University


Ding Zhou

Pennsylvania State University


Chee-Hung Henry Chu

University of Louisiana at Lafayette


Hongyuan Zha

Georgia Institute of Technology
