Tunga Güngör
Boğaziçi University
Publication
Featured research published by Tunga Güngör.
international symposium on computer and information sciences | 2005
Arzucan Özgür; Levent Özgür; Tunga Güngör
In this paper, we examine the use of keywords in text categorization with SVM. Contrary to the usual belief, we show that using keywords instead of all words yields better performance in terms of both accuracy and time. Unlike previous studies that focus on keyword selection metrics, we compare two approaches to keyword selection. In the corpus-based approach, a single set of keywords is selected for all classes. In the class-based approach, a distinct set of keywords is selected for each class. We perform the experiments on the standard Reuters-21578 dataset, with both boolean and tf-idf weighting. Our results show that although tf-idf weighting performs better, boolean weighting can be used where time and space resources are limited. The corpus-based approach with 2000 keywords performs best. However, for a small number of keywords, the class-based approach outperforms the corpus-based approach with the same number of keywords.
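The difference between the two selection approaches in the abstract above can be illustrated with a minimal sketch. The abstract does not name the scoring function, so this example assumes plain document frequency as a stand-in; the function names (`corpus_based_keywords`, `class_based_keywords`, `boolean_vector`) are hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict


def _top_k(doc_freq, k):
    # Rank by descending document frequency; break ties alphabetically
    # so the result is deterministic.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def corpus_based_keywords(docs, k):
    """Select one shared set of k keywords for all classes."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    return _top_k(doc_freq, k)


def class_based_keywords(docs, labels, k):
    """Select a distinct set of k keywords for each class."""
    per_class = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        per_class[label].update(set(tokens))
    return {label: _top_k(freq, k) for label, freq in per_class.items()}


def boolean_vector(tokens, keywords):
    """Boolean weighting: 1 if the keyword occurs in the document, else 0."""
    present = set(tokens)
    return [1 if kw in present else 0 for kw in keywords]


# Toy data (assumed for illustration only).
docs = [["oil", "price", "rise"], ["oil", "export"],
        ["wheat", "crop"], ["wheat", "price"]]
labels = ["crude", "crude", "grain", "grain"]

shared = corpus_based_keywords(docs, 2)
per_class = class_based_keywords(docs, labels, 1)
```

With a small keyword budget, the class-based variant guarantees every class gets its own discriminating terms, which is consistent with the abstract's observation about small numbers of keywords.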
Archive | 2005
Pinar Yolum; Tunga Güngör; Fikret S. Gürgen; Can C. Özturan
Invited Speakers.- Keeping Viruses Under Control.- Online Auctions: Notes on Theory, Practice, and the Role of Agents.- Computer Networks.- A Unified Approach to Survivability of Connection-Oriented Networks.- SCTP Based Framework for Mobile Web Agent.- An Agent-Based Scheme for Efficient Multicast Application in Mobile Networks.- An Enhanced One Way Function Tree Rekey Protocol Based on Chinese Remainder Theorem.- Admission Control for Multicast Routing with Quality of Service in Ad Hoc Networks.- An Efficient On-line Job Admission Control Scheme to Guarantee Deadlines for QoS-Demanding Applications.- A Methodology of Resilient MPLS/VPN Path Management Under Multiple Link Failures.- Sensor and Satellite Networks.- Comparison of Hyper-DAG Based Task Mapping and Scheduling Heuristics for Wireless Sensor Networks.- A Markov-Based Model to Analyze the Temporal Evolution and Lifetime of a Sensor Network.- Power-Efficient Seamless Publishing and Subscribing in Wireless Sensor Networks.- Group-Oriented Channel Protection for Mobile Devices in Digital Multimedia Broadcasting.- IP Traffic Load Distribution in NGEO Broadband Satellite Networks - (Invited Paper).- Cross-Layer Management of Radio Resources in an Interactive DVB-RCS-Based Satellite Network-(Invited Paper).- Aggressive Back off Strategy in Congestion Management Algorithm for DBS-RCS - (Invited Paper).- TCP-Peach++: Enhancement of TCP-Peach+ for Satellite IP Networks with Asymmetrical Bandwidth and Persistent Fades-(Invited Paper).- Security and Cryptography.- Automatic Translation of Serial to Distributed Code Using CORBA Event Channels.- Fault Tolerant and Robust Mutual Exclusion Protocol for Synchronous Distributed Systems.- Exact Best-Case End-to-End Response Time Analysis for Hard Real-Time Distributed Systems.- A Formal Policy Specification Language for an 802.11 WLAN with Enhanced Security Network.- A Generic Policy-Conflict Handling Model.- A Truly Random Number Generator Based on a Continuous-Time 
Chaotic Oscillator for Applications in Cryptography.- A New Cryptanalytic Time-Memory Trade-Off for Stream Ciphers.- SVM Approach with a Genetic Algorithm for Network Intrusion Detection.- Performance Evaluation.- Modeling Access Control Lists with Discrete-Time Quasi Birth-Death Processes.- Stochastic Bounds on Partial Ordering: Application to Memory Overflows Due to Bursty Arrivals.- QoS Evaluation Method in Multimedia Applications Using a Fuzzy Genetic Rule-Based System.- Impact of Setup Message Processing and Optical Switch Configuration Times on the Performance of IP over Optical Burst Switching Networks.- Characterizing Gnutella Network Properties for Peer-to-Peer Network Simulation.- Computing Communities in Large Networks Using Random Walks.- Fame as an Effect of the Memory Size.- Keeping Viruses Under Control.- Distributed Evaluation Using Multi-agents.- Classification of Volatile Organic Compounds with Incremental SVMs and RBF Networks.- E-Commerce and Web Services.- Agent Based Dynamic Execution of BPEL Documents.- A Fair Multimedia Exchange Protocol.- A Pervasive Environment for Location-Aware and Semantic Matching Based Information Gathering.- A Web Service Platform for Web-Accessible Archaeological Databases.- A WSDL Extension for Performance-Enabled Description of Web Services.- A Novel Authorization Mechanism for Service-Oriented Virtual Organization.- Metrics, Methodology, and Tool for Performance-Considered Web Service Composition.- Brazilian Software Process Reference Model and Assessment Method.- Multiagent Systems.- A Secure Communication Framework for Mobile Agents.- A Novel Algorithm for the Coordination of Multiple Mobile Robots.- Multiagent Elite Search Strategy for Combinatorial Optimization Problems.- Managing Theories of Trust in Agent Based Systems.- Applying Semantic Capability Matching into Directory Service Structures of Multi Agent Systems.- Self-organizing Distribution of Agents over Hosts.- Machine Learning.- Evolutionary Design 
of Group Communication Schedules for Interconnection Networks.- Memetic Algorithms for Nurse Rostering.- Discretizing Continuous Attributes Using Information Theory.- System Identification Using Genetic Programming and Gene Expression Programming.- ARKAQ-Learning: Autonomous State Space Segmentation and Policy Generation.- Signature Verification Using Conic Section Function Neural Network.- Fusion of Rule-Based and Sample-Based Classifiers - Probabilistic Approach.- Construction of a Learning Automaton for Cycle Detection in Noisy Data Sequences.- Information Retrieval and Natural Language Processing.- A New Trend Heuristic Time-Variant Fuzzy Time Series Method for Forecasting Enrollments.- Using GARCH-GRNN Model to Forecast Financial Time Series.- Boosting Classifiers for Music Genre Classification.- Discriminating Biased Web Manipulations in Terms of Link Oriented Measures.- ORF-NT: An Object-Based Image Retrieval Framework Using Neighborhood Trees.- Text Categorization with Class-Based and Corpus-Based Keyword Selection.- Aligning Turkish and English Parallel Texts for Statistical Machine Translation.- The Effect of Windowing in Word Sense Disambiguation.- Pronunciation Disambiguation in Turkish.- Image and Speech Processing.- Acoustic Flow and Its Applications.- A DCOM-Based Turkish Speech Recognition System: TREN - Turkish Recognition ENgine.- Speaker Recognition in Unknown Mismatched Conditions Using Augmented PCA.- Real Time Isolated Turkish Sign Language Recognition from Video Using Hidden Markov Models with Global Features.- An Animation System for Fracturing of Rigid Objects.- 2D Shape Tracking Using Algebraic Curve Spaces.- A Multi-camera Vision System for Real-Time Tracking of Parcels Moving on a Conveyor Belt.- Selection and Extraction of Patch Descriptors for 3D Face Recognition.- Implementation of a Video Streaming System Using Scalable Extension of H.264.- Blotch Detection and Removal for Archive Video Restoration.- Performance Study of an Image 
Restoration Algorithm for Bursty Mobile Satellite Channels.- Algorithms and Database Systems.- Polymorphic Compression.- Efficient Adaptive Data Compression Using Fano Binary Search Trees.- Word-Based Fixed and Flexible List Compression.- Effective Early Termination Techniques for Text Similarity Join Operator.- Multimodal Video Database Modeling, Querying and Browsing.- Semantic Load Shedding for Prioritized Continuous Queries over Data Streams.- Probabilistic Point Queries over Network-Based Movements.- Effective Clustering by Iterative Approach.- Recursive Lists of Clusters: A Dynamic Data Structure for Range Queries in Metric Spaces.- Incremental Clustering Using a Core-Based Approach.- Indexing of Sequences of Sets for Efficient Exact and Similar Subsequence Matching.- An Investigation of the Course-Section Assignment Problem.- Crympix: Cryptographic Multiprecision Library.- Optimal Control for Real-Time Feedback Rate-Monotonic Schedulers.- Graphical User Interface Development on the Basis of Data Flows Specification.- Theory of Computing.- Generalizing Redundancy Elimination in Checking Sequences.- A Computable Version of Dini's Theorem for Topological Spaces.- Improved Simulation of Quantum Random Walks.- An Alternative Proof That Exact Inference Problem in Bayesian Belief Networks Is NP-Hard.- Recovering the Lattice of Repetitive Sub-functions.- Epilogue.- Erol Gelenbe's Career and Contributions.
international conference natural language processing | 2008
Hasim Sak; Tunga Güngör; Murat Saraclar
In this paper, we propose a set of language resources for building Turkish language processing applications. Specifically, we present a finite-state implementation of a morphological parser, an averaged perceptron-based morphological disambiguator, and the compilation of a web corpus. Turkish is an agglutinative language with a highly productive inflectional and derivational morphology. We present an implementation of a morphological parser based on two-level morphology. This parser is one of the most complete parsers for Turkish and, in contrast to existing parsers, it runs independently of any external system such as PC-KIMMO. Due to the complex phonology and morphology of Turkish, parsing introduces some ambiguous parses. We developed a morphological disambiguator with an accuracy of about 98% using the averaged perceptron algorithm. We also present our efforts to build a Turkish web corpus of about 423 million words.
Computer Speech & Language | 2009
Hasan Mesut Meral; Bülent Sankur; A. Sumru Özsoy; Tunga Güngör; Emre Sevinç
We develop a morphosyntax-based natural language watermarking scheme. In this scheme, a text is first transformed into a syntactic tree diagram where the hierarchies and the functional dependencies are made explicit. The watermarking software then operates on the sentences in syntax tree format and executes binary changes under control of Wordnet and Dictionary to avoid semantic drops. A certain level of security is provided via key-controlled randomization of morphosyntactic tools and the insertion of void watermark. The security aspects and payload aspects are evaluated statistically while the imperceptibility is measured using edit-hit counts based on human judgments. It is observed that agglutinative languages are somewhat more amenable to morphosyntax-based natural language watermarking and the free word order property of a language, like Turkish, is an extra bonus.
international conference on computational linguistics | 2009
Hasim Sak; Tunga Güngör; Murat Saraclar
This paper describes the application of the perceptron algorithm to the morphological disambiguation of Turkish text. Turkish has a productive derivational morphology. Due to the ambiguity caused by complex morphology, a word may have multiple morphological parses, each with a different stem or sequence of morphemes. The methodology employed is based on ranking with the perceptron algorithm, which has been successful in some NLP tasks in English. We use a baseline statistical trigram-based model from previous work to enumerate an n-best list of candidate morphological parse sequences for each sentence. We then apply the perceptron algorithm to rerank the n-best list using a set of 23 features. The perceptron trained to do morphological disambiguation improves the accuracy of the baseline model from 93.61% to 96.80%. When we train the perceptron as a POS tagger, the accuracy is 98.27%. The Turkish morphological disambiguation and POS tagging results that we obtained are the best reported so far.
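The reranking step described above can be sketched roughly as follows. This is a plain (unaveraged) perceptron reranker under stated assumptions: each candidate in the n-best list is represented as a dictionary of feature values, and the feature names used here (`baseline`, `tag=Noun`, `tag=Verb`) are hypothetical placeholders, not the 23 features of the paper.

```python
from collections import defaultdict


def score(weights, features):
    """Linear score of one candidate parse sequence."""
    return sum(weights[name] * value for name, value in features.items())


def rerank(weights, candidates):
    """Return the index of the highest-scoring candidate in the n-best list."""
    return max(range(len(candidates)),
               key=lambda i: score(weights, candidates[i]))


def train_reranker(training_data, epochs=5):
    """training_data: list of (candidates, gold_index) pairs.

    On each mistake, move the weights toward the gold candidate's
    features and away from the wrongly predicted candidate's features.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for candidates, gold in training_data:
            predicted = rerank(weights, candidates)
            if predicted != gold:
                for name, value in candidates[gold].items():
                    weights[name] += value
                for name, value in candidates[predicted].items():
                    weights[name] -= value
    return weights


# Toy n-best list: the baseline model prefers candidate 0,
# but the gold analysis is candidate 1 (illustrative data only).
candidates = [
    {"baseline": 1.0, "tag=Noun": 1.0},
    {"baseline": 0.5, "tag=Verb": 1.0},
]
weights = train_reranker([(candidates, 1)])
best = rerank(weights, candidates)
```

An averaged variant would additionally keep a running sum of the weight vector after every training example and use the mean at prediction time.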
Pattern Recognition Letters | 2004
Levent Özgür; Tunga Güngör; Fikret S. Gürgen
We propose anti-spam filtering methods for agglutinative languages in general and for Turkish in particular. The methods are dynamic and are based on Artificial Neural Networks (ANN) and Bayesian Networks. The developed algorithms are user-specific and adapt themselves to the characteristics of the incoming e-mails. The algorithms have two main components. The first one deals with the morphology of the words and the second one classifies the e-mails by using the roots of the words extracted by the morphological analysis. Two ANN structures, single layer perceptron and multi-layer perceptron, are considered and the inputs to the networks are determined using a binary model and a probabilistic model. Similarly, for Bayesian classification, three different approaches are employed: binary model, probabilistic model, and advanced probabilistic model. In the experiments, a total of 750 e-mails (410 spam and 340 normal) were used and a success rate of about 90% was achieved.
conference on security steganography and watermarking of multimedia contents | 2007
Hasan Mesut Meral; Emre Sevinç; Ersin Ünkar; Bülent Sankur; A. Sumru Özsoy; Tunga Güngör
This paper explores the morphosyntactic tools for text watermarking and develops a syntax-based natural language watermarking scheme. Turkish, an agglutinative language, provides a good ground for the syntax-based natural language watermarking with its relatively free word order possibilities and rich repertoire of morphosyntactic structures. The unmarked text is first transformed into a syntactic tree diagram in which the syntactic hierarchies and the functional dependencies are coded. The watermarking software then operates on the sentences in syntax tree format and executes binary changes under control of Wordnet to avoid semantic drops. The key-controlled randomization of morphosyntactic tool order and the insertion of void watermark provide a certain level of security. The embedding capacity is calculated statistically, and the imperceptibility is measured using edit hit counts.
language resources and evaluation | 2011
Hasim Sak; Tunga Güngör; Murat Saraclar
We present a set of language resources and tools—a morphological parser, a morphological disambiguator, and a text corpus—for exploiting Turkish morphology in natural language processing applications. The morphological parser is a state-of-the-art finite-state transducer-based implementation of Turkish morphology. The disambiguator is based on the averaged perceptron algorithm and has the best accuracy reported for Turkish in the literature. The text corpus has been compiled from the web and contains about 500 million tokens. This is the largest Turkish web corpus published.
Expert Systems With Applications | 2013
Şerafettin Taşcı; Tunga Güngör
Text categorization is the task of automatically assigning unlabeled text documents to some predefined category labels by means of an induction algorithm. Since the data in text categorization are high-dimensional, feature selection is often used to reduce the dimensionality. In this paper, we evaluate and compare the feature selection policies used in text categorization by employing some of the popular feature selection metrics. For the experiments, we use datasets which vary in size, complexity, and skewness. We use a support vector machine as the classifier and tf-idf weighting for weighting the terms. In addition to the evaluation of the policies, we propose new feature selection metrics which show high success rates, especially with a low number of keywords. These metrics are two-sided local metrics and are based on the difference of the distributions of a term in the documents belonging to a class and in the documents not belonging to that class. Moreover, we propose a keyword selection framework called adaptive keyword selection. It is based on selecting a different number of terms for each class and it shows significant improvement on skewed datasets that have a limited number of training instances for some of the classes.
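As a rough illustration of a two-sided local metric built on a distribution difference, the sketch below scores a term for one class by the absolute difference between the fraction of in-class documents and the fraction of out-of-class documents that contain it. This exact formula is an assumption made for illustration; the abstract does not give the paper's metrics, and the function name `distribution_difference` is hypothetical.

```python
def distribution_difference(docs, labels, target):
    """Score every term for class `target`.

    The score is |P(term | target) - P(term | not target)|, estimated
    from document frequencies. Taking the absolute value makes the
    metric two-sided: terms that signal absence from the class score
    as highly as terms that signal membership.
    """
    in_class = [set(d) for d, l in zip(docs, labels) if l == target]
    out_class = [set(d) for d, l in zip(docs, labels) if l != target]
    vocab = set().union(*in_class, *out_class)

    scores = {}
    for term in vocab:
        p_in = (sum(term in d for d in in_class) / len(in_class)
                if in_class else 0.0)
        p_out = (sum(term in d for d in out_class) / len(out_class)
                 if out_class else 0.0)
        scores[term] = abs(p_in - p_out)
    return scores


# Toy data (assumed for illustration only).
docs = [["oil", "price", "rise"], ["oil", "export"],
        ["wheat", "crop"], ["wheat", "price"]]
labels = ["crude", "crude", "grain", "grain"]

grain_scores = distribution_difference(docs, labels, "grain")
```

Here "wheat" (always present in the class) and "oil" (never present in the class) both receive the maximum score, while "price", which is equally common inside and outside the class, scores zero.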
Pattern Recognition Letters | 2008
Ali Çıltık; Tunga Güngör
In this paper, we propose spam e-mail filtering methods with high accuracy and low time complexity. The methods are based on the n-gram approach and a heuristic referred to as the first n-words heuristic. We develop two models, a class general model and an e-mail specific model, and test the methods under these models. The models are then combined in such a way that the latter is activated in cases where the first model falls short. Though the proposed approach and the methods developed are general and can be applied to any language, we mainly apply them to Turkish, which is an agglutinative language, and examine some properties of the language. Extensive tests were performed and success rates of about 98% for Turkish and 99% for English were obtained. It has been shown that the time complexity can be reduced significantly without sacrificing performance.
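The idea of combining n-grams with a first n-words cutoff can be sketched as below. This is a simplified illustration under assumptions, not the paper's class general or e-mail specific model: it uses word bigrams, add-one smoothing, and a hypothetical cutoff parameter `n_words`; all function names are invented for the example.

```python
import math
from collections import Counter


def first_words(text, n_words):
    """Keep only the first n_words tokens of an e-mail."""
    return text.lower().split()[:n_words]


def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))


def train(emails, n_words):
    """emails: list of (text, label). Returns bigram counts per label."""
    counts = {}
    for text, label in emails:
        grams = bigrams(first_words(text, n_words))
        counts.setdefault(label, Counter()).update(grams)
    return counts


def classify(text, counts, n_words):
    """Pick the label whose smoothed bigram model scores the text highest."""
    grams = bigrams(first_words(text, n_words))
    vocab_size = len(set().union(*counts.values())) or 1

    def log_score(label):
        label_counts = counts[label]
        total = sum(label_counts.values())
        # Add-one smoothing so unseen bigrams do not zero out the score.
        return sum(math.log((label_counts[g] + 1) / (total + vocab_size))
                   for g in grams)

    return max(counts, key=log_score)


# Toy training set (illustrative only).
emails = [
    ("win free money now click here", "spam"),
    ("free money waiting claim now", "spam"),
    ("meeting agenda for monday attached", "normal"),
    ("please review the agenda for monday", "normal"),
]
model = train(emails, n_words=4)
```

Because only the first few words are processed, both training and classification cost are bounded by the cutoff rather than by the message length, which is the source of the time savings the abstract refers to.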