Mustafa Sert
Başkent University
Publications
Featured research published by Mustafa Sert.
advances in multimedia | 2009
Ebru Doğan; Mustafa Sert; Adnan Yazici
This paper describes the development of an automated solution for the classification and segmentation of broadcast news audio. A sound stream is segmented by classifying each sub-segment into silence, pure speech, music, environmental sound, speech over music, and speech over environmental sound classes in multiple steps. Support Vector Machines and Hidden Markov Models are employed for classification, and these models are trained using different sets of MPEG-7 features. A series of tests was conducted on hand-labeled audio tracks of TRECVID broadcast news to evaluate the performance of the MPEG-7 features and the selected classification methods in the proposed solution. The results obtained from our experiments clearly demonstrate that classification of mixed-type audio data using the Audio Spectrum Centroid, Audio Spectrum Spread, and Audio Spectrum Flatness features achieves considerably high accuracy rates in the news domain.
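The three spectral descriptors named in the abstract can be approximated roughly as follows. This is a minimal, linear-frequency sketch in numpy (the MPEG-7 standard actually defines these descriptors over log-frequency bands, and the function name is ours, not from the paper):

```python
import numpy as np

def spectral_features(frame, sr):
    """Rough centroid, spread, and flatness of one analysis frame.

    Approximations of the MPEG-7 Audio Spectrum Centroid, Spread, and
    Flatness descriptors; the standard uses log-frequency bands, so this
    linear-frequency version is only illustrative.
    """
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    power = mag ** 2 + 1e-12                      # floor to avoid log(0)
    centroid = np.sum(freqs * power) / np.sum(power)
    spread = np.sqrt(np.sum((freqs - centroid) ** 2 * power) / np.sum(power))
    # flatness: geometric mean over arithmetic mean of the power spectrum
    flatness = np.exp(np.mean(np.log(power))) / np.mean(power)
    return centroid, spread, flatness
```

A pure tone yields a centroid near its frequency and flatness near zero, while broadband noise pushes flatness toward one, which is why flatness helps separate music-like from noise-like segments.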
ieee international conference semantic computing | 2015
Selver Ezgi Kucukbay; Mustafa Sert
Audio data contains several sounds and is an important source for multimedia applications. One such category is unstructured Environmental Sounds (also referred to as audio events), which have noise-like characteristics with flat spectrums. Therefore, recognition methods applied to music and speech data are generally not appropriate for Environmental Sounds. In this paper, we propose an MFCC-SVM based approach that exploits feature representation and learner optimization for efficient recognition of audio events from audio signals. The proposed approach considers efficient representation of MFCC features using different window and hop sizes and different numbers of Mel coefficients in the analyses, as well as optimization of the SVM parameters. Sixteen different audio events from the IEEE Audio and Acoustic Signal Processing (AASP) Challenge Dataset, namely alert, clear throat, cough, door slam, drawer, keyboard, keys, knock, laughter, mouse, page turn, pen drop, phone, printer, speech, and switch, collected from live office environments, are utilized in the evaluations. Our empirical evaluations show that, with the best-performing MFCC feature configuration and SVM classifier, 5-fold cross-validation yields Precision, Recall, and F-measure scores of 62%, 58%, and 55%, respectively. Extensive experiments on audio-based event detection using the IEEE AASP Challenge dataset show the effectiveness of the proposed approach.
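The window and hop sizes searched over in this kind of study control the time resolution of the feature sequence before any MFCCs are computed. A minimal framing sketch (the helper name is ours) shows how two common settings change the number of analysis frames for one second of 16 kHz audio:

```python
import numpy as np

def frame_signal(x, win_size, hop_size):
    """Split a 1-D signal into overlapping analysis frames."""
    n_frames = 1 + max(0, (len(x) - win_size) // hop_size)
    return np.stack([x[i * hop_size:i * hop_size + win_size]
                     for i in range(n_frames)])

# one second at 16 kHz under two window/hop settings
x = np.zeros(16000)
print(frame_signal(x, 400, 160).shape)   # 25 ms window, 10 ms hop -> (98, 400)
print(frame_signal(x, 800, 400).shape)   # 50 ms window, 25 ms hop -> (39, 800)
```

Each frame would then be mapped to a vector of Mel coefficients, so the window/hop grid directly trades temporal detail against the number of feature vectors the SVM must handle.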
international symposium on multimedia | 2013
Çiğdem Okuyucu; Mustafa Sert; Adnan Yazici
Environmental sounds (ES) have characteristics, such as an unstructured nature and typically noise-like, flat spectrums, that make the recognition task difficult compared to speech or music sounds. Here, we perform an exhaustive feature and classifier analysis for the recognition of considerably similar ES categories and propose a best representative feature set to yield higher recognition accuracy. In the experiments, thirteen (13) ES categories, namely emergency alarm, car horn, gun, explosion, automobile, helicopter, water, wind, rain, applause, crowd, and laughter, are detected and tested based on eleven (11) audio features (the MPEG-7 family, ZCR, MFCC, and their combinations) using the HMM and SVM classifiers. Extensive experiments have been conducted to demonstrate the effectiveness of these joint features for ES classification. Our experiments show that the joint feature set ASFCS-H (Audio Spectrum Flatness, Centroid, Spread, and Audio Harmonicity) is the best representative feature set, with an average F-measure value of 80.6%.
international symposium on consumer electronics | 2006
Mustafa Sert; Buyurman Baykal; Adnan Yazici
An audio fingerprinting system deals with four challenging tasks: robustness, reliability, compactness, and scalability. While preserving the others, we explore the compactness and robustness aspects of audio fingerprinting systems and propose a description and storage model based on structural analysis of audio clips. The proposed method constructs the fingerprint from the most representative section of an audio clip. Contrary to similar studies, there is no need to construct and store the fingerprints of every frame in the database; only one fingerprint per clip is sufficient. We make use of the audio spectrum flatness (ASF) and the audio signature (AS) features of the MPEG-7 standard, which are new to the audio feature family and have not been considered as much as other feature types. The fingerprints are stored in XML form, thus providing interoperability on a world-wide scale. XML-based representation of fingerprints is particularly suitable for portable devices such as PDAs and mobile phones due to transport considerations. The proposed approach is evaluated on a test bed consisting of 540 musical clips based on the MPEG-7 features. The well-known MFCC feature set is also considered in the experiments for the evaluation of features.
international conference on multimedia and expo | 2006
Mustafa Sert; Buyurman Baykal; Adnan Yazici
We present a novel algorithm for structural analysis of audio to detect repetitive patterns, which is suitable for content-based audio information retrieval systems, since repetitive patterns can provide valuable information about the content of audio, such as a chorus or a concept. The audio spectrum flatness (ASF) feature of the MPEG-7 standard, although not having been considered as much as other feature types, is utilized and evaluated as the underlying feature set. Expressive summaries are chosen as the longest patterns found by the k-means clustering algorithm. The proposed approach is evaluated on a test bed consisting of popular song and speech clips based on the ASF feature. The well-known Mel frequency cepstral coefficients (MFCCs) are also considered in the experiments for the evaluation of features. Experiments show that all the repetitive patterns and their locations are obtained with accuracies of 93% and 78% for music and speech, respectively.
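The idea of clustering frame features with k-means and picking the longest pattern can be sketched as follows. This is our simplified reading, not the paper's actual algorithm: cluster the per-frame feature vectors, then take the longest run of one cluster label as the most repetitive section (both helper names are ours):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means on frame feature vectors; returns a label per frame."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if np.any(labels == j):          # skip empty clusters
                centers[j] = X[labels == j].mean(0)
    return labels

def longest_run(labels):
    """Longest run of one label: (length, start_index, label)."""
    best, start = (0, 0, labels[0]), 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if i - start > best[0]:
                best = (i - start, start, labels[start])
            start = i
    return best
```

With frame-level ASF vectors as `X`, a long run of one label marks a stretch of frames with near-identical spectra, which is a plausible candidate for a chorus-like summary.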
international symposium on consumer electronics | 2004
Mustafa Sert; Buyurman Baykal
In this study, we have developed a web-based query engine called AudioCBR to enable content-based and semantic queries on auditory data. The interface is able to answer Query-by-Example (QBE) and textual queries as in traditional Information Retrieval (IR) systems. Relevance feedback, which is missing in many similar systems, is also covered in our system. In QBE queries, the matching process is performed based on pre-selected low-level audio features, which are standardized in MPEG-7. Semantic queries are performed in the form of textual queries by using object and event concepts, as well as their temporal and conceptual relationships for an audio item. The originality of our approach to retrieval relies on the provided user interface, the form of description, and the utilized data model. The user interface is an important aspect in emerging fields like audio IR for retrieving desired elements. Therefore, we describe some new graphical user interfaces that accommodate different modes of interaction with the user. In order to extract the audio semantics and low-level features from an audio item, an annotation tool is introduced. Finally, we give examples of the semantic queries that our system supports. Index Terms: audio annotation tool, audio data model, MPEG-7, query interface for audio-IR.
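At its core, a QBE match over pre-selected low-level features reduces to ranking database clips by distance to the example's feature vector. A minimal sketch under that assumption (the function name and the use of Euclidean distance are ours; the paper does not specify its distance measure):

```python
import numpy as np

def qbe_rank(query_vec, db_vecs):
    """Rank database clips by Euclidean distance to the example's features.

    query_vec: 1-D feature vector of the example clip
    db_vecs:   2-D array, one feature vector per database clip
    Returns clip indices, nearest first.
    """
    dists = np.linalg.norm(db_vecs - query_vec, axis=1)
    return np.argsort(dists)

db = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
print(qbe_rank(np.array([0.9, 1.1]), db))   # nearest clip is index 1
```

Textual semantic queries, by contrast, would be resolved against the annotated object/event concepts rather than against feature distances.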
flexible query answering systems | 2016
Hilal Ergun; Mustafa Sert
The recent burst of multimedia content available on the Internet is pushing expectations of multimedia retrieval systems to even higher grounds. Multimedia retrieval systems should offer better performance both in terms of speed and memory consumption while maintaining good accuracy compared to state-of-the-art implementations. In this paper, we discuss alternative implementations of visual object retrieval systems based on the popular bag-of-words model and show an optimal selection of processing steps. We demonstrate our approach using both keyword- and example-based retrieval queries on three frequently used benchmark databases, namely Oxford, Paris, and Pascal VOC 2007. Additionally, we investigate the effect of different distance comparison metrics on retrieval accuracy. Results show that relatively simple but efficient vector quantization can compete with more sophisticated feature encoding schemes when combined with the adapted inverted index structure.
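The two ingredients the abstract highlights, hard vector quantization and an inverted index, combine as sketched below. This is a generic bag-of-words skeleton under our own naming, not the paper's adapted index: local descriptors are assigned to their nearest codebook word, an index maps each word to the images containing it, and a query is scored by shared words:

```python
import numpy as np
from collections import defaultdict

def quantize(descriptors, codebook):
    """Hard VQ: assign each local descriptor to its nearest visual word."""
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(1)

def build_inverted_index(image_words):
    """Map each visual word id to the set of image ids containing it."""
    index = defaultdict(set)
    for img_id, words in image_words.items():
        for w in words:
            index[int(w)].add(img_id)
    return index

def query(index, query_words):
    """Score images by the number of distinct query words they share."""
    scores = defaultdict(int)
    for w in set(int(w) for w in query_words):
        for img_id in index.get(w, ()):
            scores[img_id] += 1
    return sorted(scores, key=scores.get, reverse=True)
```

Real systems add tf-idf weighting and larger codebooks, but the inverted index is what keeps query time sublinear in collection size, which is the speed/memory trade-off the paper examines.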
International Journal of Semantic Computing | 2016
Hilal Ergun; Yusuf Caglar Akyuz; Mustafa Sert; Jianquan Liu
Visual concept recognition has been an active research field over the last decade. Reflecting this attention, deep learning architectures are showing great promise in various computer vision domains, including image classification, object detection, event detection, and action recognition in videos. In this study, we investigate various aspects of convolutional neural networks for visual concept recognition. We analyze recent studies and different network architectures both in terms of running time and accuracy. In our proposed visual concept recognition system, we first discuss various important properties of the popular convolutional network architectures under consideration. Then we describe our method for feature extraction at different levels of abstraction. We present extensive empirical information along with best practices for big data practitioners. Using these best practices, we propose efficient fusion mechanisms for both single and multiple network models. We present state-of-the-art results on benchmark datasets while keeping computational costs at a low level. Our results show that these state-of-the-art results can be reached without using extensive data augmentation techniques.
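One common form of multi-network fusion, which we sketch here only as an illustration of the general idea (the paper's actual fusion mechanisms may differ), is late fusion: L2-normalize the feature vector from each network and average the results into a single representation:

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    return v / (np.linalg.norm(v) + eps)

def late_fuse(feature_vectors, weights=None):
    """Average L2-normalized features from several networks into one vector.

    Normalizing first keeps one network's larger activation scale from
    dominating the fused representation.
    """
    feats = [l2_normalize(np.asarray(f, dtype=float)) for f in feature_vectors]
    if weights is None:
        weights = np.ones(len(feats)) / len(feats)
    fused = sum(w * f for w, f in zip(weights, feats))
    return l2_normalize(fused)
```

The fused vector can then be fed to a linear classifier per concept, which keeps the added cost of fusion negligible next to the forward passes themselves.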
Multimedia Tools and Applications | 2018
Adnan Yazici; Murat Koyuncu; Turgay Yilmaz; Saeid Sattari; Mustafa Sert; Elvan Gulen
This paper introduces an intelligent multimedia information system that exploits machine learning and database technologies. The system automatically extracts the semantic contents of videos using the visual, auditory, and textual modalities, and then stores the extracted contents in an appropriate format to retrieve them efficiently in subsequent requests for information. The semantic contents are extracted from these three modalities separately. Afterwards, the outputs from the modalities are fused to increase the accuracy of the object extraction process. The semantic contents extracted through this information fusion are stored in an intelligent, fuzzy object-oriented database system. In order to answer user queries efficiently, a multidimensional indexing mechanism that combines the extracted high-level semantic information with low-level video features is developed. The proposed multimedia information system is implemented as a prototype, and its performance is evaluated using news video datasets for answering content- and concept-based queries considering all these modalities and their fused data. The performance results show that the developed multimedia information system is robust and scalable for large-scale multimedia applications.
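A simple way to picture decision-level fusion of the three modalities is a weighted sum of per-concept confidence scores. This is only an illustrative sketch under our own naming and weighting scheme, not the paper's fusion method:

```python
def fuse_modalities(scores, weights):
    """Weighted-sum fusion of per-concept confidences from several modalities.

    scores:  {modality: {concept: confidence in [0, 1]}}
    weights: {modality: weight}, assumed here to sum to 1
    Returns {concept: fused confidence}.
    """
    fused = {}
    for modality, concept_scores in scores.items():
        w = weights[modality]
        for concept, s in concept_scores.items():
            fused[concept] = fused.get(concept, 0.0) + w * s
    return fused

fused = fuse_modalities(
    {'visual': {'car': 0.8, 'speech': 0.1},
     'audio':  {'car': 0.4, 'speech': 0.9}},
    {'visual': 0.6, 'audio': 0.4})
print(fused)   # car boosted by vision, speech by audio
```

The fused confidences are what would then be stored alongside the low-level features for the combined high-level/low-level index described above.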
ieee international conference on multimedia big data | 2016
Hilal Ergun; Mustafa Sert
Deep learning architectures are showing great promise in various computer vision domains, including image classification, object detection, event detection, and action recognition. In this study, we investigate various aspects of convolutional neural networks (CNNs) from the big data perspective. We analyze recent studies and different network architectures both in terms of running time and accuracy. We present extensive empirical information along with best practices for big data practitioners. Using these best practices, we propose efficient fusion mechanisms for both single and multiple network models. We present state-of-the-art results on benchmark datasets while keeping computational costs at a lower level. Another contribution of our paper is showing that these state-of-the-art results can be reached without using extensive data augmentation techniques.