Publication


Featured research published by Margarita Kotti.


Signal Processing | 2008

Review: Speaker segmentation and clustering

Margarita Kotti; Vassiliki Moschou; Constantine Kotropoulos

This survey focuses on two challenging speech processing topics, namely speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. A comparative assessment of the reviewed algorithms is undertaken: their advantages and disadvantages are indicated, insight into the algorithms is offered, and conclusions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering.


International Conference on Acoustics, Speech, and Signal Processing | 2006

Musical Instrument Classification using Non-Negative Matrix Factorization Algorithms and Subset Feature Selection

Emmanouil Benetos; Margarita Kotti; Constantine Kotropoulos

In this paper, a class of algorithms for automatic classification of individual musical instrument sounds is presented. Several perceptual features used in sound classification applications, as well as MPEG-7 descriptors, were measured for 300 sound recordings consisting of 6 different musical instrument classes. Subsets of the feature set are selected using branch-and-bound search to obtain the features most suitable for classification. A class of classifiers is developed based on non-negative matrix factorization (NMF). The standard NMF method is examined, as well as its modifications: the local, the sparse, and the discriminant NMF. The experimental results compare feature subsets of varying sizes alongside the various NMF algorithms. It has been found that a subset containing the mean and the variance of the first mel-frequency cepstral coefficient and the AudioSpectrumFlatness descriptor, along with the means of the AudioSpectrumEnvelope and AudioSpectrumSpread descriptors, yields an accuracy exceeding 95% when fed to a standard NMF classifier.
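
Branch-and-bound selection finds the best size-d feature subset without exhaustive enumeration, provided the selection criterion is monotone (removing a feature never increases it). The following is a minimal Python sketch under that assumption. The function name `branch_and_bound` and the additive per-feature score in the usage example are hypothetical, since the abstract does not state which criterion the authors used.

```python
from typing import Callable, FrozenSet, Optional, Tuple


def branch_and_bound(
    n_features: int,
    d: int,
    criterion: Callable[[FrozenSet[int]], float],
) -> Tuple[Optional[FrozenSet[int]], float]:
    """Return the size-d subset of range(n_features) that maximizes a monotone criterion.

    Monotone means J(S) <= J(T) whenever S is a subset of T, so once a node scores
    no better than the best complete subset found so far, everything below it is pruned.
    """
    best_subset: Optional[FrozenSet[int]] = None
    best_score = float("-inf")

    def search(subset: FrozenSet[int], next_removable: int) -> None:
        nonlocal best_subset, best_score
        score = criterion(subset)
        if score <= best_score:
            return  # prune: removing more features can only lower the score
        if len(subset) == d:
            best_subset, best_score = subset, score
            return
        # Remove features in increasing index order so each subset is visited once.
        for f in sorted(subset):
            if f >= next_removable:
                search(subset - {f}, f + 1)

    search(frozenset(range(n_features)), 0)
    return best_subset, best_score


# Hypothetical non-negative per-feature scores; their sum is a monotone criterion.
weights = [0.1, 0.9, 0.5, 0.7]
subset, score = branch_and_bound(4, 2, lambda s: sum(weights[i] for i in s))
```

Each size-d subset is reached by removing its complement in increasing index order, so no subset is evaluated twice, and a branch is abandoned as soon as its score falls to or below the best complete subset found so far.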


International Conference on Pattern Recognition | 2008

Gender classification in two Emotional Speech databases

Margarita Kotti; Constantine Kotropoulos

Gender classification is a challenging problem with applications in speaker indexing, speaker recognition, speaker diarization, annotation and retrieval of multimedia databases, voice synthesis, smart human-computer interaction, biometrics, social robots, etc. Although it has been studied for more than thirty years, it is by no means a solved problem. Identifying a speaker's gender from emotional speech makes the problem even more interesting. A large pool of 1379 features is created, including 605 novel features. A branch-and-bound feature selection algorithm is applied to select a subset of 15 features among the 1379 originally extracted. Support vector machines with various kernels are tested as gender classifiers on two databases, namely the Berlin Database of Emotional Speech and the Danish Emotional Speech database. The reported classification results outperform those obtained by state-of-the-art techniques, since perfect classification accuracy is obtained.
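
To illustrate what an SVM gender classifier does with a selected feature subset, here is a minimal sketch of a linear soft-margin SVM trained with the Pegasos stochastic sub-gradient method. This is a stand-in, not the paper's setup: the authors test SVMs with various kernels, while this sketch is linear only, and the trainer, the regularization weight `lam`, and the two-feature toy data (labels +1/-1, e.g. one per gender) are assumptions.

```python
import random
from typing import List, Sequence


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    epochs: int = 50,
    seed: int = 0,
) -> List[float]:
    """Pegasos training of a linear soft-margin SVM; labels must be +1 or -1.

    A constant 1.0 is appended to every sample, so the last weight acts as the
    bias (and is regularized together with the other weights).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # step on the L2 term
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical toy data: two already-selected features per utterance.
X = [[3, 3], [4, 2], [2, 4], [-3, -3], [-4, -2], [-2, -4]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
```

A kernel SVM would replace the dot products with kernel evaluations; the linear case is shown only because it fits in a few self-contained lines.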


IEEE Transactions on Audio, Speech, and Language Processing | 2009

Robust Detection of Phone Boundaries Using Model Selection Criteria With Few Observations

George Almpanidis; Margarita Kotti; Constantine Kotropoulos

Automatic phone segmentation techniques based on model selection criteria are studied. We investigate the phone boundary detection efficiency of entropy- and Bayesian-based model selection criteria in continuous speech, based on the DISTBIC hybrid segmentation algorithm. DISTBIC is a text-independent, bottom-up approach that identifies sequential model changes by combining metric distances with statistical hypothesis testing. Using robust statistics and small-sample corrections in the baseline DISTBIC algorithm significantly improves phone boundary detection accuracy while reducing false alarms. We also demonstrate further improvement in phonemic segmentation by taking into account how the model parameters are related, both in the probability density functions of the underlying hypotheses and in the model selection via the information complexity criterion, and by employing M-estimators of the model parameters. The proposed DISTBIC variants are tested on the NTIMIT database, and the achieved F1 measure is 74.7% using a 20-ms tolerance in phonemic segmentation.
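
An F1 measure with a 20-ms tolerance depends on how detected boundaries are matched to reference boundaries. The abstract does not spell out the matching rule, so the sketch below assumes a greedy one-to-one matching in time order, where a detection counts as a hit if it lies within the tolerance of a still-unmatched reference boundary. The function name and the boundary times in the example are hypothetical.

```python
from typing import List, Sequence, Tuple


def boundary_scores(
    reference: Sequence[float], detected: Sequence[float], tol: float = 0.020
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for boundary times in seconds with a +/- tol window.

    Detections are matched greedily in time order; each reference boundary can be
    claimed by at most one detection, so duplicate detections count as false alarms.
    """
    ref = sorted(reference)
    used: List[bool] = [False] * len(ref)
    hits = 0
    for b in sorted(detected):
        for i, r in enumerate(ref):
            if not used[i] and abs(b - r) <= tol:
                used[i] = True
                hits += 1
                break
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical reference and detected phone boundaries (seconds).
p, r, f1 = boundary_scores([0.10, 0.25, 0.40, 0.62], [0.11, 0.26, 0.50])
```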


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Computationally Efficient and Robust BIC-Based Speaker Segmentation

Margarita Kotti; Emmanouil Benetos; Constantine Kotropoulos

An algorithm for automatic speaker segmentation based on the Bayesian information criterion (BIC) is presented. BIC tests are not performed for every window shift, as in previous approaches, but only when a speaker change is most likely to occur. This is done by estimating the next probable change point with a model of utterance durations. The inverse Gaussian is found to fit the distribution of utterance durations best. As a result, fewer BIC tests are needed, making the proposed system less demanding in computation time and memory, and considerably more efficient with respect to missed speaker change points. A feature selection algorithm based on a branch-and-bound search strategy is applied in order to identify the most efficient features for speaker segmentation. Furthermore, a new theoretical formulation of BIC is derived by applying centering and simultaneous diagonalization. This formulation is considerably more computationally efficient than the standard BIC when the covariance matrices are estimated by estimators other than the usual maximum-likelihood ones. Two commonly used pairs of figures of merit are employed and their relationship is established. Computational efficiency is achieved through the speaker utterance modeling, whereas robustness is achieved by feature selection and by applying BIC tests at appropriately selected time instants. Experimental results indicate that the proposed modifications yield superior performance compared to existing approaches.
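
The utterance-duration model mentioned above can be fitted in closed form. For an inverse Gaussian, the maximum-likelihood estimates are the sample mean for mu and 1/lambda = (1/n) * sum(1/x_i - 1/mu). The density's mode is one natural guess for the most probable utterance duration. The sketch below assumes this approach. The function names and the durations are hypothetical, and using the mode (rather than, say, the mean) to schedule the next BIC test is an illustrative assumption, not necessarily the paper's rule.

```python
import math
from typing import Sequence, Tuple


def fit_inverse_gaussian(durations: Sequence[float]) -> Tuple[float, float]:
    """Maximum-likelihood (mu, lam) of an inverse Gaussian fitted to positive durations.

    Requires at least two distinct values; otherwise the estimate of 1/lam is zero.
    """
    n = len(durations)
    mu = sum(durations) / n
    inv_lam = sum(1.0 / x - 1.0 / mu for x in durations) / n
    return mu, 1.0 / inv_lam


def inverse_gaussian_pdf(x: float, mu: float, lam: float) -> float:
    return math.sqrt(lam / (2 * math.pi * x ** 3)) * math.exp(
        -lam * (x - mu) ** 2 / (2 * mu ** 2 * x)
    )


def inverse_gaussian_mode(mu: float, lam: float) -> float:
    """Most probable duration: mu * (sqrt(1 + 9 mu^2 / (4 lam^2)) - 3 mu / (2 lam))."""
    ratio = mu / lam
    return mu * (math.sqrt(1 + 2.25 * ratio ** 2) - 1.5 * ratio)


# Hypothetical utterance durations (seconds) from already-detected segments.
mu, lam = fit_inverse_gaussian([2.0, 3.0, 4.0])
next_test_offset = inverse_gaussian_mode(mu, lam)
```

In a segmentation loop, one could for example place the next BIC test at the last detected change point plus `next_test_offset` instead of at every window shift.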


International Symposium on Circuits and Systems | 2006

Musical instrument classification using non-negative matrix factorization algorithms

Emmanouil Benetos; Margarita Kotti; Constantine Kotropoulos

In this paper, a class of algorithms for automatic classification of individual musical instrument sounds is presented. Several perceptual features used in general sound classification applications were measured for 300 sound recordings consisting of 6 different musical instrument classes (piano, violin, cello, flute, bassoon, and soprano saxophone). In addition, MPEG-7 basic spectral and spectral basis descriptors were considered, providing an effective combination for accurately describing the spectral and timbral audio characteristics. The audio files were split, with 70% of the available data used for training and the remaining 30% for testing. A classifier was developed based on non-negative matrix factorization (NMF) techniques, thus introducing a novel application of NMF. The standard NMF method was examined, as well as its modifications: the local, the sparse, and the discriminant NMF. Experimental results are presented that compare MPEG-7 spectral basis representations with MPEG-7 basic spectral features alongside the various NMF algorithms. The results indicate that using the spectrum projection coefficients for feature extraction together with the standard NMF classifier yields an accuracy exceeding 95%.


International Symposium on Circuits and Systems | 2006

Automatic speaker change detection with the Bayesian information criterion using MPEG-7 features and a fusion scheme

Margarita Kotti; Emmanouil Benetos; Constantine Kotropoulos

This paper addresses unsupervised speaker change detection, a necessary step for several indexing tasks. We assume that there is no prior knowledge of either the number of speakers or their identities. Features included in the MPEG-7 audio prototype, such as the AudioWaveformEnvelope and the AudioSpectrumCentroid, are investigated. The model selection criterion is the Bayesian information criterion (BIC). A multiple-pass algorithm is proposed. It uses dynamic thresholding for scalar features and a fusion scheme to refine the segmentation results. It also models every speaker by a multivariate Gaussian probability density function, and whenever new information is available, the respective model is updated. The experiments are carried out on a dataset created by concatenating speakers from the TIMIT database, referred to as the TIMIT data set. It is demonstrated that the proposed multiple-pass algorithm performs better than other approaches.
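
A BIC change test compares one Gaussian model for a window against two models split at a candidate point, and accepts a change when Delta-BIC > 0. The sketch below implements the widely used Delta-BIC formulation for a scalar feature, so the penalty counts one extra mean and one extra variance, and it adds a penalty weight `lam`. It does not reproduce the multivariate speaker models, dynamic thresholding, or fusion scheme described above. The names `detect_change` and `margin` and the synthetic feature track are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def _log_ml_variance(xs: Sequence[float]) -> float:
    """Log of the maximum-likelihood (biased) variance of a segment."""
    mean = sum(xs) / len(xs)
    return math.log(sum((x - mean) ** 2 for x in xs) / len(xs))


def delta_bic(xs: Sequence[float], t: int, lam: float = 1.0) -> float:
    """Delta-BIC for splitting xs into xs[:t] and xs[t:] with 1-D Gaussian models.

    Positive values favour two models (a change point at t) over one model.
    """
    n, d = len(xs), 1
    # Extra parameters of the two-model hypothesis: d means and d(d+1)/2 covariances.
    penalty = 0.5 * (d + d * (d + 1) / 2) * math.log(n)
    gain = 0.5 * (
        n * _log_ml_variance(xs)
        - t * _log_ml_variance(xs[:t])
        - (n - t) * _log_ml_variance(xs[t:])
    )
    return gain - lam * penalty


def detect_change(
    xs: Sequence[float], margin: int = 5, lam: float = 1.0
) -> Tuple[Optional[int], float]:
    """Best split index if its Delta-BIC is positive, else None; margin keeps segments non-trivial."""
    candidates = range(margin, len(xs) - margin + 1)
    best_t = max(candidates, key=lambda t: delta_bic(xs, t, lam))
    score = delta_bic(xs, best_t, lam)
    return (best_t if score > 0 else None), score


# Hypothetical scalar feature track: two "speakers" with different means.
seg1 = [0.3 * ((i * 7) % 5 - 2) for i in range(60)]
seg2 = [4.0 + 0.3 * ((i * 3) % 5 - 2) for i in range(60)]
change, score = detect_change(seg1 + seg2)
```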


Neurocomputing | 2007

A neural network approach to audio-assisted movie dialogue detection

Margarita Kotti; Emmanouil Benetos; Constantine Kotropoulos; Ioannis Pitas

A novel framework for audio-assisted dialogue detection based on indicator functions and neural networks is investigated. An indicator function indicates whether an actor is present at a particular time instant. The cross-correlation function of a pair of indicator functions and the magnitude of the corresponding cross-power spectral density are fed as input to neural networks for dialogue detection. Several types of artificial neural networks, including multilayer perceptrons (MLPs), voted perceptrons, radial basis function networks, support vector machines, and particle swarm optimization-based MLPs, are tested. Experiments are carried out to validate the feasibility of the approach using ground-truth indicator functions determined by human observers on six different movies. A total of 41 dialogue instances and 20 non-dialogue instances are employed. The average detection accuracy achieved is high, ranging between 84.78% ± 5.499% and 91.43% ± 4.239%.
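
The input features named above can be computed directly from two binary indicator sequences. The sketch below is one possible reading, not the authors' code. It assumes one indicator sample per frame, uses the biased cross-correlation over all lags, and takes the magnitude of its DFT as the cross-power spectral density estimate. The sequence length, the indicator values, and the final `features` vector are hypothetical, and no neural network is trained here.

```python
import cmath
from typing import List, Sequence


def cross_correlation(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Biased cross-correlation r[k] = (1/N) * sum_n a[n + k] * b[n] for lags k = -(N-1) .. N-1.

    The returned list is indexed by k + N - 1, so lag 0 sits in the middle.
    """
    n = len(a)
    return [
        sum(a[i + k] * b[i] for i in range(n) if 0 <= i + k < n) / n
        for k in range(-(n - 1), n)
    ]


def cpsd_magnitude(r: Sequence[float]) -> List[float]:
    """Magnitude of the DFT of the cross-correlation sequence.

    Starting the sum at the most negative lag instead of lag 0 only adds a linear
    phase term, so the magnitude is unaffected.
    """
    m = len(r)
    return [
        abs(sum(r[k] * cmath.exp(-2j * cmath.pi * f * k / m) for k in range(m)))
        for f in range(m)
    ]


# Hypothetical per-frame indicator functions: two actors taking turns.
actor_a = [1, 1, 0, 0, 1, 1, 0, 0]
actor_b = [0, 0, 1, 1, 0, 0, 1, 1]
r = cross_correlation(actor_a, actor_b)
spectrum = cpsd_magnitude(r)
features = r + spectrum  # hypothetical input vector for a dialogue classifier
```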


International Conference on Multimedia and Expo | 2006

Automatic Speaker Segmentation using Multiple Features and Distance Measures: A Comparison of Three Approaches

Margarita Kotti; Luis Gustavo P. M. Martins; Emmanouil Benetos; Jaime S. Cardoso; Constantine Kotropoulos

This paper addresses the problem of unsupervised speaker change detection. Three systems based on the Bayesian information criterion (BIC) are tested. The first system investigates the AudioSpectrumCentroid and AudioWaveformEnvelope features, implements dynamic thresholding followed by a fusion scheme, and finally applies BIC. The second method is a real-time one that uses a metric-based approach employing line spectral pairs and the BIC to validate a potential speaker change point. The third method consists of three modules: the first uses a measure based on second-order statistics, the second applies the Euclidean distance and the Hotelling T² statistic, and the third utilizes the BIC. The experiments are carried out on a dataset created by concatenating speakers from the TIMIT database, referred to as the TIMIT data set. The performance of the three systems is compared using t-statistics.
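
The Hotelling T² statistic used in the second module measures how far apart the means of two adjacent analysis windows are, relative to their pooled covariance: T² = (n1 n2 / (n1 + n2)) (m1 - m2)^T S^-1 (m1 - m2). The minimal sketch below assumes 2-D feature vectors so that the covariance inverse can be written in closed form. With more features, a general matrix inverse or linear solve would be needed. The window contents are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _mean(xs: Sequence[Point]) -> Point:
    n = len(xs)
    return (sum(x[0] for x in xs) / n, sum(x[1] for x in xs) / n)


def _scatter(xs: Sequence[Point], m: Point) -> Tuple[float, float, float]:
    """Entries (s00, s01, s11) of the scatter matrix sum (x - m)(x - m)^T."""
    s00 = sum((x[0] - m[0]) ** 2 for x in xs)
    s01 = sum((x[0] - m[0]) * (x[1] - m[1]) for x in xs)
    s11 = sum((x[1] - m[1]) ** 2 for x in xs)
    return s00, s01, s11


def hotelling_t2(w1: Sequence[Point], w2: Sequence[Point]) -> float:
    """Two-sample Hotelling T^2 between two windows of 2-D feature vectors (pooled covariance)."""
    n1, n2 = len(w1), len(w2)
    m1, m2 = _mean(w1), _mean(w2)
    a, b = _scatter(w1, m1), _scatter(w2, m2)
    dof = n1 + n2 - 2
    c00, c01, c11 = ((a[i] + b[i]) / dof for i in range(3))
    det = c00 * c11 - c01 ** 2  # must be non-zero (non-degenerate covariance)
    d0, d1 = m1[0] - m2[0], m1[1] - m2[1]
    # d^T C^{-1} d with the closed-form inverse of a symmetric 2x2 matrix
    quad = (c11 * d0 * d0 - 2 * c01 * d0 * d1 + c00 * d1 * d1) / det
    return n1 * n2 / (n1 + n2) * quad


# Hypothetical adjacent windows whose means differ by 2 along the first feature.
left = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
right = [(2.0, 0.0), (3.0, 0.0), (2.0, 1.0), (3.0, 1.0)]
t2 = hotelling_t2(left, right)
```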


International Conference on Multimedia and Expo | 2006

Applying Supervised Classifiers Based on Non-negative Matrix Factorization to Musical Instrument Classification

Emmanouil Benetos; Margarita Kotti; Constantine Kotropoulos

In this paper, a new approach for automatic audio classification using non-negative matrix factorization (NMF) is presented. Training is performed on each audio class individually, whilst during the test phase each test recording is projected onto the several training matrices. Experiments demonstrating the efficiency of the proposed approach were performed for musical instrument classification. Several perceptual features as well as MPEG-7 descriptors were measured for 300 sound recordings consisting of 6 different musical instrument classes. Subsets of the feature set were selected using branch-and-bound search in order to obtain the most discriminating features for classification. Several NMF techniques were utilized, namely the standard NMF method, the local NMF, and the sparse NMF. The experiments demonstrate almost perfect classification (classification error 1.0%), outperforming the state-of-the-art techniques tested in the same experiment.
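
The abstract states that training is performed on each class separately and that each test recording is projected onto the per-class training matrices. The sketch below fills in the remaining details with assumptions. It uses standard NMF with Lee and Seung's multiplicative updates for the Euclidean cost, projects the test vector with the basis W held fixed, and applies a minimum-reconstruction-error decision rule. The local and sparse variants are not shown, and the toy data, the rank, and all function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence

Matrix = List[List[float]]
EPS = 1e-9  # guards the multiplicative updates against division by zero


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def nmf_basis(v: Matrix, rank: int, iters: int = 200, seed: int = 0) -> Matrix:
    """Standard NMF, V ~ W H (features x samples), via multiplicative updates; returns W."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(m)] for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(rank)] for i in range(n)]
    return w


def reconstruction_error(w: Matrix, x: Sequence[float], iters: int = 200) -> float:
    """Project x onto the fixed basis W (non-negative coefficients h); return ||x - W h||^2."""
    wt = transpose(w)
    wtx = matmul(wt, [[xi] for xi in x])
    wtw = matmul(wt, w)
    h = [[1.0] for _ in range(len(wt))]
    for _ in range(iters):
        den = matmul(wtw, h)
        h = [[h[i][0] * wtx[i][0] / (den[i][0] + EPS)] for i in range(len(h))]
    recon = matmul(w, h)
    return sum((xi - recon[i][0]) ** 2 for i, xi in enumerate(x))


def classify(bases: Dict[str, Matrix], x: Sequence[float]) -> str:
    """Assign x to the class whose basis reconstructs it best."""
    return min(bases, key=lambda label: reconstruction_error(bases[label], x))


# Hypothetical 4-dimensional non-negative feature vectors for two instrument classes.
train = {
    "A": [[5.0, 3.0, 0.0, 0.0], [4.0, 4.0, 0.0, 0.0], [6.0, 2.0, 0.0, 0.0]],
    "B": [[0.0, 0.0, 4.0, 5.0], [0.0, 0.0, 2.0, 6.0], [0.0, 0.0, 5.0, 4.0]],
}
# One basis per class, trained on that class only (features x samples).
bases = {label: nmf_basis(transpose(xs), rank=1) for label, xs in train.items()}
```

Training each class in isolation keeps every basis specific to one instrument. The price is one projection per class at test time.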

Collaboration


Margarita Kotti's collaborators.

Top Co-Authors

Constantine Kotropoulos

Aristotle University of Thessaloniki

Emmanouil Benetos

Queen Mary University of London

Ioannis Pitas

Aristotle University of Thessaloniki

Konstantinos I. Diamantaras

Aristotle University of Thessaloniki

Vassiliki Moschou

Aristotle University of Thessaloniki

Petros Maragos

National Technical University of Athens

Enrica Papi

Imperial College London
