Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Veton Kepuska is active.

Publication


Featured research published by Veton Kepuska.


International Conference on Acoustics, Speech, and Signal Processing | 1989

Investigation of phonemic context in speech using self-organizing feature maps

Veton Kepuska; John N. Gowdy

Some experiments with a neural-network model based on the self-organizing feature map algorithm are described. The main problem in phonemic recognition is the overlapping of feature vectors due to the variability of speech and the coarticulation effect. This property of speech is reflected in the self-organized neural-network model in that a network unit can respond to more than one phonemic class. The authors have shown for their database that the sequence of responding units is consistent and similar for isolated utterances of the same word and distinct for different words. Thus, recognition can be based on network sequence identification. However, it is desirable that this sequence be somewhat simplified; toward this goal, they propose an algorithm for sequence smoothing. It is proposed that this network can be used as the feature-extraction stage of another neural network that learns the responding sequences as part of a speech recognition system.
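For readers who want to experiment with the approach, below is a minimal, self-contained sketch of a Kohonen self-organizing map, the responding-unit (best-matching-unit) sequence for an utterance, and a simple smoothing pass that collapses consecutive repeats. All parameters (grid size, learning-rate schedule, the smoothing rule) are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a 2-D self-organizing feature map (Kohonen SOM) on feature vectors."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.normal(size=(h, w, data.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            t = step / n_steps
            lr = lr0 * (1 - t)                  # decaying learning rate
            sigma = sigma0 * (1 - t) + 0.5      # shrinking neighborhood
            # best-matching unit (BMU): unit whose weight vector is closest to x
            d = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(d.argmin(), d.shape)
            # Gaussian neighborhood pulls the BMU and nearby units toward x
            g = np.exp(-np.sum((coords - bmu) ** 2, axis=-1) / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
            step += 1
    return weights

def response_sequence(weights, frames):
    """Map each feature frame of an utterance to its responding (BMU) unit."""
    return [np.unravel_index(
        np.linalg.norm(weights - f, axis=-1).argmin(), weights.shape[:2])
        for f in frames]

def smooth(seq):
    """Collapse consecutive repeats -- a simple stand-in for sequence smoothing."""
    return [u for i, u in enumerate(seq) if i == 0 or u != seq[i - 1]]
```

Under this scheme, two utterances of the same word should yield similar smoothed unit sequences, which is what makes sequence identification usable for recognition.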


Journal of the Acoustical Society of America | 2007

Scoring and re-scoring dynamic time warping of speech

Veton Kepuska; Harinath K. Reddy

A method includes (i) measuring first distances between (a) vectors belonging to a set of vectors that represent an utterance and (b) vectors belonging to a set of vectors that represent a template, the measuring being done in accordance with a first order of the utterance vectors and a first order of the template vectors; (ii) measuring second distances between (a) individual vectors belonging to the set of vectors that represent the utterance and (b) individual vectors belonging to the set of vectors that represent the template, the measuring being done in accordance with a second order of the utterance vectors and a second order of the template vectors; (iii) in which the first template vector order and the second template vector order are different and/or the first utterance vector order and the second utterance vector order are different. In another aspect, a method includes measuring distances between vectors that represent an utterance and vectors that represent a template, generating information indicative of how well the vectors of the utterance match the vectors of the template, and making a matching decision based on the measured distances and on the generated information.
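The patent language above describes dynamic time warping (DTW) scoring followed by a re-scoring pass with the vectors taken in a different order. The sketch below implements classic DTW and emulates "a different order" by simply reversing both sequences; the actual reordering claimed in the patent is not specified here, so the reversal is only an illustrative assumption.

```python
import numpy as np

def dtw_distance(utterance, template):
    """Classic dynamic time warping between two sequences of feature vectors."""
    n, m = len(utterance), len(template)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # local cost: Euclidean distance between the two frames
            cost = np.linalg.norm(utterance[i - 1] - template[j - 1])
            # extend the cheapest of the three allowed warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def score_and_rescore(utterance, template):
    """First pass in the given order, second pass with both sequences reversed
    (a stand-in for the patent's 'different order'); a matcher could combine
    the two scores before making a matching decision."""
    return dtw_distance(utterance, template), \
           dtw_distance(utterance[::-1], template[::-1])
```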


Southeastern Symposium on System Theory | 1988

The Kohonen net for speaker dependent isolated word recognition

Veton Kepuska; John N. Gowdy

Summary form only given, as follows. The authors evaluate the effectiveness of a neural-network model for use in a speech-recognition system. Performance of the model depends on many factors, including the dimensionality of the network, the number of neural units, the type of neural units, and lateral feedback. They explore the effect of these parameters, as well as the suitability and robustness of such a network for the task of speaker-dependent isolated-word discrimination. The model is tested using the Texas Instruments speech database.
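One of the parameters listed above, lateral feedback, is commonly modeled in Kohonen-style networks as a "Mexican hat" function: short-range excitation plus longer-range inhibition. The difference-of-Gaussians form below is a standard textbook formulation, not necessarily the exact one evaluated in the paper.

```python
import numpy as np

def mexican_hat(dist, sigma_e=1.0, sigma_i=3.0, a_e=1.0, a_i=0.5):
    """'Mexican hat' lateral feedback as a difference of Gaussians over
    inter-unit distance: nearby units are excited, farther units inhibited."""
    return (a_e * np.exp(-dist ** 2 / (2 * sigma_e ** 2))
            - a_i * np.exp(-dist ** 2 / (2 * sigma_i ** 2)))

# Example: feedback received by units 0..9 on a 1-D map from a winner at unit 4
dist = np.abs(np.arange(10) - 4)
print(np.round(mexican_hat(dist), 3))
```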


International Journal of Engineering Research and Applications | 2017

Comparing Speech Recognition Systems (Microsoft API, Google API, and CMU Sphinx)

Veton Kepuska

This paper presents a tool for testing and comparing commercial speech recognition systems, such as the Microsoft Speech API and the Google Speech API, with open-source speech recognition systems such as Sphinx-4. The systems are compared across different environments using audio recordings selected from different sources, with performance measured by word error rate (WER). Although the WERs of all three systems were acceptable, the Google API was observed to be superior.
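Word error rate is the comparison metric used here; as a reference, the following self-contained sketch computes WER via word-level Levenshtein alignment (substitutions, deletions, and insertions divided by the reference length).

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between first i reference and first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ('sat' -> 'sit') and one deletion ('the') in 6 words: ~0.333
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```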


Journal of Renewable and Sustainable Energy | 2013

Energy savings from using mobile smart technologies

Veton Kepuska; Paul Karaffa; Guinevere Shaw; Jacob Zurasky; Christopher Kovalik; Jordan Arnold; Salvador Macaraig

This paper presents the most recent results on the energy-saving benefits of converging consumer products into a multi-function smart device, such as a smartphone or tablet, compared to single-function products (e.g., an electronic clock). Although individual users are driven predominantly by market trends and the individual energy savings are moderate, the sheer number of users makes this global trend promising as a model for sustainable energy. The energy consumption of selected smart devices was tested using 42 frequently used applications and utilities (e.g., portable gaming, scanning, music players, etc.). A range of operating systems (OS) was selected for testing: Android OS (Google), BlackBerry OS (RIM), iOS (Apple), and Windows 7 OS (Microsoft). Testing was conducted using the following smartphones: BlackBerry Curve 9300, iPhone 4, Samsung Focus, and Samsung Galaxy S; and tablets: iPad 2 and Samsung Galaxy Tablet 7″. In order to investigate the battery consumption, two programs were d...
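The scaling argument (moderate per-user savings multiplied by a very large user base) can be made concrete with a back-of-envelope calculation. Every number in the sketch below is a hypothetical placeholder, not a measurement from the paper.

```python
# Back-of-envelope illustration of the convergence scaling argument.
# All figures are assumed placeholders, not data from the study.
devices_replaced_wh_per_day = 5.0   # standalone clock, camera, MP3 player, etc. (assumed)
smartphone_extra_wh_per_day = 2.0   # extra smartphone drain for those functions (assumed)
users = 100_000_000                 # assumed number of users adopting multi-function devices

net_wh_per_user = devices_replaced_wh_per_day - smartphone_extra_wh_per_day
total_gwh_per_year = net_wh_per_user * users * 365 / 1e9
print(f"{total_gwh_per_year:.1f} GWh/year")  # ~109.5 GWh/year under these assumptions
```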


SPIE Newsroom | 2010

Wake-up-word recognition

Veton Kepuska

At present, humans cannot interact spontaneously with computers. Current state-of-the-art speech recognizers operate with an optimistic word-accuracy rate of 99%. In spontaneous speech, users typically utter 150–200 words per minute, which implies that the recognizer will incur an error in less than a minute. Thus, true 'hands-free' computer interaction is not yet possible. Detecting when a user is talking to a machine, as opposed to someone else, is an unsolved problem. We propose the use of a special trigger, a 'wake-up word' (WUW), to indicate when a user addresses a machine, similar to how humans use proper names to refer to each other. For this approach to be successful, we need to solve the problem of false acceptance. Present recognizers assume that all spoken interaction is in-vocabulary (INV, as opposed to out-of-vocabulary, or OOV) speech. However, employing a machine WUW requires a computer to listen all the time in order to respond appropriately. Hence, we have to solve problems related to correct recognition and rejection. We solved these problems by developing a WUW speech-recognition (SR) system.1, 2 Our WUW SR is a highly efficient and accurate recognizer,3 specializing in the detection of a single word or phrase spoken in the alerting (or WUW) context of requesting attention,4, 5 while rejecting all other words, phrases, sounds, noises, and other acoustic events with virtually 100% accuracy, including the same word or phrase uttered in a non-alerting (referential) context.6, 7 The WUW SR task is similar to keyword spotting, but differs in one important aspect: it must discriminate the specific word/phrase used only in the alerting context. For example, the sentence “Computer, begin PowerPoint presentation” uses the word 'computer' in an alerting context, whereas in “My computer has dual Intel 64-bit processors, each with quad cores,” the word 'computer' is used in a referential (non-alerting) context. We developed the WUW SR using only acoustic features, without relying on language modeling; these features are based on the triplet shown in Figure 1.

Figure 1 (caption): Details of signal processing at the front end of our speech-recognition setup, which contains a common spectrogram-computation module, feature-based voice-activity detection (VAD), and modules that compute three features for each analysis frame: mel-frequency cepstral coefficients (MFCCs), linear predictive coefficient (LPC) MFCCs, and enhanced (ENH) MFCCs. The VAD classifier determines the state (speech or no speech) of each frame or segment; this information is used by the recognizer's back end. FFT: fast Fourier transform.
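As a rough illustration of the front end's speech/no-speech labeling, the sketch below implements a toy energy-threshold voice-activity detector over 25 ms frames. The paper's VAD is feature-based and considerably more sophisticated; the frame sizes, noise-floor estimate, and margin here are assumptions.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a waveform into overlapping analysis frames (25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def energy_vad(x, frame_len=400, hop=160, margin_db=10.0):
    """Toy voice-activity detector: flag frames whose log energy exceeds an
    estimated noise floor by a margin. Only illustrates the per-frame
    speech/no-speech labeling consumed by the recognizer back end."""
    frames = frame_signal(x, frame_len, hop)
    log_e = 10 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    noise_floor = np.percentile(log_e, 10)   # assume quietest frames are noise
    return log_e > noise_floor + margin_db   # True = speech frame
```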


IEEE Annual Information Technology, Electronics and Mobile Communication Conference | 2017

Real time automated facial expression recognition app development on smart phones

Humaid Alshamsi; Veton Kepuska; Hongying Meng

Automated facial expression recognition (AFER) is a crucial technology for, and a challenging task in, human-computer interaction. Previous AFER methods have incorporated different features and classification methods and used basic testing approaches. In this paper, we identify the best feature descriptor for AFER by empirically evaluating two descriptors: the Facial Landmarks descriptor and the Center of Gravity descriptor. We examine each feature descriptor with a single classification method, the Support Vector Machine (SVM), on three distinct facial expression recognition (FER) datasets. In addition to test accuracies, we present confusion matrices of AFER. We also analyze the effect of these features and of image resolution on AFER performance. Our study indicates that the Facial Landmarks descriptor is the best choice for running AFER on mobile phones. The results demonstrate that the proposed facial expression recognition mobile application is successful and provides up to 96.3% recognition accuracy.
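A minimal version of the evaluation pipeline (landmark-style feature vectors classified with an SVM, reporting accuracy and a confusion matrix) might look like the following. The landmark extraction step is omitted and the data are random placeholders; the feature dimensionality (136, i.e., 68 (x, y) points) and seven expression classes are assumptions, not the paper's datasets.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score

# Placeholder data: each row stands in for a flattened facial-landmark descriptor.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 136))      # hypothetical landmark features
y = rng.integers(0, 7, size=600)     # 7 basic expression classes (assumed)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Standardize features, then fit an RBF-kernel SVM classifier.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)

print("accuracy:", accuracy_score(y_te, pred))
print(confusion_matrix(y_te, pred))
```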


Journal of the Acoustical Society of America | 2017

Representation of the combined concurrent acoustic sources in the electrical domain

Azhar Abdulaziz; Veton Kepuska

In the electrical domain, a combined acoustic signal is commonly represented as the sum of the samples of the individual sound sources. It is commonly believed that, when many concurrent acoustic sources are sampled by a single microphone, the resulting samples are equal to the linear addition of the electrical samples of those sources. The key problem with this argument is that it assumes linearity between the total sound pressure level (SPL) and each individual SPL, and that the microphone voltage varies linearly with the SPL applied to it. However, both of those relationships are more likely nonlinear, at least in theory. We believe this is the first study to derive the mathematical relationship between the individual acoustic signals and their combination in both the electrical and acoustic domains. A new theoretical formula derived in this work shows that acoustic source samples add in a nonlinear manner. In the electrical domain, the voltages th...
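The nonlinearity the abstract points to can be seen in the standard textbook relation between sound pressure level and pressure: intensities add, decibel levels do not. The sketch below shows the usual incoherent-combination formula (two 60 dB sources yield about 63 dB); it is background for the argument, not the new formula derived in the paper.

```python
import numpy as np

P_REF = 20e-6  # reference pressure, 20 micropascals

def spl_to_pressure(spl_db):
    """RMS pressure (Pa) corresponding to a sound pressure level (dB)."""
    return P_REF * 10 ** (spl_db / 20)

def combine_incoherent(spl_list):
    """Incoherent combination: pressures-squared (intensities) add linearly,
    so the SPLs themselves combine nonlinearly on the decibel scale."""
    p_sq = sum(spl_to_pressure(s) ** 2 for s in spl_list)
    return 20 * np.log10(np.sqrt(p_sq) / P_REF)

print(round(combine_incoherent([60.0, 60.0]), 2))  # ~63.01 dB, not 120 dB
```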


Proceedings of SPIE, the International Society for Optical Engineering | 2006

Communications protocol for RF-based indoor wireless localization systems

Tamas Kasza; Mehdi M. Shahsavari; Veton Kepuska; Maria Pinzone

A novel application-specific communications scheme for RF-based indoor wireless localization networks is proposed. In such a system, wireless badges attached to people or objects report positions to wireless router units. Badges have very limited communication, energy, and processing capabilities. Routers are responsible for propagating collected badge information hop-by-hop toward one central unit of the system and are significantly less battery-constrained than the badges. Each unit can radiate a special sequence of bits at selected frequencies, so that any router in the wireless neighborhood can sense, store, aggregate, and forward Received Signal Strength Indicator (RSSI) information. Once the central unit receives RSSI values from the routers, it calculates the overall relative position of each unit in the system. This new scheme was developed on the Chipcon CC1010 Evaluation Module, which has limited communication capabilities. The implemented protocol rules allow numerous system parameters to scale. The feasibility of the proposed protocol is simulated on a typical floor: a two-dimensional topology in which routers are deployed in a grid. Results show that, assuming normal operation and a maximum of one thousand badges, the system can report periodically about every five seconds. Different scenarios are compared, and the proposed scheme is demonstrated to meet strict reliability requirements while providing energy-efficient badges and an acceptable level of latency.
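A highly simplified model of the sense-aggregate-forward flow is sketched below: routers collect RSSI readings per badge, aggregate them into one report per period, and the central unit flattens the hop-by-hop batches before positioning. The class and field names and the mean-RSSI aggregation rule are illustrative assumptions, not the actual CC1010 packet format.

```python
from dataclasses import dataclass, field

@dataclass
class RssiReport:
    badge_id: int
    router_id: int
    rssi_dbm: float

@dataclass
class Router:
    """Router that senses badge broadcasts, aggregates RSSI readings, and
    forwards them hop-by-hop toward the central unit (hypothetical model)."""
    router_id: int
    readings: dict = field(default_factory=dict)   # badge_id -> list of RSSI

    def sense(self, badge_id: int, rssi_dbm: float):
        self.readings.setdefault(badge_id, []).append(rssi_dbm)

    def aggregate(self):
        """One report per badge per period (mean RSSI) to limit radio traffic."""
        return [RssiReport(b, self.router_id, sum(v) / len(v))
                for b, v in self.readings.items()]

def collect(report_batches):
    """Central unit flattens the forwarded batches before computing positions."""
    return [r for batch in report_batches for r in batch]

r1 = Router(1)
r1.sense(badge_id=42, rssi_dbm=-71.0)
r1.sense(badge_id=42, rssi_dbm=-69.0)
print(collect([r1.aggregate()]))
```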


Archive | 2008

System and methods for facilitating collaboration of a group

Walter Rodriguez; Augusto Opdenbosch; Deborah S. Carstens; Brian Goldiez; Stephen M. Fiore; Veton Kepuska

Collaboration


Dive into Veton Kepuska's collaboration.

Top Co-Authors

Gamal Bohouta, Florida Institute of Technology
Humaid Alshamsi, Florida Institute of Technology
Michael Georgiopoulos, University of Central Florida
Stephen M. Fiore, Florida Gulf Coast University
Walter Rodriguez, Florida Gulf Coast University
Annie S. Wu, University of Central Florida
Deborah Sater Carstens, Florida Institute of Technology
Ivica Kostanic, Florida Institute of Technology