Publication


Featured research published by Moses Ekpenyong.


Speech Communication | 2014

Statistical parametric speech synthesis for Ibibio

Moses Ekpenyong; Eno-Abasi Urua; Oliver Watts; Simon King; Junichi Yamagishi

Ibibio is a Nigerian tone language, spoken in the south-east coastal region of Nigeria. Like most African languages, it is resource-limited. This presents a major challenge to conventional approaches to speech synthesis, which typically require the training of numerous predictive models of linguistic features such as the phoneme sequence (i.e., a pronunciation dictionary plus a letter-to-sound model) and prosodic structure (e.g., a phrase break predictor). This training is invariably supervised, requiring a corpus of training data labelled with the linguistic feature to be predicted. In this paper, we investigate what can be achieved in the absence of many of these expensive resources, and also with a limited amount of speech recordings. We employ a statistical parametric method, because this has been found to offer good performance even on small corpora, and because it is able to directly learn the relationship between acoustics and whatever linguistic features are available, potentially mitigating the absence of explicit representations of intermediate linguistic layers such as prosody. We present an evaluation that compares systems that have access to varying degrees of linguistic structure. The simplest system only uses phonetic context (quinphones), and this is compared to systems with access to a richer set of context features, with or without tone marking. It is found that the use of tone marking contributes significantly to the quality of synthetic speech. Future work should therefore address the problem of tone assignment using a dictionary and the building of a prediction module for out-of-vocabulary words.
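The quinphone context used by the simplest system can be illustrated with a minimal sketch. It assumes a quinphone is the current phone plus two neighbours on each side; the padding symbol `sil` and the placeholder phone labels are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

PAD = "sil"  # hypothetical padding symbol for utterance edges

Quinphone = Tuple[str, str, str, str, str]


def quinphones(phones: List[str], pad: str = PAD) -> List[Quinphone]:
    """Return one (left-left, left, centre, right, right-right) window per phone."""
    padded = [pad, pad] + list(phones) + [pad, pad]
    # Window i covers padded[i : i + 5]; its centre is the i-th input phone.
    return [tuple(padded[i:i + 5]) for i in range(len(phones))]


# Placeholder phone labels, not real Ibibio data.
example = quinphones(["p1", "p2", "p3"])
```

Each tuple would then serve as the context label for its centre phone; richer systems would append further features (for example tone marks) to the same label.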


International Journal of Speech Technology | 2008

Towards an unrestricted domain TTS system for African tone languages

Moses Ekpenyong; Eno-Abasi Urua; Dafydd Gibbon

In this paper, we discuss the procedural problems, issues and challenges involved in developing a generic speech synthesizer for African tone languages. We base our development methodology on the “MultiSyn” unit-selection approach, supported by the Festival Text-To-Speech (TTS) toolkit, for Ibibio, a language of the Lower Cross subgroup of the (New) Benue-Congo language family widely spoken in the southeastern region of Nigeria. We present, in chronological order, the levels of infrastructural and linguistic problems and challenges identified in the Local Language Speech Technology Initiative (LLSTI) during the development process (from the corpus preparation and refinement stage to the integration and synthesis stage). We provide solutions to most of these challenges and outline directions for further refinement. The evaluation of the initial prototype shows that the synthesis system will be useful to non-literate communities and a wide spectrum of applications.


International Joint Conference on Neural Networks | 2016

Unsupervised mining of under-resourced speech corpora for tone features classification

Moses Ekpenyong; Udoinyang G. Inyang

In this contribution, the unsupervised mining of speech corpora for the efficient classification of tone features was investigated. Input vectors to the experiment were generated from tone pattern alignments of an Ibibio (Benue-Congo, Nigeria) corpus. The corpus used for the experiment contained 16,905 words/phrases. The proposed system design is novel, and integrates two unsupervised tools, k-means clustering and a self-organizing map (SOM) model, into a methodological workflow that evaluates and selects the optimal number of clusters, with the subsequent association of each clustering point to the input data points. In order to reduce data dimensionality for effective visualization, non-negative matrix factorization (NMF) was introduced to rid the k-means clusters of noisy attributes. The k-means cluster points generated by the optimum clusters (two in this case) were evaluated by the Silhouette algorithm and finally fed into the SOM, to improve the efficiency of feature classification. Results obtained validate existing research claims and demonstrate the importance of vowel-only features in the recognition of tone patterns. A SOM visualization of the input vectors revealed that vowel-only features correlate better with other input vectors such as syllable and phoneme, compared to consonant-only features. Furthermore, clustering the input datasets into the optimal number of clusters enabled proper and timely visualization of the map. This contribution is therefore vital for advancing future speech processing research on under-resourced languages.
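The step of selecting the number of clusters with a silhouette score can be sketched as follows. This is a minimal pure-Python illustration, not the authors' pipeline: the NMF and SOM stages are omitted, the farthest-point initialisation is an assumption made for determinism, and the `toy` points are invented for the example.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, iters: int = 50) -> List[int]:
    """Plain k-means; returns one cluster label per point."""
    # Assumed deterministic farthest-point initialisation.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient; singleton clusters score 0."""
    if len(set(labels)) < 2:
        return -1.0  # undefined for a single cluster; treat as worst case
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)


def best_k(points: List[Point], candidates=range(2, 5)) -> int:
    """Pick the candidate k with the highest mean silhouette."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)))


# Invented toy data: two well-separated groups.
toy = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
```

On the toy data, `best_k(toy)` selects two clusters, which mirrors the kind of outcome the abstract reports without reproducing its data.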


International Conference on Artificial Intelligence and Soft Computing | 2016

Towards a Hybrid Learning Approach to Efficient Tone Pattern Recognition

Moses Ekpenyong; Udoinyang G. Inyang; Imeh Umoren

Tone has remained an interesting puzzle to the development of language resources for African languages, mainly because its appearance (within a word) is not segmentally fixed. In this contribution, we begin by proposing a tone marking framework that intelligently tags an input corpus using a close-copy synthesis of tone-tags generated by a Hidden Markov Model (HMM) syllabifier. Next, we investigate the recognition of tone patterns by building a generic architecture that will serve diverse languages. The proposed architecture is a multi-layer feedforward neural network implementing the Levenberg-Marquardt backpropagation algorithm. The network consists of (i) seventeen inputs describing the tone patterns of Ibibio (ISO 639-2: nic; Ethnologue: IBB), with training data captured from an input corpus of 16,905 phrases; and (ii) a target class that learns tone recognition from a combination of the input tone patterns and boundary tone, an important feature used for intonation analysis. Results obtained showed that our tone marking model perfectly tagged the input corpus, except for phonemes with more than one diacritic mark. Concerning the recognition of tone patterns, we deduced from a confusion matrix that 93.1% of the tone patterns were correctly classified, while the remaining 6.9% were misclassified. A greater share of the misclassified cases came from non-boundary tone information, whose presence inhibits speech quality. The ROC curve also showed good classification of the training, testing and validation datasets. Future directions for this work are the introduction of an unsupervised solution and additional tone-bearing information such as syllables and vowels, to improve the learning system, and a comparison of our approach with other methods.
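Percentages such as the classified and misclassified shares above are typically read off a confusion matrix as the diagonal total over the grand total. The sketch below shows that arithmetic; the 2x2 matrix is invented for illustration and is not the paper's data.

```python
from typing import List


def accuracy(confusion: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (rows: true class, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def misclassification_rate(confusion: List[List[int]]) -> float:
    """Complement of accuracy: the off-diagonal share."""
    return 1.0 - accuracy(confusion)


# Invented counts, chosen only to make the arithmetic easy to follow.
toy_confusion = [
    [50, 3],   # true class 0: 50 correct, 3 confused with class 1
    [4, 43],   # true class 1: 4 confused with class 0, 43 correct
]
```

Here 93 of 100 samples lie on the diagonal, so `accuracy(toy_confusion)` is 0.93 and the misclassification rate is the remaining 0.07.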


International Conference on Computational Science | 2015

Handoffs Decision Optimization of Mobile Cellular Networks

Moses Ekpenyong; Joseph Isabona; Etebong Isong

A neural network (NN) approach is proposed in this paper to optimize the operation of cellular networks. First, we derive an analytical equation to establish the effect of essential handoff contributory factors. Data on these factors were then obtained from base stations of an operational network carrier and used to train a backpropagation NN. Results obtained revealed that, for SINR data, network performance improved with big data and an increase in the number of neurons, but increasing the number of layers degraded system performance. However, with relative signal strength data, improved performance was achieved for big data and an increase in the number of layers, for both normalized and un-normalized data. Finally, a self-organizing map was explored to visualize the existing system, with the aim of further improving its performance.


Computational Intelligence, Communication Systems and Networks | 2014

Call drop minimization using Fuzzy associative memory

Moses Ekpenyong; Inemesit E. Gibson; Imeh Umoren

In this paper, a fuzzy associative memory approach is adopted to minimize the effect of dropped calls in wireless cellular networks. To implement this approach, a number of factors that contribute to call dropping in CDMA networks were identified, and data for these factors were collected from an operational telecommunication carrier. These data were then used to establish the membership functions for driving the fuzzy inference engine, and through extensive simulations, the overall efficiencies for the existing and optimized forms of the system were obtained. It was observed that as the traffic got burstier, the existing system failed, whereas the optimized system sustained its efficiency at about 72%. Furthermore, the optimized system exhibited fair allocation of system resources, effectively managing processes such as handovers and precluding unnecessary wastage of system resources. The performance of the optimized system, however, degraded when the drop call probability exceeded the recommended threshold of 0.02. In practical systems, this constraint is obligatory to avert severe network degradation. The interactive effect of the selected factors on the network efficiency was also investigated. We observed the independence of some of these factors, as the drop call probability and system efficiency remained unchanged. But as more channels became available for the growing number of users, there was a need for optimal configuration settings to avert scenarios that may negatively impact the overall system performance.
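The abstract does not say how the fuzzy associative memory was encoded. As an illustration only, the sketch below assumes the classic correlation-minimum encoding with max-min recall; the membership vectors `a` and `b` are invented and do not represent the call-drop factors studied in the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def encode(a: Vector, b: Vector) -> Matrix:
    """Correlation-minimum encoding: m[i][j] = min(a[i], b[j])."""
    return [[min(ai, bj) for bj in b] for ai in a]


def recall(a: Vector, m: Matrix) -> Vector:
    """Max-min composition: out[j] = max over i of min(a[i], m[i][j])."""
    return [
        max(min(ai, row[j]) for ai, row in zip(a, m))
        for j in range(len(m[0]))
    ]


# Invented membership degrees for an input fuzzy set and an output fuzzy set.
a = [0.2, 1.0, 0.5]
b = [0.3, 0.8]

memory = encode(a, b)
recovered = recall(a, memory)
```

With these values the stored association is recalled exactly (`recovered` equals `b`), which holds whenever the largest membership in `a` is at least the largest in `b`.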


Language and Technology Conference | 2013

Adaptive Prosody Modelling for Improved Synthetic Speech Quality

Moses Ekpenyong; Udoinyang G. Inyang; Emem Obong Udoh

Neural networks and fuzzy logic have proven to be efficient when applied individually to a variety of domain-specific problems, but their precision is enhanced when hybridized. This contribution presents a combined framework for improving the accuracy of prosodic models. It adopts the Adaptive Neuro-fuzzy Inference System (ANFIS) to offer self-tuned cognitive-learning capabilities suitable for predicting the imprecise nature of speech prosody. After initializing the Fuzzy Inference System (FIS) structure, an Ibibio (ISO 639-2: nic; Ethnologue: IBB) speech dataset was trained using the gradient descent and non-negative least squares estimator (LSE) to demonstrate the feasibility of the proposed model. The model was then validated using a synthesized speech corpus dataset of fundamental frequency (F0) values of Ibibio tones, captured at various contour positions (initial, mid, final) within the corpus. Results obtained showed an insignificant difference between the predicted output and the check dataset, with a checking error of 0.0412, validating our claim that the proposed model is satisfactory and suitable for improving prosody prediction of synthetic speech.


Computer and Information Science | 2012

Morpho-Syntactic Analysis Framework for Tone Language Text-to-Speech Systems

Moses Ekpenyong; Emem Obong Udoh

This paper presents a morpho-syntactic analysis framework using a data-driven methodology. The proposed framework complements the front-end design of a recent text-to-speech (TTS) project and is generic for other tone language systems. We test the design on Ibibio (ISO 639-2: nic; Ethnologue: IBB), a Lower Cross language of the (New) Benue-Congo language family, widely spoken in the south-eastern region of Nigeria. Implementation shows that the design is sufficient for morpho-syntactic parsing and useful for prosody improvement in TTS systems. Also, the methodology adopted detaches a greater part of the linguistic feature specification from the program code. This allows for easy morphological alterations of utterances and replication of the synthesizer for other languages.


International Conference on Artificial Intelligence and Soft Computing | 2018

Modelling Speaker Variability Using Covariance Learning

Moses Ekpenyong; Imeh Umoren

In this contribution, we investigate the relationship between speakers and speech utterances, and propose a speaker normalization/adaptation model that incorporates correlation amongst the utterance classes produced by male and female speakers of varying age categories (children: 0–15; youths: 16–30; adults: 31–50; seniors: >50). Using Principal Component Analysis (PCA), a speaker space was constructed, and based on the speaker covariance matrix obtained directly from the speech data signals, a visualisation of the first three principal components (PCs) was achieved. For effective covariance learning, a component-wise normalisation of each vector weight of the covariance matrix was performed, and a machine learning algorithm (the SOM: self-organising map) was implemented to model the variability of selected speaker features (F0, intensity, pulse). Results obtained reveal that, for the features selected, F0 gave the most variance, as both genders exhibited high variability. For male speakers, PC1 captured the most variance (87%), while PC2 and PC3 captured much smaller shares of 7% and 3%, respectively. For female speakers, PC1 captured the most variance (97%), while PC2 and PC3 captured 2% and 1%, respectively. Further, intensity and pulse features show close similarity patterns between the speech features, and are not the most relevant for speaker variability modelling. Component plane visualisation of the respective speech patterns learned from the feature covariance revealed consistent patterns, and is hence useful in speaker recognition systems.
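The share of variance captured by PC1 is the largest eigenvalue of the covariance matrix divided by its trace. The sketch below computes that ratio with power iteration in pure Python; the choice of power iteration and the `toy_rows` measurements are assumptions for illustration, not the authors' procedure or data.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def covariance(rows: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance matrix (divides by n - 1) of row-wise observations."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def leading_eigenvalue(m: Matrix, iters: int = 200) -> float:
    """Power iteration; assumes the start vector is not orthogonal to PC1."""
    v = [1.0] * len(m)
    for _ in range(iters):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the (unit-length) converged vector.
    return sum(a * b for a, b in zip(v, matvec(m, v)))


def pc1_share(rows: Sequence[Sequence[float]]) -> float:
    """Fraction of total variance explained by the first principal component."""
    cov = covariance(rows)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    return leading_eigenvalue(cov) / total_variance


# Invented two-feature measurements that lie almost on a line.
toy_rows = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
```

Because the toy points are nearly collinear, `pc1_share(toy_rows)` is very close to 1; with real speaker features the same ratio would yield figures of the kind quoted above.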


Speech Communication | 2018

Unsupervised visualization of under-resourced speech prosody

Moses Ekpenyong; Udoinyang G. Inyang; Emem Obong Udoh

In this paper, an unsupervised visualization framework for analyzing under-resourced speech prosody is proposed. An experiment was carried out for Ibibio, a Lower Cross language of the New Benue-Congo family, spoken in the southeast coastal region of Nigeria, West Africa. The proposed methodology adopts machine learning, with a semi-automated procedure for extracting prosodic features from a translated, prosodically stable corpus, ‘The Tiger and the Mouse’, a text corpus that demonstrates the prosody of read-aloud English. A self-organizing map (SOM) was used to learn the classification of certain input vectors (speech duration, fundamental frequency (F0), phoneme pattern (vowels only), tone pattern), and to provide visualization of the cluster structure. Results obtained from the experiment showed that duration and F0 features realized from speech syllables are indispensable for modeling phoneme and tone patterns, but the tone input classes revealed clusters with well-separated boundaries and well-distributed component planes, compared to the phoneme input classes. Further, except for very few outliers, the map weights were well distributed, with proper neighboring neuron connections across the input space. A possible future work to advance this research is the development of the language's corpus, for the discovery of prosodic patterns in expressive speech.
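How a SOM adapts its weights to input vectors can be sketched with a small one-dimensional map. Everything here is an assumption for illustration: the map size, the linearly decaying learning rate and radius, the Gaussian neighbourhood, and the invented `toy_vectors` (scaled to the range 0 to 1) are not taken from the paper.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bmu_index(x: Vector, weights: List[List[float]]) -> int:
    """Index of the best matching unit (closest weight vector)."""
    return min(range(len(weights)), key=lambda u: dist(x, weights[u]))


def train_som(data: List[Vector], n_units: int = 4, epochs: int = 100,
              lr0: float = 0.5, radius0: float = 2.0, seed: int = 0) -> List[List[float]]:
    """Train a 1-D SOM; units are arranged on a line indexed 0..n_units-1."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)    # shrinking neighbourhood
        for x in data:
            bmu = bmu_index(x, weights)
            for u in range(n_units):
                # Gaussian neighbourhood on the map grid around the BMU.
                h = math.exp(-((u - bmu) ** 2) / (2.0 * radius ** 2))
                weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
    return weights


# Invented, pre-scaled feature vectors (e.g. normalised duration and F0).
toy_vectors = [(0.1, 0.2), (0.15, 0.25), (0.8, 0.9), (0.85, 0.95)]
som_weights = train_som(toy_vectors)
```

After training, `bmu_index(x, som_weights)` gives the map unit for any input, and plotting one weight coordinate across units would correspond to a component plane of the kind the abstract describes.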

Collaboration


Dive into Moses Ekpenyong's collaborations.

Top Co-Authors

Joseph Isabona

Benson Idahosa University

Imeh Umoren

Akwa Ibom State University

Etebong Isong

Akwa Ibom State University
