Osonde Osoba
University of Southern California
Publications
Featured research published by Osonde Osoba.
Neural Networks | 2016
Kartik Audhkhasi; Osonde Osoba; Bart Kosko
Injecting carefully chosen noise can speed convergence in the backpropagation training of a convolutional neural network (CNN). The Noisy CNN algorithm speeds training on average because the backpropagation algorithm is a special case of the generalized expectation-maximization (EM) algorithm and because such carefully chosen noise always speeds up the EM algorithm on average. The CNN framework gives a practical way to learn and recognize images because backpropagation scales with training data. It has only linear time complexity in the number of training samples. The Noisy CNN algorithm finds a special separating hyperplane in the network's noise space. The hyperplane arises from the likelihood-based positivity condition that noise-boosts the EM algorithm. The hyperplane cuts through a uniform-noise hypercube or Gaussian ball in the noise space depending on the type of noise used. Noise chosen from above the hyperplane speeds training on average. Noise chosen from below slows it on average. The algorithm can inject noise anywhere in the multilayered network. Adding noise to the output neurons reduced the average per-iteration training-set cross entropy by 39% on a standard MNIST image test set of handwritten digits. It also reduced the average per-iteration training-set classification error by 47%. Adding noise to the hidden layers can also reduce these performance measures. The noise benefit is most pronounced for smaller data sets because the largest EM hill-climbing gains tend to occur in the first few iterations. This noise effect can assist random sampling from large data sets because it allows a smaller random sample to give the same or better performance than a noiseless sample gives.
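The screening geometry lends itself to a brief sketch. The code below is only illustrative: it keeps candidate noise vectors that fall on the beneficial side of a hyperplane, with the hyperplane normal supplied as a placeholder rather than derived from the network's likelihood as in the paper.

```python
import numpy as np

def screen_noise_above_hyperplane(num_samples, dim, normal, scale=0.1, seed=0):
    """Sample candidate additive noise vectors and keep only the draws on or
    above the hyperplane {n : n . normal >= 0}.  The paper derives the
    hyperplane from the network's likelihood structure; `normal` here is a
    user-supplied placeholder for that derived direction."""
    rng = np.random.default_rng(seed)
    noise = scale * rng.standard_normal((num_samples, dim))
    keep = noise @ normal >= 0.0          # keep the noise-benefit side only
    return noise[keep]

# Hypothetical use: screened noise for the 10 output neurons of an MNIST
# classifier, with `scale` decayed over training iterations.
beneficial_noise = screen_noise_above_hyperplane(1000, dim=10, normal=np.ones(10))
```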
Neural Networks | 2013
Osonde Osoba; Bart Kosko
Noise can provably speed up convergence in many centroid-based clustering algorithms. This includes the popular k-means clustering algorithm. The clustering noise benefit follows from the general noise benefit for the expectation-maximization algorithm because many clustering algorithms are special cases of the expectation-maximization algorithm. Simulations show that noise also speeds up convergence in stochastic unsupervised competitive learning, supervised competitive learning, and differential competitive learning.
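A toy sketch of the idea, assuming a simple annealed additive-noise schedule rather than the paper's exact protocol: inject decaying noise into the data before each assignment step of Lloyd's algorithm.

```python
import numpy as np

def noisy_kmeans(X, k, iters=50, noise_scale=0.5, decay=0.9, seed=0):
    """k-means (Lloyd's algorithm) with annealed additive noise injected into
    the data at each assignment step.  The noise scale decays so the final
    centroids match those of ordinary k-means."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    scale = noise_scale
    for _ in range(iters):
        noisy_X = X + scale * rng.standard_normal(X.shape)      # inject noise
        labels = np.argmin(
            np.linalg.norm(noisy_X[:, None, :] - centroids[None, :, :], axis=2),
            axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)      # update on clean data
        scale *= decay                                          # anneal the noise
    return centroids, labels

X = np.vstack([np.random.randn(100, 2) + m for m in ([0, 0], [5, 5], [0, 5])])
centroids, labels = noisy_kmeans(X, k=3)
```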
Fluctuation and Noise Letters | 2013
Osonde Osoba; Sanya Mitaim; Bart Kosko
We present a noise-injected version of the expectation–maximization (EM) algorithm: the noisy expectation–maximization (NEM) algorithm. The NEM algorithm uses noise to speed up the convergence of the EM algorithm. The NEM theorem shows that additive noise speeds up the average convergence of the EM algorithm to a local maximum of the likelihood surface if a positivity condition holds. Corollary results give special cases when noise improves the EM algorithm. We demonstrate these noise benefits on EM algorithms for three data models: the Gaussian mixture model (GMM), the Cauchy mixture model (CMM), and the censored log-convex gamma model. The NEM positivity condition simplifies to a quadratic inequality in the GMM and CMM cases. A final theorem shows that the noise benefit for independent identically distributed additive noise decreases with sample size in mixture models. This theorem implies that the noise benefit is most pronounced if the data is sparse.
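A minimal sketch of the noise-screening step for the scalar GMM case, assuming the quadratic condition takes the form n(n − 2(μ_j − y)) ≤ 0 for every component mean μ_j; the noise-decay schedule and the E-step wiring are left out.

```python
import numpy as np

def nem_noise_sample(y, means, scale, rng, max_tries=100):
    """Draw additive noise for scalar sample y that satisfies the quadratic
    GMM positivity condition n * (n - 2*(mu_j - y)) <= 0 for every mixture
    mean mu_j (simple rejection sampling); fall back to zero noise."""
    for _ in range(max_tries):
        n = scale * rng.standard_normal()
        if all(n * (n - 2.0 * (mu - y)) <= 0.0 for mu in means):
            return n
    return 0.0  # no admissible noise found: inject nothing

# Hypothetical E-step usage: perturb each sample before computing the
# responsibilities, with `scale` decaying as the EM iterations proceed.
rng = np.random.default_rng(1)
n = nem_noise_sample(y=1.0, means=[2.0, 4.0], scale=0.3, rng=rng)
```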
International Symposium on Neural Networks | 2013
Kartik Audhkhasi; Osonde Osoba; Bart Kosko
We show that noise can speed training in hidden Markov models (HMMs). The new Noisy Expectation-Maximization (NEM) algorithm shows how to inject noise when learning the maximum-likelihood estimate of the HMM parameters because the underlying Baum-Welch training algorithm is a special case of the Expectation-Maximization (EM) algorithm. The NEM theorem gives a sufficient condition for such an average noise boost. The condition is a simple quadratic constraint on the noise when the HMM uses a Gaussian mixture model at each state. Simulations show that a noisy HMM converges faster than a noiseless HMM on the TIMIT data set.
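Since the per-state emissions are Gaussian mixtures, the same scalar quadratic screening can be checked against every state's mixture means before the E-step; the sketch below assumes that conservative simplification rather than reproducing the paper's exact constraint.

```python
import numpy as np

def nem_condition_all_states(n, y, state_means):
    """Check the quadratic condition n * (n - 2*(mu - y)) <= 0 against every
    Gaussian-mixture mean of every HMM state.  `state_means` is a
    (states, mixtures) array; True means the noise draw n is admissible for
    the scalar observation y."""
    return bool(np.all(n * (n - 2.0 * (state_means - y)) <= 0.0))

# Hypothetical usage inside Baum-Welch: screen a candidate noise draw for one
# observation before the E-step accumulates its sufficient statistics.
state_means = np.array([[2.0, 3.0], [4.0, 6.0]])
admissible = nem_condition_all_states(0.5, y=1.0, state_means=state_means)
```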
International Symposium on Neural Networks | 2013
Kartik Audhkhasi; Osonde Osoba; Bart Kosko
We prove that noise can speed convergence in the backpropagation algorithm. The proof consists of two separate results. The first result proves that the backpropagation algorithm is a special case of the generalized Expectation-Maximization (EM) algorithm for iterative maximum likelihood estimation. The second result uses the recent EM noise benefit to derive a sufficient condition for a noise benefit in backpropagation training. The noise adds directly to the training data. A noise benefit also applies to the deep bidirectional pre-training of the neural network as well as to the backpropagation training of the network. The geometry of the noise benefit depends on the probability structure of the neurons at each layer. Logistic sigmoidal neurons produce a forbidden noise region that lies below a hyperplane. All noise on or above that hyperplane can only speed convergence of the neural network. The forbidden noise region is a sphere if the neurons have a Gaussian signal or activation function. These noise benefits all follow from the general noise benefit of the EM algorithm. Monte Carlo sample means estimate the population expectations in the EM algorithm. We demonstrate the noise benefits using MNIST digit classification.
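A schematic of the Gaussian-neuron case, assuming the forbidden region is the interior of a ball whose center and radius are placeholders for the quantities the paper derives from the layer's activations.

```python
import numpy as np

def screen_noise_outside_ball(num_samples, dim, center, radius, scale=0.1, seed=0):
    """Sample candidate noise vectors and discard those inside a forbidden
    ball, the Gaussian-neuron analogue of the logistic case's forbidden
    half-space.  `center` and `radius` are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    noise = scale * rng.standard_normal((num_samples, dim))
    keep = np.linalg.norm(noise - center, axis=1) >= radius   # outside the ball
    return noise[keep]

beneficial = screen_noise_outside_ball(1000, dim=10, center=np.zeros(10), radius=0.05)
```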
International Symposium on Neural Networks | 2011
Osonde Osoba; Sanya Mitaim; Bart Kosko
We prove a general sufficient condition for a noise benefit in the expectation-maximization (EM) algorithm. Additive noise speeds the average convergence of the EM algorithm to a local maximum of the likelihood surface when the noise condition holds. The sufficient condition states when additive noise makes the signal more probable on average. The performance measure is Kullback relative entropy. A Gaussian-mixture problem demonstrates the EM noise benefit. Corollary results give other special cases when noise improves performance in the EM algorithm.
Fluctuation and Noise Letters | 2016
Osonde Osoba; Bart Kosko
We generalize the noisy expectation-maximization (NEM) algorithm to allow arbitrary modes of noise injection besides just adding noise to the data. The noise must still satisfy a NEM positivity condition. This generalization includes the important special case of multiplicative noise injection. A generalized NEM theorem shows that all measurable modes of injecting noise will speed the average convergence of the EM algorithm if the noise satisfies a generalized NEM positivity condition. This noise-benefit condition has a simple quadratic form for Gaussian and Cauchy mixture models in the case of multiplicative noise injection. Simulations show a multiplicative-noise EM speed-up of more than 27% in a simple Gaussian mixture model. Injecting blind noise only slowed convergence. A related theorem gives a sufficient condition for an average EM noise benefit for arbitrary modes of noise injection if the data model comes from the general exponential family of probability density functions. A final theorem shows that injected noise slows EM convergence on average if the NEM inequalities reverse and the noise satisfies a negativity condition.
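A generic sketch of mode-agnostic noise injection: the injection mode and the positivity-condition predicate are both passed in as functions, and the predicate used below is a hypothetical stand-in, not the paper's generalized NEM condition.

```python
import numpy as np

# Two injection modes from the generalized setup: additive and multiplicative.
additive = lambda y, n: y + n
multiplicative = lambda y, n: y * (1.0 + n)

def screened_injection(y, inject, condition, scale, rng, max_tries=100):
    """Draw noise, apply the chosen injection mode, and accept only draws that
    satisfy the caller-supplied positivity-condition predicate (a placeholder
    for the generalized NEM condition)."""
    for _ in range(max_tries):
        n = scale * rng.standard_normal()
        if condition(y, n):
            return inject(y, n)
    return y  # no admissible noise found: return the unperturbed sample

rng = np.random.default_rng(2)
# Hypothetical predicate: accept noise that moves the sample toward a mixture mean.
toward_mean = lambda y, n, mu=3.0: abs(multiplicative(y, n) - mu) <= abs(y - mu)
y_noisy = screened_injection(1.0, multiplicative, toward_mean, scale=0.2, rng=rng)
```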
The Journal of Defense Modeling and Simulation | 2017
Osonde Osoba; Bart Kosko
Feedback fuzzy cognitive maps (FCMs) can model the complex structure of public support for insurgency and terrorism (PSOT). FCMs are fuzzy causal signed digraphs that model degrees of causality in interwoven webs of feedback causality and policy variables. Their nonlinear dynamics permit forward-chaining inference from input causes and policy options to output effects. We show how a concept node causally affects downstream nodes through a weighted product of the intervening causal edge strengths. FCMs allow users to add detailed dynamics and feedback links directly to the causal model. Users can also fuse or combine FCMs from multiple experts by weighting and adding the underlying FCM fuzzy edge matrices. The combined FCM tends to better represent domain knowledge as the expert sample size increases if the expert sample approximates a random sample. Statistical or machine-learning algorithms can use numerical sample data to learn and tune an FCM’s causal edges. A differential Hebbian learning law can approximate a PSOT FCM’s directed edges of partial causality using time-series training data. The PSOT FCM adapts to the computational factor-tree PSOT model that Davis and O’Mahony based on prior social science research and case studies. Simulation experiments compare the PSOT models with the adapted FCM models.
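A toy sketch of the two mechanics the abstract describes, sigmoidal forward-chaining inference and a differential Hebbian edge update, on a hypothetical three-concept map rather than the PSOT model itself.

```python
import numpy as np

def fcm_step(state, E, steepness=1.0):
    """One forward-chaining FCM update: each concept node squashes the
    weighted sum of its causal inputs through a logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-steepness * (state @ E)))

def differential_hebbian_update(E, prev_state, state, rate=0.1):
    """Differential Hebbian learning sketch: an edge e_ij grows when concepts
    i and j change in the same direction and decays otherwise."""
    delta = state - prev_state
    return E + rate * (np.outer(delta, delta) - E)

# Toy 3-concept map with hand-set signed causal edges (rows cause columns).
E = np.array([[ 0.0,  0.7, -0.4],
              [ 0.0,  0.0,  0.9],
              [-0.5,  0.0,  0.0]])
x = np.array([1.0, 0.0, 0.0])          # turn on an input policy/cause node
for _ in range(20):                    # iterate toward a fixed point or limit cycle
    x_next = fcm_step(x, E)
    x_next[0] = 1.0                    # keep the input cause node clamped on
    E = differential_hebbian_update(E, x, x_next)
    x = x_next
```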
Fuzzy Optimization and Decision Making | 2012
Osonde Osoba; Sanya Mitaim; Bart Kosko
We prove that three independent fuzzy systems can uniformly approximate Bayesian posterior probability density functions by approximating the prior and likelihood probability densities as well as the hyperprior probability densities that underlie the priors. This triply fuzzy function approximation extends the recent theorem for uniformly approximating the posterior density by approximating just the prior and likelihood densities. This approximation allows users to state priors and hyperpriors in words or rules as well as to adapt them from sample data. A fuzzy system with just two rules can exactly represent common closed-form probability densities so long as they are bounded. The function approximators can also be neural networks or any other type of uniform function approximator. Iterative fuzzy Bayesian inference can lead to rule explosion. We prove that conjugacy in the if-part set functions for prior, hyperprior, and likelihood fuzzy approximators reduces rule explosion. We also prove that a type of semi-conjugacy of if-part set functions for those fuzzy approximators results in fewer parameters in the fuzzy posterior approximator.
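A rough sketch of the idea with standard additive model (SAM) fuzzy systems: two illustrative rule bases stand in for the prior and likelihood approximators (a third system would handle the hyperprior), and the posterior is formed by pointwise multiplication and normalization. The rule centers, widths, and centroids below are invented for illustration, not fit to data.

```python
import numpy as np

def sam_approximator(centers, widths, centroids):
    """Standard additive model (SAM) fuzzy system with Gaussian if-part sets:
    F(x) = sum_j a_j(x) c_j / sum_j a_j(x)."""
    def F(x):
        a = np.exp(-((x[:, None] - centers[None, :]) / widths[None, :]) ** 2)
        return (a * centroids).sum(axis=1) / np.maximum(a.sum(axis=1), 1e-12)
    return F

# Hypothetical rule bases that roughly approximate a prior centered at 0 and a
# likelihood for data centered near 1.5 (all values are illustrative).
theta = np.linspace(-5, 5, 1001)
prior_F = sam_approximator(np.array([-2.0, 0.0, 2.0]), np.full(3, 1.2),
                           np.array([0.05, 0.40, 0.05]))
like_F = sam_approximator(np.array([0.5, 1.5, 2.5]), np.full(3, 0.8),
                          np.array([0.10, 0.50, 0.10]))
post = prior_F(theta) * like_F(theta)
post /= np.trapz(post, theta)          # normalized fuzzy-approximated posterior
```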
International Symposium on Neural Networks | 2011
Osonde Osoba; Sanya Mitaim; Bart Kosko
We prove that independent fuzzy systems can uniformly approximate Bayesian posterior probability density functions by approximating prior and likelihood probability densities as well as the hyperprior probability densities that underlie the priors. This triply fuzzy function approximation extends the recent theorem for uniformly approximating the posterior density by approximating just the prior and likelihood densities. This allows users to state priors and hyperpriors in words or rules as well as to adapt them from sample data. A fuzzy system with just two rules can exactly represent common closed-form probability densities so long as they are bounded. The function approximators can also be neural networks or any other type of uniform function approximator.