Huma Lodhi | Researchain

Publication

Featured research published by Huma Lodhi.

Journal of Machine Learning Research | 2002

Text classification using string kernels

Huma Lodhi; Craig Saunders; John Shawe-Taylor; Nello Cristianini; Chris Watkins

We introduce a novel kernel for comparing two text documents. The kernel is an inner product in the feature space consisting of all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences which are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how despite this fact the inner product can be efficiently evaluated by a dynamic programming technique. A preliminary experimental comparison of the performance of the kernel compared with a standard word feature space kernel [6] is made showing encouraging results.
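The dynamic programming idea in this abstract can be illustrated with a minimal sketch. It assumes the standard recursion for the gap-weighted subsequence kernel, in which an auxiliary table is built per prefix pair. The function name `ssk` and the parameter names `n` (subsequence length, the abstract's k) and `lam` (decay factor) are illustrative choices, not taken from the paper's code, and the result is left unnormalised.

```python
def ssk(s, t, n, lam):
    """Gap-weighted subsequence kernel of order n with decay lam (unnormalised)."""
    ls, lt = len(s), len(t)
    if min(ls, lt) < n:
        return 0.0
    # kp[i][a][b] holds the auxiliary value K'_i for the prefixes s[:a] and t[:b].
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0
    for i in range(1, n):
        for a in range(i, ls + 1):
            kpp = 0.0  # running K''_i for the fixed prefix s[:a]
            for b in range(i, lt + 1):
                if s[a - 1] == t[b - 1]:
                    kpp = lam * (kpp + lam * kp[i - 1][a - 1][b - 1])
                else:
                    kpp = lam * kpp
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp
    # Sum over every pair of matching final characters.
    total = 0.0
    for a in range(n, ls + 1):
        for b in range(n, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[n - 1][a - 1][b - 1]
    return total


if __name__ == "__main__":
    lam = 0.5
    raw = ssk("cat", "car", 2, lam)
    norm = raw / (ssk("cat", "cat", 2, lam) * ssk("car", "car", 2, lam)) ** 0.5
    print(raw, norm)
```

The cost of this sketch is proportional to n·|s|·|t|, so the exponentially large feature vector is never built. Dividing by the square root of the two self-kernels, as in the usage lines, is a common way to remove the dependence on document length.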

International Conference on Machine Learning | 2001

Latent Semantic Kernels

Nello Cristianini; John Shawe-Taylor; Huma Lodhi

Kernel methods like support vector machines have successfully been used for text categorization. A standard choice of kernel function has been the inner product between the vector-space representation of two documents, in analogy with classical information retrieval (IR) approaches. Latent semantic indexing (LSI) has been successfully used for IR purposes as a technique for capturing semantic relations between terms and inserting them into the similarity measure between two documents. One of its main drawbacks, in IR, is its computational cost. In this paper we describe how the LSI approach can be implemented in a kernel-defined feature space. We provide experimental results demonstrating that the approach can significantly improve performance, and that it does not impair it.

Discovery Science | 2005

Support vector inductive logic programming

Stephen Muggleton; Huma Lodhi; Ataollah Amini; Michael J. E. Sternberg

In this paper we explore a topic which is at the intersection of two areas of Machine Learning: namely Support Vector Machines (SVMs) and Inductive Logic Programming (ILP). We propose a general method for constructing kernels for Support Vector Inductive Logic Programming (SVILP). The kernel not only captures the semantic and syntactic relational information contained in the data but also provides the flexibility of using arbitrary forms of structured and non-structured data coded in a relational way. While specialised kernels have been developed for strings, trees and graphs our approach uses declarative background knowledge to provide the learning bias. The use of explicitly encoded background knowledge distinguishes SVILP from existing relational kernels which in ILP-terms work purely at the atomic generalisation level. The SVILP approach is a form of generalisation relative to background knowledge, though the final combining function for the ILP-learned clauses is an SVM rather than a logical conjunction. We evaluate SVILP empirically against related approaches, including an industry-standard toxin predictor called TOPKAT. Evaluation is conducted on a new broad-ranging toxicity dataset (DSSTox). The experimental results demonstrate that our approach significantly outperforms all other approaches in the study.

SIGKDD Explorations | 2002

Automatic scientific text classification using local patterns: KDD CUP 2002 (task 1)

Moustafa Ghanem; Yike Guo; Huma Lodhi; Yong Zhang

In this paper, we describe our approach for addressing Task 1 in the KDD CUP 2002 competition. The approach is based on developing and using an improved automatic feature selection method in conjunction with traditional classifiers. The feature selection method used is based on capturing frequently occurring keyword combinations (or motifs) within short segments of the text of a document and has proved to produce more accurate classification results than approaches relying solely on using keyword-based features.
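As a rough illustration of the kind of feature described here, the sketch below collects keyword pairs that co-occur inside a short sliding window and keeps those that appear in enough documents. The window size, the restriction to pairs, the whitespace tokenisation, and the names `window_motifs` and `frequent_motifs` are all assumptions made for the example; the abstract does not specify how the competition entry defined or selected its motifs.

```python
from collections import Counter
from itertools import combinations


def window_motifs(tokens, keywords, window=5):
    """Keyword pairs that co-occur within `window` consecutive tokens."""
    motifs = set()
    for start in range(max(1, len(tokens) - window + 1)):
        segment = tokens[start:start + window]
        present = sorted({tok for tok in segment if tok in keywords})
        motifs.update(combinations(present, 2))
    return motifs


def frequent_motifs(docs, keywords, window=5, min_docs=2):
    """Motifs found in at least `min_docs` documents (hypothetical threshold)."""
    counts = Counter()
    for doc in docs:
        counts.update(window_motifs(doc.lower().split(), keywords, window))
    return {motif for motif, count in counts.items() if count >= min_docs}


if __name__ == "__main__":
    docs = [
        "the gene regulates protein expression in yeast",
        "protein binds the gene promoter",
        "yeast grows fast",
    ]
    print(frequent_motifs(docs, {"gene", "protein", "yeast"}, window=4))
```

Each retained motif can then serve as a binary feature for a conventional classifier, alongside or instead of single-keyword features.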

Journal of Chemical Information and Modeling | 2007

A Novel Logic-Based Approach for Quantitative Toxicology Prediction

Ata Amini; Stephen Muggleton; Huma Lodhi; Michael J. E. Sternberg

There is a pressing need for accurate in silico methods to predict the toxicity of molecules that are being introduced into the environment or are being developed into new pharmaceuticals. Predictive toxicology is in the realm of structure activity relationships (SAR), and many approaches have been used to derive such SAR. Previous work has shown that inductive logic programming (ILP) is a powerful approach that circumvents several major difficulties, such as molecular superposition, faced by some other SAR methods. The ILP approach reasons with chemical substructures within a relational framework and yields chemically understandable rules. Here, we report a general new approach, support vector inductive logic programming (SVILP), which extends the essentially qualitative ILP-based SAR to quantitative modeling. First, ILP is used to learn rules, the predictions of which are then used within a novel kernel to derive a support-vector generalization model. For a highly heterogeneous dataset of 576 molecules with known fathead minnow fish toxicity, the cross-validated correlation coefficients (R²CV) from a chemical descriptor method (CHEM) and SVILP are 0.52 and 0.66, respectively. The ILP, CHEM, and SVILP approaches correctly predict 55, 58, and 73%, respectively, of toxic molecules. In a set of 165 unseen molecules, the R² values from the commercial software TOPKAT and SVILP are 0.26 and 0.57, respectively. In all calculations, SVILP showed significant improvements in comparison with the other methods. The SVILP approach has a major advantage in that it uses ILP automatically and consistently to derive rules, mostly novel, describing fragments that are toxicity alerts. The SVILP is a general machine-learning approach and has the potential of tackling many problems relevant to chemoinformatics including in silico drug design.
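One way to picture "rule predictions used within a kernel" is the sketch below. It assumes each molecule is summarised by a 0/1 coverage vector (whether each learned rule fires on it), that each rule carries a non-negative weight, and that a Gaussian kernel is taken over the resulting weighted-overlap kernel. The function names, the weights, and the bandwidth `sigma` are hypothetical; the abstract does not give the exact form of the SVILP kernel.

```python
import math


def rule_kernel(cov_a, cov_b, weights):
    """Sum of the weights of rules that cover both examples."""
    return sum(w for a, b, w in zip(cov_a, cov_b, weights) if a and b)


def rbf_rule_kernel(cov_a, cov_b, weights, sigma=1.0):
    """Gaussian kernel on the distance induced by rule_kernel."""
    dist_sq = (
        rule_kernel(cov_a, cov_a, weights)
        - 2.0 * rule_kernel(cov_a, cov_b, weights)
        + rule_kernel(cov_b, cov_b, weights)
    )
    return math.exp(-dist_sq / (2.0 * sigma ** 2))


if __name__ == "__main__":
    weights = [0.5, 0.25, 0.25]   # assumed rule weights
    mol_a = [1, 1, 0]             # rules 1 and 2 fire on molecule A
    mol_b = [1, 0, 1]             # rules 1 and 3 fire on molecule B
    print(rule_kernel(mol_a, mol_b, weights))
    print(rbf_rule_kernel(mol_a, mol_b, weights))
```

A kernel matrix built this way could be passed to any support vector regression routine that accepts precomputed kernels, which is how a quantitative prediction would be obtained from qualitative rules under these assumptions.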

Chemoinformatics and Advanced Machine Learning Perspectives: Complex Computational Methods and Collaborative Techniques, 1st edition | 2010

Chemoinformatics and Advanced Machine Learning Perspectives: Complex Computational Methods and Collaborative Techniques

Huma Lodhi; Yoshihiro Yamanishi

Chemoinformatics is a scientific area that endeavours to study and solve complex chemical problems using computational techniques and methods. Chemoinformatics and Advanced Machine Learning Perspectives: Complex Computational Methods and Collaborative Techniques provides an overview of current research in machine learning and applications to chemoinformatics tasks. As a timely compendium of research, this book offers perspectives on key elements that are crucial for complex study and investigation.

Computational Methods in Systems Biology | 2004

Modelling metabolic pathways using stochastic logic programs-based ensemble methods

Huma Lodhi; Stephen Muggleton

In this paper we present a methodology to estimate rates of enzymatic reactions in metabolic pathways. Our methodology is based on applying stochastic logic learning in ensemble learning. Stochastic logic programs provide an efficient representation for metabolic pathways and ensemble methods give state-of-the-art performance and are useful for drawing biological inferences. We construct ensembles by manipulating the data and driving randomness into a learning algorithm. We applied failure adjusted maximization as a base learning algorithm. The proposed ensemble methods are applied to estimate the rate of reactions in metabolic pathways of Saccharomyces cerevisiae. The results show that our methodology is very useful and it is effective to apply SLPs-based ensembles for complex tasks such as modelling of metabolic pathways.

Intelligent Data Engineering and Automated Learning | 2000

Boosting the Margin Distribution

Huma Lodhi; Grigoris J. Karakoulas; John Shawe-Taylor

The paper considers applying a boosting strategy to optimise the generalisation bound obtained recently by Shawe-Taylor and Cristianini [7] in terms of the 2-norm of the slack variables. The formulation performs gradient descent over the quadratic loss function, which is insensitive to points with a large margin. A novel feature of this algorithm is a principled adaptation of the size of the target margin. Experiments with text and UCI data show that the new algorithm improves the accuracy of boosting. DMarginBoost generally achieves significant improvements over AdaBoost.

Discovery Science | 2009

Learning Large Margin First Order Decision Lists for Multi-Class Classification

Huma Lodhi; Stephen Muggleton; Michael J. E. Sternberg

Inductive Logic Programming (ILP) systems have been successfully applied to solve binary classification problems. It remains an open question how an accurate solution to a multi-class problem can be obtained by using a logic-based learning method. In this paper we present a novel logic-based approach to solve challenging multi-class classification problems. Our technique is based on the use of large margin methods in conjunction with the kernels constructed from first order rules induced by an ILP system. The proposed approach learns a multi-class classifier by using a divide and conquer reduction strategy that splits multi-classes into binary groups and solves each individual problem recursively, hence generating an underlying decision list structure. We also study the well-known one-vs-all scheme in conjunction with logic-based kernel learning. In order to construct a highly informative logical and relational space we introduce a low dimensional embedding method. The technique is amenable to both skewed and non-skewed class distributions; multi-class problems such as protein fold recognition are generally characterized by highly uneven class distribution. We performed a series of experiments to evaluate the proposed rule selection and multi-class schemes. The methods were applied to solve challenging problems in computational biology and bioinformatics, namely multi-class protein fold recognition and mutagenicity detection. Experimental comparisons of the performance of the large margin first order decision list based multi-class scheme with the standard multi-class ILP algorithm and multi-class Support Vector Machine yielded statistically significant results. The results also demonstrated a favorable comparison between the performances of the decision list based scheme and the one-vs-all strategy.

Discovery Science | 2011

Bootstrapping parameter estimation in dynamic systems

Huma Lodhi; David R. Gilbert

We propose a novel approach for parameter estimation in dynamic systems. The method is based on the use of bootstrapping for time series data. It estimates parameters within the least square framework. The data points that do not appear in the individual bootstrapped datasets are used to assess the goodness of fit and for adaptive selection of the optimal parameters. We evaluate the efficacy of the proposed method by applying it to estimate parameters of dynamic biochemical systems. Experimental results show that the approach performs accurate estimation in both noise-free and noisy environments, thus validating its effectiveness. It generally outperforms related approaches in the scenarios where data is characterized by noise.
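The role of the left-out points can be sketched with a toy example. The code below assumes a single-species exponential decay model y(t) = y0·exp(-k·t) fitted by linear least squares on log-transformed data, an ordinary i.i.d. bootstrap over observations, and a "keep the fit with the lowest out-of-bag error" selection rule. All three are simplifying assumptions for illustration; the paper's time-series bootstrap and adaptive selection scheme are not described in enough detail here to reproduce, and the names `fit_decay` and `bootstrap_estimate` are hypothetical.

```python
import math
import random


def fit_decay(points):
    """Least-squares fit of ln y = ln y0 - k*t; returns (y0, k)."""
    n = len(points)
    ts = [t for t, _ in points]
    logs = [math.log(y) for _, y in points]
    t_mean = sum(ts) / n
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
    slope = sxy / sxx
    return math.exp(l_mean - slope * t_mean), -slope


def bootstrap_estimate(points, rounds=200, seed=0):
    """Return (oob_error, y0, k) of the best bootstrap fit, or None if no round is usable."""
    rng = random.Random(seed)
    n = len(points)
    best = None
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        oob = [points[i] for i in range(n) if i not in chosen]
        # Skip resamples that cannot be fitted or leave nothing to validate on.
        if len({points[i][0] for i in idx}) < 2 or not oob:
            continue
        y0, k = fit_decay([points[i] for i in idx])
        err = sum((y - y0 * math.exp(-k * t)) ** 2 for t, y in oob) / len(oob)
        if best is None or err < best[0]:
            best = (err, y0, k)
    return best


if __name__ == "__main__":
    data = [(float(t), 2.0 * math.exp(-0.3 * t)) for t in range(10)]
    print(bootstrap_estimate(data))
```

Because each resample leaves out some observations on average, every fit comes with its own held-out set, so no separate validation data is needed to compare candidate parameter values.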
