Mark Culp | Researchain

Publication

Featured research published by Mark Culp.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2008

Graph-Based Semisupervised Learning

Mark Culp; George Michailidis

Graph-based learning provides a useful approach for modeling data in classification problems. In this modeling scenario, the relationship between labeled and unlabeled data impacts the construction and performance of classifiers and, therefore, a semisupervised learning framework is adopted. We propose a graph classifier based on kernel smoothing. A regularization framework is also introduced and it is shown that the proposed classifier optimizes certain loss functions. Its performance is assessed on several synthetic and real benchmark data sets with good results, especially in settings where only a small fraction of the data are labeled.
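As a rough illustration of the kernel-smoothing idea in this abstract, the sketch below lets each unlabeled vertex take the similarity-weighted average of its neighbors' scores while labeled vertices keep their known class. It is not the authors' classifier: the Gaussian similarity, the bandwidth, the function names, and the toy data are all assumptions made for illustration.

```python
import math

def rbf_weights(points, bandwidth=1.0):
    """Dense similarity graph: w[i][j] = exp(-||xi - xj||^2 / bandwidth), zero diagonal."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / bandwidth)
    return w

def kernel_smooth_labels(points, labels, bandwidth=1.0, n_iter=200):
    """labels holds +1 / -1 for labeled points and None for unlabeled ones.
    Returns one real-valued score per point; its sign is the predicted class."""
    w = rbf_weights(points, bandwidth)
    scores = [0.0 if y is None else float(y) for y in labels]
    for _ in range(n_iter):
        new_scores = list(scores)
        for i, y in enumerate(labels):
            if y is None:
                # Kernel-weighted average of the neighbors' current scores.
                total = sum(w[i])
                new_scores[i] = sum(wij * s for wij, s in zip(w[i], scores)) / total
        scores = new_scores  # labeled scores are never overwritten
    return scores

# Hypothetical toy data: two clusters on a line, one labeled point in each.
points = [(0.0,), (0.5,), (1.0,), (4.0,), (4.5,), (5.0,)]
labels = [1, None, None, None, None, -1]
scores = kernel_smooth_labels(points, labels)
predicted = [1 if s > 0 else -1 for s in scores]
print(predicted)
```

With only two labeled points, the unlabeled points inherit the class of the cluster they sit in, which mirrors the small-labeled-fraction setting the abstract emphasizes.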

Automated Software Engineering | 2012

Software defect prediction using semi-supervised learning with dimension reduction

Huihua Lu; Bojan Cukic; Mark Culp

Accurate detection of fault-prone modules offers the path to high-quality software products while minimizing nonessential assurance expenditures. This type of quality modeling requires the availability of software modules with known fault content developed in a similar environment. Establishing whether a module contains a fault or not can be expensive. The basic idea behind semi-supervised learning is to learn from a small number of software modules with known fault content and supplement model training with modules for which the fault information is not available. In this study, we investigate the performance of semi-supervised learning for software fault prediction. A preprocessing strategy, multidimensional scaling, is embedded in the approach to reduce the dimensional complexity of software metrics. Our results show that the semi-supervised learning algorithm with dimension reduction performs significantly better than one of the best performing supervised learning algorithms, random forest, in situations when few modules with known fault content are available for training.

The Annals of Applied Statistics | 2009

On multi-view learning with additive models

Mark Culp; George Michailidis; Kjell Johnson

In many scientific settings data can be naturally partitioned into variable groupings called views. Common examples include environmental (1st view) and genetic information (2nd view) in ecological applications, chemical (1st view) and biological (2nd view) data in drug discovery. Multi-view data also occur in text analysis and proteomics applications where one view consists of a graph with observations as the vertices and a weighted measure of pairwise similarity between observations as the edges. Further, in several of these applications the observations can be partitioned into two sets, one where the response is observed (labeled) and the other where the response is not (unlabeled). The problem for simultaneously addressing viewed data and incorporating unlabeled observations in training is referred to as multi-view transductive learning. In this work we introduce and study a comprehensive generalized fixed point additive modeling framework for multi-view transductive learning, where any view is represented by a linear smoother. The problem of view selection is discussed using a generalized Akaike Information Criterion, which provides an approach for testing the contribution of each view. An efficient implementation is provided for fitting these models with both backfitting and local-scoring type algorithms adjusted to semi-supervised graph-based learning. The proposed technique is assessed on both synthetic and real data sets and is shown to be competitive to state-of-the-art co-training and graph-based techniques.
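The backfitting step mentioned in this abstract can be sketched in its classical, fully supervised form: each additive component is refit in turn to the partial residual left by the other components. This is only an assumed simplification, not the paper's transductive fixed point framework. Each "view" here is a single variable, the linear smoother is a centered least-squares line, and all names and data are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def make_line_smoother(x):
    """Build a linear smoother: the centered least-squares line of a residual on x."""
    x_bar = mean(x)
    xc = [xi - x_bar for xi in x]
    sxx = sum(c * c for c in xc)

    def smooth(residual):
        r_bar = mean(residual)
        slope = sum(c * (r - r_bar) for c, r in zip(xc, residual)) / sxx
        return [slope * c for c in xc]  # centered fitted component

    return smooth

def backfit(smoothers, y, n_iter=100):
    """Classical backfitting: cycle through the components, refitting each one
    to y minus the intercept minus all the other components."""
    n = len(y)
    intercept = mean(y)
    components = [[0.0] * n for _ in smoothers]
    for _ in range(n_iter):
        for j, smooth in enumerate(smoothers):
            partial = [
                y[i] - intercept
                - sum(components[k][i] for k in range(len(smoothers)) if k != j)
                for i in range(n)
            ]
            components[j] = smooth(partial)
    fitted = [intercept + sum(comp[i] for comp in components) for i in range(n)]
    return intercept, components, fitted

# Hypothetical toy data: the response is exactly additive in the two views.
view1 = [0.0, 1.0, 2.0, 3.0, 4.0]
view2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [2.0 * a + 3.0 * b for a, b in zip(view1, view2)]
intercept, components, fitted = backfit(
    [make_line_smoother(view1), make_line_smoother(view2)], y
)
print([round(f, 6) for f in fitted])
```

Because the toy response is exactly additive and the two views are not collinear, the cycled fits recover the response; any other linear smoother could be substituted for the line fit without changing the loop.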

Cancer Research | 2013

NEDD9 Depletion Destabilizes Aurora A Kinase and Heightens the Efficacy of Aurora A Inhibitors: Implications for Treatment of Metastatic Solid Tumors

Ryan J. Ice; Sarah L. McLaughlin; Ryan H. Livengood; Mark Culp; Erik R. Eddy; Alexey V. Ivanov; Elena N. Pugacheva

Aurora A kinase (AURKA) is overexpressed in 96% of human cancers and is considered an independent marker of poor prognosis. While the majority of tumors have elevated levels of AURKA protein, few have AURKA gene amplification, implying that posttranscriptional mechanisms regulating AURKA protein levels are significant. Here, we show that NEDD9, a known activator of AURKA, is directly involved in AURKA stability. Analysis of a comprehensive breast cancer tissue microarray revealed a tight correlation between the expression of both proteins, significantly corresponding with increased prognostic value. A decrease in AURKA, concomitant with increased ubiquitination and proteasome-dependent degradation, occurs due to depletion or knockout of NEDD9. Reexpression of wild-type NEDD9 was sufficient to rescue the observed phenomenon. Binding of NEDD9 to AURKA is critical for AURKA stabilization, as mutation of S296E was sufficient to disrupt binding and led to reduced AURKA protein levels. NEDD9 confers AURKA stability by limiting the binding of the cdh1-substrate recognition subunit of APC/C ubiquitin ligase to AURKA. Depletion of NEDD9 in tumor cells increases sensitivity to AURKA inhibitors. Combination therapy with NEDD9 short hairpin RNAs and AURKA inhibitors impairs tumor growth and distant metastasis in mice harboring xenografts of breast tumors. Collectively, our findings provide rationale for the use of AURKA inhibitors in treatment of metastatic tumors and predict the sensitivity of the patients to AURKA inhibitors based on NEDD9 expression.

Journal of Computational and Graphical Statistics | 2008

An Iterative Algorithm for Extending Learners to a Semi-Supervised Setting

Mark Culp; George Michailidis

In this article, we present an iterative self-training algorithm whose objective is to extend learners from a supervised setting into a semi-supervised setting. The algorithm is based on using the predicted values for observations where the response is missing (unlabeled data) and then incorporating the predictions appropriately at subsequent stages. Convergence properties of the algorithm are investigated for particular learners, such as linear/logistic regression and linear smoothers with particular emphasis on kernel smoothers. Further, implementation issues of the algorithm with other learners such as generalized additive models, tree partitioning methods, partial least squares, etc. are also addressed. The connection between the proposed algorithm and graph-based semi-supervised learning methods is also discussed. The algorithm is illustrated on a number of real datasets using a varying degree of labeled responses.
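A minimal sketch of the self-training loop described in this abstract, under stated assumptions: the unlabeled responses are initialized to the mean of the labeled ones, a learner is fit to all observations, the unlabeled responses are replaced by their fitted values, and the process repeats until the values stop changing. The Gaussian kernel smoother used as the learner, the initialization, the stopping rule, and all names are illustrative choices rather than the authors' specification.

```python
import math

def kernel_smoother(xs, responses, bandwidth=1.0):
    """Gaussian kernel smoother: fitted value at each x is a kernel-weighted
    average of all current responses."""
    fitted = []
    for xi in xs:
        weights = [math.exp(-((xi - xj) ** 2) / (2.0 * bandwidth ** 2)) for xj in xs]
        fitted.append(sum(w * r for w, r in zip(weights, responses)) / sum(weights))
    return fitted

def self_train(learner, xs, ys, tol=1e-8, max_iter=500):
    """ys[i] is a number for labeled points and None for unlabeled points.
    learner(xs, responses) must return fitted values at every x."""
    labeled_values = [y for y in ys if y is not None]
    start = sum(labeled_values) / len(labeled_values)
    current = [start if y is None else float(y) for y in ys]
    for _ in range(max_iter):
        fitted = learner(xs, current)
        # Keep observed responses fixed; replace missing ones with predictions.
        updated = [f if y is None else float(y) for f, y in zip(fitted, ys)]
        change = max(abs(u - c) for u, c in zip(updated, current))
        current = updated
        if change < tol:
            break
    return current

# Hypothetical toy data: only the two end points carry a response.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, None, None, None, 4.0]
completed = self_train(kernel_smoother, xs, ys)
print([round(v, 3) for v in completed])
```

The `learner` argument is deliberately generic, so any function with the same signature could stand in for the kernel smoother; whether the loop converges depends on the learner chosen.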

BMC Genomics | 2013

Predicting whole genome protein interaction networks from primary sequence data in model and non-model organisms using ENTS

Eli Rodgers-Melnick; Mark Culp; Stephen P. DiFazio

Background: The large-scale identification of physical protein-protein interactions (PPIs) is an important step toward understanding how biological networks evolve and generate emergent phenotypes. However, experimental identification of PPIs is a laborious and error-prone process, and current methods of PPI prediction tend to be highly conservative or require large amounts of functional data that may not be available for newly-sequenced organisms. Results: In this study we demonstrate a random-forest based technique, ENTS, for the computational prediction of protein-protein interactions based only on primary sequence data. Our approach is able to efficiently predict interactions on a whole-genome scale for any eukaryotic organism, using pairwise combinations of conserved domains and predicted subcellular localization of proteins as input features. We present the first predicted interactome for the forest tree Populus trichocarpa in addition to the predicted interactomes for Saccharomyces cerevisiae, Homo sapiens, Mus musculus, and Arabidopsis thaliana. Comparing our approach to other PPI predictors, we find that ENTS performs comparably to or better than a number of existing approaches, including several that utilize a variety of functional information for their predictions. We also find that the predicted interactions are biologically meaningful, as indicated by similarity in functional annotations and enrichment of co-expressed genes in public microarray datasets. Furthermore, we demonstrate some of the biological insights that can be gained from these predicted interaction networks. We show that the predicted interactions yield informative groupings of P. trichocarpa metabolic pathways, literature-supported associations among human disease states, and theory-supported insight into the evolutionary dynamics of duplicated genes in paleopolyploid plants. Conclusion: We conclude that the ENTS classifier will be a valuable tool for the de novo annotation of genome sequences, providing initial clues about regulatory and metabolic network topology, and revealing relationships that are not immediately obvious from traditional homology-based annotations.

Journal of Cell Science | 2012

Src binds cortactin through an SH2 domain cystine-mediated linkage.

Jason V. Evans; Amanda Gatesman Ammer; John Jett; Chris A. Bolcato; Jason C. Breaux; Karen H. Martin; Mark Culp; Peter M. Gannett; Scott A. Weed

Summary: Tyrosine-kinase-based signal transduction mediated by modular protein domains is critical for cellular function. The Src homology (SH)2 domain is an important conductor of intracellular signaling that binds to phosphorylated tyrosines on acceptor proteins, producing molecular complexes responsible for signal relay. Cortactin is a cytoskeletal protein and tyrosine kinase substrate that regulates actin-based motility through interactions with SH2-domain-containing proteins. The Src kinase SH2 domain mediates cortactin binding and tyrosine phosphorylation, but how Src interacts with cortactin is unknown. Here we demonstrate that Src binds cortactin through cystine bonding between Src C185 in the SH2 domain within the phosphotyrosine binding pocket and cortactin C112/246 in the cortactin repeats domain, independent of tyrosine phosphorylation. Interaction studies show that the presence of reducing agents ablates Src-cortactin binding, eliminates cortactin phosphorylation by Src, and prevents Src SH2 domain binding to cortactin. Tandem MS/MS sequencing demonstrates cystine bond formation between Src C185 and cortactin C112/246. Mutational studies indicate that an intact cystine binding interface is required for Src-mediated cortactin phosphorylation, cell migration, and pre-invadopodia formation. Our results identify a novel phosphotyrosine-independent binding mode between the Src SH2 domain and cortactin. Besides Src, one quarter of all SH2 domains contain cysteines at or near the analogous Src C185 position. This provides a potential alternative mechanism to tyrosine phosphorylation for cysteine-containing SH2 domains to bind cognate ligands that may be widespread in propagating signals regulating diverse cellular functions.

Molecular Cancer Research | 2014

NEDD9 Depletion Leads to MMP14 Inactivation by TIMP2 and Prevents Invasion and Metastasis.

Sarah L. McLaughlin; Ryan J. Ice; Anuradha Rajulapati; Polina Y. Kozyulina; Ryan H. Livengood; Varvara K. Kozyreva; Yuriy V. Loskutov; Mark Culp; Scott A. Weed; Alexey V. Ivanov; Elena N. Pugacheva

The scaffolding protein NEDD9 is an established prometastatic marker in several cancers. Nevertheless, the molecular mechanisms of NEDD9-driven metastasis in cancers remain ill-defined. Here, using a comprehensive breast cancer tissue microarray, it was shown that increased levels of NEDD9 protein significantly correlated with the transition from carcinoma in situ to invasive carcinoma. Similarly, it was shown that NEDD9 overexpression is a hallmark of highly invasive breast cancer cells. Moreover, NEDD9 expression is crucial for the protease-dependent mesenchymal invasion of cancer cells at the primary site but not at the metastatic site. Depletion of NEDD9 is sufficient to suppress invasion of tumor cells in vitro and in vivo, leading to decreased circulating tumor cells and lung metastases in xenograft models. Mechanistically, NEDD9 localized to invasive pseudopods and was required for local matrix degradation. Depletion of NEDD9 impaired invasion of cancer cells through inactivation of membrane-bound matrix metalloproteinase MMP14 by excess TIMP2 on the cell surface. Inactivation of MMP14 is accompanied by reduced collagenolytic activity of soluble metalloproteinases MMP2 and MMP9. Reexpression of NEDD9 is sufficient to restore the activity of MMP14 and the invasive properties of breast cancer cells in vitro and in vivo. Collectively, these findings uncover critical steps in NEDD9-dependent invasion of breast cancer cells. Implications: This study provides a mechanistic basis for potential therapeutic interventions to prevent metastasis. Mol Cancer Res; 12(1); 69–81. ©2013 AACR.

Journal of Chemical Information and Modeling | 2010

The ensemble bridge algorithm: a new modeling tool for drug discovery problems

Mark Culp; Kjell Johnson; George Michailidis

Ensemble algorithms have been historically categorized into two separate paradigms, boosting and random forests, which differ significantly in the way each ensemble is constructed. Boosting algorithms represent one extreme, where an iterative greedy optimization strategy, weak learners (e.g., small classification trees), and stage weights are employed to target difficult-to-classify regions in the training space. On the other extreme, random forests rely on randomly selected features and complex learners (learners that exhibit low bias, e.g., large regression trees) to classify well over the entire training data. Because the approach is not targeting the next learner for inclusion, it tends to provide a natural robustness to noisy labels. In this work, we introduce the ensemble bridge algorithm, which is capable of transitioning between boosting and random forests using a regularization parameter nu in [0,1]. Because the ensemble bridge algorithm is a compromise between the greedy nature of boosting and the randomness present in random forests, it yields robust performance in the presence of a noisy response and superior performance in the presence of a clean response. Often, drug discovery data (e.g., computational chemistry data) have varying levels of noise. Hence, this method enables a practitioner to employ a single method to evaluate ensemble performance. The method's robustness is verified across a variety of data sets where the algorithm repeatedly yields better performance than either boosting or random forests alone. Finally, we provide diagnostic tools for the new algorithm, including a measure of variable importance and an observational clustering tool.

Predictive Models in Software Engineering | 2011

An iterative semi-supervised approach to software fault prediction

Huihua Lu; Bojan Cukic; Mark Culp

Background: Many statistical and machine learning techniques have been implemented to build predictive fault models. Traditional methods are based on supervised learning. Software metrics for a module and corresponding fault information, available from previous projects, are used to train a fault prediction model. This approach calls for a large training data set and enables the development of effective fault prediction models. In practice, data collection costs or the lack of data from earlier projects or product versions may make a large fault prediction training data set unattainable. The small size of the training set that may be available from the current project is known to degrade the performance of the fault prediction model. In semi-supervised learning approaches, software modules with known or unknown fault content can be used for training. Aims: To implement and evaluate a semi-supervised learning approach in software fault prediction. Methods: We investigate an iterative semi-supervised approach to software quality prediction in which a base supervised learner is used within a semi-supervised application. Results: We varied the proportion of labeled software modules from 2% to 50% of all the modules in the project. After tracking the performance of each iteration in the semi-supervised algorithm, we observe that semi-supervised learning improves fault prediction if the number of initially labeled software modules exceeds 5%. Conclusion: The semi-supervised approach outperforms the corresponding supervised learning approach when both use random forest as the base classification algorithm.
