Houssam Nassif | Researchain

Publication

Featured research published by Houssam Nassif.

international conference on data mining | 2009

Information Extraction for Clinical Data Mining: A Mammography Case Study

Houssam Nassif; Ryan W. Woods; Elizabeth S. Burnside; Mehmet Ayvaci; Jude W. Shavlik; David C. Page

Breast cancer is the leading cause of cancer mortality in women between the ages of 15 and 54. During mammography screening, radiologists use a strict lexicon (BI-RADS) to describe and report their findings. Mammography records are then stored in a well-defined database format (NMD). Lately, researchers have applied data mining and machine learning techniques to these databases. They successfully built breast cancer classifiers that can help in early detection of malignancy. However, the validity of these models depends on the quality of the underlying databases. Unfortunately, most databases suffer from inconsistencies, missing data, inter-observer variability and inappropriate term usage. In addition, many databases are not compliant with the NMD format and/or solely consist of text reports. BI-RADS feature extraction from free text and consistency checks between recorded predictive variables and text reports are crucial to addressing this problem. We describe a general scheme for concept information retrieval from free text given a lexicon, and present a BI-RADS features extraction algorithm for clinical data mining. It consists of a syntax analyzer, a concept finder and a negation detector. The syntax analyzer preprocesses the input into individual sentences. The concept finder uses a semantic grammar based on the BI-RADS lexicon and the experts’ input. It parses sentences detecting BI-RADS concepts. Once a concept is located, a lexical scanner checks for negation. Our method can handle multiple latent concepts within the text, filtering out ultrasound concepts. On our dataset, our algorithm achieves 97.7% precision, 95.5% recall and an F1-score of 0.97. It outperforms manual feature extraction at the 5% statistical significance level.
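The three stages named in the abstract (syntax analyzer, concept finder, negation detector) can be illustrated with a toy pipeline. This is only a sketch under stated assumptions: the two-entry `LEXICON`, the `NEGATION` trigger list, and the function names are hypothetical stand-ins, and the real system uses a semantic grammar built from the full BI-RADS lexicon and expert input rather than plain regular expressions.

```python
import re

# Hypothetical mini-lexicon (concept name -> pattern); not the real BI-RADS grammar.
LEXICON = {
    "mass": re.compile(r"\bmass(es)?\b", re.IGNORECASE),
    "calcification": re.compile(r"\bcalcifications?\b", re.IGNORECASE),
}
# Hypothetical negation triggers, searched only in the text before the concept.
NEGATION = re.compile(r"\b(no|not|without|absence of)\b", re.IGNORECASE)


def split_sentences(report):
    """Syntax-analyzer stand-in: split after sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]


def extract_concepts(report):
    """Return (concept, is_present) pairs, one per concept match per sentence."""
    findings = []
    for sentence in split_sentences(report):
        for concept, pattern in LEXICON.items():
            for match in pattern.finditer(sentence):
                # Negation detector: look for a trigger earlier in the same sentence.
                negated = NEGATION.search(sentence[:match.start()]) is not None
                findings.append((concept, not negated))
    return findings


print(extract_concepts("There is an oval mass. No suspicious calcifications are seen."))
```

Restricting the negation search to the current sentence is a deliberate simplification; it keeps a "no" in one sentence from negating a concept in the next.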

Proteins | 2009

Prediction of protein-glucose binding sites using support vector machines.

Houssam Nassif; Hassan Al-Ali; Sawsan Khuri; Walid Keirouz

Glucose is a simple sugar that plays an essential role in many basic metabolic and signaling pathways. Many proteins have binding sites that are highly specific to glucose. The exponential increase of genomic data has revealed the identity of many proteins that seem to be central to biological processes, but whose exact functions are unknown. Many of these proteins seem to be associated with disease processes. Being able to predict glucose‐specific binding sites in these proteins will greatly enhance our ability to annotate protein function and may significantly contribute to drug design. We hereby present the first glucose‐binding site classifier algorithm. We consider the sugar‐binding pocket as a spherical spatio‐chemical environment and represent it as a vector of geometric and chemical features. We then perform Random Forests feature selection to identify key features and analyze them using support vector machine classification. Our work shows that glucose binding sites can be modeled effectively using a limited number of basic chemical and residue features. Using a leave‐one‐out cross‐validation method, our classifier achieves an 8.11% error, an 89.66% sensitivity and a 93.33% specificity over our dataset. From a biochemical perspective, our results support the relevance of ordered water molecules and ions in determining glucose specificity. They also reveal the importance of carboxylate residues in glucose binding and the high concentration of negatively charged atoms in direct contact with the bound glucose molecule.
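For readers less familiar with the three reported figures, the sketch below shows how error, sensitivity and specificity follow from a confusion matrix. The counts are hypothetical: they were picked only because they reproduce the reported percentages, and the abstract does not state the actual numbers of binding and non-binding sites.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Error rate, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "error": (fn + fp) / total,        # fraction of all sites misclassified
        "sensitivity": tp / (tp + fn),     # fraction of true binding sites found
        "specificity": tn / (tn + fp),     # fraction of non-binding sites rejected
    }


# Hypothetical counts (an assumption, not taken from the paper).
metrics = confusion_metrics(tp=26, fn=3, tn=42, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```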

conference on recommender systems | 2016

Adaptive, Personalized Diversity for Visual Discovery

Choon Hui Teo; Houssam Nassif; Daniel N. Hill; Sriram Srinivasan; Mitchell Goodman; Vijai Mohan; S. V. N. Vishwanathan

Search queries are appropriate when users have explicit intent, but they perform poorly when the intent is difficult to express or if the user is simply looking to be inspired. Visual browsing systems allow e-commerce platforms to address these scenarios while offering the user an engaging shopping experience. Here we explore extensions in the direction of adaptive personalization and item diversification within Stream, a new form of visual browsing and discovery by Amazon. Our system presents the user with a diverse set of interesting items while adapting to user interactions. Our solution consists of three components: (1) a Bayesian regression model for scoring the relevance of items while leveraging uncertainty, (2) a submodular diversification framework that re-ranks the top scoring items based on category, and (3) personalized category preferences learned from the user's behavior. When tested on live traffic, our algorithms show a strong lift in click-through rate and session duration.
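Component (2), category-based submodular re-ranking, can be sketched with a greedy selection loop. The objective used here is an assumption chosen for illustration, not the paper's stated formulation: each item contributes its relevance score, and each category contributes a square-root bonus on its item count, so repeated picks from the same category earn diminishing returns. The function name `diversify` and the `weight` parameter are hypothetical.

```python
import math
from collections import Counter


def diversify(items, k, weight=1.0):
    """Greedily pick k item ids from (item_id, category, score) tuples.

    Marginal gain = score + weight * (sqrt(n + 1) - sqrt(n)), where n is the
    number of already selected items in the candidate's category.
    """
    remaining = list(items)
    counts = Counter()
    selected = []
    while remaining and len(selected) < k:
        def gain(item):
            _, category, score = item
            n = counts[category]
            return score + weight * (math.sqrt(n + 1) - math.sqrt(n))

        best = max(remaining, key=gain)
        remaining.remove(best)
        counts[best[1]] += 1
        selected.append(best[0])
    return selected


catalog = [("a", "shoes", 0.9), ("b", "shoes", 0.85), ("c", "bags", 0.6), ("d", "hats", 0.5)]
print(diversify(catalog, k=3))              # diversity bonus spreads picks across categories
print(diversify(catalog, k=3, weight=0.0))  # pure relevance ranking
```

Setting `weight` to zero recovers a plain relevance ranking, which makes the effect of the diversity term easy to compare.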

BMC Bioinformatics | 2012

Automated identification of protein-ligand interaction features using Inductive Logic Programming: a hexose binding case study

José Carlos Almeida Santos; Houssam Nassif; David C. Page; Stephen Muggleton; Michael J. E. Sternberg

Background: There is a need for automated methods to learn general features of the interactions of a ligand class with its diverse set of protein receptors. An appropriate machine learning approach is Inductive Logic Programming (ILP), which automatically generates comprehensible rules in addition to prediction. The development of ILP systems which can learn rules of the complexity required for studies on protein structure remains a challenge. In this work we use a new ILP system, ProGolem, and demonstrate its performance on learning features of hexose-protein interactions. Results: The rules induced by ProGolem detect interactions mediated by aromatics and by planar-polar residues, in addition to less common features such as the aromatic sandwich. The rules also reveal a previously unreported dependency for residues cys and leu. They also specify interactions involving aromatic and hydrogen bonding residues. This paper shows that Inductive Logic Programming implemented in ProGolem can derive rules giving structural features of protein/ligand interactions. Several of these rules are consistent with descriptions in the literature. Conclusions: In addition to confirming literature results, ProGolem’s model has a 10-fold cross-validated predictive accuracy that is superior, at the 95% confidence level, to another ILP system previously used to study protein/hexose interactions and is comparable with state-of-the-art statistical learners.

international health informatics symposium | 2010

Uncovering age-specific invasive and DCIS breast cancer rules using inductive logic programming

Houssam Nassif; David C. Page; Mehmet Ayvaci; Jude W. Shavlik; Elizabeth S. Burnside

Breast cancer is the most common type of cancer among women. Current clinical breast cancer diagnosis involves a biopsy, which is a costly, invasive and potentially painful procedure. Some researchers have proposed models, based on mammography features and personal information, that help identify pre-biopsy invasive breast carcinoma and ductal carcinoma in situ (DCIS). Recently, a differential discriminating ability between invasive and DCIS has been linked to age. Based on this finding, we use an age-stratified mammography and biopsy relational dataset and apply Inductive Logic Programming (ILP) techniques to learn age-specific logical rules that classify invasive and DCIS occurrences. We then use statistical modeling to retrieve rules that have a significantly different performance across age strata. These final rules reveal a number of interesting results. Although a palpable lump is more commonly associated with younger patients, it turns out to be a better predictor of invasive cancer in older women. A recurrence has a higher probability of being invasive in older and middle-aged women. A previously unreported rule revealed by our technique is that recurrence is more likely a DCIS predictor in younger women. This younger DCIS predicting rule effectively links the current diagnostic mammogram to older studies, and provides opposite predictions across the age divide. The resulting rules are age-specific, can help patients and their physicians make more informed decisions about managing their breast health, and constitute a personalized predictive model.

bioinformatics and biomedicine | 2012

Extracting BI-RADS features from Portuguese clinical texts

Houssam Nassif; Filipe Cunha; Inês Moreira; Ricardo Cruz-Correia; Eliana Sousa; David C. Page; Elizabeth S. Burnside; Inês de Castro Dutra

In this work we build the first BI-RADS parser for Portuguese free texts, modeled after existing approaches to extract BI-RADS features from English medical records. Our concept finder uses a semantic grammar based on the BI-RADS lexicon and on iterative transferred expert knowledge. We compare the performance of our algorithm to manual annotation by a specialist in mammography. Our results show that our parser's performance is comparable to the manual method.

BMC Cancer | 2014

Predicting invasive breast cancer versus DCIS in different age groups

Mehmet Ayvaci; Oguzhan Alagoz; Jagpreet Chhatwal; Alejandro Munoz del Rio; Edward A. Sickles; Houssam Nassif; Karla Kerlikowske; Elizabeth S. Burnside

Background: Increasing focus on potentially unnecessary diagnosis and treatment of certain breast cancers prompted our investigation of whether clinical and mammographic features predictive of invasive breast cancer versus ductal carcinoma in situ (DCIS) differ by age. Methods: We analyzed 1,475 malignant breast biopsies, 1,063 invasive and 412 DCIS, from 35,871 prospectively collected consecutive diagnostic mammograms interpreted at University of California, San Francisco between 1/6/1997 and 6/29/2007. We constructed three logistic regression models to predict the probability of invasive cancer versus DCIS for the following groups: women ≥ 65 (older group), women 50–64 (middle age group), and women < 50 (younger group). We identified significant predictors and measured the performance in all models using area under the receiver operating characteristic curve (AUC). Results: The models for the older and middle age groups performed significantly better than the model for the younger group (AUC = 0.848 vs. 0.778; p = 0.049 and AUC = 0.851 vs. 0.778; p = 0.022, respectively). Palpability and principal mammographic finding were significant predictors in distinguishing invasive from DCIS in all age groups. Family history of breast cancer, mass shape and mass margins were significant positive predictors of invasive cancer in the older group, whereas calcification distribution was a negative predictor of invasive cancer (i.e. predicted DCIS). Mass margins in the middle age group and mass size in the younger group were positive predictors of invasive cancer. Conclusions: Clinical and mammographic finding features predict invasive breast cancer versus DCIS better in older women than younger women. Specific predictive variables differ based on age.

european conference on machine learning | 2013

Score As You Lift (SAYL): a statistical relational learning approach to uplift modeling

Houssam Nassif; Finn Kuusisto; Elizabeth S. Burnside; David C. Page; Jude W. Shavlik; Vítor Santos Costa

We introduce Score As You Lift (SAYL), a novel Statistical Relational Learning (SRL) algorithm, and apply it to an important task in the diagnosis of breast cancer. SAYL combines SRL with the marketing concept of uplift modeling, uses the area under the uplift curve to direct clause construction and final theory evaluation, integrates rule learning and probability assignment, and conditions the addition of each new theory rule to existing ones. Breast cancer, the most common type of cancer among women, is categorized into two subtypes: an earlier in situ stage where cancer cells are still confined, and a subsequent invasive stage. Currently older women with in situ cancer are treated to prevent cancer progression, regardless of the fact that treatment may generate undesirable side-effects, and the woman may die of other causes. Younger women tend to have more aggressive cancers, while older women tend to have more indolent tumors. Therefore older women whose in situ tumors show significant dissimilarity with in situ cancer in younger women are less likely to progress, and can thus be considered for watchful waiting. Motivated by this important problem, this work makes two main contributions. First, we present the first multi-relational uplift modeling system, and introduce, implement and evaluate a novel method to guide search in an SRL framework. Second, we compare our algorithm to previous approaches, and demonstrate that the system can indeed obtain differential rules of interest to an expert on real data, while significantly improving the data uplift.
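The area under the uplift curve mentioned above can be computed in a few lines. The sketch assumes one common formulation, which the abstract itself does not spell out: the lift curve counts the positives captured among the top-ranked fraction of examples, and the uplift curve is the lift on the target subgroup minus the lift on the control subgroup. The function names and the toy scores are hypothetical.

```python
def area_under_lift(scores, labels):
    """Trapezoidal area under the lift curve.

    The curve plots positives captured (y) against the fraction of examples
    ranked highest by score (x, from 0 to 1).
    """
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda pair: -pair[0])]
    n = len(ranked)
    area, hits = 0.0, 0
    for label in ranked:
        previous = hits
        hits += label
        area += (previous + hits) / 2 / n  # one trapezoid of width 1/n
    return area


def area_under_uplift(target, control):
    """Difference of lift areas; each argument is a (scores, labels) pair."""
    return area_under_lift(*target) - area_under_lift(*control)


scores = [0.9, 0.8, 0.2, 0.1]
target = (scores, [1, 1, 0, 0])   # model ranks the target subgroup's positives first
control = (scores, [0, 0, 1, 1])  # and the control subgroup's positives last
print(area_under_uplift(target, control))
```

A larger value means the model separates positives much better on the target subgroup than on the control subgroup, which is the differential behavior an uplift learner searches for.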

european conference on machine learning | 2012

Relational differential prediction

Houssam Nassif; Vítor Santos Costa; Elizabeth S. Burnside; David C. Page

A typical classification problem involves building a model to correctly segregate instances of two or more classes. Such a model exhibits differential prediction with respect to given data subsets when its performance is significantly different over these subsets. Driven by a mammography application, we aim at learning rules that predict breast cancer stage while maximizing differential prediction over age-stratified data. In this work, we present the first multi-relational differential prediction (aka uplift modeling) system, and propose three different approaches to learn differential predictive rules within the Inductive Logic Programming framework. We first test and validate our methods on synthetic data, then apply them on a mammography dataset for breast cancer stage differential prediction rule discovery. We mine a novel rule linking calcification to in situ breast cancer in older women.

knowledge discovery and data mining | 2017

An Efficient Bandit Algorithm for Realtime Multivariate Optimization

Daniel N. Hill; Houssam Nassif; Yi Liu; Anand Iyer; S. V. N. Vishwanathan

Optimization is commonly employed to determine the content of web pages, such as to maximize conversions on landing pages or click-through rates on search engine result pages. Often the layout of these pages can be decoupled into several separate decisions. For example, the composition of a landing page may involve deciding which image to show, which wording to use, what color background to display, etc. Such optimization is a combinatorial problem over an exponentially large decision space. Randomized experiments do not scale well to this setting, and therefore, in practice, one is typically limited to optimizing a single aspect of a web page at a time. This represents a missed opportunity in both the speed of experimentation and the exploitation of possible interactions between layout decisions. Here we focus on multivariate optimization of interactive web pages. We formulate an approach where the possible interactions between different components of the page are modeled explicitly. We apply bandit methodology to explore the layout space efficiently and use hill-climbing to select optimal content in realtime. Our algorithm also extends to contextualization and personalization of layout selection. Simulation results show the suitability of our approach to large decision spaces with strong interactions between content. We further apply our algorithm to optimize a message that promotes adoption of an Amazon service. After only a single week of online optimization, we saw a 21% conversion increase compared to the median layout. Our technique is currently being deployed to optimize content across several locations at Amazon.com.
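The hill-climbing step can be sketched as a coordinate-wise search over layout slots. Everything concrete here is an assumption for illustration: the reward is modeled as per-slot effects plus pairwise interaction effects, and the weights `main` and `pair` are fixed toy numbers, whereas in a bandit setting they would come from the learned model (for example, a sample from its posterior).

```python
import random


def score(layout, main, pair):
    """Modeled reward: per-slot effects plus pairwise interaction effects."""
    total = sum(main[slot][choice] for slot, choice in enumerate(layout))
    for i in range(len(layout)):
        for j in range(i + 1, len(layout)):
            total += pair.get((i, layout[i], j, layout[j]), 0.0)
    return total


def hill_climb(n_choices, main, pair, rng):
    """Coordinate-wise hill climbing from a random start.

    Repeatedly sets each slot to its best choice with the other slots held
    fixed, and stops when no single-slot change improves the score.
    """
    layout = [rng.randrange(n) for n in n_choices]
    improved = True
    while improved:
        improved = False
        for slot, n in enumerate(n_choices):
            def with_choice(choice):
                return layout[:slot] + [choice] + layout[slot + 1:]

            best = max(range(n), key=lambda c: score(with_choice(c), main, pair))
            if score(with_choice(best), main, pair) > score(layout, main, pair):
                layout[slot] = best
                improved = True
    return layout


# Hypothetical weights: two slots with two choices each, one strong interaction.
main = [[0.1, 0.2], [0.3, 0.1]]
pair = {(0, 0, 1, 1): 1.0}
print(hill_climb([2, 2], main, pair, random.Random(0)))
```

Each pass evaluates only the choices of one slot at a time rather than every combination, which is what makes the search tractable in large layout spaces. The result is a local optimum that depends on the starting point, so random restarts are a common remedy.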
