Walid Shalaby
University of North Carolina at Charlotte
Publications
Featured research published by Walid Shalaby.
Technical Symposium on Computer Science Education | 2015
Wlodek Zadrozny; Sean Gallagher; Walid Shalaby; Adarsh Avadhani
IBM Watson exemplifies multiple innovations in natural language processing and question answering. In addition, Watson uses most of the known techniques in these two domains, as well as many methods from related domains; hence, there is pedagogical value in a rigorous understanding of its function. This paper describes a text analytics course focused on building a simulator of IBM Watson, conducted in Spring 2014 at UNC Charlotte. We believe this is the first time a simulation containing all the major Watson components was created in a university classroom. The system achieved a respectable accuracy of close to 20% on Jeopardy! questions, and many known and new avenues for improving performance remain to be explored. The code and documentation are available on GitHub. The paper is a joint effort of the teacher and several of the students who led the teams implementing component technologies and were therefore deeply involved in making the class successful.
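The abstract describes the system only at a high level. Purely for illustration, a DeepQA-style pipeline of the kind the course reproduces (question analysis, candidate generation, evidence scoring, ranking) could be sketched as follows; every component here is a toy stand-in, not the class's actual code:

```python
# Toy sketch of a DeepQA-style QA pipeline: analyze the clue, generate
# candidate answers by primary search, score the evidence, rank, answer.
from dataclasses import dataclass

@dataclass
class Candidate:
    answer: str
    score: float = 0.0

def analyze_question(clue: str) -> dict:
    # Toy "question analysis": keep content words as search keywords.
    stop = {"its", "this", "the", "a", "an", "of", "is", "was", "in", "for"}
    return {"keywords": [w for w in clue.lower().split() if w not in stop]}

def generate_candidates(analysis: dict, corpus: dict) -> list:
    # Toy "primary search": score each document title by keyword overlap.
    hits = []
    for title, text in corpus.items():
        overlap = sum(kw in text.lower() for kw in analysis["keywords"])
        if overlap:
            hits.append(Candidate(title, float(overlap)))
    return hits

def answer(clue: str, corpus: dict) -> str:
    # Rank candidates by evidence score and return the best one.
    ranked = sorted(generate_candidates(analyze_question(clue), corpus),
                    key=lambda c: c.score, reverse=True)
    return ranked[0].answer if ranked else "no answer"

corpus = {
    "Toronto": "the largest city in Canada, on Lake Ontario",
    "Chicago": "US city whose O'Hare airport is named for a WWII hero",
}
print(answer("Its largest airport is named for a World War II hero", corpus))
```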
International Conference on Big Data | 2014
Walid Shalaby; Wlodek Zadrozny; Sean Gallagher
In this paper we propose a novel technique for dimensionality reduction using freely available online knowledge bases. The complexity of our method is linearly proportional to the size of the full feature set, making it efficient even for huge and complex datasets. We demonstrate the approach by investigating its effectiveness on patent data, the largest freely available body of technical text. We report empirical results on classification of the CLEF-IP 2010 dataset using bigram features supported by mentions in the Wikipedia, Wiktionary, and Google Books knowledge bases. We achieve a 13-fold reduction in the number of bigram features and a 1.7% increase in classification accuracy over the unigram baseline. These results give concrete evidence that significant accuracy improvements and massive reductions in dimensionality can be achieved with our approach, helping to alleviate the trade-off between task complexity and accuracy.
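The core selection step is simple enough to sketch: keep an n-gram feature only if it is attested in an external knowledge base. The lookup set below is a toy stand-in for the Wikipedia, Wiktionary, and Google Books resources named above, and the single pass over the feature set reflects the linear-complexity claim:

```python
# Sketch of knowledge-base-supported feature selection: keep a bigram
# only if it appears in an external vocabulary (a stand-in here for
# Wikipedia titles, Wiktionary entries, or Google Books n-grams).
# One pass over the feature set, so cost is linear in its size.
from collections import Counter

def bigrams(tokens):
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

def select_supported_features(documents, kb_phrases):
    counts = Counter()
    for doc in documents:
        counts.update(bigrams(doc.lower().split()))
    # Keep only features with a knowledge-base mention.
    return {bg: c for bg, c in counts.items() if bg in kb_phrases}

kb = {"neural network", "semiconductor device"}   # toy stand-in KB
docs = ["A semiconductor device comprising a neural network accelerator"]
print(select_supported_features(docs, kb))
```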
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2018
Walid Shalaby; Wlodek Zadrozny
We present a novel interactive framework for patent retrieval that leverages distributed representations of concepts and entities extracted from patent text. We propose a simple and practical interactive relevance feedback mechanism in which the user is asked to annotate relevant/irrelevant results from the top-n hits. We then use this feedback for query reformulation and term weighting, assigning each term a weight based on how well it discriminates between the relevant and irrelevant candidates. First, we demonstrate the efficacy of the distributed representations on the CLEF-IP 2010 dataset, where we achieve a significant improvement of 4.6% in recall over the keyword search baseline. Second, we simulate interactivity to demonstrate the efficacy of our interactive term weighting mechanism. Simulation results show that a single interaction iteration yields a significant improvement in recall, outperforming previous semantic and interactive patent retrieval methods.
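A minimal sketch of that feedback loop follows, assuming a simple difference-of-document-frequencies weighting; the paper's exact discriminative weighting formula may differ:

```python
# Sketch of the interactive feedback loop: after the user labels the
# top-n hits, weight each term by how well it separates relevant from
# irrelevant results, then expand the query with the best terms.
from collections import Counter

def term_weights(relevant_docs, irrelevant_docs):
    def doc_freq(docs):
        df = Counter()
        for d in docs:
            df.update(set(d.lower().split()))
        return df

    rel_df, irr_df = doc_freq(relevant_docs), doc_freq(irrelevant_docs)
    terms = set(rel_df) | set(irr_df)
    # Illustrative weighting: relative document frequency among relevant
    # hits minus relative document frequency among irrelevant ones.
    return {t: rel_df[t] / max(len(relevant_docs), 1)
               - irr_df[t] / max(len(irrelevant_docs), 1)
            for t in terms}

def reformulate(query, weights, k=5):
    # Expand the query with the k most discriminative terms.
    top = sorted(weights, key=weights.get, reverse=True)[:k]
    return query + " " + " ".join(t for t in top if t not in query.split())

rel = ["touch screen display panel", "flexible display device"]
irr = ["display stand for retail", "billboard display frame"]
print(reformulate("display", term_weights(rel, irr), k=2))
```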
arXiv: Computation and Language | 2018
Walid Shalaby; Wlodek Zadrozny; Hongxia Jin
Text representations based on neural word embeddings have proven effective in many NLP applications. Recent work adapts traditional word embedding models to learn vectors for multiword expressions (concepts/entities). However, these methods are limited to textual knowledge bases (e.g., Wikipedia). In this paper, we propose a novel and simple technique for integrating knowledge about concepts from two large-scale knowledge bases of different structure (Wikipedia and Probase) in order to learn concept representations. We adapt the efficient skip-gram model to seamlessly learn from the knowledge in Wikipedia text and the Probase concept graph. We evaluate our concept embedding models on two tasks: (1) analogical reasoning, where we achieve state-of-the-art performance of 91% on semantic analogies, and (2) concept categorization, where we achieve state-of-the-art performance on two benchmark datasets, with categorization accuracy of 100% on one and 98% on the other. Additionally, we present a case study evaluating our model on unsupervised argument type identification for neural semantic parsing. We demonstrate the competitive accuracy of our unsupervised method and its ability to generalize better to out-of-vocabulary entity mentions than tedious and error-prone methods that depend on gazetteers and regular expressions.
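A rough sketch of the joint training recipe, assuming concept mentions are pre-tokenized as single tokens and Probase-style isA edges are serialized as short contexts so one skip-gram model (gensim's Word2Vec with sg=1) learns from both sources; the paper's actual sampling scheme may differ, and the example sentences and edges are hypothetical:

```python
# Joint skip-gram training over text and a concept graph: concept
# mentions are single tokens, and (instance, isA, concept) edges are
# rendered as tiny "sentences" fed to the same skip-gram model.
from gensim.models import Word2Vec

wiki_sentences = [
    ["Barack_Obama", "served", "as", "president", "of", "the", "United_States"],
    ["Microsoft", "was", "founded", "by", "Bill_Gates"],
]
# Concept-graph edges serialized as short contexts (hypothetical examples).
probase_contexts = [
    ["Barack_Obama", "isA", "politician"],
    ["Microsoft", "isA", "software_company"],
]

model = Word2Vec(wiki_sentences + probase_contexts,
                 vector_size=100, sg=1, window=5, min_count=1, epochs=50)

# Analogical reasoning via vector arithmetic over the learned vectors.
print(model.wv.most_similar(positive=["Barack_Obama", "Microsoft"],
                            negative=["United_States"], topn=3))
```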
Computer Software and Applications Conference | 2016
Walid Shalaby; Khalifeh Al Jadda; Mohammed Korayem; Trey Grainger
We present an ensemble approach for categorizing search query entities in the recruitment domain. Understanding the types of entities expressed in a search query (Company, Skill, Job Title, etc.) enables more intelligent information retrieval based on those entities than traditional keyword-based search. Because search queries are typically very short, a traditional bag-of-words model would be a poor fit for identifying entity types due to the lack of contextual information. Our approach instead combines clues from different sources of varying complexity to collect real-world knowledge about query entities. We employ distributional semantic representations of query entities through two models: (1) contextual vectors generated from encyclopedic corpora such as Wikipedia, and (2) high-dimensional word embedding vectors generated from millions of job postings using word2vec. Additionally, our approach utilizes both entity linguistic properties obtained from WordNet and ontological properties extracted from DBpedia. We evaluate our approach on a dataset created at CareerBuilder, the largest job board in the US, containing entities extracted from millions of job seekers'/recruiters' search queries, job postings, and resume documents. After constructing the distributional vectors of search entities, we use supervised machine learning to infer search entity types. Empirical results show that our approach outperforms the state-of-the-art word2vec distributional semantics model trained on Wikipedia. Moreover, we achieve a micro-averaged F1 score of 97% using the proposed ensemble of distributional representations.
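The ensemble idea can be sketched as feature concatenation followed by a supervised classifier; all extractors and data below are hypothetical placeholders for the Wikipedia-context, word2vec, WordNet, and DBpedia signals described above:

```python
# Sketch of the ensemble: each source contributes a feature block for a
# query entity; the concatenated vector feeds a supervised classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def entity_features(entity, wiki_vecs, w2v_vecs, kb_flags):
    # Missing entries fall back to zero blocks of the right size.
    dim = 50
    return np.concatenate([
        wiki_vecs.get(entity, np.zeros(dim)),   # encyclopedic context vector
        w2v_vecs.get(entity, np.zeros(dim)),    # job-posting word2vec vector
        kb_flags.get(entity, np.zeros(4)),      # WordNet/DBpedia indicators
    ])

# Toy training data: entities labeled with their type.
rng = np.random.default_rng(0)
entities = ["java", "google", "nurse", "atlanta"]
labels = ["Skill", "Company", "Job Title", "Location"]
wiki = {e: rng.normal(size=50) for e in entities}
w2v = {e: rng.normal(size=50) for e in entities}
flags = {e: rng.integers(0, 2, size=4).astype(float) for e in entities}

X = np.stack([entity_features(e, wiki, w2v, flags) for e in entities])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X[:1]))   # infer the type of the first entity
```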
International Conference on Big Data | 2016
Jianbo Yuan; Walid Shalaby; Mohammed Korayem; David Lin; Khalifeh AlJadda; Jiebo Luo
Archive | 2016
Walid Shalaby; Wlodek Zadrozny; Kripa Rajshekhar
International Conference on Big Data | 2017
Walid Shalaby; Wlodek Zadrozny
arXiv: Computation and Language | 2015
Walid Shalaby; Wlodek Zadrozny
arXiv: Computation and Language | 2014
Sean Gallagher; Wlodek Zadrozny; Walid Shalaby; Adarsh Avadhani