Publication


Featured research published by Mohamed Yahya.


Conference on Information and Knowledge Management (CIKM) | 2013

Robust question answering over the web of linked data

Mohamed Yahya; Klaus Berberich; Shady Elbassuoni; Gerhard Weikum

Knowledge bases and the Web of Linked Data have become important assets for search, recommendation, and analytics. Natural-language questions are a user-friendly mode of tapping this wealth of knowledge and data. However, question answering technology does not work robustly in this setting as questions have to be translated into structured queries and users have to be careful in phrasing their questions. This paper advocates a new approach that allows questions to be partially translated into relaxed queries, covering the essential but not necessarily all aspects of the user's input. To compensate for the omissions, we exploit textual sources associated with entities and relational facts. Our system translates user questions into an extended form of structured SPARQL queries, with text predicates attached to triple patterns. Our solution is based on a novel optimization model, cast into an integer linear program, for joint decomposition and disambiguation of the user question. We demonstrate the quality of our methods through experiments with the QALD benchmark.
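
To make the idea concrete, the sketch below (Python) prints what such a relaxed query might look like. The text:matches predicate is a hypothetical notation for a text predicate attached to a triple pattern; it is illustrative, not the paper's exact syntax.

    # A minimal sketch of an extended SPARQL query in the spirit of this
    # paper: essential structured triple patterns plus a relaxed text
    # predicate. text:matches is a hypothetical, non-standard notation.
    relaxed_query = """
    SELECT ?person WHERE {
      ?person rdf:type yago:Musician .              # essential structured part
      ?person ?p ?o .
      FILTER text:matches(?o, "born in Vienna")     # relaxed textual part
    }
    """
    print(relaxed_query)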


Empirical Methods in Natural Language Processing (EMNLP) | 2014

ReNoun: Fact Extraction for Nominal Attributes

Mohamed Yahya; Steven Euijong Whang; Rahul Gupta; Alon Y. Halevy

Search engines are increasingly relying on large knowledge bases of facts to provide direct answers to users’ queries. However, the construction of these knowledge bases is largely manual and does not scale to the long and heavy tail of facts. Open information extraction tries to address this challenge, but typically assumes that facts are expressed with verb phrases, and therefore has had difficulty extracting facts for noun-based relations. We describe ReNoun, an open information extraction system that complements previous efforts by focusing on nominal attributes and on the long tail. ReNoun’s approach is based on leveraging a large ontology of noun attributes mined from a text corpus and from user queries. ReNoun creates a seed set of training data by using specialized patterns and requiring that the facts mention an attribute in the ontology. ReNoun then generalizes from this seed set to produce a much larger set of extractions that are then scored. We describe experiments that show that we extract facts with high precision and for attributes that cannot be extracted with verb-based techniques.
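
For intuition, one seed pattern of the kind described might be "the <attribute> of <subject> is <object>". The Python sketch below is a hypothetical, minimal extractor; the pattern and the toy attribute ontology are assumptions, not ReNoun's actual resources.

    # Minimal sketch: extract (subject, attribute, object) facts for nominal
    # attributes with one seed pattern, keeping only facts whose attribute
    # appears in a (toy) attribute ontology, as the abstract describes.
    import re

    ATTRIBUTE_ONTOLOGY = {"ceo", "capital", "mayor"}          # toy ontology
    PATTERN = re.compile(r"the (\w+) of ([\w ]+?) is ([\w ]+)")

    def extract(sentence):
        facts = []
        for attr, subj, obj in PATTERN.findall(sentence):
            if attr.lower() in ATTRIBUTE_ONTOLOGY:   # require a known attribute
                facts.append((subj.strip(), attr.lower(), obj.strip()))
        return facts

    print(extract("Everyone knows the capital of France is Paris"))
    # -> [('France', 'capital', 'Paris')]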


International World Wide Web Conference (WWW) | 2012

Deep answers for naturally asked questions on the web of data

Mohamed Yahya; Klaus Berberich; Shady Elbassuoni; Maya Ramanath; Volker Tresp; Gerhard Weikum

We present DEANNA, a framework for natural language question answering over structured knowledge bases. Given a natural language question, DEANNA translates it into a structured SPARQL query that can be evaluated over knowledge bases such as YAGO, DBpedia, Freebase, or other Linked Data sources. DEANNA analyzes questions and maps verbal phrases to relations and noun phrases to either individual entities or semantic classes. Importantly, it judiciously generates variables for target entities or classes to express joins between multiple triple patterns. We leverage the semantic type system for entities and use constraints in jointly mapping the constituents of the question to relations, classes, and entities. We demonstrate the capabilities and interface of DEANNA, which allows advanced users to influence the translation process and to see how the different components interact to produce the final result.
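
The joint mapping can be pictured as an optimization over candidate assignments. The toy integer linear program below (Python with the PuLP library; candidates and scores are made up for illustration) picks one semantic item per phrase while maximizing total score, a much-simplified stand-in for DEANNA's actual model and constraints.

    # Toy ILP sketch with PuLP: jointly choose one semantic item per phrase,
    # maximizing a made-up similarity score. DEANNA's real model also
    # enforces semantic coherence between the chosen items.
    import pulp

    candidates = {                     # phrase -> {candidate item: score}
        "played in": {"actedIn": 0.9, "playedForTeam": 0.6},
        "Casablanca": {"Casablanca_(film)": 0.8, "Casablanca_(city)": 0.7},
    }

    prob = pulp.LpProblem("joint_disambiguation", pulp.LpMaximize)
    x = {(p, c): pulp.LpVariable(f"x_{i}", cat="Binary")
         for i, (p, c) in enumerate(
             (p, c) for p in candidates for c in candidates[p])}

    prob += pulp.lpSum(candidates[p][c] * v for (p, c), v in x.items())
    for p in candidates:               # each phrase maps to exactly one item
        prob += pulp.lpSum(x[(p, c)] for c in candidates[p]) == 1

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print([(p, c) for (p, c), v in x.items() if v.value() == 1])
    # -> [('played in', 'actedIn'), ('Casablanca', 'Casablanca_(film)')]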


International World Wide Web Conference (WWW) | 2015

Generating Quiz Questions from Knowledge Graphs

Dominic Seyler; Mohamed Yahya; Klaus Berberich

We propose an approach to generate natural language questions from knowledge graphs such as DBpedia and YAGO. We stage this in the setting of a quiz game, though our approach is general enough to be applicable in other settings. Given a topic of interest (e.g., Soccer) and a difficulty (e.g., hard), our approach selects a query answer, generates a SPARQL query that has the answer as its sole result, and then verbalizes the query as a natural language question.
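
A minimal sketch of the select-then-verbalize pipeline; the tiny knowledge graph and the verbalization template below are assumptions for illustration, not the paper's data or templates.

    # Toy pipeline: pick an answer fact, build a query whose sole result is
    # that answer, then verbalize it with a hand-written template.
    triples = [("Brazil", "wonTrophy", "FIFA_World_Cup_2002"),
               ("Germany", "wonTrophy", "FIFA_World_Cup_2014")]

    def generate_question(triple):
        subj, pred, obj = triple
        query = f"SELECT ?x WHERE {{ ?x <{pred}> <{obj}> }}"  # sole result: subj
        question = f"Which country won the {obj.replace('_', ' ')}?"
        return question, query, subj

    question, query, answer = generate_question(triples[0])
    print(question)   # Which country won the FIFA World Cup 2002?
    print(answer)     # Brazil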


International World Wide Web Conference (WWW) | 2017

Automated Template Generation for Question Answering over Knowledge Graphs

Abdalghani Abujabal; Mohamed Yahya; Mirek Riedewald; Gerhard Weikum

Templates are an important asset for question answering over knowledge graphs, simplifying the semantic parsing of input utterances and generating structured queries for interpretable answers. State-of-the-art methods rely on hand-crafted templates with limited coverage. This paper presents QUINT, a system that automatically learns utterance-query templates solely from user questions paired with their answers. Additionally, QUINT is able to harness language compositionality for answering complex questions without having any templates for the entire question. Experiments with different benchmarks demonstrate the high quality of QUINT.
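
For intuition, a learned utterance-query template pair might look like the toy example below; the template format, the matching, and the predicate name are assumptions, not QUINT's internals.

    # Toy template instantiation: an utterance pattern paired with a query
    # template. QUINT learns such pairs from question-answer pairs; here one
    # is hand-written for illustration.
    TEMPLATES = [
        ("who wrote {ENT}", "SELECT ?a WHERE {{ <{ENT}> <writtenBy> ?a }}"),
    ]

    def parse(question, entity):
        for pattern, query_tpl in TEMPLATES:
            if pattern.replace("{ENT}", entity.lower()) == question.lower():
                return query_tpl.format(ENT=entity.replace(" ", "_"))
        return None   # no template covers this utterance

    print(parse("Who wrote Dracula", "Dracula"))
    # -> SELECT ?a WHERE { <Dracula> <writtenBy> ?a }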


International Conference on the Theory of Information Retrieval (ICTIR) | 2017

Knowledge Questions from Knowledge Graphs

Dominic Seyler; Mohamed Yahya; Klaus Berberich

We address the problem of automatically generating quiz-style knowledge questions from a knowledge graph such as DBpedia. Questions of this kind have ample applications, for instance, to educate users about or to evaluate their knowledge in a specific domain. To solve the problem, we propose a novel end-to-end approach. The approach first selects a named entity from the knowledge graph as an answer. It then generates a structured triple-pattern query, which yields the answer as its sole result. If a multiple-choice question is desired, the approach selects alternative answer options as distractors. Finally, our approach uses a template-based method to verbalize the structured query and yield a natural language question. A key challenge is estimating how difficult the generated question is to human users. To do this, we make use of historical data from the Jeopardy! quiz show and a semantically annotated Web-scale document collection, engineer suitable features, and train a logistic regression classifier to predict question difficulty. Experiments demonstrate the viability of our overall approach.
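
The difficulty-estimation step can be sketched as a standard supervised classifier. Below, a scikit-learn logistic regression over two made-up features (say, answer-entity popularity and predicate rarity) stands in for the paper's engineered features and Jeopardy!-derived labels.

    # Toy difficulty classifier: logistic regression over two illustrative
    # features. The real features come from Jeopardy! data and a
    # semantically annotated Web-scale corpus.
    from sklearn.linear_model import LogisticRegression

    X = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]  # [popularity, rarity]
    y = [0, 0, 1, 1]                                      # 0 = easy, 1 = hard

    clf = LogisticRegression().fit(X, y)
    print(clf.predict([[0.15, 0.85]]))   # obscure answer -> predicted hard [1]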


International World Wide Web Conference (WWW) | 2018

Never-Ending Learning for Open-Domain Question Answering over Knowledge Bases

Abdalghani Abujabal; Rishiraj Saha Roy; Mohamed Yahya; Gerhard Weikum

Translating natural language questions to semantic representations such as SPARQL is a core challenge in open-domain question answering over knowledge bases (KB-QA). Existing methods rely on a clear separation between an offline training phase, where a model is learned, and an online phase where this model is deployed. Two major shortcomings of such methods are that (i) they require access to a large annotated training set that is not always readily available and (ii) they fail on questions from previously unseen domains. To overcome these limitations, this paper presents NEQA, a continuous learning paradigm for KB-QA. Offline, NEQA automatically learns templates mapping syntactic structures to semantic ones from a small number of training question-answer pairs. Once deployed, continuous learning is triggered on cases where templates are insufficient. Using a semantic similarity function between questions and by judicious invocation of non-expert user feedback, NEQA learns new templates that capture previously unseen syntactic structures. This way, NEQA gradually extends its template repository. NEQA periodically re-trains its underlying models, allowing it to adapt to the language used after deployment. Our experiments demonstrate NEQA's viability, with steady improvement in answering quality over time, and the ability to answer questions from new domains.
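
The "semantic similarity function between questions" can be pictured with a simple TF-IDF cosine similarity, as in the scikit-learn sketch below; this toy stand-in is an assumption, not NEQA's actual learned function.

    # Toy sketch: when no template matches a new question, find the most
    # similar previously answered question so its template can be reused.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    answered = ["who directed Titanic", "where was Einstein born"]
    new_question = "who directed Avatar"

    vec = TfidfVectorizer().fit(answered + [new_question])
    sims = cosine_similarity(vec.transform([new_question]),
                             vec.transform(answered))[0]
    print(answered[sims.argmax()])   # -> "who directed Titanic"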


Web Science Conference (WebSci) | 2016

Automated question generation for quality control in human computation tasks

Dominic Seyler; Mohamed Yahya; Klaus Berberich; Omar Alonso

When running large human computation tasks in the real-world, honeypots play an important role for assessing the overall quality of the work produced. The generation of such honeypots can be a significant burden on the task owner as they require specific characteristics in their design and implementation and continuous maintenance when operating data pipelines that include a human computation component. In this extended abstract we outline a novel approach for creating honeypots using automatically generated questions from a reference knowledge base with the ability to control such parameters as topic and difficulty.
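
As a hedged sketch of how such honeypots could be used downstream, the Python snippet below scores a worker by accuracy on questions with known answers; the question pool and scoring rule are illustrative assumptions, not the system described here.

    # Toy quality control: mix auto-generated honeypot questions (with known
    # answers) into a task batch and score each worker on them.
    honeypots = {"Which country won the FIFA World Cup 2002?": "Brazil"}

    def worker_quality(responses):
        seen = [(q, a) for q, a in responses.items() if q in honeypots]
        if not seen:
            return None                    # worker saw no honeypots
        return sum(honeypots[q] == a for q, a in seen) / len(seen)

    print(worker_quality({"Which country won the FIFA World Cup 2002?": "Brazil"}))
    # -> 1.0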


Rules and Rule Markup Languages for the Semantic Web (RuleML) | 2011

D2R2: disk-oriented deductive reasoning in a RISC-style RDF engine

Mohamed Yahya; Martin Theobald

Deductive reasoning lies in the expressive intersection of Datalog and Description Logics. In this paper, we present the D2R2 engine, which implements deductive reasoning capabilities based on the Query-Sub-Query (QSQR) algorithm on top of the disk-oriented RDF-3X engine. D2R2 aims to bridge the gap between rule-oriented (intensional) reasoning with deduction rules and data-oriented (extensional) processing of large joins, over a set of highly tuned, disk-based index structures for large RDF collections. We present a generalization of QSQR, which allows for dynamic sub-query scheduling and chaining of extensional predicates into atomic join patterns, two key extensions for coupling QSQR with a disk-oriented storage backend. Experiments over a set of recursive queries and a very large knowledge base, consisting of 20 million RDF facts, as well as comparisons to disk-oriented reasoning engines, confirm the practical viability and significant runtime improvements of D2R2 compared to these engines.
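
For intuition about the deductive side, the toy bottom-up fixpoint below derives transitive facts from base edges. Note that D2R2 itself evaluates top-down with QSQR over RDF-3X's disk-based indexes; this naive evaluation is only a conceptual sketch.

    # Naive bottom-up Datalog evaluation for:
    #   reachable(X, Y) :- edge(X, Y).
    #   reachable(X, Y) :- edge(X, Z), reachable(Z, Y).
    edges = {("a", "b"), ("b", "c"), ("c", "d")}

    reachable = set(edges)
    changed = True
    while changed:                    # iterate to a fixpoint
        changed = False
        for (x, z) in edges:
            for (z2, y) in set(reachable):
                if z == z2 and (x, y) not in reachable:
                    reachable.add((x, y))
                    changed = True
    print(sorted(reachable))          # includes derived facts such as ('a', 'd')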


International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) | 2018

Auto-completion for Question Answering Systems at Bloomberg

Konstantine Arkoudas; Mohamed Yahya

The Bloomberg Terminal is the leading source of information and news in the finance industry. Through hundreds of functions that provide access to a vast wealth of structured and semi-structured data, the terminal is able to satisfy a wide range of information needs. Users can find what they need by constructing queries, plotting charts, creating alerts, and so on. Until recently, most queries to the terminal were constructed through dedicated GUIs. For instance, if users wanted to screen for technology companies that met certain criteria, they would specify the criteria by filling out a form via a sequence of interactions with GUI elements such as drop-down lists, checkboxes, radio and toggle buttons, etc. To facilitate information retrieval in the terminal, we are equipping it with the ability to understand and answer queries expressed in natural language. Our QA (question answering) systems map structurally complex questions like the above to a logical meaning representation which can then be translated to an executable query language (such as SQL or SPARQL). At that point we can execute the queries against a suitable back end, obtain the results, and present them to the users. Adding a natural-language interface to a data repository introduces usability challenges of its own, chief amongst them being this: How can the user know what the system can and cannot understand and answer (without needing to undergo extensive training)? We can unpack this question into two separate parts: 1) How can we convey the full range of the system's abilities? 2) How can we convey its limitations? We use auto-complete as a tool to help meet both challenges. Specifically, the first question pertains to the general issue of discoverability: We want at least some of the suggested completions to act as vehicles for discovering data and functionality of which users may not have been previously aware. The second question pertains to expectation management. Naturally, no QA system can attain perfect performance; limiting factors include representational shortcomings and various kinds of incompleteness of the underlying data sources, as well as NLP technology limitations. We want to stop generating completions as a signal indicating that we are not able to understand and/or answer what is being typed.
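
A minimal sketch of prefix-based completion with the explicit "stop suggesting" behavior described above; the inventory of answerable questions is a made-up stand-in for Bloomberg's actual completion sources.

    # Toy prefix completer: suggest known answerable questions for a prefix;
    # an empty result is the deliberate signal that the input is out of scope.
    ANSWERABLE = [
        "tech companies with revenue above 1b",
        "tech companies founded after 2000",
        "top gainers today",
    ]

    def complete(prefix, limit=5):
        prefix = prefix.lower().strip()
        return [q for q in ANSWERABLE if q.startswith(prefix)][:limit]

    print(complete("tech companies"))   # two discoverable completions
    print(complete("weather in"))       # [] -> expectation management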

Collaboration


Dive into Mohamed Yahya's collaborations.

Top Co-Authors

Maya Ramanath (Indian Institute of Technology Delhi)
Shady Elbassuoni (American University of Beirut)
Madhulika Mohanty (Indian Institute of Technology Delhi)