Mohamed Ahmed Sherif
Leipzig University
Publications
Featured research published by Mohamed Ahmed Sherif.
international conference on semantic systems | 2013
Amrapali Zaveri; Dimitris Kontokostas; Mohamed Ahmed Sherif; Lorenz Bühmann; Mohamed Morsey; Sören Auer; Jens Lehmann
Linked Open Data (LOD) comprises an unprecedented volume of structured datasets on the Web. However, these datasets are of varying quality, ranging from extensively curated datasets to crowdsourced and even extracted data of relatively low quality. We present a methodology for assessing the quality of linked data resources, which comprises a manual and a semi-automatic process. The first phase includes the detection of common quality problems and their representation in a quality problem taxonomy. The second phase of the manual process comprises the evaluation of a large number of individual resources, according to the quality problem taxonomy, via crowdsourcing. This process is accompanied by a tool with which a user assesses an individual resource and evaluates each fact for correctness. The semi-automatic process involves the generation and verification of schema axioms. We report the results obtained by applying this methodology to DBpedia. We identified 17 data quality problem types, and 58 users assessed a total of 521 resources. Overall, 11.93% of the evaluated DBpedia triples were found to have quality issues. Applying the semi-automatic component yielded a total of 222,982 triples that have a high probability of being incorrect. In particular, we found that problems such as incorrectly extracted object values, irrelevant extraction of information and broken links were the most recurring quality problems. With this study, we not only aim to assess the quality of this sample of DBpedia resources but also adopt an agile methodology to improve the quality of future versions by regularly providing feedback to the DBpedia maintainers.
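The semi-automatic component described above can be pictured as checking each triple against simple schema axioms. Below is a minimal sketch in Python with rdflib; the range axiom and the sample triple are illustrative stand-ins, not taken from the paper's axiom-generation step.

```python
# Flag triples whose literal datatype contradicts a (hypothetical) range axiom.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

DBO = Namespace("http://dbpedia.org/ontology/")
DBR = Namespace("http://dbpedia.org/resource/")

# Hypothetical axiom table: property -> expected literal datatype.
RANGE_AXIOMS = {DBO.populationTotal: XSD.nonNegativeInteger}

g = Graph()
g.add((DBR.Leipzig, DBO.populationTotal, Literal("about half a million")))

for s, p, o in g:
    expected = RANGE_AXIOMS.get(p)
    if expected and (not isinstance(o, Literal) or o.datatype != expected):
        print(f"Possible quality issue: {s} {p} {o}")
```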
Sprachwissenschaft | 2015
Mohamed Ahmed Sherif; Axel-Cyrille Ngonga Ngomo; Sebastian Hellmann; Steven Moran; Martin Brümmer; John P. McCrae
In this paper we describe the Semantic Quran dataset, a multilingual RDF representation of translations of the Quran. The dataset was created by integrating data from two different semi-structured sources and aligned to an ontology designed to represent multilingual data from sources with a hierarchical structure. The resulting RDF data encompasses 43 different languages, which are among the most under-represented languages in the Linked Data Cloud, including Arabic, Amharic and Amazigh. We designed the dataset to be easily usable in natural-language processing applications with the goal of facilitating the development of knowledge extraction tools for these languages. In particular, the Semantic Quran is compatible with the NLP Interchange Format (NIF) and contains explicit morpho-syntactic information on the utilized terms. We present the ontology devised for structuring the data. We also provide the transformation rules implemented in our extraction framework. Finally, we detail the link creation process as well as possible usage scenarios for the Semantic Quran dataset.
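To illustrate the kind of modelling the abstract describes, the sketch below encodes one verse with parallel translations distinguished by language tags. The base URI and the hasText property are assumptions made for illustration, not the dataset's actual vocabulary.

```python
from rdflib import Graph, Literal, Namespace

QURAN = Namespace("http://example.org/semanticquran/")  # hypothetical base URI

g = Graph()
verse = QURAN["sura/1/verse/1"]
# Parallel translations of the same verse, distinguished by language tags.
g.add((verse, QURAN.hasText,
       Literal("In the name of God, the Most Gracious, the Most Merciful",
               lang="en")))
g.add((verse, QURAN.hasText, Literal("بسم الله الرحمن الرحيم", lang="ar")))
print(g.serialize(format="turtle"))
```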
european semantic web conference | 2014
Axel-Cyrille Ngonga Ngomo; Mohamed Ahmed Sherif; Klaus Lyko
The Linked Data Web has developed into a compendium of partly very large datasets. Devising efficient approaches to compute links between these datasets is thus central to achieve the vision behind the Data Web. Unsupervised approaches to achieve this goal have emerged over the last few years. Yet, so far, none of these unsupervised approaches makes use of the replication of resources across several knowledge bases to improve the accuracy it achieves while linking. In this paper, we present Colibri, an iterative unsupervised approach for link discovery. Colibri allows the discovery of links between n datasets (n ≥ 2) while improving the quality of the instance data in these datasets. To this end, Colibri combines error detection and correction with unsupervised link discovery. We evaluate our approach on five benchmark datasets with respect to the F-score it achieves. Our results suggest that Colibri can significantly improve the results of unsupervised machine-learning approaches for link discovery while correctly detecting erroneous resources.
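The iteration the abstract outlines can be summarised in the following skeleton. This is our schematic reading of the abstract, with the linking, error-detection and correction components left as hypothetical callbacks rather than Colibri's actual code.

```python
def colibri_iteration(datasets, link, detect_errors, correct, max_iters=10):
    """datasets: list of n knowledge bases; link, detect_errors and correct
    are hypothetical callbacks standing in for the paper's components."""
    mappings = {}
    for _ in range(max_iters):
        # 1. Unsupervised link discovery between every pair of datasets.
        for i in range(len(datasets)):
            for j in range(i + 1, len(datasets)):
                mappings[(i, j)] = link(datasets[i], datasets[j])
        # 2. Detect resources whose links are inconsistent across datasets,
        #    e.g. because they violate transitivity of the induced relation.
        errors = detect_errors(mappings)
        if not errors:
            break
        # 3. Correct (or discard) the erroneous instance data, then re-link.
        for resource in errors:
            correct(resource)
    return mappings
```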
Semantic Web | 2013
Amrapali Zaveri; Jens Lehmann; Sören Auer; Mofeed M. Hassan; Mohamed Ahmed Sherif; Michael Martin
The improvement of public health is one of the main indicators of societal progress. Statistical data for monitoring public health is highly relevant to a number of sectors, such as research (e.g. in the life sciences or economics), policy making, health care, the pharmaceutical industry, insurance, etc. Such data is now available even on a global scale, e.g. in the Global Health Observatory (GHO) of the United Nations' World Health Organization (WHO). GHO comprises more than 50 different datasets; it covers all 198 WHO member countries and is updated as more recent or revised data becomes available or when there are changes to the methodology being used. However, this data is only accessible via complex spreadsheets, and therefore queries over the 50 different datasets as well as combinations with other datasets are very tedious and require a significant amount of manual work. By making the data available as RDF, we lower the barrier for data re-use and integration. In this article, we describe the conversion and publication process as well as use cases which can be implemented using the GHO data.
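The conversion step can be pictured as mapping spreadsheet rows to RDF observations. The sketch below is a minimal illustration; the input file, column names, base URI and vocabulary are all placeholders, not the published GHO schema.

```python
import csv
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

GHO = Namespace("http://example.org/gho/")  # hypothetical base URI

g = Graph()
with open("life_expectancy.csv", newline="") as f:  # assumed input file
    for row in csv.DictReader(f):  # assumed columns: country, year, value
        obs = GHO[f"obs/{row['country']}/{row['year']}"]
        g.add((obs, GHO.country, GHO[row["country"]]))
        g.add((obs, GHO.year, Literal(row["year"], datatype=XSD.gYear)))
        g.add((obs, GHO.lifeExpectancy,
               Literal(row["value"], datatype=XSD.decimal)))

g.serialize("gho.ttl", format="turtle")
```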
european semantic web conference | 2015
Mohamed Ahmed Sherif; Axel-Cyrille Ngonga Ngomo; Jens Lehmann
With the adoption of RDF across several domains come growing requirements pertaining to the completeness and quality of RDF datasets. Currently, this problem is most commonly addressed by manually devising means of enriching an input dataset. The few tools that aim at supporting this endeavour usually focus on supporting the manual definition of enrichment pipelines. In this paper, we present a supervised learning approach based on a refinement operator for enriching RDF datasets. We show how we can use exemplary descriptions of enriched resources to generate accurate enrichment pipelines. We evaluate our approach against eight manually defined enrichment pipelines and show that our approach can learn accurate pipelines even when provided with a small number of training examples.
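One way to picture refinement-based pipeline learning is a greedy search that extends the pipeline with whichever operator best reproduces the exemplary enriched resources. The sketch below is our illustration of that idea, not the paper's operator: the operator set, the scoring function and the greedy strategy are all assumptions.

```python
def learn_pipeline(operators, examples, score, max_length=5):
    """operators: candidate enrichment functions (graph -> graph);
    examples: (input_graph, enriched_graph) pairs;
    score: compares a produced graph against the target enrichment."""
    pipeline = []
    best = sum(score(inp, target) for inp, target in examples)
    for _ in range(max_length):
        candidates = []
        for op in operators:  # one refinement step: append a single operator
            def run(graph, ops=pipeline + [op]):
                for o in ops:
                    graph = o(graph)
                return graph
            fitness = sum(score(run(inp), target) for inp, target in examples)
            candidates.append((fitness, op))
        fitness, op = max(candidates, key=lambda c: c[0])
        if fitness <= best:  # no refinement improves the pipeline: stop
            break
        best, pipeline = fitness, pipeline + [op]
    return pipeline
```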
Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) | 2014
Suresh Pokharel; Mohamed Ahmed Sherif; Jens Lehmann
It is widely accepted that food supply and quality are major problems of the 21st century. Due to the growth of the world's population, there is a pressing need to improve the productivity of agricultural crops, which hinges on different factors such as geographical location, soil type, weather conditions and particular attributes of the crops to plant. In many regions of the world, information about those factors is not readily accessible and is dispersed across a multitude of different sources. One of those regions is Nepal, where the lack of access to this knowledge poses a significant burden for agricultural planning and decision making. Making such knowledge more accessible can boost farmers' living standards and increase their competitiveness on national and global markets. In this article, we show how we converted several available, although not easily accessible, datasets to RDF, thereby lowering the barrier for data re-usage and integration. We describe the conversion, linking and publication process as well as use cases which can be implemented using the farming datasets of Nepal.
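The linking step mentioned above can be as simple as asserting owl:sameAs links from local resources to an external knowledge base; the URIs below are assumed for illustration.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL

NEP = Namespace("http://example.org/nepal-agri/")  # hypothetical base URI
DBR = Namespace("http://dbpedia.org/resource/")

g = Graph()
# Link a local district resource to its DBpedia counterpart.
g.add((NEP.Kaski, OWL.sameAs, DBR.Kaski_District))
print(g.serialize(format="turtle"))
```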
european semantic web conference | 2017
Mohamed Ahmed Sherif; Axel-Cyrille Ngonga Ngomo; Jens Lehmann
A significant portion of the evolution of Linked Data datasets lies in updating the links to other datasets. An important challenge when aiming to update these links automatically under the open-world assumption is the fact that usually only positive examples for the links exist. We address this challenge by presenting and evaluating Wombat, a novel approach for the discovery of links between knowledge bases that relies exclusively on positive examples. Wombat is based on generalisation via an upward refinement operator to traverse the space of link specifications. We study the theoretical characteristics of Wombat and evaluate it on 8 different benchmark datasets. Our evaluation suggests that Wombat outperforms state-of-the-art supervised approaches while relying on less information. Moreover, our evaluation suggests that Wombat's pruning algorithm allows it to scale well even on large datasets.
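A rough picture of positive-only learning via upward refinement: start from atomic link specifications and repeatedly broaden them, keeping the specification that covers the most positive example links. The refinement steps and scoring below are illustrative, not Wombat's exact operator or pruning.

```python
def upward_refinements(spec):
    """spec: ('atomic', measure, threshold); refinements broaden the spec."""
    kind, measure, theta = spec
    if kind == "atomic" and theta > 0.5:
        yield ("atomic", measure, round(theta - 0.1, 2))  # lower threshold
    # Disjunctions with other atomic specs also broaden the mapping
    # (omitted here for brevity).

def learn(atomic_specs, coverage):
    """coverage(spec) -> fraction of positive example links the spec selects."""
    frontier = list(atomic_specs)
    best = max(frontier, key=coverage)
    while frontier:
        for refined in upward_refinements(frontier.pop()):
            if coverage(refined) > coverage(best):
                best = refined
            frontier.append(refined)
    return best
```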
international conference on semantic systems | 2015
Mohamed Ahmed Sherif; Axel-Cyrille Ngonga Ngomo
Many of the available RDF datasets describe millions of resources using billions of triples. Consequently, millions of links can potentially exist among such datasets. While parallel implementations of link discovery approaches have been developed in the past, little attention has been paid to load balancing for local implementations of link discovery algorithms. In this paper, we thus present a novel load balancing technique for link discovery on parallel hardware based on particle-swarm optimization. We combine this approach with the Orchid algorithm for geo-spatial linking and evaluate it on real and artificial datasets. Our evaluation suggests that while naïve approaches can be super-linear on small datasets, our deterministic particle swarm optimization outperforms both naïve and classical load balancing approaches such as greedy load balancing on large datasets.
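For context, the classical greedy baseline the paper compares against can be sketched in a few lines: assign each task, largest first, to the currently least-loaded worker. The paper's contribution, the deterministic particle-swarm optimizer, searches over such assignments; only the greedy baseline is shown here, with illustrative task costs.

```python
import heapq

def greedy_balance(task_costs, n_workers):
    """Assign each task (largest first) to the currently least-loaded worker."""
    heap = [(0.0, w) for w in range(n_workers)]  # (load, worker id)
    heapq.heapify(heap)
    assignment = {}
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        load, worker = heapq.heappop(heap)
        assignment[task] = worker
        heapq.heappush(heap, (load + cost, worker))
    return assignment

# Illustrative costs of four linking tasks, balanced over two workers.
print(greedy_balance({"t1": 7.0, "t2": 3.0, "t3": 5.0, "t4": 4.0}, 2))
```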
Sprachwissenschaft | 2017
Mohamed Ahmed Sherif; Axel-Cyrille Ngonga Ngomo
Large amounts of geo-spatial information have been made available with the growth of the Web of Data. While discovering links between resources on the Web of Data has been shown to be a demanding task, discovering links between geo-spatial resources proves to be even more challenging. This is partly due to the resources being described by means of vector geometry. In particular, discrepancies in granularity and error measurements across datasets render the selection of appropriate distance measures for geo-spatial resources difficult. In this paper, we survey existing literature for point-set measures that can be used to measure the similarity of vector geometries. We then present and evaluate the ten measures that we derived from the literature. We evaluate these measures with respect to their time-efficiency and their robustness against discrepancies in measurement and in granularity. To this end, we use samples of real datasets of different granularity as input for our evaluation framework. The results obtained on three different datasets suggest that most distance approaches can be made to scale. Moreover, while some distance measures are significantly slower than others, distance measures based on means, surjections and sums of minimal distances are robust against the different types of discrepancies.
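Two of the measure families named above, mean-based distances and sums of minimal distances, are easy to state concretely. The sketch below uses plain Euclidean distance on toy 2D point sets as a stand-in for the geographic distances applied to real vector geometries.

```python
from math import dist  # Python >= 3.8

def sum_of_min_distances(A, B):
    """Symmetric sum of each point's distance to its nearest counterpart."""
    return (sum(min(dist(a, b) for b in B) for a in A)
            + sum(min(dist(a, b) for a in A) for b in B))

def mean_distance(A, B):
    """Distance between the means (centroids) of the two point sets."""
    mean = lambda S: tuple(sum(c) / len(S) for c in zip(*S))
    return dist(mean(A), mean(B))

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 1.0), (1.0, 1.0)]
print(sum_of_min_distances(A, B), mean_distance(A, B))
```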
Procedia Computer Science | 2018
Abdullah Fathi Ahmed; Mohamed Ahmed Sherif; Axel-Cyrille Ngonga Ngomo
Link discovery is central to the integration and use of data across RDF knowledge bases. Geospatial information is increasingly represented according to the Linked Data principles. Resources within such datasets are described by means of vector geometry, where link discovery approaches have to deal with millions of point sets consisting of billions of points. In this paper, we study the effect of simplifying the resources' geometries on the runtime and F-measure of link discovery approaches. In particular, we evaluate link discovery approaches for computing the point-set distances as well as the topological relations among RDF resources with geospatial representations. The results obtained on two different real datasets suggest that most geospatial link discovery approaches achieve up to a 67x speedup using simplification, while the average loss in their F-measure is less than 15%. Our implementation is open-source and available at http://github.com/dice-group/limes.
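The simplification idea can be tried out with shapely's simplify() (a Douglas-Peucker-style simplifier), as sketched below. The geometry and tolerance are arbitrary examples, and this is not the paper's own code, which lives in the LIMES repository linked above.

```python
from shapely.geometry import LineString

# A slightly noisy polyline standing in for a resource's vector geometry.
geom = LineString([(0, 0), (1, 0.1), (2, -0.1), (3, 0), (4, 0.05), (5, 0)])
simplified = geom.simplify(tolerance=0.2)  # fewer points, similar shape
print(len(geom.coords), "->", len(simplified.coords))  # e.g. 6 -> 2
```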