
Publication


Featured research published by Steven J. Lynden.


international conference on move to meaningful internet systems | 2011

ADERIS: an adaptive query processor for joining federated SPARQL endpoints

Steven J. Lynden; Isao Kojima; Akiyoshi Matono; Yusuke Tanimura

Integrating distributed RDF data is facilitated by Linked Data and shared ontologies; however, joins over distributed SPARQL services can be costly, time-consuming operations. This paper describes the design and implementation of ADERIS, a query processing system for efficiently joining data from multiple distributed SPARQL endpoints. ADERIS decomposes federated SPARQL queries into multiple source queries and integrates the results utilising two techniques: adaptive join reordering, for which a cost model is defined, and the optimisation of subsequent queries to data sources to retrieve further data. The benefit of the approach in terms of minimising response time is illustrated by sample queries containing common SPARQL join patterns.
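The decomposition step described in the abstract can be sketched as follows. This is an assumed simplification, not the actual ADERIS code: the `catalog` mapping from predicates to endpoints, the endpoint URLs, and the function names are all hypothetical.

```python
# Illustrative sketch: split a federated query's triple patterns into one
# source query per endpoint, grouping patterns by the endpoint that can
# answer each predicate (catalog and URLs are hypothetical).
def decompose(patterns, catalog):
    """Group triple patterns by the endpoint serving their predicate."""
    source_queries = {}
    for s, p, o in patterns:
        source_queries.setdefault(catalog[p], []).append((s, p, o))
    return source_queries

patterns = [("?x", "foaf:name", "?n"),
            ("?x", "dbo:birthPlace", "?c"),
            ("?c", "rdfs:label", "?l")]
catalog = {"foaf:name": "http://ep1/sparql",
           "dbo:birthPlace": "http://ep2/sparql",
           "rdfs:label": "http://ep2/sparql"}
print(decompose(patterns, catalog))
# The two dbo:/rdfs: patterns are sent together to ep2 as one source query.
```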


databases in networked information systems | 2010

Adaptive integration of distributed semantic web data

Steven J. Lynden; Isao Kojima; Akiyoshi Matono; Yusuke Tanimura

The use of RDF (Resource Description Framework) data is a cornerstone of the Semantic Web. RDF data embedded in Web pages may be indexed using semantic search engines; however, RDF data is often stored in databases, accessible via Web Services using the SPARQL query language for RDF, which form part of the Deep Web that is not accessible using search engines. This paper addresses the problem of effectively integrating RDF data stored in separate Web-accessible databases. An approach based on distributed query processing is described, in which data from multiple repositories are used to construct partitioned tables that are integrated using an adaptive query processing technique supporting join reordering. This limits reliance on statistics and metadata about SPARQL endpoints, which existing systems supporting federated SPARQL queries require but which are often inaccurate or unavailable. The approach presented extends existing approaches in this area by allowing tables to be added to the query plan while it is executing, and shows how an approach currently used within relational query processing can be applied to distributed SPARQL query processing. The approach is evaluated using a prototype implementation and potential applications are discussed.
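The adaptive join reordering idea can be illustrated with a minimal sketch. This is not the paper's implementation: the cost model here (preferring the smallest intermediate inputs) and the dict-based "tables" are assumptions for illustration only.

```python
# Sketch of adaptive join reordering: repeatedly join the two smallest
# tables, re-evaluating sizes as intermediate results arrive. Tables are
# lists of variable-binding dicts, as a SPARQL endpoint might return.
def join(left, right):
    """Nested-loop join on the variables shared by the two tables."""
    shared = (set(left[0]) & set(right[0])) if left and right else set()
    return [{**l, **r} for l in left for r in right
            if all(l[v] == r[v] for v in shared)]

def adaptive_join(sources):
    """Greedy reordering: always join the two smallest tables next
    (an assumed proxy cost model, not the paper's)."""
    tables = list(sources)
    while len(tables) > 1:
        tables.sort(key=len)                 # cheapest inputs first
        left, right = tables.pop(0), tables.pop(0)
        tables.append(join(left, right))
    return tables[0]

# Three tiny "endpoints" sharing the ?s variable.
a = [{"s": 1, "p": "x"}, {"s": 2, "p": "y"}]
b = [{"s": 1, "q": "u"}]
c = [{"s": 1, "r": "v"}, {"s": 2, "r": "w"}, {"s": 3, "r": "z"}]
print(adaptive_join([a, b, c]))
```

Because sizes are re-checked on every iteration, the join order can change as intermediate results shrink or grow, which is the essence of the adaptive strategy.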


international conference on data engineering | 2010

Extensions to the Pig data processing platform for scalable RDF data processing using Hadoop

Yusuke Tanimura; Akiyoshi Matono; Steven J. Lynden; Isao Kojima

In order to effectively handle the growing amount of available RDF data, a scalable and flexible RDF data processing framework is needed. We previously proposed a Hadoop-based framework, which takes advantage of scalable and fault-tolerant distributed processing technologies, originally proposed as Google's distributed file system and MapReduce parallel model. In this paper, we present a method extending the Pig data processing platform on top of the Hadoop infrastructure. Pig compiles programs written in a high-level language, called Pig Latin, into MapReduce programs that can be executed by Hadoop. In order to support RDF, Pig was extended with the ability to load and store RDF data efficiently. Furthermore, as reasoning is an important requirement for most systems storing RDF data, support for inferring new triples using entailment rules was also added. In this paper, we describe these extensions and present an evaluation of their performance.
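Entailment-rule inference of the kind the abstract mentions can be sketched independently of Pig. The example below applies one standard RDFS rule (if `s rdf:type C1` and `C1 rdfs:subClassOf C2`, infer `s rdf:type C2`) to a fixed point; it is a generic illustration, not the paper's Pig extension.

```python
# Sketch of RDFS subclass entailment over a set of triples, iterated to
# a fixed point. The data and prefixes are illustrative only.
TYPE, SUB = "rdf:type", "rdfs:subClassOf"

def entail(triples):
    """Add inferred rdf:type triples until nothing new can be derived."""
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        sub = {(s, o) for s, p, o in triples if p == SUB}
        for s, p, o in list(triples):
            if p == TYPE:
                for c1, c2 in sub:
                    if o == c1 and (s, TYPE, c2) not in triples:
                        triples.add((s, TYPE, c2))
                        changed = True
    return triples

data = {(":alice", TYPE, ":Student"),
        (":Student", SUB, ":Person"),
        (":Person", SUB, ":Agent")}
print((":alice", TYPE, ":Agent") in entail(data))  # transitive inference
```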


grid computing | 2008

Service-based data integration using OGSA-DQP and OGSA-WebDB

Steven J. Lynden; Said Mirza Pahlevi; Isao Kojima

OGSA-DQP is a service-based distributed query processor that is able to execute queries over data services and combine data integration with data analysis by invoking Web services. OGSA-DQP currently supports only one type of data source, relational databases wrapped using OGSA-DAI (a middleware tool that exposes XML or relational database management systems as Grid services). OGSA-WebDB is another middleware tool based on OGSA-DAI that exposes Web databases via the OGSA-DAI interface. The prevalence of XML-encoded data and Web-accessible resources means that it is desirable to extend the current functionality of OGSA-DQP to provide support for such resources. This paper presents an extension to OGSA-DQP that allows queries over relational, XML and Web databases wrapped by OGSA-DAI and OGSA-WebDB. An application is presented that illustrates how these features complement each other to provide data integration and analysis in service-based Grids. Experimental results are presented that investigate the benefit of the approach within the application.


ieee international conference on cloud computing technology and science | 2011

Dynamic Data Redistribution for MapReduce Joins

Steven J. Lynden; Yusuke Tanimura; Isao Kojima; Akiyoshi Matono

MapReduce has become a popular method for data processing, in particular for large-scale datasets, due to its accessibility as a scalable yet convenient programming paradigm. Data processing tasks often involve joins, and the repartition and fragment-replicate joins are two widely used join algorithms utilised within the MapReduce framework. This paper presents a multi-way join technique supporting tuple redistribution, building on both the repartition and fragment-replicate joins. Hadoop is used to demonstrate how reduce tasks may improve performance by passing intermediate results to other reduce tasks that are better able to process them, using Apache ZooKeeper as a means of communication and data transfer. A performance analysis is presented showing that the technique has the potential to reduce response times when processing multiple joins in a single MapReduce job.
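The repartition join that the paper builds on can be sketched in a few lines: the map phase tags each tuple with its join key and originating relation, the shuffle groups tuples by key, and the reduce phase emits the cross product per key. The names and in-memory "shuffle" below are illustrative stand-ins for Hadoop's machinery, not the paper's code.

```python
# Sketch of the classic repartition (reduce-side) join in MapReduce style.
from collections import defaultdict

def map_phase(relation_id, tuples, key_index):
    """Tag each tuple with its join key and originating relation."""
    for t in tuples:
        yield t[key_index], (relation_id, t)

def reduce_phase(key, tagged_values):
    """Group values by relation, then emit the per-key cross product."""
    groups = defaultdict(list)
    for rel, t in tagged_values:
        groups[rel].append(t)
    for l in groups.get("R", []):
        for r in groups.get("S", []):
            yield key, (l, r)

def repartition_join(R, S):
    # In-memory stand-in for Hadoop's shuffle: bucket map output by key.
    buckets = defaultdict(list)
    for k, v in list(map_phase("R", R, 0)) + list(map_phase("S", S, 0)):
        buckets[k].append(v)
    out = []
    for k, vals in buckets.items():
        out.extend(reduce_phase(k, vals))
    return out

R = [(1, "a"), (2, "b")]
S = [(1, "x"), (1, "y"), (3, "z")]
print(repartition_join(R, S))  # only key 1 appears in both relations
```

The paper's contribution goes further by letting reduce tasks redistribute tuples to other reducers mid-job; this sketch shows only the baseline pattern being extended.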


Grid and Cloud Database Management | 2011

Open Standards for Service-Based Database Access and Integration

Steven J. Lynden; Oscar Corcho; Isao Kojima; Mario Antonioletti; Carlos Buil-Aranda

The Database Access and Integration Services (DAIS) Working Group, working within the Open Grid Forum (OGF), has developed a set of data access and integration standards for distributed environments. These standards provide a set of uniform web service-based interfaces for data access. A core specification, WS-DAI, exposes and, in part, manages data resources exposed by DAIS-based services. The WS-DAI document defines a core set of access patterns, messages and properties that form a collection of generic high-level data access interfaces. WS-DAI is then extended by other specifications that specialize access for specific types of data. For example, WS-DAIR extends the WS-DAI specification with interfaces targeting relational data. Similar extensions exist for RDF and XML data. This chapter presents an overview of the specifications, the motivation for defining them and their relationships with other OGF and non-OGF standards. Current implementations of the specifications are described in addition to some existing and potential applications to highlight how this work can benefit web service-based architectures used in Grid and Cloud computing.


database systems for advanced applications | 2010

ADERIS: adaptively integrating RDF data from SPARQL endpoints

Steven J. Lynden; Isao Kojima; Akiyoshi Matono; Yusuke Tanimura

This paper describes the Adaptive Distributed Endpoint RDF Integration System (ADERIS), an adaptive, distributed query processor for integrating RDF data from multiple data resources supporting the SPARQL query language and protocol. The system allows a user to issue a federated query without any knowledge of the data contained in each endpoint and without specifying details of how the query should be executed. ADERIS relies on very limited information about each RDF data source to construct SPARQL source queries. The results of these queries are used to build RDF predicate tables, which are integrated using pipelined index nested loop joins; the number and order of the joins may vary during query execution in order to reduce response time.
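A pipelined index nested loop join of the kind named in the abstract can be sketched as follows; the predicate tables, subject key, and function names below are illustrative assumptions, not ADERIS internals.

```python
# Sketch: tuples stream through a chain of predicate tables, each indexed
# on the subject; matches are pipelined straight into the next probe.
def build_index(table, key):
    """Index a predicate table on one column (here: the subject)."""
    idx = {}
    for row in table:
        idx.setdefault(row[key], []).append(row)
    return idx

def pipelined_inl_join(stream, indexes, key="s"):
    """Probe each index in turn, pipelining matches to the next stage."""
    for row in stream:
        partials = [row]
        for idx in indexes:
            partials = [{**p, **m} for p in partials
                        for m in idx.get(p[key], [])]
            if not partials:        # no match: drop this tuple early
                break
        yield from partials

# Two predicate tables, keyed on subject ?s.
name = [{"s": 1, "name": "Alice"}, {"s": 2, "name": "Bob"}]
age  = [{"s": 1, "age": 30}]
idxs = [build_index(name, "s"), build_index(age, "s")]
print(list(pipelined_inl_join([{"s": 1}, {"s": 2}], idxs)))
```

Because each input tuple flows through the whole chain independently, new indexes can in principle be appended or reordered between tuples, which is what makes the operator a natural fit for adaptive processing.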


conference on information and knowledge management | 2017

Exploring the Veracity of Online Claims with BackDrop

Julien Leblay; Weiling Chen; Steven J. Lynden

Using the Web to assess the validity of claims presents many challenges. Whether the data comes from social networks or established media outlets, individual or institutional data publishers, one has to deal with scale and heterogeneity, as well as with incomplete, imprecise and sometimes outright false information. All of these are closely studied issues. Yet in many situations, the claims under scrutiny, and the data itself, have some inherent context-dependency that makes them impossible to completely disprove or to evaluate through a simple (e.g. scalar) measure. While data models used on the Web typically deal with universal knowledge, we believe the time has come to put context, such as time or provenance, at the forefront and view knowledge through multiple lenses. We present BackDrop, an application that enables annotating knowledge and ontologies found online to explore how the veracity of claims varies with context. BackDrop comes in the form of a Web interface, in which users can interactively populate and annotate knowledge bases, and explore under which circumstances certain claims are more or less credible.
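The core idea, that a claim's truth is evaluated relative to a context rather than universally, can be sketched with a toy temporal annotation. This is a hypothetical illustration of context-dependent claims in general, not BackDrop's data model.

```python
# Sketch: claims annotated with a validity context (here a year interval),
# queried relative to a temporal context rather than as universal facts.
claims = [
    ("Pluto", "classifiedAs", "planet", (1930, 2006)),
    ("Pluto", "classifiedAs", "dwarf planet", (2006, 9999)),
]

def valid_at(claims, year):
    """Return the claims that hold in the given temporal context."""
    return [(s, p, o) for s, p, o, (start, end) in claims
            if start <= year < end]

print(valid_at(claims, 1990))  # -> [('Pluto', 'classifiedAs', 'planet')]
```

The same pattern extends to other context dimensions such as provenance: replace the interval with any annotation, and evaluation with any predicate over it.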


pacific rim international conference on artificial intelligence | 2018

Network Embedding Based on a Quasi-Local Similarity Measure

Xin Liu; Natthawut Kertkeidkachorn; Tsuyoshi Murata; Kyoung-Sook Kim; Julien Leblay; Steven J. Lynden

Network embedding methods based on random walks and the skip-gram model, such as the DeepWalk and Node2Vec algorithms, have received wide attention. We identify that these algorithms essentially estimate node similarities by random walk simulation, which is unreliable, inefficient, and inflexible. We propose to explicitly use node similarity measures instead of random walk simulation. Based on this strategy and a newly proposed similarity measure, we present a fast and scalable algorithm, AA+Emb. Experiments show that AA+Emb outperforms state-of-the-art network embedding algorithms on several commonly used benchmark networks.
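The "AA" in the algorithm's name points at the classic Adamic-Adar index, a well-known quasi-local similarity measure; a sketch of the classic index (not the paper's extended AA+ measure) over an adjacency-set graph:

```python
# Classic Adamic-Adar similarity: common neighbours weighted inversely by
# the log of their degree, so rare shared neighbours count for more.
import math

def adamic_adar(adj, u, v):
    """Sum 1/log(degree(w)) over common neighbours w of u and v."""
    common = adj[u] & adj[v]
    return sum(1.0 / math.log(len(adj[w]))
               for w in common if len(adj[w]) > 1)

adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}
print(adamic_adar(adj, "a", "d"))  # one common neighbour, b, of degree 3
```

Such a measure is computed directly from the neighbourhood structure, with no random walk simulation, which is the paper's motivating substitution.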


web intelligence, mining and semantics | 2017

Analysis of semantic URLs to support automated linking of structured data on the web

Steven J. Lynden

A growing amount of structured data can be found embedded in web pages using formats such as RDFa, JSON-LD and Microdata. Although such data is indexed by search engines and sometimes replicated in centralised knowledge bases, application scenarios exist in which there is a need to discover such data on-the-fly, for example when using the follow-your-nose principle of accessing Linked Open Data, or in applications where the velocity at which data changes can result in centralised repositories being out of date. In this paper we demonstrate two complementary techniques for aiding such applications by analysing URLs. Firstly, we demonstrate that machine learning can be of benefit in predicting, from previously encountered URLs, the likelihood of encountering structured data in an unseen URL. This can be applied within applications that encounter large numbers of possible URLs to dereference and must implement some priority scheme to choose relevant URLs. Secondly, we demonstrate that association rule mining can be of use in linking existing resources in a knowledge base, such as DBpedia, to URLs that follow common schemes, such as Semantic (search engine friendly) URLs.
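The first technique, predicting from past URLs whether an unseen URL is likely to carry structured data, can be sketched with a deliberately simple token-rate model. The model, features, sample URLs, and scoring rule below are all hypothetical; the paper's actual classifier and features are not reproduced here.

```python
# Hypothetical sketch: score an unseen URL by the positive rate of its
# path tokens in a small labelled sample (token rate model, assumed).
import re
from collections import Counter

def tokens(url):
    return [t for t in re.split(r"[:/.?=_-]+", url.lower()) if t]

def train(labelled):
    """For each token, the fraction of URLs containing it that had data."""
    pos, all_ = Counter(), Counter()
    for url, has_data in labelled:
        for t in set(tokens(url)):
            all_[t] += 1
            if has_data:
                pos[t] += 1
    return {t: pos[t] / all_[t] for t in all_}

def score(model, url):
    """Average per-token positive rate; 0.5 for unseen tokens."""
    ts = tokens(url)
    return sum(model.get(t, 0.5) for t in ts) / len(ts)

sample = [
    ("http://example.org/product/123", True),
    ("http://example.org/product/456", True),
    ("http://example.org/about", False),
]
model = train(sample)
print(score(model, "http://example.org/product/789"))
```

A crawler could dereference URLs in descending score order, which is the prioritisation use case the abstract describes.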

Collaboration


Dive into Steven J. Lynden's collaboration.

Top Co-Authors

Isao Kojima, National Institute of Advanced Industrial Science and Technology
Akiyoshi Matono, National Institute of Advanced Industrial Science and Technology
Yusuke Tanimura, National Institute of Advanced Industrial Science and Technology
Julien Leblay, National Institute of Advanced Industrial Science and Technology
Makoto Yui, National Institute of Advanced Industrial Science and Technology
Hirotaka Ogawa, National Institute of Advanced Industrial Science and Technology
Kyoung-Sook Kim, National Institute of Advanced Industrial Science and Technology
Natthawut Kertkeidkachorn, National Institute of Advanced Industrial Science and Technology
Said Mirza Pahlevi, National Institute of Advanced Industrial Science and Technology