
Tian Guo

École Polytechnique Fédérale de Lausanne

Publication

Featured research published by Tian Guo.

Big Data Research | 2014

Efficient Indexing and Query Processing of Model-View Sensor Data in the Cloud

Tian Guo; Thanasis G. Papaioannou; Karl Aberer

As the number of sensors that pervade our lives increases (e.g., environmental sensors, phone sensors, etc.), the efficient management of massive amounts of sensor data is becoming increasingly important. The infinite nature of sensor data poses a serious challenge for query processing even in a cloud infrastructure. Traditional raw sensor data management systems based on relational databases lack the scalability to accommodate large-scale sensor data efficiently. Thus, distributed key-value stores in the cloud are becoming a prime tool for managing sensor data. Model-view sensor data management, which stores the sensor data in the form of modeled segments, brings the additional advantages of data compression and value interpolation. However, there are currently no techniques for indexing and/or query optimization of model-view sensor data in the cloud; in the worst case, query processing requires a full table scan. In this paper, we propose an innovative index for modeled segments in key-value stores, namely the KVI-index. The KVI-index consists of two interval indices on the time and sensor value dimensions respectively, each of which has an in-memory search tree and a secondary list materialized in the key-value store. Then, we introduce a KVI-index-Scan-MapReduce hybrid approach to perform efficient query processing upon modeled data streams. As shown by a series of experiments on a private cloud infrastructure, our approach outperforms both Hadoop-based parallel processing of the raw sensor data and multiple alternative indexing approaches for model-view data in query-response time and index-updating efficiency.
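The model-view idea in this abstract (storing data as modeled segments and interpolating values on demand) can be illustrated with a small Python sketch. It assumes hypothetical linear segments over strictly disjoint time intervals, kept in a sorted in-memory list; the `Segment` and `SegmentStore` names are illustrative only, and this is not the KVI-index, which pairs an in-memory search tree with secondary lists materialized in a key-value store.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical linear model: value(t) = intercept + slope * t on [t_start, t_end]."""
    t_start: float
    t_end: float
    intercept: float
    slope: float

    def value_at(self, t: float) -> float:
        return self.intercept + self.slope * t


class SegmentStore:
    """In-memory list of strictly disjoint segments, sorted by start time."""

    def __init__(self, segments):
        self.segments = sorted(segments, key=lambda s: s.t_start)
        self.starts = [s.t_start for s in self.segments]

    def point_query(self, t):
        """Interpolate the value at time t, or return None if no segment covers t."""
        i = bisect_right(self.starts, t) - 1  # last segment starting at or before t
        if i >= 0 and t <= self.segments[i].t_end:
            return self.segments[i].value_at(t)
        return None

    def range_query(self, lo, hi):
        """Return the segments whose time interval overlaps [lo, hi]."""
        # Segments before the last one starting at or before lo end before lo
        # (they are disjoint and sorted), so the scan can start there.
        i = max(bisect_right(self.starts, lo) - 1, 0)
        hits = []
        for seg in self.segments[i:]:
            if seg.t_start > hi:
                break  # every later segment starts even later
            if seg.t_end >= lo:
                hits.append(seg)
        return hits
```

For example, with `Segment(0, 9, 20.0, 0.5)` and `Segment(10, 19, 25.0, -0.2)`, a point query at t = 4 interpolates 20.0 + 0.5 × 4 = 22.0 without any raw reading being stored, while a query at t = 9.5 falls in the gap between segments and returns `None`.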

Conference on Information and Knowledge Management | 2015

Fast Distributed Correlation Discovery Over Streaming Time-Series Data

Tian Guo; Saket Sathe; Karl Aberer

The dramatic rise of time-series data in a variety of contexts, such as social networks, mobile sensing, data centre monitoring, etc., has fuelled interest in obtaining real-time insights from such data using distributed stream processing systems. One such extremely valuable insight is the discovery of correlations in real time from large-scale time-series data. A key challenge in discovering correlations is that the number of time-series pairs that have to be analyzed grows quadratically with the number of time series, giving rise to a quadratic increase in both the computation cost and the communication cost between cluster nodes in a distributed environment. To tackle this challenge, we propose a framework called AEGIS. AEGIS exploits well-established statistical properties to dramatically prune the number of time-series pairs that have to be evaluated for detecting interesting correlations. Our extensive experimental evaluations on real and synthetic datasets establish the efficacy of AEGIS over baselines.
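The abstract does not spell out which statistical properties AEGIS exploits. As a swapped-in illustration of pruning correlation checks, the Python sketch below uses a well-known identity: for two series of length n, each z-normalized to mean 0 and population standard deviation 1, the Pearson correlation is ρ = 1 − d²/(2n), where d is their Euclidean distance. Because the running sum of squared differences never decreases, a pair can be abandoned as soon as that sum exceeds 2n(1 − τ). The function names are hypothetical, and this trick cuts the cost per pair rather than the quadratic number of pairs.

```python
import math


def z_normalize(xs):
    """Shift to mean 0 and scale to population std 1 (assumes xs is not constant)."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / std for x in xs]


def correlated_at_least(x, y, tau):
    """Check rho(x, y) >= tau for z-normalized x, y of equal length.

    Returns (passes, coordinates_examined). Uses rho = 1 - d^2 / (2n), so
    rho >= tau  <=>  d^2 <= 2n(1 - tau). The running sum of squared
    differences only grows, so the check stops once it exceeds that bound.
    """
    n = len(x)
    bound = 2 * n * (1 - tau)
    d2 = 0.0
    for i, (a, b) in enumerate(zip(x, y), start=1):
        d2 += (a - b) ** 2
        if d2 > bound:
            return False, i  # pruned after examining i coordinates
    return True, n


def correlated_pairs(series, tau):
    """All pairs of named series with correlation >= tau (still O(m^2) pairs)."""
    z = {name: z_normalize(values) for name, values in series.items()}
    names = sorted(z)
    hits = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            passes, _ = correlated_at_least(z[a], z[b], tau)
            if passes:
                hits.append((a, b))
    return hits
```

With τ = 0.9, a perfectly anti-correlated pair such as `[1, 2, 3, 4, 5]` and `[5, 4, 3, 2, 1]` is rejected after the first coordinate, since that single squared difference (8) already exceeds the bound 2 · 5 · 0.1 = 1.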

International Joint Conference on Artificial Intelligence | 2017

Hybrid Neural Networks for Learning the Trend in Time Series

Tao Lin; Tian Guo; Karl Aberer

The trend of a time series characterizes its intermediate upward and downward behaviour. Learning and forecasting the trend in time series data plays an important role in many real applications, ranging from resource allocation in data centers to load scheduling in smart grids. Inspired by the recent successes of neural networks, in this paper we propose TreNet, a novel end-to-end hybrid neural network that learns local and global contextual features for predicting the trend of time series. TreNet leverages convolutional neural networks (CNNs) to extract salient features from local raw data of time series. Meanwhile, considering the long-range dependency existing in the sequence of historical trends, TreNet uses a long short-term memory recurrent neural network (LSTM) to capture such dependency. Then, a feature fusion layer learns a joint representation for predicting the trend. TreNet demonstrates its effectiveness by outperforming CNN, LSTM, the cascade of CNN and LSTM, a Hidden Markov Model based method and various kernel-based baselines on real datasets.
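The abstract describes trend as the upward and downward behaviour of a series. One common way to make that concrete, assumed here rather than taken from the paper, is to split the series into segments and summarize each by the slope of a least-squares line and its duration. In this Python sketch the breakpoints are supplied by the caller as a hypothetical input; how TreNet segments a series is not stated in the abstract.

```python
def fit_slope(ts, ys):
    """Ordinary least-squares slope of ys against ts (needs >= 2 distinct ts)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    den = sum((t - mean_t) ** 2 for t in ts)
    return num / den


def trend_sequence(ys, breakpoints):
    """Summarize ys as (slope, duration) pairs, one per segment.

    breakpoints: hypothetical, strictly increasing indices strictly between
    0 and len(ys) - 1. Each segment spans ys[start:end + 1], so consecutive
    segments share their boundary point; duration is end - start steps.
    """
    bounds = [0] + list(breakpoints) + [len(ys) - 1]
    trends = []
    for start, end in zip(bounds, bounds[1:]):
        ts = list(range(start, end + 1))
        trends.append((fit_slope(ts, ys[start:end + 1]), end - start))
    return trends
```

For instance, `trend_sequence([0, 1, 2, 3, 2, 1, 0], [3])` yields `[(1.0, 3), (-1.0, 3)]`: an upward trend of three steps followed by a downward one.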

Database Systems for Advanced Applications | 2014

Online Indexing and Distributed Querying Model-View Sensor Data in the Cloud

Tian Guo; Thanasis G. Papaioannou; Hao Zhuang; Karl Aberer

As various kinds of sensors penetrate our daily life (e.g., sensor networks for environmental monitoring), the efficient management of massive amounts of sensor data is becoming increasingly important. Traditional sensor data management systems based on relational databases lack the scalability to accommodate large-scale sensor data efficiently. Consequently, distributed key-value stores in the cloud are becoming the prime tool for managing sensor data. Meanwhile, model-view sensor data management stores the sensor data in the form of modelled segments. However, there is currently no indexing or query optimization for modelled segments in the cloud, which results in a full table scan in the worst case. In this paper, we propose an innovative model index for sensor data segments in key-value stores (KVM-index). The KVM-index consists of two interval indices on the time and sensor value dimensions respectively, each of which has an in-memory search tree and a secondary list materialized in the key-value store. This composite structure allows the index to be updated with new incoming sensor data segments using constant network I/O. Second, for time (or value) range and point queries, a MapReduce-based approach is designed to process the discrete predicate-related ranges of the KVM-index, thereby eliminating the computation and communication overheads incurred by accessing irrelevant parts of the index table in conventional MapReduce programs. Finally, we propose a cost-based adaptive strategy for the KVM-index-MapReduce framework to process composite queries. As shown by extensive experiments, our approach outperforms both MapReduce-based processing of the raw sensor data and multiple alternative approaches for model-view sensor data in query response time.
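This abstract describes interval indices that keep an in-memory search tree. A centered interval tree is one standard in-memory structure for the underlying stabbing query ("which stored intervals contain value q?"). The Python sketch below assumes each segment is summarized by a hypothetical `(low, high, segment_id)` value range; it omits the secondary list materialized in the key-value store and is not the paper's exact structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    center: float
    by_low: list    # intervals containing center, ascending by low end
    by_high: list   # the same intervals, descending by high end
    left: Optional["Node"]
    right: Optional["Node"]


def build(intervals):
    """Build a centered interval tree over (low, high, segment_id) tuples."""
    if not intervals:
        return None
    endpoints = sorted(e for lo, hi, _ in intervals for e in (lo, hi))
    center = endpoints[len(endpoints) // 2]  # an endpoint, so `here` is non-empty
    here = [iv for iv in intervals if iv[0] <= center <= iv[1]]
    left = [iv for iv in intervals if iv[1] < center]
    right = [iv for iv in intervals if iv[0] > center]
    return Node(center,
                sorted(here, key=lambda iv: iv[0]),
                sorted(here, key=lambda iv: iv[1], reverse=True),
                build(left), build(right))


def stab(node, q):
    """Return the ids of all intervals with low <= q <= high."""
    out = []
    while node is not None:
        if q < node.center:
            # Every interval here reaches center > q; keep those starting <= q.
            for lo, _, sid in node.by_low:
                if lo > q:
                    break
                out.append(sid)
            node = node.left
        elif q > node.center:
            # Every interval here starts <= center < q; keep those ending >= q.
            for _, hi, sid in node.by_high:
                if hi < q:
                    break
                out.append(sid)
            node = node.right
        else:
            out.extend(sid for _, _, sid in node.by_low)
            break
    return out
```

Searching for 18.0 over `(10.0, 20.0, "s1")`, `(15.0, 25.0, "s2")` and `(30.0, 40.0, "s3")` returns `s1` and `s2`. Storing each node's intervals twice, sorted by low end and by high end, lets the scan at each level stop at the first interval that cannot match.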

International Conference on Big Data | 2015

SigCO: Mining significant correlations via a distributed real-time computation engine

Tian Guo; Jean-Paul Calbimonte; Hao Zhuang; Karl Aberer

The dramatic rise of time-series data produced in a variety of contexts, such as stock markets, mobile sensing, sensor networks, data centre monitoring, etc., has fuelled the development of large-scale distributed real-time computation systems (e.g., Apache Storm, Samza, Spark Streaming, S4, etc.). However, it is still unclear how certain time series mining tasks could be performed using such newly emerging systems. In this paper, we focus on the task of efficiently discovering statistically significant correlations among a large number of time series via a distributed real-time computation engine. We propose a framework referred to as SigCO. In SigCO, we put forward a novel partition-aware data shuffling scheme, which adaptively shuffles time series data only to the relevant nodes of the distributed real-time computation engine. In addition, SigCO uses a correlation computation approach based on a δ-hypercube structure, which is capable of pruning unnecessary correlation computations. Finally, our extensive experimental evaluations on real and synthetic datasets establish that SigCO outperforms the baseline approaches in terms of diverse performance metrics.
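The abstract names a δ-hypercube structure for pruning correlation computations without describing it. A related, well-known technique, swapped in here for illustration, is a grid filter: map each (possibly dimension-reduced) vector to a hypercube cell of side δ and compare only vectors in the same or adjacent cells. The Python sketch assumes that the pairs of interest lie within Euclidean distance δ of each other; the cell mapping and function names are hypothetical, and surviving candidates would still need an exact correlation check.

```python
import math
from collections import defaultdict
from itertools import product


def cell_of(vec, delta):
    """Index of the grid hypercube (side delta) containing vec."""
    return tuple(math.floor(v / delta) for v in vec)


def candidate_pairs(vectors, delta):
    """Pairs of ids whose cells are identical or adjacent in every dimension.

    If ||u - v|| <= delta, then |u_i - v_i| <= delta in each dimension i, so
    their cell indices differ by at most 1 per dimension. Pairs outside
    neighbouring cells can therefore be skipped without losing any match.
    """
    grid = defaultdict(list)
    for vid, vec in vectors.items():
        grid[cell_of(vec, delta)].append(vid)
    pairs = set()
    for cell, ids in grid.items():
        # Visit the 3^k neighbouring cells (including the cell itself).
        for offset in product((-1, 0, 1), repeat=len(cell)):
            neighbour = tuple(c + o for c, o in zip(cell, offset))
            for a in ids:
                for b in grid.get(neighbour, ()):
                    if a < b:  # record each unordered pair once
                        pairs.add((a, b))
    return pairs
```

Because each cell has 3^k neighbours, a filter like this stays practical only when the vectors have a small number of dimensions k.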

International Conference on Data Mining | 2016

Distributed Mining and Modeling of Dynamic Lead-Lag Relations in Evolving Entities

Tian Guo; Jean-Paul Calbimonte; Karl Aberer

Discovering and modeling lead-lag relations is a critical task in a variety of domains, including energy management, financial markets and environment monitoring. This task becomes more challenging when processing massive and highly dynamic data sources, often produced by sensors and live feeds that collect data about evolving entities in the real world. To cope with this data volume and velocity, distributed real-time computation systems have been proposed in recent years, although the problem of lead-lag relation mining and modeling has not been deeply explored in this context. In this paper, we propose DL2-Miner, a novel distributed data mining framework for lead-lag relations based on this computational paradigm. DL2-Miner addresses the fundamental data mining task of uncovering interactions in evolving entities, and encompasses a lead-lag relation detection module with communication and computation optimization and a probabilistic model for lead-lag relation occurrence inference. It is implemented on top of the open source distributed real-time computation system Apache Storm, and preliminary experiments show promising results for our approach.
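The abstract does not describe how DL2-Miner detects lead-lag relations. For illustration only, the Python sketch below uses a standard offline test for a single pair of series: compute the Pearson correlation between the leader and the follower shifted by each candidate lag, and keep the lag with the highest correlation. It is neither distributed nor streaming, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of equal-length sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_lag(leader, follower, max_lag):
    """Return (lag, r) maximizing corr(leader[t], follower[t + lag]).

    Assumes equal-length series and 0 <= max_lag < len(leader); each lag
    compares the overlapping windows only.
    """
    best = (0, -2.0)  # any correlation in [-1, 1] beats the initial -2
    for lag in range(max_lag + 1):
        x = leader[: len(leader) - lag]
        y = follower[lag:]
        r = pearson(x, y)
        if r > best[1]:
            best = (lag, r)
    return best
```

If `follower` simply repeats `leader` two steps later, `best_lag(leader, follower, 3)` reports lag 2 with a correlation of (numerically) 1.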

European Conference on Machine Learning | 2016

Efficient Distributed Decision Trees for Robust Regression

Tian Guo; Konstantin Kutzkov; Mohammed Ahmed; Jean-Paul Calbimonte; Karl Aberer

The availability of massive volumes of data and recent advances in data collection and processing platforms have motivated the development of distributed machine learning algorithms. In numerous real-world applications, large datasets are inevitably noisy and contain outliers. These outliers can dramatically degrade the performance of standard machine learning approaches such as regression trees. To this end, we present a novel distributed regression tree approach that uses robust regression statistics, which are less sensitive to outliers, to handle large and noisy data. We propose to integrate error criteria based on robust statistics into the regression tree. A data summarization method is developed and used to improve the efficiency of learning regression trees in the distributed setting. We implemented the proposed approach and baselines on Apache Spark, a popular distributed data processing platform. Extensive experiments on both synthetic and real datasets verify the effectiveness and efficiency of our approach. The data and software related to this paper are available at https://github.com/weilai0980/DRSquare_tree/tree/master/.
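The abstract does not name the robust statistics used. As a swapped-in example of a robust error criterion, the Python sketch below scores candidate splits on one feature with the sum of absolute deviations from the median (least absolute deviation) and compares it with the usual squared-error criterion. It runs on a single machine and includes neither the paper's data summarization nor its Apache Spark implementation; the function names are hypothetical.

```python
from statistics import median


def sad(ys):
    """Sum of absolute deviations from the median (a robust impurity)."""
    m = median(ys)
    return sum(abs(y - m) for y in ys)


def sse(ys):
    """Sum of squared errors around the mean (the standard impurity)."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(xs, ys, impurity):
    """Return (threshold, cost) minimizing left + right impurity on one feature."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = impurity(left) + impurity(right)
        if cost < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, cost)
    return best
```

With `xs = list(range(1, 11))` and `ys = [0] * 5 + [10] * 4 + [30]`, the squared-error criterion splits at 9.5 to isolate the single outlier, whereas the absolute-deviation criterion splits at 5.5 and separates the two groups.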

Data Engineering for Wireless and Mobile Access | 2012