Marco Stolpe

Technical University of Dortmund

Publications

Featured research published by Marco Stolpe.

European Conference on Machine Learning | 2011

Learning from label proportions by optimizing cluster model selection

Marco Stolpe; Katharina Morik

In a supervised learning scenario, we learn a mapping from input to output values, based on labeled examples. Can we learn such a mapping also from groups of unlabeled observations, only knowing, for each group, the proportion of observations with a particular label? Solutions have real-world applications. Here, we consider groups of steel sticks as samples in quality control. Since the steel sticks cannot be marked individually, for each group of sticks it is only known how many sticks of high (low) quality it contains. We want to predict the achieved quality for each stick before it reaches the final production station and quality control, in order to save resources. We define the problem of learning from label proportions and present a solution based on clustering. Our method empirically shows better prediction performance than recent approaches based on probabilistic SVMs, kernel k-means or conditional exponential models.
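The step of turning clusters into label predictions can be illustrated with a small sketch. It is not the authors' algorithm: it assumes the observations have already been clustered (for example with k-means), that labels are binary (high/low quality), and that the number of clusters is small enough to try every labeling. The function name `fit_cluster_labels` is hypothetical. It picks the cluster labeling whose predicted per-group proportions best match the known ones.

```python
from itertools import product


def fit_cluster_labels(cluster_ids, group_ids, proportions, n_clusters):
    """Assign a binary label to each cluster so predicted group proportions match known ones.

    cluster_ids[i]: cluster index of observation i (from any clustering step)
    group_ids[i]:   group (e.g. batch of steel sticks) that observation i belongs to
    proportions[g]: known fraction of positive (e.g. high-quality) observations in group g
    """
    best_labels, best_err = None, float("inf")
    # Exhaustive search over all 2^k cluster labelings; only feasible for small k.
    for labels in product((0, 1), repeat=n_clusters):
        err = 0.0
        for g, p in proportions.items():
            members = [c for c, gg in zip(cluster_ids, group_ids) if gg == g]
            predicted = sum(labels[c] for c in members) / len(members)
            err += (predicted - p) ** 2  # squared deviation from the known proportion
        if err < best_err:
            best_labels, best_err = labels, err
    return best_labels, best_err


# Two groups of three sticks each; only the fraction of high-quality sticks per group is known.
labels, err = fit_cluster_labels(
    cluster_ids=[0, 0, 1, 1, 0, 1],
    group_ids=[0, 0, 0, 1, 1, 1],
    proportions={0: 1 / 3, 1: 2 / 3},
    n_clusters=2,
)
print(labels)  # (0, 1): cluster 1 is labeled high quality
```

Each observation then simply inherits the label of its cluster. In practice the number of clusters and the clustering itself would also have to be selected, which this sketch leaves out.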

SIGKDD Explorations | 2016

The Internet of Things: Opportunities and Challenges for Distributed Data Analysis

Marco Stolpe

Nowadays, data is created by humans as well as automatically collected by physical things, which embed electronics, software, sensors and network connectivity. Together, these entities constitute the Internet of Things (IoT). The automated analysis of its data can provide insights into previously unknown relationships between things, their environment and their users, facilitating an optimization of their behavior. Especially the real-time analysis of data, embedded into physical systems, can enable new forms of autonomous control. These in turn may lead to more sustainable applications, reducing waste and saving resources. The IoT's distributed and dynamic nature, the resource constraints of sensors and embedded devices, and the amounts of generated data challenge even the most advanced automated data analysis methods known today. In particular, the IoT requires a new generation of distributed analysis methods. Many existing surveys have strongly focused on the centralization of data in the cloud and big data analysis, which follows the paradigm of parallel high-performance computing. However, bandwidth and energy can be too limited for the transmission of raw data, or such transmission may be prohibited by privacy constraints. Such communication-constrained scenarios require decentralized analysis algorithms which at least partly work directly on the generating devices. After listing data-driven IoT applications, and in contrast to existing surveys, we highlight the differences between cloud-based and decentralized analysis from an algorithmic perspective. We present the opportunities and challenges of research on communication-efficient decentralized analysis algorithms. Here, the focus is on the difficult scenario of vertically partitioned data, which covers common IoT use cases. The comprehensive bibliography aims at providing readers with a good starting point for their own work.

Managing and Mining Sensor Data | 2013

Distributed Data Mining in Sensor Networks

Kanishka Bhaduri; Marco Stolpe

Wireless sensor networks (WSNs) consist of a collection of low-cost, low-power sensor devices capable of communicating with each other via an ad-hoc wireless network. Due to their rapid proliferation, sensor networks are currently used in a plethora of applications such as earth sciences, systems health and military applications. These sensors collect data about the environment, and this data can be mined for a variety of analyses. Unfortunately, analyzing the data after it has been extracted from the WSN incurs high communication cost for sending the raw data to the base station and at the same time runs the risk of delayed analysis. To overcome this, researchers have proposed several distributed algorithms which can deal with the data in situ: these data mining algorithms use the computing power at each node to first perform some local computations and then exchange messages with neighboring nodes to reach a consensus regarding a global model. These algorithms vastly reduce the communication cost and are also extremely efficient in terms of model computation and event detection. In this chapter we focus on such distributed data mining algorithms for data clustering, classification and outlier detection tasks.
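The pattern of "local computation, then message exchange with neighbors until consensus" can be illustrated with plain average consensus, one of the simplest building blocks of this kind. This is a generic sketch, not a specific algorithm from the chapter; the function name `consensus_average`, the ring topology and the step size are illustrative assumptions. Each node holds a local statistic and repeatedly moves toward its neighbors' values; on a connected, undirected graph all nodes converge to the global average without any node seeing all the data.

```python
def consensus_average(values, neighbors, step=0.25, rounds=200):
    """Distributed averaging: every node repeatedly moves toward its neighbors' values.

    values[i]:    local statistic computed at node i (e.g. a local mean)
    neighbors[i]: indices of the nodes that node i can talk to directly
    A standard sufficient condition for convergence is step < 1 / (maximum node degree).
    """
    x = list(values)
    for _ in range(rounds):
        # One round: each node uses only its own value and its neighbors' current values.
        x = [xi + step * sum(x[j] - xi for j in neighbors[i]) for i, xi in enumerate(x)]
    return x


# Four sensors arranged in a ring; each only talks to its two neighbors.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
result = consensus_average([1.0, 2.0, 3.0, 4.0], ring)
print(result)  # every entry approaches the global mean 2.5
```

Because the graph is undirected, every update preserves the sum of all values, so the common value the nodes agree on is exactly the global average.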

European Conference on Machine Learning | 2012

Separable approximate optimization of support vector machines for distributed sensing

Sangkyun Lee; Marco Stolpe; Katharina Morik

Sensor measurements from diverse locations connected with possibly low-bandwidth communication channels pose a challenge for resource-restricted distributed data analysis. In such settings it would be desirable to perform as much of the learning as possible in each location, without transferring all data to a central node. Applying support vector machines (SVMs) with nonlinear kernels in this setting is nontrivial, however. In this paper, we present an efficient optimization scheme for training SVMs over such sensor networks. Our framework performs optimization independently in each node, using only the local features stored in the respective node. We make use of multiple local kernels and explicit approximations to the feature mappings induced by them. Together they allow us to construct a separable surrogate objective that provides an upper bound on the primal SVM objective. A central coordination scheme is also designed to adjust the weights among local kernels for improved prediction, while minimizing communication cost.
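The idea of an explicit approximation to a kernel's feature mapping can be sketched with random Fourier features, one well-known approximation for the RBF kernel. Whether the paper uses this particular approximation is not stated in the abstract, so treat it as an assumed stand-in. The sketch also assumes an unweighted sum of local RBF kernels over a hypothetical two-node column split; the learned weights among local kernels and the SVM training itself are not shown. Each node maps only its own columns, and concatenating the per-node feature blocks reproduces the summed kernel as an ordinary inner product.

```python
import numpy as np


def random_fourier_features(X, n_components, gamma, rng):
    """Explicit map z with z(x) @ z(y) approximately exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies drawn from the RBF kernel's spectral density N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_components))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    return np.sqrt(2.0 / n_components) * np.cos(X @ W + b)


def rbf(A, gamma):
    """Exact RBF kernel matrix between all rows of A."""
    sq = np.sum((A[:, None, :] - A[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 6))
# Hypothetical vertical split: node A stores columns 0-2, node B stores columns 3-5.
Z_a = random_fourier_features(X[:, :3], 20000, gamma=0.5, rng=rng)
Z_b = random_fourier_features(X[:, 3:], 20000, gamma=0.5, rng=rng)
Z = np.hstack([Z_a, Z_b])  # each block is computed without seeing the other node's columns

approx = Z @ Z.T
exact = rbf(X[:, :3], 0.5) + rbf(X[:, 3:], 0.5)  # sum of the two local kernels
print(np.max(np.abs(approx - exact)))  # small; shrinks as n_components grows
```

With explicit features, a linear model on `Z` decomposes into per-node blocks of weights, which is what makes node-local optimization possible in the first place.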

European Conference on Machine Learning | 2013

Anomaly detection in vertically partitioned data by distributed core vector machines

Marco Stolpe; Kanishka Bhaduri; Kamalika Das; Katharina Morik

Observations of physical processes suffer from instrument malfunction and noise and demand data cleansing. However, rare events should not be excluded from modeling, since they can be the most interesting findings. Often, sensors collect features at different sites, so that each site holds only a subset of the features (vertically distributed data). Transferring all data or a sample to a single location is impossible in many real-world applications due to restricted communication bandwidth. Finding interesting abnormalities thus requires efficient methods of distributed anomaly detection. We propose a new algorithm for anomaly detection on vertically distributed data. It aggregates the data directly at the local storage nodes using RBF kernels. Only a fraction of the data is communicated to a central node. Through extensive empirical evaluation on controlled datasets, we demonstrate that our method is an order of magnitude more communication-efficient than state-of-the-art methods, while achieving comparable accuracy.
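One property that makes RBF kernels convenient for vertically distributed data is easy to check numerically: the squared Euclidean distance is a sum over disjoint feature subsets, so the global RBF kernel factors into a product of kernels that each node can evaluate on its own columns. The sketch below demonstrates only this kernel property, not the distributed core vector machine algorithm; the partition, `gamma` and the helper name `local_rbf` are arbitrary illustrative choices.

```python
import numpy as np


def local_rbf(A, B, gamma):
    """RBF kernel matrix between rows of A and B, using only the columns one node holds."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)


rng = np.random.default_rng(1)
X = rng.normal(size=(4, 5))
gamma = 0.3
# Hypothetical vertical partition of the 5 features over three storage nodes.
partitions = [[0, 1], [2], [3, 4]]

K_global = local_rbf(X, X, gamma)
K_product = np.ones((4, 4))
for cols in partitions:
    # Each node evaluates the kernel on its own columns only.
    K_product *= local_rbf(X[:, cols], X[:, cols], gamma)

print(np.allclose(K_global, K_product))  # True, since exp(-g*(a+b)) = exp(-g*a) * exp(-g*b)
```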

MSM'10/MUSE'10 Proceedings of the 2010 international conference on Analysis of social media and ubiquitous data | 2010

Towards adjusting mobile devices to user's behaviour

Peter Fricke; Felix Jungermann; Katharina Morik; Nico Piatkowski; Olaf Spinczyk; Marco Stolpe; Jochen Streicher

Mobile devices are a special class of resource-constrained embedded devices. Computing power, memory, the available energy, and network bandwidth are often severely limited. These constrained resources require extensive optimization of a mobile system compared to larger systems. Any needless operation has to be avoided. Time-consuming operations have to be started early on. For instance, loading files ideally starts before the user wants to access the file. So-called prefetching strategies optimize system operation. Our goal is to adjust such strategies on the basis of logged system data. Optimization is then achieved by predicting an application's behavior based on facts learned from earlier runs on the same system. In this paper, we analyze system calls at the operating system level and compare two paradigms, namely server-based and device-based learning. The results could be used to optimize the runtime behaviour of mobile devices.
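A very simple way to learn "facts from earlier runs" for prefetching is to count, in past traces, which file an application opened next after each file, and then prefetch the most frequent successor. The sketch below assumes the traces have already been extracted from system-call logs as ordered lists of file names; the function names are hypothetical, and this first-order frequency model is an illustrative stand-in, not one of the learners compared in the paper.

```python
from collections import Counter, defaultdict


def learn_successors(traces):
    """Count how often each file was followed by each other file in earlier runs."""
    counts = defaultdict(Counter)
    for trace in traces:
        for current, nxt in zip(trace, trace[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts, current):
    """Return the most frequent successor of `current`, or None if it was never seen."""
    if current not in counts:
        return None
    return counts[current].most_common(1)[0][0]


# Hypothetical file-access traces from three earlier runs of the same application.
traces = [["a.cfg", "b.dat", "c.log"], ["a.cfg", "b.dat", "d.tmp"], ["a.cfg", "c.log"]]
counts = learn_successors(traces)
print(predict_next(counts, "a.cfg"))  # b.dat: candidate for prefetching after a.cfg is opened
```

Whether such a model is trained on the device or on a server is exactly the kind of trade-off the two paradigms in the abstract refer to.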

Archive | 2012

Challenges for Data Mining on Sensor Data of Interlinked Processes

Jochen Deuse; Benedikt Konrad; Daniel Lieber; Katharina Morik; Marco Stolpe

In industries like steel production, interlinked production processes leave no time for assessing the physical quality of intermediate products. Failures during the process can lead to high internal costs when already defective products are passed through the entire value chain. However, process data like machine parameters and sensor data which are directly linked to quality can be recorded. Based on a rolling mill case study, the paper discusses how decentralized data mining and intelligent machine-to-machine communication could be used to predict the physical quality of intermediate products online and in real time, in order to detect quality issues as early as possible. The recording of huge volumes of data and the distributed but sequential nature of the problem lead to challenging research questions for the next generation of

Archive | 2012

Sustainable Interlinked Manufacturing Processes through Real-Time Quality Prediction

Daniel Lieber; Benedikt Konrad; Jochen Deuse; Marco Stolpe; Katharina Morik

Based on a rolling mill case study, this paper discusses how data mining techniques and intelligent machine-to-machine telematics could be used to predict internal quality issues of intermediate products in manufacturing processes. The huge amount of data recorded during processing and the distributed but sequential nature of the manufacturing process lead to challenging questions for data mining applications and advanced process control approaches in industries like steel production. Moreover, the discovery of hidden information, knowledge and dependencies in the process data contributes significantly to avoiding waste of resources and to achieving the objectives of zero-defect production and sustainable, energy-efficient manufacturing processes.

Archive | 2013

Using a Clustering Approach with Evolutionary Optimized Attribute Weights to Form Product Families for Production Leveling

Fabian Bohnen; Marco Stolpe; Jochen Deuse; Katharina Morik

Production leveling aims at balancing production volume as well as production mix. Conventional leveling approaches require limited product diversity and stable, predictable customer demands. They are well-suited only for large-scale production. This paper presents a methodology that enables the leveling of low-volume, high-mix production. It is based on two fundamental steps. In the first step, which is the focus of this paper, product types are grouped into families according to their manufacturing similarity. In the second step, a family-oriented leveling pattern is generated. For the first step, the paper presents an innovative clustering approach to product family formation for leveling. It employs evolutionary strategies to optimize the weights of the attributes used for clustering, according to their impact on the grouping result. The paper describes an industrial application and also shows how product families can be utilized for leveling.
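The weight-optimization step can be sketched with a (1+1) evolution strategy using the 1/5 success rule for step-size control. The paper's actual clustering algorithm, fitness criterion and ES variant are not given in the abstract, so everything here is an assumption: `evolve_attribute_weights` is a hypothetical name, weights are kept non-negative and summing to one, and the fitness function in the usage example is a toy placeholder with a known optimum rather than a real clustering criterion.

```python
import numpy as np


def evolve_attribute_weights(fitness, n_attributes, generations=500, sigma=0.1, seed=0):
    """(1+1) evolution strategy over attribute weights that sum to one.

    fitness(weights) -> float, higher is better. In a clustering setting it could be a
    cluster validity index computed after scaling each attribute by its weight.
    """
    rng = np.random.default_rng(seed)
    parent = np.full(n_attributes, 1.0 / n_attributes)  # start with equal weights
    parent_fit = fitness(parent)
    for _ in range(generations):
        # Gaussian mutation, clipped to non-negative values and renormalized.
        child = np.clip(parent + rng.normal(0.0, sigma, n_attributes), 0.0, None)
        if child.sum() == 0.0:
            continue
        child /= child.sum()
        child_fit = fitness(child)
        if child_fit >= parent_fit:
            parent, parent_fit = child, child_fit
            sigma *= 1.5  # 1/5 success rule: widen the search after a success...
        else:
            sigma *= 1.5 ** -0.25  # ...and narrow it after a failure
    return parent, parent_fit


# Toy fitness with a known optimum, standing in for a real clustering criterion.
target = np.array([0.7, 0.2, 0.1])
weights, best_fit = evolve_attribute_weights(lambda w: -np.sum((w - target) ** 2), 3)
print(np.round(weights, 2))
```

Replacing the toy fitness with a real criterion means re-running the clustering on the weighted attributes for every candidate, which dominates the cost of this search.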

Computational Sustainability | 2016

Sustainable Industrial Processes by Embedded Real-Time Quality Prediction

Marco Stolpe; Hendrik Blom; Katharina Morik

Sustainability of industrial production focuses on minimizing greenhouse gas emissions and the consumption of materials and energy. Iron and steel production offers an enormous potential for resource savings through production enhancements. This chapter describes how embedding data analysis (data mining, machine learning) enhances steel production such that resources are saved. The steps of embedded data analysis are comprehensively presented, giving an overview of related work. The challenges of (steel) production for data analysis are investigated. A framework for processing data streams is used for real-time processing. We have developed new algorithms that learn from aggregated data and from vertically distributed data. Two real-world case studies are described: the prediction of the Basic Oxygen Furnace endpoint and the quality prediction in a hot rolling mill process. Neither case study is an academic prototype; both are truly real-world applications.
