
Publication


Featured research published by Malu Castellanos.


business process management | 2012

Process Mining Manifesto

Wil M. P. van der Aalst; Arya Adriansyah; Ana Karla Alves de Medeiros; Franco Arcieri; Thomas Baier; Tobias Blickle; R. P. Jagadeesh Chandra Bose; Peter van den Brand; Ronald Brandtjen; Joos C. A. M. Buijs; Andrea Burattin; Josep Carmona; Malu Castellanos; Jan Claes; Jonathan E. Cook; Nicola Costantini; Francisco Curbera; Ernesto Damiani; Massimiliano de Leoni; Pavlos Delias; Boudewijn F. van Dongen; Marlon Dumas; Schahram Dustdar; Dirk Fahland; Diogo R. Ferreira; Walid Gaaloul; Frank van Geffen; Sukriti Goel; Christian W. Günther; Antonella Guzzo

Process mining techniques are able to extract knowledge from the event logs commonly available in today's information systems. These techniques provide new means to discover, monitor, and improve processes in a variety of application domains. There are two main drivers for the growing interest in process mining. On the one hand, more and more events are being recorded, thus providing detailed information about the history of processes. On the other hand, there is a need to improve and support business processes in competitive and rapidly changing environments. This manifesto was created by the IEEE Task Force on Process Mining and aims to promote the topic of process mining. Moreover, by defining a set of guiding principles and listing important challenges, this manifesto hopes to serve as a guide for software developers, scientists, consultants, business managers, and end users. The goal is to increase the maturity of process mining as a new tool to improve the (re)design, control, and support of operational business processes.
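As a concrete illustration of the core idea, the minimal sketch below builds a directly-follows relation from an event log, a basic building block of process discovery; the log format, field names, and activities are illustrative assumptions, not the manifesto's notation.

```python
# A minimal sketch of process discovery: build a directly-follows graph
# from an event log. Log schema and activities are hypothetical.
from collections import Counter, defaultdict

# Hypothetical event log: (case_id, activity), already ordered by timestamp.
event_log = [
    ("c1", "receive order"), ("c1", "check stock"), ("c1", "ship"),
    ("c2", "receive order"), ("c2", "check stock"), ("c2", "cancel"),
    ("c3", "receive order"), ("c3", "ship"),
]

# Group events into per-case traces.
traces = defaultdict(list)
for case_id, activity in event_log:
    traces[case_id].append(activity)

# Count how often activity a is directly followed by activity b.
directly_follows = Counter()
for trace in traces.values():
    for a, b in zip(trace, trace[1:]):
        directly_follows[(a, b)] += 1

for (a, b), n in directly_follows.most_common():
    print(f"{a} -> {b}: {n}")
```

From these counts, discovery algorithms derive a process model; monitoring and improvement then compare further event data against that model.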


knowledge discovery and data mining | 2013

Spotting opinion spammers using behavioral footprints

Arjun Mukherjee; Abhinav Kumar; Bing Liu; Junhui Wang; Meichun Hsu; Malu Castellanos; Riddhiman Ghosh

Opinionated social media such as product reviews are now widely used by individuals and organizations for their decision making. However, driven by profit or fame, people try to game the system by opinion spamming (e.g., writing fake reviews) to promote or demote target products. In recent years, fake review detection has attracted significant attention from both the business and research communities. However, due to the difficulty of the human labeling needed for supervised learning and evaluation, the problem remains highly challenging. This work proposes a novel angle on the problem by modeling spamicity as latent. An unsupervised model, called the Author Spamicity Model (ASM), is proposed. It works in the Bayesian setting, which facilitates modeling the spamicity of authors as latent and allows us to exploit various observed behavioral footprints of reviewers. The intuition is that opinion spammers have different behavioral distributions than non-spammers. This creates a distributional divergence between the latent population distributions of two clusters: spammers and non-spammers. Model inference results in learning the population distributions of the two clusters. Several extensions of ASM leveraging different priors are also considered. Experiments on a real-life Amazon review dataset demonstrate the effectiveness of the proposed models, which significantly outperform state-of-the-art competitors.
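The sketch below is not the paper's ASM inference; it is a much simpler stand-in (a two-component Gaussian mixture) that conveys the stated intuition: spammers and non-spammers form two clusters in the space of behavioral footprints. The feature names and values are illustrative assumptions.

```python
# Stand-in for the two-cluster intuition behind ASM, not the paper's model:
# fit two latent clusters over hypothetical per-reviewer behavioral features.
import numpy as np
from sklearn.mixture import GaussianMixture

# Per-reviewer features (all made up), e.g.:
# [ratio of 5-star reviews, max reviews in one day, avg review similarity]
X = np.array([
    [0.95, 8, 0.81],   # bursty, repetitive, extreme -> likely spammer
    [0.90, 7, 0.77],
    [0.30, 1, 0.12],   # moderate, infrequent -> likely genuine
    [0.45, 2, 0.15],
    [0.40, 1, 0.10],
])

# ASM instead infers a latent spamicity variable per author in a full
# Bayesian model; a mixture fit only mimics the resulting two clusters.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.predict(X))        # hard cluster assignment per reviewer
print(gmm.predict_proba(X))  # soft, "spamicity-like" probabilities
```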


extending database technology | 2009

Data integration flows for business intelligence

Umeshwar Dayal; Malu Castellanos; Alkis Simitsis; Kevin Wilkinson

Business Intelligence (BI) refers to technologies, tools, and practices for collecting, integrating, analyzing, and presenting large volumes of information to enable better decision making. Today's BI architecture typically consists of a data warehouse (or one or more data marts), which consolidates data from several operational databases and serves a variety of front-end querying, reporting, and analytic tools. The back-end of the architecture is a data integration pipeline for populating the data warehouse by extracting data from distributed and usually heterogeneous operational sources; cleansing, integrating, and transforming the data; and loading it into the data warehouse. Since BI systems have been used primarily for off-line, strategic decision making, the traditional data integration pipeline is a one-way, batch process, usually implemented by extract-transform-load (ETL) tools. The design and implementation of the ETL pipeline is largely a labor-intensive activity, and typically consumes a large fraction of the effort in data warehousing projects. Increasingly, as enterprises become more automated, data-driven, and real-time, the BI architecture is evolving to support operational decision making. This imposes additional requirements and tradeoffs, resulting in even more complexity in the design of data integration flows. These include reducing latency so that near real-time data can be delivered to the data warehouse, extracting information from a wider variety of data sources, extending the rigidly serial ETL pipeline to more general data flows, and considering alternative physical implementations. We describe the requirements for data integration flows in this next generation of operational BI systems, the limitations of current technologies, the research challenges in meeting these requirements, and a framework for addressing these challenges. The goal is to facilitate the design and implementation of optimal flows to meet business requirements.
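To make the extract-cleanse/transform-load pipeline concrete, here is a toy end-to-end flow; the source records, cleansing rule, and warehouse schema are hypothetical.

```python
# A toy ETL flow: extract from an operational source, cleanse/transform,
# and load into a warehouse table. All names and data are hypothetical.
import sqlite3

def extract(rows):
    """Pull raw records from an operational source (here, an in-memory list)."""
    yield from rows

def transform(records):
    """Cleanse and integrate: drop incomplete rows, normalize fields."""
    for r in records:
        if r["amount"] is None:          # cleansing: skip incomplete rows
            continue
        yield (r["customer"].strip().title(), round(r["amount"], 2))

def load(conn, records):
    """Load conformed records into the warehouse table."""
    conn.executemany("INSERT INTO sales(customer, amount) VALUES (?, ?)", records)

source_rows = [
    {"customer": " alice ", "amount": 19.999},
    {"customer": "bob", "amount": None},   # will be cleansed away
    {"customer": "carol", "amount": 5.0},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales(customer TEXT, amount REAL)")
load(conn, transform(extract(source_rows)))
print(conn.execute("SELECT * FROM sales").fetchall())
```

The operational-BI requirements the paper describes amount to relaxing this structure: running the pipeline continuously rather than in batch, and generalizing the serial extract-transform-load chain into arbitrary data flows.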


international conference on management of data | 1991

Suitability of datamodels as canonical models for federated databases

Fèlix Saltor; Malu Castellanos; Manuel García-Solaco

We develop a framework of characteristics, essential and recommended, that a data model should have to be suitable as a canonical model for federated databases. This framework is based on the two factors of a model's representation ability: expressiveness and semantic relativism. Several data models are analyzed with respect to the characteristics of the framework to evaluate their adequacy as canonical models.
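One way to picture the framework is as a scoring of candidate canonical models against essential and recommended characteristics; the sketch below does exactly that, with placeholder characteristics and scores that are assumptions, not the paper's analysis.

```python
# A sketch of tabulating the framework's evaluation: candidates are
# suitable only if they have all essential characteristics. The trait
# assignments below are placeholders, not the paper's results.
ESSENTIAL = {"expressiveness", "semantic relativism"}
RECOMMENDED = {"formal semantics", "upward compatibility"}

candidates = {
    "relational":      {"formal semantics"},
    "object-oriented": {"expressiveness", "semantic relativism", "formal semantics"},
}

for model, traits in candidates.items():
    suitable = ESSENTIAL <= traits        # all essential traits present?
    bonus = len(traits & RECOMMENDED)     # recommended traits are tie-breakers
    print(f"{model}: suitable={suitable}, recommended traits={bonus}")
```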


international conference on data engineering | 2005

iBOM: a platform for intelligent business operation management

Malu Castellanos; Fabio Casati; Ming-Chien Shan; Umeshwar Dayal

As IT systems become more and more complex and business operations become increasingly automated, business managers have a growing need for better control over business operations and over how these are aligned with business goals. This paper describes iBOM, a platform for business operation management developed by HP that allows users to i) analyze operations from a business perspective and manage them based on business goals; ii) define business metrics, perform intelligent analysis on them to understand the causes of undesired metric values, and predict future values; and iii) optimize operations to improve business metrics. A key aspect is that all this functionality is readily available almost at the click of a mouse. The description of the work proceeds from specific requirements to the solution developed to address them. We also show that the platform is indeed general, as demonstrated by subsequent deployments in domains other than finance.
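A minimal sketch of the metric-monitoring idea, not the iBOM platform itself: compute a business metric over process execution records and flag violations of a target. The record fields, metric definition, and target value are illustrative assumptions.

```python
# Toy business-metric monitoring: compute a metric over execution records
# and flag violations of a target. Fields and target are hypothetical.
from datetime import timedelta

executions = [
    {"order": 1, "duration": timedelta(hours=4),  "outcome": "ok"},
    {"order": 2, "duration": timedelta(hours=30), "outcome": "ok"},
    {"order": 3, "duration": timedelta(hours=2),  "outcome": "failed"},
]

# Business metric: fraction of orders fulfilled within 24 hours.
TARGET = 0.9
on_time = sum(
    1 for e in executions
    if e["outcome"] == "ok" and e["duration"] <= timedelta(hours=24)
)
metric = on_time / len(executions)
print(f"on-time fulfillment = {metric:.0%} (target {TARGET:.0%})")
if metric < TARGET:
    print("metric below target: trigger cause analysis and prediction")
```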


international conference on management of data | 2012

Optimizing analytic data flows for multiple execution engines

Alkis Simitsis; Kevin Wilkinson; Malu Castellanos; Umeshwar Dayal

Next-generation business intelligence involves data flows that span different execution engines, contain complex functionality like data/text analytics and machine learning operations, and need to be optimized against various objectives. Creating correct analytic data flows in such an environment is a challenging task that is both labor-intensive and time-consuming. Optimizing these flows is currently an ad-hoc process whose result is largely dependent on the abilities and experience of the flow designer. Our previous work addressed analytic flow optimization for multiple objectives over a single execution engine. This paper focuses on optimizing flows for a single objective, namely performance, over multiple execution engines. We consider flows that span a DBMS, a Map-Reduce engine, and an orchestration engine (e.g., an ETL tool or scripting language). This configuration is emerging as a common paradigm used to combine analysis of unstructured data with analysis of structured data (e.g., NoSQL plus SQL). We present flow transformations that model data shipping, function shipping, and operation decomposition, and we describe how flow graphs are generated for multiple engines. Performance results for various configurations demonstrate the benefit of optimization.
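The choice between two of the named transformations can be sketched with a toy cost model: data shipping moves the data to the operator's engine, while function shipping moves the operator to the data's engine. All parameters and costs below are made-up assumptions, not the paper's cost functions.

```python
# Toy cost comparison of two flow transformations across engines.
def data_shipping_cost(rows, transfer_cost_per_row, remote_op_cost_per_row):
    # Ship rows across engines, then run the operator where it already lives.
    return rows * (transfer_cost_per_row + remote_op_cost_per_row)

def function_shipping_cost(rows, local_op_cost_per_row, op_setup_cost):
    # Deploy the operator on the data's engine; no data transfer needed.
    return op_setup_cost + rows * local_op_cost_per_row

rows = 10_000_000
ds = data_shipping_cost(rows, transfer_cost_per_row=2e-6, remote_op_cost_per_row=1e-6)
fs = function_shipping_cost(rows, local_op_cost_per_row=1.5e-6, op_setup_cost=5.0)
print(f"data shipping: {ds:.1f}s  function shipping: {fs:.1f}s")
print("choose:", "function shipping" if fs < ds else "data shipping")
```

An optimizer applies such transformations across the whole flow graph, searching for the engine assignment with the lowest total cost.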


Distributed and Parallel Databases | 2004

A Comprehensive and Automated Approach to Intelligent Business Processes Execution Analysis

Malu Castellanos; Fabio Casati; Umeshwar Dayal; Ming-Chien Shan

Business process management tools have traditionally focused on supporting the modeling and automation of business processes, with the aim of enabling faster and more cost-effective process executions. As more and more processes become automated, customers become increasingly interested in managing process executions. Specifically, there is a desire for more visibility into process executions, to be able to quickly spot problems and areas for improvement. The idea is that, by being able to assess process execution quality, it is possible to take actions to improve and optimize process execution, thereby leading to processes that have higher quality and lower costs. All this is possible today, but it involves the execution of specialized data mining projects that typically last months, cost hundreds of thousands of dollars, and only provide a specialized, narrow solution whose applicability is often relatively short-lived, due to ever-changing business and IT environments. Still, the need is such that companies undertake these efforts.

To address these needs, this paper presents a set of concepts and architectures that lay the foundation for providing users with intelligent analysis and predictions about business process executions. For example, the tools are able to provide users with information about why the quality of a process execution is low, what the outcome of a certain process will be, or how many processes will be started next week. This information is crucial to gain visibility into the processes, understand or foresee problems and areas of optimization, and quickly identify solutions. Intelligent analysis and predictions are achieved by applying data mining techniques to process execution data. In contrast to traditional approaches, where lengthy projects, considerable effort, and specialized skills in both business processes and data mining are needed to achieve these objectives, we aim at automating the entire data mining process lifecycle, so that intelligent functionality can be provided by the system while requiring little or no user input. The ambitious end goal of the work presented in this paper is to lay the foundation for a framework and tool capable of providing analysts with key intelligence about process execution, affecting crucial IT and business decisions, almost literally at the click of a button.
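A minimal stand-in for the idea of mining process execution data: train a classifier on per-execution features to predict the likely outcome of an in-flight process. The features, labels, and model choice are illustrative assumptions, not the paper's method.

```python
# Toy outcome prediction from process execution data. Features, labels,
# and the decision tree are stand-ins for the paper's mining pipeline.
from sklearn.tree import DecisionTreeClassifier

# Per-execution features: [num steps, total duration (h), num exceptions]
X = [[5, 4.0, 0], [9, 30.0, 2], [5, 3.5, 0], [10, 26.0, 3], [6, 5.0, 1]]
y = ["ok", "late", "ok", "late", "ok"]   # historical execution quality labels

model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Predict the likely outcome of an in-flight execution.
print(model.predict([[8, 20.0, 2]]))
```

The paper's contribution is automating this lifecycle end to end (feature preparation, model selection, retraining as the environment changes) rather than the modeling step itself.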


international conference on data engineering | 2010

Optimizing ETL workflows for fault-tolerance

Alkis Simitsis; Kevin Wilkinson; Umeshwar Dayal; Malu Castellanos

Extract-Transform-Load (ETL) processes play an important role in data warehousing. Typically, design work on ETL has focused on performance as the sole metric, to make sure that the ETL process finishes within an allocated time window. However, other quality metrics are also important and need to be considered during ETL design. In this paper, we address ETL design for performance plus fault-tolerance and freshness. There are many reasons why an ETL process can fail, and a good design needs to guarantee that it can be recovered within the ETL time window. Making ETL robust to failures is not trivial. There are different strategies that can be used, each with different costs and benefits. In addition, other metrics can affect the choice of strategy; e.g., higher freshness reduces the time window for recovery. The design space is too large for informal, ad-hoc approaches. In this paper, we describe our QoX optimizer, which considers multiple design strategies and finds an ETL design that satisfies multiple objectives. In particular, we define the optimizer's search space, cost functions, and search algorithms. We also illustrate its use through several experiments and show that it produces designs that are very near optimal.
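The trade-off the optimizer searches over can be sketched with a toy expected-time model: adding recovery points costs steady-state time but bounds the re-work after a failure. The strategies and numbers below are illustrative assumptions, not the QoX cost functions.

```python
# Toy fault-tolerance trade-off: more recovery points add overhead but
# limit how much of the flow must be redone after a failure.
def expected_time(run_time, checkpoint_overhead, num_checkpoints, failure_prob):
    # With checkpoints, a failure redoes at most one segment of the flow.
    segment = run_time / (num_checkpoints + 1)
    redo = failure_prob * segment
    return run_time + checkpoint_overhead * num_checkpoints + redo

WINDOW = 8.0   # hours available for the ETL run (hypothetical)
RUN = 6.0      # failure-free run time
P_FAIL = 0.3   # probability of one failure during the run

for k in (0, 1, 3):
    t = expected_time(RUN, checkpoint_overhead=0.2,
                      num_checkpoints=k, failure_prob=P_FAIL)
    print(f"{k} recovery points: expected {t:.2f}h, fits window: {t <= WINDOW}")
```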


conference on information and knowledge management | 2013

Discovering coherent topics using general knowledge

Zhiyuan Chen; Arjun Mukherjee; Bing Liu; Meichun Hsu; Malu Castellanos; Riddhiman Ghosh

Topic models have been widely used to discover latent topics in text documents. However, they may produce topics that are not interpretable for an application. Researchers have proposed incorporating prior domain knowledge into topic models to help produce coherent topics. The knowledge used in existing models is typically domain-dependent and assumed to be correct. However, one key weakness of this knowledge-based approach is that it requires the user to know the domain very well and to be able to provide knowledge suitable for the domain, which is not always the case, because in most real-life applications the user wants to find what they do not know. In this paper, we propose a framework to leverage general knowledge in topic models. Such knowledge is domain-independent. Specifically, we use one form of general knowledge, i.e., lexical semantic relations of words such as synonyms, antonyms, and adjective attributes, to help produce more coherent topics. However, there is a major obstacle: a word can have multiple meanings/senses, and each meaning often has a different set of synonyms and antonyms. Not every meaning is suitable or correct for a domain, and wrong knowledge can result in poor quality topics. To deal with wrong knowledge, we propose a new model, called GK-LDA, which is able to effectively exploit the knowledge of lexical relations in dictionaries. To the best of our knowledge, GK-LDA is the first such model that can incorporate domain-independent knowledge. Our experiments using online product reviews show that GK-LDA performs significantly better than existing state-of-the-art models.
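Not GK-LDA itself, but a sketch of one mechanism the abstract highlights: turning lexical relations into a biased (asymmetric) prior over topic-word distributions, so that synonyms are encouraged to share topics and antonyms are not. The vocabulary, relations, and prior values are illustrative assumptions.

```python
# Sketch of a lexically biased Dirichlet prior over topic-word
# distributions; all vocabulary, relations, and values are made up.
import numpy as np

vocab = ["cheap", "inexpensive", "expensive", "battery", "screen"]
idx = {w: i for i, w in enumerate(vocab)}
synonyms = [("cheap", "inexpensive")]
antonyms = [("cheap", "expensive")]

K, V = 3, len(vocab)
beta = np.full((K, V), 0.01)     # symmetric base Dirichlet prior

# Seed: suppose topic 0 is already known to favor "cheap".
beta[0, idx["cheap"]] = 1.0

for a, b in synonyms:
    favored = beta[:, idx[a]] > 0.5   # topics that favor word a
    beta[favored, idx[b]] = 1.0       # encourage its synonym there
for a, c in antonyms:
    favored = beta[:, idx[a]] > 0.5
    beta[favored, idx[c]] = 0.001     # discourage its antonym there

print(np.round(beta, 3))   # LDA inference would use this prior for phi
```

GK-LDA's actual contribution is handling the sense ambiguity this sketch ignores: detecting and discounting lexical relations that are wrong for the domain.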


international conference on conceptual modeling | 2010

Leveraging business process models for ETL design

Kevin Wilkinson; Alkis Simitsis; Malu Castellanos; Umeshwar Dayal

As Business Intelligence evolves from off-line strategic decision making to on-line operational decision making, the design of the back-end Extract-Transform-Load (ETL) processes is becoming even more complex. Many challenges arise in this new context, such as ETL optimization and modeling. In this paper, we focus on the disconnect between the IT-level view of the enterprise presented by ETL processes and the business view of the enterprise required by managers and analysts. We propose the use of business process models for a conceptual view of ETL. We show how to link this conceptual view to existing business processes and how to translate from this conceptual view to a logical ETL view that can be optimized. Thus, we link the ETL processes back to their underlying business processes, enabling not only a business view of the ETL but also a near real-time view of the entire enterprise.
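The conceptual-to-logical translation can be pictured as a mapping from business-level process steps to logical ETL operators that an optimizer can then rearrange; the mapping table and operator names below are hypothetical.

```python
# Sketch of translating a business-level process model into a logical ETL
# flow. Step names and operators are hypothetical placeholders.
BUSINESS_TO_ETL = {
    "receive daily sales report": ["extract(sales_src)"],
    "validate customer data":     ["filter(not_null(cust_id))", "lookup(cust_dim)"],
    "publish to dashboard":       ["aggregate(by=region)", "load(sales_fact)"],
}

business_process = [
    "receive daily sales report",
    "validate customer data",
    "publish to dashboard",
]

# Expand each business step into its logical operators, preserving order.
logical_flow = [op for step in business_process for op in BUSINESS_TO_ETL[step]]
print(" -> ".join(logical_flow))
```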

Collaboration


Dive into Malu Castellanos's collaborations.

Top Co-Author

Fèlix Saltor

Polytechnic University of Catalonia
