Marc Solé

Polytechnic University of Catalonia

Publication

Featured research published by Marc Solé.

Applications and Theory of Petri Nets | 2010

Process mining from a basis of state regions

Marc Solé; Josep Carmona

A central problem in the area of Process Mining is to obtain a formal model that represents selected behavior of a system. The theory of regions has been applied to address this problem, enabling the derivation of a Petri net whose language includes a set of traces. However, when dealing with real-life systems, the available tool support for performing such a task is unsatisfactory, due to the complex algorithms that are required. In this paper, the theory of regions is revisited to devise a novel technique that explores the space of regions by combining the elements of a region basis. Due to its light space requirements, the approach can represent an important step for bridging the gap between the theory of regions and its industrial application. Experimental results show improvements of orders of magnitude over state-of-the-art tools for the same task.

International Journal of Information Security | 2012

Efficient microaggregation techniques for large numerical data volumes

Marc Solé; Victor Muntés-Mulero; Jordi Nin

The contradictory requirements of data privacy and data analysis have fostered the development of statistical disclosure control techniques. In this context, microaggregation is one of the most frequently used methods since it offers a good trade-off between simplicity and quality. Unfortunately, most of the currently available microaggregation algorithms have been devised to work with small datasets, while the size of current databases is constantly increasing. The usual way to tackle this problem is to partition large data volumes into smaller fragments that can be processed in reasonable time by available algorithms. This solution is applied at the cost of losing quality. In this paper, we revisit the computational needs of microaggregation, showing that it can be reduced to two steps: sorting the dataset with regard to a vantage point and a set of k-nearest neighbors searches. Considering this new point of view, we propose three new efficient quality-preserving microaggregation algorithms based on k-nearest neighbors search techniques. We present a comparison of our approaches with the most significant strategies presented in the literature using three very large real datasets. Experimental results show that our proposals outperform previous techniques by keeping a better balance between performance and the quality of the anonymized dataset.
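The two steps named in the abstract (sorting with regard to a vantage point, then k-nearest neighbors searches) can be illustrated with a minimal sketch. This is not one of the three algorithms proposed in the paper: the function name `microaggregate`, the choice of the global centroid as vantage point, the farthest-first seeding, and the brute-force neighbor search are all assumptions made for illustration only.

```python
from math import dist


def microaggregate(records, k):
    """Group records into clusters of at least k and replace each record
    by its cluster centroid. Illustrative only, not the paper's method."""
    n = len(records)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k records")
    dims = len(records[0])

    # Step 1: sort record indices by distance to a vantage point.
    # Assumption: the vantage point is the global centroid, farthest first.
    vantage = tuple(sum(r[d] for r in records) / n for d in range(dims))
    order = sorted(range(n), key=lambda i: dist(records[i], vantage),
                   reverse=True)

    unassigned = set(range(n))
    groups = []
    for seed in order:
        if seed not in unassigned:
            continue
        if len(unassigned) < 2 * k:
            # Fewer than 2k records left: they form the last group.
            groups.append(sorted(unassigned))
            unassigned.clear()
            break
        # Step 2: k-nearest-neighbor search around the seed (brute force
        # here; a spatial index would replace this scan on large data).
        others = sorted(unassigned - {seed},
                        key=lambda i: dist(records[i], records[seed]))
        group = [seed] + others[:k - 1]
        groups.append(group)
        unassigned.difference_update(group)

    # Replace every record by the centroid of its group.
    protected = [None] * n
    for group in groups:
        centroid = tuple(sum(records[i][d] for i in group) / len(group)
                         for d in range(dims))
        for i in group:
            protected[i] = centroid
    return protected, groups
```

Calling `microaggregate(records, k)` on a list of equal-length numeric tuples returns the protected records in the original order together with the index groups, each of which holds between k and 2k-1 records.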

Formal Methods in Computer Aided Design | 2002

Traversal Techniques for Concurrent Systems

Marc Solé; Enric Pastor

Symbolic model checking based on Binary Decision Diagrams (BDDs) is a verification tool that has received increasing attention from the research community. The conventional breadth-first approach to state generation is often responsible for inefficiencies due to the growth of the BDD sizes. This is especially true for concurrent systems, for which existing research (mostly oriented to synchronous designs) is ineffective. In this paper we show that it is possible to improve BFS symbolic traversal for concurrent systems by scheduling the application of the transition relation. The scheduling scheme is devised by analyzing the causality relations between the events that occur in the system. We apply the scheduled symbolic traversal to invariant checking. We present a number of scheduling schemes and analyze their implementation and effectiveness in a prototype verification tool.

IEEE Transactions on Knowledge and Data Engineering | 2011

Optimal Symbol Alignment Distance: A New Distance for Sequences of Symbols

Javier Herranz; Jordi Nin; Marc Solé

Comparison functions for sequences (of symbols) are important components of many applications, for example, clustering, data cleansing, and integration. For years, many efforts have been made to improve the performance of such comparison functions. Improvements have been made either at the cost of reducing the accuracy of the comparison, or by compromising certain basic characteristics of the functions, such as the triangle inequality. In this paper, we propose a new distance for sequences of symbols (or strings) called Optimal Symbol Alignment distance (OSA distance, for short). This distance has a very low cost in practice, which makes it a suitable candidate for computing distances in applications with large amounts of (very long) sequences. After providing a mathematical proof that the OSA distance is a real distance, we present some experiments for different scenarios (DNA sequences, record linkage, etc.), showing that the proposed distance outperforms, in terms of execution time and/or accuracy, other well-known comparison functions such as the Edit or Jaro-Winkler distances.

IEEE Transactions on Knowledge and Data Engineering | 2013

Region-Based Foldings in Process Discovery

Marc Solé; Josep Carmona

A central problem in the area of Process Mining is to obtain a formal model that represents the processes that are conducted in a system. If realized, this simple motivation allows for powerful techniques that can be used to formally analyze and optimize a system, without the need to resort to its semiformal and sometimes inaccurate specification. The problem addressed in this paper is known as Process Discovery: to obtain a formal model from a set of system executions. The theory of regions is a valuable tool in process discovery: it aims at learning a formal model (a Petri net) from a set of traces. In its original form, the theory is applied to an automaton, and therefore one should convert the traces into an acyclic automaton in order to apply these techniques. Given that the complexity of the region-based techniques depends on the size of the input automaton, revealing the underlying cycles and folding the initial automaton can significantly reduce that complexity. In this paper, we follow this idea by incorporating region information in the cycle detection algorithm, enabling the identification of complex cycles that cannot be obtained efficiently with state-of-the-art techniques. The experimental results obtained by the devised tool suggest that the techniques presented in this paper are a big step toward widening the application of the theory of regions in Process Mining for industrial scenarios.

Information Fusion | 2012

Kd-trees and the real disclosure risks of large statistical databases

Javier Herranz; Jordi Nin; Marc Solé

Estimating the disclosure risk of a Statistical Disclosure Control (SDC) protection method by means of (distance-based) record linkage techniques is a very popular approach to analyze the privacy level offered by such a method. When databases are very large, some particular record linkage techniques such as blocking or partitioning are usually applied to make this process reasonably efficient. However, in this case the record linkage process is not exact, which means that the disclosure risk of an SDC protection method may be underestimated. In this paper we propose the use of kd-tree techniques to apply exact yet very efficient record linkage when (protected) datasets are very large. We describe some experiments showing that this approach achieves better results, in terms of both accuracy and running time, than more classical approaches such as record linkage based on a sliding window. We also discuss and experiment on the use of these techniques not to link a whole protected record with its original one, but just to guess the value of some confidential attribute(s) of the record(s). This leads to concepts such as k-neighbor l-diversity or k-neighbor p-sensitivity, a generalization (to any SDC protection method) of l-diversity or p-sensitivity, which have been defined for SDC protection methods ensuring k-anonymity, such as microaggregation.
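As a rough illustration of exact distance-based record linkage with a kd-tree, the sketch below builds a tree over the original records and links each protected record to its exact nearest original. It is a textbook kd-tree written for this page, not the implementation evaluated in the paper; the names `build_kdtree`, `nearest` and `linkage_rate`, and the assumption that original and protected records share the same positions in their lists, are hypothetical.

```python
from math import dist


def build_kdtree(points, depth=0):
    """points: list of (coords, record_id). Returns a nested tuple
    (point, axis, left_subtree, right_subtree), or None if empty."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, query, best=None):
    """Exact nearest-neighbor search; returns (distance, record_id)."""
    if node is None:
        return best
    (coords, rid), axis, left, right = node
    d = dist(coords, query)
    if best is None or d < best[0]:
        best = (d, rid)
    diff = query[axis] - coords[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is within reach
    # of the best distance found so far; this keeps the search exact.
    if abs(diff) <= best[0]:
        best = nearest(far, query, best)
    return best


def linkage_rate(original, protected):
    """Fraction of protected records whose nearest original record is
    the one at the same list position (assumed to be the true match)."""
    tree = build_kdtree([(rec, i) for i, rec in enumerate(original)])
    hits = sum(1 for i, rec in enumerate(protected)
               if nearest(tree, rec)[1] == i)
    return hits / len(protected)
```

A higher value of `linkage_rate(original, protected)` means more protected records can be re-identified by an attacker who knows the original data, that is, a higher estimated disclosure risk.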

Applications and Theory of Petri Nets | 2011

Light Region-based Techniques for Process Discovery

Marc Solé; Josep Carmona

A central problem in the area of Process Mining is to obtain a formal model that represents selected behavior of a system. The theory of regions has been applied to address this problem, enabling the derivation of a Petri net whose language includes a set of traces. However, when dealing with real-life systems, the available tool support for performing such a task is unsatisfactory, due to the complex algorithms that are required. In this paper, the theory of regions is revisited to devise a novel technique that explores the space of regions by combining the elements of a region basis. Due to its light space requirements, the approach can represent an important step for bridging the gap between the theory of regions and its industrial application. Experimental results show improvements of orders of magnitude over state-of-the-art tools for the same task.

Automated Technology for Verification and Analysis | 2010

Rbminer: a tool for discovering Petri nets from transition systems

Marc Solé; Josep Carmona

The theory of regions was introduced in the nineties to enable the transformation of an automaton into a Petri net. From very restrictive initial requirements, the theory has evolved in several dimensions in the last two decades, widening the scope of application to more general scenarios. In contrast, few tools have appeared to support these new theories, thus confining the potential of the area to the academic domain. This paper introduces rbminer, a tool that combines the theory of regions with linear algebra to compute a basis of state regions. Due to its light space requirements, this approach may help bridge the gap between the theory of regions and its industrial application.

International Conference on Application of Concurrency to System Design | 2012

A High-Level Strategy for C-net Discovery

Marc Solé; Josep Carmona

Causal nets have recently been proposed as a suitable model for process mining, due to their declarative semantics and compact representation. However, the discovery of causal nets from a log is a complex problem. The current algorithmic support for the discovery of causal nets comprises either fast but inaccurate methods (compromising quality), or accurate algorithms that are computationally demanding, thus limiting the size of the inputs they can process. In this paper a high-level strategy is presented, which uses appropriate clustering techniques to split the log into pieces, and benefits from the additive nature of causal nets. This allows the causal nets discovered for each piece to be structurally amalgamated into a valuable model. The claims in this paper are accompanied by experimental results showing the significance of the high-level strategy presented.

Applications and Theory of Petri Nets | 2012

An SMT-Based discovery algorithm for c-nets

Marc Solé; Josep Carmona

Recently, causal nets have been proposed as a suitable model for process discovery, due to their declarative semantics and great expressiveness. In this paper we propose an algorithm to discover a causal net from a set of traces. It is based on encoding the problem as a Satisfiability Modulo Theories (SMT) formula, and uses a binary search strategy to optimize the derived model. The method has been implemented in a prototype tool that interacts with an SMT solver. The experimental results demonstrate the capability of the approach to discover complex behavior in limited time.
