Featured Research

Data Structures And Algorithms

Efficient Time and Space Representation of Uncertain Event Data

Process mining is a discipline concerned with the analysis of execution data of operational processes, the extraction of models from event data, the measurement of the conformance between event data and normative models, and the enhancement of all aspects of processes. Most approaches assume that event data accurately captures behavior. However, this is not realistic in many applications: data can contain uncertainty arising from recording errors, imprecise measurements, and other factors. Recently, new methods have been developed to analyze event data containing uncertainty; these techniques rely prominently on representing uncertain event data by means of graph-based models that explicitly capture uncertainty. In this paper, we introduce a new approach to efficiently compute a graph representation of the behavior contained in an uncertain process trace. We present our novel algorithm, prove its asymptotic time complexity, and show experimental results that highlight order-of-magnitude performance improvements for the behavior graph construction.
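The following is a minimal sketch of what such a graph can look like. It makes several assumptions:

* Each event carries only an uncertain timestamp, given as an interval [earliest, latest].
* The tuple format is hypothetical.
* Uncertain activity labels and indeterminate events are ignored.

Under these assumptions, one event certainly precedes another when its interval ends before the other's begins. The behavior graph then keeps only the precedences that are not implied by transitivity. This naive cubic construction only illustrates the object being built; it is not the paper's efficient algorithm.

```python
from typing import List, Set, Tuple

# Hypothetical event format: (event_id, earliest_timestamp, latest_timestamp).
Event = Tuple[str, float, float]


def precedes(a: Event, b: Event) -> bool:
    """True if a certainly happened before b: a's latest time is before b's earliest."""
    return a[2] < b[1]


def behavior_graph(trace: List[Event]) -> Set[Tuple[str, str]]:
    """Naive O(n^3) construction: keep edge a -> c only if a certainly precedes c
    and no event b certainly lies between them (transitive reduction)."""
    edges = set()
    for a in trace:
        for c in trace:
            if not precedes(a, c):
                continue
            # Drop a -> c when it is implied by a path a -> b -> c.
            implied = any(precedes(a, b) and precedes(b, c) for b in trace)
            if not implied:
                edges.add((a[0], c[0]))
    return edges


trace = [("a", 0, 1), ("b", 2, 3), ("c", 2.5, 4), ("d", 5, 6)]
print(sorted(behavior_graph(trace)))
# [('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd')]
```

Events b and c have overlapping intervals, so they remain unordered. The edge a -> d is dropped because it is implied by a -> b -> d.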

Efficient and near-optimal algorithms for sampling connected subgraphs

We study the graphlet sampling problem: given an integer $k \ge 3$ and a simple graph $G=(V,E)$, sample a connected induced $k$-node subgraph of $G$ (also called $k$-graphlet) uniformly at random. This is a fundamental graph mining primitive, with applications in social network analysis and bioinformatics. In this work, we give the following results. (1) A near-tight bound for the classic $k$-graphlet random walk, as a function of the mixing time of $G$. In particular, ignoring $k^{O(k)}$ factors, we show that the random walk mixes in time $\tilde{\Theta}(t(G)\cdot\rho(G)^{k-1})$, where $t(G)$ is the mixing time of $G$ and $\rho(G)$ is the ratio between its maximum and minimum degree. (2) The first efficient algorithm for uniform graphlet sampling. The algorithm has a preprocessing phase that uses time $O(nk^2\ln k + m)$ and space $O(n)$, and a sampling phase that uses $k^{O(k)}\cdot O(\log\Delta)$ time per sample. (3) A near-optimal algorithm for $\epsilon$-uniform graphlet sampling. The preprocessing takes time $O(\frac{k^6}{\epsilon}\, n\log n)$ and space $O(n)$, and the sampling takes $k^{O(k)}\cdot O((\frac{1}{\epsilon})^{10}\log\frac{1}{\epsilon})$ expected time per sample.
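The sketch below is our reading of a k-graphlet random walk. The states are connected induced k-vertex sets, and two states are adjacent when they share k-1 vertices. Each step moves to a uniformly chosen adjacent graphlet. Such a walk is stationary in proportion to a graphlet's degree, not uniformly.

The optional `uniform=True` flag is our addition, not the paper's sampler. It applies a standard Metropolis-Hastings correction so that the stationary distribution becomes uniform. All function names are hypothetical, and neighbor enumeration is brute force rather than efficient.

```python
import random
from collections import deque
from typing import Dict, FrozenSet, List, Set

Graph = Dict[int, Set[int]]  # adjacency sets of a simple undirected graph


def is_connected(g: Graph, nodes: Set[int]) -> bool:
    """DFS restricted to `nodes`: True if the induced subgraph is connected."""
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in g[v]:
            if u in nodes and u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == len(nodes)


def initial_graphlet(g: Graph, start: int, k: int) -> FrozenSet[int]:
    """The first k vertices popped by BFS from `start` induce a connected subgraph."""
    order, seen, queue = [], {start}, deque([start])
    while queue and len(order) < k:
        v = queue.popleft()
        order.append(v)
        for u in g[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    if len(order) < k:
        raise ValueError("component of `start` has fewer than k vertices")
    return frozenset(order)


def graphlet_neighbors(g: Graph, s: FrozenSet[int]) -> List[FrozenSet[int]]:
    """Graphlets sharing k-1 vertices with s: swap one v in s for one u outside s."""
    boundary = set().union(*(g[v] for v in s)) - s
    result = []
    for v in s:
        for u in boundary:
            t = (s - {v}) | {u}
            if is_connected(g, t):
                result.append(t)
    return result


def graphlet_walk(g: Graph, k: int, steps: int, rng: random.Random,
                  uniform: bool = False) -> FrozenSet[int]:
    """Random walk over k-graphlets. With uniform=True, a Metropolis-Hastings
    acceptance test makes the stationary distribution uniform over graphlets."""
    s = initial_graphlet(g, next(iter(g)), k)
    nbrs = graphlet_neighbors(g, s)
    for _ in range(steps):
        if not nbrs:  # the component has exactly k vertices
            break
        t = rng.choice(nbrs)
        t_nbrs = graphlet_neighbors(g, t)
        # Accept with probability min(1, deg(s) / deg(t)); otherwise stay at s.
        if uniform and rng.random() >= len(nbrs) / len(t_nbrs):
            continue
        s, nbrs = t, t_nbrs
    return s
```

The Metropolis-Hastings ratio reduces to deg(s)/deg(t) because the proposal picks each of the current graphlet's neighbors with probability 1/deg(s).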

Efficiently Computing Maximum Flows in Scale-Free Networks

We study the maximum-flow/minimum-cut problem on scale-free networks, i.e., graphs whose degree distribution follows a power-law. We propose a simple algorithm that capitalizes on the fact that often only a small fraction of such a network is relevant for the flow. At its core, our algorithm augments Dinitz's algorithm with a balanced bidirectional search. Our experiments on a scale-free random network model indicate sublinear run time. On scale-free real-world networks, we outperform the commonly used highest-label Push-Relabel implementation by up to two orders of magnitude. Compared to Dinitz's original algorithm, our modifications reduce the search space, e.g., by a factor of 275 on an autonomous systems graph. Beyond these good run times, our algorithm has an additional advantage compared to Push-Relabel. The latter computes a preflow, which makes the extraction of a minimum cut potentially more difficult. This is relevant, for example, for the computation of Gomory-Hu trees. On a social network with 70000 nodes, our algorithm computes the Gomory-Hu tree in 3 seconds compared to 12 minutes when using Push-Relabel.
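For reference, the following is a minimal sketch of textbook Dinitz (Dinic) max flow. It alternates a BFS that builds the level graph with DFS-based blocking flow. It does not include the balanced bidirectional search that the paper adds.

It also shows the property mentioned above: once no augmenting path remains, the vertices reachable from the source in the residual graph form the source side of a minimum cut. The class and method names are illustrative only.

```python
from collections import deque
from typing import List, Set


class Dinic:
    """Dinitz's max-flow algorithm: BFS level graph plus DFS blocking flow."""

    def __init__(self, n: int) -> None:
        self.n = n
        # Each edge is [head, residual_capacity, index_of_reverse_edge_in_adj[head]].
        self.adj: List[List[list]] = [[] for _ in range(n)]
        self.level: List[int] = []
        self.it: List[int] = []

    def add_edge(self, u: int, v: int, cap: int) -> None:
        self.adj[u].append([v, cap, len(self.adj[v])])
        self.adj[v].append([u, 0, len(self.adj[u]) - 1])

    def _bfs(self, s: int, t: int) -> bool:
        """Label vertices with their BFS distance from s in the residual graph."""
        self.level = [-1] * self.n
        self.level[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, cap, _ in self.adj[u]:
                if cap > 0 and self.level[v] < 0:
                    self.level[v] = self.level[u] + 1
                    queue.append(v)
        return self.level[t] >= 0

    def _dfs(self, u: int, t: int, pushed: float) -> int:
        """Push flow along level-increasing edges; self.it skips exhausted edges."""
        if u == t:
            return int(pushed)
        while self.it[u] < len(self.adj[u]):
            edge = self.adj[u][self.it[u]]
            v, cap, rev = edge
            if cap > 0 and self.level[v] == self.level[u] + 1:
                d = self._dfs(v, t, min(pushed, cap))
                if d > 0:
                    edge[1] -= d
                    self.adj[v][rev][1] += d
                    return d
            self.it[u] += 1
        return 0

    def max_flow(self, s: int, t: int) -> int:
        flow = 0
        while self._bfs(s, t):
            self.it = [0] * self.n
            pushed = self._dfs(s, t, float("inf"))
            while pushed > 0:
                flow += pushed
                pushed = self._dfs(s, t, float("inf"))
        return flow

    def min_cut_source_side(self) -> Set[int]:
        """Call after max_flow: the last BFS marked exactly the residual-reachable set."""
        return {v for v in range(self.n) if self.level[v] >= 0}


net = Dinic(6)
for u, v, c in [(0, 1, 16), (0, 2, 13), (2, 1, 4), (1, 3, 12), (3, 2, 9),
                (2, 4, 14), (4, 3, 7), (3, 5, 20), (4, 5, 4)]:
    net.add_edge(u, v, c)
print(net.max_flow(0, 5))                  # 23
print(sorted(net.min_cut_source_side()))   # [0, 1, 2, 4]
```

Reading the cut directly off the final BFS is what makes this family convenient for Gomory-Hu constructions. A preflow-based method needs extra work before a cut can be extracted.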

Entropy of Mersenne-Twisters

The Mersenne-Twister is one of the most popular generators of uniform pseudo-random numbers. It is used in many numerical libraries and software packages. In this paper, we look at the Kolmogorov entropy of the original Mersenne-Twister, as well as of more modern variations such as the 64-bit Mersenne-Twisters, the WELL generators, and the MELG generators.
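To make concrete which generator "the original Mersenne-Twister" refers to, here is a sketch of the 32-bit MT19937 recurrence. It uses the standard published constants. The entropy analysis itself is not reproduced.

The state update (`_twist`) is linear over GF(2), and tempering is applied to each output word. The default seed 5489 is the one used by C++'s `std::mt19937`.

```python
class MT19937:
    """32-bit Mersenne Twister with the standard MT19937 parameters."""

    N, M = 624, 397
    MATRIX_A = 0x9908B0DF
    UPPER_MASK, LOWER_MASK = 0x80000000, 0x7FFFFFFF

    def __init__(self, seed: int = 5489) -> None:
        # Fill the 624-word state from the seed (the reference init_genrand).
        self.mt = [0] * self.N
        self.mt[0] = seed & 0xFFFFFFFF
        for i in range(1, self.N):
            prev = self.mt[i - 1]
            self.mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF
        self.index = self.N  # forces a twist before the first output

    def _twist(self) -> None:
        # GF(2)-linear transition: upper bit of mt[i] joined with the lower 31 bits
        # of mt[i+1], shifted, XORed with mt[i+M] and, if odd, with MATRIX_A.
        for i in range(self.N):
            y = (self.mt[i] & self.UPPER_MASK) | (self.mt[(i + 1) % self.N] & self.LOWER_MASK)
            self.mt[i] = self.mt[(i + self.M) % self.N] ^ (y >> 1) ^ (self.MATRIX_A if y & 1 else 0)
        self.index = 0

    def next_u32(self) -> int:
        if self.index >= self.N:
            self._twist()
        y = self.mt[self.index]
        self.index += 1
        # Tempering: an invertible bit transformation of each output word.
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & 0xFFFFFFFF


gen = MT19937()
print(gen.next_u32())  # 3499211612
```

As a further check, the C++ standard requires the 10000th output of a default-seeded `std::mt19937` to be 4123659995.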

Erratum: Fast and Simple Horizontal Coordinate Assignment

We point out two flaws in the algorithm of Brandes and Köpf (Proc. GD 2001), which is often used for the horizontal coordinate assignment in Sugiyama's framework for layered layouts. One of them has been noted and fixed multiple times, the other has not been documented before and requires a non-trivial adaptation. On the bright side, neither running time nor extensions of the algorithm are affected adversely.

Estimating Rank-One Spikes from Heavy-Tailed Noise via Self-Avoiding Walks

We study symmetric spiked matrix models with respect to a general class of noise distributions. Given a rank-1 deformation of a random noise matrix, whose entries are independently distributed with zero mean and unit variance, the goal is to estimate the rank-1 part. For the case of Gaussian noise, the top eigenvector of the given matrix is a widely studied estimator known to achieve optimal statistical guarantees, e.g., in the sense of the celebrated BBP phase transition. However, this estimator can fail completely for heavy-tailed noise. In this work, we exhibit an estimator that works for heavy-tailed noise up to the BBP threshold that is optimal even for Gaussian noise. We give a non-asymptotic analysis of our estimator which relies only on the variance of each entry remaining constant as the size of the matrix grows: higher moments may grow arbitrarily fast or even fail to exist. Previously, it was only known how to achieve these guarantees if higher-order moments of the noise are bounded by a constant independent of the size of the matrix. Our estimator can be evaluated in polynomial time by counting self-avoiding walks via a color-coding technique. Moreover, we extend our estimator to spiked tensor models and establish analogous results.
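The following is a brute-force sketch of the basic quantity behind "counting self-avoiding walks". For a matrix Y, it sums the product of entries along every self-avoiding walk of a fixed length between two indices.

Our understanding is that the estimator's (i, j) entry aggregates sums of this kind. The paper's exact normalisation, its choice of walk length, and its polynomial-time color-coding evaluation are not reproduced. This enumeration is exponential and only meant for tiny matrices, and the function name is hypothetical.

```python
from typing import List, Sequence


def saw_weight_sum(y: Sequence[Sequence[float]], i: int, j: int, length: int) -> float:
    """Sum over self-avoiding walks i = v0, v1, ..., v_length = j (all vertices
    distinct) of y[v0][v1] * y[v1][v2] * ... * y[v_{length-1}][v_length]."""
    n = len(y)
    total = 0.0

    def extend(v: int, steps_left: int, visited: List[bool], weight: float) -> None:
        nonlocal total
        if steps_left == 0:
            if v == j:
                total += weight
            return
        for u in range(n):
            if visited[u] or (u == j and steps_left > 1):
                continue  # never revisit a vertex; enter j only on the last step
            visited[u] = True
            extend(u, steps_left - 1, visited, weight * y[v][u])
            visited[u] = False

    visited = [False] * n
    visited[i] = True
    extend(i, length, visited, 1.0)
    return total


y = [[0, 2, 3],
     [2, 0, 5],
     [3, 5, 0]]
print(saw_weight_sum(y, 0, 1, 2))  # 15.0 (the single walk 0 -> 2 -> 1 has weight 3 * 5)
```

Unlike the (i, j) entry of a matrix power, this sum excludes walks that revisit a vertex. Such revisits let a few huge heavy-tailed entries dominate, which is the intuition for why self-avoiding walks help.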

Euclidean Affine Functions and Applications to Calendar Algorithms

We study properties of Euclidean affine functions (EAFs), namely those of the form $f(r) = (\alpha\cdot r + \beta)/\delta$, and their closely related expression $\mathring{f}(r) = (\alpha\cdot r + \beta)\,\%\,\delta$, where $r$, $\alpha$, $\beta$ and $\delta$ are integers, and where $/$ and $\%$ respectively denote the quotient and remainder of Euclidean division. We derive algebraic relations and numerical approximations that are important for the efficient evaluation of these expressions in modern CPUs. Since simple division and remainder are particular cases of EAFs (when $\alpha = 1$ and $\beta = 0$), the optimisations proposed in this paper can also be applied to them. Such expressions appear in some of the most common tasks in any computer system, such as printing numbers, times and dates. We use calendar calculations as the main application example because they are richer with respect to the number of EAFs employed. Specifically, the main application presented in this article relates to Gregorian calendar algorithms. We will show how they can be implemented substantially more efficiently than is currently the case in widely used C, C++, C# and Java open source libraries. Gains in speed of a factor of two or more are common.
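To show how densely EAFs appear in calendar code, here is a sketch of Howard Hinnant's widely used public-domain days-to-civil algorithm. It is not the optimised version from the paper. Every integer division in it evaluates an expression of the form $(\alpha\cdot r + \beta)/\delta$; for example, the month index is $(5\cdot\mathrm{doy} + 2)/153$.

The small `div10` helper (a hypothetical name) illustrates the kind of numerical approximation that makes such quotients cheap. It replaces division by 10 with a multiply and a shift, which is exact only on a bounded input range.

```python
import datetime


def civil_from_days(z: int) -> tuple:
    """Days since 1970-01-01 -> proleptic Gregorian (year, month, day).
    Follows Hinnant's algorithm; each // below evaluates an EAF."""
    z += 719468                      # shift the epoch to 0000-03-01
    era = z // 146097                # 400-year era (Python's // floors negatives)
    doe = z - era * 146097           # day of era, in [0, 146096]
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365  # year of era
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)  # day of a March-based year
    mp = (5 * doy + 2) // 153        # EAF with alpha=5, beta=2, delta=153; March = 0
    day = doy - (153 * mp + 2) // 5 + 1
    month = mp + 3 if mp < 10 else mp - 9
    year = yoe + era * 400 + (1 if month <= 2 else 0)
    return year, month, day


def div10(r: int) -> int:
    """r // 10 as (r * 205) >> 11; exact for 0 <= r <= 1028 only."""
    return (r * 205) >> 11


epoch = datetime.date(1970, 1, 1).toordinal()
print(civil_from_days(datetime.date(2000, 2, 29).toordinal() - epoch))  # (2000, 2, 29)
print(div10(1028), div10(1029))  # 102 103 (the second is wrong: 1029 // 10 == 102)
```

Because 205/2048 slightly overestimates 1/10, the error grows with r, and the shortcut first fails at r = 1029. Choosing the multiplier and shift for a required input range is the sort of trade-off the abstract's "numerical approximations" refer to.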

Exact Algorithms for Scheduling Problems on Parallel Identical Machines with Conflict Jobs

Machine scheduling problems involving conflict jobs can be seen as a constrained version of the classical scheduling problem, in which some jobs are in conflict in the sense that they cannot be processed simultaneously on different machines. This conflict constraint naturally arises in several practical applications and has recently received considerable attention in the research community. In fact, the problem is typically NP-hard (even to approximate), and most algorithmic results achieved so far have relied heavily on special structures of the underlying graph used to model the conflict-job relation. Our focus is on three objective functions: minimizing the makespan, minimizing the weighted sum of the jobs' completion times, and maximizing the total weight of completed jobs; the first two of these have been intensively studied in the literature. For each objective function considered, we present several mixed integer linear programming models and a constraint programming model, with which the problems can be solved to optimality using dedicated solvers. Binary search-based algorithms are also proposed to solve the makespan problem. The results of numerical experiments performed on randomly generated data sets with up to 32 jobs and 6 machines are reported and analysed to verify the performance of the proposed methods.
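The binary-search idea for the makespan can be sketched under a simplifying assumption of our own: every job has unit processing time. Scheduling then means placing jobs into integer time slots, with at most m jobs per slot and no two conflicting jobs sharing a slot.

Feasibility is monotone in the number of slots T, so binary search over T finds the optimum. The backtracking feasibility check stands in for the paper's MILP and CP models, which are not reproduced. Function names are hypothetical, and the check is exponential in the worst case.

```python
from typing import List, Set


def feasible(n: int, conflicts: Set[frozenset], m: int, t: int) -> bool:
    """Can n unit-length jobs fit into t slots, at most m per slot,
    with no two conflicting jobs in the same slot? Plain backtracking."""
    slots: List[List[int]] = [[] for _ in range(t)]

    def place(job: int) -> bool:
        if job == n:
            return True
        for slot in slots:
            if len(slot) < m and all(frozenset((job, o)) not in conflicts for o in slot):
                slot.append(job)
                if place(job + 1):
                    return True
                slot.pop()
        return False

    return place(0)


def min_makespan(n: int, conflicts: Set[frozenset], m: int) -> int:
    """Binary search on the makespan T; ceil(n/m) is a lower bound and n always suffices."""
    lo, hi = -(-n // m), n
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(n, conflicts, m, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


triangle = {frozenset((0, 1)), frozenset((1, 2)), frozenset((0, 2))}
print(min_makespan(3, triangle, 2))  # 3: pairwise-conflicting jobs need separate slots
```

With general processing times, the feasibility question inside the binary search becomes much harder. That is where solver-based models such as those in the paper come in.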

Expected Performance and Worst Case Scenario Analysis of the Divide-and-Conquer Method for the 0-1 Knapsack Problem

In this paper, we furnish quality certificates for the Divide-and-Conquer method solving the 0-1 Knapsack Problem: the worst case scenario and estimates for the expected performance. The probabilistic setting is given and the main random variables are defined for the analysis of the expected performance. The efficiency is rigorously approximated for one iteration of the method; these values are then used to derive analytic estimates for the performance of a general Divide-and-Conquer tree. All the theoretical results are verified with statistically suited numerical experiments for a wider illustration of the method.
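The following sketch makes "efficiency" concrete. We assume it is the ratio between a split solution's value and the true optimum. The optimum comes from the classic exact dynamic program.

The one-iteration split below is hypothetical and not the paper's exact scheme. It divides the items into two halves, divides the capacity in two, solves each half exactly, and adds the values. The result is always feasible, so the ratio is at most 1.

```python
from typing import Sequence


def knapsack_opt(weights: Sequence[int], values: Sequence[int], capacity: int) -> int:
    """Exact 0-1 knapsack optimum via the classic O(n * capacity) dynamic program."""
    best = [0] * (capacity + 1)  # best[c] = max value using total weight <= c
    for w, v in zip(weights, values):
        for c in range(capacity, w - 1, -1):  # downwards, so each item is used once
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


def one_split_dc(weights: Sequence[int], values: Sequence[int], capacity: int) -> int:
    """Hypothetical single divide step: halve the item list and the capacity,
    solve both subproblems exactly, and add their values."""
    h = len(weights) // 2
    c1 = capacity // 2
    return (knapsack_opt(weights[:h], values[:h], c1)
            + knapsack_opt(weights[h:], values[h:], capacity - c1))


def efficiency(weights: Sequence[int], values: Sequence[int], capacity: int) -> float:
    """Split value divided by the optimum (1.0 means the split lost nothing)."""
    opt = knapsack_opt(weights, values, capacity)
    return one_split_dc(weights, values, capacity) / opt if opt else 1.0


print(knapsack_opt([1, 3, 4, 5], [1, 4, 5, 7], 7))  # 9
print(one_split_dc([6, 1], [10, 1], 7), knapsack_opt([6, 1], [10, 1], 7))  # 1 11
```

The second instance shows why worst-case certificates matter. The heavy, valuable item no longer fits once the capacity is split, so this particular split keeps only 1/11 of the optimum.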

Exploitation of Multiple Replenishing Resources with Uncertainty

We consider an optimization problem in which a (single) bat aims to exploit the nectar in a set of $n$ cacti with the objective of maximizing the expected total amount of nectar it drinks. Each cactus $i \in [n]$ is characterized by a parameter $r_i > 0$ that determines the rate at which nectar accumulates in $i$. In every round, the bat can visit one cactus and drink all the nectar accumulated there since its previous visit. Furthermore, competition with other bats, which may also visit some cacti and drink their nectar, is modeled by means of a stochastic process in which cactus $i$ is emptied in each round (independently) with probability $0 < s_i < 1$. Our attention is restricted to purely-stochastic strategies that are characterized by a probability vector $(p_1, \ldots, p_n)$ determining the probability $p_i$ that the bat visits cactus $i$ in each round. We prove that for every $\epsilon > 0$, there exists a purely-stochastic strategy that approximates the optimal purely-stochastic strategy to within a multiplicative factor of $1+\epsilon$, while exploiting only a small core of cacti. Specifically, we show that it suffices to include at most $\frac{2(1-\sigma)}{\epsilon\cdot\sigma}$ cacti in the core, where $\sigma = \min_{i\in[n]} s_i$. We also show that this upper bound on core size is asymptotically optimal, as a core of significantly smaller size cannot provide a $(1+\epsilon)$-approximation of the optimal purely-stochastic strategy. This means that when the competition is more intense (i.e., $\sigma$ is larger), a strategy based on exploiting smaller cores will be favorable.
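The sketch below evaluates the core-size bound $\frac{2(1-\sigma)}{\epsilon\cdot\sigma}$ from the abstract. It also simulates a purely-stochastic strategy under a within-round order that is our own assumption, not necessarily the paper's: nectar accumulates, then the bat drinks at the cactus it picks, then each cactus is independently emptied by competitors.

Under that assumed order, cactus $i$ is reset with probability $q_i = 1-(1-p_i)(1-s_i)$ per round. Its long-run contribution per round is then $p_i r_i / q_i$, which the simulation should approach. Function names are hypothetical.

```python
import random
from typing import Sequence


def core_size_bound(eps: float, s: Sequence[float]) -> float:
    """Upper bound 2(1 - sigma) / (eps * sigma) on the core size, sigma = min_i s_i."""
    sigma = min(s)
    return 2 * (1 - sigma) / (eps * sigma)


def expected_reward(p: Sequence[float], r: Sequence[float], s: Sequence[float]) -> float:
    """Closed form under the assumed round order: sum_i p_i r_i / q_i."""
    return sum(pi * ri / (1 - (1 - pi) * (1 - si)) for pi, ri, si in zip(p, r, s))


def simulate(p: Sequence[float], r: Sequence[float], s: Sequence[float],
             rounds: int, seed: int = 0) -> float:
    """Monte Carlo average nectar per round. Assumed order in each round:
    accumulate, bat drinks at a cactus drawn from p, competitors empty cacti."""
    rng = random.Random(seed)
    n = len(p)
    level = [0.0] * n
    total = 0.0
    for _ in range(rounds):
        for i in range(n):
            level[i] += r[i]                          # nectar accumulates at rate r_i
        j = rng.choices(range(n), weights=p)[0]       # bat visits cactus j w.p. p_j
        total += level[j]
        level[j] = 0.0
        for i in range(n):
            if rng.random() < s[i]:                   # competitor empties cactus i
                level[i] = 0.0
    return total / rounds


print(core_size_bound(0.5, [0.5, 0.9]))  # 4.0
p, r, s = [0.5, 0.3, 0.2], [1.0, 2.0, 0.5], [0.3, 0.6, 0.2]
print(simulate(p, r, s, 100_000, seed=1), expected_reward(p, r, s))
```

The bound shrinks as $\sigma$ grows, which matches the abstract's observation that smaller cores suffice under more intense competition.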
