Data Series Indexing Gone Parallel
Botao Peng (Expected graduation date: August 2020; supervised by: Panagiota Fatourou, Themis Palpanas)
LIPADE, Université de Paris, [email protected]
Abstract—Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive exploration, or analysis of large data series collections. In this Ph.D. work, we present the first data series indexing solutions, for both on-disk and in-memory data, that are designed to inherently take advantage of multi-core architectures, in order to accelerate similarity search processing times. Our experiments on a variety of synthetic and real data demonstrate that our approaches are up to orders of magnitude faster than the alternatives. More specifically, our on-disk solution can answer exact similarity search queries on 100GB datasets in a few seconds, and our in-memory solution in a few milliseconds, which enables real-time, interactive data exploration on very large data series collections.
Index Terms—Data series, Indexing, Modern hardware

I. INTRODUCTION
An increasing number of applications across many diverse domains continuously produce very large amounts of data series (such as in finance, environmental sciences, astrophysics, neuroscience, engineering, and others [1]–[3]), which makes them one of the most common types of data. (A data series, or data sequence, is an ordered sequence of real-valued points. If the ordering dimension is time, then we talk about time series, though series can be ordered over other measures, e.g., angle in astronomical radial profiles, mass in mass spectroscopy, position in genome sequences, etc.) When these sequence collections are generated (oftentimes composed of a large number of short series [3], [4]), users need to query and analyze them (e.g., detect anomalies [5], [6]). This process is heavily dependent on data series similarity search, which, apart from being a useful query in itself, also lies at the core of several machine learning methods, such as clustering, classification, motif and outlier detection, etc. [7].

The brute-force approach for evaluating similarity search queries is by performing a sequential pass over the dataset. However, as data series collections grow larger, scanning the complete dataset becomes a performance bottleneck, taking hours or more to complete [8]. This is especially problematic in exploratory search scenarios, where every next query depends on the results of previous queries. Consequently, we have witnessed an increased interest in developing indexing techniques and algorithms for similarity search [3], [7]–[22]. Nevertheless, the continued increase in the rate and volume of data series production, with collections that grow to several terabytes [1], renders existing data series indexing technologies inadequate. For example, the current state-of-the-art index, ADS+ [8], requires more than 4 minutes to answer any single exact query on a moderately sized 250GB sequence collection.
Moreover, index construction time also becomes a significant bottleneck in the analysis process [7]. Thus, traditional solutions and systems are inefficient at, or incapable of, managing and processing the existing voluminous sequence collections.

[Contributions]
In our work, we focus on the use of multi-core and multi-socket architectures, as well as Single Instruction Multiple Data (SIMD) computations, to accelerate data series similarity search. Our contributions are organized as follows.
1. ParIS [23] is the first data series index designed for multi-core architectures. We describe parallel algorithms for index creation and exact query answering, which employ parallelism in reading the data from disk and processing them in the CPU.
2. ParIS+ [24] is an improvement of ParIS that achieves perfect overlap between the disk I/O and the CPU costs (completely masking out the CPU cost) when creating the index.
3. MESSI [25] is the first parallel in-memory data series index. Contrary to ParIS/ParIS+, MESSI employs a tree-based query answering strategy that minimizes the number of distance calculations, leading to highly efficient search.
II. PRELIMINARIES
[Data Series] A data series, S = {p_1, ..., p_n}, is a sequence of points (Figure 1(a)), where each point p_i = (v_i, t_i), 1 ≤ i ≤ n, is a real value v_i at a position t_i that corresponds to the order of this value in the sequence. We call n the size, or length, of the series. (For streaming series, we create and index subsequences of length n using a sliding window.)

[Similarity Search] Nearest Neighbor (NN) queries are defined as follows: given a query series S_q of length n, and a data series collection S of sequences of the same length, n, we want to identify the series S_c ∈ S that has the smallest distance to S_q among all the series in the collection S. Common distance measures for comparing data series are Euclidean Distance (ED) [26] and Dynamic Time Warping (DTW) [10].

[iSAX Representation] The iSAX representation first summarizes the points in the data series using segments of equal length, where the value of each segment is the mean of the corresponding points (Piecewise Aggregate Approximation (PAA)), and then divides the (y-axis) space into different regions, assigning a bit-wise symbol to each region, and representing each segment of the series with the symbol of the region the segment falls into. This forms a word of symbols (where subscripts denote the number of bits used to represent the symbol of each segment), which is called the indexable Symbolic Aggregate approXimation (iSAX) [9].
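To make the summarization concrete, below is a minimal Python sketch of PAA and iSAX; the function names and the hard-coded breakpoints are ours for illustration (actual iSAX implementations derive the breakpoints from the quantiles of the standard normal distribution N(0,1), applied to z-normalized series [9]).

```python
import numpy as np

def paa(series: np.ndarray, w: int) -> np.ndarray:
    """Piecewise Aggregate Approximation: the mean of each of the
    w equal-length segments of the series."""
    return series.reshape(w, -1).mean(axis=1)

def isax_word(series: np.ndarray, w: int, breakpoints: np.ndarray) -> np.ndarray:
    """Map each PAA segment to the bit-wise symbol of the (y-axis)
    region it falls into; len(breakpoints) + 1 regions in total."""
    return np.searchsorted(breakpoints, paa(series, w))

# 2-bit symbols (4 regions); illustrative breakpoints standing in for
# the N(0,1) quantiles used on z-normalized series.
bps = np.array([-0.67, 0.0, 0.67])
word = isax_word(np.random.randn(256), w=16, breakpoints=bps)  # 16 symbols in [0, 3]
```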
Fig. 1. The iSAX representation, and the ADS+ index structure: (a) a raw data series; (b) its PAA representation; (c) its iSAX representation; (d) the ADS+ index.

[ADS+ Index] Based on this representation, the ADS+ index was developed [8]. It makes use of variable cardinalities (i.e., variable degrees of precision for the symbol of each segment), in order to build a hierarchical tree index, consisting of three types of nodes: (i) the root node points to several children nodes, 2^w in the worst case (when the series in the collection cover all possible iSAX representations), where w is the number of segments; (ii) each inner node contains the iSAX representation of all the series below it, and has two children; and (iii) each leaf node contains the iSAX representations of, and pointers to the raw data for, all series inside it. When the number of series in a leaf node becomes greater than the maximum leaf capacity, the leaf splits: it becomes an inner node and creates two new leaves, by increasing the cardinality of the iSAX representation of one of the segments (the one that will result in the most balanced split of the contents of the node to its two new children [8], [12]); a sketch of this split logic appears below. The two refined iSAX representations (new bits set to 0 and 1) are assigned to the two new leaves. In our example, the series of Figure 1(a) would be placed in the outlined node of the index.

The ParIS/ParIS+ and MESSI indices use the iSAX representation and the basic ADS+ index structure [27], but implement algorithms specifically designed for multi-core architectures.
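For illustration, the following is a minimal sketch of the split logic, under our own simplifying assumptions (each series is represented here only by its PAA vector, and a fixed breakpoint table stands in for the N(0,1) quantiles); it is not the ADS+ code.

```python
import numpy as np

# Illustrative breakpoints per cardinality: 2**bits regions each.
BREAKPOINTS = {1: np.array([0.0]), 2: np.array([-0.67, 0.0, 0.67])}

def symbol(value: float, bits: int) -> int:
    """Symbol of a PAA value at cardinality 2**bits."""
    return int(np.searchsorted(BREAKPOINTS[bits], value))

def split_leaf(leaf_paa, segment, bits):
    """Split a full leaf by adding one bit of precision to `segment`.

    `leaf_paa` is the list of PAA vectors of the series in the leaf.
    Region s at cardinality 2**bits splits into regions 2s and 2s+1 at
    cardinality 2**(bits+1), so the new (low-order) bit of the refined
    symbol routes each series to one of the two new leaves.
    """
    left, right = [], []  # new bit = 0 and new bit = 1, respectively
    for paa_vec in leaf_paa:
        refined = symbol(paa_vec[segment], bits + 1)
        (right if refined & 1 else left).append(paa_vec)
    return left, right
```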
III. PROPOSED SOLUTION

[ParIS/ParIS+ Approach] We describe our approach, called Parallel Indexing of Sequences (ParIS), and then present ParIS+, which improves upon ParIS.

Figure 2 provides an overview of the entire pipeline of how the ParIS index is created and then used for query answering. In Stage 1 of the pipeline, a thread, called the Coordinator worker, reads raw data series from the disk and transfers them into the raw data buffer in main memory. In Stage 2, a number of IndexBulkLoading workers process the data series in the raw data buffer to create their iSAX summarizations. Each iSAX summarization determines to which root subtree of the tree index the series belongs. Specifically, this is determined by the first bit of each of the w segments of the iSAX summarization. The summarizations are then stored in one of the index Receiving Buffers (RecBufs) in main memory. There are as many RecBufs as there are root subtrees of the index tree, each one storing the iSAX summarizations that belong to a single subtree. This number is usually a few tens of thousands (and at most 2^w, where w is the number of segments in the iSAX representation of each time series; we set w as in previous studies [8]). The iSAX summarizations are also stored in the array SAX (used during query answering).

When all available main memory is full, Stage 3 starts. In this stage, a pool of IndexConstruction workers processes the contents of the RecBufs; every such worker is assigned a distinct RecBuf at each time: it reads the data stored in it and builds the corresponding index subtree. So, root subtrees are built in parallel. The leaves of each subtree are flushed to the disk at the end of the tree construction process. This results in free space in main memory. These three stages are repeated until all raw data series are read from the disk, the entire index tree is constructed, and the SAX array is completed. The index tree together with SAX form the ParIS index, which is then used in Stage 4 for answering similarity search queries.

ParIS+ improves ParIS by completely masking out the CPU cost when creating the index. This is not true for ParIS, whose index creation (Stages 1-3) is not purely I/O bounded. The reason for this is that, in ParIS, the IndexConstruction workers do not work concurrently with the Coordinator worker. Moreover, the IndexBulkLoading workers do not have enough CPU work to do to fully overlap the time needed by the Coordinator worker to read the raw data file. ParIS+ is an optimized version of ParIS, which achieves a complete overlap of the CPU computation with the I/O cost. In ParIS+, the IndexBulkLoading workers have undertaken the task of building the index tree, in addition to performing the tasks of Stage 2. The IndexConstruction workers now simply materialize the leaves by flushing them to disk.
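The principle of overlapping disk I/O with CPU work can be pictured with the producer/consumer sketch below. This is a simplification under assumed names (data.bin, the batch sizes), not the ParIS/ParIS+ implementation, which uses the dedicated worker roles described above.

```python
import threading
import queue
import numpy as np

SERIES_LEN, BATCH = 256, 10_000
todo: queue.Queue = queue.Queue(maxsize=4)  # bounded: the reader blocks if CPU lags

def coordinator(path: str) -> None:
    """Reads raw series from disk batch by batch and hands them over."""
    raw = np.memmap(path, dtype=np.float32, mode="r").reshape(-1, SERIES_LEN)
    for start in range(0, len(raw), BATCH):
        batch = np.array(raw[start:start + BATCH])  # the copy forces the read here
        todo.put(batch)                             # disk I/O overlaps the CPU work below
    todo.put(None)                                  # poison pill: end of input

def bulk_loader(w: int = 16) -> None:
    """Summarizes batches while the coordinator keeps reading."""
    while (batch := todo.get()) is not None:
        paa = batch.reshape(len(batch), w, -1).mean(axis=2)  # PAA step of the summary
        # ... derive iSAX symbols and append them to the proper RecBufs ...
    todo.put(None)                                  # let the next worker stop too

threads = [threading.Thread(target=coordinator, args=("data.bin",))] + \
          [threading.Thread(target=bulk_loader) for _ in range(4)]
for t in threads:
    t.start()
```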
For query answering, ParIS and ParIS+ are the same, and the algorithm operates as follows. It first computes an approximate answer by calculating the BSF (Best-So-Far), i.e., the real distance between the query and the best candidate series, which is in the leaf with the smallest lower bound distance to the query. Then, a number of lower bound calculation workers compute the lower bound distances between the query and the iSAX summary of each data series in the dataset (these summaries are stored in the SAX array), and prune the series whose lower bound distance is larger than the approximate real distance computed earlier. The data series that are not pruned are stored in a candidate list, which a number of real distance calculation workers then consume in parallel, computing the real distances between the query and the series stored in it (for which the raw values need to be read from disk). For details, see [23].
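A hedged sketch of this filter-and-refine logic follows (sequential for clarity; `lower_bound` and `real_distance` are placeholders for the SIMD kernels of [23], and all names are ours):

```python
import numpy as np

def exact_search(query, query_sax, SAX, bsf, lower_bound, real_distance):
    """Prune with lower bounds, then refine the survivors.

    SAX holds the iSAX summaries of the whole collection; `bsf` is the
    approximate answer obtained from the best index leaf.
    """
    lbs = lower_bound(query_sax, SAX)       # one lower bound per series
    candidates = np.nonzero(lbs <= bsf)[0]  # the candidate list
    for idx in candidates:                  # consumed in parallel in ParIS
        if lbs[idx] > bsf:                  # bsf may have improved meanwhile
            continue
        bsf = min(bsf, real_distance(query, idx))  # raw values read from disk
    return bsf
```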
[MESSI Approach] In Figure 3, we present the pipeline of the in-MEmory data SerieS Index (MESSI) [25]. In contrast to ParIS/ParIS+, the raw data are stored in an in-memory array, called RawData. In Stage 1, this array is split into a predetermined number of chunks. A (different) number of index bulk loading worker threads process the chunks to calculate the iSAX summaries of the raw data series they store. Chunks are assigned to index workers one after the other (using Fetch&Inc). Based on the iSAX representation, we can figure out in which subtree of the index tree an iSAX summary will be stored.
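The chunk scheduling and the per-worker buffer layout (discussed in the following paragraphs) can be sketched as follows; the shared counter plays the role of the atomic Fetch&Inc, all names are ours, and w = 16 is assumed only for the example.

```python
import itertools

NUM_WORKERS, W = 8, 16               # w = 16 assumed for illustration
counter = itertools.count()          # stands in for an atomic Fetch&Inc

# One iSAX buffer per root subtree (at most 2**W of them), each pre-split
# into one part per worker: a worker appends only to its own part, so the
# buffers need no locks (see the discussion of the iSAX buffers below).
buffers = {}  # (subtree, worker_id) -> list of iSAX summaries

def index_worker(worker_id, chunks, isax_of, subtree_of):
    """Grab chunks one after the other and summarize the series in them."""
    while (i := next(counter)) < len(chunks):
        for series in chunks[i]:
            sax = isax_of(series)
            buffers.setdefault((subtree_of(sax), worker_id), []).append(sax)
```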
Fig. 2. Pipeline for creation of and query answering with the ParIS/ParIS+ indices (Stage 1: preprocessing by the Coordinator worker; Stage 2: loading the data into the index by the IndexBulkLoading workers; Stage 3: growing the index and persisting the index leaves to disk by the IndexConstruction workers; Stage 4: similarity search query answering).

Fig. 3. Pipeline for creation of and query answering with the MESSI index (Stage 1: preprocessing of the data; Stage 2: loading the data into the index; Stage 3: similarity search query answering).

A number of iSAX buffers, one for each root subtree of the index tree, contain the iSAX summaries to be stored in that subtree. Each index worker stores the iSAX summaries it computes in the appropriate iSAX buffers. To reduce synchronization cost, each iSAX buffer is split into parts, and each worker works on its own part. (We also tried an alternative technique, in which each buffer was protected by a lock and many threads were accessing each buffer; however, this resulted in worse performance, due to contention in accessing the iSAX buffers.) The number of iSAX buffers is usually a few tens of thousands, and at most 2^w, where w is the number of segments in the iSAX summaries of each data series (w is fixed as in previous studies [8], [23]).

MESSI's index construction phase is different from that of ParIS. ParIS uses a number of buffers to temporarily store pointers to the iSAX summaries of the raw data series before constructing the index [23], [24]. MESSI allocates smaller such buffers per thread, for storing the iSAX summaries themselves. In this way, it eliminates the synchronization cost in accessing the iSAX buffers. To achieve load balancing, MESSI splits the array storing the raw data series into small blocks, and assigns blocks to threads in a round-robin fashion.

When the iSAX summaries for all raw data series have been computed, we move to Stage 2, and the index workers proceed with the construction of the tree index. Each worker is assigned an iSAX buffer to work on (using Fetch&Inc). Each worker reads the data stored in (all parts of) its assigned buffer and builds the corresponding index subtree. Therefore, all index workers process distinct subtrees of the index, and can work in parallel and independently from one another, with no need for synchronization. (Parallelizing the processing inside each one of the index root subtrees would require a lot of synchronization, due to node splitting.) When an index worker finishes with the iSAX buffer it currently works on, it continues with the next iSAX buffer that has not yet been processed.

When the series in all iSAX buffers have been processed, the tree index has been built and can be used to answer similarity search queries, as shown in Stage 3.
To answer a query, we first perform a search for the query iSAX summary in the tree index. This returns a leaf whose iSAX summary has the closest distance to the iSAX summary of the query. We calculate the real distances of the (raw) data series pointed to by the elements of this leaf to the query series, and store the minimum distance in a shared BSF (Best-So-Far) variable. Then, the index workers traverse the index subtrees (one after the other), using the BSF to decide which subtrees will be pruned. The leaves of the subtrees that cannot be pruned are placed (along with the lower bound distance between the raw values of the query series and the iSAX summary of the leaf node) into a number of minimum priority queues. Each thread inserts elements in the priority queues in a round-robin fashion, so that load balancing is achieved (all queues receive about the same number of elements).

As soon as the necessary elements have been placed in the priority queues, each index worker chooses a priority queue to work on, and repeatedly pops leaf nodes, on which it performs the following operations. It first checks whether the lower bound distance stored in the priority queue is larger than the current BSF: if it is, then we are certain that the leaf node does not contain possible answers, and we can prune it; otherwise, the worker needs to examine the series in this leaf node, by first computing lower bound distances using the iSAX summaries, and, if needed, also the real distances using the raw values. During this process, we may find a series with a smaller distance to the query, in which case we update the BSF. When a worker reaches a node whose distance is bigger than the BSF, it gives up on this priority queue and starts working on another, because it is certain that all the other elements in the abandoned queue have an even larger distance to the query series. This process is repeated until all priority queues have been processed. At the end of the calculation, the value of the BSF is returned as the query answer.

Note that, similarly to ParIS/ParIS+, MESSI uses SIMD (Single Instruction Multiple Data) for calculating the distances of the index iSAX summaries from the query iSAX summary (lower bound distance calculations), and of the raw data series from the query data series (real distance calculations) [23].
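The per-worker search loop can be summarized with the sketch below (heapq stands in for the concurrent priority queues, `bsf` is a one-slot list emulating the shared Best-So-Far variable, and the distance kernels are assumed placeholders):

```python
import heapq

def query_worker(my_queues, query, bsf, lower_bound, real_distance):
    """Pop leaves in lower-bound order; abandon a queue at the first miss."""
    for pq in my_queues:                      # move to the next queue when done
        while pq:
            # queue entries: (lower bound, tie-breaker, leaf)
            lb, _tie, leaf = heapq.heappop(pq)
            if lb > bsf[0]:                   # everything left in pq is worse:
                break                         # abandon this queue
            for series in leaf:
                # filter: cheap lower bound from the iSAX summary
                if lower_bound(query, series) > bsf[0]:
                    continue
                # refine: real distance on the raw values (SIMD in MESSI)
                bsf[0] = min(bsf[0], real_distance(query, series))
    return bsf[0]
```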
IV. EXPERIMENTAL EVALUATION
We summarize the performance results for index creation and query answering using the ParIS/ParIS+ and MESSI indices, for both on-disk and in-memory data. We compare our methods to the state-of-the-art index, ADS+ [8], and to the serial scan method, UCR Suite [10]. We use two sockets, and split the number of cores equally between them. The datasets we use are Synthetic (random walk: 100M series of 256 points), SALD (electroencephalography: 200M series of 128 points), and Seismic (seismic activity: 100M series of 256 points). [Index Creation Performance]
In Figure 4, we evaluate the time it takes to create the tree index for a Synthetic dataset of 100M series, as we vary the number of cores. The results demonstrate that ParIS+ completely removes the (visible) CPU cost when we use more than 6 cores. Figure 5 shows that the index creation time of MESSI reduces linearly as the number of cores increases (dataset of 100GB).

We also evaluate the time it takes to create the index for the different datasets, each of size 100GB. The results, depicted in Figure 6, show that ParIS+ is 2.6x faster than ADS+ for Synthetic, 3.2x faster for SALD, and 2.3x faster for Seismic. Figure 7 focuses on in-memory index creation. We observe that, for the 100GB Synthetic dataset, MESSI performs 3.6x faster than an in-memory implementation of ParIS. Note that ParIS is faster than ParIS+ for in-memory index creation (remember that, in query answering, they both use the same algorithm and perform equally). The reason is that ParIS+ repeatedly accesses the nodes that are children of the root, in order to grow the corresponding subtrees (for on-disk data, this helps to overlap the CPU cost with the disk I/O cost). However, when in memory, there is no expensive disk I/O cost to overlap these computations with; thus, ParIS+ ends up performing unnecessary calculations, as it traverses the subtrees over and over again. In contrast, ParIS only accesses the children of the root once each time the main memory gets full (refer to Stage 2 of Figure 2). Regarding the real datasets, MESSI is 3.6x faster than ParIS on SALD, and 3.7x faster than ParIS on Seismic (both datasets are 100GB in size). [Query Answering Performance]
Figure 8 (log-scale y-axis) shows the exact query answering time for ParIS+ and ADS+, as we vary the number of cores, for HDD and SSD storage. In both cases, performance improves as we increase the number of cores, with the SSD being more than an order of magnitude faster. Figure 9 (log-scale y-axis) compares the performance of the MESSI query answering algorithm to its competitors, as the number of cores increases, for a Synthetic dataset of 100GB. The results show that MESSI significantly outperforms ParIS and (an in-memory, parallel implementation of) UCR Suite.

In Figure 10 (log-scale y-axis), we show that, on HDD and across the datasets used in our study, ParIS+ is up to one order of magnitude faster than ADS+ in query answering, and more than two orders of magnitude faster than UCR Suite. When the data are stored on an SSD (refer to Figure 11; log-scale y-axis), both ADS+ and ParIS+ benefit from the low SSD random access latency. In this case, ParIS+ is 15x faster than ADS+, and 2000x faster than UCR Suite.

The results of in-memory query answering, depicted in Figure 12 (log-scale y-axis), show that MESSI performs considerably better than the other approaches. MESSI is 55x faster than UCR Suite and 6.4x faster than (the in-memory implementation of) ParIS. The performance improvement with regard to ParIS stems from the fact that, in contrast to ParIS, MESSI applies pruning when performing the lower bound distance calculations, and therefore needs fewer computations overall to execute this phase. Moreover, the use of the priority queues results in even higher pruning power. As a side effect, MESSI also performs fewer real distance calculations than ParIS.

Figure 12 also shows that MESSI exhibits the best performance for the real datasets, SALD and Seismic (both 100GB in size), as well. The reasons for this are those explained in the previous paragraphs. For the SALD dataset, MESSI query answering is 60x faster than UCR Suite and 8.4x faster than ParIS, whereas for the Seismic dataset, MESSI is 80x faster than UCR Suite, and almost 11x faster than ParIS. Note that MESSI's improvement over UCR Suite is even more pronounced on the real datasets. This is because working on random data results in better pruning than working on real data.

V. CONCLUSIONS AND CURRENT WORK
In this thesis, we describe the first data series indices that exploit the parallelism opportunities of multi-core and multi-socket architectures, for both on-disk and in-memory data. The evaluation with synthetic and real datasets demonstrates the efficiency of our solutions, which are orders of magnitude faster than the state-of-the-art competitors. This level of performance enables, for the first time, interactive data exploration on very large data series collections.

As part of our current work, we are extending our techniques (i.e., ParIS+ and MESSI) to support the DTW distance measure. In order to do this, no changes are required in the index structure: we can index a dataset once, and then use this index to answer both Euclidean and DTW similarity search queries.
Fig. 4. ParIS/ParIS+ index creation: time (in seconds) per algorithm (ADS+, ParIS, ParIS+), broken down into read, write, and CPU costs.
Fig. 5. MESSI index creation: time (in seconds) vs. number of cores, broken down into tree index construction and iSAX representation calculation.
Fig. 6. ParIS/ParIS+ index creation: time (in seconds) per dataset (Synthetic, SALD, Seismic) for ADS+, ParIS, and ParIS+.
Fig. 7. MESSI index creation: time (in seconds) per dataset for ParIS and MESSI.
Fig. 8. ParIS+ query answering on HDD and SSD: time (in seconds) vs. number of cores.
Fig. 9. MESSI query answering (in-memory): time (in milliseconds) vs. number of cores for UCR Suite-p, ParIS, and MESSI.
Fig. 10. ParIS+ query answering on HDD: time (in seconds) per dataset for UCR Suite, ADS+, and ParIS+.
Fig. 11. ParIS+ query answering on SSD: time (in seconds) per dataset for UCR Suite, ADS+, and ParIS+.
Fig. 12. MESSI query answering: time (in milliseconds) per dataset for UCR Suite-p, ParIS, and MESSI.
Moreover, we are working on a GPU-based solution, where the CPU and GPU collaborate to answer a query: the CPU handles the index tree traversals and the real distance calculations (the raw data do not fit in the GPU memory), while the GPU performs the lower bound distance calculations. We are also integrating our techniques with a distributed approach [20], [22], which is complementary to the ParIS+ and MESSI solutions. Finally, we note that our techniques are applicable to high-dimensional vectors in general (not just sequences) [21]. Therefore, we will study applications of our techniques in problems related to deep learning embeddings (which are high-dimensional vectors), such as similarity search for images [28]. [Acks]
Work supported by Investir l'Avenir, Univ. of Paris IDEX Emergence en Recherche ANR-18-IDEX-000, the Chinese Scholarship Council, FMJH PGMO, EDF, Thales, and HiPEAC 4. Part of this work was performed while P. Fatourou was visiting LIPADE, and while B. Peng was visiting CARV, FORTH ICS.
REFERENCES
[1] T. Palpanas, "Data series management: The road to big sequence analytics," SIGMOD Record, 2015.
[2] K. Zoumpatianos and T. Palpanas, "Data series management: Fulfilling the need for big sequence analytics," in ICDE, 2018.
[3] T. Palpanas and V. Beckmann, "Report on the First and Second Interdisciplinary Time Series Analysis Workshop (ITISA)," SIGMOD Rec., vol. 48, no. 3, 2019.
[4] A. J. Bagnall, R. L. Cole, T. Palpanas, and K. Zoumpatianos, "Data series management (Dagstuhl seminar 19282)," Dagstuhl Reports, vol. 9, no. 7, 2019.
[5] P. Boniol, M. Linardi, F. Roncallo, and T. Palpanas, "Automated Anomaly Detection in Large Sequences," in ICDE, 2020.
[6] P. Boniol and T. Palpanas, "Series2Graph: Graph-based Subsequence Anomaly Detection for Time Series," PVLDB, 2020.
[7] K. Echihabi, K. Zoumpatianos, T. Palpanas, and H. Benbrahim, "The Lernaean hydra of data series similarity search: An experimental evaluation of the state of the art," PVLDB, 2019.
[8] K. Zoumpatianos, S. Idreos, and T. Palpanas, "ADS: the adaptive data series index," VLDB J., 2016.
[9] J. Shieh and E. Keogh, "iSAX: indexing and mining terabyte sized time series," in SIGKDD, 2008.
[10] T. Rakthanmanon, B. J. L. Campana, A. Mueen, G. E. A. P. A. Batista, M. B. Westover, Q. Zhu, J. Zakaria, and E. J. Keogh, "Searching and mining trillions of time series subsequences under dynamic time warping," in SIGKDD, 2012.
[11] Y. Wang, P. Wang, J. Pei, W. Wang, and S. Huang, "A data-adaptive and dynamic segmentation index for whole matching on time series," VLDB, 2013.
[12] A. Camerra, J. Shieh, T. Palpanas, T. Rakthanmanon, and E. Keogh, "Beyond One Billion Time Series: Indexing and Mining Very Large Time Series Collections with iSAX2+," KAIS, vol. 39, no. 1, 2014.
[13] M. Dallachiesa, T. Palpanas, and I. F. Ilyas, "Top-k nearest neighbor search in uncertain data series," VLDB, 2014.
[14] D. E. Yagoubi, R. Akbarinia, F. Masseglia, and T. Palpanas, "DPiSAX: Massively distributed partitioned iSAX," in ICDM, 2017.
[15] H. Kondylakis, N. Dayan, K. Zoumpatianos, and T. Palpanas, "Coconut: A scalable bottom-up approach for building data series indexes," PVLDB, 2018.
[16] M. Linardi and T. Palpanas, "ULISSE: Ultra compact index for variable-length similarity search in data series," in ICDE, 2018.
[17] ——, "Scalable, variable-length similarity search in data series: The ULISSE approach," PVLDB, 2019.
[18] H. Kondylakis, N. Dayan, K. Zoumpatianos, and T. Palpanas, "Coconut palm: Static and streaming data series exploration now in your palm," in SIGMOD, 2019.
[19] ——, "Coconut: sortable summarizations for scalable indexes over static and streaming data series," VLDB J., vol. 28, no. 6, 2019.
[20] D.-E. Yagoubi, R. Akbarinia, F. Masseglia, and T. Palpanas, "Massively distributed time series indexing and querying," TKDE, 2019.
[21] K. Echihabi, K. Zoumpatianos, T. Palpanas, and H. Benbrahim, "Return of the Lernaean Hydra: Experimental Evaluation of Data Series Approximate Similarity Search," PVLDB, 2019.
[22] O. Levchenko, B. Kolev, D.-E. Yagoubi, D. Shasha, T. Palpanas, P. Valduriez, R. Akbarinia, and F. Masseglia, "Distributed Algorithms to Find Similar Time Series," in ECML/PKDD, 2019.
[23] B. Peng, T. Palpanas, and P. Fatourou, "ParIS: The next destination for fast data series indexing and query answering," in IEEE BigData, 2018.
[24] ——, "ParIS+: Data series indexing on multi-core architectures," TKDE, 2020.
[25] ——, "MESSI: In-memory data series indexing," in ICDE, 2020.
[26] R. Agrawal, C. Faloutsos, and A. N. Swami, "Efficient similarity search in sequence databases," in FODO, 1993.
[27] T. Palpanas, "Evolution of a Data Series Index," Communications in Computer and Information Science (CCIS), 2020.
[28] J. Johnson, M. Douze, and H. Jégou, "Billion-scale similarity search with GPUs," arXiv preprint arXiv:1702.08734.