Konstantinos F. Xylogiannopoulos


Publication

Featured research published by Konstantinos F. Xylogiannopoulos.

Applied Intelligence | 2014

Analyzing very large time series using suffix arrays

Konstantinos F. Xylogiannopoulos; Panagiotis Karampelas; Reda Alhajj

Suffix arrays form a powerful data structure for pattern detection and matching. In a previous work, we presented a novel algorithm (COV), which is the only algorithm that allows the detection of all repeated patterns in a time series by using the actual suffix array. However, the requirements for storing the actual suffix strings, even on external media, make the use of suffix arrays impossible for very large time series. We have already proved that using the concept of Longest Expected Repeated Pattern (LERP) allows the actual suffixes to be stored in linear capacity O(n) on external media. The repeated pattern detection using LERP has analogous time complexity, and thus makes the analysis of large time series feasible and limited only by the size of the external media and not by memory. Yet, there are cases when hardware limitations might be an obstacle for the analysis of very large time series of size comparable to hard disk capacity. With the Moving LERP (MLERP) method introduced in this paper, it is possible to analyze very large time series (tens or hundreds of thousands of times larger than what LERP can analyze) by maximal utilization of the available hardware. Further, when empirical knowledge related to the distribution of repeated patterns' lengths is available, the proposed method (MLERP) can achieve better time performance than the standard LERP method, and much better than any other pattern matching algorithm or brute force techniques, which are infeasible in a reasonable (human) time frame. Thus, we may argue that MLERP is a very useful tool for detecting all repeated patterns in a time series regardless of its size and hardware limitations.
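To make the storage argument concrete, the following is a minimal sketch of a suffix array whose suffixes are truncated to a chosen LERP value. It is a naive illustration only and is not the authors' COV or MLERP procedure; the function name `repeated_patterns` and the parameter `lerp` are assumptions introduced here.

```python
from collections import defaultdict


def repeated_patterns(text, lerp):
    """Return every substring of length <= lerp that occurs at least twice,
    mapped to its sorted start positions (naive sketch, not COV or MLERP)."""
    # Truncated suffix array: each suffix is cut to at most `lerp` characters,
    # so at most n * lerp characters are stored instead of n * (n + 1) / 2.
    suffixes = sorted((text[i:i + lerp], i) for i in range(len(text)))

    occurrences = defaultdict(list)
    for suffix, start in suffixes:
        # Every prefix of a truncated suffix is a candidate pattern.
        for length in range(1, len(suffix) + 1):
            occurrences[suffix[:length]].append(start)

    # Keep only the patterns that appear more than once.
    return {p: sorted(pos) for p, pos in occurrences.items() if len(pos) > 1}


if __name__ == "__main__":
    for pattern, positions in sorted(repeated_patterns("abcabcab", 3).items()):
        print(pattern, positions)
```

Patterns longer than `lerp` are never reported, which is why the choice of the LERP value matters: it bounds the storage but also bounds the longest pattern that can be found.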

Applied Intelligence | 2016

Repeated patterns detection in big data using classification and parallelism on LERP Reduced Suffix Arrays

Konstantinos F. Xylogiannopoulos; Panagiotis Karampelas; Reda Alhajj

The suffix array is a powerful data structure, used mainly for pattern detection in strings. The main disadvantage of a full suffix array is its quadratic O(n^2) space capacity when the actual suffixes are needed. In our previous work [39], we introduced the innovative All Repeated Patterns Detection (ARPaD) algorithm and the Moving Longest Expected Repeated Pattern (MLERP) process. The former detects all repeated patterns in a string using a partition of the full suffix array and the latter is capable of analyzing large strings regardless of their size. Furthermore, the notion of Longest Expected Repeated Pattern (LERP), also introduced by the authors in a previous work, significantly reduces the space capacity needed for the full suffix array to linear O(n). However, so far the LERP value has had to be specified in an ad hoc manner based on experimental or empirical values. In order to overcome this problem, the Probabilistic Existence of LERP theorem has been proven in this paper and, furthermore, a formula for an accurate upper bound estimation of the LERP value has been introduced using only the length of the string and the size of the alphabet used in constructing the string. The importance of this method is the optimum upper bounding of the LERP value without any preprocessing or prior knowledge of the string's characteristics. Moreover, the new data structure LERP Reduced Suffix Array is defined; it is a variation of the suffix array, and has the advantage of permitting classification and parallelism to be implemented directly on the data structure. All other alternative methodologies deal with the very common problem of fitting any kind of data structure in a computer's memory or disk in order to apply different time-efficient methods for pattern detection. The proposed methodology allows us to alter the above-mentioned problem such that smaller classes of the problem can be distributed on different systems, and then to apply current, state-of-the-art techniques such as parallelism and cloud computing using advanced DBMSs which are capable of handling the storage and analysis of big data. The implementation of the above-described methodology can be achieved by invoking our innovative ARPaD algorithm. Extensive experiments have been conducted on small, comparable strings of the Champernowne constant and DNA as well as on extremely large strings of π with length up to 68 billion digits. Furthermore, the novelty and superiority of our methodology have also been tested on real-life applications such as a Distributed Denial of Service (DDoS) attack early warning system.
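The idea of classification and parallelism can be sketched as follows, under loud assumptions: truncated suffixes are grouped into classes by their first character, and each class is searched independently. The per-class search below is a naive stand-in rather than the ARPaD algorithm, a thread pool stands in for separate machines, and all function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_classes(text, lerp):
    """Group LERP-truncated suffixes into classes keyed by first character."""
    classes = defaultdict(list)
    for i in range(len(text)):
        classes[text[i]].append((text[i:i + lerp], i))
    return classes


def patterns_in_class(entries):
    """Find repeated patterns inside one class (naive stand-in for ARPaD)."""
    occurrences = defaultdict(list)
    for suffix, start in sorted(entries):
        for length in range(1, len(suffix) + 1):
            occurrences[suffix[:length]].append(start)
    return {p: sorted(pos) for p, pos in occurrences.items() if len(pos) > 1}


def repeated_patterns_parallel(text, lerp, workers=4):
    """Search every class independently and merge the per-class results."""
    classes = build_classes(text, lerp)
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # All occurrences of a pattern share its first character, so the
        # classes never report the same pattern and merging is a plain update.
        for result in pool.map(patterns_in_class, classes.values()):
            merged.update(result)
    return merged
```

Because every occurrence of a pattern falls in exactly one class, the classes can be stored and processed on different systems without any coordination beyond the final merge.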

Experimental Mathematics | 2014

Experimental Analysis on the Normality of π, e, φ, and √2 Using Advanced Data-Mining Techniques

Konstantinos F. Xylogiannopoulos; Panagiotis Karampelas; Reda Alhajj

The main focus of the work described in this paper is to examine whether the famous mathematical constants π, e, φ, and √2 are normal numbers. We have conducted extensive experiments with different attributes for each constant using advanced data-mining techniques, and we have tried to express a theoretical model that can help to determine with high probability whether the numbers are normal in base ten. We have expanded and generalized the experimental results so as to formulate conjectures about the attributes of a normal number, and we have presented conjectures that can lead to determining whether a number is normal. The experimental results and analysis have shown that, indeed, π, e, φ, and √2 satisfy the definition of a normal number. Not only does each of the base-10 digits occur with frequency approximately one-tenth, as is already known for very large sequences, but we have also shown for the first time that all arrangements with repetition of digits up to a specific length, depending on the number of decimals examined, also occur with the expected frequencies. As a result, we establish a new process to examine not only whether these constants are simply normal but whether they are normal as well.
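The frequency check described in this abstract can be illustrated with a small sketch that counts overlapping k-digit blocks in the decimal expansion of √2. This is not the authors' data-mining procedure; the helper names `sqrt2_digits` and `block_frequencies` are assumptions, and the digit counts used here are far smaller than those in the paper.

```python
from collections import Counter
from math import isqrt


def sqrt2_digits(n):
    """First n decimal digits of sqrt(2) after the decimal point, as a string."""
    # isqrt(2 * 10**(2n)) equals floor(sqrt(2) * 10**n); drop the leading "1".
    return str(isqrt(2 * 10 ** (2 * n)))[1:]


def block_frequencies(digits, k):
    """Relative frequency of every overlapping k-digit block in `digits`."""
    total = len(digits) - k + 1
    counts = Counter(digits[i:i + k] for i in range(total))
    return {block: count / total for block, count in counts.items()}


if __name__ == "__main__":
    digits = sqrt2_digits(10_000)
    for k in (1, 2):
        freqs = block_frequencies(digits, k)
        expected = 10 ** -k  # frequency of each block in a normal number
        worst = max(abs(f - expected) for f in freqs.values())
        print(f"k={k}: {len(freqs)} blocks seen, max deviation {worst:.5f}")
```

For a number that is normal in base ten, every block of length k should approach frequency 10^-k as more digits are examined; the case k = 1 alone corresponds to simple normality.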

IEEE International Conference on Intelligent Systems | 2012

Periodicity data mining in time series using Suffix Arrays

Konstantinos F. Xylogiannopoulos; Panagiotis Karampelas; Reda Alhajj

This research paper focuses on data mining in time series and its applications to financial data. Data mining attempts to analyze time series and extract valuable information about pattern periodicity, which might be concealed by substantial amounts of unformatted, random information. Such information, however, is of great importance as it can be used to forecast future behavior. In this paper, a new methodology is introduced that aims to utilize suffix arrays in data mining instead of the commonly used suffix tree data structure. Although suffix arrays normally require high storage capacity, the proposed algorithm allows them to be constructed in linear time. The methodology is also extended to detect repeated patterns in time series with time complexity of O(n log n). This, combined with the capability of external storage, creates a critical advantage for overall efficient data mining and analysis regarding the construction of the time series data structure and periodicity detection. The test results presented demonstrate the applicability and effectiveness of the proposed technique.

Workshop in Information Security Theory and Practice | 2014

Early DDoS Detection Based on Data Mining Techniques

Konstantinos F. Xylogiannopoulos; Panagiotis Karampelas; Reda Alhajj

In the past few years, the internet has experienced rapid growth in users and services. This has led to an increase in different types of cybercrime. One of the most important is the Distributed Denial of Service (DDoS) attack, which can be unleashed through many different isolated hosts and can force a system to shut down due to resource exhaustion. The importance of the problem is evident from the huge number of references in the literature that try to detect and prevent such attacks. In the current paper, a novel method based on a data mining technique is introduced in order to give the network administrator early warning of a potential DDoS attack. The method uses the advanced All Repeated Patterns Detection (ARPaD) algorithm, which allows the detection of all repeated patterns in a sequence. The proposed method can give very fast results regarding all IP prefixes in a sequence of hits and, therefore, warn the network administrator if a potential DDoS attack is under development. Several experiments have demonstrated the importance of the method for the detection of a DDoS attack, since it can detect a potential attack at its beginning, before it affects the system.
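The notion of examining all IP prefixes in a sequence of hits can be sketched with a simple counter over octet-level prefixes. This is an assumption-laden illustration and not the ARPaD-based method of the paper: the function names, the octet granularity, and the share-of-traffic threshold are all hypothetical choices made for the example.

```python
from collections import Counter


def prefix_counts(ips):
    """Count every octet-level prefix (1 to 4 octets) over a sequence of hits."""
    counts = Counter()
    for ip in ips:
        octets = ip.split(".")
        for depth in range(1, 5):
            counts[".".join(octets[:depth])] += 1
    return counts


def suspicious_prefixes(ips, threshold):
    """Prefixes whose share of all hits is at least `threshold` (0 to 1)."""
    total = len(ips)
    if total == 0:
        return {}
    counts = prefix_counts(ips)
    return {p: c for p, c in counts.items() if c / total >= threshold}


if __name__ == "__main__":
    hits = ["10.0.0.1", "10.0.0.2", "10.0.1.9", "192.168.1.1"]
    print(suspicious_prefixes(hits, 0.5))
```

A prefix that suddenly accounts for a large share of recent hits is the kind of repeated pattern an early warning system would surface to the administrator.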

International Conference on Tools with Artificial Intelligence | 2012

Minimization of Suffix Array's Storage Capacity for Periodicity Detection in Time Series

Konstantinos F. Xylogiannopoulos; Panagiotis Karampelas; Reda Alhajj

In everyday life, bulk amounts of time-stamped data are accumulated in diverse databases. Such data may be mapped into a time-based representation forming very long time series, which could be effectively analyzed for valuable knowledge discovery. However, analyzing these time series has often proven a very complicated task, especially when they are very large. This paper tackles the problem by proposing an optimization method for storing very large time series in suffix arrays for further analysis; a method for repeated pattern detection is proposed as well. Based on this method, the required part of the time series to be stored for repeated pattern detection can be reduced by at least 25%. The method was applied to DNA chains with length up to 100,000,000 characters and the corresponding results are presented.

Archive | 2012

Pattern Detection and Analysis in Financial Time Series Using Suffix Arrays

Konstantinos F. Xylogiannopoulos; Panagiotis Karampelas; Reda Alhajj

The current chapter focuses on data-mining techniques for exploring time series of financial data, and more specifically fluctuations in foreign exchange currency rates. The data-mining techniques used attempt to analyze time series and extract, if possible, valuable information about pattern periodicity that might be hidden behind huge amounts of unformatted and vague information. Such information is of great importance because it might be used to interpret correlations among different events regarding markets or even to forecast future behavior. In the present chapter, a new methodology is introduced to take advantage of suffix arrays in data mining instead of the commonly used suffix tree data structure. Although suffix arrays require high storage capacity, in the proposed algorithm they can be constructed in linear time O(n) or O(n log n) using an external database management system, which allows better and faster results during the analysis process. The proposed methodology is also extended to detect repeated patterns in time series with time complexity of O(n log n). This, along with the capability of external storage, creates a critical advantage for overall efficient data-mining analysis regarding the construction of the time series data structure and periodicity detection.

International Journal of Cyber Warfare and Terrorism (IJCWT) | 2017

Advanced Network Data Analytics for Large-Scale DDoS Attack Detection

Konstantinos F. Xylogiannopoulos; Panagiotis Karampelas; Reda Alhajj

Internet-enabled devices, or the Internet of Things as it has come to be known, are increasing exponentially every day. The lack of security standards in the manufacturing of these devices, along with the haste of the manufacturers to increase their market share in this area, has created a very large network of vulnerable devices that can be easily recruited as bot members and used to initiate very large volumetric Distributed Denial of Service (DDoS) attacks. The significance of the problem can be easily acknowledged due to the large number of recently revealed cases of attacks on institutions, enterprises and even countries. In the current paper a novel method is introduced, which is based on a data mining technique that can analyze incoming IP traffic details and give the network administrator early warning of a potentially developing DDoS attack. The method can scale depending on the availability of the infrastructure, from a conventional laptop computer to a complex cloud infrastructure. Depending on the hardware configuration, as shown by the experiments, the method can easily monitor and detect abnormal network traffic of several Gbps in real time using minimal hardware equipment.

Keywords: All Repeated Patterns Detection, ARPaD, Data Mining, DDoS, Distributed Denial of Service, LERP-RSA, Suffix Array

Advances in Social Networks Analysis and Mining | 2016

Frequent and non-frequent pattern detection in big data streams: an experimental simulation in 1 trillion data points

Konstantinos F. Xylogiannopoulos; Panagiotis Karampelas; Reda Alhajj

Big data streaming analysis has become one of the most important topics for data analysts, since enormous amounts of data are produced daily by numerous smart devices. The analysis of such data is very important, and the detection of frequent or even non-frequent patterns can be critical for many aspects of our lives. In the current paper, we propose a new methodology, based on our previous work regarding the detection of all repeated patterns in a string, in order to analyze a very big data stream of 1 trillion digits, composed of 1 thousand subsequences of 1 billion digits each. More specifically, using the novel data structure LERP Reduced Suffix Array and the innovative ARPaD algorithm, which allows the detection of all repeated patterns in a string, we managed to analyze each subsequence of 1 billion data points in 33 minutes using 10 computers with standard hardware configuration. This is equivalent to data point generation every 2 microseconds and, to the best of our knowledge, outperforms any other existing methodology.

International Conference on Enterprise Information Systems | 2015

Discretization Method for the Detection of Local Extrema and Trends in Non-discrete Time Series

Konstantinos F. Xylogiannopoulos; Panagiotis Karampelas; Reda Alhajj

Mining, analysis and trend detection in time series are very important problems for forecasting purposes. Many researchers have developed different methodologies, applying techniques from different fields of science, in order to perform such analysis. In this paper, we propose a new discretization method that allows the detection of local extrema and trends inside time series. The method uses sliding linear regression over specific time intervals to produce a new time series from the angle of each regression line. The new time series produced allows the detection of local extrema and trends in the original time series. We have conducted several experiments on financial time series in order to discover trends, as well as pattern and periodicity detection, to forecast the future behavior of the Dow Jones Industrial Average 30 Index.
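A minimal sketch of the sliding regression idea follows: fit a least-squares line to each window, convert its slope to an angle, and look for sign changes in the resulting angle series. The window handling, the use of sign changes as an extremum signal, and all function names are assumptions made for illustration, not details taken from the paper.

```python
import math


def regression_slope(ys):
    """Least-squares slope of ys against x = 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def angle_series(series, window):
    """Angle in degrees of the regression line fitted to each sliding window."""
    return [
        math.degrees(math.atan(regression_slope(series[i:i + window])))
        for i in range(len(series) - window + 1)
    ]


def trend_changes(angles):
    """Indices where the angle changes sign, hinting at a local extremum."""
    return [i for i in range(1, len(angles)) if angles[i - 1] * angles[i] < 0]


if __name__ == "__main__":
    prices = [0, 1, 2, 3, 2, 1, 0]
    angles = angle_series(prices, window=2)
    print(angles, trend_changes(angles))
```

Positive angles indicate an upward trend within the window and negative angles a downward one; larger windows smooth out short fluctuations at the cost of locating extrema less precisely.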
