Flavio Villanustre | Researchain

Publication

Featured research published by Flavio Villanustre.

Journal of Big Data | 2015

Deep learning applications and challenges in big data analytics

Maryam M. Najafabadi; Flavio Villanustre; Taghi M. Khoshgoftaar; Naeem Seliya; Randall Wald; Edin Muharemagic

Big Data Analytics and Deep Learning are two high-focus areas of data science. Big Data has become important as many organizations, both public and private, have been collecting massive amounts of domain-specific information, which can contain useful information about problems such as national intelligence, cyber security, fraud detection, marketing, and medical informatics. Companies such as Google and Microsoft are analyzing large volumes of data for business analysis and decisions, impacting existing and future technology. Deep Learning algorithms extract high-level, complex abstractions as data representations through a hierarchical learning process. Complex abstractions are learnt at a given level based on relatively simpler abstractions formulated in the preceding level in the hierarchy. A key benefit of Deep Learning is the analysis and learning of massive amounts of unsupervised data, making it a valuable tool for Big Data Analytics where raw data is largely unlabeled and uncategorized. In the present study, we explore how Deep Learning can be utilized for addressing some important problems in Big Data Analytics, including extracting complex patterns from massive volumes of data, semantic indexing, data tagging, fast information retrieval, and simplifying discriminative tasks. We also investigate some aspects of Deep Learning research that need further exploration to incorporate specific challenges introduced by Big Data Analytics, including streaming data, high-dimensional data, scalability of models, and distributed computing. We conclude by presenting insights into relevant future work by posing some questions, including defining data sampling criteria, domain adaptation modeling, defining criteria for obtaining useful data abstractions, improving semantic indexing, semi-supervised learning, and active learning.

Archive | 2016

Introduction to Big Data

Borko Furht; Flavio Villanustre

In this chapter we present the basic terms and concepts in Big Data computing. Big data is a large and complex collection of data sets, which is difficult to process using on-hand database management tools and traditional data processing applications. Big Data topics include the following activities:

Archive | 2016

Social Network Analytics: Hidden and Complex Fraud Schemes

Flavio Villanustre; Borko Furht

In this chapter we briefly describe several case studies of using HPCC Systems in social network analytics.

International Workshop on Big Data Software Engineering | 2015

Industrial big data analytics: lessons from the trenches

Flavio Villanustre

Big Data Analytics in particular and Data Science in general have become key disciplines in the last decade. The convergence of Information Technology, Statistics and Mathematics to explore and extract information from Big Data has challenged the way many industries used to operate, shifting the decision-making process in many organizations. A new breed of Big Data platforms has appeared to fulfill the need to process data that is large, complex, variable and rapidly generated. The author describes the experience in this field from a company that provides Big Data analytics as its core business.

Proceedings of the 1st Workshop on The Science of Cyberinfrastructure | 2015

Dynamic Provisioning of Data Intensive Computing Middleware Frameworks: A Case Study

Linh Bao Ngo; Michael E. Payne; Flavio Villanustre; Richard Taylor; Amy W. Apon

Big data has become an important asset for industry, and academic disciplines now utilize large-scale data in their research. This fourth paradigm of scientific research has led to the inclusion of data management, processing, and analytic tools into the traditional high performance computing software libraries. This integration is facilitated through a collection of supporting software components that comprise a data intensive computing middleware framework. From a shared campus cyberinfrastructure perspective, this represents a new challenge to the system administrators in balancing between the traditional high performance computing software stacks and the new data-intensive middleware on the same physical computing resource. In turn, this limits researchers from having access to the new middleware tools while administrators determine how to overcome the challenge. In this paper, we present our experience in configuring dynamic provisioning of two different data-intensive middleware frameworks from a user perspective. We describe the configuration process from setting up dependencies to deploying the middleware, and how this experience can be applied by other researchers and administrators.

International Conference on Big Data | 2014

Managing the academic data lifecycle: A case study of HPCC

Michael E. Payne; Linh Bao Ngo; Flavio Villanustre; Amy W. Apon

Academic data can be classified into multiple categories and come from a large number of sources. Many research areas require combining data from different sources into a unified set on which analytical techniques can be applied. In this research paper the authors introduce the High Performance Computing Cluster (HPCC) as a platform to streamline the process of ingesting, curating, integrating and transforming scholarly data from multiple sources and in varying formats, particularly when several of these datasets lack common attributes to support the integration process.

Journal of Big Data | 2017

Large-scale distributed L-BFGS

Maryam M. Najafabadi; Taghi M. Khoshgoftaar; Flavio Villanustre; John Holt

With the increasing demand for examining and extracting patterns from massive amounts of data, it is critical to be able to train large models to fulfill the needs that recent advances in the machine learning area create. L-BFGS (Limited-memory Broyden Fletcher Goldfarb Shanno) is a numeric optimization method that has been effectively used for parameter estimation to train various machine learning models. As the number of parameters increases, implementing this algorithm on a single machine can be insufficient, due to the limited computational resources available. In this paper, we present a parallelized implementation of the L-BFGS algorithm on a distributed system which includes a cluster of commodity computing machines. We use the open source HPCC Systems (High-Performance Computing Cluster) platform as the underlying distributed system to implement the L-BFGS algorithm. We first provide an overview of the HPCC Systems framework and how it allows for the parallel and distributed computations important for Big Data analytics and, subsequently, we explain our implementation of the L-BFGS algorithm on this platform. Our experimental results show that our large-scale implementation of the L-BFGS algorithm can easily scale from training models with millions of parameters to models with billions of parameters by simply increasing the number of commodity computational nodes.
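
For readers unfamiliar with L-BFGS, the following is a minimal single-machine sketch in Python of the general technique (the standard two-loop recursion with a backtracking line search). It is not the paper's distributed HPCC Systems implementation; the function names, the memory size, the tolerances, and the small quadratic test objective are all assumptions chosen for illustration.

```python
from collections import deque


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(grad, history):
    """Two-loop recursion: approximates -(inverse Hessian) * grad
    from the most recent (s, y) pairs, without forming a matrix."""
    q = list(grad)
    saved = []
    for s, y in reversed(history):              # newest pair first
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        saved.append((alpha, rho, s, y))
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
    if history:                                 # initial Hessian scaling
        s, y = history[-1]
        gamma = dot(s, y) / dot(y, y)
        q = [gamma * qi for qi in q]
    for alpha, rho, s, y in reversed(saved):    # oldest pair first
        beta = rho * dot(y, q)
        q = [qi + (alpha - beta) * si for qi, si in zip(q, s)]
    return [-qi for qi in q]


def minimize(f, grad_f, x0, memory=5, iters=200, tol=1e-10):
    """Hypothetical driver: L-BFGS with Armijo backtracking."""
    x = list(x0)
    g = grad_f(x)
    history = deque(maxlen=memory)              # keeps only `memory` pairs
    for _ in range(iters):
        if dot(g, g) < tol:                     # squared gradient norm
            break
        d = lbfgs_direction(g, history)
        fx, slope, step = f(x), dot(g, d), 1.0
        while step > 1e-12:
            trial = [xi + step * di for xi, di in zip(x, d)]
            if f(trial) <= fx + 1e-4 * step * slope:
                break
            step *= 0.5
        x_new = [xi + step * di for xi, di in zip(x, d)]
        g_new = grad_f(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        if dot(y, s) > 1e-12:                   # keep only positive-curvature pairs
            history.append((s, y))
        x, g = x_new, g_new
    return x


# Illustrative objective (an assumption): a convex quadratic with minimum at (3, -1).
def f(x):
    return (x[0] - 3.0) ** 2 + 10.0 * (x[1] + 1.0) ** 2


def grad_f(x):
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]


solution = minimize(f, grad_f, [0.0, 0.0])
```

The sketch stores only the last few position and gradient differences, which is what keeps the memory footprint linear in the number of parameters; the work is dominated by inner products and vector updates over the parameter vector.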

Archive | 2016

HPCC Systems for Cyber Security Analytics

Flavio Villanustre; Mauricio Renzi

Many of the most daunting challenges in today’s cyber security world stem from a constant and overwhelming flow of raw network data. The volume, variety, and velocity at which this raw data is created and transmitted across networks are staggering; so staggering, in fact, that the vast majority of data is typically regarded as background noise, often discarded or ignored, and thus stripped of the immense potential value that could be realized through proper analysis. When an organization is capable of comprehending this data in its totality (whether it originates from firewall logs, IDS alerts, server event logs, or other sources), it can begin to identify and trace the markers, clues, and clusters of activity that represent threatening behavior.

Archive | 2016

Deep Learning Techniques in Big Data Analytics

Maryam M. Najafabadi; Flavio Villanustre; Taghi M. Khoshgoftaar; Naeem Seliya; Randall Wald; Edin Muharemagic

Proceedings of the 3rd Annual Conference on Research in Information Technology | 2014

Big data trends and evolution: a human perspective

Flavio Villanustre

The Big Data revolution has already happened and, through it, organizations have started realizing the potential of using data to make better-informed decisions, mitigate risks and, overall, better control their destiny. Along with its benefits, Big Data also creates new challenges, the growing talent gap possibly being the most representative of them all. In order to effectively leverage Big Data, a new profession is emerging: the data scientist. Tasked with understanding the methodologies to process and analyze vast and complex data, this professional must possess knowledge in a broad spectrum of domains, including mathematics (calculus, linear algebra, statistics, probabilities and even possibly category theory), programming languages (Python and R being frequently cited), data processing and analysis expertise (profiling, parsing, cleansing, linking), machine learning techniques (supervised and unsupervised learning, dimensionality reduction, feature selection, etc.) and business domain knowledge. While it is conceivable to identify individuals who can achieve this breadth of knowledge with significant depth, it is unreasonable to expect this to be the norm, so these individuals usually fall far into the upper tail of the population distribution. To make things worse, the current toolsets available to the data scientist tend to be very involved and require considerable amounts of time to develop applications, reducing the overall effectiveness of these experts. The solution to this talent gap is certainly not to try to breed a new step up the evolutionary ladder that can cope with this vast knowledge, but to create radically different abstractions as part of the toolsets that data scientists use, to increase efficiency and reduce the scope of the basic knowledge required to build Big Data applications. During this presentation we will explore this challenge and provide a new perspective on more efficient toolsets for Big Data applications.
