Alexander G. Ororbia | Researchain

Publication

Featured research published by Alexander G. Ororbia.

AI Magazine | 2015

CiteSeerX: AI in a Digital Library Search Engine

Jian Wu; Kyle Williams; Hung-Hsuan Chen; Madian Khabsa; Cornelia Caragea; Suppawong Tuarob; Alexander G. Ororbia; Douglas Jordan; Prasenjit Mitra; C. Lee Giles

CiteSeerX is a digital library search engine that provides access to more than five million scholarly documents and has nearly a million users and millions of hits per day. We present key AI technologies used in the following components: document classification and de-duplication, document and citation clustering, automatic metadata extraction and indexing, and author disambiguation. These AI technologies have been developed by CiteSeerX group members over the past 5–6 years. We show the usage status, payoff, development challenges, main design concepts, and deployment and maintenance requirements. We also present AI technologies implemented in table and algorithm search, which are special search modes in CiteSeerX. While it is challenging to rebuild a system like CiteSeerX from scratch, many of these AI technologies are transferable to other digital libraries and/or search engines.

ACM/IEEE Joint Conference on Digital Libraries | 2014

Towards building a scholarly big data platform: challenges, lessons and opportunities

Zhaohui Wu; Jian Wu; Madian Khabsa; Kyle Williams; Hung-Hsuan Chen; Wenyi Huang; Suppawong Tuarob; Sagnik Ray Choudhury; Alexander G. Ororbia; Prasenjit Mitra; C. Lee Giles

We introduce a Big Data platform that provides various services for harvesting scholarly information and enabling efficient scholarly applications. The core architecture of the platform is built on a secured private cloud: it crawls data with a scholarly focused crawler that leverages a dynamic scheduler, processes the data with a MapReduce-based crawl-extraction-ingestion (CEI) workflow, and stores the results in distributed repositories and databases. Services such as scholarly data harvesting, information extraction, and user information and log data analytics are integrated into the platform and provided through OAI and RESTful APIs. We also introduce a set of scholarly applications built on top of this platform, including citation recommendation and collaborator discovery.

Knowledge Discovery and Data Mining | 2017

Adversary Resistant Deep Neural Networks with an Application to Malware Detection

Qinglong Wang; Wenbo Guo; Kaixuan Zhang; Alexander G. Ororbia; Xinyu Xing; Xue Liu; C. Lee Giles

Beyond the highly publicized victories in the game of Go, there have been numerous successful applications of deep learning in the fields of information retrieval, computer vision, and speech recognition. In cybersecurity, an increasing number of companies have begun exploring the use of deep learning (DL) in a variety of security tasks, with malware detection among the more popular. These companies claim that deep neural networks (DNNs) could help turn the tide in the war against malware infection. However, DNNs are vulnerable to adversarial samples, a shortcoming that plagues most, if not all, statistical and machine learning models. Recent research has demonstrated that those with malicious intent can easily circumvent deep learning-powered malware detection by exploiting this weakness. To address this problem, previous work developed defense mechanisms based on augmenting training data or enhancing model complexity. However, after analyzing DNN susceptibility to adversarial samples, we find that current defense mechanisms are limited and, more importantly, cannot provide theoretical guarantees of robustness against adversarial sample-based attacks. We therefore propose a new adversary-resistant technique that obstructs attackers from constructing impactful adversarial samples by randomly nullifying features within data vectors. We evaluate the technique on a real-world dataset with 14,679 malware variants and 17,399 benign programs. We theoretically validate its robustness and empirically show that it significantly boosts DNN robustness to adversarial samples while maintaining high classification accuracy. To demonstrate the general applicability of the method, we also conduct experiments on the MNIST and CIFAR-10 datasets, which are widely used in image recognition research.
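The core operation of random feature nullification, zeroing out randomly chosen entries of an input vector, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a fixed nullification rate `p` and a plain Python list as the feature vector, and `nullify_features` is a hypothetical helper name.

```python
import random
from typing import List, Optional


def nullify_features(x: List[float], p: float,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Return a copy of `x` in which each feature is independently set to 0.0
    with probability `p` (hypothetical helper; a fixed rate is an assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    rng = rng if rng is not None else random.Random()
    # rng.random() is uniform on [0, 1), so each feature is zeroed with probability p.
    return [0.0 if rng.random() < p else value for value in x]


if __name__ == "__main__":
    sample = [0.3, 1.0, 0.0, 2.5, 0.7]
    # A fresh mask is drawn on each call, so the surviving features vary per call.
    print(nullify_features(sample, p=0.4, rng=random.Random(0)))
```

Whether `p` is fixed or sampled per example, and whether nullification also runs at inference time, are design choices this sketch leaves open.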

Neural Computation | 2017

Learning Simpler Language Models with the Differential State Framework

Alexander G. Ororbia; Tomas Mikolov; David Reitter

Learning useful information across long time lags is a critical and difficult problem for temporal neural models in tasks such as language modeling. Existing architectures that address the issue are often complex and costly to train. The differential state framework (DSF) is a simple, high-performing design that unifies previously introduced gated neural models. DSF models maintain longer-term memory by learning to interpolate between a fast-changing, data-driven representation and a slowly changing, implicitly stable state. Within the DSF, a new architecture, the delta-RNN, is presented. This model requires hardly any more parameters than a classical simple recurrent network. In language modeling at the word and character levels, the delta-RNN outperforms popular complex architectures such as the long short-term memory (LSTM) and the gated recurrent unit (GRU) and, when regularized, performs comparably to several state-of-the-art baselines. At the subword level, the delta-RNN's performance is comparable to that of complex gated architectures.
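The interpolation the DSF relies on can be illustrated with a single recurrent step. This is a minimal sketch under stated assumptions: a tanh proposal state `z`, a sigmoid gate `r` computed from the input, and the element-wise update h_t = (1 - r) * z + r * h_{t-1}. The names `dsf_step`, `W`, `V`, `Vr`, `b`, and `br` are hypothetical, and this generic gated update is not the exact delta-RNN parameterization from the paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(a: float) -> float:
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def dsf_step(x: Vector, h_prev: Vector, W: Matrix, V: Matrix, b: Vector,
             Vr: Matrix, br: Vector) -> Vector:
    """One DSF-style update: gate between a fast, data-driven proposal and
    the previous (slowly changing) state. Hypothetical parameterization."""
    # Fast-changing proposal driven by the current input and previous state.
    z = [math.tanh(wh + vx + bi)
         for wh, vx, bi in zip(matvec(W, h_prev), matvec(V, x), b)]
    # Input-driven gate in (0, 1): values near 1 keep the old state.
    r = [sigmoid(vx + bi) for vx, bi in zip(matvec(Vr, x), br)]
    # Element-wise interpolation between the proposal and the previous state.
    return [(1.0 - ri) * zi + ri * hi for ri, zi, hi in zip(r, z, h_prev)]
```

With `br` set to large positive values the gate saturates near 1 and the state barely moves; with large negative values the step reduces to the plain proposal `z`, i.e., a simple recurrent update.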

Conference on Information and Knowledge Management | 2016

Using Prerequisites to Extract Concept Maps from Textbooks

Shuting Wang; Alexander G. Ororbia; Zhaohui Wu; Kyle Williams; Chen Liang; Bart Pursel; C. Lee Giles

We present a framework for constructing a specific type of knowledge graph, a concept map, from textbooks. Using Wikipedia, we derive prerequisite relations among the concepts in the map. A traditional approach to concept map extraction consists of two sub-problems: key concept extraction and concept relationship identification. Previous work has mostly considered these two sub-problems independently. We propose a framework that jointly optimizes these sub-problems, and we investigate methods for identifying concept relationships. Experiments on manually extracted concept maps in six educational areas (computer networks, macroeconomics, precalculus, databases, physics, and geometry) show that our model outperforms supervised learning baselines that solve the two sub-problems separately. Moreover, we observe that incorporating textbook information helps with concept map extraction.

Computer Vision and Pattern Recognition | 2017

Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting in the Wild

Dafang He; Xiao Yang; Chen Liang; Zihan Zhou; Alexander G. Ororbia; Daniel Kifer; C. Lee Giles

Scene text detection has attracted considerable attention in recent years. Text can appear in a wide variety of images and videos and plays an important role in understanding a scene. In this paper, we present a novel text detection algorithm composed of two cascaded steps: (1) a multi-scale fully convolutional neural network (FCN) that extracts text block regions, and (2) a novel instance-aware (word or line) segmentation that further removes false positives and obtains word instances. The proposed algorithm can accurately localize words or text lines in arbitrary orientations, including curved text lines that many other frameworks cannot handle. Our algorithm achieved state-of-the-art performance on the ICDAR 2013 (IC13), ICDAR 2015 (IC15), CUTE80, and Street View Text (SVT) benchmark datasets.

Neural Computation | 2017

Unifying adversarial training algorithms with data gradient regularization

Alexander G. Ororbia; Daniel Kifer; C. Lee Giles

Many previous proposals for adversarial training of deep neural nets have included directly modifying the gradient, training on a mix of original and adversarial examples, using contractive penalties, and approximately optimizing constrained adversarial objective functions. In this article, we show that these proposals are actually all instances of optimizing a general, regularized objective we call DataGrad. Our proposed DataGrad framework, which can be viewed as a deep extension of the layerwise contractive autoencoder penalty, cleanly simplifies prior work and easily allows extensions such as adversarial training with multitask cues. In our experiments, we find that the deep gradient regularization of DataGrad (which also has L1 and L2 flavors of regularization) outperforms alternative forms of regularization, including classical L1, L2, and multitask, on both the original data set and adversarial sets. Furthermore, we find that combining multitask optimization with DataGrad adversarial training results in the most robust performance.
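To make the shape of a DataGrad-style objective concrete, the sketch below evaluates a training loss plus a penalty on the gradient of that loss with respect to the input, for a single logistic-regression example where that gradient has the closed form (sigmoid(w·x + b) - y) * w. This is an illustrative assumption, not the authors' deep-network training procedure; `datagrad_objective`, `lam`, and the use of a squared L2 norm for the L2 flavor are hypothetical choices.

```python
import math
from typing import List


def sigmoid(a: float) -> float:
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def datagrad_objective(w: List[float], b: float, x: List[float], y: int,
                       lam: float, norm: str = "l2") -> float:
    """Binary cross-entropy plus lam * R(dLoss/dx) for logistic regression.

    Hypothetical sketch: R is the L1 norm or the squared L2 norm of the
    gradient of the loss with respect to the input x.
    """
    a = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Cross-entropy computed from the logit a in a numerically stable form.
    loss = max(a, 0.0) - a * y + math.log1p(math.exp(-abs(a)))
    # For logistic regression, dLoss/dx = (sigmoid(a) - y) * w.
    residual = sigmoid(a) - y
    grad_x = [residual * wi for wi in w]
    if norm == "l1":
        penalty = sum(abs(g) for g in grad_x)
    elif norm == "l2":
        penalty = sum(g * g for g in grad_x)
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    return loss + lam * penalty
```

Setting `lam=0` recovers the plain loss; the penalty grows when small input perturbations can change the loss quickly, which is the first-order sensitivity that adversarial examples exploit.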

ACM/IEEE Joint Conference on Digital Libraries | 2017

Smart library: identifying books on library shelves using supervised deep learning for scene text reading

Xiao Yang; Dafang He; Wenyi Huang; Alexander G. Ororbia; Zihan Zhou; Daniel Kifer; C. Lee Giles

Physical library collections are valuable, long-standing resources for knowledge and learning. However, managing and finding books or other volumes on bookshelves often involves tedious manual work, especially in large collections where volumes might be missing or misplaced. Recently, deep neural network models have been successful in detecting and recognizing text in images taken from natural scenes. Building on this, we investigate deep learning for facilitating book management. This task introduces further challenges, including image distortion and varied lighting conditions. We present a library inventory building and retrieval system based on scene text reading. We specifically design our text recognition model using rich supervision to accelerate training and achieve state-of-the-art performance on several benchmark datasets. Our proposed system has the potential to greatly reduce the amount of manual labor required for managing book inventories.

International World Wide Web Conferences | 2015

Big Scholarly Data in CiteSeerX: Information Extraction from the Web

Alexander G. Ororbia; Jian Wu; Madian Khabsa; Kyle Williams; C. Lee Giles

We examine CiteSeerX, an intelligent system designed to automatically acquire and organize large-scale collections of scholarly documents from the World Wide Web. From the perspective of automatic information extraction and alternative search modes, we examine various functional aspects of this complex system with an eye toward ongoing and future research developments.

International Conference on Social Computing | 2015

Error-Correction and Aggregation in Crowd-Sourcing of Geopolitical Incident Information

Alexander G. Ororbia; Yang Xu; Vito D’Orazio; David Reitter

A discriminative model is presented for crowd-sourcing the annotation of news stories to produce a structured dataset about incidents involving militarized disputes between nation-states. We used a question tree to gather partially redundant data from each crowd worker. A lattice of Bayesian networks was then applied to error-correct the individual worker annotations, and the corrected annotations were aggregated via majority voting. The resulting hybrid model outperformed comparable state-of-the-art aggregation models in both accuracy and computational scalability.
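The final aggregation step, majority voting over per-item annotations, can be sketched with the standard library. This minimal illustration assumes the Bayesian-network error correction has already been applied and that each item maps to a list of categorical labels; `majority_vote` and `aggregate` are hypothetical helpers, and breaking ties by first occurrence is an assumption the abstract does not specify.

```python
from collections import Counter
from typing import Dict, Hashable, List


def majority_vote(labels: List[Hashable]) -> Hashable:
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("cannot vote over an empty list of labels")
    # Counter.most_common orders equal counts by first insertion.
    return Counter(labels).most_common(1)[0][0]


def aggregate(annotations: Dict[str, List[Hashable]]) -> Dict[str, Hashable]:
    """Aggregate (already error-corrected) worker labels per item by majority vote."""
    return {item: majority_vote(labels) for item, labels in annotations.items()}


if __name__ == "__main__":
    # Hypothetical item keys and labels, for illustration only.
    corrected = {
        "incident_1/fatalities": ["yes", "no", "yes"],
        "incident_1/initiator": ["state_A", "state_B"],
    }
    print(aggregate(corrected))
```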
