Hamid Mousavi
University of California, Los Angeles
Publications
Featured research published by Hamid Mousavi.
international conference on data engineering | 2011
Hetal Thakkar; Nikolay Laptev; Hamid Mousavi; Barzan Mozafari; Vincenzo Russo; Carlo Zaniolo
The problem of supporting data mining applications proved to be difficult for database management systems, and it is now proving to be very challenging for data stream management systems (DSMSs), where the limitations of SQL are made even more severe by the requirements of continuous queries. The major technical advances achieved separately on DSMSs and on data stream mining algorithms have failed to converge and produce powerful data stream mining systems. Such systems, however, are essential since the traditional pull-based approach of cache mining is no longer applicable, and the push-based computing mode of data streams and their bursty traffic complicate application development. For instance, to write mining applications with quality of service (QoS) levels approaching those of DSMSs, a mining analyst would have to contend with many arduous tasks, such as support for data buffering, complex storage and retrieval methods, scheduling, fault-tolerance, synopsis management, load shedding, and query optimization. Our Stream Mill Miner (SMM) system solves these problems by providing a data stream mining workbench that combines the ease of specifying high-level mining tasks, as in Weka, with the performance and QoS guarantees of a DSMS. This is accomplished in three main steps. The first is an open and extensible DSMS architecture where KDD queries can be easily expressed as user-defined aggregates (UDAs); our system combines these with the efficiency of synoptic data structures and mining-aware load shedding and optimizations. The second key component of SMM is its integrated library of fast mining algorithms that are light enough to be effective on data streams. The third advanced feature of SMM is a Mining Model Definition Language (MMDL) that allows users to define the flow of mining tasks, integrated with a simple box-and-arrow GUI, to shield the mining analyst from the complexities of lower-level queries.
SMM is the first DSMS capable of online mining, and this paper describes its architecture, design, and performance on mining queries.
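The UDA pattern the abstract mentions can be illustrated with a minimal sketch (in Python, not SMM's actual SQL dialect): an aggregate defined by initialize, iterate, and terminate steps applied incrementally to arriving stream tuples. All names here are illustrative, not part of SMM's API.

```python
class FrequentItemsUDA:
    """Toy UDA: track item counts over a stream and report items
    whose frequency exceeds a support threshold."""

    def __init__(self, min_support):
        self.min_support = min_support
        self.counts = {}          # initialize: empty aggregate state

    def iterate(self, item):
        # called once per arriving tuple
        self.counts[item] = self.counts.get(item, 0) + 1

    def terminate(self, total):
        # produce the aggregate result on demand
        return {i for i, c in self.counts.items()
                if c / total >= self.min_support}

uda = FrequentItemsUDA(min_support=0.3)
stream = ["a", "b", "a", "c", "a", "b"]
for t in stream:
    uda.iterate(t)
print(sorted(uda.terminate(len(stream))))  # ['a', 'b']
```

Because the state updates one tuple at a time, the same pattern extends naturally to continuous queries over windows, which is what makes UDAs a good fit for expressing KDD tasks in a DSMS.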
very large data bases | 2013
Hamid Mousavi; Shi Gao; Carlo Zaniolo
Knowledge bases and structured summaries are playing a crucial role in many applications, such as text summarization, question answering, essay grading, and semantic search. Although many systems (e.g., DBpedia and YAGO2) provide massive knowledge bases of such summaries, they all suffer from incompleteness, inconsistencies, and inaccuracies. These problems can be addressed and much improved by combining and integrating different knowledge bases, but their very large sizes and their reliance on different terminologies and ontologies make the task very difficult. In this demo, we will demonstrate a system that is achieving good success on this task by: (i) employing available interlinks in the current knowledge bases (e.g., externalLink and redirect links in DBpedia) to combine information on individual entities, and (ii) using widely available text corpora (e.g., Wikipedia) and our IBminer text-mining system to generate and verify structured information, and to reconcile terminologies across different knowledge bases. We will also demonstrate two tools designed to support the integration process in close collaboration with IBminer. The first is the InfoBox Knowledge-Base Browser (IBKB), which provides structured summaries and their provenance, and the second is the InfoBox Editor (IBE), which is designed to suggest relevant attributes for a user-specified subject, whereby the user can easily improve the knowledge base without requiring any knowledge of the internal terminology of individual systems.
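Step (i), combining information on individual entities via interlinks, can be sketched as a union-find over the link graph: entities connected by redirect or cross-knowledge-base links end up in the same equivalence class. The link data and entity names below are invented for illustration.

```python
class UnionFind:
    """Minimal union-find with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

# Hypothetical interlinks: a redirect inside DBpedia and a cross-KB link
links = [("dbpedia:LA", "dbpedia:Los_Angeles"),
         ("dbpedia:Los_Angeles", "yago:Los_Angeles")]
uf = UnionFind()
for a, b in links:
    uf.union(a, b)

# All three identifiers now denote the same entity cluster
print(uf.find("dbpedia:LA") == uf.find("yago:Los_Angeles"))  # True
```

Grouping identifiers this way lets the attributes recorded under each alias be merged into a single structured summary per real-world entity.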
extending database technology | 2011
Hamid Mousavi; Carlo Zaniolo
Equi-depth histograms represent a fundamental synopsis widely used in both database and data stream applications, as they provide the cornerstone of many techniques such as query optimization, approximate query answering, distribution fitting, and parallel database partitioning. Equi-depth histograms try to partition a sequence of data so that every part has the same number of data items. In this paper, we present a new algorithm to estimate equi-depth histograms for high-speed data streams over sliding windows. While many previous methods were based on quantile computations, we propose a new method called BAr Splitting Histogram (BASH) that provides an expected ε-approximate solution to compute the equi-depth histogram. Extensive experiments show that BASH is at least four times faster than one of the best existing approaches, while achieving similar or better accuracy and in some cases using less memory. The experimental results also indicate that BASH is more stable on data affected by frequent concept shifts.
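For reference, the exact version of the problem is easy to state over a static buffer: sort the data and place bucket boundaries so each bucket holds about the same number of items. BASH approximates this incrementally over sliding windows, which this minimal sketch does not attempt.

```python
def equi_depth_boundaries(data, num_buckets):
    """Return bucket upper boundaries so each bucket holds
    roughly len(data) / num_buckets items (exact, offline version)."""
    s = sorted(data)
    n = len(s)
    # boundary i sits after the first (i+1)*n/num_buckets smallest items
    return [s[min(n - 1, (i + 1) * n // num_buckets - 1)]
            for i in range(num_buckets)]

data = [7, 1, 5, 3, 9, 2, 8, 4]
print(equi_depth_boundaries(data, 4))  # [2, 4, 7, 9]
```

The streaming difficulty is that sorting the whole window is not an option: boundaries must be maintained as items arrive and expire, which is where BASH's bar-splitting approximation comes in.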
ieee international conference semantic computing | 2014
Hamid Mousavi; Deirdre Kerr; Markus Iseli; Carlo Zaniolo
The Web has made possible many advanced text-mining applications, such as news summarization, essay grading, question answering, and semantic search. For many such applications, statistical text-mining techniques are ineffective since they do not utilize the morphological structure of the text. Thus, many approaches use NLP-based techniques that parse the text and use patterns to mine and analyze the parse trees, which are often unnecessarily complex. Therefore, we propose a weighted-graph representation of text, called Text Graphs, which captures the grammatical and semantic relations between words and terms in the text. Text Graphs are generated using a new text mining framework which is the main focus of this paper. Our framework, SemScape, uses a statistical parser to generate a few of the most probable parse trees for each sentence and employs a novel two-step pattern-based technique to extract from parse trees candidate terms and their grammatical relations. Moreover, SemScape resolves coreferences by a novel technique, generates domain-specific Text Graphs by consulting ontologies, and provides a SPARQL-like query language and an optimized engine for semantically querying and mining Text Graphs.
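The Text Graph idea can be illustrated with a toy structure: a weighted graph whose edges carry grammatical relations between terms. The relation names and weights below are made up for illustration; the paper's actual extraction patterns are far richer, and the weights there can reflect how many candidate parse trees support an edge.

```python
from collections import defaultdict

class TextGraph:
    """Toy weighted graph of (head) -relation-> (dependent) edges."""

    def __init__(self):
        self.edges = defaultdict(dict)   # node -> {(relation, node): weight}

    def add_relation(self, head, relation, dep, weight=1.0):
        key = (relation, dep)
        # accumulate support when the same edge is extracted repeatedly
        self.edges[head][key] = self.edges[head].get(key, 0.0) + weight

    def neighbors(self, node, relation=None):
        return [(r, d, w) for (r, d), w in self.edges[node].items()
                if relation is None or r == relation]

# Edges one might extract from "SemScape uses a statistical parser"
g = TextGraph()
g.add_relation("SemScape", "subj_of", "uses", weight=0.9)
g.add_relation("uses", "obj", "parser", weight=0.9)
g.add_relation("parser", "mod", "statistical", weight=0.8)
print(g.neighbors("uses", "obj"))  # [('obj', 'parser', 0.9)]
```

A SPARQL-like query over such a graph then amounts to matching edge patterns, e.g. "find X where (X, obj, parser)".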
very large data bases | 2013
Hamid Mousavi; Shi Gao; Carlo Zaniolo
We propose the Context-aware Synonym Suggestion System (CS³), which learns synonyms from text by using our NLP-based text-mining framework, called SemScape, and also from existing evidence in the current knowledge bases (KBs). Using CS³ and our previously proposed knowledge extraction system IBminer, we integrate some of the publicly available knowledge bases into one of superior quality and coverage, called IKBstore.
international conference on management of data | 2014
Hamid Mousavi; Maurizio Atzori; Shi Gao; Carlo Zaniolo
Wikipedia's InfoBoxes play a crucial role in advanced applications and provide the main knowledge source for DBpedia and the powerful structured queries it supports. However, InfoBoxes, which were created by crowdsourcing for human rather than computer consumption, suffer from incompleteness, inconsistencies, and inaccuracies. To overcome these problems, we have developed (i) the IBminer system, which extracts InfoBox information by text-mining Wikipedia pages, (ii) the IKBStore system, which integrates the information derived by IBminer with that of DBpedia, YAGO2, WikiData, WordNet, and other sources, and (iii) SWiPE and the InfoBox Editor (IBE), which provide user-friendly interfaces for querying and revising the knowledge base. Thus, IBminer uses a deep NLP-based approach to extract from text a semantic representation structure called TextGraph, from which the system detects patterns and derives subject-attribute-value relations, as well as domain-specific synonyms for the knowledge base. IKBStore and IBE complement the powerful, user-friendly, by-example structured queries of SWiPE by supporting validation and provenance history for the information contained in the knowledge base, along with the ability to update its knowledge when this is found incomplete, incorrect, or outdated.
ieee international conference semantic computing | 2014
Hamid Mousavi; Deirdre Kerr; Markus Iseli; Carlo Zaniolo
Ontologies are a vital component of most knowledge-based applications, including semantic web search, intelligent information integration, and natural language processing. In particular, we need effective tools for generating in-depth ontologies that achieve comprehensive coverage of specific application domains of interest, while minimizing the time and cost of this process. Therefore, we cannot rely on the manual or highly supervised approaches often used in the past, since they do not scale well. We instead propose a new approach that automatically generates domain-specific ontologies from a small corpus of documents using deep NLP-based text mining. Starting from an initial small seed of domain concepts, our OntoHarvester system iteratively extracts ontological relations connecting existing concepts to other terms in the text, and adds strongly connected terms to the current ontology. As a result, OntoHarvester (i) remains focused on the application domain, (ii) is resistant to noise, and (iii) generates very comprehensive ontologies from modest-size document corpora. In fact, starting from a small seed, OntoHarvester produces ontologies that outperform both manually generated ontologies and ontologies generated by current techniques, even those that require very large well-focused data sets.
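The iterative seed-expansion loop described above can be sketched as follows: starting from seed concepts, repeatedly add terms that are strongly connected to the current ontology, until a fixpoint. The relation triples and the strength threshold are illustrative stand-ins for the ontological relations OntoHarvester mines from text.

```python
def harvest(seed, relations, min_strength, max_rounds=10):
    """relations: iterable of (concept, term, strength) triples mined from text.
    Grow the ontology from the seed by following strong relations only."""
    ontology = set(seed)
    for _ in range(max_rounds):
        added = {term for concept, term, strength in relations
                 if concept in ontology
                 and strength >= min_strength
                 and term not in ontology}
        if not added:          # fixpoint: no strongly connected term remains
            break
        ontology |= added
    return ontology

relations = [("cell", "membrane", 0.9),
             ("membrane", "protein", 0.8),
             ("cell", "phone", 0.2)]     # weak, off-domain link stays out
print(sorted(harvest({"cell"}, relations, min_strength=0.5)))
# ['cell', 'membrane', 'protein']
```

The strength threshold is what keeps the expansion anchored in the domain: weakly supported relations (like "cell" to "phone" above) never enter the ontology, which mirrors the noise resistance claimed for OntoHarvester.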
statistical and scientific database management | 2013
Hamid Mousavi; Carlo Zaniolo
Histograms provide effective synopses of large data sets, and are thus used in a wide variety of applications, including query optimization, approximate query answering, distribution fitting, parallel database partitioning, and data mining. Moreover, very fast approximate algorithms are needed to compute accurate histograms on fast-arriving data streams, whereby online queries can be supported within the given memory and computing resources. Many real-life applications require that the data distribution in certain regions be modeled with greater accuracy, and biased histograms are designed to address this need. In this paper, we define biased histograms over data streams and sliding windows on data streams, and propose the Bar Splitting Biased Histogram (BSBH) algorithm to construct them efficiently and accurately. We prove that BSBH generates expected ε-approximate biased histograms for data streams with stationary distributions, and our experiments show that BSBH also achieves good approximation in the presence of concept shifts, even major ones. Additionally, BSBH employs a new biased sampling technique which outperforms uniform sampling in terms of accuracy, while using about the same amount of time and memory. Therefore, BSBH outperforms previously proposed algorithms for computing biased histograms over the whole data stream, and it is the first algorithm that supports sliding windows.
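The idea of region-biased sampling can be illustrated with a small sketch: items in the region of interest are kept with higher probability, and each kept item carries a weight equal to the inverse of its sampling rate, so that weighted aggregates remain unbiased. The rates and region below are made up for illustration and are not BSBH's actual sampling scheme.

```python
import random

def biased_sample(stream, region, p_in=0.8, p_out=0.1, rng=None):
    """Keep items inside `region` with probability p_in, others with
    p_out; attach inverse-probability weights for unbiased estimates."""
    rng = rng or random.Random(42)
    sample = []
    for x in stream:
        p = p_in if region[0] <= x <= region[1] else p_out
        if rng.random() < p:
            sample.append((x, 1.0 / p))   # (value, weight)
    return sample

stream = list(range(100))
s = biased_sample(stream, region=(40, 60))
in_region = sum(1 for x, _ in s if 40 <= x <= 60)
print(len(s), in_region)   # most kept items fall inside [40, 60]
```

The effect is exactly what a biased histogram wants: far more sample mass lands in the high-accuracy region, while the weights still allow whole-stream quantities to be estimated without systematic error.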
ieee international conference semantic computing | 2011
Hamid Mousavi; Deirdre Kerr; Markus Iseli
This paper introduces a new text mining framework using a tree-based Linguistic Query Language, called LQL. The framework generates more than one parse tree for each sentence using a probabilistic parser, and annotates each node of these parse trees with main-parts information, which is a set of key terms from the node's branch based on the branch's linguistic structure. Using main-parts-annotated parse trees, the system can efficiently answer individual queries as well as mine the text for a given set of queries. The framework can also support grammatical ambiguity through probabilistic rules and linguistic exceptions.
International Journal of Semantic Computing | 2014
Hamid Mousavi; Shi Gao; Deirdre Kerr; Markus Iseli; Carlo Zaniolo
The Web is making possible many advanced text-mining applications, such as news summarization, essay grading, question answering, semantic search, and structured queries on corpora of Web documents. For many such applications, statistical text-mining techniques are of limited effectiveness since they do not utilize the morphological structure of the text. On the other hand, many approaches use NLP-based techniques that parse the text into parse trees, and then use patterns to mine and analyze the parse trees, which are often unnecessarily complex. To reduce this complexity and ease the entire process of text mining, we propose a weighted-graph representation of text, called TextGraphs, which captures the grammatical and semantic relations between words and terms in the text. TextGraphs are generated using a new text mining framework which is the main focus of this paper. Our framework, SemScape, uses a statistical parser to generate a few of the most probable parse trees for each sentence and employs a novel two-step pattern-based technique to extract from parse trees candidate terms and their grammatical relations. Moreover, SemScape resolves coreferences by a novel technique, generates domain-specific TextGraphs by consulting ontologies, and provides a SPARQL-like query language and an optimized engine for semantically querying and mining TextGraphs.