Publication


Featured research published by Arash Joorabchi.


Journal of Information Science | 2010

A citation-based approach to automatic topical indexing of scientific literature

Abdulhussain E. Mahdi; Arash Joorabchi

Topical indexing of documents with keyphrases is a common method used for revealing the subject of scientific and research documents to both human readers and information retrieval tools, such as search engines. However, scientific documents that are manually indexed with keyphrases are still in the minority. This article describes a new unsupervised method for automatic keyphrase extraction from scientific documents which yields a performance on a par with human indexers. The method is based on identifying the references cited in the document to be indexed and using the keyphrases assigned to those references to generate a set of high-likelihood keyphrases for the document. We have evaluated the performance of the proposed method by using it to automatically index a third-party test set of research documents. Reported experimental results show that the performance of our method, measured in terms of consistency with human indexers, is competitive with that achieved by state-of-the-art supervised methods.
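As a rough illustration of the citation-based idea in this abstract, the sketch below ranks candidate keyphrases by how many cited references carry them. The abstract does not state the scoring rule, so the simple vote count, the function name `rank_keyphrases`, and the sample data are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def rank_keyphrases(reference_keyphrases, top_n=5):
    """Rank candidate keyphrases by the number of cited references that carry them.

    reference_keyphrases: one list of keyphrases per cited reference.
    The vote-count scoring is an assumption; the abstract does not specify it.
    """
    votes = Counter()
    for phrases in reference_keyphrases:
        # Count each phrase at most once per reference, ignoring case.
        votes.update({p.strip().lower() for p in phrases})
    # Sort by vote count (descending), then alphabetically so ties are stable.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:top_n]]


# Hypothetical keyphrases attached to three cited references.
cited = [
    ["Information retrieval", "indexing"],
    ["indexing", "machine learning"],
    ["Indexing", "information retrieval"],
]
top = rank_keyphrases(cited, top_n=2)
print(top)
```

A real system would also need to resolve each citation to a catalogued record before any keyphrases could be collected; that step is outside this sketch.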


Journal of Information Science | 2013

Automatic keyphrase annotation of scientific documents using Wikipedia and genetic algorithms

Arash Joorabchi; Abdulhussain E. Mahdi

Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents to both human readers and information retrieval systems. This article describes a machine learning-based keyphrase annotation method for scientific documents that utilizes Wikipedia as a thesaurus for candidate selection from documents’ content. We have devised a set of 20 statistical, positional and semantical features for candidate phrases to capture and reflect various properties of those candidates that have the highest keyphraseness probability. We first introduce a simple unsupervised method for ranking and filtering the most probable keyphrases, and then evolve it into a novel supervised method using genetic algorithms. We have evaluated the performance of both methods on a third-party dataset of research papers. Reported experimental results show that the performance of our proposed methods, measured in terms of consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised and unsupervised methods.


Journal of Information Science | 2011

An unsupervised approach to automatic classification of scientific literature utilizing bibliographic metadata

Arash Joorabchi; Abdulhussain E. Mahdi

This article describes an unsupervised approach for automatic classification of scientific literature archived in digital libraries and repositories according to a standard library classification scheme. The method is based on identifying all the references cited in the document to be classified and, using the subject classification metadata of extracted references as catalogued in existing conventional libraries, inferring the most probable class for the document itself with the help of a weighting mechanism. We have demonstrated the application of the proposed method and assessed its performance by developing a prototype software system for automatic classification of scientific documents according to the Dewey Decimal Classification scheme. A dataset of 1000 research articles, papers, and reports from a well-known scientific digital library, CiteSeer, was used to evaluate the classification performance of the system. Detailed results of this experiment are presented and discussed.
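The inference step described in this abstract can be sketched as a weighted vote over the classes of the cited references. The abstract does not define the weighting mechanism, so the per-reference weights below (for example, how often a reference is cited in the text) are a hypothetical stand-in, and the class labels are illustrative DDC-style numbers rather than data from the paper.

```python
from collections import defaultdict


def infer_class(references):
    """Return the class with the largest total weight, or None for no input.

    references: iterable of (class_label, weight) pairs, one per cited
    reference. How weights are derived is an assumption of this sketch.
    """
    totals = defaultdict(float)
    for class_label, weight in references:
        totals[class_label] += weight
    if not totals:
        return None
    # Highest total wins; ties are broken by label so the result is deterministic.
    return min(totals, key=lambda c: (-totals[c], c))


# Hypothetical cited references with illustrative class labels and weights.
refs = [("006.3", 2.0), ("025.4", 1.0), ("006.3", 1.0), ("025.4", 2.5)]
best = infer_class(refs)
print(best)
```

Summing weights per class keeps the sketch simple; a hierarchical scheme such as DDC would also allow votes to be pooled at a shared parent class, which is not attempted here.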


Journal of Information Science | 2015

Automatic mapping of user tags to Wikipedia concepts

Arash Joorabchi; Michael English; Abdulhussain E. Mahdi

The uncontrolled nature of user-assigned tags makes them prone to various inconsistencies caused by spelling variations, synonyms, acronyms and hyponyms. These inconsistencies in turn lead to some of the common problems associated with the use of folksonomies such as the tag explosion phenomenon. Mapping user tags to their corresponding Wikipedia articles, as well-formed concepts, offers multifaceted benefits to the process of subject metadata generation and management in a wide range of online environments. These include normalization of inconsistencies, elimination of personal tags and improvement of the interchangeability of existing subject metadata. In this article, we propose a machine learning-based method capable of automatic mapping of user tags to their equivalent Wikipedia concepts. We have demonstrated the application of the proposed method and evaluated its performance using the currently most popular computer programming Q&A website, StackOverflow.com, as our test platform. Currently, around 20 million posts in StackOverflow are annotated with about 37,000 unique user tags, from which we have chosen a subset of 1256 tags to evaluate the accuracy performance of our proposed mapping method. We have evaluated the performance of our method using the standard information retrieval measures of precision, recall and F1. Depending on the machine learning-based classification algorithm used as part of the mapping process, F1 scores as high as 99.6% were achieved.


Grid and Cooperative Computing | 2013

A new text representation scheme combining Bag-of-Words and Bag-of-Concepts approaches for automatic text classification

Alaa Alahmadi; Arash Joorabchi; Abdulhussain E. Mahdi

This paper introduces a new approach to creating text representations and applies it to standard text classification collections. The approach is based on supplementing the well-known Bag-of-Words (BOW) representational scheme with a concept-based representation that utilises Wikipedia as a knowledge base. The proposed representations are used to generate a Vector Space Model, which in turn is fed into a Support Vector Machine classifier to categorise a collection of textual documents from two publicly available datasets. Experimental results are presented that evaluate the performance of our model in comparison with a standard BOW scheme and a concept-based scheme, as well as with recently reported similar text representations that are based on augmenting the standard BOW approach with concept-based representations.
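To make the combined representation concrete, here is a minimal sketch that merges word counts and concept counts into one sparse feature dictionary. The tiny `lexicon` mapping phrases to concept names is hypothetical and stands in for the Wikipedia-based concept detection the abstract mentions; the `w:` and `c:` prefixes are likewise an assumption used only to keep the two feature spaces apart.

```python
from collections import Counter


def combined_features(text, concept_lexicon):
    """Build one sparse feature dict holding word counts and concept counts.

    concept_lexicon maps a lowercase phrase to a concept name. It is a
    hypothetical stand-in for a real Wikipedia-based concept detector.
    """
    tokens = text.lower().split()
    # Bag-of-Words part: one "w:" feature per token.
    features = Counter(f"w:{t}" for t in tokens)
    # Bag-of-Concepts part: count exact token-sequence matches per phrase.
    for phrase, concept in concept_lexicon.items():
        phrase_tokens = phrase.split()
        n = len(phrase_tokens)
        hits = sum(
            tokens[i:i + n] == phrase_tokens
            for i in range(len(tokens) - n + 1)
        )
        if hits:
            features[f"c:{concept}"] += hits
    return dict(features)


lexicon = {
    "support vector machine": "Support_vector_machine",
    "svm": "Support_vector_machine",
}
vec = combined_features("An SVM is a support vector machine", lexicon)
print(vec)
```

Dictionaries like `vec` could then be turned into fixed-length vectors and passed to a classifier; term weighting and the classifier itself are left out of this sketch.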


Journal of Information Science | 2014

Towards linking libraries and Wikipedia: automatic subject indexing of library records with Wikipedia concepts

Arash Joorabchi; Abdulhussain E. Mahdi

In this article, we first argue the importance and timely need of linking libraries and Wikipedia for improving the quality of their services to information consumers, as such linkage will enrich the quality of Wikipedia articles and at the same time increase the visibility of library resources which are currently overlooked to a large degree. We then describe the development of an automatic system for subject indexing of library metadata records with Wikipedia concepts as an important step towards library–Wikipedia integration. The proposed system is based on first identifying all Wikipedia concepts occurring in the metadata elements of library records. This is then followed by training and deploying generic machine learning algorithms to automatically select those concepts which most accurately reflect the core subjects of the library materials whose records are being indexed. We have assessed the performance of the developed system using standard information retrieval measures of precision, recall and F-score on a dataset consisting of 100 library metadata records manually indexed with a total of 469 Wikipedia concepts. The evaluation results show that the developed system is capable of achieving an averaged F-score as high as 0.92.
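For readers unfamiliar with the evaluation measures named in this abstract, the sketch below computes precision, recall and the balanced F-score for a single record by comparing predicted concepts against manually assigned ones. The concept names are invented for illustration, and how the paper averages scores across its 100 records is not stated in the abstract, so only the per-record calculation is shown.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 for one record's concept sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical concepts for one library record.
predicted = {"Digital_library", "Metadata", "Wikipedia"}
gold = {"Digital_library", "Metadata", "Library_catalog", "Subject_indexing"}
p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Here two of the three predicted concepts are correct and two of the four gold concepts are found, so precision is 2/3, recall is 1/2 and F1 is 4/7.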


Journal of Enterprise Information Management | 2016

Text mining stackoverflow: An insight into challenges and subject-related difficulties faced by computer science learners

Arash Joorabchi; Michael English; Abdulhussain E. Mahdi

Purpose – The use of social media and in particular community Question Answering (Q & A) websites by learners has increased significantly in recent years. The vast amounts of data posted on these sites provide an opportunity to investigate the topics under discussion and those receiving most attention. The purpose of this paper is to automatically analyse the content of a popular computer programming Q & A website, StackOverflow (SO), determine the exact topics of posted Q & As, and narrow down their categories to help determine subject difficulties of learners. By doing so, the authors have been able to rank identified topics and categories according to their frequencies, and therefore, mark the most asked about subjects and, hence, identify the most difficult and challenging topics commonly faced by learners of computer programming and software development. Design/methodology/approach – In this work the authors have adopted a heuristic research approach combined with a text mining approach to investigat...


European Conference on Research and Advanced Technology for Digital Libraries | 2009

Leveraging the legacy of conventional libraries for organizing digital libraries

Arash Joorabchi; Abdulhussain E. Mahdi

With the significant growth in the number of available electronic documents on the Internet, intranets, and digital libraries, the need for developing effective methods and systems to index and organize E-documents is felt more than ever. In this paper we introduce a new method for automatic text classification for categorizing E-documents by utilizing the classification metadata of books, journals and other library holdings that already exists in online catalogues of libraries. The method is based on identifying all references cited in a given document and, using the classification metadata of these references as catalogued in a physical library, devising an appropriate class for the document itself according to a standard library classification scheme with the help of a weighting mechanism. We have demonstrated the application of the proposed method and assessed its performance by developing a prototype classification system for classifying electronic syllabus documents archived in the Irish National Syllabus Repository according to the well-known Dewey Decimal Classification (DDC) scheme.


European Conference on Research and Advanced Technology for Digital Libraries | 2008

Development of a National Syllabus Repository for Higher Education in Ireland

Arash Joorabchi; Abdulhussain E. Mahdi

With the significant growth in electronic education materials such as syllabus documents and lecture notes available on the Internet and intranets, there is a need for developing structured central repositories of such materials to allow both educators and learners to easily share, search and access them. This paper reports on our on-going work to develop a national repository for course syllabi in Ireland. Specifically, it describes a prototype syllabus repository system for higher education in Ireland that has been developed by utilising a number of information extraction and document classification techniques, including a new fully unsupervised document classification method that uses a web search engine for automatic collection of a training set for the classification algorithm. Preliminary experimental results for evaluating the system's performance are presented and discussed.


Digital Enterprise and Information Systems | 2011

Automatic Subject Classification of Scientific Literature Using Citation Metadata

Abdulhussain E. Mahdi; Arash Joorabchi

This paper describes a new method for automatic classification of scientific literature archived in digital libraries and repositories according to a standard library classification scheme. The method is based on identifying all the references cited in the document to be classified and, using the subject classification metadata of extracted references as catalogued in existing conventional libraries, inferring the most probable class for the document itself with the help of a weighting mechanism. We have demonstrated the application of the proposed method and assessed its performance by developing a prototype software system for automatic classification of scientific documents according to the Dewey Decimal Classification (DDC) scheme. A dataset of one thousand research articles, papers, and reports from a well-known scientific digital library, CiteSeer, was used to evaluate the classification performance of the system. Detailed results of this experiment are presented and discussed.
