
Publication


Featured research published by Shariq Bashir.


European Conference on Information Retrieval | 2010

Improving retrievability of patents in prior-art search

Shariq Bashir; Andreas Rauber

Prior-art search is an important task in patent retrieval. The success of this task relies upon the selection of relevant search queries. Typically, terms for prior-art queries are extracted from the claim fields of query patents. However, due to the complex technical structure of patents and the presence of term mismatch and vague terms, selecting relevant terms for queries is a difficult task. When evaluating the retrievability coverage of prior-art queries generated from query patents, we observe a large bias toward a subset of the collection. A large number of patents either have a very low retrievability score or cannot be discovered via any query. To increase the retrievability of patents, in this paper we expand prior-art queries generated from query patents using query expansion with pseudo-relevance feedback. Missing terms from query patents are discovered from feedback patents, and better patents for relevance feedback are identified using a novel approach for checking their similarity with query patents. We specifically focus on how to automatically select better terms from query patents based on their proximity distribution with prior-art queries that are used as features for computing similarity. Our results show that the coverage of prior-art queries can be increased significantly by incorporating relevant query terms using query expansion.
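A minimal sketch of plain query expansion with pseudo-relevance feedback may help make the idea concrete. It assumes the top-ranked documents of an initial run are treated as relevant and that the most frequent unseen terms are appended to the query. The function name expand_query, its parameters, and the toy documents are hypothetical, and the similarity-based feedback selection and proximity features described in the abstract are not modelled here.

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, n_feedback, n_terms):
    """Append the most frequent new terms from the top-ranked documents.

    query_terms: list of terms in the original query.
    ranked_docs: initial retrieval result, best first; each doc is a list of terms.
    n_feedback:  how many top documents to treat as (pseudo-)relevant.
    n_terms:     how many expansion terms to add.
    """
    counts = Counter()
    for doc in ranked_docs[:n_feedback]:
        # Only count terms that the query does not already contain.
        counts.update(term for term in doc if term not in query_terms)
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return list(query_terms) + expansion

# Toy example (invented data): two feedback documents, two expansion terms.
docs = [
    ["rotor", "turbine", "blade", "turbine", "hub"],
    ["turbine", "hub", "blade"],
    ["engine", "piston"],
]
expanded = expand_query(["rotor", "blade"], docs, n_feedback=2, n_terms=2)
```

Because the feedback documents are only assumed to be relevant, noise in the initial ranking carries straight into the expanded query, which is the weakness that more careful feedback selection aims to reduce.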


Conference on Information and Knowledge Management | 2009

Improving retrievability of patents with cluster-based pseudo-relevance feedback documents selection

Shariq Bashir; Andreas Rauber

High findability of documents within a certain cut-off rank is considered an important factor in recall-oriented application domains such as patent or legal document retrieval. Findability is hindered by two aspects, namely the inherent bias favoring some types of documents over others introduced by the retrieval model, and the failure to correctly capture and interpret the context of conventionally rather short queries. In this paper, we analyze the bias impact of different retrieval models and query expansion strategies. We furthermore propose a novel query expansion strategy based on document clustering to identify dominant relevant documents. This helps to overcome limitations of conventional query expansion strategies that suffer strongly from the noise introduced by imperfect initial query results when selecting documents for pseudo-relevance feedback. Experiments with different collections of patent documents suggest that clustering-based document selection for pseudo-relevance feedback is an effective approach for increasing the findability of individual documents and decreasing the bias of a retrieval system.


Database and Expert Systems Applications | 2009

Analyzing Document Retrievability in Patent Retrieval Settings

Shariq Bashir; Andreas Rauber

Most information retrieval settings, such as web search, are typically precision-oriented, i.e. they focus on retrieving a small number of highly relevant documents. However, in specific domains, such as patent retrieval or law, recall becomes more relevant than precision: in these cases the goal is to find all relevant documents, requiring algorithms to be tuned more towards recall at the cost of precision. This raises important questions with respect to retrievability and search engine bias: depending on how the similarity between a query and documents is measured, certain documents may be more or less retrievable in certain systems, up to some documents not being retrievable at all within common threshold settings. Biases may be oriented towards the popularity of documents (increasing the weight of references), towards document length, towards rare or common words, or towards structural information such as metadata or headings. Existing accessibility measurement techniques are limited as they measure retrievability with respect to all possible queries. In this paper, we improve accessibility measurement by considering sets of relevant and irrelevant queries for each document. This simulates how recall-oriented users create their queries when searching for relevant information. We evaluate retrievability scores using a corpus of patents from the US Patent and Trademark Office.
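As a rough illustration of the kind of measurement discussed in this abstract, the sketch below computes a cumulative retrievability score, counting how often each document appears within a cut-off rank over a set of queries, and a Gini coefficient as one common summary of how unevenly those scores are distributed. The cumulative scoring function, the Gini summary, the function names, and the toy rankings are assumptions for illustration. The abstract does not give a formula, and the per-document sets of relevant and irrelevant queries it proposes are not modelled here.

```python
from collections import Counter

def retrievability(rankings, cutoff):
    """Count, per document, the queries that return it within the cut-off rank.

    rankings: dict mapping a query to its ranked list of document ids.
    """
    scores = Counter()
    for ranked in rankings.values():
        for doc in ranked[:cutoff]:
            scores[doc] += 1
    return scores

def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly even."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)

# Toy example (invented data): three queries over a four-document collection.
rankings = {
    "q1": ["a", "b", "c"],
    "q2": ["a", "c", "b"],
    "q3": ["a", "b", "d"],
}
collection = ["a", "b", "c", "d"]
scores = retrievability(rankings, cutoff=2)
# Documents never retrieved within the cut-off get a score of 0.
bias = gini([scores[doc] for doc in collection])
```

With a cut-off of 2, document "d" is never retrieved in this toy data even though it is indexed, which is the situation the abstract describes as a document not being retrievable within common threshold settings.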


International Multi-topic Conference | 2008

Seasonal to Inter-annual Climate Prediction Using Data Mining KNN Technique

Zahoor Jan; Muhammad Abrar; Shariq Bashir; Anwar M. Mirza

The impact of seasonal to inter-annual climate prediction on society, business, agriculture and almost all aspects of human life forces scientists to give the matter proper attention. The last few years have shown tremendous achievements in this field. All systems and techniques developed so far use the Sea Surface Temperature (SST) as the main factor, among other seasonal climatic attributes. Statistical and mathematical models are then used for further climate predictions. In this paper, we develop a system that uses the historical weather data of a region (rain, wind speed, dew point, temperature, etc.) and applies the data-mining algorithm “K-Nearest Neighbor (KNN)” to classify these historical data into a specific time span. The k nearest time spans (k nearest neighbors) are then taken to predict the weather. Our experiments show that the system generates accurate results within reasonable time for months in advance.
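The nearest-neighbour step can be sketched as follows, assuming each historical time span is summarised as a numeric feature vector paired with the value observed afterwards, and that the prediction is the plain average over the k closest spans. The function name knn_predict, the Euclidean distance, the averaging rule, and the sample numbers are all assumptions; the abstract does not specify the distance measure or how the neighbours are combined.

```python
import math

def knn_predict(history, query, k):
    """Predict a value as the mean outcome of the k most similar past spans.

    history: list of (features, outcome) pairs, features being a list of floats.
    query:   feature list describing the current time span.
    """
    if k < 1 or not history:
        raise ValueError("need k >= 1 and a non-empty history")
    # Sort past spans by Euclidean distance to the query and keep the k closest.
    nearest = sorted(history, key=lambda pair: math.dist(pair[0], query))[:k]
    return sum(outcome for _, outcome in nearest) / len(nearest)

# Toy example (invented data): features are [temperature, wind speed],
# the outcome is the temperature observed in the following span.
history = [
    ([20.0, 5.0], 22.0),
    ([21.0, 6.0], 24.0),
    ([35.0, 1.0], 36.0),
]
forecast = knn_predict(history, [20.5, 5.5], k=2)
```

In practice the features would normally be scaled to comparable ranges first, since an unscaled attribute with large values would dominate the distance.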


Software Engineering and Advanced Applications | 2008

Vimoware - A Toolkit for Mobile Web Services and Collaborative Computing

Hong Linh Truong; Lukasz Juszczyk; Shariq Bashir; Atif Manzoor; Schahram Dustdar

Mobile devices are considered to be very useful in ad-hoc and team collaborations, for example in disaster responses, where dedicated infrastructures are not available. Such collaborations normally require flexible and interoperable services while running on mobile devices and being integrated with various other services. Therefore, middleware and toolkits for developing mobile services which can be accessed by using standard interfaces and protocols are in demand. Due to the lack of tools, the support of the development of Web services and collaboration tools on mobile devices is still limited. This paper presents the Vimoware toolkit which allows both developers and users to develop Web services for mobile devices, to conduct ad-hoc team collaborations by executing pre-defined or on-situ flows of tasks, and to test collaboration scenarios.


Transactions on Large-Scale Data- and Knowledge-Centered Systems II | 2010

Improving retrievability and recall by automatic corpus partitioning

Shariq Bashir; Andreas Rauber

With increasing volumes of data, much effort has been devoted to finding the most suitable answer to an information need. However, in many domains, the question whether any specific information item can be found at all via a reasonable set of queries is essential. This concept of retrievability of information has evolved into an important evaluation measure of IR systems in recall-oriented application domains. While several studies have evaluated retrieval bias in systems, solid validation of the impact of retrieval bias and the development of methods to counter low retrievability of certain document types would be desirable. This paper provides an in-depth study of retrievability characteristics over queries of different length in a large benchmark corpus, validating previous studies. It analyzes the possibility of automatically categorizing documents into low- and high-retrievability classes based on document properties rather than complex retrievability analysis. We furthermore show that this classification can be used to improve the overall retrievability of documents by treating these classes as separate document corpora and combining the individual retrieval results. Experiments are validated on 1.2 million patents of the TREC Chemical Retrieval Track.


Pakistan Section Multitopic Conference | 2005

HybridMiner: Mining Maximal Frequent Itemsets Using Hybrid Database Representation Approach

Shariq Bashir; Abdul Rauf Baig

In this paper we present a novel hybrid (array-based layout and vertical bitmap layout) database representation approach for mining the complete set of maximal frequent itemsets (MFI) on sparse and large datasets. Our work is novel in terms of scalability, item search order, and its two projection techniques, horizontal and vertical. We also present a maximal algorithm using this hybrid database representation approach. Experimental results on real and sparse benchmark datasets show that our approach is better than previous state-of-the-art maximal algorithms.
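To illustrate the vertical bitmap idea and what "maximal" means, here is a small sketch in which each item maps to an integer whose bits mark the transactions containing it, so the support of an itemset is the number of set bits in the AND of its items' bitmaps. This is not the HybridMiner algorithm: the exhaustive search, the function names, and the toy transactions are assumptions chosen for brevity, and the enumeration is exponential in the number of items.

```python
from itertools import combinations

def vertical_bitmaps(transactions):
    """Map each item to a bitmask; bit t is set if transaction t contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps

def support(itemset, bitmaps, n_transactions):
    """Number of transactions containing every item of the itemset."""
    mask = (1 << n_transactions) - 1
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")

def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force search for frequent itemsets with no frequent superset."""
    bitmaps = vertical_bitmaps(transactions)
    n = len(transactions)
    items = sorted(bitmaps)
    maximal = []
    # Visit larger candidates first so any frequent superset is already known.
    for size in range(len(items), 0, -1):
        for candidate in combinations(items, size):
            cset = set(candidate)
            if any(cset < found for found in maximal):
                continue  # covered by a larger frequent itemset
            if support(candidate, bitmaps, n) >= min_support:
                maximal.append(cset)
    return maximal

# Toy example (invented data): four transactions, minimum support of 2.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
result = maximal_frequent_itemsets(transactions, min_support=2)
```

In this toy data every pair is frequent but the full triple is not, so the three pairs are the maximal frequent itemsets and the single items are subsumed by them.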


Knowledge and Information Systems | 2014

Automatic ranking of retrieval models using retrievability measure

Shariq Bashir; Andreas Rauber

Analyzing retrieval model performance using retrievability (maximizing findability of documents) has recently evolved as an important measurement for recall-oriented retrieval applications. Most of the work in this domain is focused either on analyzing retrieval model bias or on proposing different retrieval strategies for increasing document retrievability. However, little is known about the relationship between retrievability and other information retrieval effectiveness measures such as precision, recall, MAP and others. In this study, we analyze the relationship between retrievability and effectiveness measures. Our experiments on the TREC chemical retrieval track dataset reveal that these two independent goals of information retrieval, maximizing retrievability of documents and maximizing effectiveness of retrieval models, are closely related to each other. This correlation provides an attractive alternative for evaluating, ranking or optimizing retrieval models’ effectiveness on a given corpus without requiring any ground truth (relevance judgments).


ACS/IEEE International Conference on Computer Systems and Applications | 2008

Mining fault tolerant frequent patterns using pattern growth approach

Shariq Bashir; Zahid Halim; Abdul Rauf Baig

Mining fault tolerant (FT) frequent patterns from transactional datasets is considerably more complex than mining all frequent patterns (itemsets), in terms of both search space exploration and support counting of candidate FT-patterns. Previous studies on mining FT frequent patterns adopt an Apriori-like candidate set generation-and-test approach, in which a number of dataset scans are needed to declare a candidate FT-pattern frequent: first to check its FT-pattern support, and then to check the support of the individual items in the FT-pattern, which depends on the cardinality of the pattern. Inspired by the pattern growth technique for mining frequent itemsets, in this paper we present a novel algorithm for mining FT frequent patterns using a pattern growth approach. Our algorithm stores the original transactional dataset in a highly condensed, much smaller data structure called the FT-FP-tree, and the FT-pattern support and item support of all FT-patterns are counted directly from the FT-FP-tree, without scanning the original dataset multiple times. Costly candidate set generation is avoided by generating conditional patterns from the FT-FP-tree. Our extensive experiments on benchmark datasets suggest that mining FT frequent patterns using our algorithm is highly efficient compared to the Apriori-like approach.


Expert Systems With Applications | 2012

Improving retrievability with improved cluster-based pseudo-relevance feedback selection

Shariq Bashir

High findability of documents within a certain cut-off rank is considered an important factor in recall-oriented application domains such as patent or legal document retrieval. Findability is hindered by two aspects, namely the inherent bias favoring some types of documents over others introduced by the retrieval model, and the failure to correctly capture and interpret the context of conventionally rather short queries. In this paper, we analyze the bias impact of different retrieval models and query expansion strategies. We furthermore propose a novel query expansion strategy based on document clustering to identify dominant relevant documents. This helps to overcome limitations of conventional query expansion strategies that suffer strongly from the noise introduced by imperfect initial query results when selecting documents for pseudo-relevance feedback. Experiments with different collections of patent documents suggest that clustering-based document selection for pseudo-relevance feedback is an effective approach for increasing the findability of individual documents and decreasing the bias of a retrieval system.

Collaboration


Dive into Shariq Bashir's collaborations.

Top Co-Authors

Andreas Rauber

Vienna University of Technology

A. Rauf Baig

National University of Computer and Emerging Sciences

Zahoor Jan

National University of Computer and Emerging Sciences

Atif Manzoor

Vienna University of Technology

Hong Linh Truong

Vienna University of Technology

Lukasz Juszczyk

Vienna University of Technology

Schahram Dustdar

Vienna University of Technology

Sadaf Khurshid

National University of Science and Technology

Anwar M. Mirza

National University of Computer and Emerging Sciences
