Tanveer J. Siddiqui | Researchain

Publication

Featured research published by Tanveer J. Siddiqui.

analytics for noisy unstructured text data | 2008

An unsupervised Hindi stemmer with heuristic improvements

Amaresh Kumar Pandey; Tanveer J. Siddiqui

Stemmers convert inflected words into their root or stem. The stem does not necessarily correspond to the linguistic root of a word. Stemming improves performance by reducing morphological variants to the same word. This paper focuses on the development of an unsupervised stemmer for Hindi and on the evaluation of the approach using manually segmented words. We evaluate our approach on 1000-1000 words randomly extracted from the Hindi WordNet database. The training data has been constructed by extracting 106403 words from the EMILLE corpus. The observed accuracy was 89.9% after applying some heuristic measures, and the F-score was 94.96%. As the algorithm does not require any language-specific information, it can be applied to other Indian languages as well. We also evaluate the effect of the stemmer in terms of reducing the size of the index for a Hindi information retrieval task. The results have been compared with a lightweight stemmer [10] and the UMass stemmer [17]. Test runs show that our stemmer outperforms both stemmers.

2012 International Conference on Information Retrieval & Knowledge Management | 2012

Evaluating effect of context window size, stemming and stop word removal on Hindi word sense disambiguation

Satyendr Singh; Tanveer J. Siddiqui

This paper investigates the effects of stemming, stop word removal and context window size on Hindi word sense disambiguation. The evaluation has been made on a manually created sense-tagged corpus consisting of Hindi words (nouns). The sense definitions have been obtained from Hindi WordNet, an important lexical resource for the Hindi language developed at IIT Bombay. The maximum observed precision of 54.81% on 1248 test instances corresponds to the case when both stemming and stop word elimination have been performed. Precision and recall improve by 9.24% and 12.68%, respectively, over the baseline performance.

international conference on information systems | 2009

Using syntactic and contextual information for sentiment polarity analysis

Shaishav Agrawal; Tanveer J. Siddiqui

A new method for sentiment polarity analysis is presented. The method first assigns scores to a sentence using SentiWordNet and then uses heuristics to handle context-dependent sentiment expressions. Instead of using the scores of all synsets of a word listed in SentiWordNet, we use only the scores of synsets with the same part of speech. Our method shows significant improvement over the baseline on a movie-review dataset.
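The part-of-speech filtering step described in this abstract can be illustrated with a minimal sketch. It does not reproduce the authors' method or the real SentiWordNet interface: the tiny `LEXICON`, its scores, the averaging rule, and the function names are all assumptions made for illustration, and the context-dependent heuristics are omitted.

```python
# Hypothetical miniature lexicon in the spirit of SentiWordNet:
# (word, part_of_speech) -> list of (positive_score, negative_score), one per synset.
# All entries and scores are invented for illustration.
LEXICON = {
    ("good", "a"): [(0.75, 0.0), (0.5, 0.0)],
    ("good", "n"): [(0.5, 0.0)],
    ("plot", "n"): [(0.0, 0.0)],
    ("plot", "v"): [(0.0, 0.25)],
    ("dull", "a"): [(0.0, 0.75), (0.0, 0.5)],
}


def word_score(word, pos):
    """Average (positive - negative) over synsets that share the word's POS tag."""
    synsets = LEXICON.get((word, pos), [])
    if not synsets:
        return 0.0  # unknown word or POS: treated as neutral (an assumption)
    return sum(p - n for p, n in synsets) / len(synsets)


def sentence_polarity(tagged_tokens):
    """Label a POS-tagged sentence by the sign of its summed word scores."""
    total = sum(word_score(word, pos) for word, pos in tagged_tokens)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(sentence_polarity([("good", "a"), ("plot", "n")]))
    print(sentence_polarity([("dull", "a"), ("plot", "n")]))
```

Looking up `("plot", "n")` rather than every synset of "plot" keeps the negative verb sense from leaking into the score of the noun, which is the point of restricting to the same part of speech.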

multimedia signal processing | 2011

DCT-domain robust data hiding using chaotic sequence

Siddharth Singh; Tanveer J. Siddiqui; Rajiv Singh; Harsh Vikram Singh

In this paper, we propose a DCT-domain robust data hiding algorithm using a chaotic sequence. The algorithm divides the cover into blocks of equal size and then embeds the watermark in the middle band of the DCT coefficients. Robustness and imperceptibility of the proposed algorithm are evaluated using the bit error rate (BER) and the peak signal-to-noise ratio (PSNR) between the cover image and the watermarked image. The algorithm is compared to a PN sequence based DCT algorithm. The proposed algorithm provides more robustness against several common image processing attacks, such as JPEG compression, low-pass filtering and addition of noise. In the case of JPEG compression attacks, for low quality compression (Q-60), more than 92% of the hidden data was recovered without any error, whereas the PN sequence based DCT algorithm recovered more than 89% of the hidden data without any error at the same quality.

multi disciplinary trends in artificial intelligence | 2013

Hindi Word Sense Disambiguation Using Semantic Relatedness Measure

Satyendr Singh; Vivek Singh; Tanveer J. Siddiqui

In this paper we propose and evaluate a method of Hindi word sense disambiguation that computes similarity based on semantics. We adapt an existing measure of semantic relatedness between two lexically expressed concepts of Hindi WordNet. This measure is based on the length of paths between noun concepts in an is-a hierarchy. Instead of relying on direct overlap, the algorithm uses the Hindi WordNet hierarchy to learn the semantics of words and exploits it in the disambiguation process. Evaluation is performed on a sense-tagged dataset consisting of 20 polysemous Hindi nouns. We obtained an overall average accuracy of 60.65% using this measure.
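A path-length measure over an is-a hierarchy can be sketched as follows. The abstract does not name the measure that was adapted, so the formula `1 / (1 + path length)`, the toy English hierarchy in `PARENT`, and all concept and function names below are assumptions for illustration rather than the authors' implementation or Hindi WordNet data.

```python
# Toy is-a hierarchy: child -> parent (None marks the root).
# Concept names are hypothetical stand-ins for WordNet noun concepts.
PARENT = {
    "entity": None,
    "institution": "entity",
    "financial_institution": "institution",
    "bank_financial": "financial_institution",
    "natural_object": "entity",
    "landform": "natural_object",
    "bank_river": "landform",
    "shore": "landform",
}


def ancestors(concept):
    """Return the chain [concept, parent, ..., root]."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def path_length(a, b):
    """Number of is-a edges on the path between a and b via their lowest common ancestor."""
    depth_in_b = {c: i for i, c in enumerate(ancestors(b))}
    for i, c in enumerate(ancestors(a)):
        if c in depth_in_b:
            return i + depth_in_b[c]
    raise ValueError("concepts share no ancestor")


def relatedness(a, b):
    """Assumed measure: shorter paths give higher relatedness, in (0, 1]."""
    return 1.0 / (1.0 + path_length(a, b))


def disambiguate(candidate_senses, context_concepts):
    """Pick the sense with the highest total relatedness to the context concepts."""
    return max(
        candidate_senses,
        key=lambda sense: sum(relatedness(sense, c) for c in context_concepts),
    )


if __name__ == "__main__":
    print(disambiguate(["bank_financial", "bank_river"], ["shore"]))
```

In this toy hierarchy the river sense of "bank" is two edges from "shore" while the financial sense is six edges away, so the river sense is chosen even though the sense definitions share no words.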

intelligent human computer interaction | 2012

Multi-document summarization using sentence clustering

Virendra Kumar Gupta; Tanveer J. Siddiqui

This paper presents an approach to query-focused multi-document summarization that combines single-document summaries using sentence clustering. Both syntactic and semantic similarity between sentences are used for clustering. Each single-document summary is generated using a document feature, a sentence reference index feature, a location feature and a concept similarity feature. Sentences from the single-document summaries are clustered, and the topmost sentences from each cluster are used to create the multi-document summary. We observed an average F-measure of 0.33774 on the DUC 2002 multi-document dataset, which is comparable to the three best-performing systems reported on the same dataset.

SIRS | 2016

Singular Value Decomposition Based Image Steganography Using Integer Wavelet Transform

Siddharth Singh; Rajiv Singh; Tanveer J. Siddiqui

Transform domain steganography techniques embed the secret message in significant areas of the cover image. These techniques are generally more robust against common image processing operations. In this paper, we propose an image steganography method using singular value decomposition (SVD) and integer wavelet transform (IWT). SVD and IWT strengthen the performance of image steganography and improve the perceptual quality of stego images. Results have been obtained on standard image data sets and compared with discrete cosine transform (DCT) and redundant discrete wavelet transform (RDWT) based image steganography methods using peak signal-to-noise ratio (PSNR) and correlation coefficient (CC) metrics. Experimental results show that the proposed SVD and IWT based method provides more robustness against image processing and geometric attacks, such as JPEG compression, low-pass filtering, median filtering, addition of noise, scaling, rotation, and histogram equalization.

international conference on power, control and embedded systems | 2012

Robust image steganography technique based on redundant discrete wavelet transform

Siddharth Singh; Tanveer J. Siddiqui

In this paper, a robust image steganography technique based on the redundant discrete wavelet transform is proposed. Steganography is the process of hiding one medium of communication, such as text, audio or an image, within another. The proposed steganography algorithm uses a blind recovery approach. We have tested the proposed method on different cover and payload images. Simulated results are analyzed using PSNR (Peak Signal-to-Noise Ratio) and BER (Bit Error Rate). Analysis of the proposed method shows its robustness against various signal processing and geometric attacks.
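The two evaluation metrics named in this abstract have standard definitions, sketched below in plain Python. This illustrates the metrics only, not the authors' evaluation code; the function names, the list-of-rows image representation, and the 8-bit peak value of 255 are assumptions.

```python
import math


def psnr(cover, stego, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-sized grayscale images.

    Images are lists of rows of pixel intensities; max_value is the peak
    intensity (255 assumes 8-bit images).
    """
    flat_cover = [p for row in cover for p in row]
    flat_stego = [p for row in stego for p in row]
    if not flat_cover or len(flat_cover) != len(flat_stego):
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(flat_cover, flat_stego)) / len(flat_cover)
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10 * math.log10(max_value ** 2 / mse)


def bit_error_rate(embedded_bits, recovered_bits):
    """Fraction of payload bits that differ after extraction."""
    if not embedded_bits or len(embedded_bits) != len(recovered_bits):
        raise ValueError("bit sequences must be non-empty and the same length")
    errors = sum(1 for a, b in zip(embedded_bits, recovered_bits) if a != b)
    return errors / len(embedded_bits)


if __name__ == "__main__":
    cover = [[10, 10], [200, 200]]
    stego = [[11, 9], [201, 199]]  # every pixel changed by one intensity level
    print(round(psnr(cover, stego), 2))
    print(bit_error_rate([1, 0, 1, 1], [1, 1, 1, 0]))
```

A higher PSNR indicates that the stego image is closer to the cover (better imperceptibility), while a lower BER after an attack indicates that more of the hidden payload survived (better robustness).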

International Journal of Information Technology and Decision Making | 2008

Utilizing local context for effective information retrieval

Tanveer J. Siddiqui; Uma Shanker Tiwary

Our research focuses on the use of local context through relation matching to improve retrieval effectiveness. An information retrieval (IR) model that integrates relation and keyword matching has been used in this work. The model takes advantage of any existing relational similarity between documents and the query to improve retrieval effectiveness. It ranks a document in which the query concepts are involved in relationships similar to those in the query higher than documents in which they are related differently. A conceptual graph (CG) representation has been used to capture relationships between concepts. A simplified form of graph matching has been used to keep our model computationally tractable. Structural variations have been captured during matching through simple heuristics. Four different CG similarity measures have been proposed and used to evaluate the performance of our model. We observed a maximum improvement of 7.37% in precision with the second CG similarity measure. The document collection used in this study is CACM-3204. The CG similarity measure we propose is simple, flexible and scalable, and can find application in many IR-related tasks such as information filtering, information extraction, question answering and document summarization.

intelligent systems design and applications | 2005

Integrating notion of agency and semantics in information retrieval: an intelligent multi-agent model

Tanveer J. Siddiqui

We present an intelligent information retrieval model based on the multi-agent paradigm and conceptual graphs. The growing amount of on-line information and its dynamic nature force us to reconsider existing passive approaches to information retrieval. Because of the ever-growing size of information sources, the burden of retrieving information cannot simply be left to users. We attempt to handle this problem through software agents. Our model makes use of a user modeling agent (UMA), a facilitator and integrator (FACINT) module and a number of retrieval agents. UMA is responsible for creating a user profile. Actual retrieval is done by the retrieval agents. FACINT is responsible for controlling and coordinating the activities of the various agents. It also performs the final ranking of documents based on their conceptual graph (CG) representation. The use of CGs brings semantics into relevance judgments, resulting in improved ranking. The proposed model is simple, efficient and scalable, and can work actively as well as passively.
