Utpal Sharma | Researchain

Publication

Featured research published by Utpal Sharma.

international conference on communication computing security | 2011

Detection of HTTP flooding attacks in multiple scenarios

Debasish Das; Utpal Sharma; Dhruba K. Bhattacharyya

The HTTP GET flooding attack is considered one of the most successful Application Layer Denial of Service (App-DoS) attacks. Detecting such attacks is difficult due to their non-intrusive nature. This paper presents an effective method for detecting such App-DoS attacks in three different scenarios. The proposed method was tested on three real-life datasets, i.e. KDD99, LBNL and our own dataset, and was found to perform satisfactorily.

meeting of the association for computational linguistics | 2002

Unsupervised Learning of Morphology for Building Lexicon for a Highly Inflectional Language

Utpal Sharma; Jugal K. Kalita; Rajib K. Das

Words play a crucial role in aspects of natural language understanding such as syntactic and semantic processing. Usually, a natural language understanding system either already knows the words that appear in the text, or is able to automatically learn relevant information about a word upon encountering it. Usually, a capable system, human or machine, knows a subset of the entire vocabulary of a language and morphological rules to determine attributes of words not seen before. Developing a knowledge base of legal words and morphological rules is an important task in computational linguistics. In this paper, we describe initial experiments following an approach based on unsupervised learning of morphology from a text corpus especially developed for this purpose. It is a method for conveniently creating a dictionary and a morphology rule base, and is especially suitable for highly inflectional languages like Assamese. Assamese is a major Indian language of the Indic branch of the Indo-European family of languages. It is used by around 15 million people.

meeting of the association for computational linguistics | 2009

Part of Speech Tagger for Assamese Text

Navanath Saharia; Dhrubajyoti Das; Utpal Sharma; Jugal K. Kalita

Assamese is a morphologically rich, agglutinative and relatively free word order Indic language. Although spoken by nearly 30 million people, very little computational linguistic work has been done for this language. In this paper, we present our work on part of speech (POS) tagging for Assamese using the well-known Hidden Markov Model. Since no well-defined suitable tagset was available, we develop a tagset of 172 tags in consultation with experts in linguistics. For successful tagging, we examine relevant linguistic issues in Assamese. For unknown words, we perform simple morphological analysis to determine probable tags. Using a manually tagged corpus of about 10000 words for training, we obtain a tagging accuracy of nearly 87% for test inputs.
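
As a rough illustration of how an HMM tagger assigns tags, the sketch below runs Viterbi decoding over a toy bigram HMM. It is not the authors' implementation: the abstract does not name the decoding algorithm, and the two tags, the English words and all probabilities are hypothetical values chosen for the example. Unknown words get a small floor probability here, whereas the paper uses morphological analysis for them.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence for `words` under a bigram HMM."""
    if not words:
        return []
    # best[i][t] = (log-probability of the best path ending in tag t at word i,
    #               previous tag on that path)
    best = [{}]
    for t in tags:
        e = emit_p[t].get(words[0], unk)  # floor probability for unseen words
        best[0][t] = (math.log(start_p[t]) + math.log(e), None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = emit_p[t].get(words[i], unk)
            best[i][t] = max(
                (best[i - 1][p][0] + math.log(trans_p[p][t]) + math.log(e), p)
                for p in tags
            )
    # Backtrack from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Hypothetical toy model: two tags (N = noun, V = verb), made-up probabilities.
TAGS = ["N", "V"]
START_P = {"N": 0.8, "V": 0.2}
TRANS_P = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT_P = {
    "N": {"dogs": 0.5, "fish": 0.4, "bark": 0.1},
    "V": {"bark": 0.6, "fish": 0.4},
}

print(viterbi(["dogs", "bark"], TAGS, START_P, TRANS_P, EMIT_P))
```

In a real tagger the start, transition and emission probabilities would be estimated from a manually tagged corpus such as the roughly 10000-word training set mentioned above.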

ACM Transactions on Asian Language Information Processing | 2008

Acquisition of Morphology of an Indic Language from Text Corpus

Utpal Sharma; Jugal K. Kalita; Rajib K. Das

This article describes an approach to unsupervised learning of morphology from an unannotated corpus for a highly inflectional Indo-European language called Assamese spoken by about 30 million people. Although Assamese is one of India's national languages, it utterly lacks computational linguistic resources. There exists no prior computational work on this language spoken widely in northeast India. The work presented is pioneering in this respect. In this article, we discuss salient issues in Assamese morphology where the presence of a large number of suffixal determiners, sandhi, samas, and the propensity to use suffix sequences make approximately 50% of the words used in written and spoken text inflected. We implement methods proposed by Gaussier and Goldsmith on acquisition of morphological knowledge, and obtain F-measure performance below 60%. This motivates us to present a method more suitable for handling suffix sequences, enabling us to increase the F-measure performance of morphology acquisition to almost 70%. We describe how we build a morphological dictionary for Assamese from the text corpus. Using the morphological knowledge acquired and the morphological dictionary, we are able to process small chunks of data at a time as well as a large corpus. We achieve approximately 85% precision and recall during the analysis of small chunks of coherent text.
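
For reference, the F-measure is commonly computed as the harmonic mean of precision $P$ and recall $R$. The balanced form below is an assumption, since the abstract does not say which weighting was used.

```latex
F = \frac{2PR}{P + R},
\qquad \text{e.g. } P = R = 0.85 \;\Rightarrow\; F = \frac{2 \cdot 0.85 \cdot 0.85}{0.85 + 0.85} = 0.85
```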

advances in computing and communications | 2012

Analysis and evaluation of stemming algorithms: a case study with Assamese

Navanath Saharia; Utpal Sharma; Jugal K. Kalita

Stemming is the process of automatically extracting the base form of a given word of a language. Assamese is a morphologically rich, relatively free word order, Indo-Aryan language spoken in the North-Eastern part of India that uses the Assamese-Bengali script for writing. As it is among the less computationally studied languages, our aim is to extract the stem from a given word. We adopt the suffix stripping approach along with a rule engine that generates all the possible suffix sequences. We obtained 82% accuracy with the suffix stripping approach after adding a root-word list of approximately 20,000 entries.
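
A minimal sketch of suffix stripping with generated suffix sequences and a root-word list might look like the following. It is only an illustration under stated assumptions: the romanized suffixes and roots are toy data, the function names are hypothetical, and the rule of stripping the longest sequence that leaves a listed root is one plausible design, not necessarily the one used in the paper.

```python
from itertools import permutations

# Hypothetical toy data: romanized suffixes and roots, for illustration only.
SUFFIXES = ["bor", "jon", "ok", "e"]
ROOTS = {"manuh", "kitap"}

def suffix_sequences(suffixes, max_len=2):
    """Generate every concatenation of up to `max_len` distinct suffixes."""
    seqs = set()
    for n in range(1, max_len + 1):
        for combo in permutations(suffixes, n):
            seqs.add("".join(combo))
    # Longest first, so the longest matching sequence is tried first.
    return sorted(seqs, key=len, reverse=True)

def stem(word, sequences, roots):
    """Strip the longest suffix sequence that leaves a word in the root list."""
    if word in roots:
        return word
    for seq in sequences:
        if word.endswith(seq) and word[: -len(seq)] in roots:
            return word[: -len(seq)]
    return word  # No rule applies: leave the word unchanged.

SEQUENCES = suffix_sequences(SUFFIXES)
print(stem("manuhborok", SEQUENCES, ROOTS))  # strips the sequence "bor" + "ok"
```

Checking candidates against the root list guards against over-stemming: a sequence is stripped only when what remains is a known root.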

International Journal of Computer Applications | 2010

An Approach to Detection of SQL Injection Vulnerabilities Based on Dynamic Query Matching

Debasish Das; Utpal Sharma; D.K. Bhattacharyya

The Web is one of the most popular internet services in today’s world. Web servers and web-based applications are popular corporate applications and have become targets for attackers. A large number of web applications, especially those deployed by companies for e-business operations, require high reliability, efficiency and confidentiality. Such applications are written in scripting languages like PHP embedded in HTML, which allow them to establish connections to databases, retrieve data and place it on a WWW site. In order to detect known attacks, misuse detection of web-based attacks relies on attack rules and descriptions. Misuse detection considers predefined signatures for intrusion detection. One of the most common web application attacks is SQL injection. Here an attacker supplies faulty input strings so that the dynamic queries generated by the web application change the structure designed by the developer. Thus, the generated SQL queries become maliciously crafted queries. In this paper we have tried to classify SQL injection attacks based on the vulnerabilities they exploit in web applications. We have also reported the approaches implemented in recent years by some researchers in their methodologies for detection of and protection against SQL injection attacks. Our technique of classification has avoided

international conference on emerging technologies | 2008

An intrusion detection mechanism based on feature based data clustering

Debasish Das; Utpal Sharma; Dhruba K. Bhattacharyya

Recently, clustering methods have gained importance in addressing network security issues, including network intrusion detection. In clustering, unsupervised anomaly detection has great utility within the context of intrusion detection systems. Such a system can work without the need for massive sets of pre-labeled training data. An intrusion detection system (IDS) aims to identify attacks with a high detection rate and a low false alarm rate. This paper presents a scheme to achieve this goal. The scheme is based on unsupervised clustering and a labeling technique. The technique has been found to perform with high precision at a low false alarm rate on the KDD99 dataset.

international conference on computational linguistics | 2013

An improved stemming approach using HMM for a highly inflectional language

Navanath Saharia; Kishori M. Konwar; Utpal Sharma; Jugal K. Kalita

Stemming is a common method for morphological normalization of natural language texts. Modern information retrieval systems rely on such normalization techniques for automatic document processing tasks. High quality stemming is difficult in highly inflectional Indic languages. Little research has been performed on designing algorithms for stemming of texts in Indic languages. In this study, we focus on the problem of stemming texts in Assamese, a low resource Indic language spoken in the North-Eastern part of India by approximately 30 million people. Stemming is hard in Assamese due to the common appearance of single letter suffixes as morphological inflections. More than 50% of the inflections in Assamese appear as single letter suffixes. Such single letter morphological inflections cause ambiguity when predicting the underlying root word. Therefore, we propose a new method that combines a rule based algorithm for predicting multiple letter suffixes and an HMM based algorithm for predicting the single letter suffixes. The combined approach can predict morphologically inflected words with 92% accuracy.

ACM Transactions on Asian Language Information Processing | 2014

Stemming resource-poor Indian languages

Navanath Saharia; Utpal Sharma; Jugal K. Kalita

Stemming is a basic method for morphological normalization of natural language texts. In this study, we focus on the problem of stemming several resource-poor languages from Eastern India, viz., Assamese, Bengali, Bishnupriya Manipuri and Bodo. While Assamese, Bengali and Bishnupriya Manipuri are Indo-Aryan, Bodo is a Tibeto-Burman language. We design a rule-based approach to remove suffixes from words. To reduce over-stemming and under-stemming errors, we introduce a dictionary of frequent words. We observe that, for these languages, a dominant share of suffixes are single letters, which creates problems during suffix stripping. As a result, we introduce an HMM-based hybrid approach to classify the mismatched last character. For each word, the stem is extracted by calculating the most probable path in four HMM states. At each step we measure the stemming accuracy for each language. We obtain 94% accuracy for Assamese and Bengali, and 87% and 82% for Bishnupriya Manipuri and Bodo, respectively, using the hybrid approach. We compare our work with Morfessor [Creutz and Lagus 2005]. As of now, there is no reported work on stemming for Bishnupriya Manipuri and Bodo. Our results on Assamese and Bengali show significant improvement over prior published work [Sarkar and Bandyopadhyay 2008; Sharma et al. 2002, 2003].

national conference computational intelligence | 2012

Suffix stripping based NER in Assamese for location names

Padmaja Sharma; Utpal Sharma; Jugal K. Kalita

Named Entity Recognition (NER) is the process of identifying and classifying proper nouns in text documents into pre-defined classes such as person, location and organization. It plays an important role in Natural Language Processing applications. Although NER in Indian languages is a difficult and challenging task and suffers from scarcity of resources, such work has started to appear recently. In highly inflectional languages such as Assamese, NER requires identification of the root forms of words that occur in texts. Our work reports a suffix stripping approach to identify those roots of words which are location named entities.
