Publication


Featured research published by Utpal Kumar Sikdar.


Journal of Cheminformatics | 2015

The CHEMDNER corpus of chemicals and drugs and its annotation principles

Martin Krallinger; Obdulia Rabal; Florian Leitner; Miguel Vazquez; David Salgado; Zhiyong Lu; Robert Leaman; Yanan Lu; Donghong Ji; Daniel M. Lowe; Roger A. Sayle; Riza Theresa Batista-Navarro; Rafal Rak; Torsten Huber; Tim Rocktäschel; Sérgio Matos; David Campos; Buzhou Tang; Hua Xu; Tsendsuren Munkhdalai; Keun Ho Ryu; S. V. Ramanan; Senthil Nathan; Slavko Žitnik; Marko Bajec; Lutz Weber; Matthias Irmer; Saber A. Akhondi; Jan A. Kors; Shuo Xu

The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text were measured using an agreement study between annotators, obtaining a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/


Proceedings of the First Workshop on Abusive Language Online | 2017

Using Convolutional Neural Networks to Classify Hate-Speech

Björn Gambäck; Utpal Kumar Sikdar

The paper introduces a deep learning-based Twitter hate-speech text classification system. The classifier assigns each tweet to one of four predefined categories: racism, sexism, both (racism and sexism) and non-hate-speech. Four Convolutional Neural Network models were trained on, respectively, character 4-grams, word vectors based on semantic information built using word2vec, randomly generated word vectors, and word vectors combined with character n-grams. The feature set was down-sized in the networks by max-pooling, and a softmax function was used to classify tweets. Tested by 10-fold cross-validation, the model based on word2vec embeddings performed best, with higher precision than recall, and a 78.3% F-score.
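The convolution, max-pooling and softmax steps named in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' network: the filter shapes, the ReLU activation, the single dense layer and all function and parameter names (`conv1d_relu`, `classify`, `dense_weights`, and so on) are assumptions made for illustration, and the weights are toy values rather than trained ones.

```python
import math

# The four categories listed in the abstract.
CLASSES = ["racism", "sexism", "both", "non-hate-speech"]

def conv1d_relu(embeddings, kernel, bias=0.0):
    """Slide one filter over consecutive token vectors and apply ReLU (assumed activation)."""
    width = len(kernel)
    feature_map = []
    for start in range(len(embeddings) - width + 1):
        total = bias
        for offset in range(width):
            token = embeddings[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], token))
        feature_map.append(max(0.0, total))
    return feature_map

def max_pool(feature_map):
    """Max-over-time pooling: keep only the strongest activation of a filter."""
    return max(feature_map)

def softmax(logits):
    """Numerically stable softmax over class scores."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(embeddings, filters, dense_weights, dense_bias):
    """Convolve, pool each filter to one number, then score the four classes."""
    pooled = [max_pool(conv1d_relu(embeddings, k)) for k in filters]
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(dense_weights, dense_bias)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs

# Toy input: three tokens with 2-dimensional embeddings, two filters of width 2.
tweet = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [0.0, -1.0]]]
dense_weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [-1.0, 0.0]]
dense_bias = [0.0, 0.0, 0.0, 0.0]
label, probs = classify(tweet, filters, dense_weights, dense_bias)
```

Pooling reduces each filter's variable-length feature map to a single value, which is what lets tweets of different lengths feed a fixed-size softmax layer.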


SpringerPlus | 2013

Biomedical named entity extraction: some issues of corpus compatibilities

Asif Ekbal; Sriparna Saha; Utpal Kumar Sikdar

Background: Named Entity (NE) extraction is one of the most fundamental and important tasks in biomedical information extraction. It involves identification of certain entities from text and their classification into some predefined categories. In the biomedical community, there is yet no general consensus regarding named entity (NE) annotation; thus, it is very difficult to compare the existing systems due to corpus incompatibilities. For the same reason, we also cannot exploit the advantages of using different corpora together. In our present work we address the issues of corpus compatibilities, and use a single objective optimization (SOO) based classifier ensemble technique that uses the search capability of a genetic algorithm (GA) for NE extraction in biomedicine. We hypothesize that the reliability of predictions of each classifier differs among the various output classes. We use Conditional Random Field (CRF) and Support Vector Machine (SVM) frameworks to build a number of models depending upon the various representations of the set of features and/or feature templates. Note that we tried to extract the features without using any deep domain knowledge and/or resources. Results: In order to assess the challenges of corpus compatibilities, we experiment with the different benchmark datasets and their various combinations. Comparison with the existing approaches shows the efficacy of the technique. The GA based ensemble achieves around 2% performance improvement over the individual classifiers. Degradation in performance on the integrated corpus clearly shows the difficulties of the task. Conclusions: In summary, our ensemble based approach attains state-of-the-art performance levels for entity extraction in three different kinds of biomedical datasets. The possible reasons behind the better performance of our approach are (i) the use of a rich variety of features, as described in Subsection “Features for named entity extraction”; and (ii) the use of a GA based classifier ensemble technique to combine the outputs of multiple classifiers.


International Journal of Machine Learning and Cybernetics | 2016

On active annotation for named entity recognition

Asif Ekbal; Sriparna Saha; Utpal Kumar Sikdar

A major constraint of machine learning techniques for solving several information extraction problems is the availability of a sufficient amount of training examples, which involve huge costs and efforts to prepare. Active learning techniques select informative instances from the unlabeled data and add them to the training set in such a way that the overall classification performance improves. In the random sampling approach, unlabeled data is selected for annotation at random and thus cannot yield the desired results. In contrast, active learning selects the useful data from a huge pool of unlabeled documents. The strategies used often assign instances to incorrect classes. The classifier is confused between two classes if the test instance is located near the margin. We propose two methods for active learning, and show that these techniques result in improved performance. The first approach is based on support vector machine (SVM), whereas the second one is based on ensemble learning that utilizes the classification capabilities of two well-known classifiers, namely SVM and conditional random field. The motivation for using these classifiers is that they are orthogonal in nature, and thereby a combination of them can produce better results. In order to show the efficacy of the proposed approach we choose a crucial problem, namely named entity recognition (NER) in three languages, namely Bengali, Hindi and English. The approach is also evaluated for NER in the biomedical domain. Evaluation results reveal that the proposed techniques indeed show considerable performance improvements.
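The remark that a classifier is least certain about instances near the margin suggests a simple selection rule, sketched below. This is a generic margin-based uncertainty sampling illustration, not the selection criterion of the paper, which the abstract does not spell out. It assumes each unlabeled instance already has a signed decision score (for example, a distance to an SVM hyperplane); the function name `select_for_annotation` and the scores are hypothetical.

```python
def select_for_annotation(decision_scores, batch_size):
    """Return indices of the unlabeled instances closest to the decision boundary.

    decision_scores: signed scores, one per unlabeled instance (assumed input).
    batch_size: how many instances to send to the annotator in this round.
    """
    ranked = sorted(range(len(decision_scores)),
                    key=lambda i: abs(decision_scores[i]))
    return ranked[:batch_size]

# Toy scores: instances 4 and 1 lie nearest to the boundary at 0.
scores = [2.3, -0.1, 0.8, -1.7, 0.05]
picked = select_for_annotation(scores, 2)
print(picked)  # [4, 1]
```

Random sampling would pick any two indices with equal probability, whereas this rule spends the annotation budget on the instances the current model finds hardest to separate.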


Proceedings of the Workshop on Noisy User-generated Text | 2015

IITP: Multiobjective Differential Evolution based Twitter Named Entity Recognition

Shad Akhtar; Utpal Kumar Sikdar; Asif Ekbal

In this paper we propose a differential evolution (DE) based named entity recognition (NER) system for Twitter data. In the first step, we develop various NER systems using different combinations of the features. We implemented these features without using any domain-specific features and/or resources. As a base classifier we use Conditional Random Field (CRF). In the second step, we propose a DE based feature selection approach to determine the most relevant set of features and its context information. The optimized feature set applied to the training set yields precision, recall and F-measure values of 60.68%, 29.65% and 39.84%, respectively, for the fine-grained named entity (NE) types. When we consider only the coarse-grained NE types, it shows precision, recall and F-measure values of 63.43%, 51.44% and 56.81%, respectively.
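As a quick consistency check on these figures, the F-measure can be recomputed from the reported precision and recall. The sketch below assumes the balanced F-measure (harmonic mean, beta = 1), which the abstract does not state explicitly but which reproduces the reported values; the function name `f_measure` is chosen here for illustration.

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure from precision and recall; beta=1 gives the harmonic mean."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

fine = f_measure(60.68, 29.65)    # fine-grained NE types
coarse = f_measure(63.43, 51.44)  # coarse-grained NE types
print(f"{fine:.2f} {coarse:.2f}")  # 39.84 56.81
```

The low fine-grained score follows directly from the harmonic mean: with recall under 30%, even a precision above 60% cannot lift the F-measure much past the recall.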


Proceedings of the Workshop on Noisy User-generated Text | 2015

IITP: Hybrid Approach for Text Normalization in Twitter

Shad Akhtar; Utpal Kumar Sikdar; Asif Ekbal

In this paper we report our work on the normalization of noisy text in Twitter data. The method we propose is hybrid in nature, combining machine learning with rules. In the first step, a supervised approach based on conditional random field is developed, and in the second step a set of heuristic rules is applied to the candidate wordforms for the normalization. The classifier is trained with a set of features which are derived without the use of any domain-specific feature and/or resource. The overall system yields precision, recall and F-measure values of 90.26%, 71.91% and 80.05%, respectively, for the test dataset.


International Conference on Computational Linguistics | 2014

Modified Differential Evolution for Biochemical Name Recognizer

Utpal Kumar Sikdar; Asif Ekbal; Sriparna Saha

In this paper we propose modified differential evolution (MDE) based feature selection and ensemble learning algorithms for a biochemical entity recognizer. Identification and classification of chemical entities are relatively more complex and challenging compared to the other related tasks. As chemical entities we focus on IUPAC and IUPAC related entities. The algorithm performs feature selection within the framework of a robust machine learning algorithm, namely Conditional Random Field (CRF). Features are identified and implemented mostly without using any domain specific knowledge and/or resources. In this paper we modify traditional differential evolution to perform two tasks, viz. determining a relevant set of features as well as determining proper voting weights for constructing an ensemble. The feature selection technique produces a set of potential solutions in the final population. We develop many CRF models using these feature combinations. In order to further improve the performance, the outputs of these classifiers are combined using a classifier ensemble technique based on modified DE. Our experiments with the benchmark datasets yield recall, precision and F-measure values of 82.34%, 88.26% and 85.20%, respectively.
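For orientation, the sketch below shows how traditional differential evolution can drive feature selection. It is the classic DE/rand/1/bin scheme, not the modified variant of the paper, whose details the abstract does not give. Real-valued vectors are thresholded at 0.5 to obtain a feature mask, and the fitness is a toy stand-in: in a setting like the one described, it would be something like the F-measure of a CRF trained with the selected features. All names (`differential_evolution`, `to_mask`, `toy_fitness`, `USEFUL`) and parameter values are assumptions.

```python
import random

def to_mask(vector, threshold=0.5):
    """Decode a real-valued vector into an on/off feature mask (assumed encoding)."""
    return [v >= threshold for v in vector]

def differential_evolution(fitness, dim, pop_size=20, generations=50,
                           f=0.5, cr=0.9, seed=0):
    """Classic DE/rand/1/bin maximizing `fitness` over vectors in [0, 1]^dim."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == forced:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(1.0, max(0.0, value)))  # clip to [0, 1]
                else:
                    trial.append(pop[i][d])
            trial_score = fitness(trial)
            if trial_score >= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

# Toy fitness: reward three "useful" feature indices, penalize every extra one.
USEFUL = {0, 2, 5}

def toy_fitness(vector):
    mask = to_mask(vector)
    hits = sum(1 for i in USEFUL if mask[i])
    extras = sum(1 for i, on in enumerate(mask) if on and i not in USEFUL)
    return hits - 0.5 * extras

best_vector, best_score = differential_evolution(toy_fitness, dim=8)
print(to_mask(best_vector), best_score)
```

Because every individual in the final population encodes a feature subset, the population as a whole offers several candidate feature combinations, which is what makes it natural to train one model per candidate and combine them in an ensemble.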


Data Mining in Bioinformatics | 2015

Named entity recognition and classification in biomedical text using classifier ensemble

Sriparna Saha; Asif Ekbal; Utpal Kumar Sikdar

Named Entity Recognition and Classification (NERC) is an important task in information extraction for the biomedical domain. Biomedical Named Entities include mentions of proteins, genes, DNA, RNA, etc. which, in general, have complex structures and are difficult to recognise. In this paper, we propose a Single Objective Optimisation based classifier ensemble technique using the search capability of Genetic Algorithm (GA) for NERC in biomedical texts. Here, GA is used to quantify the amount of voting for each class in each classifier. We use diverse classification methods like Conditional Random Field and Support Vector Machine to build a number of models depending upon the various representations of the set of features and/or feature templates. The proposed technique is evaluated with two benchmark datasets, namely JNLPBA 2004 and GENETAG. Experiments yield overall F-measure values of 75.97% and 95.90%, respectively. Comparisons with the existing systems show that our proposed system achieves state-of-the-art performance.
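The idea of a voting weight for each class in each classifier can be made concrete with a small sketch of the combination step alone. The GA search that would find the weights is not shown, and the classifier names, labels and weight values below are invented for illustration; only the per-classifier, per-class weighting comes from the abstract.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine one token's predictions using per-classifier, per-class weights.

    predictions: {classifier_name: predicted_label}
    weights: {classifier_name: {label: voting_weight}} (hypothetical values;
             in the described approach a GA would search for them)
    """
    totals = defaultdict(float)
    for clf, label in predictions.items():
        totals[label] += weights[clf].get(label, 0.0)
    return max(totals, key=totals.get)

# Toy example: two models say "O", but the first CRF is trusted more on proteins.
predictions = {"crf_a": "B-protein", "crf_b": "O", "svm_a": "O"}
weights = {
    "crf_a": {"B-protein": 0.9, "O": 0.4},
    "crf_b": {"B-protein": 0.5, "O": 0.3},
    "svm_a": {"B-protein": 0.6, "O": 0.5},
}
final_label = weighted_vote(predictions, weights)
print(final_label)  # B-protein
```

A plain majority vote would output "O" here; class-specific weights let a classifier that is reliable for one entity type override a majority that is not.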


Advances in Computing and Communications | 2013

Ensemble based active annotation for biomedical named entity recognition

Mridula Verma; Utpal Kumar Sikdar; Sriparna Saha; Asif Ekbal

Active Learning is an important area of machine learning for information extraction that addresses the high cost of collecting labeled examples. It makes more efficient use of the annotators' time by asking them to label only the instances that are most useful for training. We propose a novel method for solving this problem and show that it results in improved performance. Our proposed framework is based on an ensemble approach, where Decision Tree and Memory-based Learner are used as the base learners. The proposed approach is applied to the problem of named entity recognition (NER) in the biomedical domain. Results show that the proposed technique indeed improves the performance of the system significantly.


International Conference on Tools with Artificial Intelligence | 2010

A Genetic Approach for Biomedical Named Entity Recognition

Asif Ekbal; Sriparna Saha; Utpal Kumar Sikdar; Mohammed Hasanuzzaman

In this paper, we report a classifier ensemble technique using the search capability of genetic algorithm (GA) for Named Entity Recognition (NER) in the biomedical domain. We use the Maximum Entropy (ME) framework to build a number of classifiers depending upon the various representations of a set of features. The proposed technique is evaluated with the JNLPBA 2004 data sets, yielding overall recall, precision and F-measure values of 67.98%, 71.68% and 69.78%, respectively.

Collaboration

Top Co-Authors

Asif Ekbal
Indian Institute of Technology Patna

Sriparna Saha
Indian Institute of Technology Patna

Björn Gambäck
Norwegian University of Science and Technology

Shad Akhtar
Indian Institute of Technology Patna

Biswanath Barik
Norwegian University of Science and Technology

Rafal Rak
University of Manchester