M. Anand Kumar
Amrita Vishwa Vidyapeetham
Publication
Featured research published by M. Anand Kumar.
advances in recent technologies in communication and computing | 2009
V. Dhanalakshmi; M. Anand Kumar; R. U. Rekha; C. Arun Kumar; K. P. Soman; S. Rajendran
This paper presents a morphological analyzer for complex agglutinative natural languages, built using a machine learning approach. Morphological analysis is concerned with retrieving the structure, the syntactic and morphological properties, or the meaning of a morphologically complex word. The morphological structure of an agglutinative language is unique, and capturing its complexity in a machine-analyzable and generatable format is challenging. Morphological analyzers are generally built using rule-based approaches, but in such approaches what works in the forward direction may not work in the backward direction. This new machine learning approach, based on sequence labeling and training by kernel methods, captures the non-linear relationships among the different aspects of the morphological features of natural languages in a better and simpler way. The overall accuracy obtained for Tamil, a morphologically rich agglutinative language, was encouraging.
advances in recent technologies in communication and computing | 2009
V. Dhanalakshmi; P. Padmavathy; M. Anand Kumar; K. P. Soman; S. Rajendran
This paper presents a chunker for Tamil built using machine learning techniques. Chunking is the task of identifying and segmenting text into syntactically correlated word groups. Chunking is performed with machine learning techniques, in which the linguistic knowledge is automatically extracted from an annotated corpus. We developed our own tagset for annotating the corpus, which is used for training and testing the POS tagger generator and the chunker. The present tagset consists of thirty tags for POS and nine tags for chunking. A corpus of two hundred and twenty-five thousand words was used for training and testing the accuracy of the chunker. We found that CRF++ gives the most encouraging results for the Tamil chunker.
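As a rough illustration of chunking as per-token labeling, the sketch below groups tokens into chunks from BIO-style labels, the kind of output a sequence labeler such as a CRF typically produces. The function name `bio_to_chunks`, the English example sentence, and the `NP`/`VP` labels are assumptions for illustration only; they are not the paper's tagset or data.

```python
from typing import List, Tuple


def bio_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, List[str]]]:
    """Group tokens into (chunk_type, tokens) pairs from BIO labels such as B-NP, I-NP, O."""
    chunks: List[Tuple[str, List[str]]] = []
    current_type = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            prefix, label = "O", None
        else:
            prefix, _, label = tag.partition("-")
        # A chunk starts at every B- tag, or at an I- tag whose type differs
        # from the chunk currently being built.
        starts_new = prefix == "B" or (prefix == "I" and label != current_type)
        if prefix == "O" or starts_new:
            if current_tokens:
                chunks.append((current_type, current_tokens))
            current_type, current_tokens = None, []
        if prefix != "O":
            if not current_tokens:
                current_type = label
            current_tokens.append(token)
    if current_tokens:
        chunks.append((current_type, current_tokens))
    return chunks


# Hypothetical labeled sentence (illustrative labels, not the paper's tagset).
tokens = ["The", "small", "dog", "chased", "a", "cat"]
tags = ["B-NP", "I-NP", "I-NP", "B-VP", "B-NP", "I-NP"]
chunks = bio_to_chunks(tokens, tags)
print(chunks)
```

In a trained system the `tags` list would come from the sequence labeler rather than being written by hand; the grouping step stays the same.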
international conference on data engineering | 2010
V. P. Abeera; S. Aparna; R. U. Rekha; M. Anand Kumar; V. Dhanalakshmi; K. P. Soman; S. Rajendran
An efficient and reliable method for implementing a morphological analyzer for Malayalam using a machine learning approach is presented here. A morphological analyzer segments words into morphemes and analyzes word formation. Morphemes are the smallest meaning-bearing units in a language. Morphological analysis is one of the techniques used in formal reading and writing. Rule-based approaches are generally used for building morphological analyzers. The disadvantage of rule-based approaches is that if one rule fails, it affects all the rules that follow, because each rule works on the output of the previous rule. The significance of the machine learning approach arises from the fact that rules are learned automatically from data: learning and classification algorithms are used to learn models and make predictions. The results show that the system is very effective and that, after learning, it predicts correct grammatical features even for words that are not in the training set.
International Joint Conference on Advances in Signal Processing and Information Technology – SPIT.2011 | 2011
S. Keerthana; V. Dhanalakshmi; M. Anand Kumar; V. P. Ajith; K. P. Soman
Transliteration is the process of replacing the characters in one language with the corresponding phonetically equivalent characters of another language. India is a linguistically diverse country where people speak and understand many languages but do not know the script of some of these languages. Transliteration plays a major role in such cases. Transliteration has been a supporting tool in machine translation and cross-language information retrieval systems, as most proper nouns are out-of-vocabulary words. In this paper, a sequence learning method for transliterating named entities from Tamil to Hindi is proposed. The accuracy obtained through this approach is encouraging. This transliteration system can be embedded in a Tamil-to-Hindi machine translation system in the future.
International Joint Conference on Advances in Signal Processing and Information Technology | 2011
R. Dhivya; V. Dhanalakshmi; M. Anand Kumar; K. P. Soman
Clause boundary identification is a very important task in natural language processing. Identifying the clauses in a sentence becomes difficult if clauses are embedded inside other clauses. In our approach, we use a dependency parser to identify clause boundaries. The dependency tag set contains 11 tags and is useful for identifying clause boundaries as well as the subject and object information of the sentence. The MALT parser is used to get the required information about the sentence.
international conference on technology for education | 2010
Velliangiri Dhanalakshmi; M. Anand Kumar; R. U. Rekha; K. P. Soman; S. Rajendran
Grammar plays an important role in good communication. Learning the grammar rules of the Tamil language is very difficult, as it has a very rich, agglutinative morphological structure. Students get frustrated with the language rules and the old teaching methodology. Computer-assisted grammar teaching tools help students learn faster and better. NLP applications are used to generate such tools for curriculum enhancement. In this paper we present grammar teaching tools at the sentence and word analysis levels for the Tamil language. Tools such as a part-of-speech tagger, a chunker, and a dependency parser for sentence-level analysis, and a morphological analyzer and generator for word-level analysis, were developed using machine learning based technology. These tools were very useful for second language learners to understand word and sentence construction in a non-conceptual way. A user interface was developed for practical use of the tools.
2nd IC3T International Conference on Computer and Communication Technologies | 2016
M. Jagadeesh; M. Anand Kumar; K. P. Soman
Indian languages have very few linguistic resources, though they have a large speaker base. They are very rich in morphology, which makes sequential tagging or any other type of language analysis very difficult. In natural language processing, part-of-speech (POS) tagging is the basic tool with which it is possible to extract terminology using linguistic patterns. The main aim of this research is to perform sequential tagging for Indian languages based on unsupervised features and the distributional information of a word with its neighboring words. The results of machine learning algorithms depend on the data representation. Not all of the data contribute to the creation of the model, so some of it goes unused, and this depends on the descriptive factors of data disparity. Data representations are designed using domain-specific knowledge, but the aim of Artificial Intelligence is to reduce these domain-dependent representations so that it can be applied to new domains. Recently, deep learning algorithms have attracted substantial interest for reducing the dimension of features or extracting latent features. Recent developments and applications of deep learning algorithms are giving impressive results in several areas, mostly in image and text applications.
national conference on communications | 2017
K. Manjusha; M. Anand Kumar; K. P. Soman
Feature extraction is the process of mapping an input signal to an informative representation that can easily be handled by classifier systems to build decision boundaries between the participating pattern classes. The scattering representation builds an invariant signal representation by applying a cascade of wavelet decompositions and complex modulus operations, followed by low-pass filtering. The objective of this paper is to analyze the performance of the scattering representation in Malayalam character recognition. Malayalam character recognizers built from image pixel features and from features extracted by a scattering network are tested on real-world document images. A softmax regression classifier is used for building the classification models. The recognition system based on the scattering representation achieved a 2% increase in recognition accuracy compared to features based on image pixel values.
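The cascade described above can be written out in the standard form of the scattering transform. This is a generic sketch, not notation taken from the paper: $x$ is the input signal (for character images, $t$ would be a 2-D pixel position), $\psi_{\lambda}$ is a wavelet at scale and orientation $\lambda$, $\phi_J$ is the low-pass averaging filter, and $*$ denotes convolution.

```latex
\begin{aligned}
S_0 x(t) &= (x * \phi_J)(t) \\
S_1 x(t, \lambda_1) &= \bigl(\lvert x * \psi_{\lambda_1} \rvert * \phi_J\bigr)(t) \\
S_2 x(t, \lambda_1, \lambda_2) &= \bigl(\bigl\lvert \lvert x * \psi_{\lambda_1} \rvert * \psi_{\lambda_2} \bigr\rvert * \phi_J\bigr)(t)
\end{aligned}
```

Each order applies one more wavelet decomposition and complex modulus before the final low-pass filtering; the collected coefficients form the feature vector passed to the classifier.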
international conference on smart technologies and management for computing communication controls energy and materials | 2015
Gowri Prasad; K.K. Fousiya; M. Anand Kumar; K. P. Soman
Named Entity Recognition is an important application area of Natural Language Processing. It is the process of identifying the designators present in a sentence, which are called named entities. Named Entity Recognition can be performed using rule-based approaches, machine learning based approaches, and hybrid approaches. This paper proposes a method for Named Entity Recognition in Malayalam using a supervised machine learning approach, Conditional Random Fields.
international conference on mining intelligence and knowledge exploration | 2015
S. Sachin Kumar; B. Premjith; M. Anand Kumar; K. P. Soman
The present work was done as part of the shared task on Sentiment Analysis in Indian Languages (SAIL) 2015, under the constrained category. The task is to classify Twitter data into three polarity categories: positive, negative, and neutral. For training, Twitter datasets in three languages were provided: Hindi, Bengali, and Tamil. In this shared task, ours was the only team that participated in all three languages. Each dataset contained three separate categories of Twitter data, namely positive, negative, and neutral. The proposed method used binary features, statistical features generated from SentiWordNet, and a word-presence binary feature. Because the generated features are sparse, the input features were mapped to a random Fourier feature space to obtain a separation, and linear classification was performed using the regularized least squares method. The proposed method identified more negative tweets in the test data provided for Hindi and Bengali. In the test tweets for Tamil, positive tweets were identified more often than the other two polarity categories. Due to the lack of language-specific and sentiment-oriented features, fewer neutral tweets were identified, which also caused misclassifications in all three polarity categories. This motivates us to take our research in this area forward with the proposed method.
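The combination of a random Fourier feature map with regularized least squares classification can be sketched in a few lines. This is a minimal standard-library illustration under stated assumptions, not the authors' implementation: the RBF kernel choice, the helper names (`make_rff`, `solve`, `fit_rls`, `predict`), the hyperparameters, and the toy 2-D data standing in for three polarity classes are all hypothetical.

```python
import math
import random


def make_rff(dim_in, dim_out, gamma, rng):
    """Random Fourier feature map approximating the RBF kernel exp(-gamma * ||x - y||^2)."""
    scale = math.sqrt(2.0 * gamma)
    W = [[rng.gauss(0.0, scale) for _ in range(dim_in)] for _ in range(dim_out)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(dim_out)]
    norm = math.sqrt(2.0 / dim_out)

    def transform(x):
        return [norm * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bj)
                for w, bj in zip(W, b)]

    return transform


def solve(A, B):
    """Solve A X = B (A is n x n, B is n x k) by Gaussian elimination with partial pivoting."""
    n, k = len(A), len(B[0])
    M = [ra[:] + rb[:] for ra, rb in zip(A, B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + k):
                M[r][c] -= f * M[col][c]
    X = [[0.0] * k for _ in range(n)]
    for i in reversed(range(n)):
        for c in range(k):
            s = M[i][n + c] - sum(M[i][j] * X[j][c] for j in range(i + 1, n))
            X[i][c] = s / M[i][i]
    return X


def fit_rls(Z, Y, lam):
    """Regularized least squares: solve (Z^T Z + lam * I) W = Z^T Y for W."""
    d, k = len(Z[0]), len(Y[0])
    A = [[sum(z[i] * z[j] for z in Z) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    B = [[sum(z[i] * y[c] for z, y in zip(Z, Y)) for c in range(k)] for i in range(d)]
    return solve(A, B)


def predict(W, z):
    """Return the index of the class with the highest linear score."""
    scores = [sum(z[i] * W[i][c] for i in range(len(z))) for c in range(len(W[0]))]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: three well-separated clusters as stand-ins for three polarity classes.
rng = random.Random(0)
centers = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
X, labels = [], []
for label, (cx, cy) in enumerate(centers):
    for _ in range(10):
        X.append([cx + rng.uniform(-0.1, 0.1), cy + rng.uniform(-0.1, 0.1)])
        labels.append(label)

transform = make_rff(dim_in=2, dim_out=40, gamma=0.5, rng=rng)
Z = [transform(x) for x in X]
Y = [[1.0 if c == label else 0.0 for c in range(3)] for label in labels]  # one-hot targets
W = fit_rls(Z, Y, lam=1e-2)
train_acc = sum(predict(W, z) == t for z, t in zip(Z, labels)) / len(labels)
print(train_acc)
```

The classifier itself stays linear; the non-linearity comes entirely from the random cosine features, which is what makes the closed-form least squares solve applicable.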