Publication


Featured research published by Animesh Mukherjee.


Analytics for Noisy Unstructured Text Data | 2007

Investigation and modeling of the structure of texting language

Monojit Choudhury; Rahul Saraf; Vijit Jain; Animesh Mukherjee; Sudeshna Sarkar; Anupam Basu

Language usage over computer-mediated discourses, such as chats, emails and SMS texts, significantly differs from the standard form of the language and is referred to as texting language (TL). The presence of intentional misspellings significantly decreases the accuracy of existing spell checking techniques for TL words. In this work, we formally investigate the nature and type of compressions used in SMS texts, and develop a Hidden Markov Model based word-model for TL. The model parameters have been estimated through standard machine learning techniques from a word-aligned SMS and standard English parallel corpus. The accuracy of the model in correcting TL words is 57.7%, which is almost a threefold improvement over the performance of Aspell. The use of a simple bigram language model results in a 35% reduction of the relative word level error rates.


Advances in Social Networks Analysis and Mining | 2013

Computer science fields as ground-truth communities: their impact, rise and fall

Tanmoy Chakraborty; Sandipan Sikdar; Vihar Tammana; Niloy Ganguly; Animesh Mukherjee

The study of communities in time-varying graphs has been limited to their detection and identification across time. However, the presence of time provides us with the opportunity to analyze the interaction patterns of the communities and to understand how each individual community grows, shrinks, or becomes important over time. This paper, for the first time, systematically studies the temporal interaction patterns of communities using a large scale citation network (directed and unweighted) of computer science. Each individual community in a citation network is naturally defined by a research field, which acts as ground truth, and the interactions of these communities through citations in real time can unfold the landscape of dynamic research trends in the computer science domain over the last fifty years. These interactions are quantified in terms of a metric called inwardness that captures the effect of local citations to express the degree of authoritativeness of a community (research field) at a particular time instance. Several arguments to unveil the reasons behind the temporal changes of inwardness of different communities are put forward using exhaustive statistical analysis. The measurements (importance of field) are compared with the project funding statistics of NSF and it is found that the two are in sync. We believe that this measurement study with large real-world data is an important initial step towards understanding the dynamics of cluster-interactions in a temporal environment. Note that this paper, for the first time, systematically outlines a new avenue of research that one can practice post community detection.


Journal of Statistical Mechanics: Theory and Experiment | 2011

Statistical physics of language dynamics

Vittorio Loreto; Andrea Baronchelli; Animesh Mukherjee; Andrea Puglisi; Francesca Tria

Language dynamics is a rapidly growing field that focuses on all processes related to the emergence, evolution, change and extinction of languages. Recently, the study of self-organization and evolution of language and meaning has led to the idea that a community of language users can be seen as a complex dynamical system, which collectively solves the problem of developing a shared communication framework through the back-and-forth signaling between individuals. We shall review some of the progress made in the past few years and highlight potential future directions of research in this area. In particular, the emergence of a common lexicon and of a shared set of linguistic categories will be discussed, as examples corresponding to the early stages of a language. The extent to which synthetic modeling is nowadays contributing to the ongoing debate in cognitive science will be pointed out. In addition, the burst of growth of the web is providing new experimental frameworks. It makes available a huge amount of resources, both as novel tools and data to be analyzed, allowing quantitative and large-scale analysis of the processes underlying the emergence of a collective information and language dynamics.
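
The emergence of a common lexicon described above is often illustrated with a minimal naming game. The following is a rough sketch under assumed rules (one object to name, uniformly random speaker and hearer pairs, invented words represented as integers); the function name `naming_game` and all parameters are hypothetical, and this is not claimed to be the specific model reviewed in the paper.

```python
import random

def naming_game(n_agents=20, max_steps=200_000, seed=0):
    """Run a minimal naming game; return (steps_taken, inventories)."""
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_agents)]
    next_word = 0  # counter used to invent brand-new words
    for step in range(1, max_steps + 1):
        speaker, hearer = rng.sample(range(n_agents), 2)
        if not inventories[speaker]:
            # A speaker with no word for the object invents one.
            inventories[speaker].add(next_word)
            next_word += 1
        word = rng.choice(sorted(inventories[speaker]))
        if word in inventories[hearer]:
            # Success: both agents keep only the agreed word.
            inventories[speaker] = {word}
            inventories[hearer] = {word}
        else:
            # Failure: the hearer learns the word.
            inventories[hearer].add(word)
        # Consensus: every agent holds the same single word.
        if all(len(inv) == 1 for inv in inventories) and len(set.union(*inventories)) == 1:
            return step, inventories
    return max_steps, inventories
```

Calling `naming_game()` returns the number of pairwise interactions played and the final inventories; once consensus is reached, every inventory contains the same single word.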


ACM/IEEE Joint Conference on Digital Libraries | 2014

Towards a stratified learning approach to predict future citation counts

Tanmoy Chakraborty; Suhansanu Kumar; Pawan Goyal; Niloy Ganguly; Animesh Mukherjee

In this paper, we study the problem of predicting the future citation count of a scientific article a given time interval after its publication. To this end, we gather and conduct an exhaustive analysis on a dataset of more than 1.5 million scientific papers from the computer science domain. On analysis of the dataset, we notice that the citation count of the articles over the years follows a diverse set of patterns; on closer inspection we identify six broad categories of citation patterns. This important observation motivates us to adopt a stratified learning approach in the prediction task, whereby we propose a two-stage prediction model: in the first stage, the model maps a query paper into one of the six categories, and in the second stage a regression module is run only on the subpopulation corresponding to that category to predict the future citation count of the query paper. Experimental results show that the categorization of this huge dataset during the training phase leads to a remarkable improvement (around 50%) in comparison to the well-known baseline system.
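
The two-stage idea above can be sketched in a few lines. This is a toy illustration only: it assumes a single numeric feature per paper, swaps in a nearest-centroid rule for the first stage and a one-variable least-squares line for the second, and uses hypothetical names (`fit_line`, `train`, `predict`); the classifier, regressor, and features used in the paper are not specified here.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return a, my - a * mx

def train(papers):
    """papers: list of (feature, category, future_citations) tuples."""
    by_cat = defaultdict(list)
    for x, cat, y in papers:
        by_cat[cat].append((x, y))
    # One centroid and one regression line per category (stratum).
    centroids = {c: sum(x for x, _ in pts) / len(pts) for c, pts in by_cat.items()}
    models = {c: fit_line([x for x, _ in pts], [y for _, y in pts])
              for c, pts in by_cat.items()}
    return centroids, models

def predict(x, centroids, models):
    # Stage 1: map the query paper to a category (nearest centroid here).
    cat = min(centroids, key=lambda c: abs(x - centroids[c]))
    # Stage 2: regress using only that category's model.
    a, b = models[cat]
    return cat, a * x + b
```

The point of the design is that each regression line is fitted only on papers of one category, so a query paper is never scored by a model trained on papers with a different citation pattern.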


Proceedings of the National Academy of Sciences of the United States of America | 2012

On the origin of the hierarchy of color names

Vittorio Loreto; Animesh Mukherjee; Francesca Tria

One of the fundamental problems in cognitive science is how humans categorize the visible color spectrum. The empirical evidence of the existence of universal or recurrent patterns in color naming across cultures is paralleled by the observation that color names begin to be used by individual cultures in a relatively fixed order. The origin of this hierarchy is largely unexplained. Here we resort to multiagent simulations, where a population of individuals, subject to a simple perceptual constraint shared by all humans, namely the human Just Noticeable Difference, categorizes and names colors through a purely cultural negotiation in the form of language games. We found that the time needed for a population to reach consensus on a color name depends on the region of the visible color spectrum. If color spectrum regions are ranked according to this criterion, a hierarchy with [red, (magenta)-red], [violet], [green/yellow], [blue], [orange], and [cyan], appearing in this order, is recovered, featuring an excellent quantitative agreement with the empirical observations of the WCS. Our results demonstrate a clear possible route to the emergence of hierarchical color categories, confirming that the theoretical modeling in this area has now attained the required maturity to make significant contributions to the ongoing debates concerning language universals.


EPL | 2007

Emergence of a non-scaling degree distribution in bipartite networks: A numerical and analytical study

Fernando Peruani; Monojit Choudhury; Animesh Mukherjee; Niloy Ganguly

We study the growth of bipartite networks in which the number of nodes in one of the partitions is kept fixed while the other partition is allowed to grow. We study random and preferential attachment as well as a combination of both. We derive the exact analytical expression for the degree-distribution of all these different types of attachments while assuming that edges are incorporated sequentially, i.e., a single edge is added to the growing network in a time step. We also provide an approximate expression for the case when more than one edge is added in a time step. We show that depending on the relative weight between random and preferential attachments, the degree-distribution of this type of network falls into one of the four possible regimes, which range from a binomial distribution for pure random attachment to a u-shaped distribution for dominant preferential attachment.
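
The growth process above can be explored numerically. The sketch below assumes an attachment kernel proportional to `gamma * k + 1`, where `k` is the current degree of a node in the fixed partition, so that `gamma = 0` gives pure random attachment and large `gamma` gives dominant preferential attachment. That kernel, the function names, and the default parameters are illustrative assumptions rather than details taken from the abstract.

```python
import random
from collections import Counter

def grow_bipartite(n_fixed=10, steps=1000, gamma=1.0, seed=0):
    """Add one edge per time step from a new node to a fixed-partition node.

    The target is chosen with probability proportional to gamma*k + 1,
    where k is its current degree (gamma=0 gives pure random attachment).
    Returns the final degree of each fixed-partition node.
    """
    rng = random.Random(seed)
    degree = [0] * n_fixed
    for _ in range(steps):
        weights = [gamma * k + 1 for k in degree]
        target = rng.choices(range(n_fixed), weights=weights)[0]
        degree[target] += 1
    return degree

def degree_distribution(degree):
    """Fraction of fixed-partition nodes having each degree."""
    counts = Counter(degree)
    return {k: c / len(degree) for k, c in sorted(counts.items())}
```

Running `degree_distribution(grow_bipartite(gamma=g))` for several values of `g` (and over many seeds) gives an empirical view of how the shape of the distribution changes with the relative weight of the two attachment modes.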


Meeting of the Association for Computational Linguistics | 2014

That's sick dude!: Automatic identification of word sense change across different timescales

Sunny Mitra; Ritwik Mitra; Martin Riedl; Chris Biemann; Animesh Mukherjee; Pawan Goyal

In this paper, we propose an unsupervised method to identify noun sense changes based on rigorous analysis of time-varying text data available in the form of millions of digitized books. We construct distributional thesauri based networks from data at different time points and cluster each of them separately to obtain word-centric sense clusters corresponding to the different time points. Subsequently, we compare these sense clusters of two different time points to find whether (i) a new sense has been born, (ii) an older sense has split into more than one sense, (iii) a newer sense has formed from the joining of older senses, or (iv) a particular sense has died. We conduct a thorough evaluation of the proposed methodology both manually and through comparison with WordNet. Manual evaluation indicates that the algorithm could correctly identify 60.4% of birth cases from a set of 48 randomly picked samples and 57% of split/join cases from a set of 21 randomly picked samples. Remarkably, in 44% of cases the birth of a novel sense is attested by WordNet, while in 46% and 43% of cases split and join, respectively, are confirmed by WordNet. Our approach can be applied to lexicography, as well as to applications like word sense disambiguation or semantic search.
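
The comparison of sense clusters across two time points can be illustrated with simple set overlaps. This is only a sketch: it assumes each sense cluster is a set of words, uses a hypothetical overlap threshold of 0.3, and applies hand-picked rules for birth, join, split, and death. The names `overlap` and `compare_senses` and the rules themselves are assumptions, not the criteria used in the paper.

```python
def overlap(a, b):
    """Fraction of cluster a's words that also appear in cluster b."""
    return len(a & b) / len(a) if a else 0.0

def compare_senses(old, new, threshold=0.3):
    """old, new: lists of word sets (sense clusters) at two time points.

    Returns a list of (event, cluster_index) pairs; indices refer to `new`
    for birth/join and to `old` for split/death.
    """
    events = []
    for i, n in enumerate(new):
        # Old clusters largely absorbed by this new cluster.
        sources = [j for j, o in enumerate(old) if overlap(o, n) >= threshold]
        if not sources and all(overlap(n, o) < threshold for o in old):
            events.append(("birth", i))
        elif len(sources) >= 2:
            events.append(("join", i))
    for j, o in enumerate(old):
        # New clusters that lie largely inside this old cluster.
        targets = [i for i, n in enumerate(new) if overlap(n, o) >= threshold]
        if not targets and all(overlap(o, n) < threshold for n in new):
            events.append(("death", j))
        elif len(targets) >= 2:
            events.append(("split", j))
    return events
```

For example, if the only old cluster for a word is `{"ill", "unwell", "nauseous", "feverish"}` and the new time point adds a second cluster `{"cool", "awesome", "rad"}`, the second cluster is reported as a birth.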


Scientific Reports | 2013

Constant Communities in Complex Networks

Tanmoy Chakraborty; Sriram Srinivasan; Niloy Ganguly; Sanjukta Bhowmick; Animesh Mukherjee

Identifying community structure is a fundamental problem in network analysis. Most community detection algorithms are based on optimizing a combinatorial parameter, for example modularity. This optimization is generally NP-hard, so merely changing the vertex order can alter the assignment of vertices to communities. However, there has been little study of how vertex ordering influences the results of community detection algorithms. Here we identify and study the properties of invariant groups of vertices (constant communities) whose assignment to communities is, quite remarkably, not affected by vertex ordering. The percentage of constant communities can vary across different applications, and based on empirical results we propose metrics to evaluate these communities. Using constant communities as a pre-processing step, one can significantly reduce the variation of the results. Finally, we present a case study on a phoneme network and illustrate that constant communities, quite strikingly, form the core functional units of the larger communities.
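
One simple way to extract such invariant groups, given the outputs of several runs of any community detection algorithm under different vertex orderings, is to group vertices by their tuple of labels across runs. The sketch below assumes each run is given as a dict from vertex to community label; the function name `constant_communities` is hypothetical, and the detection algorithm itself is left out.

```python
from collections import defaultdict

def constant_communities(partitions):
    """partitions: list of dicts mapping vertex -> community label, one per run.

    Returns groups of two or more vertices that share a community in every run.
    """
    groups = defaultdict(list)
    for v in partitions[0]:
        # Two vertices get the same signature only if they are
        # co-assigned in every single run.
        signature = tuple(p[v] for p in partitions)
        groups[signature].append(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Labels do not need to be comparable across runs, because only co-membership within each run matters. With runs `{1: "a", 2: "a", 3: "a", 4: "b", 5: "b"}` and `{1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}`, vertices 1 and 2 stay together, as do 4 and 5, while vertex 3 changes partners and belongs to no constant community.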


International Journal of Modern Physics C | 2007

Modeling the Co-occurrence Principles of the Consonant Inventories: A Complex Network Approach

Animesh Mukherjee; Monojit Choudhury; Anupam Basu; Niloy Ganguly

Speech sounds of languages all over the world show remarkable patterns of co-occurrence. In this work, we attempt to automatically capture the patterns of co-occurrence of the consonants across languages and at the same time figure out the nature of the force leading to the emergence of such patterns. For this purpose we define a weighted network where the consonants are the nodes and an edge between two nodes (read consonants) signifies their co-occurrence likelihood over the consonant inventories. Through this network we identify communities of consonants that essentially reflect their patterns of co-occurrence across languages. We test the goodness of the communities and observe that the constituent consonants frequently occur in such groups in real languages also. Interestingly, the consonants forming these communities reflect strong correlations in terms of their features, which indicates that the principle of feature economy acts as a driving force towards community formation. In order to measure the strength of this force we propose an information-theoretic definition of feature economy and show that the feature economy exhibited by the consonant communities is indeed substantially better than that of communities in which the consonant inventories had evolved just by chance.
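
The weighted consonant network described above can be sketched as follows. The edge weight here is assumed to be the raw number of inventories in which two consonants occur together, which is only one possible reading of "co-occurrence likelihood"; the function name `cooccurrence_network` and the toy inventories in the usage note are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_network(inventories):
    """inventories: iterable of sets of consonants, one per language.

    Returns a Counter mapping each unordered consonant pair (stored as a
    sorted tuple) to the number of inventories containing both consonants.
    """
    edges = Counter()
    for inv in inventories:
        for a, b in combinations(sorted(inv), 2):
            edges[(a, b)] += 1
    return edges
```

For three toy inventories `{"p", "t", "k"}`, `{"p", "t"}` and `{"t", "k", "m"}`, the pairs `("p", "t")` and `("k", "t")` each get weight 2, and every other pair that occurs gets weight 1. Community detection would then be run on this weighted graph.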


PLOS ONE | 2011

Aging in Language Dynamics

Animesh Mukherjee; Francesca Tria; Andrea Baronchelli; Andrea Puglisi; Vittorio Loreto

Human languages evolve continuously, and a puzzling problem is how to reconcile the apparent robustness of most of the deep linguistic structures we use with the evidence that they undergo possibly slow, yet ceaseless, changes. Is the state in which we observe languages today closer to what would be a dynamical attractor with statistically stationary properties or rather closer to a non-steady state slowly evolving in time? Here we address this question in the framework of the emergence of shared linguistic categories in a population of individuals interacting through language games. The observed emerging asymptotic categorization, which has been previously tested - with success - against experimental data from human languages, corresponds to a metastable state where global shifts are always possible but progressively more unlikely and the response properties depend on the age of the system. This aging mechanism exhibits striking quantitative analogies to what is observed in the statistical mechanics of glassy systems. We argue that this can be a general scenario in language dynamics where shared linguistic conventions would not emerge as attractors, but rather as metastable states.

Collaboration


Dive into Animesh Mukherjee's collaborations.

Top Co-Authors

Niloy Ganguly

Indian Institute of Technology Kharagpur

Pawan Goyal

Indian Institute of Technology Kharagpur

Suman Kalyan Maity

Indian Institute of Technology Kharagpur

Tanmoy Chakraborty

Indraprastha Institute of Information Technology

Anupam Basu

Indian Institute of Technology Kharagpur

Mayank Singh

Indian Institute of Technology Kharagpur

Sandipan Sikdar

Indian Institute of Technology Kharagpur

Sanjukta Bhowmick

University of Nebraska Omaha

Francesca Tria

Institute for Scientific Interchange
