Publication


Featured research published by Maike Erdmann.


ACM Transactions on Multimedia Computing, Communications, and Applications | 2009

Improving the extraction of bilingual terminology from Wikipedia

Maike Erdmann; Kotaro Nakayama; Takahiro Hara; Shojiro Nishio

Research on the automatic construction of bilingual dictionaries has achieved impressive results. Bilingual dictionaries are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. In this article, we further pursue the idea of using Wikipedia as a corpus for bilingual terminology extraction. We propose a method that extracts term-translation pairs from different types of Wikipedia link information. An SVM classifier, trained on the features of manually labeled training data, then determines the correctness of unseen term-translation pairs.
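
For illustration, here is a minimal sketch of the classification step described above, using scikit-learn's SVC; the feature columns (link-type indicators and a backlink count) are hypothetical stand-ins for the Wikipedia link features used in the paper.

```python
from sklearn.svm import SVC

# Each row describes one term-translation pair:
# [from_interlanguage_link, from_redirect, from_link_text, log_backlink_count]
X_train = [
    [1, 0, 0, 3.2],  # pair found via an interlanguage link
    [0, 1, 0, 1.1],  # pair found via a redirect page
    [0, 0, 1, 0.4],  # pair found only in link (anchor) text
    [0, 0, 1, 2.7],
]
y_train = [1, 1, 0, 1]  # manual labels: 1 = correct translation pair

clf = SVC(kernel="rbf")
clf.fit(X_train, y_train)

# Judge an unseen term-translation pair from its link features.
print(clf.predict([[0, 1, 1, 1.8]]))
```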


Database Systems for Advanced Applications | 2008

An approach for extracting bilingual terminology from Wikipedia

Maike Erdmann; Kotaro Nakayama; Takahiro Hara; Shojiro Nishio

With the demand for bilingual dictionaries covering domain-specific terminology, research in the field of automatic dictionary extraction has become popular. However, the accuracy and coverage of dictionaries created from bilingual text corpora are often insufficient for domain-specific terms. Therefore, we present an approach for extracting bilingual dictionaries from the link structure of Wikipedia, a huge-scale encyclopedia that contains a vast number of links between articles in different languages. Our methods not only analyze these interlanguage links but also extract additional translation candidates from redirect page and link text information. In an experiment, we demonstrated the advantages of our methods over a traditional approach of extracting bilingual terminology from parallel corpora.
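
As a rough illustration of the first candidate source, the sketch below scans raw wikitext for interlanguage links; real extraction would parse the langlinks table or the XML dump, and the regex and sample text here are simplifications.

```python
import re

# Matches interlanguage links such as [[ja:データベース]] in wikitext.
INTERLANG = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)?):([^\]|]+)\]\]")

def interlanguage_pairs(title, wikitext, target_lang="ja"):
    """Yield (title, translation) candidates for one article."""
    for lang, target_title in INTERLANG.findall(wikitext):
        if lang == target_lang:
            yield (title, target_title.strip())

sample = "Article text ... [[ja:データベース]] [[de:Datenbank]]"
print(list(interlanguage_pairs("Database", sample)))
# [('Database', 'データベース')]
```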


Journal of Information Processing | 2008

Extraction of Bilingual Terminology from a Multilingual Web-based Encyclopedia

Maike Erdmann; Kotaro Nakayama; Takahiro Hara; Shojiro Nishio

With the demand for bilingual dictionaries covering domain-specific terminology, research in the field of automatic dictionary extraction has become popular. However, the accuracy and coverage of dictionaries created from bilingual text corpora are often insufficient for domain-specific terms. Therefore, we present an approach for extracting bilingual dictionaries from the link structure of Wikipedia, a huge-scale encyclopedia that contains a vast number of links between articles in different languages. Our methods not only analyze these interlanguage links but also extract additional translations from redirect page and link text information. In a detailed experiment, we showed that the combination of redirect page and link text information achieves much better results than the traditional approach of extracting bilingual terminology from parallel corpora.
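
The distinctive step here is combining the two additional candidate sources. The sketch below merges redirect titles and link (anchor) texts into one ranked candidate list; the fixed redirect bonus and frequency counting are illustrative assumptions, not the scoring used in the paper.

```python
from collections import Counter

def combined_candidates(redirect_titles, anchor_texts):
    """Rank translation candidates from redirects and anchor texts."""
    scores = Counter()
    for t in redirect_titles:  # redirects are strong evidence: fixed bonus
        scores[t] += 3
    for t in anchor_texts:     # anchor texts are noisier: count occurrences
        scores[t] += 1
    return scores.most_common()

redirects = ["データベース"]
anchors = ["データベース", "データベース", "DB"]
print(combined_candidates(redirects, anchors))
```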


Advanced Information Networking and Applications | 2012

Social Indexing of TV Programs: Detection and Labeling of Significant TV Scenes by Twitter Analysis

Masami Nakazawa; Maike Erdmann; Keiichiro Hoashi; Chihiro Ono

Technology to analyze the content of TV programs, especially to extract and annotate important scenes and events within a program, helps users enjoy recorded programs. In this paper, we propose a method for detecting significant scenes in TV programs and automatically annotating the content of the extracted scenes through Twitter analysis. Experiments conducted on baseball games indicate that the proposed method is capable of detecting major events in a baseball game with an accuracy of 90.6%. Moreover, the names of persons involved in the events were detected with an accuracy of 87.2%, and labels describing the events were applied with an accuracy of 66.8%. The proposed technology is helpful because it enables users to skip directly to the highlights of a recorded program.
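
The core intuition is that significant scenes appear as bursts in the tweet rate about a program. A minimal sketch of such a burst detector follows; the mean-plus-k-standard-deviations threshold is an assumption for illustration, not the paper's exact detection rule.

```python
from statistics import mean, stdev

def detect_bursts(tweets_per_minute, k=1.5):
    """Return minute indices whose tweet count exceeds mean + k * stdev."""
    mu, sigma = mean(tweets_per_minute), stdev(tweets_per_minute)
    return [i for i, n in enumerate(tweets_per_minute) if n > mu + k * sigma]

counts = [12, 10, 14, 11, 95, 13, 12, 80, 11, 10]  # spikes at minutes 4 and 7
print(detect_bursts(counts))  # [4, 7]
```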


Web Information Systems Engineering | 2014

Feature Based Sentiment Analysis of Tweets in Multiple Languages

Maike Erdmann; Kazushi Ikeda; Hiromi Ishizaki; Gen Hattori; Yasuhiro Takishima

Feature-based sentiment analysis is normally conducted on review Web sites, since it is difficult to extract accurate product features from tweets. However, Twitter users express sentiment towards a large variety of products in many different languages. Moreover, sentiment expressed on Twitter is more up to date and represents the sentiment of a larger population than review articles. Therefore, we propose a method that identifies product features using review articles and then conducts sentiment analysis on tweets containing those features. In that way, we can increase the precision of feature extraction by up to 40% compared to features extracted directly from tweets. Moreover, our method translates and matches the features extracted for multiple languages and ranks them based on how frequently they are mentioned in the tweets of each language. By doing this, we can highlight the features that are most relevant for multilingual analysis.
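
A minimal sketch of this pipeline: product features come from review text (here a hypothetical pre-extracted set), and sentiment is computed only for tweets mentioning one of them; the tiny polarity lexicon is illustrative, not the sentiment model of the paper.

```python
FEATURES = {"battery", "screen", "camera"}           # mined from review sites
POSITIVE, NEGATIVE = {"great", "love"}, {"terrible", "drains"}

def feature_sentiment(tweets):
    """Score tweets that mention a review-derived product feature."""
    results = []
    for tweet in tweets:
        words = set(tweet.lower().split())
        for feature in FEATURES & words:             # tweet mentions a feature
            score = len(words & POSITIVE) - len(words & NEGATIVE)
            results.append((feature, tweet, score))
    return results

tweets = ["Love the camera on this phone", "battery drains way too fast"]
print(feature_sentiment(tweets))
```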


Database Systems for Advanced Applications | 2008

A bilingual dictionary extracted from the Wikipedia link structure

Maike Erdmann; Kotaro Nakayama; Takahiro Hara; Shojiro Nishio

Many bilingual dictionaries have been released on the WWW. However, these dictionaries insufficiently cover new and domain-specific terminology. In our demonstration, we present a dictionary constructed by analyzing the link structure of Wikipedia, a huge-scale encyclopedia containing a large number of links between articles in different languages. We not only analyzed these interlanguage links but also extracted additional translation candidates from redirect page and link text information. In an experiment, we demonstrated the advantages of our dictionary over manually created dictionaries as well as over bilingual terminology extracted from parallel corpora.


Pacific Rim International Conference on Artificial Intelligence | 2012

Hierarchical training of multiple SVMs for personalized web filtering

Maike Erdmann; Duc-Dung Nguyen; Tomoya Takeyoshi; Gen Hattori; Kazunori Matsumoto; Chihiro Ono

The abundance of information published on the Internet makes filtering of hazardous Web pages a difficult yet important task. Supervised learning methods such as Support Vector Machines can be used to identify hazardous Web content. However, scalability is a big challenge, especially if multiple classifiers have to be trained, since different policies exist regarding what kind of information is hazardous. We therefore propose a transfer learning approach called Hierarchical Training for Multiple SVMs (HTMSVM). HTMSVM identifies common data among similar training sets and trains on these common data sets first in order to obtain initial solutions. The initial solutions then reduce the time for training the individual training sets without influencing classification accuracy. In an experiment, in which we trained five Web content filters with 80% common and 20% inconsistently labeled training examples, HTMSVM predicted hazardous Web pages in only 26% to 41% of the training time of LibSVM while achieving the same classification accuracy (more than 91%).
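
The key idea is to reuse the solution obtained on the shared data as the starting point for each policy-specific classifier. scikit-learn's SVC cannot resume training, so the sketch below approximates the flow with a warm-started linear SGD model on synthetic data; it illustrates the idea, not the authors' solver.

```python
import copy
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X_common = rng.normal(size=(800, 20))        # 80% shared training examples
y_common = (X_common[:, 0] > 0).astype(int)

base = SGDClassifier(loss="hinge", warm_start=True, random_state=0)
base.fit(X_common, y_common)                 # initial solution, trained once

for policy in range(5):                      # five filtering policies
    X_own = rng.normal(size=(200, 20))       # 20% policy-specific examples
    y_own = (X_own[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)
    clf = copy.deepcopy(base)                # start from the shared solution
    clf.fit(X_own, y_own)                    # warm_start=True resumes from it
    print(f"policy {policy}: accuracy {clf.score(X_own, y_own):.2f}")
```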


International Conference on Social Computing | 2013

Automatic Labeling of Training Data for Collecting Tweets for Ambiguous TV Program Titles

Maike Erdmann; Erik Ward; Kazushi Ikeda; Gen Hattori; Chihiro Ono; Yasuhiro Takishima

Twitter is a popular medium for sharing opinions on TV programs, and the analysis of TV-related tweets is attracting a lot of interest. However, when collecting all tweets containing a given TV program title, we obtain a large number of unrelated tweets, because many TV program titles are ambiguous. Using supervised learning, TV-related tweets can be collected with high accuracy. The goal of our proposed method is to automate the labeling process in order to eliminate the cost of data labeling without sacrificing classification accuracy. When creating the training data, we use only tweets containing unambiguous TV program titles. To decide whether a TV program title is ambiguous, we automatically determine whether it can also be used as a common expression or named entity. In two experiments, in which we collected tweets for 32 ambiguous TV program titles, we achieved the same (78.2%) or even higher classification accuracy (79.1%) with automatically labeled training data as with manually labeled data, while effectively eliminating labeling costs.
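
A minimal sketch of the auto-labeling idea, with made-up titles and tweets: tweets containing an unambiguous title become positive examples, the rest negative, and the resulting classifier is then applied to tweets matching an ambiguous title.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

UNAMBIGUOUS = ["breaking bad", "game of thrones"]  # usable as auto-labelers

texts = [
    "watching breaking bad tonight, what an episode",
    "game of thrones finale was stunning",
    "stuck in traffic again this morning",
    "coffee first, then emails",
]
# Automatic labels: 1 if the tweet contains an unambiguous title, else 0.
labels = [1 if any(t in text for t in UNAMBIGUOUS) else 0 for text in texts]

vec = TfidfVectorizer()
clf = LinearSVC()
clf.fit(vec.fit_transform(texts), labels)

# Classify a tweet matching an ambiguous title such as "friends".
print(clf.predict(vec.transform(["had dinner with friends yesterday"])))
```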


Journal of Information Processing | 2015

Assigning Keywords to Automatically Extracted Personal Cliques from Social Networks

Maike Erdmann; Tomoya Takeyoshi; Kazushi Ikeda; Gen Hattori; Chihiro Ono; Yasuhiro Takishima

In Twitter and other microblogging services, users often have large social networks formed around cliques (communities) such as friends, coworkers or former classmates. However, the membership of each user in multiple cliques makes it difficult to process information and interact with other clique members. We address this problem by automatically dividing the social network of a Twitter user into personal cliques and assigning keywords to each clique to identify the common ground of its members. In this way, the user can understand the structure of their social network and interact with the members of each clique independently. Our proposed method improves clique annotation by not only extracting keywords from the tweet history of the clique members, but also individually weighting the extracted keywords of each member according to the relevance of their tweets for the clique. The keyword weight is influenced by two factors: the first is calculated from the number of connections a user has within the clique, and the second depends on whether the user publishes personal information or information of general interest. We developed a prototype Twitter client with clique management functionality and conducted an experiment in which, on average, 46.96% of the keywords extracted by our proposed method were relevant for the cliques, as opposed to 38.31% for the baseline method.
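
A sketch of the first weighting factor: keywords from each member's tweets are weighted by the member's connectedness inside the clique. The normalization and the omission of the second factor (personal vs. general-interest content) are simplifying assumptions.

```python
from collections import Counter

def clique_keywords(member_tweets, in_clique_degree, top_n=3):
    """member_tweets: {user: [words]}; in_clique_degree: {user: int}."""
    total_degree = sum(in_clique_degree.values())
    scores = Counter()
    for user, words in member_tweets.items():
        weight = in_clique_degree[user] / total_degree
        for word in words:
            scores[word] += weight
    return [word for word, _ in scores.most_common(top_n)]

tweets = {"alice": ["ml", "paper", "ml"], "bob": ["ml", "lunch"]}
degree = {"alice": 3, "bob": 1}  # alice is better connected in the clique
print(clique_keywords(tweets, degree))  # ['ml', 'paper', 'lunch']
```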


Advanced Information Networking and Applications | 2011

Calculating Wikipedia Article Similarity Using Machine Translation Evaluation Metrics

Maike Erdmann; Andrew M. Finch; Kotaro Nakayama; Eiichiro Sumita; Takahiro Hara; Shojiro Nishio

Calculating the similarity of Wikipedia articles in different languages is helpful for bilingual dictionary construction and various other research areas. However, standard methods for document similarity calculation are usually very simple. Therefore, we describe an approach that translates one Wikipedia article into the language of the other article and then calculates article similarity with standard machine translation evaluation metrics. An experiment revealed that our approach is effective for identifying Wikipedia articles in different languages that cover the same concept.
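
A minimal sketch of the approach, where translate() is a hypothetical placeholder for any MT system and BLEU (via NLTK) stands in for the evaluation metrics compared in the paper.

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

def article_similarity(article_en, article_ja, translate):
    """Translate the Japanese article to English, then score with BLEU."""
    hypothesis = translate(article_ja).split()
    reference = [article_en.split()]
    return sentence_bleu(reference, hypothesis,
                         smoothing_function=SmoothingFunction().method1)

# Stub translation for illustration only; a real MT system goes here.
fake_translate = lambda text: "a database is an organized collection of data"
print(article_similarity("a database is an organized collection of data",
                         "データベースとは体系的に整理されたデータの集合である",
                         fake_translate))
```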

Collaboration


Dive into Maike Erdmann's collaborations.

Top Co-Authors

Andrew M. Finch

National Institute of Information and Communications Technology


Eiichiro Sumita

National Institute of Information and Communications Technology
