Publication


Featured research published by Mei Kobayashi.


ACM Computing Surveys | 2000

Information retrieval on the web

Mei Kobayashi; Koichi Takeda

In this paper we review studies of the growth of the Internet and technologies that are useful for information search and retrieval on the Web. We present data on the Internet from several different sources, e.g., current and projected numbers of users, hosts, and Web sites. Although the numerical figures vary, the overall trends cited by the sources are consistent and point to exponential growth in the past and in the coming decade. Hence it is not surprising that about 85% of Internet users surveyed claim to use search engines and search services to find specific information. The same surveys show, however, that users are not satisfied with the performance of the current generation of search engines; slow retrieval speed, communication delays, and poor quality of retrieved results (e.g., noise and broken links) are commonly cited problems. We discuss the development of new techniques targeted at resolving some of the problems associated with Web-based information retrieval and speculate on future trends.


Journal of the Acoustical Society of America | 1998

Speech synthesis using glottal closure instants determined from adaptively-thresholded wavelet transforms

Masaharu Sakamoto; Mei Kobayashi; Takashi Saito; Masafumi Nishimura

A speech synthesis system uses a pitch-synchronous waveform overlap method to realize stable speech synthesis in which pitch jitter is negligible. The invention is characterized by the use of glottal closure instants as reference points (pitch marks) for overlapping. Since glottal closure instants can be extracted stably and accurately using the dyadic wavelet transform, speech with negligible pitch jitter and minimal rumbling sounds can be synthesized reliably. In addition, more flexible waveform separation becomes possible by placing the reference point for overlapping and the reference point for waveform separation at different positions. Glottal closure instants are extracted by searching for the local peaks of the dyadic wavelet transform; preferably, the threshold used in this peak search is adaptively controlled each time the transform is computed.
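The adaptively thresholded peak search can be sketched as follows. This is an illustrative reconstruction, not the patented implementation: the derivative-of-Gaussian wavelet, the choice of dyadic scale, and the threshold update rule (re-anchoring on each accepted peak, decaying otherwise) are all assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def dyadic_wavelet_transform(x, scale):
    """Convolve the signal with a derivative-of-Gaussian wavelet
    whose width is the dyadic scale 2**scale (an approximation)."""
    width = 2 ** scale
    t = np.arange(-4 * width, 4 * width + 1)
    gaussian = np.exp(-t**2 / (2.0 * width**2))
    wavelet = -np.gradient(gaussian)       # first derivative of a Gaussian
    return np.convolve(x, wavelet, mode="same")

def glottal_closure_instants(x, scale=4, alpha=0.5, decay=0.9):
    """Candidate glottal closure instants: local peaks of the dyadic
    wavelet transform whose height exceeds an adaptive threshold."""
    w = np.abs(dyadic_wavelet_transform(x, scale))
    peaks, props = find_peaks(w, height=0.0)
    gcis, threshold = [], alpha * w.max()
    for p, h in zip(peaks, props["peak_heights"]):
        if h >= threshold:
            gcis.append(p)
            threshold = alpha * h          # re-anchor to the accepted peak
        else:
            threshold *= decay             # relax the threshold otherwise
    return np.array(gcis)
```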


Journal of Computational and Applied Mathematics | 2002

Matrix computations for information retrieval and major and outlier cluster detection

Mei Kobayashi; Masaki Aono; Hironori Takeuchi; Hikaru Samukawa

In this paper we introduce COV, a novel information retrieval (IR) algorithm for massive databases based on vector space modeling and spectral analysis of the covariance matrix of the document vectors to reduce the scale of the problem. Since the dimension of the covariance matrix depends on the attribute space and is independent of the number of documents, COV can be applied to databases that are too massive for methods based on the singular value decomposition of the document-attribute matrix, such as latent semantic indexing (LSI). In addition to improved scalability, theoretical considerations indicate that results from our algorithm tend to be more accurate than those from LSI, particularly in detecting subtle differences in document vectors. We demonstrate the power and accuracy of COV on an important topic in data mining, known as outlier cluster detection. We propose two new algorithms for detecting major and outlier clusters in databases: the first is based on LSI, and the second on COV. Our implementation studies indicate that our cluster detection algorithms outperform the basic LSI and COV algorithms in detecting outlier clusters.
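A minimal sketch of the covariance-based dimensional reduction described above, assuming a dense document-attribute matrix; the function names (`cov_subspace`, `project`) are hypothetical, and the paper's weighting and cluster detection steps are not shown.

```python
import numpy as np

def cov_subspace(docs, k):
    """docs: (n_docs, n_attrs) document-attribute matrix.
    Returns a rank-k basis from the covariance matrix of the
    document vectors. The covariance matrix is n_attrs x n_attrs,
    independent of the number of documents, which is the source
    of the scalability claim."""
    mean = docs.mean(axis=0)
    centered = docs - mean
    cov = centered.T @ centered / docs.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    basis = eigvecs[:, ::-1][:, :k]          # top-k eigenvectors
    return basis, mean

def project(docs, basis, mean):
    """Map documents into the reduced k-dimensional subspace."""
    return (docs - mean) @ basis
```

For collections too large to hold in memory, the covariance matrix can be accumulated one document (or one block) at a time, which is what makes this approach applicable where a full SVD of the document-attribute matrix is not.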


Computers & Mathematics With Applications | 2001

Estimation of singular values of very large matrices using random sampling

Mei Kobayashi; G. Dupret; O. King; H. Samukawa

The singular value decomposition (SVD) has enjoyed a long and rich history. Although it was introduced in the 1870s by Beltrami and Jordan for its own intrinsic interest, it has become an invaluable tool in applied mathematics and mathematical modeling. Singular value analysis has been applied in a wide variety of disciplines, most notably for least squares fitting of data. More recently, it has been used in data mining applications and by search engines to rank documents in very large databases, including the Web. However, the dimensions of the matrices used in many mathematical models are becoming so large that classical algorithms for computing the SVD cannot be used. We present a new method to determine the largest 10%–25% of the singular values of matrices which are so enormous that use of standard algorithms and computational packages would strain the computational resources available to the average user. In our method, rows from the matrix are randomly selected, and a smaller matrix is constructed from the selected rows. Next, we compute the singular values of the smaller matrix. This process of random sampling and computing singular values is repeated as many times as necessary (usually a few hundred times) to generate a set of training data for neural net analysis. Our method is a type of randomized algorithm, i.e., an algorithm that solves problems using randomly selected samples of data sets that are too large to be processed by conventional means. These algorithms output correct (or nearly correct) answers most of the time as long as the input has certain desirable properties. We list these properties and show that matrices which appear in information retrieval are fairly well suited for processing using randomized algorithms. We note, however, that the probability of arriving at an incorrect answer, however small, is not zero, since an unrepresentative sample may be drawn from the data.
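The sampling loop itself is straightforward to sketch. The function name and parameters below are hypothetical, and the neural-net stage that maps the sampled spectra to estimates of the full matrix's singular values is omitted; this shows only the generation of the training samples the abstract describes.

```python
import numpy as np

def sampled_singular_values(A, sample_rows, trials, k, rng=None):
    """Repeatedly draw `sample_rows` random rows of A, compute the
    top-k singular values of each submatrix, and return the samples
    (in the paper these become training data for neural net analysis).
    Assumes k <= min(sample_rows, A.shape[1])."""
    rng = np.random.default_rng(rng)
    samples = np.empty((trials, k))
    for t in range(trials):
        rows = rng.choice(A.shape[0], size=sample_rows, replace=False)
        s = np.linalg.svd(A[rows], compute_uv=False)  # descending order
        samples[t] = s[:k]
    return samples
```

Note that singular values of a row sample systematically underestimate those of the full matrix, which is presumably why the method trains a model on many samples rather than reading estimates off a single draw.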


Knowledge and Information Systems | 2006

Exploring overlapping clusters using dynamic re-scaling and sampling

Mei Kobayashi; Masaki Aono

Until recently, the aim of most text-mining work has been to understand major topics and clusters. Minor topics and clusters have been relatively neglected even though they may represent important information on rare events. We present a novel method for exploring overlapping clusters of heterogeneous sizes, which is based on vector space modeling, covariance matrix analysis, random sampling, and dynamic re-weighting of document vectors in massive databases. Our system addresses a combination of difficult issues in database analysis, such as synonymy and polysemy, identification of minor clusters, accommodation of cluster overlap, automatic labeling of clusters based on their document contents, and the user-controlled trade-off between speed of computation and quality of results. We conducted implementation studies with news articles from the Reuters and LA Times TREC data sets and artificially generated data with a known cluster structure to demonstrate the effectiveness of our system.
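The dynamic re-weighting idea can be illustrated with a short sketch: extract a dominant direction, assign every sufficiently similar document to that cluster (membership is not exclusive, so clusters may overlap), then down-weight the matched documents so that smaller clusters can surface in later iterations. The similarity threshold and damping factor below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def overlapping_clusters(docs, n_topics, sim_thresh=0.3, damp=0.2):
    """Iteratively extract cluster directions, allowing overlap.
    docs: (n_docs, n_attrs) matrix of nonzero document vectors."""
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    work = docs / norms                          # working copy, re-scaled below
    members = []
    for _ in range(n_topics):
        cov = work.T @ work                      # covariance-style matrix
        _, vecs = np.linalg.eigh(cov)
        topic = vecs[:, -1]                      # dominant direction
        sims = (docs @ topic) / norms[:, 0]      # cosine similarity to topic
        in_cluster = np.abs(sims) >= sim_thresh  # overlap: no exclusivity
        members.append(np.flatnonzero(in_cluster))
        work[in_cluster] *= damp                 # re-weight matched documents
    return members
```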


IEEE Transactions on Circuits and Systems Ii: Analog and Digital Signal Processing | 1998

Wavelet analysis used in text-to-speech synthesis

Mei Kobayashi; Masaharu Sakamoto; Takashi Saito; Yasuhide Hashimoto; Masafumi Nishimura; Kazuhiro Suzuki

This brief describes the use of wavelet analysis in the development of a Japanese text-to-speech (TTS) system for personal computers. The quality of synthesized speech is one of the most important features of any TTS system. Synthesis methods which are based on manipulation of the speech signal spectrum (e.g., linear predictive coding synthesis and formant synthesis) produce comprehensible but unnatural sounding output. The lack of naturalness commonly associated with these methods results from the use of oversimplified speech models, small synthesis unit inventories, and poor handling of text parsing for prosody control. We developed four new technologies to overcome these difficulties and improve the quality of output from TTS systems: accurate pitch mark determination by wavelet analysis, speech waveform generation using a modified time domain pitch synchronous overlap-add method, speech synthesis unit selection using a context dependent clustering method, and efficient prosody control using a 3-phrase parser. All four technologies will be described; however, those which rely on wavelet techniques will be emphasized.
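For readers unfamiliar with the overlap-add step, here is a minimal sketch of the textbook time-domain pitch-synchronous overlap-add (TD-PSOLA) idea that the paper modifies: two-period Hanning-windowed segments centered on the pitch marks are overlap-added at the target pitch period. This illustrates the family of methods only, not IBM's modified implementation.

```python
import numpy as np

def psola_resynthesize(x, pitch_marks, target_period):
    """Simplified TD-PSOLA. x: waveform samples; pitch_marks: sample
    indices of the pitch marks; target_period: desired pitch period
    in samples. Changing target_period shifts the output pitch."""
    half = target_period                      # one period on each side
    out = np.zeros(len(pitch_marks) * target_period + 2 * half)
    for i, mark in enumerate(pitch_marks):
        lo, hi = mark - half, mark + half
        if lo < 0 or hi > len(x):
            continue                          # skip marks near the edges
        segment = x[lo:hi] * np.hanning(2 * half)
        center = half + i * target_period     # new placement of this mark
        out[center - half:center + half] += segment
    return out
```

Accurate pitch marks matter here because a misplaced mark shifts its window off the glottal closure, producing the audible "pitch jitter" the wavelet-based mark detection is designed to avoid.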


Knowledge Discovery and Data Mining | 2014

Resources for Studying Statistical Analysis of Biomedical Data and R

Mei Kobayashi

The past decade has seen explosive growth in digitized medical data. This trend offers medical practitioners an unparalleled opportunity to identify the effectiveness of treatments for patients using summary statistics and to offer patients more personalized medical treatments based on predictive analytics. To exploit this opportunity, statisticians and computer scientists need to work and communicate effectively with medical practitioners to ensure proper measurement and collection of sufficient volumes of heterogeneous data, protection of patient privacy, and understanding of the probabilities and sources of error associated with data sampling. Interdisciplinary collaborations between scientists are likely to lead to the development of more effective methods for explaining probabilities, possible errors, and risks associated with treatment options to patients. This chapter introduces some online resources to help medical practitioners with little or no background in summary and predictive statistics learn basic statistical concepts and implement data analysis on their personal computers using R, a high-level computer language that requires relatively little training. Readers who are only interested in understanding basic statistical concepts may want to skip the subsection on R.


Archive | 2014

Text Document Cluster Analysis Through Visualization of 3D Projections

Masaki Aono; Mei Kobayashi

Clustering has been used as a tool for understanding the content of large text document sets. As the volume of stored data has increased, so has the need for tools to understand output from clustering algorithms. We developed a new visual interface to meet this demand. Our interface helps non-technical users understand documents and clusters in massive databases (e.g., document content, cluster sizes, distances between clusters, similarities of documents within clusters, extent of cluster overlaps) and evaluate the quality of output from different clustering algorithms. When a user inputs a keyword query describing his/her interests, our system retrieves and displays documents and clusters in three dimensions. More specifically, given a set of documents modeled as vectors in an orthogonal coordinate system and a query, our system finds the three orthogonal coordinate axes that are most relevant to generate a display (or users may choose any three orthogonal axes). We conducted implementation studies to demonstrate the value of our system with an artificial data set and a de facto benchmark news article dataset from the United States NIST Text REtrieval Conference (TREC).
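One simple reading of the axis-selection step is to rank the coordinate axes by the magnitude of their query weights and project the documents onto the top three. This is a sketch under that assumption; the paper's actual relevance criterion may differ, and the function names are hypothetical.

```python
import numpy as np

def query_relevant_axes(query, n_axes=3):
    """Pick the coordinate axes carrying the largest query weights.
    query: (n_attrs,) query vector in the same attribute space."""
    return np.argsort(np.abs(query))[::-1][:n_axes]

def project_for_display(docs, query):
    """Project documents onto three query-relevant axes for a 3D
    scatter display. docs: (n_docs, n_attrs). Returns the (n_docs, 3)
    coordinates and the indices of the chosen axes."""
    axes = query_relevant_axes(query)
    return docs[:, axes], axes
```

The returned coordinates can be handed directly to any 3D scatter plot; because the axes are original attribute dimensions, each display axis retains an interpretable keyword label.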


Archive | 2012

Blogging Around the Globe: Motivations, Privacy Concerns, and Social Networking

Mei Kobayashi

Blogging has become popular since its introduction in the late 1990s, and the practice continues to grow. Blogs exemplify internet-age, user-generated content that is changing the way people access information, form social networks, and interact with acquaintances. Moreover, blogs are associated with extensive social communities defined by interconnecting references, and are considered to be one of the early catalysts for propelling the popularity of online social networking. This paper traces the development of blogs and the blogosphere around the globe. National surveys conducted by numerous international teams of researchers indicate that motivations for blogging and attitudes regarding privacy differ strikingly across countries with large blogging communities. These differences are reflected in the content of blogs and profoundly influence blog-based social networks, which tend to be region-centric.


Knowledge Discovery and Data Mining | 2008

Tracking topic evolution in on-line postings: 2006 IBM innovation Jam data

Mei Kobayashi; Raylene Yung

Participants in on-line discussion forums and decision makers are interested in understanding real-time communications among large numbers of parties on the Internet and on intranets. As a first step towards addressing this challenge, we developed a prototype to quickly identify and track topics in large, dynamic data sets based on the assignment of documents to time slices, fast approximation of cluster centroids to identify discussion topics, and inter-slice correspondence mappings of topics. To verify our method, we conducted implementation studies with data from Innovation Jam 2006, an on-line brainstorming session in which participants around the globe posted more than 37,000 opinions. Results from our prototype are consistent with the text of the postings and would have required considerable effort to discover manually.
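The slice-and-link structure can be sketched as follows: cluster each time slice to obtain topic centroids, then link topics in consecutive slices whose centroids are sufficiently similar. This uses an off-the-shelf k-means in place of the paper's fast centroid approximation, and the matching threshold is an assumption.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def track_topics(slices, k, match_thresh=0.5):
    """slices: list of (n_docs_t, n_attrs) arrays, one per time slice.
    Returns per-slice topic centroids and (t, i, t+1, j) links meaning
    topic i in slice t continues as topic j in slice t+1."""
    centroids = [kmeans2(s.astype(float), k, minit="++")[0] for s in slices]
    links = []
    for t in range(len(centroids) - 1):
        a = centroids[t] / np.linalg.norm(centroids[t], axis=1, keepdims=True)
        b = centroids[t + 1] / np.linalg.norm(centroids[t + 1], axis=1,
                                              keepdims=True)
        sim = a @ b.T                              # pairwise cosine similarity
        for i in range(k):
            j = int(sim[i].argmax())
            if sim[i, j] >= match_thresh:          # topic persists across slices
                links.append((t, i, t + 1, j))
    return centroids, links
```

Chains of links then trace a topic's evolution across the whole posting period; topics with no outgoing link die out, and centroids with no incoming link mark newly emerging discussions.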
