Shinichi Shimozono | Researchain

Publication

Featured researches published by Shinichi Shimozono.

research in computational molecular biology | 2000

On approximation algorithms for local multiple alignment

Tatsuya Akutsu; Hiroki Arimura; Shinichi Shimozono

This paper studies the local multiple alignment problem, which is also known as the general consensus patterns problem. Given protein or DNA sequences, local multiple alignment asks to locate a region (i.e., a substring) of fixed length in each sequence so that the score determined from the set of regions is optimized. We consider the following scoring schemes: the score indicating the average information content, the score defined by Li et al., and the sum-of-pairs score. We prove that local multiple alignment is NP-hard under each of these scoring schemes. In addition, we prove that local multiple alignment is APX-hard under the average information content scoring. This implies that, unless P = NP, there is no polynomial-time approximation scheme, that is, no polynomial-time algorithm whose worst-case approximation error can be specified arbitrarily. Several related theoretical results are provided. We also conducted computational experiments on approximation algorithms for local multiple alignment under the average information content scoring. The results suggest that the Gibbs sampling algorithm proposed by Lawrence et al. is the best.
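
As a rough illustration of the problem statement above, the following Python sketch scores a chosen set of regions and searches all window choices by brute force. The function names, the identity-match form of the sum-of-pairs score, and the relative-entropy form of the average information content are assumptions made for illustration; the paper's exact definitions may differ, and the exhaustive search is exponential in the number of sequences.

```python
import math
from collections import Counter
from itertools import combinations, product


def sum_of_pairs(regions):
    """Count matching positions over all unordered pairs of regions
    (assumed scoring: 1 per identical column entry, 0 otherwise)."""
    return sum(
        sum(a == b for a, b in zip(r1, r2))
        for r1, r2 in combinations(regions, 2)
    )


def average_information_content(regions, background):
    """Mean over columns of sum_c f(c) * log2(f(c) / background[c]),
    where f(c) is the frequency of symbol c in that column (assumed form)."""
    length = len(regions[0])
    total = 0.0
    for j in range(length):
        counts = Counter(r[j] for r in regions)
        for c, cnt in counts.items():
            f = cnt / len(regions)
            total += f * math.log2(f / background[c])
    return total / length


def best_regions_brute_force(seqs, length, score):
    """Try every choice of one fixed-length window per sequence.
    Exponential in len(seqs); only meant for tiny inputs."""
    windows = [
        [s[i:i + length] for i in range(len(s) - length + 1)] for s in seqs
    ]
    return max(product(*windows), key=score)
```

The brute-force search is only there to make the objective concrete; the hardness results in the abstract are the reason heuristics such as Gibbs sampling are used in practice.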

New Generation Computing | 2000

Efficient discovery of optimal word-association patterns in large text databases

Shinichi Shimozono; Hiroki Arimura; Setsuo Arikawa

We study efficient discovery of proximity word-association patterns, defined by a sequence of strings and a proximity gap, from a collection of texts with positive and negative labels. We present an algorithm that finds all d-strings k-proximity word-association patterns that maximize the number of texts whose matching agrees with their labels. It runs in expected time $O(k^{d-1} n \log^{d} n)$ and space $O(k^{d-1} n)$, where n is the total length of the texts, if the texts are uniformly random strings. We also show that the problem of finding one of the best word-association patterns with arbitrarily many strings is MAX SNP-hard.
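
To make the objective concrete, here is a minimal Python sketch of matching one pattern against a text and counting agreement with the labels. It assumes that a pattern matches when its strings occur in order with consecutive start positions at most k apart; that matching rule and the names `occurrences`, `matches`, and `agreement` are illustrative assumptions, not the paper's definitions, and the sketch evaluates a single pattern rather than discovering optimal ones.

```python
def occurrences(text, word):
    """All start positions of word in text (overlapping occurrences included)."""
    positions, start = [], text.find(word)
    while start != -1:
        positions.append(start)
        start = text.find(word, start + 1)
    return positions


def matches(text, words, k):
    """True if the words occur in the given order and each start position
    is at most k after the previous one (assumed proximity rule)."""
    reachable = set(occurrences(text, words[0]))
    for word in words[1:]:
        reachable = {
            q for q in occurrences(text, word)
            if any(0 < q - p <= k for p in reachable)
        }
        if not reachable:
            return False
    return bool(reachable)


def agreement(labeled_texts, words, k):
    """Number of (text, label) pairs whose match result equals the label."""
    return sum(matches(t, words, k) == label for t, label in labeled_texts)
```

For example, `agreement(texts, ["cat", "mat"], 10)` counts how many labeled texts the pattern ("cat", "mat"; 10) classifies correctly under the assumed rule.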

discovery science | 2007

Time and space efficient discovery of maximal geometric graphs

Hiroki Arimura; Takeaki Uno; Shinichi Shimozono

A geometric graph is a labeled graph whose vertices are points in the 2D plane with an isomorphism invariant under geometric transformations such as translation, rotation, and scaling. While Kuramochi and Karypis (ICDM2002) extensively studied the frequent pattern mining problem for geometric subgraphs, the maximal graph mining has not been considered so far. In this paper, we study the maximal (or closed) graph mining problem for the general class of geometric graphs in the 2D plane by extending the framework of Kuramochi and Karypis. Combining techniques of canonical encoding and a depth-first search tree for the class of maximal patterns, we present a polynomial delay and polynomial space algorithm, MaxGeo, that enumerates all maximal subgraphs in a given input geometric graph without duplicates. This is the first result establishing the output-sensitive complexity of closed graph mining for geometric graphs. We also show that the frequent graph mining problem is solvable in polynomial delay and polynomial time.

hawaii international conference on system sciences | 1993

Finding alphabet indexing for decision trees over regular patterns: an approach to bioinformatical knowledge acquisition

Shinichi Shimozono; Ayumi Shinohara; Takeshi Shinohara; Satoru Miyano; Setsuo Arikawa

Considers a transformation from an alphabet to a smaller alphabet which does not lose any positive and negative information of the original examples. Such a transformation is called indexing. A method which exploits indexing by a local search technique for learning decision trees over regular patterns is proposed. From positive and negative examples, the system produces, as a hypothesis, an indexing-decision tree pair. The authors also report some experimental results obtained by this machine learning system on the following identification problems: transmembrane domains and signal peptides. For transmembrane domains, the system discovered an indexing by two symbols and a decision tree with just three nodes that achieves 92% accuracy. The indexing was almost the same as that based on the hydropathy index of Kyte and Doolittle (1982). For signal peptides, the system also found sufficiently good hypotheses.
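
The notion of indexing can be sketched in Python as follows. The sketch assumes that "not losing positive and negative information" means the transformed positive and negative example sets stay disjoint, and it uses exhaustive search over mappings purely for illustration (the abstract describes a local search technique instead). All function names are hypothetical.

```python
from itertools import product


def apply_indexing(indexing, s):
    """Transform a string symbol by symbol using the mapping."""
    return "".join(indexing[c] for c in s)


def is_valid_indexing(indexing, positives, negatives):
    """Assumed criterion: no transformed positive example coincides
    with a transformed negative example."""
    pos = {apply_indexing(indexing, s) for s in positives}
    neg = {apply_indexing(indexing, s) for s in negatives}
    return pos.isdisjoint(neg)


def find_indexing(alphabet, symbols, positives, negatives):
    """Exhaustive search over all mappings alphabet -> symbols.
    Takes len(symbols) ** len(alphabet) steps; for tiny alphabets only."""
    for images in product(symbols, repeat=len(alphabet)):
        indexing = dict(zip(alphabet, images))
        if is_valid_indexing(indexing, positives, negatives):
            return indexing
    return None
```

With alphabet "ABCD", positives ["AB", "BA"], and negatives ["CD", "DC"], a two-symbol indexing exists, whereas no one-symbol indexing can separate the two sets.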

international symposium on algorithms and computation | 1998

Maximizing Agreement with a Classification by Bounded or Unbounded Number of Associated Words

Hiroki Arimura; Shinichi Shimozono

We study the efficient discovery of word-association patterns, defined by a sequence of strings and a proximity gap, from a collection of texts with binary labels. We present an algorithm that finds all d-strings k-proximity word-association patterns that maximize agreement with the labels. It runs in expected time $O(k^{d-1} n \log^{d+1} n)$ and $O(k^{d-1} n)$ space, where n is the total length of the texts, if the texts are uniformly random strings. We also show that the problem of finding a best word-association pattern with arbitrarily many strings is MAX SNP-hard.

string processing and information retrieval | 2004

A Space-Saving Linear-Time Algorithm for Grammar-Based Compression

Hiroshi Sakamoto; Takuya Kida; Shinichi Shimozono

A space-efficient linear-time approximation algorithm for the grammar-based compression problem, which asks, for a given string, for a smallest context-free grammar deriving the string, is presented. The algorithm consumes only $O(g_* \log g_*)$ space and achieves the worst-case approximation ratio $O(\log g_* \log n)$, where n is the size of the input and $g_*$ is the optimum grammar size. Experimental results for typical benchmarks demonstrate that our algorithm is practical and efficient.
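
For readers unfamiliar with grammar-based compression, the following Python sketch builds a small context-free grammar for a string by repeatedly replacing the most frequent adjacent pair with a new nonterminal. This is a Re-Pair-style greedy heuristic swapped in for illustration; it is not the algorithm of the paper and carries none of its space or approximation guarantees. The names `repair_like` and `expand` are hypothetical.

```python
from collections import Counter


def repair_like(s):
    """Greedy pair replacement. Returns (start sequence, rules), where
    rules maps each nonterminal name to a pair of symbols."""
    seq = list(s)
    rules = {}
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no adjacent pair repeats, so stop
        name = f"X{len(rules)}"  # multi-character name, distinct from input symbols
        rules[name] = pair
        out, i = [], 0
        while i < len(seq):
            # Replace non-overlapping occurrences left to right.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Derive the terminal string generated by a symbol."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)
```

The grammar size can be taken as `len(seq) + 2 * len(rules)`; the original string is recovered with `"".join(expand(x, rules) for x in seq)`.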

PLOS ONE | 2012

Application of approximate pattern matching in two dimensional spaces to grid layout for biochemical network maps.

Kentaro Inoue; Shinichi Shimozono; Hideaki Yoshida; Hiroyuki Kurata

Background: For visualizing large-scale biochemical network maps, it is important to calculate the coordinates of molecular nodes quickly and to enhance the understanding or traceability of them. The grid layout is effective in drawing compact, orderly, balanced network maps with node label spaces, but existing grid layout algorithms often require a high computational cost because they have to consider complicated positional constraints through the entire optimization process. Results: We propose a hybrid grid layout algorithm that consists of a non-grid, fast layout (preprocessor) algorithm and an approximate pattern matching algorithm that distributes the resultant preprocessed nodes on square grid points. To demonstrate the feasibility of the hybrid layout algorithm, it is characterized in terms of the calculation time, numbers of edge-edge and node-edge crossings, relative edge lengths, and F-measures. The proposed algorithm achieves outstanding performances compared with other existing grid layouts. Conclusions: Use of an approximate pattern matching algorithm quickly redistributes the laid-out nodes by fast, non-grid algorithms on the square grid points, while preserving the topological relationships among the nodes. The proposed algorithm is a novel use of the pattern matching, thereby providing a breakthrough for grid layout. This application program can be freely downloaded from http://www.cadlive.jp/hybridlayout/hybridlayout.html.

international symposium on algorithms and computation | 2009

Fragmentary Pattern Matching: Complexity, Algorithms and Applications for Analyzing Classic Literary Works

Hideaki Hori; Shinichi Shimozono; Masayuki Takeda; Ayumi Shinohara

A fragmentary pattern is a multiset of non-empty strings, and it matches a string w if all the strings in it occur within w without any overlaps. We study some fundamental issues on computational complexity related to the matching of fragmentary patterns. We show that the fragmentary pattern matching problem is NP-complete, and that the problem of finding a fragmentary pattern common to two strings that maximizes the pattern score is NP-hard. Moreover, we propose a polynomial-time approximation algorithm for fragmentary pattern matching, and show that it achieves a constant worst-case approximation ratio if either the strings in a pattern have the same length, or the importance weights of strings in a pattern are proportional to their lengths.
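
The matching condition defined above can be checked with a simple backtracking search, sketched here in Python. This is an exhaustive check written for illustration (the function name `fragmentary_match` is hypothetical); it is exponential in the worst case, consistent with the NP-completeness result, and it is not the approximation algorithm proposed in the paper.

```python
def fragmentary_match(pattern, w):
    """pattern: list of non-empty strings, treated as a multiset.
    True if every string can be placed at an occurrence in w so that
    no two placements overlap."""
    used = [False] * len(w)
    # Placing longer fragments first tends to prune the search earlier.
    fragments = sorted(pattern, key=len, reverse=True)

    def place(i):
        if i == len(fragments):
            return True
        frag = fragments[i]
        start = w.find(frag)
        while start != -1:
            span = range(start, start + len(frag))
            if not any(used[j] for j in span):
                for j in span:
                    used[j] = True
                if place(i + 1):
                    return True
                for j in span:  # undo the placement and try the next occurrence
                    used[j] = False
            start = w.find(frag, start + 1)
        return False

    return place(0)
```

Because the pattern is a multiset, `["ab", "ab"]` matches "abab" but not "ab": each copy needs its own non-overlapping occurrence.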

acm international conference on digital libraries | 2000

Text data mining: discovery of important keywords in the cyberspace

Hiroki Arimura; Junichiro Abe; Ryoichi Fujino; Hiroshi Sakamoto; Shinichi Shimozono; Setsuo Arikawa

This paper describes applications of the optimized pattern discovery framework to text and Web mining. In particular, we introduce a class of simple combinatorial patterns over phrases, called proximity phrase association patterns, and consider the problem of finding the patterns that optimize a given statistical measure within the whole class of patterns in a large collection of unstructured texts. For this class of patterns, we develop fast and robust text mining algorithms based on techniques in computational geometry and string matching. Finally, we successfully apply the developed text mining algorithms to the experiments on interactive document browsing in a large text database and keyword discovery from Web bases.

Theoretical Computer Science | 1999

Alphabet indexing for approximating features of symbols

Shinichi Shimozono

We consider two maximization problems of finding a mapping from a large alphabet, over which two given sets of strings are formed, to a set of very few symbols, where the mapping specifies a symbol-wise transformation of strings. First, we show that the problem of finding a mapping that transforms the largest number of the strings so as to form disjoint sets cannot be approximated within a ratio $n^{1/16}$ in polynomial time, unless P = NP. Next, we consider a mapping that retains the difference of the maximum number of pairs of strings over the given sets. We present a polynomial-time approximation algorithm that guarantees a ratio k(k − 1) for mappings to k symbols, and prove that the problem is hard to approximate within an arbitrarily small ratio in polynomial time. Furthermore, we extend this algorithm to deal not only with pairs but also with tuples of strings, and show that it achieves a constant approximation ratio.