Aleksandr Drozd
Tokyo Institute of Technology
Publications
Featured research published by Aleksandr Drozd.
Workshop on Evaluating Vector Space Representations for NLP | 2016
Anna Gladkova; Aleksandr Drozd
This paper presents an analysis of existing methods for the intrinsic evaluation of word embeddings. We show that the main methodological premise of such evaluations is “interpretability” of word embeddings: a “good” embedding produces results that make sense in terms of traditional linguistic categories. This approach is not only of limited practical use, but also fails to do justice to the strengths of distributional meaning representations. We argue for a shift from abstract ratings of word embedding “quality” to exploration of their strengths and weaknesses.
North American Chapter of the Association for Computational Linguistics | 2016
Anna Gladkova; Aleksandr Drozd; Satoshi Matsuoka
Following up on numerous reports of analogy-based identification of “linguistic regularities” in word embeddings, this study applies the widely used vector offset method to four types of linguistic relations: inflectional and derivational morphology, and lexicographic and encyclopedic semantics. We present a balanced test set with 99,200 questions in 40 categories, and we systematically examine how accuracy for different categories is affected by window size and dimensionality of the SVD-based word embeddings. We also show that GloVe and SVD yield similar patterns of results for different categories, offering further evidence for conceptual similarity between count-based and neural-net-based models.
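The vector offset method referenced above can be illustrated with a minimal sketch (not the authors' code; the embedding matrix, vocabulary mapping, and helper name below are illustrative assumptions):

```python
import numpy as np

def most_similar_offset(emb, vocab, a, b, c, exclude_inputs=True):
    """Answer 'a is to b as c is to ?' with the vector offset b - a + c.

    emb   : (V, d) array of L2-normalized word vectors (assumed)
    vocab : dict mapping word -> row index in emb (assumed)
    """
    target = emb[vocab[b]] - emb[vocab[a]] + emb[vocab[c]]
    target /= np.linalg.norm(target)
    sims = emb @ target                      # cosine similarity, since rows are unit-norm
    if exclude_inputs:
        for w in (a, b, c):
            sims[vocab[w]] = -np.inf         # never return one of the query words
    best = int(np.argmax(sims))
    inv = {i: w for w, i in vocab.items()}
    return inv[best]

# usage (with a toy embedding table):
# answer = most_similar_offset(emb, vocab, "man", "king", "woman")  # expected: "queen"
```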
International Conference on Big Data | 2014
Hideyuki Shamoto; Koichi Shirahata; Aleksandr Drozd; Hitoshi Sato; Satoshi Matsuoka
Splitter-based parallel sorting algorithms are known to be highly efficient for distributed sorting due to their low communication complexity. Although using GPU accelerators could help to reduce the computation cost in general, their effectiveness in distributed sorting algorithms on large-scale heterogeneous GPU-based systems remains unclear. We investigate the applicability of GPU devices to splitter-based algorithms and extend HykSort, an existing splitter-based algorithm, by offloading costly computation phases to GPUs. We also handle GPU memory overflows by introducing an iterative approach which sorts multiple chunks and merges them into one array. We evaluate the performance of our implementation with local sort acceleration on the TSUBAME2.5 supercomputer, which comprises over 4,000 NVIDIA K20x GPUs. Weak-scaling evaluation shows that we achieve a 389-fold speedup with 0.25 TB/s throughput when sorting 4 TB of 64-bit integers on 1,024 nodes compared to running on one node; on the other hand, in the CPU vs. GPU comparison, our implementation achieves only a 1.40-fold speedup using 1,024 nodes. Detailed analysis, however, reveals that the limitation is almost entirely due to the bottleneck in CPU-GPU host-to-device bandwidth. With the order-of-magnitude improvements planned for next-generation GPUs, the performance boost should grow accordingly, in line with other successful GPU accelerations.
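The iterative chunk-sort-and-merge approach to GPU memory overflow can be sketched as follows. This is a host-side illustration under assumed parameters, with NumPy's sort standing in for the GPU local sort kernel:

```python
import heapq
import numpy as np

def out_of_core_sort(data, chunk_elems):
    """Sort an array larger than device memory by sorting fixed-size chunks
    and merging the sorted runs into one output array.

    data        : 1-D NumPy array
    chunk_elems : number of elements that fit in (simulated) device memory
    """
    runs = []
    for start in range(0, len(data), chunk_elems):
        chunk = np.sort(data[start:start + chunk_elems])   # stands in for the GPU local sort
        runs.append(chunk)
    # k-way merge of the sorted runs into a single array
    return np.fromiter(heapq.merge(*runs), dtype=data.dtype, count=len(data))

# usage:
# arr = np.random.randint(0, 2**63 - 1, size=10_000_000, dtype=np.int64)
# result = out_of_core_sort(arr, chunk_elems=1_000_000)
```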
Proceedings of the 5th Workshop on Python for High-Performance and Scientific Computing | 2015
Aleksandr Drozd; Anna Gladkova; Satoshi Matsuoka
We present a case study of a Python-based workflow for a data-intensive natural language processing problem, namely word classification with vector space model methodology. Problems in natural language processing are typically solved in many steps which require transforming the data into vastly different formats (in our case, raw text to sparse matrices to dense vectors). A Python implementation of each of these steps would require a different solution. We survey existing approaches to using Python for high-performance processing of large volumes of data, and we propose a sample solution for each step of our case study (aspectual classification of Russian verbs), attempting to preserve both efficiency and user-friendliness. For the most computationally intensive part of the workflow we develop a prototype distributed implementation of a co-occurrence extraction module using an IPython.parallel cluster.
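A minimal sketch of the map-and-reduce pattern behind distributed co-occurrence extraction, in the spirit of the prototype described above; Python's multiprocessing is used here in place of an IPython.parallel cluster, and the window size and sharding scheme are assumptions:

```python
from collections import Counter
from multiprocessing import Pool

WINDOW = 2  # symmetric context window size (assumed)

def count_cooccurrences(tokens):
    """Count (target, context) pairs within a fixed window in one corpus shard."""
    counts = Counter()
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(word, tokens[j])] += 1
    return counts

def parallel_cooccurrence(shards, workers=4):
    """Map the counting kernel over corpus shards and reduce the partial counts."""
    with Pool(workers) as pool:
        partials = pool.map(count_cooccurrences, shards)
    total = Counter()
    for c in partials:
        total.update(c)
    return total

# usage:
# shards = [["the", "cat", "sat"], ["the", "dog", "ran"]]
# counts = parallel_cooccurrence(shards, workers=2)
```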
IEEE International Conference on High Performance Computing, Data, and Analytics | 2012
Aleksandr Drozd; Naoya Maruyama; Satoshi Matsuoka
This paper describes a performance model for the read alignment problem, one of the most computationally intensive tasks in bioinformatics. We adapted a Burrows-Wheeler-transform-based index for use with GPUs to reduce the overall memory footprint. A mathematical model of computation and communication costs was developed to find the optimal memory partitioning for index and queries. Lastly, we explored the possibility of using multiple GPUs to reduce data transfers and achieved super-linear speedup. Performance evaluation of our experimental implementation supports our claims and shows a more than 10-fold performance gain per device.
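For reference, the exact-matching core of a Burrows-Wheeler-transform index is the standard backward search; the sketch below is a textbook CPU version, not the authors' GPU implementation:

```python
def bwt_index(text):
    """Build the BWT, first-column counts, and occurrence table for `text` (must end with '$')."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    bwt = "".join(r[-1] for r in rotations)
    chars = sorted(set(bwt))
    # C[c] = number of characters in text strictly smaller than c
    C, total = {}, 0
    for c in chars:
        C[c] = total
        total += bwt.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in chars}
    for i, ch in enumerate(bwt):
        for c in chars:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ, len(bwt)

def backward_search(pattern, C, occ, n):
    """Return the number of occurrences of `pattern` via LF-mapping over the BWT."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo

# usage:
# C, occ, n = bwt_index("mississippi$")
# hits = backward_search("iss", C, occ, n)   # -> 2
```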
IEEE Transactions on Big Data | 2016
Hideyuki Shamoto; Koichi Shirahata; Aleksandr Drozd; Hitoshi Sato; Satoshi Matsuoka
Splitter-based parallel sorting algorithms are known to be highly efficient for distributed sorting due to their low communication complexity. Although using GPU accelerators could help to reduce the computation cost in general, their effectiveness in distributed sorting algorithms remains unclear. We investigate the applicability of GPU devices to splitter-based algorithms and extend HykSort, an existing splitter-based algorithm, by offloading costly computation phases to GPUs. To cope with data volumes exceeding the GPU memory capacity, an out-of-core local sort is used, with a small overhead of about 7.5 percent when the data size is tripled. We evaluate the performance of our implementation on the TSUBAME2.5 supercomputer, which comprises over 4,000 NVIDIA K20x GPUs. Weak-scaling analysis shows a 389-fold speedup with 0.25 TB/s throughput when sorting 4 TB of 64-bit integer values on 1,024 nodes compared to running on one node; this is 1.40 times faster than the reference CPU implementation. Detailed analysis, however, reveals that the performance is mostly bottlenecked by the CPU-GPU host-to-device bandwidth. With the order-of-magnitude improvements announced for next-generation GPUs, the performance boost should grow accordingly, in line with other successful GPU accelerations.
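The splitter-based partitioning step that such algorithms rely on can be illustrated on a single node as follows; the sample rate is an assumed tuning parameter, and the distributed all-to-all exchange is omitted:

```python
import numpy as np

def splitter_partition(local_data, n_buckets, sample_rate=0.01):
    """Choose splitters from a random sample and partition the data into ordered buckets.

    In a distributed setting each rank would sort its bucket after an all-to-all
    exchange; here all buckets stay in one process for illustration.
    """
    rng = np.random.default_rng(0)
    sample_size = max(n_buckets, int(len(local_data) * sample_rate))
    sample = np.sort(rng.choice(local_data, size=sample_size, replace=False))
    # pick n_buckets - 1 evenly spaced splitters from the sorted sample
    idx = np.linspace(0, len(sample) - 1, n_buckets + 1)[1:-1].astype(int)
    splitters = sample[idx]
    # np.searchsorted assigns each element a destination bucket
    dest = np.searchsorted(splitters, local_data, side="right")
    return [np.sort(local_data[dest == b]) for b in range(n_buckets)]

# usage (concatenating the buckets yields the fully sorted array):
# data = np.random.randint(0, 1 << 32, size=1_000_000, dtype=np.int64)
# buckets = splitter_partition(data, n_buckets=8)
```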
International Parallel and Distributed Processing Symposium | 2012
Aleksandr Drozd; Naoya Maruyama; Satoshi Matsuoka
Bioinformatics is a quickly emerging area of science with many important applications to human life. Sequence alignment in various forms is one of the main instruments used in bioinformatics. This work is motivated by the ever-increasing amount of sequence data, which requires more and more computational power for its processing. This task calls for new GPU-based systems, with their higher computational potential and energy efficiency compared to CPUs. We address the problem of facilitating faster sequence alignment using modern multi-GPU clusters. Our initial step was to develop a fast and scalable GPU exact short sequence aligner. We used a matching algorithm with a small memory footprint based on the Burrows-Wheeler transform. We developed a mathematical model of computation and communication costs to find the optimal memory partitioning strategy for index and queries. Our solution achieves a 10-fold speedup over a previous suffix-array-based implementation on one GPU and scales to multiple GPUs. Our next step will be to adapt the suggested data structure and performance model for multi-node, multi-GPU approximate sequence alignment. We also plan to use exact matching to detect common regions in large sequences as an intermediate step in full-scale genome comparison.
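One way to picture distributing alignment work across devices is to shard the query set and run the matching kernel on each shard, as in the sketch below; the partitioning scheme is an illustrative assumption, and a naive substring count stands in for the GPU-side BWT matching:

```python
from concurrent.futures import ProcessPoolExecutor

def match_batch(reference, queries):
    """Count exact occurrences of each query in the reference.

    Stands in for the per-device kernel; the actual implementation would run
    BWT-based matching on the GPU rather than naive substring search.
    """
    counts = {}
    for q in queries:
        n, start = 0, reference.find(q)
        while start != -1:           # count possibly overlapping occurrences
            n += 1
            start = reference.find(q, start + 1)
        counts[q] = n
    return counts

def align_on_devices(reference, queries, n_devices=2):
    """Split the query set into one batch per device and merge the results."""
    batches = [queries[i::n_devices] for i in range(n_devices)]
    results = {}
    with ProcessPoolExecutor(max_workers=n_devices) as ex:
        for partial in ex.map(match_batch, [reference] * n_devices, batches):
            results.update(partial)
    return results

# usage:
# hits = align_on_devices("ACGTACGTAC", ["ACGT", "GTA", "TTT"], n_devices=2)
```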
Artificial Life and Robotics | 2016
Aleksandr Drozd; Olaf Witkowski; Satoshi Matsuoka; Takashi Ikegami
We extend an abstract agent-based swarming model based on the evolution of neural network controllers to explore further the emergence of swarming. Our model is grounded in an ecological situation in which agents can access some information about resource locations from the environment, but only through a noisy channel. Swarming critically improves the efficiency of group foraging by allowing agents to reach resource areas much more easily, since group dynamics correct individual mistakes. As high levels of noise may make the emergence of collective behavior depend on a critical mass of agents, it is crucial to have sufficient computing power to allow the whole set of dynamics to evolve in simulation. Since simulating neural controllers and information exchanges between agents is computationally intensive, scaling the simulations up to critical masses of individuals requires careful optimization of the implementation. We apply treecode techniques known from astrophysics to compute the signal propagation, and parallelize them efficiently for multi-core architectures. Our results open up future research on signal-based emergent collective behavior as a valid collective strategy for uninformed search over a domain space.
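A treecode approximates the aggregate signal from a distant group of emitters with a single term for the whole group, as in the Barnes-Hut-style sketch below; the opening angle, the 1/r signal falloff, and the quadtree layout are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

THETA = 0.5  # opening angle: larger values trade accuracy for speed (assumed)

class Cell:
    """Quadtree cell storing the total emitted strength and its weighted centroid."""
    def __init__(self, points, strengths, center, half):
        self.total = strengths.sum()
        self.centroid = (points * strengths[:, None]).sum(axis=0) / self.total
        self.half = half
        self.children = []
        if len(points) > 1 and half > 1e-6:
            # assign each point to exactly one quadrant of this cell
            quad = (points[:, 0] >= center[0]).astype(int) * 2 + (points[:, 1] >= center[1]).astype(int)
            offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
            for q, (dx, dy) in enumerate(offsets):
                mask = quad == q
                if mask.any():
                    child_center = center + half * np.array([dx, dy])
                    self.children.append(Cell(points[mask], strengths[mask], child_center, half / 2))

def signal_at(cell, x, eps=1e-9):
    """Approximate the total 1/r signal at position x using the Barnes-Hut opening criterion."""
    d = np.linalg.norm(cell.centroid - x) + eps
    if not cell.children or (2 * cell.half) / d < THETA:
        return cell.total / d          # far away (or a leaf): one term for the whole cell
    return sum(signal_at(child, x, eps) for child in cell.children)

# usage:
# pts = np.random.rand(1000, 2); amp = np.ones(1000)
# root = Cell(pts, amp, center=np.array([0.5, 0.5]), half=0.5)
# value = signal_at(root, np.array([0.1, 0.9]))
```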
IEEE International Conference on Data Science and Data Intensive Systems | 2015
Aleksandr Drozd; Anna Gladkova; Satoshi Matsuoka
This paper presents a case study of discovering and classifying verbs in large web corpora. Many tasks in natural language processing require corpora containing billions of words, and at such data volumes co-occurrence extraction becomes one of the performance bottlenecks in the Vector Space Models of computational linguistics. We propose a co-occurrence extraction kernel based on ternary trees as an alternative (or a complementary stage) to the conventional map-reduce-based approach; this kernel achieves an order-of-magnitude improvement in memory footprint and processing speed. Our classifier successfully and efficiently identified verbs in a 1.2-billion-word untagged corpus of Russian fiction and distinguished between their two aspectual classes. The model proved efficient even for low-frequency vocabulary, including nonce verbs and neologisms.
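A ternary search tree keyed on characters is the kind of structure such a kernel can be built around; the sketch below is an illustrative assumption about the node layout and counting interface, not the paper's implementation:

```python
class TSTNode:
    """Ternary search tree node keyed by one character, with a counter at word ends."""
    __slots__ = ("ch", "lo", "eq", "hi", "count")
    def __init__(self, ch):
        self.ch, self.lo, self.eq, self.hi, self.count = ch, None, None, None, 0

class TernaryCounter:
    """Count string frequencies (e.g., encoded co-occurrence pairs) without hashing full strings."""
    def __init__(self):
        self.root = None

    def add(self, key, delta=1):
        self.root = self._add(self.root, key, 0, delta)

    def _add(self, node, key, i, delta):
        ch = key[i]
        if node is None:
            node = TSTNode(ch)
        if ch < node.ch:
            node.lo = self._add(node.lo, key, i, delta)
        elif ch > node.ch:
            node.hi = self._add(node.hi, key, i, delta)
        elif i + 1 < len(key):
            node.eq = self._add(node.eq, key, i + 1, delta)
        else:
            node.count += delta
        return node

    def get(self, key):
        node, i = self.root, 0
        while node is not None:
            if key[i] < node.ch:
                node = node.lo
            elif key[i] > node.ch:
                node = node.hi
            elif i + 1 < len(key):
                node, i = node.eq, i + 1
            else:
                return node.count
        return 0

# usage: encode a co-occurrence pair as "target\x00context" and count it
# counts = TernaryCounter(); counts.add("read\x00book"); counts.get("read\x00book")  # -> 1
```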
International Congress on Big Data | 2014
Aleksandr Drozd; Miquel Pericàs; Satoshi Matsuoka
This paper addresses the issue of efficient sorting of strings on multi- and many-core processors. We propose CPU and GPU implementations of the most-significant-digit radix sort algorithm, using different parallelization strategies at various stages of the execution to achieve good workload balance and optimal use of system resources. We evaluate the performance of our solution on both architectures and compare the efficiency of the sorting algorithm for various key lengths. For the GPU implementation we introduce a communication-reducing strategy to overcome the limitations of the PCIe bus bandwidth. Both implementations achieve sorting throughput of up to 70 million keys per second with good scalability.
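Most-significant-digit radix sort distributes strings into buckets by their leading character and recurses into each bucket on the next character; the sequential sketch below omits the paper's parallelization and uses an assumed small-bucket cutoff:

```python
CUTOFF = 16  # fall back to comparison sort for small buckets (assumed tuning parameter)

def char_at(s, d):
    """Byte value of character d of s, or -1 past the end (shorter strings sort first)."""
    return ord(s[d]) if d < len(s) else -1

def msd_radix_sort(strings, d=0):
    """Sort strings by bucketing on character d, then recursing into each bucket on d + 1."""
    if len(strings) <= CUTOFF:
        return sorted(strings)
    buckets = {}
    for s in strings:
        buckets.setdefault(char_at(s, d), []).append(s)
    result = []
    for key in sorted(buckets):
        if key == -1:
            result.extend(buckets[key])          # exhausted strings are already in order
        else:
            result.extend(msd_radix_sort(buckets[key], d + 1))
    return result

# usage:
# msd_radix_sort(["banana", "band", "ban", "apple", "ape"])
```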