Publication


Featured research published by Yukino Baba.


Knowledge Discovery and Data Mining | 2013

Statistical quality estimation for general crowdsourcing tasks

Yukino Baba; Hisashi Kashima

One of the biggest challenges for requesters and platform providers of crowdsourcing is quality control: obtaining high-quality results from crowd workers who are not necessarily capable or motivated. A common approach to this problem is to introduce redundancy, that is, to ask multiple workers to work on the same task. For simple multiple-choice tasks, several statistical methods for aggregating the multiple answers have been proposed. However, these methods cannot always be applied to more general tasks with unstructured response formats, such as article writing, program coding, and logo design, which make up the majority of tasks on most crowdsourcing marketplaces. In this paper, we propose an unsupervised statistical quality estimation method for such general crowdsourcing tasks. Our method is based on a two-stage procedure: in the creation stage, multiple workers are asked to work on the same task, and in the review stage, another set of workers reviews and grades each artifact. We model the ability of each author and the bias of each reviewer, and propose a two-stage probabilistic generative model using the graded response model from item response theory. Experiments on several general crowdsourcing tasks show that our method outperforms popular vote aggregation methods, which suggests that it can deliver high-quality results at lower cost.
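The graded response model at the heart of the review stage can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's two-stage model: it treats one artifact's quality as a latent score `quality`, models a reviewer's strictness as a hypothetical additive `reviewer_bias` on the grade thresholds, and estimates the quality of a single artifact by grid search with the reviewer biases assumed known. The paper additionally models author ability and learns all parameters jointly.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def grm_grade_probs(quality, reviewer_bias, discrimination, thresholds):
    """Graded response model: probability of each grade 0..K.

    thresholds must be strictly increasing (b_1 < ... < b_K) and
    discrimination positive. reviewer_bias is an ASSUMED additive shift:
    a positive value makes the reviewer stricter by raising every threshold.
    """
    # Cumulative probabilities P(grade >= k) for k = 0..K+1.
    cum = [1.0]
    cum += [sigmoid(discrimination * (quality - (b + reviewer_bias)))
            for b in thresholds]
    cum.append(0.0)
    # P(grade == k) is the difference of adjacent cumulative probabilities.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def estimate_quality(grades, biases, discrimination, thresholds):
    """Maximum-likelihood quality of one artifact via grid search over
    [-4, 4], assuming each reviewer's bias is already known."""
    grid = [x / 100.0 for x in range(-400, 401)]

    def log_likelihood(q):
        return sum(
            math.log(grm_grade_probs(q, b, discrimination, thresholds)[g])
            for g, b in zip(grades, biases)
        )

    return max(grid, key=log_likelihood)


if __name__ == "__main__":
    thresholds = [-1.0, 0.0, 1.0]  # four grades: 0, 1, 2, 3
    # Three reviewers grade the same artifact; the second is assumed strict.
    print(estimate_quality([3, 2, 3], [0.0, 0.5, 0.0], 1.5, thresholds))
```

Raising `reviewer_bias` shifts probability mass toward lower grades, so a grade of 2 from a strict reviewer counts for more than the same grade from a lenient one. That is the intuition behind modeling reviewer bias explicitly.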


Journal of Medical Internet Research | 2015

Health Checkup and Telemedical Intervention Program for Preventive Medicine in Developing Countries: Verification Study

Yasunobu Nohara; Eiko Kai; Partha Pratim Ghosh; Rafiqul Islam; Ashir Ahmed; Masahiro Kuroda; Sozo Inoue; Tatsuo Hiramatsu; Michio Kimura; Shuji Shimizu; Kunihisa Kobayashi; Yukino Baba; Hisashi Kashima; Koji Tsuda; Masashi Sugiyama; Mathieu Blondel; Naonori Ueda; Masaru Kitsuregawa; Naoki Nakashima

Background: The prevalence of non-communicable diseases is increasing throughout the world, including in developing countries.

Objective: The intent was to conduct a study of a preventive medical service in a developing country that combines eHealth checkups and teleconsultation, and to assess stratification rules and the short-term effects of intervention.

Methods: We developed an eHealth system that comprises a set of sensor devices in an attaché case, a data transmission system linked to a mobile network, and a data management application. We provided eHealth checkups for the populations of five villages and the employees of five factories/offices in Bangladesh. Each individual's health condition was automatically categorized into four grades based on international diagnostic standards: green (healthy), yellow (caution), orange (affected), and red (emergent). We provided teleconsultation for orange- and red-grade subjects, and teleprescription for these subjects as required.

Results: The first checkup was provided to 16,741 subjects. After one year, 2,361 subjects participated in the second checkup, and the systolic blood pressure of these subjects decreased significantly, from an average of 121 mmHg to an average of 116 mmHg (P<.001). Based on these results, we propose a cost-effective method that uses a machine learning technique (random forest) with the medical interview, subject profiles, and checkup results as predictors, to avoid costly blood sugar measurements and ensure the sustainability of the program in developing countries.

Conclusions: The results of this study demonstrate the benefits of an eHealth checkup and teleconsultation program as an effective health care system in developing countries.
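The four-grade stratification described in the Methods can be sketched as a rule-based function. The cutoffs below are hypothetical placeholders, not the study's criteria (the abstract only says they follow international diagnostic standards). Taking the worst grade across indicators is likewise an assumption.

```python
GRADES = ["green", "yellow", "orange", "red"]  # healthy, caution, affected, emergent

# HYPOTHETICAL cutoffs, for illustration only. Each tuple holds the
# exclusive upper bounds of green, yellow, and orange; anything above is red.
CUTOFFS = {
    "systolic_bp_mmhg": (120, 140, 180),
    "fasting_blood_sugar_mg_dl": (100, 126, 200),
}


def grade_indicator(name: str, value: float) -> int:
    """Return a grade index 0..3 for one measured indicator."""
    for level, bound in enumerate(CUTOFFS[name]):
        if value < bound:
            return level
    return len(CUTOFFS[name])


def overall_grade(measurements: dict) -> str:
    """Assumed rule: a subject's grade is the worst grade of any indicator."""
    worst = max(grade_indicator(name, v) for name, v in measurements.items())
    return GRADES[worst]


def needs_teleconsultation(measurements: dict) -> bool:
    # The study offered teleconsultation to orange- and red-grade subjects.
    return overall_grade(measurements) in ("orange", "red")


if __name__ == "__main__":
    subject = {"systolic_bp_mmhg": 150, "fasting_blood_sugar_mg_dl": 95}
    print(overall_grade(subject), needs_teleconsultation(subject))  # orange True
```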


Database Systems for Advanced Applications | 2014

Skill Ontology-Based Model for Quality Assurance in Crowdsourcing

Kinda El Maarry; Wolf-Tilo Balke; Hyunsouk Cho; Seung-won Hwang; Yukino Baba

Crowdsourcing continues to gain momentum as its potential becomes more widely recognized. Nevertheless, quality remains a valid concern, because unreliable work introduces uncertainty into the results obtained from the crowd. We identify the different aspects that dynamically affect the overall quality of a crowdsourcing task. Accordingly, we propose a skill ontology-based model that caters to these aspects, as a management technique to be adopted by crowdsourcing platforms. The model maintains a dynamically evolving ontology of skills, with libraries of standardized and personalized assessments for awarding skills to workers. Aligning a worker's set of skills with those required by a task boosts the quality of the final result. We visualize the model's components and workflow, and consider how to guard it against malicious or unqualified workers, whose responses introduce this uncertainty and degrade the overall quality.
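A core operation in such a model is matching a worker's skills against a task's requirements using the ontology. The sketch below assumes a tree-shaped ontology in which mastering a specific skill also counts as mastering its more general ancestors. It scores a worker by the fraction of required skills covered. The `ONTOLOGY` contents and the scoring rule are hypothetical, not the paper's model, which also maintains assessments and evolves the ontology over time.

```python
# HYPOTHETICAL ontology: skill -> more general parent skill (None at a root).
ONTOLOGY = {
    "programming": None,
    "python": "programming",
    "web": "programming",
    "django": "web",
    "writing": None,
    "translation": "writing",
}


def ancestors(skill):
    """The skill itself plus every more general skill above it."""
    out = []
    while skill is not None:
        out.append(skill)
        skill = ONTOLOGY[skill]
    return out


def covered(worker_skills):
    """Assumption: mastering a specific skill implies its general ancestors."""
    result = set()
    for s in worker_skills:
        result.update(ancestors(s))
    return result


def match_score(worker_skills, task_skills):
    """Fraction of the task's required skills that the worker covers."""
    if not task_skills:
        return 1.0
    have = covered(worker_skills)
    return sum(1 for s in task_skills if s in have) / len(task_skills)


if __name__ == "__main__":
    task = {"web", "python"}
    workers = {"w1": {"django"}, "w2": {"python", "django"}, "w3": {"translation"}}
    ranked = sorted(workers, key=lambda w: match_score(workers[w], task), reverse=True)
    print(ranked)  # ['w2', 'w1', 'w3']
```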


Knowledge Discovery and Data Mining | 2015

Predictive Approaches for Low-Cost Preventive Medicine Program in Developing Countries

Yukino Baba; Hisashi Kashima; Yasunobu Nohara; Eiko Kai; Partha Pratim Ghosh; Rafiqul Islam; Ashir Ahmed; Masahiro Kuroda; Sozo Inoue; Tatsuo Hiramatsu; Michio Kimura; Shuji Shimizu; Kunihisa Kobayashi; Koji Tsuda; Masashi Sugiyama; Mathieu Blondel; Naonori Ueda; Masaru Kitsuregawa; Naoki Nakashima

Non-communicable diseases (NCDs) are no longer just a problem for high-income countries; they also affect developing countries. Preventive medicine is key to combating NCDs; however, the cost of preventive programs is a critical obstacle to their adoption in developing countries. In this study, we investigate predictive modeling for providing a low-cost preventive medicine program. In our two-year field study in Bangladesh, we collected the health checkup results of 15,075 subjects, data on 6,607 prescriptions, and the follow-up examination results of 2,109 subjects. We address three prediction problems, namely subject risk prediction, drug recommendation, and future risk prediction, using machine learning techniques: our multiple-classifier approach successfully reduced the costs of health checkups, a multi-task learning method provided accurate recommendations for specific types of drugs, and an active learning method achieved an efficient assignment of healthcare workers for the follow-up care of subjects.
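One common way for a multiple-classifier approach to cut checkup costs is a cascade: a cheap classifier trained on interview and profile data decides first, and the expensive measurement is taken only when that classifier is unsure. The abstract does not describe the paper's exact design, so the sketch below is a hypothetical illustration, including the thresholds `low` and `high` and the risk probabilities.

```python
def triage(risk_prob: float, low: float = 0.1, high: float = 0.9) -> str:
    """Route one subject based on a cheap classifier's predicted risk.

    The thresholds are HYPOTHETICAL and would need tuning on real data.
    """
    if risk_prob <= low:
        return "negative"  # confidently low risk: skip the costly test
    if risk_prob >= high:
        return "positive"  # confidently high risk: refer without the test
    return "measure"       # uncertain: take the costly measurement


def fraction_tests_avoided(risk_probs, low=0.1, high=0.9) -> float:
    decisions = [triage(p, low, high) for p in risk_probs]
    return sum(d != "measure" for d in decisions) / len(decisions)


if __name__ == "__main__":
    print(fraction_tests_avoided([0.05, 0.5, 0.95, 0.2]))  # 0.5
```

Narrowing the gap between `low` and `high` avoids more measurements, but more subjects are then routed on the cheap prediction alone. Choosing the thresholds is therefore a trade-off between cost and misclassification risk.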


IEEE International Conference on Data Science and Advanced Analytics | 2015

From one star to three stars: Upgrading legacy open data using crowdsourcing

Satoshi Oyama; Yukino Baba; Ikki Ohmukai; Hiroaki Dokoshi; Hisashi Kashima

Despite recent open data initiatives in many countries, a significant percentage of the data provided is in non-machine-readable formats, such as images, rather than in machine-readable electronic formats, which restricts its usability. This paper describes the first unified framework for converting legacy open data in image format into a machine-readable and reusable format by using crowdsourcing. Crowd workers are asked not only to extract data from an image of a chart but also to reproduce the chart objects in spreadsheets. The properties of the reconstructed chart objects give the data structure, including series names and values, which is useful for automatic processing by computers. Since results produced by crowdsourcing inherently contain errors, we developed a quality control mechanism that improves the accuracy of extracted tables by aggregating tables created by different workers for the same chart image and by utilizing the data structures obtained from the reproduced chart objects. Experimental results demonstrated that the proposed framework and mechanism are effective.
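Aggregating the tables that several workers extract from the same chart can be sketched as a simple baseline. It assumes each worker's table is a hypothetical `{series name: [values]}` dict, keeps only series that a majority of workers reported, and takes the median of each cell. The paper's mechanism also exploits the reproduced chart objects in ways this sketch does not capture.

```python
from collections import Counter
from statistics import median


def aggregate_tables(tables):
    """tables: one {series name: list of numeric values} dict per worker
    (a HYPOTHETICAL format). Returns one aggregated table."""
    n = len(tables)
    series_votes = Counter(name for t in tables for name in t)
    result = {}
    for name, votes in series_votes.items():
        if votes * 2 <= n:
            continue  # drop series that only a minority of workers reported
        rows = [t[name] for t in tables if name in t]
        # Use the most common row length so cells line up column by column.
        length = Counter(len(r) for r in rows).most_common(1)[0][0]
        rows = [r for r in rows if len(r) == length]
        # The median is robust to a single worker's typo in a cell.
        result[name] = [median(col) for col in zip(*rows)]
    return result


if __name__ == "__main__":
    tables = [
        {"Sales": [1, 2, 3]},
        {"Sales": [1, 2, 30]},             # one mistyped cell
        {"Sales": [1, 2, 3], "Typo": [9]},  # one spurious series
    ]
    print(aggregate_tables(tables))  # {'Sales': [1, 2, 3]}
```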


Expert Systems With Applications | 2016

Participation recommendation system for crowdsourcing contests

Yukino Baba; Kei Kinoshita; Hisashi Kashima

Statistical models of winner determination for crowdsourcing contests are proposed. The use of auxiliary information improves the accuracy of contest recommendation. Transfer learning is beneficial for addressing the sparsity of contest data.

We propose a novel participation recommendation approach for crowdsourcing contests that includes probabilistic modeling of contest participation and winner determination. Our method estimates each worker's probability of participating in and winning each contest, and offers ranked lists of recommended contests. Since most contests have only one winner, standard recommendation techniques fail to estimate winning probabilities accurately from the extremely sparse winning information of completed contests. Our solution is to use contest participation information and features of workers and contests as auxiliary information. We build on the concepts of a transfer learning method for matrices and a feature-based matrix factorization method. Experiments conducted on real crowdsourcing contest datasets show that the use of auxiliary information is crucial for improving the performance of contest recommendation, and also reveal several important common skills.
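The transfer idea can be sketched as logistic matrix factorization of two matrices that share worker vectors. The denser participation matrix then helps shape the representation used to predict the sparse winning matrix. This is an assumption-laden illustration, not the paper's model: it omits worker and contest features, and the loss, SGD updates, and hyperparameters are all choices made for the sketch.

```python
import math
import random


def sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


def train(participation, wins, n_workers, n_contests,
          dim=4, lr=0.1, reg=0.01, epochs=200, seed=0):
    """participation: {(worker, contest): 1 or 0} for observed entries.
    wins: {(worker, contest): 1 or 0}, only for contests the worker entered,
    so it models P(win | entered). U (worker vectors) is shared by both."""
    rng = random.Random(seed)

    def init(n):
        return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n)]

    U, Vp, Vw = init(n_workers), init(n_contests), init(n_contests)
    for _ in range(epochs):
        for data, V in ((participation, Vp), (wins, Vw)):
            for (i, j), y in data.items():
                g = y - sigmoid(dot(U[i], V[j]))  # d log-likelihood / d logit
                for k in range(dim):
                    u, v = U[i][k], V[j][k]
                    U[i][k] += lr * (g * v - reg * u)
                    V[j][k] += lr * (g * u - reg * v)
    return U, Vp, Vw


def recommend(worker, U, Vp, Vw, contests):
    """Rank contests by P(enter) * P(win | enter)."""
    def score(j):
        return sigmoid(dot(U[worker], Vp[j])) * sigmoid(dot(U[worker], Vw[j]))
    return sorted(contests, key=score, reverse=True)


if __name__ == "__main__":
    participation = {(0, 0): 1, (0, 1): 1, (0, 2): 0, (1, 0): 1, (1, 2): 1}
    wins = {(0, 0): 1, (0, 1): 0, (1, 2): 1}
    U, Vp, Vw = train(participation, wins, n_workers=2, n_contests=3)
    print(recommend(0, U, Vp, Vw, [0, 1, 2]))
```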


Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2015

Quality Control for Crowdsourced POI Collection

Shunsuke Kajimura; Yukino Baba; Hiroshi Kajino; Hisashi Kashima

Crowdsourcing allows human intelligence tasks to be outsourced to a large number of unspecified people at low cost. However, because crowd workers vary in ability and diligence, the quality of their submitted work also varies and is sometimes quite low. Therefore, quality control is one of the central issues in crowdsourcing research. In this paper, we consider the quality control problem for POI (points of interest) collection tasks, in which workers are asked to enumerate location information of POIs. Since workers do not necessarily provide correct answers, and may provide slightly different answers even when those answers indicate the same place, we propose a two-stage quality control method consisting of an answer clustering stage and a reliability estimation stage. We implement the method with a new constrained exemplar clustering algorithm and a modified HITS algorithm, and demonstrate its effectiveness compared with baseline methods on several real crowdsourcing datasets.
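The reliability estimation stage can be illustrated with plain HITS on the bipartite graph between workers and answer clusters. Here a worker's hub score plays the role of reliability, and a cluster's authority score plays the role of its credibility as a real POI. The paper uses a modified HITS whose changes are not described in the abstract, so this sketch shows only the unmodified algorithm. The toy cluster ids stand in for the output of the clustering stage.

```python
import math


def hits(submissions, iterations=50):
    """Plain (unmodified) HITS on the worker/answer-cluster bipartite graph.

    submissions: {worker: set of answer-cluster ids}.
    Returns (hub scores per worker, authority scores per cluster).
    """
    clusters = sorted({c for cs in submissions.values() for c in cs})
    hub = {w: 1.0 for w in submissions}
    auth = {}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the workers who submitted it.
        auth = {c: 0.0 for c in clusters}
        for w, cs in submissions.items():
            for c in cs:
                auth[c] += hub[w]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {c: v / norm for c, v in auth.items()}
        # Hub: sum of the authority scores of the clusters a worker submitted.
        hub = {w: sum(auth[c] for c in cs) for w, cs in submissions.items()}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {w: v / norm for w, v in hub.items()}
    return hub, auth


if __name__ == "__main__":
    # Toy input: each cluster id stands for a group of nearby answers that
    # the clustering stage merged into one candidate POI.
    submissions = {"A": {"x", "y"}, "B": {"x"}, "C": {"x"}, "D": {"z"}}
    hub, auth = hits(submissions)
    print(sorted(auth, key=auth.get, reverse=True))  # ['x', 'y', 'z']
```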


Conference on Information and Knowledge Management | 2017

Hyper Questions: Unsupervised Targeting of a Few Experts in Crowdsourcing

Jiyi Li; Yukino Baba; Hisashi Kashima

Quality control is one of the major problems in crowdsourcing. One of the primary approaches to this issue is to assign the same task to different workers and then aggregate their answers to obtain a reliable answer. In addition to simple aggregation approaches such as majority voting, various sophisticated probabilistic models have been proposed. However, because most existing methods work by strengthening the opinions of the majority, these models often fail when the tasks require highly specialized knowledge and a large majority of the workers lack the necessary ability. In this paper, we focus on an important class of answer aggregation problems in which majority voting fails, and propose the concept of hyper questions to devise effective aggregation methods. A hyper question is a set of single questions, and our key idea is that experts are more likely than non-experts to answer all of the single questions in a hyper question correctly. Thus, experts are more likely than non-experts to reach consensus on hyper questions, which strengthens their influence. We incorporate the concept of hyper questions into existing answer aggregation methods. The results of our experiments on both synthetic and real datasets demonstrate that our simple and easy-to-use approach works effectively in cases where only a few experts are available.
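The hyper-question idea can be demonstrated with a simplified sketch. Every size-`k` subset of questions is treated as a hyper question, majority voting serves as the base aggregation method, and each single answer is decoded by voting over the winning tuples that contain it. The paper plugs hyper questions into existing aggregation methods, and how it selects and decodes them may differ. The toy data is made up to show a case where plain majority voting fails.

```python
from collections import Counter
from itertools import combinations


def majority_vote(answers, n_questions):
    """answers: {worker: [label for questions 0..n-1]}."""
    return [Counter(a[q] for a in answers.values()).most_common(1)[0][0]
            for q in range(n_questions)]


def hyper_question_vote(answers, n_questions, k):
    """Majority vote on every size-k hyper question, then decode each single
    question by voting over the winning tuples that contain it."""
    votes = [Counter() for _ in range(n_questions)]
    for hyper in combinations(range(n_questions), k):
        # A worker's answer to a hyper question is the tuple of its answers.
        tuples = Counter(tuple(a[q] for q in hyper) for a in answers.values())
        winner = tuples.most_common(1)[0][0]
        for q, label in zip(hyper, winner):
            votes[q][label] += 1
    return [votes[q].most_common(1)[0][0] for q in range(n_questions)]


if __name__ == "__main__":
    # Toy data: the true answer to every question is "A". Two experts are
    # always right; three non-experts each make different mistakes.
    answers = {
        "expert1": ["A", "A", "A", "A"],
        "expert2": ["A", "A", "A", "A"],
        "novice1": ["B", "B", "A", "A"],
        "novice2": ["B", "A", "B", "A"],
        "novice3": ["B", "A", "A", "B"],
    }
    print(majority_vote(answers, 4))           # ['B', 'A', 'A', 'A']
    print(hyper_question_vote(answers, 4, 3))  # ['A', 'A', 'A', 'A']
```

The novices agree on question 0 alone but never on a whole triple, while the two experts agree on every triple. This is the consensus effect the abstract describes.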


International Conference on Data Mining | 2015

Quality Control for Crowdsourced Hierarchical Classification

Naoki Otani; Yukino Baba; Hisashi Kashima

Repeated labeling is a widely adopted quality control method in crowdsourcing. Because the accuracy of a single label from one worker varies widely, this method collects multiple labels from different workers and selects one reliable label from them. Hierarchical classification, in which the classes are organized in a hierarchy, is a typical task in crowdsourcing. However, directly applying existing methods designed for multi-class classification has the disadvantage of requiring discrimination among a large number of classes. In this paper, we propose a label aggregation method for hierarchical classification tasks. Our method takes the hierarchical structure into account to handle a large number of classes and to estimate worker abilities more precisely. It is inspired by the steps model based on item response theory, which models examinees' responses to sequentially dependent questions. We treat hierarchical classification as a question consisting of a sequence of subquestions and build a worker response model for hierarchical classification. We conducted experiments using real crowdsourced hierarchical classification tasks and demonstrated that incorporating the hierarchical structure improves label aggregation accuracy.
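The steps model referenced above can be sketched as a sequential model in which each level of the class hierarchy is one step. A worker passes a step with a logistic probability of `ability - difficulty` and can attempt a step only after passing the previous one. The outcome `X` is then the depth down to which the worker's label agrees with the true path. The logistic step form and the per-level `step_difficulties` are assumptions, since the abstract does not give the paper's parameterization.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def steps_model_probs(ability, step_difficulties):
    """Return P(X = s) for s = 0..S, where X is the number of steps passed.

    A response reaches step s only by passing steps 1..s-1, so
    P(X = s) = (product of the first s pass probabilities) * P(fail step s+1).
    """
    probs = []
    reach = 1.0  # probability of reaching the current step
    for difficulty in step_difficulties:
        p_pass = sigmoid(ability - difficulty)
        probs.append(reach * (1.0 - p_pass))  # stop at this step
        reach *= p_pass
    probs.append(reach)  # passed every step: the full path is correct
    return probs


if __name__ == "__main__":
    # A three-level hierarchy in which deeper levels are assumed to be harder.
    print(steps_model_probs(1.0, [-1.0, 0.0, 1.0]))
```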


International Joint Conference on Artificial Intelligence | 2018

Simultaneous Clustering and Ranking from Pairwise Comparisons

Jiyi Li; Yukino Baba; Hisashi Kashima

When people make decisions about a number of ideas, designs, or other kinds of objects, one natural approach is to organize the objects into groups and to prioritize them according to some preference. The grouping task is referred to as clustering, and the prioritizing task as ranking. These tasks are often outsourced to people, who provide judgments in the form of pairwise comparisons: in the clustering problem, two objects are compared on whether they are similar, while in the ranking problem, the one of higher priority is determined. Our research question in this paper is whether pairwise comparisons for clustering also help ranking (and vice versa). Instead of solving the two tasks separately, we propose a unified formulation that bridges the two types of pairwise comparisons. Our formulation simultaneously estimates the object embeddings and the preference criterion vector. Experiments using real datasets support our hypothesis: our approach generates better neighbor and preference estimation results than approaches that focus on only a single type of pairwise comparison.
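One way to make such a unified formulation concrete is a joint likelihood in which both comparison types share the same object embeddings `X`. In the sketch below, which is an assumption rather than the paper's model, similarity judgments depend on the squared distance between embeddings (with a hypothetical `margin`). Preference judgments depend on the projection of the embedding difference onto a preference vector `w`. Both terms add up to one negative log-likelihood that could be minimized over `X` and `w`.

```python
import math


def sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sq_dist(a, b) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def p_similar(xi, xj, margin=1.0) -> float:
    # Clustering comparison: nearby embeddings are likely judged similar.
    return sigmoid(margin - sq_dist(xi, xj))


def p_prefer(xi, xj, w) -> float:
    # Ranking comparison: P(object i is preferred to object j).
    return sigmoid(sum(wk * (a - b) for wk, a, b in zip(w, xi, xj)))


def joint_nll(X, w, sim_pairs, pref_pairs, margin=1.0, eps=1e-12) -> float:
    """sim_pairs: (i, j, judged_similar); pref_pairs: (winner, loser).
    Both terms read the same embeddings X, which couples the two tasks."""
    nll = 0.0
    for i, j, same in sim_pairs:
        p = p_similar(X[i], X[j], margin)
        nll -= math.log(p + eps) if same else math.log(1.0 - p + eps)
    for winner, loser in pref_pairs:
        nll -= math.log(p_prefer(X[winner], X[loser], w) + eps)
    return nll


if __name__ == "__main__":
    sims = [(0, 1, True), (0, 2, False)]
    prefs = [(2, 0)]
    w = [1.0, 0.0]
    consistent = [[0.0, 0.0], [0.1, 0.0], [3.0, 0.0]]
    inconsistent = [[0.0, 0.0], [3.0, 0.0], [0.1, 0.0]]
    print(joint_nll(consistent, w, sims, prefs), joint_nll(inconsistent, w, sims, prefs))
```

An embedding that respects both the similarity and the preference judgments scores a lower joint loss. This is the sense in which one comparison type can inform the other.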

Collaboration

Top Co-Authors

Fuyuki Ishikawa

National Institute of Informatics