Mathieu Blondel | Researchain

Publication

Featured research published by Mathieu Blondel.

European Conference on Machine Learning | 2015

Convex Factorization Machines

Mathieu Blondel; Akinori Fujino; Naonori Ueda

Factorization machines are a generic framework that can mimic many factorization models simply through feature engineering. In this way, they combine the high predictive accuracy of factorization models with the flexibility of feature engineering. Unfortunately, factorization machines involve a non-convex optimization problem and are thus subject to bad local minima. In this paper, we propose a convex formulation of factorization machines based on the nuclear norm. Our formulation imposes fewer restrictions on the learned model and is thus more general than the original formulation. To solve the corresponding optimization problem, we present an efficient globally-convergent two-block coordinate descent algorithm. Empirically, we demonstrate that our approach achieves comparable or better predictive accuracy than the original factorization machines on 4 recommendation tasks and scales to datasets with 10 million samples.
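As a rough illustration of the model this abstract starts from, the sketch below computes the prediction of a standard second-order factorization machine for one sample. It follows the textbook model (bias, linear term, and pairwise interactions weighted by inner products of latent vectors), not the convex nuclear-norm formulation proposed in the paper. The function names and the list-based layout of `V` are assumptions made for this example.

```python
def fm_predict_naive(x, w0, w, V):
    """Direct O(d^2 k) evaluation: sum over all feature pairs i < j."""
    d = len(x)
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            # Interaction weight is the inner product of the latent vectors.
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


def fm_predict(x, w0, w, V):
    """Equivalent O(d k) evaluation using the sum-of-squares identity."""
    d, k = len(x), len(V[0])
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(d))            # sum_i v_if x_i
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(d))  # sum_i (v_if x_i)^2
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise
```

The product of latent vectors in the pairwise term is what makes the usual training objective non-convex in `V`.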

PLOS ONE | 2015

A Ranking Approach to Genomic Selection

Mathieu Blondel; Akio Onogi; Hiroyoshi Iwata; Naonori Ueda

Background: Genomic selection (GS) is a recent selective breeding method which uses predictive models based on whole-genome molecular markers. Until now, existing studies formulated GS as the problem of modeling an individual’s breeding value for a particular trait of interest, i.e., as a regression problem. To assess predictive accuracy of the model, the Pearson correlation between observed and predicted trait values was used.

Contributions: In this paper, we propose to formulate GS as the problem of ranking individuals according to their breeding value. Our proposed framework allows us to employ machine learning methods for ranking which had previously not been considered in the GS literature. To assess ranking accuracy of a model, we introduce a new measure originating from the information retrieval literature called normalized discounted cumulative gain (NDCG). NDCG more strongly rewards models that assign a high rank to individuals with high breeding value. Therefore, NDCG reflects a prerequisite objective in selective breeding: accurate selection of individuals with high breeding value.

Results: We conducted a comparison of 10 existing regression methods and 3 new ranking methods on 6 datasets, consisting of 4 plant species and 25 traits. Our experimental results suggest that tree-based ensemble methods including McRank, Random Forests and Gradient Boosting Regression Trees achieve excellent ranking accuracy. RKHS regression and RankSVM also achieve good accuracy when used with an RBF kernel. Traditional regression methods such as Bayesian lasso, wBSR and BayesC were found less suitable for ranking. Pearson correlation was found to correlate poorly with NDCG. Our study suggests two important messages. First, ranking methods are a promising research direction in GS. Second, NDCG can be a useful evaluation measure for GS.
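To make the evaluation measure concrete, here is a minimal sketch of NDCG at a cutoff `k`. Several variants of NDCG exist; this one assumes a linear gain (the observed value itself), a `1 / log2(rank + 1)` discount, and non-negative observed values. The paper's exact definition may differ, and the function names are chosen for this example only.

```python
import math


def dcg_at_k(values, k):
    """Discounted cumulative gain of the first k values, in the given order."""
    # enumerate() is 0-based, so rank = i + 1 and the discount is log2(rank + 1).
    return sum(v / math.log2(i + 2) for i, v in enumerate(values[:k]))


def ndcg_at_k(y_true, y_score, k):
    """NDCG@k: DCG of the predicted ordering divided by the ideal DCG."""
    # Order individuals by predicted score, best first.
    order = sorted(range(len(y_true)), key=lambda i: y_score[i], reverse=True)
    ranked = [y_true[i] for i in order]
    # The ideal ordering sorts by the observed values themselves.
    ideal = sorted(y_true, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Because the discount shrinks with rank, errors near the top of the ranking cost more than errors near the bottom, which matches the goal of selecting the best individuals.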

Journal of Medical Internet Research | 2015

Health Checkup and Telemedical Intervention Program for Preventive Medicine in Developing Countries: Verification Study

Yasunobu Nohara; Eiko Kai; Partha Pratim Ghosh; Rafiqul Islam; Ashir Ahmed; Masahiro Kuroda; Sozo Inoue; Tatsuo Hiramatsu; Michio Kimura; Shuji Shimizu; Kunihisa Kobayashi; Yukino Baba; Hisashi Kashima; Koji Tsuda; Masashi Sugiyama; Mathieu Blondel; Naonori Ueda; Masaru Kitsuregawa; Naoki Nakashima

Background: The prevalence of non-communicable diseases is increasing throughout the world, including developing countries.

Objective: The intent was to conduct a study of a preventive medical service in a developing country, combining eHealth checkups and teleconsultation, and to assess stratification rules and the short-term effects of intervention.

Methods: We developed an eHealth system that comprises a set of sensor devices in an attaché case, a data transmission system linked to a mobile network, and a data management application. We provided eHealth checkups for the populations of five villages and the employees of five factories/offices in Bangladesh. Individual health condition was automatically categorized into four grades based on international diagnostic standards: green (healthy), yellow (caution), orange (affected), and red (emergent). We provided teleconsultation for orange- and red-grade subjects and we provided teleprescription for these subjects as required.

Results: The first checkup was provided to 16,741 subjects. After one year, 2361 subjects participated in the second checkup and the systolic blood pressure of these subjects was significantly decreased from an average of 121 mmHg to an average of 116 mmHg (P<.001). Based on these results, we propose a cost-effective method that uses a machine learning technique (the random forest method) with the medical interview, subject profiles, and checkup results as predictors to avoid costly measurements of blood sugar and to ensure sustainability of the program in developing countries.

Conclusions: The results of this study demonstrate the benefits of an eHealth checkup and teleconsultation program as an effective health care system in developing countries.

Knowledge Discovery and Data Mining | 2015

Predictive Approaches for Low-Cost Preventive Medicine Program in Developing Countries

Yukino Baba; Hisashi Kashima; Yasunobu Nohara; Eiko Kai; Partha Pratim Ghosh; Rafiqul Islam; Ashir Ahmed; Masahiro Kuroda; Sozo Inoue; Tatsuo Hiramatsu; Michio Kimura; Shuji Shimizu; Kunihisa Kobayashi; Koji Tsuda; Masashi Sugiyama; Mathieu Blondel; Naonori Ueda; Masaru Kitsuregawa; Naoki Nakashima

Non-communicable diseases (NCDs) are no longer a problem only for high-income countries; they also affect developing countries. Preventive medicine is the key to combating NCDs; however, the cost of preventive programs is a critical issue affecting the popularization of these programs in developing countries. In this study, we investigate predictive modeling for providing a low-cost preventive medicine program. In our two-year-long field study in Bangladesh, we collected the health checkup results of 15,075 subjects, the data of 6,607 prescriptions, and the follow-up examination results of 2,109 subjects. We address three prediction problems, namely subject risk prediction, drug recommendation, and future risk prediction, by using machine learning techniques; our multiple-classifier approach successfully reduced the costs of health checkups, a multi-task learning method provided accurate recommendation for specific types of drugs, and an active learning method achieved an efficient assignment of healthcare workers for the follow-up care of subjects.

International Conference on Pattern Recognition | 2014

Large-Scale Multiclass Support Vector Machine Training via Euclidean Projection onto the Simplex

Mathieu Blondel; Akinori Fujino; Naonori Ueda

Dual decomposition methods are the current state of the art for training multiclass formulations of Support Vector Machines (SVMs). At every iteration, dual decomposition methods update a small subset of dual variables by solving a restricted optimization problem. In this paper, we propose an exact and efficient method for solving the restricted problem. In our method, the restricted problem is reduced to the well-known problem of Euclidean projection onto the positive simplex, which we can solve exactly in expected O(k) time, where k is the number of classes. We demonstrate that our method empirically achieves state-of-the-art convergence on several large-scale high-dimensional datasets.
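The projection subproblem mentioned above can be sketched as follows. This is the well-known sort-based algorithm, which runs in O(k log k) time; it is a simpler alternative to the expected O(k) variant the abstract refers to, not a reproduction of it. The function name and the radius parameter `z` are assumptions made for this example.

```python
def project_simplex(v, z=1.0):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = z}."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - z) / j
        # The condition holds for a prefix of the sorted values, so the
        # last j that satisfies it gives the final threshold.
        if uj - t > 0:
            theta = t
    # Shift every coordinate by the threshold and clip at zero.
    return [max(vi - theta, 0.0) for vi in v]
```

For instance, `project_simplex([2.0, 0.0])` returns `[1.0, 0.0]`: the point is pulled back to the nearest vertex of the simplex.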

EURASIP Journal on Advances in Signal Processing | 2018

Blind source separation with optimal transport non-negative matrix factorization

Antoine Rolet; Vivien Seguy; Mathieu Blondel; Hiroshi Sawada

Optimal transport as a loss for machine learning optimization problems has recently gained a lot of attention. Building upon recent advances in computational optimal transport, we develop an optimal transport non-negative matrix factorization (NMF) algorithm for supervised speech blind source separation (BSS). Optimal transport allows us to design and leverage a cost between short-time Fourier transform (STFT) spectrogram frequencies, which takes into account how humans perceive sound. We give empirical evidence that using our proposed optimal transport NMF leads to perceptually better results than NMF with other losses, for both isolated voice reconstruction and speech denoising using BSS. Finally, we demonstrate how to use optimal transport for cross-domain sound processing tasks, where frequencies represented in the input spectrograms may be different from one spectrogram to another.

International Joint Conference on Artificial Intelligence | 2017

SVD-Based Screening for the Graphical Lasso

Yasuhiro Fujiwara; Naoki Marumo; Mathieu Blondel; Koh Takeuchi; Hideaki Kim; Tomoharu Iwata; Naonori Ueda

The graphical lasso is the most popular approach to estimating the inverse covariance matrix of high-dimensional data. It iteratively estimates each row and column of the matrix in a round-robin style until convergence. However, the graphical lasso is infeasible for large datasets because of its high computational cost. This paper proposes Sting, a fast approach to the graphical lasso. To reduce the computational cost, it efficiently identifies blocks in the estimated matrix that have nonzero elements before entering the iterations by exploiting the singular value decomposition of the data matrix. In addition, it selectively updates elements of the estimated matrix expected to have nonzero values. Theoretically, it is guaranteed to converge to the same result as the original graphical lasso algorithm. Experiments show that our approach is faster than existing approaches.

International Conference on Management of Data | 2017

Scaling Locally Linear Embedding

Yasuhiro Fujiwara; Naoki Marumo; Mathieu Blondel; Koh Takeuchi; Hideaki Kim; Tomoharu Iwata; Naonori Ueda

Locally Linear Embedding (LLE) is a popular approach to dimensionality reduction as it can effectively represent nonlinear structures of high-dimensional data. For dimensionality reduction, it computes a nearest neighbor graph from a given dataset where edge weights are obtained by applying the Lagrange multiplier method, and it then computes eigenvectors of the LLE kernel where the edge weights are used to obtain the kernel. Although LLE is used in many applications, its computational cost is high. This is because, in obtaining edge weights, its computational cost is cubic in the number of edges to each data point. In addition, the computational cost of obtaining the eigenvectors of the LLE kernel is cubic in the number of data points. Our approach, Ripple, is based on two ideas: (1) it incrementally updates the edge weights by exploiting the Woodbury formula and (2) it efficiently computes eigenvectors of the LLE kernel by exploiting the LU decomposition-based inverse power method. Experiments show that Ripple is significantly faster than the original LLE approach while guaranteeing the same dimensionality reduction results.
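The abstract does not say how Ripple applies the Woodbury formula, so the sketch below only illustrates the underlying idea in its simplest form: the rank-one special case (the Sherman-Morrison formula), which updates a known inverse in O(n^2) time instead of recomputing it in O(n^3). The function name and the list-of-lists matrix layout are assumptions made for this example.

```python
def sherman_morrison(A_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1}, using a rank-one update."""
    n = len(u)
    # A^{-1} u (column vector) and v^T A^{-1} (row vector).
    Au = [sum(A_inv[i][j] * u[j] for j in range(n)) for i in range(n)]
    vA = [sum(v[i] * A_inv[i][j] for i in range(n)) for j in range(n)]
    denom = 1.0 + sum(v[i] * Au[i] for i in range(n))
    if abs(denom) < 1e-12:
        raise ValueError("update makes the matrix (numerically) singular")
    # Subtract the scaled outer product from the old inverse.
    return [[A_inv[i][j] - Au[i] * vA[j] / denom for j in range(n)]
            for i in range(n)]
```

With `A_inv` equal to the 2x2 identity and `u = v = [1.0, 0.0]`, the updated matrix is `diag(2, 1)` and the function returns its inverse, `[[0.5, 0.0], [0.0, 1.0]]`.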

Neural Information Processing Systems | 2016