# Wasserstein regularization for sparse multi-task regression

@article{Janati2019WassersteinRF, title={Wasserstein regularization for sparse multi-task regression}, author={Hicham Janati and Marco Cuturi and Alexandre Gramfort}, journal={ArXiv}, year={2019}, volume={abs/1805.07833} }

We focus in this paper on high-dimensional regression problems where each regressor can be associated with a location in a physical space, or more generally in a generic geometric space. Such problems often employ sparse priors, which promote models using a small subset of regressors. To increase statistical power, so-called multi-task techniques were proposed, which consist in the simultaneous estimation of several related models. Combined with sparsity assumptions, this leads to models enforcing…
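To illustrate the optimal-transport machinery this line of work builds on, here is a minimal entropic (Sinkhorn) OT sketch in NumPy; the function and variable names are illustrative and not the paper's implementation. It shows why a Wasserstein-style comparison treats coefficient supports that are spatially close as more similar than distant ones, unlike entrywise penalties:

```python
import numpy as np

def sinkhorn_distance(a, b, C, reg=0.05, n_iter=200):
    # Entropy-regularized OT (Sinkhorn iterations) between two
    # nonnegative histograms a, b given a ground-cost matrix C.
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # transport plan
    return np.sum(P * C)              # transport cost

# regressor locations on a 1-D grid with a squared-distance ground cost
x = np.linspace(0, 1, 50)
C = (x[:, None] - x[None, :]) ** 2

# three sparse coefficient patterns: a reference, a spatially nearby
# support, and a distant support
a = np.zeros(50); a[10] = 1.0
b_near = np.zeros(50); b_near[12] = 1.0
b_far = np.zeros(50); b_far[40] = 1.0

# nearby supports incur a much smaller transport cost than distant ones,
# whereas an l2 or l1 comparison would score both pairs identically
assert sinkhorn_distance(a, b_near, C) < sinkhorn_distance(a, b_far, C)
```

This geometric sensitivity is what makes transport-based regularizers attractive when regressors live in a physical space.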

#### 33 Citations

Decentralised Sparse Multi-Task Regression

- Mathematics
- 2019

We consider a sparse multi-task regression framework for fitting a collection of related sparse models. Representing models as nodes in a graph with edges between related models, a framework that…

Sinkhorn Regression

- Computer Science
- IJCAI
- 2020

This paper proposes an efficient algorithm to solve the relaxed model, establishes complete statistical guarantees under mild conditions, and leverages the Kullback-Leibler divergence to relax the proposed model with marginal constraints into an unbalanced formulation that accommodates more types of features.

Sliced Multi-Marginal Optimal Transport

- Computer Science, Mathematics
- ArXiv
- 2021

The sliced multi-marginal discrepancy is massively scalable for a large number of probability measures with support as large as 10 samples, and can be applied to problems such as barycentric averaging, multi-task density estimation and multi-task reinforcement learning.

Multi-source Deep Gaussian Process Kernel Learning

- Computer Science
- ArXiv
- 2020

The approximation of the prior-posterior DGP can be considered a novel kernel composition that blends the kernels in different layers and has explicit dependence on the data, suggesting that data-informed approximate DGPs are a powerful tool for integrating data across sources.

A Principled Approach for Learning Task Similarity in Multitask Learning

- Computer Science, Mathematics
- IJCAI
- 2019

An upper bound on the generalization error of multitask learning is provided, showing the benefit of explicit and implicit task similarity knowledge, and a new training algorithm is proposed to learn the task relation coefficients and neural network parameters iteratively.

Manifold optimization for non-linear optimal transport problems

- Computer Science, Mathematics
- 2021

This work discusses optimization-related ingredients that allow modeling the OT problem on smooth Riemannian manifolds by exploiting the geometry of the search space, and makes available the Manifold optimization-based Optimal Transport (MOT) repository, with code useful for solving OT problems in Python and Matlab.

Feature-Robust Optimal Transport

- 2020

Optimal transport is a machine learning problem with applications including distribution comparison, feature selection, and generative adversarial networks. In this paper, we propose feature-robust…

Multi-subject MEG/EEG source imaging with sparse multi-task regression

- Computer Science, Medicine
- NeuroImage
- 2020

This work proposes the Minimum Wasserstein Estimates (MWE), a new joint regression method based on optimal transport metrics that promotes spatial proximity on the cortical mantle; analysis of a multimodal dataset shows how multi-subject source localization reduces the gap between MEG and fMRI for brain mapping.

Estimation of Wasserstein distances in the Spiked Transport Model

- Mathematics
- 2019

We propose a new statistical model, the spiked transport model, which formalizes the assumption that two probability distributions differ only on a low-dimensional subspace. We study the minimax rate…

Manifold optimization for optimal transport

- Computer Science
- ArXiv
- 2021

This work discusses optimization-related ingredients that allow modeling the OT problem on smooth Riemannian manifolds by exploiting the geometry of the search space, and makes available the Manifold optimization-based Optimal Transport repository, with code useful for solving OT problems in Python and Matlab.

#### References

Showing 1–10 of 36 references

A Dirty Model for Multi-task Learning

- Computer Science, Mathematics
- NIPS
- 2010

We consider multi-task learning in the setting of multiple linear regression, and where some relevant features could be shared across the tasks. Recent research has studied the use of l1/lq norm…

Multi-level Lasso for Sparse Multi-task Regression

- Computer Science
- ICML
- 2012

The approach is based on an intuitive decomposition of the regression coefficients into a product between a component that is common to all tasks and another component that captures task specificity, which yields the Multi-level Lasso objective.
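The product decomposition described above can be sketched in a few lines of NumPy; the variable names (`theta`, `gamma`) are illustrative and not the paper's notation:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_tasks = 8, 3

# theta: component shared by all tasks; a zero here removes the
# feature from every task at once
theta = np.array([1.0, 0.0, 0.5, 0.0, 2.0, 0.0, 0.0, 1.5])

# gamma: task-specific components, one row per task
gamma = rng.normal(size=(n_tasks, p))

# each task's regression coefficients are the elementwise product of
# the shared and task-specific components
beta = theta[None, :] * gamma

# features zeroed out in the shared component are zero in every task
assert np.all(beta[:, theta == 0.0] == 0.0)
```

Penalizing `theta` with an l1 norm then yields joint feature selection across tasks, while penalizing `gamma` leaves room for task-specific variation.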

Multi-Task Feature Learning

- Mathematics, Computer Science
- NIPS
- 2006

The method builds upon the well-known 1-norm regularization problem, using a new regularizer that controls the number of learned features common to all the tasks, and develops an iterative algorithm for solving it.

A Convex Feature Learning Formulation for Latent Task Structure Discovery

- Computer Science, Mathematics
- ICML
- 2012

The main contribution is a convex formulation that employs a graph-based regularizer and simultaneously discovers few groups of related tasks, having close-by task parameters, as well as the feature space shared within each group.

Multi-task feature selection

- 2006

We address the problem of joint feature selection across a group of related classification or regression tasks. We propose a novel type of joint regularization of the model parameters in order to…

Joint support recovery under high-dimensional scaling: Benefits and perils of ℓ1,∞-regularization

- Mathematics
- NIPS
- 2008

Given a collection of r ≥ 2 linear regression problems in p dimensions, suppose that the regression coefficients share partially common supports. This set-up suggests the use of l1/l∞-regularized…
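As a concrete illustration of the penalty discussed above, the ℓ1,∞ norm of a hypothetical coefficient matrix can be computed directly in NumPy (the matrix values are made up for the example):

```python
import numpy as np

# coefficient matrix B: rows = the p features, columns = the r tasks
B = np.array([[0.0, 0.0,  0.0],
              [1.5, -2.0, 0.3],
              [0.0, 0.0,  0.0],
              [0.2, 0.1, -0.4]])

# the l1,infinity norm: sum over features of the maximum absolute
# value across tasks; penalizing it drives entire rows (features)
# to zero jointly across all tasks
l1_inf = np.abs(B).max(axis=1).sum()
assert abs(l1_inf - 2.4) < 1e-9   # max of row 1 is 2.0, of row 3 is 0.4
```

Because the inner max charges only the largest coefficient in a row, once a feature is active for one task it is "free" for the others, which is the source of both the benefits and the perils the title refers to.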

Learning with a Wasserstein Loss

- Computer Science, Mathematics
- NIPS
- 2015

An efficient learning algorithm based on this regularization is developed, as well as a novel extension of the Wasserstein distance from probability measures to unnormalized measures; the loss can encourage smoothness of the predictions with respect to a chosen metric on the output space.

Wasserstein Dictionary Learning: Optimal Transport-based unsupervised non-linear dictionary learning

- Mathematics, Computer Science
- SIAM J. Imaging Sci.
- 2018

A new nonlinear dictionary learning method for histograms in the probability simplex that leverages optimal transport theory, relying on Wasserstein barycenters instead of the usual matrix product between dictionary and codes, allowing for nonlinear relationships between atoms and the reconstruction of input data.

Sparse Group Lasso: Consistency and Climate Applications

- Computer Science
- SDM
- 2012

In this paper, theoretical statistical consistency of estimators with tree-structured norm regularizers is proved, and the SGL model is shown to provide better predictive performance than the current state of the art, to remain climatologically interpretable, and to be robust in its variable selection.

Learning Tree Structure in Multi-Task Learning

- Mathematics, Computer Science
- KDD
- 2015

A TAsk Tree (TAT) model for MTL is developed, with sequential constraints that make the distance between the parameters in the component matrices corresponding to each pair of tasks decrease over layers; hence, once component parameters become fused in a layer, they remain fused up to the topmost layer.