# c-lasso: a Python package for constrained sparse regression and classification
c-lasso is a Python package that enables sparse and robust linear regression and classification with linear equality
constraints on the model parameters. The forward model is assumed to be:
<img src="https://latex.codecogs.com/gif.latex?y=X\beta+\sigma\epsilon\qquad\text{s.t.}\qquad&space;C\beta=0" title="y=X\beta+\sigma\epsilon\qquad\text{s.t.}\qquad C\beta=0" />
Here, y and X are given outcome and predictor data. The vector y can be continuous (for regression) or binary (for classification). C is a general constraint matrix. The vector β comprises the unknown coefficients, and σ is an
unknown scale.
The package handles several different estimators for inferring β (and σ), including
the constrained Lasso, the constrained scaled Lasso, and sparse Huber M-estimation with linear equality constraints.
Several different algorithmic strategies, including path and proximal splitting algorithms, are implemented to solve
the underlying convex optimization problems.
We also include two model selection strategies for determining the sparsity of the model parameters: k-fold cross-validation and stability selection.
This package is intended to fill the gap between popular Python tools such as [scikit-learn](https://scikit-learn.org/stable/), which cannot solve sparse constrained problems, and general-purpose optimization solvers, which do not scale well to the problems considered here.
Below we show several use cases of the package, including an application of sparse *log-contrast*
regression tasks for *compositional* microbiome data.
The code builds on results from several papers which can be found in the [References](#references).
## Table of Contents
* [Installation](#installation)
* [Regression and classification problems](#regression-and-classification-problems)
* [Getting started](#getting-started)
* [Log-contrast regression for microbiome data](#log-contrast-regression-for-microbiome-data)
* [Optimization schemes](#optimization-schemes)
* [Structure of the code](#structure-of-the-code)
* [References](#references)
## Installation
c-lasso is available on pip. You can install the package
in the shell using
```shell
pip install c_lasso
```
To use the c-lasso package in Python, type
```python
from classo import *
```
The c-lasso package depends on several standard Python packages, which are installed automatically alongside it:
* `numpy`
* `matplotlib`
* `scipy`
* `pandas`
* `h5py`
## Regression and classification problems
The c-lasso package can solve six different types of estimation problems:
four regression-type and two classification-type formulations.
#### [R1] Standard constrained Lasso regression:
<img src="https://latex.codecogs.com/gif.latex?\arg\min_{\beta\in&space;R^d}&space;||&space;X\beta-y&space;||^2&space;+&space;\lambda&space;||\beta||_1&space;\qquad\mbox{s.t.}\qquad&space;C\beta=0" />
This is the standard Lasso problem with linear equality constraints on the β vector.
The objective function combines Least-Squares for model fitting with l1 penalty for sparsity.
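To make the formulation concrete, here is a minimal numpy sketch that evaluates the [R1] objective for a feasible β. The function name `r1_objective` and the variable names are ours for illustration, not part of the c-lasso API:

```python
import numpy as np

def r1_objective(X, y, beta, lam, C):
    """Evaluate ||X beta - y||^2 + lam * ||beta||_1 for a beta
    that satisfies the linear equality constraint C beta = 0."""
    assert np.allclose(C @ beta, 0), "beta must satisfy C beta = 0"
    return np.sum((X @ beta - y) ** 2) + lam * np.sum(np.abs(beta))

# Tiny example: a zero-sum beta is feasible for the all-ones constraint row.
X = np.eye(2)
y = np.array([1.0, -1.0])
C = np.ones((1, 2))
beta = np.array([0.5, -0.5])
print(r1_objective(X, y, beta, lam=1.0, C=C))  # 0.5 + 1.0 = 1.5
```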
#### [R2] Constrained sparse Huber regression:
<img src="https://latex.codecogs.com/gif.latex?\arg\min_{\beta\in&space;R^d}&space;h_{\rho}(X\beta-y&space;)&space;+&space;\lambda&space;||\beta||_1&space;\qquad\mbox{s.t.}\qquad&space;C\beta=0" />
This regression problem uses the [Huber loss](https://en.wikipedia.org/wiki/Huber_loss) as objective function
for robust model fitting, together with an l1 penalty and linear equality constraints on the β vector. The Huber parameter is set to ρ=1.345.
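For reference, the Huber function h_ρ can be sketched in numpy as follows. This is an illustration using the squared-loss scaling of [R2] (quadratic for small residuals, linear beyond ρ, with matching value and slope at the transition); the exact scaling used inside c-lasso may differ by constant factors.

```python
import numpy as np

def huber(r, rho=1.345):
    """Huber function: quadratic for |r| <= rho, linear for |r| > rho,
    scaled to match the squared loss (no 1/2 factor)."""
    a = np.abs(r)
    return np.where(a <= rho, a ** 2, 2 * rho * a - rho ** 2)

print(huber(np.array([0.5, 1.0, 3.0]), rho=1.0))  # [0.25 1.   5.  ]
```

Large residuals are penalized linearly rather than quadratically, which is what makes the estimator robust to outliers in y.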
#### [R3] Constrained scaled Lasso regression:
<img src="https://latex.codecogs.com/gif.latex?\arg&space;\min_{\beta&space;\in&space;\mathbb{R}^d,&space;\sigma&space;>&space;0}&space;\frac{||&space;X\beta&space;-&space;y||^2}{\sigma}&space;+&space;\frac{n}{2}&space;\sigma+&space;\lambda&space;||\beta||_1&space;\qquad&space;\mbox{s.t.}&space;\qquad&space;C\beta&space;=&space;0" title="\arg \min_{\beta \in \mathbb{R}^d, \sigma > 0} \frac{|| X\beta - y||^2}{\sigma} + \frac{n}{2} \sigma+ \lambda ||\beta||_1 \qquad \mbox{s.t.} \qquad C\beta = 0" />
This formulation is similar to [R1] but allows for joint estimation of the (constrained) β vector and
the standard deviation σ in a concomitant fashion (see [References](#references) [4,5] for further info).
This is the default problem formulation in c-lasso.
#### [R4] Constrained sparse Huber regression with concomitant scale estimation:
<img src="https://latex.codecogs.com/gif.latex?\arg&space;\min_{\beta&space;\in&space;\mathbb{R}^d,&space;\sigma&space;>&space;0}&space;\left(&space;h_{\rho}&space;\left(&space;\frac{&space;X\beta&space;-&space;y}{\sigma}&space;\right)+&space;n&space;\right)&space;\sigma+&space;\lambda&space;||\beta||_1&space;\qquad&space;\mbox{s.t.}&space;\qquad&space;C\beta&space;=&space;0" title="\arg \min_{\beta \in \mathbb{R}^d, \sigma > 0} \left( h_{\rho} \left( \frac{ X\beta - y}{\sigma} \right)+ n \right) \sigma+ \lambda ||\beta||_1 \qquad \mbox{s.t.} \qquad C\beta = 0" />
This formulation combines [R2] and [R3] to allow robust joint estimation of the (constrained) β vector and
the scale σ in a concomitant fashion (see [References](#references) [4,5] for further info).
#### [C1] Constrained sparse classification with Square Hinge loss:
<img src="https://latex.codecogs.com/gif.latex?\arg\min_{\beta\in&space;R^d}&space;l(y^TX\beta)&space;+&space;\lambda&space;||\beta||_1&space;\qquad\mbox{s.t.}\qquad&space;C\beta=0" />
where l is defined as:
<img src="https://latex.codecogs.com/gif.latex?l(r)=\max(1-r,0)^2" />
This formulation is similar to [R1] but adapted for classification tasks using the Square Hinge loss
with (constrained) sparse β vector estimation.
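The Square Hinge loss is one line of numpy; here r stands for the classification margin of a sample (y_i times the model prediction), so margins of at least 1 incur no loss:

```python
import numpy as np

def squared_hinge(r):
    """Square Hinge loss: zero for margins r >= 1, quadratic otherwise."""
    return np.maximum(1.0 - r, 0.0) ** 2

print(squared_hinge(np.array([2.0, 1.0, 0.0, -1.0])))  # [0. 0. 1. 4.]
```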
#### [C2] Constrained sparse classification with Huberized Square Hinge loss:
<img src="https://latex.codecogs.com/gif.latex?\arg\min_{\beta\in&space;R^d}&space;l_{\rho}(y^TX\beta)&space;+&space;\lambda&space;||\beta||_1&space;\qquad\mbox{s.t.}\qquad&space;C\beta=0" />
where l is defined as:
<img src="https://latex.codecogs.com/gif.latex?l_{\rho}(r)&space;=&space;\begin{cases}&space;(1-r)^2&space;&\mbox{if&space;}&space;\rho&space;\leq&space;r&space;\leq&space;1&space;\\&space;(1-\rho)(1+\rho-2r)&space;&\mbox{if&space;}&space;r&space;\leq&space;\rho&space;\\&space;0&space;&\mbox{if&space;}&space;r&space;\geq&space;1&space;\end{cases}" title="l_{\rho}(r) = \begin{cases} (1-r)^2 &\mbox{if } \rho \leq r \leq 1 \\ (1-\rho)(1+\rho-2r) &\mbox{if } r \leq \rho \\ 0 &\mbox{if } r \geq 1 \end{cases}" />
This formulation is similar to [C1] but uses the Huberized Square Hinge loss for robust classification
with (constrained) sparse β vector estimation.
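Following the case definition above, the Huberized Square Hinge loss can be sketched as below. The value ρ=0.5 is a hypothetical choice for illustration; the linear branch matches the quadratic branch in value and slope at r=ρ:

```python
import numpy as np

def huberized_squared_hinge(r, rho=0.5):
    """Piecewise loss: zero for r >= 1, quadratic (1-r)^2 on [rho, 1],
    and linear (with matching value and slope) for r < rho."""
    return np.where(r >= 1.0, 0.0,
                    np.where(r >= rho, (1.0 - r) ** 2,
                             (1.0 - rho) * (1.0 + rho - 2.0 * r)))

print(huberized_squared_hinge(np.array([1.5, 0.75, 0.0])))  # [0.  0.0625  0.75]
```

Replacing the quadratic tail with a linear one bounds the influence of badly misclassified samples, mirroring how the Huber loss robustifies [R2].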
## Getting started
#### Basic example
We begin with a basic example that shows how to run c-lasso on synthetic data. The c-lasso package includes
the routine ```random_data``` that allows you to generate problem instances using normally distributed data.
```python
n, d, d_nonzero, k, sigma = 100, 100, 5, 1, 0.5
(X, C, y), sol = random_data(n, d, d_nonzero, k, sigma, zerosum=True)
```
This code snippet generates a problem instance with sparse β in dimension
d=100 (sparsity d_nonzero=5). The design matrix X comprises n=100 samples generated from an i.i.d. standard normal
distribution. The constraint matrix C has dimension k x d (here k=1). The noise level is σ=0.5.
The input ```zerosum=True``` makes C the all-ones row vector, so that the constraint Cβ=0 forces the coefficients to sum to zero. The n-dimensional outcome vector y
and the regression vector β are then generated to satisfy the given constraints.
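To make the roles of the generated objects concrete, here is a standalone numpy sketch that mimics what ```random_data``` produces under ```zerosum=True```. This illustrates the data model only; it is not the package's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, d_nonzero, sigma = 100, 100, 5, 0.5

X = rng.standard_normal((n, d))   # i.i.d. standard normal design matrix
C = np.ones((1, d))               # k=1 zero-sum constraint row: sum(beta) = 0

beta = np.zeros(d)                # sparse ground truth with d_nonzero entries
support = rng.choice(d, size=d_nonzero, replace=False)
beta[support] = rng.standard_normal(d_nonzero)
beta[support] -= beta[support].mean()   # center the support so C @ beta = 0

y = X @ beta + sigma * rng.standard_normal(n)   # noisy linear outcome
print(np.allclose(C @ beta, 0))  # True: the constraint holds by construction
```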
Next we can define a default c-lasso problem instance with the generated data:
```python
problem = classo_problem(X,y,C)
```
You can look at the generated problem instance by typing:
```python
print(problem)
```