# c-lasso: a Python package for constrained sparse regression and classification
c-lasso is a Python package that enables sparse and robust linear regression and classification with linear equality
constraints on the model parameters. The forward model is assumed to be:
<img src="https://latex.codecogs.com/gif.latex?y=X\beta+\sigma\epsilon\qquad\text{s.t.}\qquad&space;C\beta=0" title="y=X\beta+\sigma\epsilon\qquad\text{s.t.}\qquad C\beta=0" />
Here, y and X are given outcome and predictor data. The vector y can be continuous (for regression) or binary (for classification). C is a general constraint matrix. The vector β comprises the unknown coefficients and σ an
unknown scale.
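To make the forward model concrete, here is a minimal NumPy sketch that simulates data from it. It is not the package's own data generator: the function name `simulate_forward_model`, its parameters, and the restriction to a zero-sum constraint (C is a single row of ones) are assumptions made for illustration.

```python
import numpy as np

def simulate_forward_model(n=50, d=20, d_nonzero=4, sigma=0.5, seed=0):
    """Draw (X, C, y, beta) from y = X beta + sigma * eps with C beta = 0.

    Illustrative only: C is fixed to the zero-sum constraint (a 1 x d row of ones).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    C = np.ones((1, d))

    # Sparse beta whose nonzero entries sum to zero, so that C @ beta = 0.
    beta = np.zeros(d)
    support = rng.choice(d, size=d_nonzero, replace=False)
    values = rng.standard_normal(d_nonzero)
    beta[support] = values - values.mean()

    eps = rng.standard_normal(n)
    y = X @ beta + sigma * eps
    return X, C, y, beta

X, C, y, beta = simulate_forward_model()
print(np.allclose(C @ beta, 0.0))  # the constraint holds by construction
```

Centering the nonzero entries on their own support keeps β sparse while satisfying the constraint; projecting a sparse vector onto the full null space of C would generally make every entry nonzero.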
The package handles several different estimators for inferring β (and σ), including
the constrained Lasso, the constrained scaled Lasso, and sparse Huber M-estimation with linear equality constraints.
Several different algorithmic strategies, including path and proximal splitting algorithms, are implemented to solve
the underlying convex optimization problems.
We also include two model selection strategies for determining the sparsity of the model parameters: k-fold cross-validation and stability selection.
This package is intended to fill the gap between popular Python tools such as [scikit-learn](https://scikit-learn.org/stable/), which cannot solve sparse constrained problems, and general-purpose optimization solvers, which do not scale well for the problems considered here.
Below we show several use cases of the package, including an application of sparse *log-contrast*
regression to *compositional* microbiome data.
The code builds on results from several papers which can be found in the [References](#references).
## Table of Contents
* [Installation](#installation)
* [Regression and classification problems](#regression-and-classification-problems)
* [Getting started](#getting-started)
* [Log-contrast regression for microbiome data](#log-contrast-regression-for-microbiome-data)
* [Optimization schemes](#optimization-schemes)
* [Structure of the code](#structure-of-the-code)
* [References](#references)
## Installation
c-lasso is available on PyPI. You can install the package
from the shell using
```shell
pip install c_lasso
```
To use the c-lasso package in Python, type
```python
from classo import *
```
The c-lasso package depends on several standard Python packages.
The dependencies are included in the package, namely:
* `numpy`
* `matplotlib`
* `scipy`
* `pandas`
* `h5py`
## Regression and classification problems
The c-lasso package can solve six different types of estimation problems:
four regression-type and two classification-type formulations.
#### [R1] Standard constrained Lasso regression:
<img src="https://latex.codecogs.com/gif.latex?\arg\min_{\beta\in&space;R^d}&space;||&space;X\beta-y&space;||^2&space;+&space;\lambda&space;||\beta||_1&space;\qquad\mbox{s.t.}\qquad&space;C\beta=0" />
This is the standard Lasso problem with linear equality constraints on the β vector.
The objective function combines a least-squares term for model fitting with an l1 penalty for sparsity.
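The [R1] objective and its constraint can be evaluated directly for any candidate β. The following is a small NumPy sketch; the helper names `r1_objective` and `is_feasible` are hypothetical and are not part of the c-lasso API.

```python
import numpy as np

def r1_objective(beta, X, y, lam):
    """Value of ||X beta - y||^2 + lam * ||beta||_1 (the [R1] objective)."""
    residual = X @ beta - y
    return float(residual @ residual + lam * np.abs(beta).sum())

def is_feasible(beta, C, tol=1e-10):
    """True if beta satisfies the equality constraint C beta = 0 up to tol."""
    return bool(np.all(np.abs(C @ beta) <= tol))

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
C = np.ones((1, 2))          # zero-sum constraint
beta = np.array([1.0, -1.0])

print(r1_objective(beta, X, y, lam=0.5))  # zero residual + 0.5 * 2 = 1.0
print(is_feasible(beta, C))               # True
```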
#### [R2] Constrained sparse Huber regression:
<img src="https://latex.codecogs.com/gif.latex?\arg\min_{\beta\in&space;R^d}&space;h_{\rho}(X\beta-y&space;)&space;+&space;\lambda&space;||\beta||_1&space;\qquad\mbox{s.t.}\qquad&space;C\beta=0" />
This regression problem uses the [Huber loss](https://en.wikipedia.org/wiki/Huber_loss) as the objective function
for robust model fitting, combined with an l1 penalty and linear equality constraints on the β vector. The parameter ρ is set to 1.345.
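For reference, here is a sketch of the Huber loss using the convention of the linked Wikipedia article (quadratic for small residuals, linear beyond ρ). The text above does not spell out h_ρ, so the exact scaling used inside c-lasso may differ; the function name `huber` is an assumption for illustration.

```python
import numpy as np

def huber(residual, rho=1.345):
    """Sum of Huber losses using the Wikipedia convention:
    0.5 * r^2 if |r| <= rho, else rho * (|r| - 0.5 * rho)."""
    r = np.abs(np.asarray(residual, dtype=float))
    quadratic = 0.5 * r**2
    linear = rho * (r - 0.5 * rho)
    return float(np.where(r <= rho, quadratic, linear).sum())

print(huber([0.5]))   # quadratic regime: 0.125
print(huber([10.0]))  # linear regime, far below 0.5 * 10**2 = 50
```

The linear growth for large residuals is what limits the influence of outliers compared with the least-squares loss in [R1].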
#### [R3] Constrained scaled Lasso regression:
<img src="https://latex.codecogs.com/gif.latex?\arg&space;\min_{\beta&space;\in&space;\mathbb{R}^d,&space;\sigma&space;>&space;0}&space;\frac{||&space;X\beta&space;-&space;y||^2}{\sigma}&space;+&space;\frac{n}{2}&space;\sigma+&space;\lambda&space;||\beta||_1&space;\qquad&space;\mbox{s.t.}&space;\qquad&space;C\beta&space;=&space;0" title="\arg \min_{\beta \in \mathbb{R}^d, \sigma > 0} \frac{|| X\beta - y||^2}{\sigma} + \frac{n}{2} \sigma+ \lambda ||\beta||_1 \qquad \mbox{s.t.} \qquad C\beta = 0" />
This formulation is similar to [R1] but allows for joint estimation of the (constrained) β vector and
the standard deviation σ in a concomitant fashion (see [References](#references) [4,5] for further info).
This is the default problem formulation in c-lasso.
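For a fixed β with residual r = Xβ − y, the σ-dependent part of the [R3] objective as displayed above, ||r||²/σ + (n/2)σ, is minimized where its derivative −||r||²/σ² + n/2 vanishes, that is at σ = sqrt(2/n)·||r||. The sketch below checks this closed form against a brute-force grid search. The helper names are hypothetical, and the normalization follows the formula shown here, which may differ from the package's internal conventions.

```python
import numpy as np

def concomitant_scale(residual):
    """Minimizer over sigma > 0 of ||r||^2 / sigma + (n / 2) * sigma."""
    r = np.asarray(residual, dtype=float)
    return float(np.sqrt(2.0 / r.size) * np.linalg.norm(r))

def scaled_data_fit(residual, sigma):
    """Data-fitting part of the [R3] objective for a given sigma."""
    r = np.asarray(residual, dtype=float)
    return float(r @ r / sigma + 0.5 * r.size * sigma)

r = np.array([1.0, -2.0, 0.5, 1.5])
sigma_hat = concomitant_scale(r)

# Compare against a brute-force grid search over sigma.
grid = np.linspace(0.05, 10.0, 20000)
values = np.array([scaled_data_fit(r, s) for s in grid])
sigma_grid = grid[values.argmin()]
print(sigma_hat, sigma_grid)
```

Substituting the minimizer back gives a data-fitting term of sqrt(2n)·||r||, which is why this formulation behaves like a Lasso with an unsquared residual norm.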
#### [R4] Constrained sparse Huber regression with concomitant scale estimation:
<img src="https://latex.codecogs.com/gif.latex?\arg&space;\min_{\beta&space;\in&space;\mathbb{R}^d,&space;\sigma&space;>&space;0}&space;\left(&space;h_{\rho}&space;\left(&space;\frac{&space;X\beta&space;-&space;y}{\sigma}&space;\right)+&space;n&space;\right)&space;\sigma+&space;\lambda&space;||\beta||_1&space;\qquad&space;\mbox{s.t.}&space;\qquad&space;C\beta&space;=&space;0" title="\arg \min_{\beta \in \mathbb{R}^d, \sigma > 0} \left( h_{\rho} \left( \frac{ X\beta - y}{\sigma} \right)+ n \right) \sigma+ \lambda ||\beta||_1 \qquad \mbox{s.t.} \qquad C\beta = 0" />
This formulation combines [R2] and [R3] to allow robust joint estimation of the (constrained) β vector and
the scale σ in a concomitant fashion (see [References](#references) [4,5] for further info).
#### [C1] Constrained sparse classification with Square Hinge loss:
<img src="https://latex.codecogs.com/gif.latex?\arg\min_{\beta\in&space;R^d}&space;l(y^TX\beta)&space;+&space;\lambda&space;||\beta||_1&space;\qquad\mbox{s.t.}\qquad&space;C\beta=0" />
where l is defined as:
<img src="https://latex.codecogs.com/gif.latex?l(r)=\max(1-r,0)^2" />
This formulation is similar to [R1] but adapted for classification tasks using the Square Hinge loss
with (constrained) sparse β vector estimation.
#### [C2] Constrained sparse classification with Huberized Square Hinge loss:
<img src="https://latex.codecogs.com/gif.latex?\arg\min_{\beta\in&space;R^d}&space;l_{\rho}(y^TX\beta)&space;+&space;\lambda&space;||\beta||_1&space;\qquad\mbox{s.t.}\qquad&space;C\beta=0" />
where l<sub>ρ</sub> is defined as:
<img src="https://latex.codecogs.com/gif.latex?l_{\rho}(r)&space;=&space;\begin{cases}&space;(1-r)^2&space;&\mbox{if&space;}&space;\rho&space;\leq&space;r&space;\leq&space;1&space;\\&space;(1-\rho)(1+\rho-2r)&space;&\mbox{if&space;}&space;r&space;\leq&space;\rho&space;\\&space;0&space;&\mbox{if&space;}&space;r&space;\geq&space;1&space;\end{cases}" title="l_{\rho}(r) = \begin{cases} (1-r)^2 &\mbox{if } \rho \leq r \leq 1 \\ (1-\rho)(1+\rho-2r) &\mbox{if } r \leq \rho \\ 0 &\mbox{if } r \geq 1 \end{cases}" />
This formulation is similar to [C1] but uses the Huberized Square Hinge loss for robust classification
with (constrained) sparse β vector estimation.
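The two classification losses in [C1] and [C2] can be compared side by side. The following NumPy sketch implements both exactly as defined above; the function names and the value `rho=-1.0` are arbitrary choices for illustration, not settings taken from the package.

```python
import numpy as np

def squared_hinge(r):
    """l(r) = max(1 - r, 0)^2, applied elementwise."""
    r = np.asarray(r, dtype=float)
    return np.maximum(1.0 - r, 0.0) ** 2

def huberized_squared_hinge(r, rho):
    """l_rho(r): 0 for r >= 1, (1 - r)^2 for rho <= r <= 1,
    and the linear piece (1 - rho) * (1 + rho - 2 r) for r <= rho."""
    r = np.asarray(r, dtype=float)
    linear = (1.0 - rho) * (1.0 + rho - 2.0 * r)
    return np.where(r <= rho, linear, squared_hinge(r))

margins = np.array([-3.0, -1.0, 0.0, 0.5, 1.0, 2.0])
print(squared_hinge(margins))
print(huberized_squared_hinge(margins, rho=-1.0))
```

Both losses agree for margins r ≥ ρ and match at r = ρ, while for r < ρ the Huberized version grows only linearly, which reduces the influence of badly misclassified samples.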
## Getting started
#### Basic example
We begin with a basic example that shows how to run c-lasso on synthetic data. The c-lasso package includes
the routine ```random_data``` that allows you to generate problem instances using normally distributed data.
```python
# Assumes `from classo import *` has been run (see Installation).
n, d, d_nonzero, k, sigma = 100, 100, 5, 1, 0.5
(X, C, y), sol = random_data(n, d, d_nonzero, k, sigma, zerosum=True)
```
This code snippet generates a problem instance with a sparse β in dimension
d=100 (with d_nonzero=5 nonzero entries). The design matrix X comprises n=100 samples generated from an i.i.d. standard normal
distribution. The constraint matrix C has dimension k x d (here k=1), as required for Cβ=0. The noise level is σ=0.5.
The input ```zerosum=True``` implies that C is the all-ones vector, so that Cβ=0 forces the entries of β to sum to zero.
The n-dimensional outcome vector y and the regression vector β are then generated to satisfy the given constraints.
Next we can define a default c-lasso problem instance with the generated data:
```python
problem = classo_problem(X,y,C)
```
You can look at the generated problem instance by typing:
```python
print(problem)
```