# Poverty Prediction by Selected Remote Sensing CNN Features
## Getting Started
This project presents research on poverty prediction using remote sensing CNN features. By carefully selecting features from the 4096 provided by the CNN, we train a model that predicts wealth indices better than one based on nightlight intensities alone. The research has two parts: feature selection and model training. We select features with correlation-based, Lasso-based, and forward-search methods, then train linear regression, ridge regression, Lasso regression, and XGBoost models and compare their performance. You can use the code we provide to reproduce this process.
### Prerequisites
* The feature selection methods and basic regression models are developed using built-in functions provided by MATLAB.
* "all_countries_dhs.mat" is the file containing all the training data and train/test sets.
* To run our XGBoost code and VAE code in Python, you need:
- Python 2.7
- [XGBoost](http://xgboost.readthedocs.io/en/latest/build.html)
- [scikit-learn](http://scikit-learn.org/stable/install.html)
- [SciPy](https://www.scipy.org/install.html)
- [keras](https://keras.io/#installation)
### Installing
Please refer to the links above for how to install the dependencies.
On macOS, if you have pip installed, you can run:
```
pip install xgboost
pip install -U scikit-learn
python -m pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose
pip install keras
```
You should be able to run our project code after installing the dependencies. Note that you may also need to install TensorFlow as the Keras backend.
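As a quick sanity check after installation, the following minimal Python sketch tries to import each dependency named above (package names are assumptions based on the install commands; versions are not pinned):

```python
import importlib

# Dependencies named in this README; xgboost and keras may need separate installs.
required = ["numpy", "scipy", "sklearn", "xgboost", "keras"]

for name in required:
    try:
        # Report the installed version, or "unknown" if the package hides it.
        version = getattr(importlib.import_module(name), "__version__", "unknown")
        print("%s %s" % (name, version))
    except ImportError:
        print("%s MISSING" % name)
```

If any line prints `MISSING`, install that package before running the corresponding script.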
## Running the tests
All the code in this project runs directly; there is no compilation step.
### Feature selection and regression models (Matlab)
We provide separate MATLAB files implementing all the feature selection methods and regression models. Start by extracting useful CNN features from the provided dataset.
* Run "PCA_select.m", "correlation_feature_selection.m", and "feature_forward_search.m" to obtain features selected by each of these three methods. In "correlation_feature_selection.m", you may need to change the initial array length to select more features. Forward search takes a long time; a possible outcome is "feature_forward_search_314.mat", which contains 314 features (indexed 4-4099) listed in the order they were selected. Run "All_lasso.m" to see the features selected by Lasso regression; this also takes a long time. A possible outcome is "Features_from_Super_Big_Lasso_67-33.mat", in which the selected features are likewise indexed 4-4099. (The first 3 features are not used.)
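To illustrate the correlation-based selection idea outside MATLAB, here is a minimal Python sketch on synthetic data (the shapes and the planted signal in column 3 are hypothetical; the real features are the 4096 CNN columns in "all_countries_dhs.mat"):

```python
import numpy as np

# Hypothetical stand-in: 200 clusters x 50 CNN features, with the wealth
# index driven mostly by feature 3 plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = X[:, 3] * 2.0 + rng.normal(scale=0.1, size=200)

# Pearson correlation of each feature with the target.
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# Keep the k features with the largest absolute correlation.
k = 10
selected = np.argsort(-np.abs(corr))[:k]
print(selected)  # feature 3 carries the signal and should rank first
```

The MATLAB script follows the same recipe: rank features by |correlation| with the wealth index and keep the top of the list.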
* After obtaining the selected features, we can use them to train different regression models:
- Linear regression:
Enter subdirectory "Linear reg". "All_linearReg.m" runs linear regression on all features, which leads to overfitting. "Nightlights_linear.m" uses only the nightlight feature and serves as our baseline. "Select_linearReg.m" is essentially the same as "All_linearReg.m", except that the user must overwrite the "select_features" variable before it will run.
- Ridge regression:
Enter subdirectory "ridge reg". This mirrors the linear regression directory, except that ridge regression is used instead. The user is expected to overwrite the "select_features" variable in "select_ridge.m".
- Lasso regression:
Enter subdirectory "Lasso reg". This mirrors the linear regression directory, except that Lasso regression is used instead. The user is expected to overwrite the "select_features" variable in "select_lasso.m", and should adjust the arguments to the lasso() function to speed it up.
* To generate intersection and union features:
To do this, first run "correlation_feature_selection.m" to generate a feature set of the desired length, then use the standard MATLAB union() and intersect() functions to combine the correlation-based, Lasso, and forward-search feature sets. The latter two can be found in "feature_forward_search_314.mat" and "Features_from_Super_Big_Lasso_67-33.mat". For convenience, the union and intersection used in our experiments are included in "intersection_union.mat": assign the variables "union" and "in2" to "select_features" in the regression code as instructed above.
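The set arithmetic involved is simple; here is a Python sketch with hypothetical index sets (the feature numbering 4-4099 follows the repo convention, but these particular indices are made up for illustration):

```python
# Hypothetical feature-index sets from the three selectors; the real sets
# live in the .mat files named above.
corr_set    = {4, 7, 15, 23, 42}
lasso_set   = {4, 9, 15, 30, 42}
forward_set = {4, 15, 23, 42, 88}

# Union: any feature chosen by at least one method (like "union").
union = sorted(corr_set | lasso_set | forward_set)

# Intersection: features all three methods agree on (like "in2").
intersection = sorted(corr_set & lasso_set & forward_set)

print(union)         # [4, 7, 9, 15, 23, 30, 42, 88]
print(intersection)  # [4, 15, 42]
```

In MATLAB, `union(a, b)` and `intersect(a, b)` compute the same quantities on index vectors.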
### XGBoost (Python)
For the XGBoost part, we have tuned the parameters for the best performance. By default, it uses the features from the forward search method. The output lists the parameters we chose together with the train/test R2 scores.
If you want to see results from the other methods, simply change the booleans on lines 48-51 of "testxgboost.py". We provide four different sets of features to test.
```
python testxgboost.py
> depth: 2 child: 9 estimator: 7480 aplha: 1 lambda: 2 subsample: 0.5 gamma: 1 train_score: 0.874769893898 test_score: 0.509435929336
```
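The printed settings map onto gradient-boosting hyperparameters (tree depth, minimum child weight, number of estimators, L1/L2 regularization, subsampling, gamma). As a dependency-light sketch of the same train/test R2 workflow, the following uses scikit-learn's GradientBoostingRegressor on synthetic data; it is a stand-in, not the project's "testxgboost.py", and the data is made up:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the selected CNN features and the wealth index.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.2, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

# max_depth and subsample play the same role as XGBoost's depth/subsample above.
model = GradientBoostingRegressor(max_depth=2, subsample=0.5,
                                  n_estimators=200, random_state=0)
model.fit(X_tr, y_tr)

print("train R2: %.3f" % r2_score(y_tr, model.predict(X_tr)))
print("test  R2: %.3f" % r2_score(y_te, model.predict(X_te)))
```

A large gap between train and test R2, as in the scores above, is the overfitting signal that the shallow depth and subsampling are meant to limit.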
### VAE (Python)
Run the command below to see the result:
```
python VAEdhs.py
```
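At the heart of any Keras VAE like "VAEdhs.py" is the reparameterization trick: sampling the latent code as z = mu + sigma * eps so gradients can flow through the encoder outputs. A numpy-only sketch (shapes are illustrative, not taken from the script):

```python
import numpy as np

# Encoder outputs for a batch of 5 samples with a 2-dimensional latent space.
# Zeros here are placeholders; a real encoder network produces these.
rng = np.random.default_rng(0)
mu = np.zeros((5, 2))       # latent mean
log_var = np.zeros((5, 2))  # latent log-variance

# Reparameterization: z = mu + exp(log_var / 2) * eps, eps ~ N(0, 1).
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

print(z.shape)  # (5, 2)
```

The randomness lives entirely in `eps`, so backpropagation treats `mu` and `log_var` as ordinary deterministic inputs.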
## Deployment
There may be version incompatibilities among TensorFlow, Keras, and XGBoost. If you have trouble running the code, you may need to use an older version of the relevant library.
## Authors
* **Zhihan Jiang** - *XGBoost and documentation*
* **Yicheng Li** - *Feature Selection and Regression on MATLAB*
* **Zhaozhuo Xu** - *CNN Data and VAE*
## Acknowledgments
* We used some skeleton and demo code from the official library sites.
* This project is inspired by Jean, N.; Burke, M.; Xie, M.; Davis, W. M.; Lobell, D. B.; Ermon, S. Combining satellite imagery and machine learning to predict poverty. Science 353(6301):790–794 (2016).
* We love CS229