RMSE Trend Analysis Report Based on the Boston Housing Dataset


Uploaded 2023-08-18 15:09:37 · ZIP archive, 1.93 MB


r000150.zip (4 files)

    新建 DOCX 文档.docx (1.27 MB)
    boston.csv (34 KB)
    code.ipynb (524 KB)
    .ipynb_checkpoints/code-checkpoint.ipynb (524 KB)

2 Programming Question

Step 1: Use the pandas library to inspect the data in the dataset. Handle incomplete data points such as 'NaN' or 'Null'. Briefly summarize the characteristics of this dataset and guess which attribute is the most relevant to MEDV.

The df.info() output shows the following column data types:

- crim, zn, indus, nox, age, dis, ptratio, b, lstat, and medv are of type float64
- chas, rad, and tax are of type int64

All columns have 506 non-null values, so the dataset contains no missing values and no imputation or row removal is needed. This matters because missing values can reduce the accuracy of any model built on the dataset.
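The check described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the small frame below is made up (it deliberately contains NaN values so the handling step has something to do), whereas the report's boston.csv has none. Dropping rows is only one common way to handle missing values.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for boston.csv; the values are illustrative only.
df = pd.DataFrame({
    "rm": [6.5, 6.4, np.nan, 7.0],
    "lstat": [5.0, 9.1, 4.0, np.nan],
    "medv": [24.0, 21.6, 34.7, 33.4],
})

df.info()  # prints dtypes and non-null counts per column

# Count missing entries (NaN/None) in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# One common option: drop every row that has a missing value.
df_clean = df.dropna()
print(len(df_clean), "complete rows remain")
```

With the real file, `df = pd.read_csv("boston.csv")` would replace the hand-built frame, and `missing_per_column` would be zero for every column according to the report.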

From the sorted correlation matrix, the attribute most strongly correlated with MEDV is LSTAT, with a correlation of -0.737663: as LSTAT increases, MEDV tends to decrease. The second most relevant attribute is RM, with a correlation of 0.695360: as RM increases, MEDV tends to increase.

These two attributes, LSTAT and RM, are likely to be the most important for predicting the median value of owner-occupied homes in the Boston area. Other attributes such as ZN, B, and DIS also correlate with MEDV, but less strongly than LSTAT and RM.
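A ranking like the one above can be produced by sorting one column of the correlation matrix by absolute value. The sketch below uses synthetic data generated with an arbitrary formula (an assumption made only so the example runs on its own), so its numbers will not match the report's -0.737663 and 0.695360.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; the generating formula is an arbitrary assumption.
rng = np.random.default_rng(0)
n = 200
lstat = rng.uniform(2, 35, n)
rm = rng.normal(6.3, 0.7, n)
medv = 30 - 0.8 * lstat + 4 * (rm - 6.3) + rng.normal(0, 3, n)
df = pd.DataFrame({"lstat": lstat, "rm": rm, "medv": medv})

# Pearson correlation of every attribute with the target.
corr_with_medv = df.corr()["medv"].drop("medv")

# Rank by strength (absolute value) while keeping the sign for interpretation.
order = corr_with_medv.abs().sort_values(ascending=False).index
ranked = corr_with_medv.reindex(order)
print(ranked)
```

Sorting by absolute value matters here: a plain descending sort would put a strongly negative attribute such as LSTAT at the bottom even though it is the strongest signal.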

Step 2: Use the seaborn library to visualize the dataset. Plot the MEDV distributions over each attribute. Briefly analyze the characteristics of the attributes and revise the assumption in Step 1 if necessary.

We create a joint plot of LSTAT and MEDV using the seaborn library. The plot shows the distribution of MEDV against LSTAT together with a regression line that visualizes the relationship between the two variables. It gives additional insight into that relationship and helps us judge whether LSTAT is indeed the most relevant attribute for predicting MEDV.

The distribution of MEDV against LSTAT shows a strong negative linear relationship: as LSTAT increases, MEDV tends to decrease. This supports the Step 1 guess that LSTAT is a strong predictor of MEDV.

From the joint plot, we can see that the data points form a clear downward trend and that the regression line follows this trend. There are a few outliers, but they do not greatly affect the overall relationship between MEDV and LSTAT. The distribution of LSTAT is not perfectly normal, but linear regression does not require normally distributed predictors, so LSTAT can still be used as a predictor of MEDV.

Step 3: Use the seaborn.heatmap function to plot the pairwise correlations in the data. Select the attributes that are good candidates for use as predictors. Report your findings.

From the heatmap, we can select the attributes that are good candidates for predictors. A good attribute has a strong correlation with the target variable MEDV and a weak correlation with the other attributes, so that it adds information rather than duplicating another predictor.

Based on this analysis, the good attributes for predicting MEDV are RM, LSTAT, and, to a lesser extent, PTRATIO. These attributes have a strong correlation with MEDV and a weak correlation with the other attributes. Using them as predictors should give a more accurate model of the median value of owner-occupied homes in the Boston area.
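The selection rule above (strong correlation with MEDV, weak correlation with other attributes) can be sketched alongside the heatmap. The data is synthetic, and the 0.4 cutoff is a hypothetical threshold chosen for this illustration; the report does not state one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (illustrative only): two informative attributes
# and one unrelated binary attribute.
rng = np.random.default_rng(2)
n = 300
lstat = rng.uniform(2, 35, n)
rm = rng.normal(6.3, 0.7, n)
chas = rng.integers(0, 2, n)
medv = 22 - 0.5 * (lstat - 12) + 6 * (rm - 6.3) + rng.normal(0, 2, n)
df = pd.DataFrame({"lstat": lstat, "rm": rm, "chas": chas, "medv": medv})

corr = df.corr()

# Annotated heatmap of all pairwise correlations, on a fixed [-1, 1] scale.
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
ax.figure.savefig("correlation_heatmap.png")

# Hypothetical cutoff: keep attributes with |corr with medv| >= 0.4.
THRESHOLD = 0.4
selected = [c for c in corr.columns
            if c != "medv" and abs(corr.loc[c, "medv"]) >= THRESHOLD]
print("selected:", selected)

# Check how strongly the selected attributes correlate with each other.
print(corr.loc[selected, selected].round(2))
```

Printing the sub-matrix of selected attributes makes the second half of the rule checkable: large off-diagonal values there would signal redundant predictors.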

Step 4: Use sklearn.preprocessing.MinMaxScaler to scale the columns you selected in Step 3. Then use seaborn.regplot to plot the relevance of these columns against MEDV with a 95% confidence interval.
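A minimal sketch of this step, assuming RM and LSTAT are the selected columns and using synthetic data in place of boston.csv, could look like this. MinMaxScaler rescales each column to the range [0, 1], and `ci=95` asks regplot for a 95% confidence band around the fitted line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (illustrative only).
rng = np.random.default_rng(3)
n = 100
lstat = rng.uniform(2, 35, n)
rm = rng.normal(6.3, 0.7, n)
medv = 22 - 0.5 * (lstat - 12) + 6 * (rm - 6.3) + rng.normal(0, 2, n)
df = pd.DataFrame({"lstat": lstat, "rm": rm, "medv": medv})

features = ["rm", "lstat"]  # assumed selection from Step 3

# Scale each selected column to [0, 1]; the target stays in original units.
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(df[features]),
                      columns=features, index=df.index)
plot_df = scaled.assign(medv=df["medv"])

# One regression plot per feature, each with a 95% confidence interval.
fig, axes = plt.subplots(1, len(features), figsize=(10, 4))
for ax, col in zip(axes, features):
    sns.regplot(data=plot_df, x=col, y="medv", ci=95, ax=ax)
    ax.set_title(f"{col} (scaled) vs medv")
fig.tight_layout()
fig.savefig("regplots.png")
```

Scaling puts both features on the same horizontal axis range, so the slopes in the two panels can be compared directly.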


Uploader: 小夕Coding
