# data-mining-R
This project crawls lipstick sales data from a website, analyzes the key factors that affect sales, and builds models on those factors to predict sales volume.

The data is first preprocessed to produce the experimental dataset. The analysis then focuses on how well three algorithms predict lipstick sales: naive Bayes discriminant analysis, AdaBoost, and random forest, with additional model optimization for the random forest.

The experimental results indicate that three factors have the largest influence on sales volume: the total number of reviews, the price, and the description score. Comparing the three algorithms, random forest has the lowest prediction error rate and predicts sales well.

`main.R` is the source file containing the code.
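
The workflow described above can be sketched roughly as follows. This is not the code in `main.R`; it is a minimal illustration under several assumptions: the data is synthetic, the column names (`total_reviews`, `price`, `desc_score`, `sales_level`) are hypothetical, sales volume is binned into three classes, and the random forest is tuned by trying different `mtry` values, which may differ from the optimization actually used. It relies on the `e1071`, `adabag`, and `randomForest` packages.

```r
# Hypothetical sketch: NOT the contents of main.R.
library(e1071)         # naiveBayes()
library(adabag)        # boosting()
library(randomForest)  # randomForest(), importance()

set.seed(42)

# Synthetic stand-in for the crawled data; all column names are assumptions.
n <- 600
lipstick <- data.frame(
  total_reviews = rpois(n, lambda = 200),
  price         = round(runif(n, 30, 400), 2),
  desc_score    = round(runif(n, 4.0, 5.0), 1)
)

# Make a sales score depend on the three predictors, then bin it into classes.
score <- 0.02 * lipstick$total_reviews - 0.01 * lipstick$price +
  2 * lipstick$desc_score + rnorm(n)
lipstick$sales_level <- cut(
  score,
  breaks = quantile(score, probs = c(0, 1 / 3, 2 / 3, 1)),
  labels = c("low", "medium", "high"),
  include.lowest = TRUE
)

# 70/30 train/test split.
train_idx <- sample(n, size = round(0.7 * n))
train <- lipstick[train_idx, ]
test  <- lipstick[-train_idx, ]

# Misclassification rate on held-out data.
error_rate <- function(pred, truth) {
  mean(as.character(pred) != as.character(truth))
}

# Naive Bayes
nb_fit <- naiveBayes(sales_level ~ ., data = train)
nb_err <- error_rate(predict(nb_fit, test), test$sales_level)

# AdaBoost (mfinal = number of boosting iterations)
ada_fit <- boosting(sales_level ~ ., data = train, mfinal = 50)
ada_err <- error_rate(predict(ada_fit, newdata = test)$class, test$sales_level)

# Random forest: pick mtry by out-of-bag error (an assumed tuning strategy).
ntree <- 300
oob <- sapply(1:3, function(m) {
  fit <- randomForest(sales_level ~ ., data = train, ntree = ntree, mtry = m)
  fit$err.rate[ntree, "OOB"]
})
best_mtry <- which.min(oob)

rf_fit <- randomForest(sales_level ~ ., data = train, ntree = ntree,
                       mtry = best_mtry, importance = TRUE)
rf_err <- error_rate(predict(rf_fit, test), test$sales_level)

# Compare test error rates and inspect variable importance.
errors <- c(naive_bayes = nb_err, adaboost = ada_err, random_forest = rf_err)
print(round(errors, 3))
print(importance(rf_fit))
```

On synthetic data the ranking of the three models is not meaningful; the point is only the shape of the comparison: one shared train/test split, one error metric, and `importance()` to see which predictors the forest relies on.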