A Practical Machine Learning Approach for Dynamic Stock Recommendation. IEEE TrustCom 2018. Jupyter Notebook, Python

52 files in total (17 xlsx, 14 png, 5 .DS_Store)


Uploaded 2023-04-23 09:52:53, 106.29MB ZIP

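This archive provides a research example of using machine learning for stock recommendation: the code accompanying a paper presented at IEEE TrustCom 2018. Jupyter Notebook is an interactive development environment widely used by data scientists; it keeps code, explanations, and visualizations in one document that is easy to follow and share. Python is the main programming language, and it is widely used in finance because of its data-processing and scientific-computing libraries (such as pandas, NumPy, and scikit-learn).

1. **Dynamic stock recommendation**: A stock recommendation system combines market data with statistical and predictive models to suggest which stocks to buy or sell. "Dynamic" means the model is refit as new data arrive; in this project, models are fitted in a rolling window to predict next-quarter returns, and the top 20% of stocks are selected for each trade date.
2. **Machine learning basics**: Machine learning is a branch of artificial intelligence that builds predictive models from data. According to the README, this project uses supervised regression: linear regression, ridge regression, stepwise regression, random forest, and generalized boosted regression (GBM), each predicting quarterly log-returns from financial ratios.
3. **Data preprocessing**: Before any machine learning algorithm is applied, the data usually need cleaning and transformation: handling missing values and outliers, and normalizing or standardizing features so that they are on a comparable scale.
4. **Feature engineering**: In stock recommendation, candidate features include historical prices, trading volume, the price-to-earnings ratio, company financial metrics, and industry conditions. This project uses 20 financial ratios (e.g., EPS, ROA, ROE) calculated from quarterly accounting data. Feature selection and construction are critical to model performance.
5. **Model training and evaluation**: Models are trained on historical data and evaluated out of sample. Because the target is a continuous return, the relevant model metric is a regression error (here, mean squared error, used to pick the best model in each rolling window) rather than classification metrics such as accuracy, precision, recall, or F1. The resulting portfolios are then judged with back-test metrics such as the Sharpe ratio and information ratio; this project reports total and annualized return, standard deviation, maximum drawdown, and Sharpe ratio.
6. **Jupyter Notebook**: In a notebook, researchers can lay out data import, preprocessing, model training, and result visualization step by step, which makes the experiments easier for others to reproduce.
7. **Python libraries**: pandas is typically used for reading and cleaning data, NumPy for numerical computation, Matplotlib and Seaborn for visualization, and scikit-learn for implementations of many machine learning algorithms.
8. **Model optimization**: This can involve hyperparameter tuning (grid search, random search) or ensemble methods (gradient boosting machines, random forests) to improve predictive performance; the README notes that a model ensemble method was applied.
9. **Real-time prediction**: The repository works with historical data and quarterly trade dates; deploying the trained models in a live system is not described in the README.
10. **Risk control**: Beyond prediction, a recommendation system should account for risk factors such as volatility, market sentiment, and investor preferences to make its recommendations more robust; in this project, risk is handled at the portfolio stage, for example with minimum-variance allocation.

Overall, the project covers the full pipeline from data acquisition, preprocessing, and feature engineering through model building, evaluation, and portfolio back-testing, which makes it a useful reference for applying machine learning in finance.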
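
Item 3 lists the usual cleaning steps without showing them. Below is one minimal way to apply them to a single feature column in plain Python. The function name `clean_feature`, the median imputation, and the clipping threshold `clip_z` are assumptions for illustration, not code taken from the repository.

```python
from statistics import mean, median, pstdev
from typing import List, Optional


def clean_feature(values: List[Optional[float]], clip_z: float = 3.0) -> List[float]:
    """Impute, clip, and standardize one feature column (hypothetical helper)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")

    # 1. Missing values: replace None with the median of the observed values.
    fill = median(observed)
    filled = [fill if v is None else v for v in values]

    # 2. Outliers: clip values lying more than clip_z standard deviations from the mean.
    mu, sigma = mean(filled), pstdev(filled)
    if sigma == 0:
        return [0.0] * len(filled)
    lo, hi = mu - clip_z * sigma, mu + clip_z * sigma
    clipped = [min(max(v, lo), hi) for v in filled]

    # 3. Standardization: z-score so the feature has mean 0 and population std 1.
    mu_c, sigma_c = mean(clipped), pstdev(clipped)
    if sigma_c == 0:
        return [0.0] * len(clipped)
    return [(v - mu_c) / sigma_c for v in clipped]


# Toy ROE values for six stocks in one quarter; 2.50 is an outlier, None is missing.
roe = [0.12, None, 0.08, 0.15, 2.50, 0.10]
print([round(z, 3) for z in clean_feature(roe, clip_z=2.0)])
```

In a sector-by-sector setup like this project's, such a function would be applied per ratio and per quarter, so that each cross-section is standardized on its own.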


Archive contents: 动态股票推荐的实用机器学习方法。IEEETrustCom2018。_Jupyter Notebook_Python_下载.zip (52 files)

+ Machine-Learning-for-Stock-Recommendation-IEEE-2018-master/
  + .DS_Store 6KB
  + .gitattributes 66B
  + fundamental_portfolio.ipynb 32KB
  + figs/
    + efficient1.jpg 212KB
    + chart1_datasetPeriod.PNG 24KB
    + chart3_modelError.PNG 26KB
    + chart10_insample.PNG 8KB
    + chart8_PnL.png 87KB
    + chart9_TotalValue.png 83KB
    + rolling_windows.vsdx 89KB
    + dataperiod.png 80KB
    + chart2_rolling_windows.PNG 62KB
    + chart4_predictedReturn2.PNG 26KB
    + chart6_selectedStocks.PNG 28KB
    + transaction cost.PNG 24KB
    + chart11_overallPerformance.PNG 23KB
    + chart5_coefficient.PNG 34KB
    + chart7_efficient1.PNG 93KB
    + pnl1.jpg 269KB
    + chart4_predictedReturn1.PNG 26KB
  + fundamental_run_model.py 4KB
  + fundamental_back_testing.ipynb 180KB
  + README.md 6KB
  + code/
    + .DS_Store 6KB
    + old_Rcode/
      + .DS_Store 6KB
      + fundamental_ML_model.R 7KB
      + fundamental_select_stock.R 740B
      + fundamental_run_model.R 11KB
    + ml_model.py 26KB
  + Data/
    + stocks_weight_table.xlsx 476KB
    + all_return_table.pickle 87.58MB
    + all_stocks_info.pickle 307KB
    + fundamental_final_table.xlsx 19.06MB
    + 2-portfolio_data/
      + .DS_Store 6KB
      + stocks_selected_total_user8.csv 318KB
      + equally_weighted_user8.xlsx 319KB
      + minimum_weighted_user8.xlsx 343KB
      + mean_weighted_user8.xlsx 341KB
    + 1-sp500_adj_price.csv.zip 42.36MB
    + 1-focasting_data/
      + .DS_Store 6KB
      + sector50_clean.xlsx 299KB
      + sector35_clean.xlsx 1.71MB
      + sector55_clean.xlsx 1.01MB
      + sector10_clean.xlsx 1.35MB
      + sector20_clean.xlsx 1.91MB
      + sector25_clean.xlsx 2.45MB
      + sector15_clean.xlsx 1.32MB
      + sector60_clean.xlsx 544KB
      + sector40_clean.xlsx 1.35MB
      + sector45_clean.xlsx 2.66MB
      + sector30_clean.xlsx 1.22MB
    + 1-spx_price.xlsx 105KB

# Dynamic-Stock-Recommendation-Machine_Learning

## An [IEEE TrustCom 2018](http://www.cloud-conf.net/trustcom18/) Paper

Hongyang Yang, Xiao-Yang Liu, and Qingwei Wu. 2018. A practical machine learning approach for dynamic stock recommendation. In IEEE TrustCom/BigDataSE, 2018, 1693–1697.

Download from [IEEE Xplore](https://ieeexplore.ieee.org/abstract/document/8456121) and [SSRN](https://ssrn.com/abstract=3302088).

## Abstract:

Stock recommendation is vital to investment companies and investors. However, no single stock selection strategy will always win while analysts may not have enough time to check all S&P 500 stocks (the Standard & Poor's 500). In this paper, we propose a practical scheme that recommends stocks from S&P 500 using machine learning. Our basic idea is to buy and hold the top 20% stocks dynamically. First, we select representative stock indicators with good explanatory power. Secondly, we take five frequently used machine learning methods, including linear regression, ridge regression, stepwise regression, random forest and generalized boosted regression, to model stock indicators and quarterly log-return in a rolling window. Thirdly, we choose the model with the lowest Mean Square Error in each period to rank stocks. Finally, we test the selected stocks by conducting portfolio allocation methods such as equally weighted, mean-variance, and minimum-variance. Our empirical results show that the proposed scheme outperforms the long-only strategy on the S&P 500 index in terms of Sharpe ratio and cumulative returns.

## Index Terms:

Stock recommendation, fundamental value investing, machine learning, model selection, risk management

## Project summary:

+ We developed a practical approach that uses machine learning methods to select S&P 500 stocks based on financial ratios (e.g., EPS, ROA, ROE). It outperformed the S&P 500 index on out-of-sample data, achieving a Sharpe ratio of 0.5 (versus 0.19 for SPX).
+ For each of the 11 GICS sectors, we performed feature selection and used a rolling window to choose the lowest-MSE model among Linear Regression, Stepwise Regression, Ridge Regression, Random Forest, and GBM. We also applied a model ensemble method.

<img src=figs/chart10_insample.PNG width="500">
<img src=figs/chart11_overallPerformance.PNG width="500">

## Data:

Retrieved from __WRDS (Wharton Research Data Services)__, Compustat Industrial (27 years of daily and quarterly data)

<img src=figs/chart1_datasetPeriod.PNG width="500">

+ __S&P 500 Fundamental Quarterly Data__ ([fundamental_final_table.xlsx](Data/fundamental_final_table.xlsx))
  + Database: Compustat North America (Fundamentals Quarterly and Index Constituents)
  + Timeline: 27 years (1990-2017)
  + Tickers: 1,193 stocks (all historical S&P 500 component stocks)
  + Value: 20 financial ratios calculated from raw accounting report data
+ __S&P 500 Historical Component Stocks Adjusted Daily Price__ ([1-sp500_adj_price.csv.zip](Data/1-sp500_adj_price.csv.zip))
  + Database: Compustat North America (Security Daily)
  + Timeline: 27 years (1990-2017)
  + Tickers: 1,193 stocks (all historical S&P 500 component stocks)
  + Value: Adjusted daily close price
+ __S&P 500 Index Daily Price__ ([1-spx_price.xlsx](Data/1-spx_price.xlsx))
  + Database: Yahoo Finance
  + Timeline: 27 years (1990-2017)
  + Ticker: SPX
  + Value: Adjusted daily close price

## Code:

### __Forecasting Model__:

+ __Input__: 11 Excel files of cleaned fundamental financial-ratio data, one per GICS sector (sector 10-Energy, sector 15-Materials, sector 20-Industrials, sector 25-Consumer Discretionary, sector 30-Consumer Staples, sector 35-Health Care, sector 40-Financials, sector 45-Information Technology, sector 50-Telecommunication Services, sector 55-Utilities, sector 60-Real Estate)
+ __Python Script__: 2 scripts
  + [ml_model.py](code/ml_model.py): The forecasting function (cornerstone of this project)
  + [fundamental_run_model.py](fundamental_run_model.py): The main function to run the forecasting model

```shell
python3 fundamental_run_model.py \
  -sector_name sector10 \
  -fundamental Data/fundamental_final_table.xlsx \
  -sector Data/1-focasting_data/sector10_clean.xlsx
```

+ __Old R Script__: 3 R scripts
  + [fundamental_run_model.R](code/fundamental_run_model.R): The main function to run the forecasting model
  + [fundamental_ML_model.R](code/fundamental_ML_model.R): The forecasting function (cornerstone of this project)
  + [fundamental_select_stock.R](code/fundamental_select_stock.R): The function to select the top 20% of stocks in each sector
+ __Output__: [a CSV file](Data/2-portfolio_data/stocks_selected_total_user8.csv) with the columns
  + __tic__: the stock ticker
  + __predicted_return__: the next-quarter return predicted by our model
  + __trade_date__: the date to execute the trades

### __Portfolio Allocation__:

+ __Input__: 2 files
  + The [CSV file](Data/2-portfolio_data/stocks_selected_total_user8.csv) generated by the forecasting model
  + The [adjusted close price data of S&P 500 stocks](Data/1-sp500_adj_price.csv.zip), used to calculate the covariance matrix
+ __Script__: [fundamental_portfolio.ipynb](fundamental_portfolio.ipynb)
+ __Output__: 3 Excel files, each with the following 4 columns:
  1. __tic__: the stock ticker
  2. __predicted_return__: the next-quarter return predicted by our model
  3. __weights__: the portfolio weights to trade
  4. __trade_date__: the date to execute the trades

### __Back-testing Model__:

+ __Input__: 5 files
  + [equally_weighted](Data/2-portfolio_data/equally_weighted_user8.xlsx): equally weighted portfolio (portfolio benchmark)
  + [mean_weighted](Data/2-portfolio_data/mean_weighted_user8.xlsx): mean-variance portfolio
  + [minimum_weighted](Data/2-portfolio_data/minimum_weighted_user8.xlsx): minimum-variance portfolio (our model)
  + [adjusted daily close price of S&P 500 stocks](Data/1-sp500_adj_price.csv.zip): used to calculate quarterly returns
  + [SPX adjusted daily close price](Data/1-spx_price.xlsx): the market index (overall benchmark)
+ __Script__: 1 Jupyter Notebook
  + [fundamental_back_testing.ipynb](code/fundamental_back_testing.ipynb): The back-testing function
+ __Output__:
  1. Quarterly returns of our portfolio after transaction costs
  2. Performance evaluation: total return, annualized return and standard deviation, maximum drawdown, Sharpe ratio
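
The abstract says the models are fitted to quarterly log-return, and the back-testing inputs use adjusted daily close prices to calculate quarterly returns. A minimal sketch of that calculation, assuming each ticker's prices arrive as `(date, adjusted close)` pairs and that a quarter's return runs from the last close of the previous quarter to the last close of this one. The helper names are hypothetical, and the repository's own code may align dates differently.

```python
import math
from datetime import date
from typing import Dict, List, Tuple

Quarter = Tuple[int, int]  # (year, quarter number 1-4)


# Hypothetical helpers, not taken from the repository.
def quarter_of(d: date) -> Quarter:
    """Map a calendar date to its (year, quarter) key, e.g. 2017-05-03 -> (2017, 2)."""
    return d.year, (d.month - 1) // 3 + 1


def next_quarter(q: Quarter) -> Quarter:
    year, n = q
    return (year, n + 1) if n < 4 else (year + 1, 1)


def quarterly_log_returns(prices: List[Tuple[date, float]]) -> Dict[Quarter, float]:
    """Log-return per quarter, ln(P_end / P_previous_end), from adjusted daily closes."""
    last_close: Dict[Quarter, float] = {}
    for d, px in sorted(prices):  # ascending dates, so each quarter's last close wins
        last_close[quarter_of(d)] = px
    quarters = sorted(last_close)
    return {
        q: math.log(last_close[q] / last_close[p])
        for p, q in zip(quarters, quarters[1:])
        if next_quarter(p) == q  # skip gaps such as missing quarters
    }


# Toy adjusted closes for one ticker.
prices = [
    (date(2017, 3, 30), 98.0),
    (date(2017, 3, 31), 100.0),
    (date(2017, 6, 30), 110.0),
    (date(2017, 9, 29), 99.0),
]
for q, r in quarterly_log_returns(prices).items():
    print(q, round(r, 4))
```

Log-returns are additive across quarters, which is one common reason to model them instead of simple returns.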
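
The Forecasting Model step fits several regressors in a rolling window and keeps the one with the lowest MSE in each period. The plain-Python sketch below shows that selection loop under stated assumptions: ordinary least squares and ridge regression with two penalty values stand in for the five models the README names (stepwise regression, random forest, and GBM are omitted), each window trains on four quarters and validates on the next one, and all names are hypothetical rather than the API of `ml_model.py`.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
# Assumed layout: periods[i] = (features known at the end of quarter i,
#                               log-return over quarter i + 1).
Period = Tuple[List[Vector], List[float]]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(xs: Sequence[Vector], ys: Sequence[float], lam: float) -> Vector:
    """Ridge regression with an unpenalized intercept; lam = 0 is ordinary least squares."""
    rows = [[1.0] + list(x) for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    for i in range(1, k):  # penalize slopes only
        xtx[i][i] += lam
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta: Vector, x: Vector) -> float:
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def mse(beta: Vector, xs: Sequence[Vector], ys: Sequence[float]) -> float:
    return sum((predict(beta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical candidate set standing in for the README's five models.
CANDIDATES: Dict[str, float] = {"linear": 0.0, "ridge_1": 1.0, "ridge_10": 10.0}


def select_model(periods: List[Period], t: int, train_len: int = 4) -> Tuple[str, Vector]:
    """Train on quarters [t-1-train_len, t-1), validate on t-1, keep the lowest-MSE model."""
    start = t - 1 - train_len
    if start < 0:
        raise ValueError("not enough history for this window")
    xs = [x for q_xs, _ in periods[start : t - 1] for x in q_xs]
    ys = [y for _, q_ys in periods[start : t - 1] for y in q_ys]
    # Quarter t-1's targets (returns over quarter t) are realized when quarter t is ranked.
    val_xs, val_ys = periods[t - 1]
    fits = {name: fit_ridge(xs, ys, lam) for name, lam in CANDIDATES.items()}
    best = min(fits, key=lambda name: mse(fits[name], val_xs, val_ys))
    return best, fits[best]


# Toy panel: 6 quarters x 10 stocks x 2 ratios, with deterministic "noise".
periods: List[Period] = []
for q in range(6):
    xs = [[math.sin(q + i), math.cos(2 * q + i)] for i in range(10)]
    ys = [0.5 * a - 0.2 * b + 0.01 * math.sin(7 * i + q) for i, (a, b) in enumerate(xs)]
    periods.append((xs, ys))

name, beta = select_model(periods, t=5)
ranked = sorted(range(10), key=lambda i: predict(beta, periods[5][0][i]), reverse=True)
print(name, ranked[:2])  # chosen model and the two stocks with the highest predicted return
```

Sliding `t` forward one quarter at a time repeats the fit-validate-select cycle, which is what makes the choice of model itself dynamic.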
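
The README assigns the top-20% cut to `fundamental_select_stock.R`, applied in each sector, and the forecasting output carries `tic`, `predicted_return`, and `trade_date`. A Python sketch of that cut for a single trade date, assuming predictions arrive as `(sector, tic, predicted_return)` tuples. Rounding the count up, so that small sectors still contribute a stock, is an assumption, since the README does not say how fractional counts are handled; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def select_top_percent(
    predictions: List[Tuple[str, str, float]], top_percent: int = 20
) -> Dict[str, List[Tuple[str, float]]]:
    """Per sector, keep the top `top_percent`% of stocks by predicted return.

    Hypothetical helper. The count is rounded up with integer arithmetic, so each
    non-empty sector keeps at least one stock when top_percent >= 1 (an assumption).
    """
    by_sector: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for sector, tic, pred in predictions:
        by_sector[sector].append((tic, pred))

    selected: Dict[str, List[Tuple[str, float]]] = {}
    for sector, rows in by_sector.items():
        rows.sort(key=lambda row: row[1], reverse=True)  # highest predicted return first
        k = (len(rows) * top_percent + 99) // 100  # ceil(n * p / 100) without floats
        selected[sector] = rows[:k]
    return selected


# Toy predictions for one trade date (tickers and numbers are made up).
preds = [
    ("sector10", "AAA", 0.031), ("sector10", "BBB", -0.012), ("sector10", "CCC", 0.054),
    ("sector10", "DDD", 0.008), ("sector10", "EEE", 0.020),
    ("sector45", "FFF", 0.045), ("sector45", "GGG", 0.061),
]
for sector, picks in select_top_percent(preds).items():
    print(sector, picks)
```

Integer rounding avoids cases where a float product such as `0.2 * n` lands just above a whole number and `math.ceil` adds an extra stock.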
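
For Portfolio Allocation, the README lists equally weighted, mean-variance, and minimum-variance portfolios, with the covariance matrix estimated from adjusted close prices. A minimal sketch of two of them: equal weights and the closed-form global minimum-variance portfolio `w = inv(cov) @ 1 / (1' @ inv(cov) @ 1)`, where `1` is a vector of ones. This closed form only requires the weights to sum to 1, so it can produce short positions; `fundamental_portfolio.ipynb` may use a constrained optimizer instead, the mean-variance portfolio (which also needs expected returns) is not sketched, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def sample_cov(returns: Matrix) -> Matrix:
    """Sample covariance (n - 1 denominator); returns[t][i] is asset i's return in period t."""
    n_obs, n_assets = len(returns), len(returns[0])
    means = [sum(row[i] for row in returns) / n_obs for i in range(n_assets)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in returns) / (n_obs - 1)
            for j in range(n_assets)
        ]
        for i in range(n_assets)
    ]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def equal_weights(n_assets: int) -> List[float]:
    return [1.0 / n_assets] * n_assets


def min_variance_weights(cov: Matrix) -> List[float]:
    """Global minimum-variance weights; fully invested but unconstrained (may be negative)."""
    z = solve(cov, [1.0] * len(cov))  # z = inv(cov) @ 1 without forming the inverse
    total = sum(z)
    return [zi / total for zi in z]


# Toy daily returns for three selected stocks (made-up numbers).
daily = [
    [0.010, 0.002, -0.004],
    [-0.006, 0.001, 0.007],
    [0.004, -0.003, 0.002],
    [0.012, 0.004, -0.008],
    [-0.009, 0.000, 0.005],
]
cov = sample_cov(daily)
print("equal:", equal_weights(3))
print("min-variance:", [round(w, 3) for w in min_variance_weights(cov)])
```

With only a handful of observations per stock, a sample covariance matrix can be singular or badly conditioned, which is one reason practical implementations often use longer price histories or shrinkage.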
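
The Back-testing Model reports quarterly returns after transaction costs, plus total return, annualized return and standard deviation, maximum drawdown, and Sharpe ratio. A minimal sketch of those statistics from a list of quarterly simple returns, assuming a flat proportional cost on the fraction of the portfolio traded each quarter and a zero risk-free rate. The cost model, the 0.1% rate in the example, and the function names are assumptions, not values from `fundamental_back_testing.ipynb`.

```python
import math
from statistics import mean, stdev
from typing import Dict, List


def net_of_costs(gross: List[float], turnover: List[float], cost_rate: float) -> List[float]:
    """Quarterly return after an assumed proportional cost: cost_rate x fraction traded."""
    return [g - cost_rate * t for g, t in zip(gross, turnover)]


def performance(quarterly: List[float], periods_per_year: int = 4) -> Dict[str, float]:
    """Summary statistics for quarterly simple returns (needs at least two quarters)."""
    wealth, peak, max_drawdown = 1.0, 1.0, 0.0
    for r in quarterly:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        # Drawdown is observed only at quarter ends, so intra-quarter dips are missed.
        max_drawdown = max(max_drawdown, 1.0 - wealth / peak)
    years = len(quarterly) / periods_per_year
    annualized_std = stdev(quarterly) * math.sqrt(periods_per_year)
    return {
        "total_return": wealth - 1.0,
        "annualized_return": wealth ** (1.0 / years) - 1.0,  # compound annual growth rate
        "annualized_std": annualized_std,
        # Sharpe ratio with a zero risk-free rate (an assumption of this sketch).
        "sharpe_ratio": mean(quarterly) * periods_per_year / annualized_std,
        "max_drawdown": max_drawdown,
    }


# Toy gross quarterly returns and the fraction of the portfolio traded each quarter.
gross = [0.10, -0.20, 0.10, 0.05]
turnover = [1.0, 0.4, 0.5, 0.3]
net = net_of_costs(gross, turnover, cost_rate=0.001)  # hypothetical 0.1% per unit traded
for key, value in performance(net).items():
    print(f"{key}: {value:.4f}")
```

Running the same function on the SPX quarterly returns gives the benchmark figures to compare against.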
