PyPI官网下载|time_series_transform-0.0.5.tar.gz资源-CSDN文库

版权申诉

60 浏览量 2022-02-02 07:17:53 上传评论收藏 25KB GZ 举报

共24个文件

py：16个

txt：4个

pkg-info：2个

资源推荐

资源详情

资源评论

收起资源包目录

time_series_transform-0.0.5.tar.gz （24个子文件）

time_series_transform-0.0.5

PKG-INFO 13KB

time_series_transform.egg-info

PKG-INFO 13KB

requires.txt 148B

SOURCES.txt 1009B

top_level.txt 128B

dependency_links.txt 1B

setup.cfg 79B

time_series_transform

transform_core_api

time_series_transformer.py 13KB

util.py 5KB

__init__.py 525B

tensorflow_adopter.py 6KB

base.py 7KB

stock_transform

util.py 3KB

__init__.py 433B

stock_extractor.py 8KB

plot.py 9KB

base.py 8KB

test

test_stock_base.py 7KB

test_time_series_transform.py 8KB

test_tensorflow_adopter.py 1KB

__init__.py 0B

__init__.py 276B

setup.py 2KB

README.md 10KB

# Time Series Transformer [![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/) ![Build](https://github.com/allen-chiang/Time-Series-Transformer/workflows/Build/badge.svg) [![Build Status](https://dev.azure.com/kuanlunchiang/Time%20Series%20Transformer/_apis/build/status/allen-chiang.Time-Series-Transformer?branchName=master)](https://dev.azure.com/kuanlunchiang/Time%20Series%20Transformer/_build/latest?definitionId=3&branchName=master) [![Board Status](https://dev.azure.com/kuanlunchiang/4514fff7-ad24-4603-9373-c28efeaada71/b19741c8-3782-44ee-8a92-2805fbeb49f9/_apis/work/boardbadge/e0f238c1-381a-4686-a599-43174bf8237f)](https://dev.azure.com/kuanlunchiang/4514fff7-ad24-4603-9373-c28efeaada71/_boards/board/t/b19741c8-3782-44ee-8a92-2805fbeb49f9/Microsoft.RequirementCategory) Time Series Transformer is designed to handle time series data pre-processing for machine learning and deep learning. The general use case includes making lag/lead features, denoising data, and making various moving average (geometric and arithmetic ). Furthermore, this package provides extra features to different time series data area, including: 1. stock - extract data from different engine (only support yahoo finance in the current version) - plot interative charts for stock analsys - various technical indicator transformation i.e. MACD, stochastic oscillator, William %, relative strength index ## Pre-Processing for Machine Learning ```python import numpy as np import pandas as pd import time_series_transform as tst from time_series_transform.transform_core_api.time_series_transformer import Pandas_Time_Series_Panel_Dataset ``` This example uses the stock api to extract stock googl for one year. Subsequently, it shows how to create multiple lags data as predictors and lead data as target variable. ```python stock_ext = tst.Stock_Extractor('googl','yahoo') df = stock_ext.get_stock_period(period='1y').df print(df.head()) ``` Date Open High Low Close Volume Dividends \ 0 2019-08-19 1191.83 1209.39 1190.40 1200.44 1222500 0 1 2019-08-20 1195.35 1198.00 1183.05 1183.53 1010300 0 2 2019-08-21 1195.82 1200.56 1187.92 1191.58 707600 0 3 2019-08-22 1193.80 1198.78 1178.91 1191.52 867600 0 4 2019-08-23 1185.17 1195.67 1150.00 1153.58 1812700 0 Stock Splits 0 0 1 0 2 0 3 0 4 0 Pandas_Time_Series_Panel_Dataset takes dataframe as input. make_slide_window is the function used for making multiple lagging data, and make_lead_column is to create lead data for specific column. indexCol is the time series column, and this column is used for sorting the dataframe before lag/lead feature generations. ```python panel_transform = Pandas_Time_Series_Panel_Dataset(df) panel_transform.make_slide_window( indexCol= 'Date',windowSize = 2,colList = ['Open','High','Low','Close'] ).make_lead_column(indexCol = 'Date',baseCol = 'Open',leadNum=1) panel_transform.df.info() ``` <class 'pandas.core.frame.DataFrame'> Int64Index: 252 entries, 251 to 0 Data columns (total 17 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Date 252 non-null object 1 Open 252 non-null float64 2 High 252 non-null float64 3 Low 252 non-null float64 4 Close 252 non-null float64 5 Volume 252 non-null int64 6 Dividends 252 non-null int64 7 Stock Splits 252 non-null int64 8 Open_lag1 251 non-null float64 9 Open_lag2 250 non-null float64 10 High_lag1 251 non-null float64 11 High_lag2 250 non-null float64 12 Low_lag1 251 non-null float64 13 Low_lag2 250 non-null float64 14 Close_lag1 251 non-null float64 15 Close_lag2 250 non-null float64 16 Open_lead1 251 non-null float64 dtypes: float64(13), int64(3), object(1) memory usage: 35.4+ KB To obtain the data, the df attribute of Pandas_Time_Series_Panel_Dataset can be retrieved. ```python lead_lag_stock = panel_transform.df print(lead_lag_stock[['Date','Open','Open_lag1','Open_lead1']].sort_values('Date').head()) ``` Date Open Open_lag1 Open_lead1 0 2019-08-19 1191.83 NaN 1195.35 1 2019-08-20 1195.35 1191.83 1195.82 2 2019-08-21 1195.82 1195.35 1193.80 3 2019-08-22 1193.80 1195.82 1185.17 4 2019-08-23 1185.17 1193.80 1159.45 Sometimes, there cuold be different categories or item in the dataset. Pandas_Time_Series_Panel_Dataset the groupby parameter can serve the advanced data manipulation for lead and lag data making. The following example is going to construct a dataframe with multiple stocks, and each stock can be represented as one item. ```python df = tst.Portfolio_Extractor(['googl','aapl'],'yahoo').get_portfolio_period('1y').get_portfolio_dataFrame() print(df.head()) ``` Date Open High Low Close Volume Dividends \ 0 2019-08-19 1191.83 1209.39 1190.40 1200.44 1222500 0.0 1 2019-08-20 1195.35 1198.00 1183.05 1183.53 1010300 0.0 2 2019-08-21 1195.82 1200.56 1187.92 1191.58 707600 0.0 3 2019-08-22 1193.80 1198.78 1178.91 1191.52 867600 0.0 4 2019-08-23 1185.17 1195.67 1150.00 1153.58 1812700 0.0 Stock Splits symbol 0 0 googl 1 0 googl 2 0 googl 3 0 googl 4 0 googl ```python panel_transform = Pandas_Time_Series_Panel_Dataset(df) panel_transform.make_slide_window( indexCol= 'Date',windowSize = 2,colList = ['Open','High','Low','Close'],groupby='symbol' ).make_lead_column(indexCol = 'Date',baseCol = 'Open',leadNum=1,groupby='symbol') panel_transform.df.info() ``` <class 'pandas.core.frame.DataFrame'> Int64Index: 504 entries, 251 to 0 Data columns (total 18 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Date 504 non-null object 1 Open 504 non-null float64 2 High 504 non-null float64 3 Low 504 non-null float64 4 Close 504 non-null float64 5 Volume 504 non-null int64 6 Dividends 504 non-null float64 7 Stock Splits 504 non-null int64 8 symbol 504 non-null object 9 Open_lag1 502 non-null float64 10 Open_lag2 500 non-null float64 11 High_lag1 502 non-null float64 12 High_lag2 500 non-null float64 13 Low_lag1 502 non-null float64 14 Low_lag2 500 non-null float64 15 Close_lag1 502 non-null float64 16 Close_lag2 500 non-null float64 17 Open_lead1 502 non-null float64 dtypes: float64(14), int64(2), object(2) memory usage: 74.8+ KB ```python lead_lag_stock = panel_transform.df print(lead_lag_stock[['Date','symbol','Open','Open_lag1','Open_lead1']].sort_values('Date').head()) ``` Date symbol Open Open_lag1 Open_lead1 0 2019-08-19 aapl 208.55 NaN 208.81 0 2019-08-19 googl 1191.83 NaN 1195.35 1 2019-08-20 aapl 208.81 208.55 210.90 1 2019-08-20 googl 1195.35 1191.83 1195.82 2 2019-08-21 googl 1195.82 1195.35 1193.80 Note: Some other use cases could be inventory. Inventory data is usually associate with multiple categories s

评论收藏

内容反馈

版权申诉