# synthetic_sample
synthetic_sample is a data generation application that produces synthetic sales transactions over a time series, along with the associated shipment and product data.
## Usage
Sample data is generated by running `synthetic_sample_generator.py` as follows:
```
python3 synthetic_sample_generator.py --json_filepath JSON_FILEPATH --output_directory OUTPUT_DIRECTORY --create_records
```
where
- `json_filepath` is the filepath to the input JSON (see Request Requirements below)
- `output_directory` is the directory to save output data to, in CSV format
- `create_records` is a flag that also saves the raw record data to the output directory. Without this flag, only aggregate output data is written
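The options above can be sketched with Python's standard `argparse` module. This is not the actual parser inside `synthetic_sample_generator.py`: `build_parser` is a hypothetical name, and treating the two path options as required is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(
        description="Generate synthetic sales data from a JSON request."
    )
    # Path to the input request JSON (see Request Requirements)
    parser.add_argument("--json_filepath", required=True)
    # Directory where CSV output is written
    parser.add_argument("--output_directory", required=True)
    # Boolean flag: when present, raw records are saved alongside aggregates
    parser.add_argument("--create_records", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--json_filepath", "request.json", "--output_directory", "out"]
    )
    print(args.create_records)  # False: only aggregate output would be written
```

Because `--create_records` uses `action="store_true"`, it takes no value: passing the flag sets it to `True`, and omitting it leaves it `False`.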
## Request Requirements
The required input format is a JSON with the following fields:
- Required:
  - `start_date`: a date in the first period to include; e.g. if 2020/02/15 is provided, the full week of that date will be included
  - `end_date`: a date in the last period to include; e.g. if 2020/02/15 is provided, the full week of that date will be included
  - `annual_growth_factor`: year-over-year growth factor; 10% growth corresponds to a value of 1.1
  - `period_type`: the type of curve to generate; supports "month" or "week"
  - at least one of:
    - `total_sales`: total number of sales for the period
    - `total_packages`: total number of packages shipped for the period
    - `total_quantity`: total number of items sold for the period
    - `annual_sales`: annualized number of sales for the period
    - `annual_packages`: annualized number of packages shipped for the period
    - `annual_quantity`: annualized number of items sold for the period
  - `curve_definition`: definition of the curve to create, either as a list of dictionaries (one per feature) or as a string naming the default curve to use.
    - If a list of dictionaries is provided, each must adhere to the following structure:
      - Required keys:
        - `anchor_type`: type of annual anchor used to define the feature
          - Possible values: "holiday", "week_of_year", "month_of_year", "day_of_year"
        - `anchor_point`: annual point that defines the feature
          - Possible values: (string) holiday name; (int) week, month, or day of year
        - `anchor_value`: cumulative fraction of total sales (0.0-1.0) completed by the end of the anchor_point's period
      - Optional keys:
        - `relative_start`: number of periods before the anchor_point at which to define a relative cumulative value
        - `start_value`: cumulative fraction of total sales (0.0-1.0) completed by the end of the period indicated by relative_start
        - `relative_end`: number of periods before the anchor_point at which to define a relative cumulative value
        - `end_value`: cumulative fraction of total sales (0.0-1.0) completed by the end of the period indicated by relative_end
    - If a string is provided, it must correspond to a default in `synthetic_sample/defaults/curves/{period_type}/{curve_definition}.json`
      - The initially available curves are:
        - `modern_brand`
        - `modern_distributor`
        - `traditional_brand`
        - `traditional_distributor`
- Optional:
  - `default_type`: string indicating the type of defaults to use; these can be found as JSON in `synthetic_sample/defaults/lib/`
  - `product_distribution`: dictionary of product labels (e.g. SKUs) and their relative weights
  - `week_distribution`: dictionary of weeks of the month (where 1 is the first week and -1 is the last) and their relative weights
  - `weekday_distribution`: dictionary of weekdays (where 0 is Monday and 6 is Sunday) and their relative weights
  - `seasonal_distribution`: dictionary of seasons ("Q1" ... "Q4") and their relative weights
  - `modifiers`: list of modifiers to apply. Supported values:
    - "covid": applies a 33% boost to all periods between 2020/3/26 and 2021/9/1
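The rules above (required fields, the "at least one of" totals, and dates expanding to the full period that contains them) can be checked with a short sketch. `validate_request` and `period_bounds` are hypothetical helpers, not part of the package, and the assumption that weeks run Monday to Sunday is borrowed from the `weekday_distribution` numbering (0 is Monday), not confirmed by the source.

```python
import calendar
from datetime import date, timedelta

# Field names taken from the Request Requirements list
REQUIRED_FIELDS = ("start_date", "end_date", "annual_growth_factor",
                   "period_type", "curve_definition")
TOTAL_FIELDS = ("total_sales", "total_packages", "total_quantity",
                "annual_sales", "annual_packages", "annual_quantity")


def validate_request(request: dict) -> list:
    """Return a list of problems found in a request dict (empty if none)."""
    problems = [f"missing required field: {key}"
                for key in REQUIRED_FIELDS if key not in request]
    if request.get("period_type") not in ("month", "week"):
        problems.append("period_type must be 'month' or 'week'")
    if not any(key in request for key in TOTAL_FIELDS):
        problems.append("at least one total_* or annual_* field is required")
    return problems


def period_bounds(day: date, period_type: str) -> tuple:
    """Expand a date to the first and last day of the period containing it."""
    if period_type == "week":
        # Assumption: weeks start on Monday (date.weekday() == 0)
        start = day - timedelta(days=day.weekday())
        return start, start + timedelta(days=6)
    # Otherwise treat the period as a calendar month
    last_day = calendar.monthrange(day.year, day.month)[1]
    return day.replace(day=1), day.replace(day=last_day)


if __name__ == "__main__":
    # 2020-02-15 is a Saturday, so its week runs 2020-02-10 to 2020-02-16
    print(period_bounds(date(2020, 2, 15), "week"))
    print(period_bounds(date(2020, 2, 15), "month"))  # 2020 is a leap year
```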
### Example:
The request below generates data for each month from 2018-06 through 2020-12.
```JSON
{
  "start_date": "2018-06-01",
  "end_date": "2020-12-31",
  "total_sales": 1000000,
  "total_packages": 1500000,
  "total_quantity": 6000000,
  "annual_growth_factor": 1.15,
  "product_distribution": {
    "AAA-01": 1,
    "AAA-02": 2.5,
    "AAA-11": 5.6,
    "BBB-10": 0.5,
    "BBB-20": 1
  },
  "week_distribution": {
    "1": 0.1,
    "-1": 0.5
  },
  "weekday_distribution": {
    "0": 0.0,
    "1": 0.0,
    "2": 0.0,
    "3": 0.0,
    "4": 0.0,
    "5": 2.0,
    "6": 1.0
  },
  "seasonal_distribution": {
    "Q1": 1,
    "Q2": 1,
    "Q3": 1,
    "Q4": 1
  },
  "period_type": "month",
  "curve_definition": [
    {"anchor_type": "month_of_year", "anchor_point": 1, "anchor_value": 0.0424},
    {"anchor_type": "month_of_year", "anchor_point": 2, "anchor_value": 0.103},
    {"anchor_type": "month_of_year", "anchor_point": 3, "anchor_value": 0.203},
    {"anchor_type": "month_of_year", "anchor_point": 4, "anchor_value": 0.3152},
    {"anchor_type": "month_of_year", "anchor_point": 5, "anchor_value": 0.4139},
    {"anchor_type": "month_of_year", "anchor_point": 6, "anchor_value": 0.4776},
    {"anchor_type": "month_of_year", "anchor_point": 7, "anchor_value": 0.5321},
    {"anchor_type": "month_of_year", "anchor_point": 8, "anchor_value": 0.5897},
    {"anchor_type": "month_of_year", "anchor_point": 9, "anchor_value": 0.6715},
    {"anchor_type": "month_of_year", "anchor_point": 10, "anchor_value": 0.7836},
    {"anchor_type": "month_of_year", "anchor_point": 11, "anchor_value": 0.9018},
    {"anchor_type": "month_of_year", "anchor_point": 12, "anchor_value": 1.0}
  ]
}
```
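Because each `anchor_value` is cumulative, a month's own share of annual sales in this example is the difference from the previous anchor. The `*_distribution` fields hold relative weights, which are presumably normalized before use. The sketch below illustrates both ideas with the example's values; `period_shares` and `normalize` are hypothetical helpers, and it does not model interpolation between sparse anchors, the `relative_*` keys, or how the package actually applies these weights.

```python
# Cumulative anchor_value per month_of_year, copied from the example request
CUMULATIVE = [0.0424, 0.103, 0.203, 0.3152, 0.4139, 0.4776,
              0.5321, 0.5897, 0.6715, 0.7836, 0.9018, 1.0]

# product_distribution from the example request (relative weights)
PRODUCTS = {"AAA-01": 1, "AAA-02": 2.5, "AAA-11": 5.6, "BBB-10": 0.5, "BBB-20": 1}


def period_shares(cumulative):
    """Convert cumulative fractions into each period's own share of the year."""
    shares, previous = [], 0.0
    for value in cumulative:
        shares.append(value - previous)
        previous = value
    return shares


def normalize(weights):
    """Scale relative weights so they sum to 1.0."""
    total = sum(weights.values())
    return {key: weight / total for key, weight in weights.items()}


if __name__ == "__main__":
    shares = period_shares(CUMULATIVE)
    print(round(shares[11], 4))  # December: 1.0 - 0.9018 = 0.0982
    print(normalize(PRODUCTS))   # AAA-11 receives 5.6 / 10.6 of units
```

With these values November has the largest share (0.9018 - 0.7836 = 0.1182). Under the same normalization assumption, the example's `weekday_distribution` would place all sales on weekends, with Saturday receiving two thirds and Sunday one third.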