# Pandas pipeline in graphviz
A Python package that draws an explanatory diagram of a data processing pipeline written in pandas.
It is heavily inspired by [dask's `.visualize` method](https://docs.dask.org/en/latest/graphviz.html), with two additional features:
- column names are shown in the data nodes
- columns created by each task are highlighted
Here is an example from the [examples folder](examples):
![](examples/03_apply_pandas_pipeline_decorator.png)
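The two features can be pictured with a small, self-contained sketch. It is not the package's own rendering code: the helper names `created_columns` and `data_node_dot` are hypothetical, and the node layout (a Graphviz HTML-like table with new columns shaded) is an assumption made for illustration only.

```python
from html import escape


def created_columns(before, after):
    """Return the columns present in `after` but not in `before`, in order."""
    known = set(before)
    return [col for col in after if col not in known]


def data_node_dot(name, columns, created=()):
    """Build a DOT statement for one data node, one table row per column.

    Hypothetical layout: the dataframe name is the header row, and columns
    listed in `created` get a shaded background.
    """
    created = set(created)
    rows = [f"<TR><TD><B>{escape(name)}</B></TD></TR>"]
    for col in columns:
        shade = ' BGCOLOR="lightyellow"' if col in created else ""
        rows.append(f"<TR><TD{shade}>{escape(str(col))}</TD></TR>")
    table = (
        '<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0">'
        + "".join(rows)
        + "</TABLE>"
    )
    return f'"{name}" [shape=plain, label=<{table}>];'


# Column lists before and after a task; with pandas, pass df.columns instead.
before = ["price", "qty"]
after = ["price", "qty", "total"]

new_cols = created_columns(before, after)
node = data_node_dot("orders", after, created=new_cols)
print(new_cols)
print("digraph pipeline { " + node + " }")
```

The printed `digraph` text can be pasted into any Graphviz renderer to see a single data node with the `total` row shaded.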
## Installation
### Pip
Install with pip:
```bash
$ pip install pandas-pipeline-graphviz
```
### Manual installation
Install manually:
- clone the repository with `git clone`
- run `python setup.py install` from the cloned directory
## Usage
### Disclaimer
#### ⚠️ WARNING: it's a hack!
Python offers no reliable way to recover the name of a variable, whether it is passed into a function or receives its result. The methods used in this package are therefore quite _hacky_, as discussed in this [stackoverflow thread](https://stackoverflow.com/questions/2749796/how-to-get-the-original-variable-name-of-variable-passed-to-a-function).
To build the graph, this package makes use of:
- `globals()` **to get the names of input dataframes**: each input dataframe is compared against every variable in the global namespace to find the name it is bound to.
- `inspect.stack()` **to get the name of the output dataframe**: the source line that calls the function is retrieved and parsed to find the variable receiving the result. Only single-output transformations are currently supported.
Both methods should be considered experimental, and the decorator is likely to break if it is not used as shown in the [examples](examples).
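As a rough illustration of the `globals()` lookup described above, the sketch below searches a namespace for names bound to a given object. The function name `find_variable_names` is hypothetical, and comparing by identity (`is`) is an assumption about how the comparison could be done, not a description of the package's actual code. A plain dict stands in for a dataframe so the snippet runs without pandas.

```python
def find_variable_names(obj, namespace):
    """Return the public names in `namespace` bound to exactly this object."""
    return [
        name
        for name, value in namespace.items()
        if value is obj and not name.startswith("_")
    ]


# A dict stands in for a dataframe; identity lookup works for any object.
sales = {"price": [1.0, 2.0], "qty": [3, 4]}
alias = sales                 # a second name bound to the same object
copy_of_sales = dict(sales)   # equal content, but a different object

# An explicit namespace keeps the demo deterministic; globals() could be
# passed instead to search module-level variables.
namespace = {"sales": sales, "alias": alias, "copy_of_sales": copy_of_sales}
print(find_variable_names(sales, namespace))
```

Identity is used here because `==` between two pandas dataframes returns an element-wise result rather than a single boolean. The sketch also shows two limits of the approach: an aliased dataframe yields several candidate names, and a lookup in `globals()` cannot see variables that are local to a function.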
#### Conditions for use
- apply only this decorator to your function; stacking other decorators on top of it breaks the output dataframe name detection done through `inspect.stack()`
- use only single-output transformation functions, i.e. functions that return exactly one dataframe
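Both conditions follow from how call-site parsing works. The sketch below is a minimal, hypothetical decorator (`track_output_name`, with its helper `parse_output_name`) that reads the calling line through `inspect.stack()` and extracts the assigned name with a regular expression. The frame index and the regex are assumptions for illustration, not the package's actual implementation, and a list of dicts stands in for a dataframe.

```python
import functools
import inspect
import re

# Matches a simple "name = ..." assignment and captures the name.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_]\w*)\s*=(?!=)")


def parse_output_name(source_line):
    """Return the assigned variable name from one source line, or None."""
    match = ASSIGNMENT.match(source_line)
    return match.group(1) if match else None


def track_output_name(func):
    """Hypothetical decorator recording the name its result is assigned to."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # stack()[0] is this wrapper; stack()[1] is whoever called it.
        caller = inspect.stack()[1]
        # code_context is None when the source is unavailable (e.g. exec).
        line = caller.code_context[0] if caller.code_context else ""
        wrapper.last_output_name = parse_output_name(line)
        return func(*args, **kwargs)

    wrapper.last_output_name = None
    return wrapper


@track_output_name
def add_total(rows):
    return [dict(row, total=row["price"] * row["qty"]) for row in rows]


orders = [{"price": 2.0, "qty": 3}]
orders_with_total = add_total(orders)
print(add_total.last_output_name)
```

When the script is run from a file, the recorded name is the variable on the left of the call. If a second decorator wraps `add_total`, the immediate caller becomes that decorator's own wrapper, whose calling line is typically something like `return func(*args, **kwargs)`, so no assignment is found. Likewise, a tuple assignment such as `a, b = split(df)` does not match the single-name pattern, which is why only single-output functions can be handled this way.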
### Examples
See the [examples folder](examples) in the repository.