# PandasAI 🐼
[![Release](https://img.shields.io/pypi/v/pandasai?label=Release&style=flat-square)](https://pypi.org/project/pandasai/)
[![CI](https://github.com/gventuri/pandas-ai/actions/workflows/ci.yml/badge.svg)](https://github.com/gventuri/pandas-ai/actions/workflows/ci.yml/badge.svg)
[![CD](https://github.com/gventuri/pandas-ai/actions/workflows/cd.yml/badge.svg)](https://github.com/gventuri/pandas-ai/actions/workflows/cd.yml/badge.svg)
[![Coverage](https://codecov.io/gh/gventuri/pandas-ai/branch/main/graph/badge.svg)](https://codecov.io/gh/gventuri/pandas-ai)
[![Documentation Status](https://readthedocs.org/projects/pandas-ai/badge/?version=latest)](https://pandas-ai.readthedocs.io/en/latest/?badge=latest)
[![Discord](https://dcbadge.vercel.app/api/server/kF7FqH2FwS?style=flat&compact=true)](https://discord.gg/kF7FqH2FwS)
[![Downloads](https://static.pepy.tech/badge/pandasai)](https://pepy.tech/project/pandasai) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Open in Colab](https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/drive/1ZnO-njhL7TBOYPZaqvMvGtsjckZKrv2E?usp=sharing)
PandasAI is a Python library that adds Generative AI capabilities to [pandas](https://github.com/pandas-dev/pandas), the popular data analysis and manipulation tool. It is designed to be used in conjunction with pandas, and is not a replacement for it.
<!-- Add images/pandas-ai.png -->
![PandasAI](images/pandas-ai.png?raw=true)
## 🔧 Quick install
```bash
pip install pandasai
```
## 🔍 Demo
Try out PandasAI in your browser:
[![Open in Colab](https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/drive/1ZnO-njhL7TBOYPZaqvMvGtsjckZKrv2E?usp=sharing)
## 📖 Documentation
The documentation for PandasAI can be found [here](https://pandas-ai.readthedocs.io/en/latest/).
## 💻 Usage
> Disclaimer: GDP data was collected from [this source](https://ourworldindata.org/grapher/gross-domestic-product?tab=table), published by World Development Indicators - World Bank (2022.05.26) and collected at National accounts data - World Bank / OECD. It relates to the year of 2020. Happiness indexes were extracted from [the World Happiness Report](https://ftnnews.com/images/stories/documents/2020/WHR20.pdf). Another useful [link](https://data.world/makeovermonday/2020w19-world-happiness-report-2020).
PandasAI is designed to be used in conjunction with pandas. It makes pandas conversational, allowing you to ask questions of your data in natural language.
### Queries
For example, you can ask PandasAI to find all the rows in a DataFrame where the value of a column is greater than 5, and it will return a DataFrame containing only those rows:
```python
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import OpenAI

# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

# Instantiate an LLM
llm = OpenAI(api_token="YOUR_API_TOKEN")

# Wrap the DataFrame so it can be queried in natural language
df = SmartDataframe(df, config={"llm": llm})
df.chat('Which are the 5 happiest countries?')
```
The above code will return the following:
```
6 Canada
7 Australia
1 United Kingdom
3 Germany
0 United States
Name: country, dtype: object
```
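For comparison, the same question can be answered with plain pandas. The sketch below is only an assumption about the kind of code an LLM might generate for this prompt, not the code PandasAI actually produces. The name `countries` is introduced here for illustration, and the GDP column is left out because this question does not need it.

```python
import pandas as pd

# Same sample data as above, without the GDP column.
countries = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

# Take the 5 rows with the highest happiness index, then keep the country names.
happiest = countries.nlargest(5, "happiness_index")["country"]
print(happiest)
```

`nlargest` returns the rows in descending order of `happiness_index`, which is why the original row labels appear out of sequence in the result.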
Of course, you can also ask PandasAI to perform more complex queries. For example, you can ask PandasAI to find the sum of the GDPs of the 2 unhappiest countries:
```python
df.chat('What is the sum of the GDPs of the 2 unhappiest countries?')
```
The above code will return the following:
```
19012600725504
```
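The figure above can be checked by hand with plain pandas. This is a sketch under the assumption that the question reduces to "take the two lowest `happiness_index` rows and add their `gdp` values"; the names `countries`, `unhappiest`, and `total_gdp` are illustrative and do not come from PandasAI.

```python
import pandas as pd

# Same sample data as in the Queries example.
countries = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832,
            1745433788416, 1181205135360, 1607402389504, 1490967855104,
            4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

# The two rows with the lowest happiness index are China and Japan.
unhappiest = countries.nsmallest(2, "happiness_index")

# Add up their GDPs and convert the NumPy integer to a plain Python int.
total_gdp = int(unhappiest["gdp"].sum())
print(total_gdp)
```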
### Charts
You can also ask PandasAI to draw a graph:
```python
df.chat(
    "Plot the histogram of countries showing for each the gdp, using different colors for each bar",
)
```
![Chart](images/histogram-chart.png?raw=true)
You can save any charts generated by PandasAI by setting the `save_charts` parameter to `True` in the `PandasAI` constructor, for example `PandasAI(llm, save_charts=True)`. Charts are saved in `./pandasai/exports/charts`.
### Multiple DataFrames
You can also pass multiple DataFrames to PandasAI and ask questions that relate them to each other.
```python
import pandas as pd
from pandasai import SmartDatalake
from pandasai.llm import OpenAI

employees_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
}

salaries_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Salary': [5000, 6000, 4500, 7000, 5500]
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)

# No api_token is passed here, so the key must be available from the
# environment (the OPENAI_API_KEY variable)
llm = OpenAI()
dl = SmartDatalake([employees_df, salaries_df], config={"llm": llm})
dl.chat("Who gets paid the most?")
```
The above code will return the following:
```
Oh, Olivia gets paid the most.
```
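Answering this question requires relating the two tables through their shared `EmployeeID` column. A minimal plain-pandas sketch of that join is shown here, assuming an inner merge on `EmployeeID`; it illustrates the underlying operation and is not the code PandasAI generates. The names `merged` and `top_earner` are introduced for illustration.

```python
import pandas as pd

employees_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
})
salaries_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
})

# Join the two tables on their shared key.
merged = employees_df.merge(salaries_df, on="EmployeeID")

# Look up the name in the row that holds the highest salary.
top_earner = merged.loc[merged["Salary"].idxmax(), "Name"]
print(top_earner)
```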
You can find more examples in the [examples](examples) directory.
### ⚡️ Shortcuts
PandasAI also provides a number of shortcuts (beta) that make it easier to ask questions of your data. For example, you can ask PandasAI to `clean_data`, `impute_missing_values`, `generate_features`, `plot_histogram`, and many more.
```python
# Clean data
df.clean_data()
# Impute missing values
df.impute_missing_values()
# Generate features
df.generate_features()
# Plot histogram
df.plot_histogram(column="gdp")
```
Learn more about the shortcuts [here](https://pandas-ai.readthedocs.io/en/latest/shortcuts/).
## 🔒 Privacy & Security
To generate the Python code to run, PandasAI takes the DataFrame head, randomizes it (using random generation for sensitive data and shuffling for non-sensitive data), and sends only that randomized head to the LLM.
To further protect your privacy, you can instantiate PandasAI with `enforce_privacy = True`, which sends only the column names, not the head, to the LLM.
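The idea can be sketched in a few lines of pandas. This is an illustration under stated assumptions, not PandasAI's actual implementation: `anonymize_head` and its `sensitive_columns` parameter are hypothetical, sensitive values are replaced with random placeholder strings, and non-sensitive values are shuffled within their column.

```python
import random

import pandas as pd


def anonymize_head(df, sensitive_columns, n=5, seed=None):
    """Return a scrambled copy of the first n rows (hypothetical helper)."""
    rng = random.Random(seed)
    head = df.head(n).copy()
    for column in head.columns:
        values = head[column].tolist()
        if column in sensitive_columns:
            # Sensitive data: replace every value with a random placeholder.
            head[column] = [f"value_{rng.randrange(10**6):06d}" for _ in values]
        else:
            # Non-sensitive data: keep the real values but shuffle their order,
            # so rows are unlikely to line up with the original records.
            rng.shuffle(values)
            head[column] = values
    return head


people = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "age": [31, 45, 27],
})

# What would be sent in the default mode: a randomized head.
sample = anonymize_head(people, sensitive_columns={"email"}, seed=0)
print(sample)

# What would be sent in the stricter mode: column names only.
column_names = list(people.columns)
print(column_names)
```

The column names and the shape of the sample survive, which is the information a model needs to write code against the DataFrame, while the original values in sensitive columns are never included.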
## 🤝 Contributing
Contributions are welcome! Please check out the todos below, and feel free to open a pull request.
For more information, please see the [contributing guidelines](CONTRIBUTING.md).
After setting up the virtual environment, remember to install the `pre-commit` hooks so that your changes comply with our standards:
```bash
pre-commit install
```
## Contributors
[![Contributors](https://contrib.rocks/image?repo=gventuri/pandas-ai)](https://github.com/gventuri/pandas-ai/graphs/contributors)
## 📜 License
PandasAI is licensed under the MIT License. See the LICENSE file for more details.
## Acknowledgements
- This project is based on the [pandas](https://github.com/pandas-dev/pandas) library by independent contributors, but it's in no way affiliated with the pandas project.
- This project is meant to be used as a tool for data exploration and analysis, and it's not meant to be used for production purposes. Please use it responsibly.