<h1 align="center">
<img src="https://raw.githubusercontent.com/pola-rs/polars-static/master/logos/polars_github_logo_rect_dark_name.svg" alt="Polars logo">
<br>
</h1>
<div align="center">
<a href="https://crates.io/crates/polars">
<img src="https://img.shields.io/crates/v/polars.svg" alt="crates.io Latest Release"/>
</a>
<a href="https://pypi.org/project/polars/">
<img src="https://img.shields.io/pypi/v/polars.svg" alt="PyPi Latest Release"/>
</a>
<a href="https://www.npmjs.com/package/nodejs-polars">
<img src="https://img.shields.io/npm/v/nodejs-polars.svg" alt="NPM Latest Release"/>
</a>
<a href="https://rpolars.r-universe.dev">
<img src="https://rpolars.r-universe.dev/badges/polars" alt="R-universe Latest Release"/>
</a>
<a href="https://doi.org/10.5281/zenodo.7697217">
<img src="https://zenodo.org/badge/DOI/10.5281/zenodo.7697217.svg" alt="DOI Latest Release"/>
</a>
</div>
<p align="center">
<b>Documentation</b>:
<a href="https://docs.pola.rs/py-polars/html/reference/index.html">Python</a>
-
<a href="https://docs.rs/polars/latest/polars/">Rust</a>
-
<a href="https://pola-rs.github.io/nodejs-polars/index.html">Node.js</a>
-
<a href="https://rpolars.github.io/index.html">R</a>
|
<b>StackOverflow</b>:
<a href="https://stackoverflow.com/questions/tagged/python-polars">Python</a>
-
<a href="https://stackoverflow.com/questions/tagged/rust-polars">Rust</a>
-
<a href="https://stackoverflow.com/questions/tagged/nodejs-polars">Node.js</a>
-
<a href="https://stackoverflow.com/questions/tagged/r-polars">R</a>
|
<a href="https://docs.pola.rs/">User guide</a>
|
<a href="https://discord.gg/4UfP5cfBE7">Discord</a>
</p>
## Polars: Blazingly fast DataFrames in Rust, Python, Node.js, R, and SQL
Polars is a DataFrame interface on top of an OLAP Query Engine implemented in Rust using
[Apache Arrow Columnar Format](https://arrow.apache.org/docs/format/Columnar.html) as the memory model.
- Lazy | eager execution
- Multi-threaded
- SIMD
- Query optimization
- Powerful expression API
- Hybrid Streaming (larger-than-RAM datasets)
- Rust | Python | Node.js | R | ...
To learn more, read the [user guide](https://docs.pola.rs/).
## Python
```python
>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "A": [1, 2, 3, 4, 5],
...         "fruits": ["banana", "banana", "apple", "apple", "banana"],
...         "B": [5, 4, 3, 2, 1],
...         "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
...     }
... )
>>> # embarrassingly parallel execution & very expressive query language
>>> df.sort("fruits").select(
...     "fruits",
...     "cars",
...     pl.lit("fruits").alias("literal_string_fruits"),
...     pl.col("B").filter(pl.col("cars") == "beetle").sum(),
...     pl.col("A").filter(pl.col("B") > 2).sum().over("cars").alias("sum_A_by_cars"),
...     pl.col("A").sum().over("fruits").alias("sum_A_by_fruits"),
...     pl.col("A").reverse().over("fruits").alias("rev_A_by_fruits"),
...     pl.col("A").sort_by("B").over("fruits").alias("sort_A_by_B_by_fruits"),
... )
shape: (5, 8)
┌──────────┬──────────┬──────────────┬─────┬─────────────┬─────────────┬─────────────┬─────────────┐
│ fruits   ┆ cars     ┆ literal_stri ┆ B   ┆ sum_A_by_ca ┆ sum_A_by_fr ┆ rev_A_by_fr ┆ sort_A_by_B │
│ ---      ┆ ---      ┆ ng_fruits    ┆ --- ┆ rs          ┆ uits        ┆ uits        ┆ _by_fruits  │
│ str      ┆ str      ┆ ---          ┆ i64 ┆ ---         ┆ ---         ┆ ---         ┆ ---         │
│          ┆          ┆ str          ┆     ┆ i64         ┆ i64         ┆ i64         ┆ i64         │
╞══════════╪══════════╪══════════════╪═════╪═════════════╪═════════════╪═════════════╪═════════════╡
│ "apple"  ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 7           ┆ 4           ┆ 4           │
│ "apple"  ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 7           ┆ 3           ┆ 3           │
│ "banana" ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 8           ┆ 5           ┆ 5           │
│ "banana" ┆ "audi"   ┆ "fruits"     ┆ 11  ┆ 2           ┆ 8           ┆ 2           ┆ 2           │
│ "banana" ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 8           ┆ 1           ┆ 1           │
└──────────┴──────────┴──────────────┴─────┴─────────────┴─────────────┴─────────────┴─────────────┘
```
## SQL
```python
>>> df = pl.scan_ipc("file.arrow")
>>> # create a SQL context, registering the frame as a table
>>> sql = pl.SQLContext(my_table=df)
>>> # create a SQL query to execute
>>> query = """
... SELECT sum(v1) as sum_v1, min(v2) as min_v2 FROM my_table
... WHERE id1 = 'id016'
... LIMIT 10
... """
>>> ## OPTION 1
>>> # run the query, materializing as a DataFrame
>>> sql.execute(query, eager=True)
shape: (1, 2)
┌────────┬────────┐
│ sum_v1 ┆ min_v2 │
│ ---    ┆ ---    │
│ i64    ┆ i64    │
╞════════╪════════╡
│ 298268 ┆ 1      │
└────────┴────────┘
>>> ## OPTION 2
>>> # run the query but don't immediately materialize the result.
>>> # this returns a LazyFrame that you can continue to operate on.
>>> lf = sql.execute(query)
>>> (
...     lf.join(other_table)
...     .group_by("foo")
...     .agg(pl.col("sum_v1").count())
...     .collect()
... )
```
SQL commands can also be run directly from your terminal using the Polars CLI:
```bash
# run an inline SQL query
> polars -c "SELECT sum(v1) as sum_v1, min(v2) as min_v2 FROM read_ipc('file.arrow') WHERE id1 = 'id016' LIMIT 10"
# run interactively
> polars
Polars CLI v0.3.0
Type .help for help.
> SELECT sum(v1) as sum_v1, min(v2) as min_v2 FROM read_ipc('file.arrow') WHERE id1 = 'id016' LIMIT 10;
```
Refer to the [Polars CLI repository](https://github.com/pola-rs/polars-cli) for more information.
## Performance 🚀🚀
### Blazingly fast
Polars is very fast. In fact, it is one of the best performing solutions available. See the results of the [TPC-H benchmarks](https://www.pola.rs/benchmarks.html).
### Lightweight
Polars is also very lightweight. It comes with zero required dependencies, and this shows in the import times:
- polars: 70ms
- numpy: 104ms
- pandas: 520ms
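Import times vary by machine and version, so the figures above are best treated as indicative. One possible way to get comparable numbers on your own system is sketched below; the helper `import_time_seconds` is hypothetical and is not necessarily how the figures above were measured. It times each import in a fresh interpreter and subtracts the interpreter's own start-up time.

```python
import subprocess
import sys
import time


def import_time_seconds(module: str) -> float:
    """Time `import module` in a fresh interpreter so nothing is cached."""
    start = time.perf_counter()
    subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        check=True,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


# Start-up cost of the interpreter itself ("sys" is always already loaded).
baseline = import_time_seconds("sys")

for name in ("polars", "numpy", "pandas"):
    try:
        elapsed_ms = (import_time_seconds(name) - baseline) * 1000
        print(f"{name}: {elapsed_ms:.0f}ms")
    except subprocess.CalledProcessError:
        # The import failed, most likely because the package is not installed.
        print(f"{name}: not installed")
```

Running it several times and taking the median gives steadier numbers than a single run.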
### Handles larger-than-RAM data
If you have data that does not fit into memory, Polars' query engine is able to process your query (or parts of your query) in a streaming fashion.
This drastically reduces memory requirements, so you might be able to process your 250GB dataset on your laptop.
Call `collect(streaming=True)` on a lazy query to run it in streaming mode.
(This might be a little slower, but it is still very fast!)
## Setup
### Python
Install the latest Polars version with:
```sh
pip install polars
```
We also have a conda package (`conda install -c conda-forge polars`), but pip is the preferred way to install Polars.
To install Polars with all optional dependencies:
```sh
pip install 'polars[all]'
```
You can also install a subset of the optional dependencies:
```sh
pip install 'polars[numpy,pandas,pyarrow]'
```
See the [User Guide](https://docs.pola.rs/user-guide/installation/#feature-flags) for more details on optional dependencies.
To see the current Polars version and a