pandasql
========
`pandasql` allows you to query `pandas` DataFrames using SQL syntax. It works
similarly to `sqldf` in R. `pandasql` seeks to provide a more familiar way of
manipulating and cleaning data for people new to Python or `pandas`.
#### Installation
```
$ pip install -U pandasql
```
#### Basics
The main function used in `pandasql` is `sqldf`. `sqldf` accepts two parameters:
- a SQL query string
- a set of session/environment variables (`locals()` or `globals()`)
Specifying `locals()` or `globals()` can get tedious. You can define a short
helper function to fix this.
```
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
```
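To make the role of the second argument concrete, here is a minimal sketch of how a `sqldf`-style helper could find DataFrames in an environment dictionary and query them through an in-memory SQLite database. It is an illustration only, not `pandasql`'s actual implementation: `mini_sqldf`, `report`, and the sample `orders` frame are hypothetical names, and loading every DataFrame found in the environment is a simplifying assumption.

```python
import sqlite3

import pandas as pd


def mini_sqldf(query, env):
    """Run `query` against every DataFrame found in `env` (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, value in env.items():
            if isinstance(value, pd.DataFrame):
                # Each DataFrame becomes a SQLite table named after its variable.
                value.to_sql(name, conn, index=False)
        return pd.read_sql_query(query, conn)
    finally:
        conn.close()


def report():
    # `orders` exists only inside this function, so it is in locals(), not globals().
    orders = pd.DataFrame({"item": ["a", "b", "a"], "qty": [1, 2, 3]})
    return mini_sqldf(
        "SELECT item, SUM(qty) AS total FROM orders GROUP BY item ORDER BY item;",
        locals(),
    )


result = report()
```

In this sketch, passing `globals()` from inside `report` would not expose `orders`, which is why the choice of environment matters when a query runs inside a function.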
#### Querying
`pandasql` uses [SQLite syntax](http://www.sqlite.org/lang.html). Any `pandas`
DataFrames in the environment are detected automatically by `pandasql`, and you
can query them as you would any regular SQL table.
```
$ python
>>> from pandasql import sqldf, load_meat, load_births
>>> pysqldf = lambda q: sqldf(q, globals())
>>> meat = load_meat()
>>> births = load_births()
>>> print(pysqldf("SELECT * FROM meat LIMIT 10;").head())
                  date  beef  veal  pork  lamb_and_mutton  broilers  other_chicken  turkey
0  1944-01-01 00:00:00   751    85  1280               89      None           None    None
1  1944-02-01 00:00:00   713    77  1169               72      None           None    None
2  1944-03-01 00:00:00   741    90  1128               75      None           None    None
3  1944-04-01 00:00:00   650    89   978               66      None           None    None
4  1944-05-01 00:00:00   681   106  1029               78      None           None    None
```
Joins and aggregations are also supported:
```
>>> q = """SELECT
        m.date, m.beef, b.births
     FROM
        meat m
     INNER JOIN
        births b
           ON m.date = b.date;"""
>>> joined = pysqldf(q)
>>> print(joined.head())
                    date    beef  births
403  2012-07-01 00:00:00  2200.8  368450
404  2012-08-01 00:00:00  2367.5  359554
405  2012-09-01 00:00:00  2016.0  361922
406  2012-10-01 00:00:00  2343.7  347625
407  2012-11-01 00:00:00  2206.6  320195
>>> q = """SELECT
        strftime('%Y', date) AS year
        , SUM(beef) AS beef_total
     FROM
        meat
     GROUP BY
        year;"""
>>> print(pysqldf(q).head())
   year  beef_total
0  1944        8801
1  1945        9936
2  1946        9010
3  1947       10096
4  1948        8766
```
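For readers moving between the two styles, the join and aggregation above can also be written directly in `pandas`. The sketch below uses a few rows of made-up data in place of the bundled datasets: the values in `meat` and `births` here are illustrative, not taken from `load_meat()` or `load_births()`, and `date` is assumed to be a datetime column.

```python
import pandas as pd

# Made-up sample rows standing in for the bundled datasets (illustrative values only).
meat = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-01", "2000-02-01", "2001-01-01"]),
    "beef": [10.0, 20.0, 30.0],
})
births = pd.DataFrame({
    "date": pd.to_datetime(["2000-02-01", "2001-01-01"]),
    "births": [100, 200],
})

# Roughly: SELECT m.date, m.beef, b.births
#          FROM meat m INNER JOIN births b ON m.date = b.date
joined = meat.merge(births, on="date", how="inner")[["date", "beef", "births"]]

# Roughly: SELECT strftime('%Y', date) AS year, SUM(beef) AS beef_total
#          FROM meat GROUP BY year
beef_by_year = (
    meat.assign(year=meat["date"].dt.strftime("%Y"))
    .groupby("year", as_index=False)["beef"]
    .sum()
    .rename(columns={"beef": "beef_total"})
)
```

Which form reads better is largely a matter of background; the SQL version may feel more familiar to people coming from databases, while the `pandas` version avoids the round trip through SQLite.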
More information and code samples are available in the [examples](https://github.com/yhat/pandasql/blob/master/examples/demo.py)
folder or on [our blog](http://blog.yhathq.com/posts/pandasql-sql-for-pandas-dataframes.html).