pandasticsearch-0.4.3.tar.gz resource - CSDN Library

Points required: 1 · 4 views · Uploaded 2024-03-11 16:21:09 · 245KB · GZ

127 files in total

py: 116

txt: 5

md: 2

pandasticsearch-0.4.3.tar.gz (127 files)

setup.cfg 38B

MANIFEST.in 111B

README.md 6KB

CHANGELOG.md 770B

PKG-INFO 284B

dtcompat.py 86KB

mock.py 82KB

testpatch.py 55KB

testmock.py 49KB

multiprocess.py 34KB

__init__.py 30KB

test_packaging.py 28KB

testhelpers.py 28KB

util.py 26KB

base.py 25KB

loader.py 25KB

packaging.py 25KB

config.py 25KB

suite.py 22KB

util.py 20KB

version.py 18KB

test_setup.py 18KB

doctests.py 17KB

dataframe.py 17KB

testmagicmethods.py 16KB

manager.py 15KB

test_version.py 13KB

plugintest.py 13KB

case.py 13KB

core.py 13KB

cover.py 11KB

xunit.py 11KB

test_integration.py 11KB

testwith.py 10KB

git.py 10KB

conf.py 10KB

testid.py 10KB

attrib.py 9KB

logcapture.py 9KB

types.py 9KB

selector.py 9KB

builddoc.py 9KB

base.py 8KB

operators.py 8KB

pyversion.py 7KB

test_operators.py 7KB

errorclass.py 7KB

test_queries.py 7KB

inspector.py 7KB

proxy.py 7KB

queries.py 7KB

result.py 7KB

commands.py 6KB

test_dataframe.py 6KB

core.py 6KB

__init__.py 6KB

importer.py 6KB

pluginopts.py 6KB

twistedtools.py 5KB

test_wsgi.py 5KB

testr_command.py 5KB

test_core.py 5KB

prof.py 5KB

testcallable.py 4KB

nontrivial.py 4KB

test_hooks.py 4KB

isolate.py 4KB

files.py 4KB

client.py 3KB

capture.py 3KB

main.py 3KB

collect.py 3KB

test_commands.py 3KB

test_util.py 3KB

test_files.py 3KB

util.py 3KB

commands.py 2KB

conf.py 2KB

options.py 2KB

_setup_hooks.py 2KB

debug.py 2KB

skip.py 2KB

allmodules.py 2KB

failuredetail.py 2KB

deprecated.py 2KB

test_types.py 1KB

wsgi.py 1KB

failure.py 1KB

pbr_json.py 1KB

trivial.py 1KB

backwards.py 1KB

extra_files.py 1KB

__init__.py 1KB

metadata.py 1KB

find_package.py 1KB

base.py 1KB

builtin.py 1021B

__init__.py 985B

testsentinel.py 976B


## Pandasticsearch

[![Build Status](https://travis-ci.org/onesuper/pandasticsearch.svg?branch=master)](https://travis-ci.org/onesuper/pandasticsearch) [![PyPI](https://img.shields.io/pypi/v/pandasticsearch.svg)](https://pypi.python.org/pypi/pandasticsearch)

Pandasticsearch is an Elasticsearch client for data analysis. It provides table-like access to Elasticsearch documents, similar to the Python Pandas library and R DataFrames.

To install:

```
pip install pandasticsearch
# if you intend to export Pandas DataFrame
pip install pandasticsearch[pandas]
```

Elasticsearch excels at real-time indexing, search, and data analysis. Pandasticsearch can convert the analysis results (e.g. multi-level nested aggregations) into [Pandas](http://pandas.pydata.org) DataFrame objects for subsequent data analysis.

Check out the API docs: [http://pandasticsearch.readthedocs.io/en/latest/](http://pandasticsearch.readthedocs.io/en/latest/).

## Usage

### DataFrame API

A `DataFrame` object accesses Elasticsearch data with high-level operations. It is type-safe, easy to use, and Pandas-flavored.

```python
# Create a DataFrame object
from pandasticsearch import DataFrame
df = DataFrame.from_es(url='http://localhost:9200', index='people')

# Print the schema (mapping) of the index
df.print_schema()
# company
# |-- employee
#   |-- name: {'index': 'not_analyzed', 'type': 'string'}
#   |-- age: {'type': 'integer'}
#   |-- gender: {'index': 'not_analyzed', 'type': 'string'}

# Inspect the columns
df.columns
# ['name', 'age', 'gender']

# Denote a column
df.name
# Column('name')
df['age']
# Column('age')

# Projection
df.filter(df.age < 25).select('name', 'age').collect()
# [Row(age=12,name='Alice'), Row(age=11,name='Bob'), Row(age=13,name='Leo')]

# Print the rows to the console
df.filter(df.age < 25).select('name').show(3)
# +------+
# | name |
# +------+
# | Alice|
# | Bob  |
# | Leo  |
# +------+

# Convert to a Pandas object for subsequent analysis
df[df.gender == 'male'].agg(df.age.avg).to_pandas()
#    avg(age)
# 0        12

# Translate the DataFrame to an ES query (dictionary)
df[df.gender == 'male'].agg(df.age.avg).to_dict()
# {'query': {'filtered': {'filter': {'term': {'gender': 'male'}}}}, 'aggregations': {'avg(age)':
#  {'avg': {'field': 'age'}}}, 'size': 0}
```

### Filter

```python
# Filter by a boolean condition
df.filter(df.age < 13).collect()
# [Row(age=12,gender='female',name='Alice'), Row(age=11,gender='male',name='Bob')]

# Filter by a set of boolean conditions
# (parenthesize each condition: `&` binds tighter than `<` and `==` in Python)
df.filter((df.age < 13) & (df.gender == 'male')).collect()
# [Row(age=11,gender='male',name='Bob')]

# Filter by a wildcard (sql `like`)
df.filter(df.name.like('A*')).collect()
# [Row(age=12,gender='female',name='Alice')]

# Filter by a regular expression (sql `rlike`)
df.filter(df.name.rlike('A.l.e')).collect()
# [Row(age=12,gender='female',name='Alice')]

# Filter by a prefixed string pattern
df.filter(df.name.startswith('Al')).collect()
# [Row(age=12,gender='female',name='Alice')]

# Filter by a script
from pandasticsearch.operators import ScriptFilter
df.filter(ScriptFilter('2016 - doc["age"].value > 1995')).collect()
# [Row(age=12,name='Alice'), Row(age=13,name='Leo')]
```

**5.0 compatibility**: By default, pandasticsearch uses the `filtered` query (deprecated in Elasticsearch 2.0 and removed in 5.0). To use pandasticsearch against ES 5.0 and later, pass a `compat` argument to `from_es`:

```python
df = DataFrame.from_es(url='http://localhost:9200', index='people', compat=5)
```

### Aggregation

```python
# Aggregation
df[df.gender == 'male'].agg(df.age.avg).collect()
# [Row(avg(age)=12)]

# Metric alias
df[df.gender == 'male'].agg(df.age.avg.alias('avg_age')).collect()
# [Row(avg_age=12)]

# Groupby only (will give the `doc_count`)
df.groupby('gender').collect()
# [Row(doc_count=1), Row(doc_count=2)]

# Groupby and then aggregate
df.groupby('gender').agg(df.age.max).collect()
# [Row(doc_count=1, max(age)=12), Row(doc_count=2, max(age)=13)]

# Group by a set of ranges
df.groupby(df.age.ranges([10,12,14])).to_pandas()
#                  doc_count
# range(10,12,14)
# 10.0-12.0                2
# 12.0-14.0                1

# Advanced ES aggregation
df.groupby(df.gender).agg(df.age.stats).to_pandas()
df.agg(df.age.extended_stats).to_pandas()
df.agg(df.age.percentiles).to_pandas()
df.groupby(df.date.date_interval('1d')).to_pandas()

# Customized aggregation terms
df.groupby(df.age.terms(size=5, include=[1, 2, 3]))
```

### Sort

```python
# Sort
df.sort(df.age.asc).select('name', 'age').collect()
# [Row(age=11,name='Bob'), Row(age=12,name='Alice'), Row(age=13,name='Leo')]

# Sort by a script
from pandasticsearch.operators import ScriptSorter
df.sort(ScriptSorter('doc["age"].value * 2')).collect()
# [Row(age=11,name='Bob'), Row(age=12,name='Alice'), Row(age=13,name='Leo')]
```

## Use with Another Python Client

Pandasticsearch can also be used with another full-featured Python client:

* [elasticsearch-py](https://github.com/elastic/elasticsearch-py) (Official)
* [Elasticsearch-SQL](https://github.com/NLPchina/elasticsearch-sql)
* [pyelasticsearch](https://github.com/pyelasticsearch/pyelasticsearch)
* [pyes](https://github.com/aparo/pyes)

### Build query

```python
from elasticsearch import Elasticsearch
from pandasticsearch import DataFrame

# Build the query body (assumes `df` is bound to the same "recruit" index)
df = DataFrame.from_es(url='http://localhost:9200', index='recruit')
body = df[df['gender'] == 'male'].agg(df['age'].avg).to_dict()

# Run the query with the official client
es = Elasticsearch('http://localhost:9200')
result_dict = es.search(index="recruit", body=body)
```

### Parse result

```python
from elasticsearch import Elasticsearch
from pandasticsearch import Select

es = Elasticsearch('http://localhost:9200')
result_dict = es.search(index="recruit", body={"query": {"match_all": {}}})

pandas_df = Select.from_dict(result_dict).to_pandas()
```

## Related Articles

* [Spark and Elasticsearch for real-time data analysis](https://spark-summit.org/2015-east/wp-content/uploads/2015/03/SSE15-35-Leau.pdf)

## LICENSE

MIT
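
## Note on the search result format

The "Parse result" example hands the dictionary returned by `es.search` to `Select.from_dict`. To make the shape of that dictionary concrete, here is a minimal sketch in plain Python that turns search hits into row dictionaries. The sample `result_dict` is hand-written, and `hits_to_rows` is a hypothetical helper for illustration only; it is not part of pandasticsearch and is not a description of how `Select` works internally.

```python
# Hand-written stand-in for the dictionary returned by es.search();
# only the keys used below are included.
result_dict = {
    "hits": {
        "total": 2,
        "hits": [
            {"_index": "recruit", "_id": "1", "_source": {"name": "Alice", "age": 12}},
            {"_index": "recruit", "_id": "2", "_source": {"name": "Bob", "age": 11}},
        ],
    }
}


def hits_to_rows(result):
    """Return one plain dict per hit, copied from the hit's `_source`.

    Hypothetical helper for illustration; not a pandasticsearch function.
    """
    return [dict(hit.get("_source", {})) for hit in result["hits"]["hits"]]


rows = hits_to_rows(result_dict)
for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed directly to `pandas.DataFrame(rows)` if a table is wanted without going through `Select`.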
