## Version 2.0.0 note:
Version 2.0.0 introduces a change which may not be compatible with users' current workflows/pipelines.
Prior to 2.0.0 the default when writing a DataFrame to redshift was to write all data types as VARCHAR.
In the current version the redshift data types will be inferred from the DataFrame dtypes.
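If your pipeline depends on the old all-VARCHAR behavior, you can still force it by passing explicit types through the `column_data_types` argument documented below. A minimal sketch, assuming a three-column DataFrame (the table name and type strings are only illustrative):
```python
# Assumes pr.connect_to_redshift and pr.connect_to_s3 have already been called.
# One type string per DataFrame column overrides the dtype inference added in 2.0.0.
pr.pandas_to_redshift(data_frame = data,
                      redshift_table_name = 'my_schema.my_table',
                      column_data_types = ['VARCHAR(256)', 'VARCHAR(256)', 'VARCHAR(256)'])
```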
## pandas_redshift
This package is designed to make it easier to get data from redshift into a pandas DataFrame and vice versa.
The pandas_redshift package only supports Python 3.
## Installation
```bash
pip install pandas-redshift
```
## Example
```python
import pandas_redshift as pr
```
Connect to redshift. If port is not supplied it will be set to the Amazon default of 5439.
As of release 1.1.2 you can exclude the password if you are using a .pgpass file.
```python
pr.connect_to_redshift(dbname = <dbname>,
                       host = <host>,
                       port = <port>,
                       user = <user>,
                       password = <password>)
```
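If you rely on a .pgpass entry instead of supplying the password, a connection call might look like the following minimal sketch (the dbname, host, and user values are placeholders; the port falls back to 5439):
```python
# Assumes a matching entry exists in ~/.pgpass, so password is omitted.
pr.connect_to_redshift(dbname = 'analytics',
                       host = 'examplecluster.abc123.us-east-1.redshift.amazonaws.com',
                       user = 'analyst')
```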
Query redshift and return a pandas DataFrame.
```python
data = pr.redshift_to_pandas('select * from gawronski.nba_shots_log')
data.iloc[0:6, 0:5]
GAME_ID1 MATCHUP LOCATION W FINAL_MARGIN
0 21400899 MAR 4, 2015 - CHA @ BKN A W 24
1 21400899 MAR 4, 2015 - CHA @ BKN A W 24
2 21400899 MAR 4, 2015 - CHA @ BKN A W 24
3 21400899 MAR 4, 2015 - CHA @ BKN A W 24
4 21400899 MAR 4, 2015 - CHA @ BKN A W 24
5 21400899 MAR 4, 2015 - CHA @ BKN A W 24
```
Write a pandas DataFrame to redshift. Requires access to an S3 bucket and having previously run pr.connect_to_redshift.
If the table currently exists **IT WILL BE DROPPED** and then the pandas DataFrame will be put in its place.
If you set append = True the table will be appended to (if it exists).
```python
# Connect to S3
pr.connect_to_s3(aws_access_key_id = <aws_access_key_id>,
                 aws_secret_access_key = <aws_secret_access_key>,
                 bucket = <bucket>,
                 subdirectory = <subdirectory>
                 # As of release 1.1.1 you are able to specify an aws_session_token (if necessary):
                 # aws_session_token = <aws_session_token>
                 )
# Write the DataFrame to S3 and then to redshift
pr.pandas_to_redshift(data_frame = data,
                      redshift_table_name = 'gawronski.nba_shots_log')
```
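To add rows to an existing table instead of replacing it, set append = True (a minimal sketch, reusing the table from the example above):
```python
pr.pandas_to_redshift(data_frame = data,
                      redshift_table_name = 'gawronski.nba_shots_log',
                      append = True)
```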
Other options:
As of v1.1.2 you can specify the region (necessary if the S3 bucket is in a different region than the Redshift cluster).
```python
pr.pandas_to_redshift(data_frame,
                      redshift_table_name,
                      # Defaults:
                      column_data_types = None,  # A list of column data types. As of 2.0.0, if not supplied the data types will be inferred from the DataFrame dtypes
                      index = False,
                      save_local = False,  # If set to True a csv of the data frame will be saved in the current directory
                      delimiter = ',',
                      quotechar = '"',
                      dateformat = 'auto',
                      timeformat = 'auto',
                      region = '',
                      append = False)
```
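For example, if the S3 bucket is in a different region than the cluster, a call might look like this minimal sketch (the region string is only illustrative):
```python
pr.pandas_to_redshift(data_frame = data,
                      redshift_table_name = 'gawronski.nba_shots_log',
                      region = 'us-west-2')
```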
Redshift data types: http://docs.aws.amazon.com/redshift/latest/dg/c_Supported_data_types.html
Finally close the cursor, commit and close the database connection, and remove variables from the environment.
```python
pr.close_up_shop()
```
As this package is largely a layer over psycopg2, a convenience function has been added to execute and commit SQL queries that don't involve your local machine (for example, creating a new table).
```python
pr.exec_commit("""
create table combined_table as
select *
from table1
union
select *
from table2;
""")
```
If you encounter the error:
`psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block`
you can access the psycopg2 internals with the following:
```python
pr.core.connect.commit()
pr.core.connect.rollback()
```
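For example, a minimal sketch of clearing an aborted transaction before retrying (the failing query is only illustrative):
```python
try:
    pr.exec_commit("select * from table_that_does_not_exist;")
except Exception:
    # Roll back so the connection leaves the aborted transaction block
    pr.core.connect.rollback()
```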