pandas_cub-0.0.7.tar.gz resource - CSDN Wenku

Uploaded 2024-03-12 18:21:22, 30KB, GZ, 185 views, 5 points required

11 files in total

py: 4

txt: 3

pkg-info: 2


pandas_cub-0.0.7.tar.gz (11 files)

pandas_cub-0.0.7/
    setup.py 648B
    PKG-INFO 56KB
    tests/
        __init__.py 373B
        test_dataframe.py 30KB
    pandas_cub.egg-info/
        SOURCES.txt 219B
        top_level.txt 11B
        PKG-INFO 56KB
        dependency_links.txt 1B
    pandas_cub/
        __init__.py 32KB
    setup.cfg 38B
    README.md 48KB

# How to use pandas_cub

The README.ipynb notebook will serve as the documentation and usage guide to pandas_cub.

## Installation

`pip install pandas-cub`

## What is pandas_cub?

pandas_cub is a simple data analysis library that emulates the functionality of the pandas library. The library is not meant for serious work. It was built as an assignment for one of Ted Petrou's Python classes. If you would like to complete the assignment on your own, visit [this repository][1]. There are about 40 steps and 100 tests that you must pass in order to rebuild the library. It is a good challenge and teaches you the fundamentals of how to build your own data analysis library.

## pandas_cub functionality

pandas_cub has limited functionality but is still capable of a wide variety of data analysis tasks.

* Subset selection with the brackets
* Arithmetic and comparison operators (+, -, <, !=, etc...)
* Aggregation of columns with most of the common functions (min, max, mean, median, etc...)
* Grouping via pivot tables
* String-only methods for columns containing strings
* Reading in simple comma-separated value files
* Several other methods

## pandas_cub DataFrame

pandas_cub has a single main object, the DataFrame, to hold all of the data. The DataFrame is capable of holding 4 data types: booleans, integers, floats, and strings. All data is stored in NumPy arrays. Unlike pandas DataFrames, pandas_cub DataFrames have no index. Column names must be strings.

### Missing value representation

Boolean and integer columns have no missing value representation. The NumPy NaN is used for float columns and the Python None is used for string columns.

## Code Examples

pandas_cub syntax is very similar to pandas, but it implements far fewer methods. The examples below cover just about all of the API.
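Before moving on to the pandas_cub API itself, the missing value rules described under "Missing value representation" can be illustrated with plain NumPy. This is a minimal sketch, not pandas_cub's internal code: the column names `is_manager` and `bonus` are hypothetical, and storing string columns as `object` arrays is an assumption made so that `None` can sit next to Python strings.

```python
import numpy as np

# One NumPy array per column (hypothetical data, not the employee dataset).
columns = {
    "is_manager": np.array([True, False, True]),               # boolean column
    "salary": np.array([45279, 63166, 66614]),                 # integer column
    "bonus": np.array([1200.5, np.nan, 300.0]),                # float column, NaN marks missing
    "race": np.array(["White", None, "Black"], dtype=object),  # string column, None marks missing
}

# dtype.kind is NumPy's one-letter code for the family of a data type.
kinds = {name: arr.dtype.kind for name, arr in columns.items()}
print(kinds)

# Counting missing values differs by column type.
missing_bonus = int(np.isnan(columns["bonus"]).sum())
missing_race = sum(value is None for value in columns["race"])
print(missing_bonus, missing_race)

# An integer array has no slot for NaN: NumPy raises instead of storing it.
try:
    columns["salary"][0] = np.nan
except ValueError as err:
    print("integer column rejected NaN:", err)
```

The last step shows why integer (and likewise boolean) columns cannot carry a missing value marker: the array's fixed dtype has no bit pattern reserved for it, whereas float arrays have NaN and object arrays can hold `None`.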
[1]: https://github.com/tdpetrou/pandas_cub

### Reading data with `read_csv`

pandas_cub provides a single function, `read_csv`, which takes a single parameter: the location of the file you would like to read in as a DataFrame. This function can only handle simple CSVs, and the delimiter must be a comma.

A sample employee dataset is provided in the data directory. Notice that the visual output of the DataFrame is nearly identical to that of a pandas DataFrame. The `head` method returns the first 5 rows by default.

```python
import pandas_cub as pdc
```

```python
df = pdc.read_csv('data/employee.csv')
df.head()
```

<table>
<thead>
<tr><th></th><th>dept</th><th>race</th><th>gender</th><th>salary</th></tr>
</thead>
<tbody>
<tr><td><strong>0</strong></td><td>Houston Police Department-HPD</td><td>White</td><td>Male</td><td>45279</td></tr>
<tr><td><strong>1</strong></td><td>Houston Fire Department (HFD)</td><td>White</td><td>Male</td><td>63166</td></tr>
<tr><td><strong>2</strong></td><td>Houston Police Department-HPD</td><td>Black</td><td>Male</td><td>66614</td></tr>
<tr><td><strong>3</strong></td><td>Public Works & Engineering-PWE</td><td>Asian</td><td>Male</td><td>71680</td></tr>
<tr><td><strong>4</strong></td><td>Houston Airport System (HAS)</td><td>White</td><td>Male</td><td>42390</td></tr>
</tbody>
</table>

### DataFrame properties

The `shape` property returns a tuple of the number of rows and columns.

```python
df.shape
```

    (1535, 4)

The `len` function returns just the number of rows.

```python
len(df)
```

    1535

The `dtypes` property returns a DataFrame of the column names and their respective data types.
```python
df.dtypes
```

<table>
<thead>
<tr><th></th><th>Column Name</th><th>Data Type</th></tr>
</thead>
<tbody>
<tr><td><strong>0</strong></td><td>dept</td><td>string</td></tr>
<tr><td><strong>1</strong></td><td>race</td><td>string</td></tr>
<tr><td><strong>2</strong></td><td>gender</td><td>string</td></tr>
<tr><td><strong>3</strong></td><td>salary</td><td>int</td></tr>
</tbody>
</table>

The `columns` property returns a list of the column names.

```python
df.columns
```

    ['dept', 'race', 'gender', 'salary']

Rename the columns by assigning a list to the `columns` property.

```python
df.columns = ['department', 'race', 'gender', 'salary']
df.head()
```

<table>
<thead>
<tr><th></th><th>department</th><th>race</th><th>gender</th><th>salary</th></tr>
</thead>
<tbody>
<tr><td><strong>0</strong></td><td>Houston Police Department-HPD</td><td>White</td><td>Male</td><td>45279</td></tr>
<tr><td><strong>1</strong></td><td>Houston Fire Department (HFD)</td><td>White</td><td>Male</td><td>63166</td></tr>
<tr><td><strong>2</strong></td><td>Houston Police Department-HPD</td><td>Black</td><td>Male</td><td>66614</td></tr>
<tr><td><strong>3</strong></td><td>Public Works & Engineering-PWE</td><td>Asian</td><td>Male</td><td>71680</td></tr>
<tr><td><strong>4</strong></td><td>Houston Airport System (HAS)</td><td>White</td><td>Male</td><td>42390</td></tr>
</tbody>
</table>

The `values` property returns a single NumPy array of all the data.

```python
df.values
```

    array([['Houston Police Department-HPD', 'White', 'Male', 45279],
           ['Houston Fire Department (HFD)', 'White', 'Male', 63166],
           ['Houston Police Department-HPD', 'Black', 'Male', 66614],
           ...,
           ['Houston Police Department-HPD', 'White', 'Male', 43443],
           ['Houston Police Department-HPD', 'Asian', 'Male', 55461],
           ['Houston Fire Department (HFD)', 'Hispanic', 'Male', 51194]],
          dtype=object)

### Subset selection

Subset selection is handled with the brackets. To select a single column, place that column name in the brackets.
```python
df['race'].head()
```

<table>
<thead>
<tr><th></th><th>race</th></tr>
</thead>
<tbody>
<tr><td><strong>0</strong></td><td>White</td></tr>
<tr><td><strong>1</strong></td><td>White</td></tr>
<tr><td><strong>2</strong></td><td>Black</td></tr>
<tr><td><strong>3</strong></td><td>Asian</td></tr>
<tr><td><strong>4</strong></td><td>White</td></tr>
</tbody>
</table>

Select multiple columns with a list of strings.

```python
df[['race', 'salary']].head()
```

<table>
<thead>
<tr><th></th><th>race</th><th>salary</th></tr>
</thead>
<tbody>
<tr><td><strong>0</strong></td><td>White</td><td>45279</td></tr>
<tr><td><strong>1</strong></td><td>White</td><td>63166</td></tr>
<tr><td><strong>2</strong></td><td>Black</td><td>66614</td></tr>
<tr><td><strong>3</strong></td><td>Asian</td><td>71680</td></tr>
<tr><td><strong>4</strong></td><td>White</td><td>42390</td></tr>
</tbody>
</table>

To select rows and columns simultaneously, pass the brackets the row selection followed by the column selection, separated by a comma. Here we use integers for rows and strings for columns.

```python
rows = [10, 50, 100]
cols = ['salary', 'race']
df[rows, cols]
```

<table>
<thead>
<tr><th></th><th>salary</th><th>race</th></tr>
</thead>
<tbody>
<tr><td><strong>0</strong></td><td>77076</td><td>Black</td></tr>
<tr><td><strong>1</strong></td><td>81239</td><td>White</td></tr>
<tr><td><strong>2</strong></td><td>81239</td><td>White</td></tr>
</tbody>
</table>

You can use integers for the columns as well.

```python
rows = [10, 50, 100]
cols = [2, 0]
df[rows, cols]
```

<table>
<thead>
<tr><th></th><th>gender</th><th>department</th></tr>
</thead>
<tbody>
<tr><td><strong>0</strong></td><td>Male</td><td>Houston Police Department-HPD</td></tr>
<tr><td><strong>1</strong></td><td>Male</td><td>Houston Police Department-HPD</td></tr>
<tr><td><strong>2</strong></td><td>Male</td><td>Houston Police Department-HPD</td></tr>
</tbody>
</table>

You can also use a single integer instead of a list.
```python
df[99, 3]
```

<table><thead><tr><th></th><th>salary </th></tr></thead><tbody><tr><
