<div align="center">
<img src="https://github.com/pandas-dev/pandas/blob/master/doc/logo/pandas_logo.png"><br>
</div>
-----------------
# pandas: powerful Python data analysis toolkit
<table>
<tr>
<td>Latest Release</td>
<td>
<a href="https://pypi.org/project/pandas/">
<img src="https://img.shields.io/pypi/v/pandas.svg" alt="latest release" />
</a>
</td>
</tr>
<td></td>
<td>
<a href="https://anaconda.org/anaconda/pandas/">
<img src="https://anaconda.org/conda-forge/pandas/badges/version.svg" alt="latest release" />
</a>
</td>
</tr>
<tr>
<td>Package Status</td>
<td>
<a href="https://pypi.org/project/pandas/">
<img src="https://img.shields.io/pypi/status/pandas.svg" alt="status" /></td>
</a>
</tr>
<tr>
<td>License</td>
<td>
<a href="https://github.com/pandas-dev/pandas/blob/master/LICENSE">
<img src="https://img.shields.io/pypi/l/pandas.svg" alt="license" />
</a>
</td>
</tr>
<tr>
<td>Build Status</td>
<td>
<a href="https://travis-ci.org/pandas-dev/pandas">
<img src="https://travis-ci.org/pandas-dev/pandas.svg?branch=master" alt="travis build status" />
</a>
</td>
</tr>
<tr>
<td></td>
<td>
<a href="https://dev.azure.com/pandas-dev/pandas/_build/latest?definitionId=1&branch=master">
<img src="https://dev.azure.com/pandas-dev/pandas/_apis/build/status/pandas-dev.pandas?branch=master" alt="Azure Pipelines build status" />
</a>
</td>
</tr>
<tr>
<td>Coverage</td>
<td>
<a href="https://codecov.io/gh/pandas-dev/pandas">
<img src="https://codecov.io/github/pandas-dev/pandas/coverage.svg?branch=master" alt="coverage" />
</a>
</td>
</tr>
<tr>
<td>Downloads</td>
<td>
<a href="https://pandas.pydata.org">
<img src="https://anaconda.org/conda-forge/pandas/badges/downloads.svg" alt="conda-forge downloads" />
</a>
</td>
</tr>
<tr>
<td>Gitter</td>
<td>
<a href="https://gitter.im/pydata/pandas">
<img src="https://badges.gitter.im/Join%20Chat.svg"
</a>
</td>
</tr>
</table>
## What is it?
**pandas** is a Python package providing fast, flexible, and expressive data
structures designed to make working with "relational" or "labeled" data both
easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, **real world** data analysis in Python. Additionally, it has
the broader goal of becoming **the most powerful and flexible open source data
analysis / manipulation tool available in any language**. It is already well on
its way towards this goal.
## Main Features
Here are just a few of the things that pandas does well:
- Easy handling of [**missing data**][missing-data] (represented as
`NaN`) in floating point as well as non-floating point data
- Size mutability: columns can be [**inserted and
deleted**][insertion-deletion] from DataFrame and higher dimensional
objects
- Automatic and explicit [**data alignment**][alignment]: objects can
be explicitly aligned to a set of labels, or the user can simply
ignore the labels and let `Series`, `DataFrame`, etc. automatically
align the data for you in computations
- Powerful, flexible [**group by**][groupby] functionality to perform
split-apply-combine operations on data sets, for both aggregating
and transforming data
- Make it [**easy to convert**][conversion] ragged,
differently-indexed data in other Python and NumPy data structures
into DataFrame objects
- Intelligent label-based [**slicing**][slicing], [**fancy
indexing**][fancy-indexing], and [**subsetting**][subsetting] of
large data sets
- Intuitive [**merging**][merging] and [**joining**][joining] data
sets
- Flexible [**reshaping**][reshape] and [**pivoting**][pivot-table] of
data sets
- [**Hierarchical**][mi] labeling of axes (possible to have multiple
labels per tick)
- Robust IO tools for loading data from [**flat files**][flat-files]
(CSV and delimited), [**Excel files**][excel], [**databases**][db],
and saving/loading data from the ultrafast [**HDF5 format**][hdfstore]
- [**Time series**][timeseries]-specific functionality: date range
generation and frequency conversion, moving window statistics,
moving window linear regressions, date shifting and lagging, etc.
[missing-data]: https://pandas.pydata.org/pandas-docs/stable/missing_data.html#working-with-missing-data
[insertion-deletion]: https://pandas.pydata.org/pandas-docs/stable/dsintro.html#column-selection-addition-deletion
[alignment]: https://pandas.pydata.org/pandas-docs/stable/dsintro.html?highlight=alignment#intro-to-data-structures
[groupby]: https://pandas.pydata.org/pandas-docs/stable/groupby.html#group-by-split-apply-combine
[conversion]: https://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe
[slicing]: https://pandas.pydata.org/pandas-docs/stable/indexing.html#slicing-ranges
[fancy-indexing]: https://pandas.pydata.org/pandas-docs/stable/indexing.html#advanced-indexing-with-ix
[subsetting]: https://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
[merging]: https://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging
[joining]: https://pandas.pydata.org/pandas-docs/stable/merging.html#joining-on-index
[reshape]: https://pandas.pydata.org/pandas-docs/stable/reshaping.html#reshaping-and-pivot-tables
[pivot-table]: https://pandas.pydata.org/pandas-docs/stable/reshaping.html#pivot-tables-and-cross-tabulations
[mi]: https://pandas.pydata.org/pandas-docs/stable/indexing.html#hierarchical-indexing-multiindex
[flat-files]: https://pandas.pydata.org/pandas-docs/stable/io.html#csv-text-files
[excel]: https://pandas.pydata.org/pandas-docs/stable/io.html#excel-files
[db]: https://pandas.pydata.org/pandas-docs/stable/io.html#sql-queries
[hdfstore]: https://pandas.pydata.org/pandas-docs/stable/io.html#hdf5-pytables
[timeseries]: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-series-date-functionality
## Where to get it
The source code is currently hosted on GitHub at:
https://github.com/pandas-dev/pandas
Binary installers for the latest released version are available at the [Python
package index](https://pypi.org/project/pandas) and on conda.
```sh
# conda
conda install pandas
```
```sh
# or PyPI
pip install pandas
```
## Dependencies
- [NumPy](https://www.numpy.org): 1.12.0 or higher
- [python-dateutil](https://labix.org/python-dateutil): 2.5.0 or higher
- [pytz](https://pythonhosted.org/pytz): 2011k or higher
See the [full installation instructions](https://pandas.pydata.org/pandas-docs/stable/install.html#dependencies)
for recommended and optional dependencies.
## Installation from sources
To install pandas from source you need Cython in addition to the normal
dependencies above. Cython can be installed from pypi:
```sh
pip install cython
```
In the `pandas` directory (same one where you found this file after
cloning the git repo), execute:
```sh
python setup.py install
```
or for installing in [development mode](https://pip.pypa.io/en/latest/reference/pip_install.html#editable-installs):
```sh
python setup.py develop
```
Alternatively, you can use `pip` if you want all the dependencies pulled
in automatically (the `-e` option is for installing it in [development
mode](https://pip.pypa.io/en/latest/reference/pip_install.html#editable-installs)):
```sh
pip install -e .
```
See the full instructions for [installing from source](https://pandas.pydata.org/pandas-docs/stable/install.html#installing-from-source).
## License
[BSD 3](LICENSE)
## Documentation
The official documentation is hosted on PyData.org: https://pandas.pydata.org/pandas-docs/stable
## Background
Work on ``pandas`` started at AQR (a quantitative hedge fund) in 2008 and
has been under active development since then.
## Getting Help
For usage questions, the best place to go to is [StackOverflow](https://stackoverf