Pandas Datacube
======
## About
**pandas-datacube** is a Python package that downloads a
[datacube](https://www.w3.org/2011/gld/wiki/Data_Cube_Vocabulary) from a remote source using
[SPARQL](https://www.w3.org/TR/sparql11-overview/) queries and converts it into a pandas DataFrame.
The module can detect the datasets exposed by an endpoint, along with the dimensions and measures of each dataset,
use the metadata present in the ontology to order the dimensions, and download the data.
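
To give an idea of what dataset detection over SPARQL can look like, here is a minimal sketch that builds a query listing every `qb:DataSet` resource together with its optional `rdfs:comment`. The function name `build_dataset_query` and the exact query text are assumptions made for illustration; the queries actually sent by pandas-datacube may differ.

```python
from typing import Optional


def build_dataset_query(limit: Optional[int] = None) -> str:
    """Return a SPARQL query listing qb:DataSet resources and their comments.

    Hypothetical helper, not part of pandas-datacube.
    """
    query = (
        "PREFIX qb: <http://purl.org/linked-data/cube#>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?dataset ?comment\n"
        "WHERE {\n"
        "  ?dataset a qb:DataSet .\n"
        "  OPTIONAL { ?dataset rdfs:comment ?comment . }\n"
        "}"
    )
    # Append a LIMIT clause only when the caller asks for one
    if limit is not None:
        query += f"\nLIMIT {int(limit)}"
    return query


print(build_dataset_query(limit=5))
```

The resulting string can be pasted into the query form of any SPARQL endpoint that publishes Data Cube datasets.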
This project was carried out during an internship at [LIG](https://www.liglab.fr/) in the
[GETALP](http://lig-getalp.imag.fr/fr/accueil/) team, under the supervision of Mr Sérasset (Gilles.Serasset@imag.fr).
## Installation
You can install pandas-datacube from [PyPI](https://pypi.org/project/pandas-datacube):
```
$ pip install pandas-datacube
```
## How to use
The module is quite simple to use:
- get all datasets available:
```python
from pandasdatacube import get_datasets
import pandas as pd
ENDPOINT: str = "https://statistics.gov.scot/sparql"
datasets: pd.DataFrame = get_datasets(ENDPOINT)
datasets.head()
```
| | dataset | commentaire |
|---:|:--------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------|
| 0 | http://statistics.gov.scot/data/pupil-attainment | Number of pupils who attained a given number of qualifications by level and stage. |
| 1 | http://statistics.gov.scot/data/alcohol-related-discharge | Number and European Age-sex Standardised Rates (EASRs) of general acute inpatient and day case discharges with an alcohol-related diagnosis. |
| 2 | http://statistics.gov.scot/data/business-births-deaths-and-survival-rates | Number and rate (per 10,000 adults) of VAT/PAYE registrations, de-registrations and business survival rates |
| 3 | http://statistics.gov.scot/data/earnings | Mean and median gross weekly earnings (£s) by gender, working pattern and workplace/residence measure. |
| 4 | http://statistics.gov.scot/data/economic-inactivity | Economic inactivity level and rate by gender|
- get and transform features of a dataset
```python
from pandasdatacube import get_features, transform_features
import pandas as pd
ENDPOINT: str = "https://statistics.gov.scot/sparql"
DATASET_NAME: str = "http://statistics.gov.scot/data/earnings"
features: pd.DataFrame = get_features(ENDPOINT, DATASET_NAME)
features.head()
```
| | item | type | property |
|---:|:--------------------------------------------------------------------------|:------------------------------------------------|:----------------------------------------------------------|
| 0 | http://statistics.gov.scot/def/component-specification/earnings/refArea | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://purl.org/linked-data/cube#ComponentSpecification |
| 1 | http://statistics.gov.scot/def/component-specification/earnings/refArea | http://purl.org/linked-data/cube#dimension | http://purl.org/linked-data/sdmx/2009/dimension#refArea |
| 2 | http://statistics.gov.scot/def/component-specification/earnings/refArea | http://purl.org/linked-data/cube#order | 1 |
| 3 | http://statistics.gov.scot/def/component-specification/earnings/refArea | http://purl.org/linked-data/cube#codeList | http://statistics.gov.scot/def/code-list/earnings/refArea |
| 4 | http://statistics.gov.scot/def/component-specification/earnings/refPeriod | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://purl.org/linked-data/cube#ComponentSpecification |
```python
# The result is a pair: (ordered dimensions, measures)
transformed_features: tuple[list[str], list[str]] = transform_features(features)
print(transformed_features)
```
Output:
```python
(['http://purl.org/linked-data/sdmx/2009/dimension#refArea',
  'http://purl.org/linked-data/sdmx/2009/dimension#refPeriod',
  'http://purl.org/linked-data/cube#measureType',
  'http://statistics.gov.scot/def/dimension/gender',
  'http://statistics.gov.scot/def/dimension/workingPattern',
  'http://statistics.gov.scot/def/dimension/populationGroup'],
 ['http://statistics.gov.scot/def/measure-properties/median',
  'http://statistics.gov.scot/def/measure-properties/mean'])
```
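The features table holds one row per (item, type, property) triple, while the transformed result is a list of ordered dimensions plus a list of measures. The sketch below shows one way such a split could be derived from the triples, using `qb:dimension`, `qb:measure` and `qb:order`. It is not the implementation of `transform_features`; the function `split_components` and the sample rows marked as hypothetical are assumptions for illustration only.

```python
QB = "http://purl.org/linked-data/cube#"
SPEC = "http://statistics.gov.scot/def/component-specification/earnings/"


def split_components(triples):
    """Group (item, type, property) rows by item, then return
    (dimensions sorted by qb:order, measures)."""
    specs = {}
    for item, predicate, value in triples:
        specs.setdefault(item, {})[predicate] = value

    dimensions, measures = [], []
    for spec in specs.values():
        if QB + "dimension" in spec:
            # Components without a qb:order value are placed last
            order = int(spec.get(QB + "order", 10**9))
            dimensions.append((order, spec[QB + "dimension"]))
        elif QB + "measure" in spec:
            measures.append(spec[QB + "measure"])

    dimensions.sort(key=lambda pair: pair[0])
    return [uri for _, uri in dimensions], measures


triples = [
    # Hypothetical rows: assumed qb:dimension and qb:order for refPeriod
    (SPEC + "refPeriod", QB + "dimension",
     "http://purl.org/linked-data/sdmx/2009/dimension#refPeriod"),
    (SPEC + "refPeriod", QB + "order", "2"),
    # Rows taken from the features table above
    (SPEC + "refArea", QB + "dimension",
     "http://purl.org/linked-data/sdmx/2009/dimension#refArea"),
    (SPEC + "refArea", QB + "order", "1"),
    # Hypothetical row: assumed component specification for a measure
    (SPEC + "median", QB + "measure",
     "http://statistics.gov.scot/def/measure-properties/median"),
]

dimensions, measures = split_components(triples)
print(dimensions)
print(measures)
```

Sorting on `qb:order` is what lets the dimension list come back in the order declared by the publisher rather than in the order the endpoint happened to return the rows.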
- download a dataset
```python
from pandasdatacube import download_dataset
import pandas as pd
ENDPOINT: str = "https://statistics.gov.scot/sparql"
DATASET_NAME: str = "http://statistics.gov.scot/data/earnings"
DIMENSIONS: list[str] = [
    'http://purl.org/linked-data/sdmx/2009/dimension#refArea',
    'http://purl.org/linked-data/sdmx/2009/dimension#refPeriod',
    'http://purl.org/linked-data/cube#measureType',
    'http://statistics.gov.scot/def/dimension/gender',
    'http://statistics.gov.scot/def/dimension/workingPattern',
    'http://statistics.gov.scot/def/dimension/populationGroup',
]
MEASURES: list[str] = [
    'http://statistics.gov.scot/def/measure-properties/median',
    'http://statistics.gov.scot/def/measure-properties/mean',
]
data: pd.DataFrame = download_dataset(
    endpoint=ENDPOINT,
    dataset_name=DATASET_NAME,
    dimensions=DIMENSIONS,
    measures=MEASURES,
)
data.head().reset_index()
```
| | refArea | refPeriod | measureType | gender | workingPattern | populationGroup | median | mean |
|---:|:--------------------------------------------------------------|:------------------------------------------|:---------------------------------------------------------|:-----------------------------------------------------|:-----------------------------------------------------------------|:------------------------------------------------------------------------|---------:|-------:|
| 0 | http://statistics.gov.scot/id/statistical-geography/S92000003 | http://reference.data.gov.uk/id/year/1997 | http://statistics.gov.scot/def/measure-properties/median | http://statistics.gov.scot/def/concept/gender/male | http://statistics.gov.scot/def/concept/working-pattern/full-time | http://statistics.gov.scot/def/concept/population-group/workplace-based | 340.8 | |
| 1 | http://statistics.gov.scot/id/statistical-geography/S92000003 | http://reference.data.gov.uk/id/year/1997 | http://statistics.gov.scot/def/measure-properties/mean | http://statistics.gov.scot/def/concept/gender/male | http://statistics.gov.scot/def/concept/working-pattern/full-time | http://statistics.gov.scot/def/concept/population-group/workplace-based | | 387.1 |
| 2 | http://statistics.gov.scot/id/statistical-geography/S92000003 | http://reference.data.gov.uk/id/year/1997 | http://statistics.gov.scot/def/measure-properties/median | http://statistics.gov.scot/def/concept/gender/male | http://statistics.gov.scot/def/concept/working-pattern/part-time | http://statistics.gov.scot/def/concept/population-group/workplace-based | 80 | |
| 3 | http://statistics.gov.scot/id/statistical-geography/S92000003 | http://reference.data.gov.uk/id/year/1997 | http://statistics.gov.sc
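
The downloaded frame keeps full URIs as values and, because `measureType` is itself a dimension, stores the median and the mean of one observation group on separate rows. A possible post-processing step, sketched here with plain pandas, shortens the URIs to their last segment and merges those rows. The helper `local_name` and the small `sample` frame are assumptions for illustration; the sample also assumes that the empty cells in the table above are missing values (NaN).

```python
import pandas as pd


def local_name(uri: str) -> str:
    """Return the text after the last '/' or '#' of a URI."""
    return uri.replace("#", "/").rstrip("/").rsplit("/", 1)[-1]


# Hypothetical excerpt: values copied from the first two rows of the table
# above, restricted to a few columns to keep the example short.
sample = pd.DataFrame(
    {
        "refPeriod": ["http://reference.data.gov.uk/id/year/1997"] * 2,
        "gender": ["http://statistics.gov.scot/def/concept/gender/male"] * 2,
        "workingPattern": [
            "http://statistics.gov.scot/def/concept/working-pattern/full-time"
        ] * 2,
        "median": [340.8, None],
        "mean": [None, 387.1],
    }
)

dimension_columns = ["refPeriod", "gender", "workingPattern"]
measure_columns = ["median", "mean"]

# Replace each URI by its last path segment
tidy = sample.copy()
for column in dimension_columns:
    tidy[column] = tidy[column].map(local_name)

# GroupBy.first() keeps the first non-null value of each measure column,
# which merges the "median" row and the "mean" row of the same group.
tidy = tidy.groupby(dimension_columns, as_index=False)[measure_columns].first()
print(tidy)
```

Dropping `measureType` before grouping is deliberate: it is the column that distinguishes the two rows being merged.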