---
permalink: /docs/index.html
---
**The complete documentation is available at https://advestis.github.io/transparentpath/**
# TransparentPath
A class that lets you use a path on a local file system or a GCS file system in (almost) the
same way you would use a pathlib.Path object.
## Requirements
You will need a credentials .json file, whose path you can set in the environment variable GOOGLE_APPLICATION_CREDENTIALS.
If your Python code runs on a Google Cloud instance (VM, pod, etc.), GOOGLE_APPLICATION_CREDENTIALS should
be set by default.
## Installation
You can install this package with pip:

```bash
pip install transparentpath-nightly
```

Or use it in a Dockerfile:

```dockerfile
FROM advestis/transparentpath-nightly
...
```
## Optional packages
The vanilla version allows you to declare paths and work with them. You can use them with the builtin `open` function.
Optionally, you can also install support for several other packages like pandas, Dask, etc. The currently
available optional packages can be installed through the following commands:

```bash
pip install transparentpath-nightly[pandas]
pip install transparentpath-nightly[parquet]
pip install transparentpath-nightly[hdf5]
pip install transparentpath-nightly[json]
pip install transparentpath-nightly[excel]
pip install transparentpath-nightly[dask]
```

You can install all of them at once:

```bash
pip install transparentpath-nightly[all]
```
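For reference, extras like these are usually declared through setuptools' `extras_require`; a minimal sketch of the mechanism (the dependency lists below are illustrative assumptions, not the project's actual ones):

```python
# Sketch of how such optional extras could be declared in a setup.py.
# The per-extra dependency lists are made up for illustration.
extras = {
    "pandas": ["pandas"],
    "parquet": ["pyarrow"],
    "hdf5": ["h5py", "tables"],
    "json": [],
    "excel": ["openpyxl"],
    "dask": ["dask[dataframe]", "distributed"],
}
# The 'all' extra is simply the union of every other extra.
extras["all"] = sorted({dep for deps in extras.values() for dep in deps})
print(extras["all"])
```

This is why `pip install transparentpath-nightly[all]` pulls in the same dependencies as installing each extra separately.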
## Usage
Set TransparentPath to point to GCS:
```python
from transparentpath import TransparentPath as Path
Path.set_global_fs("gcs", bucket="bucket_name")
mypath = Path("foo") / "bar" # Will use GCS
local_path = Path("chien", fs="local") # will NOT use GCS
other_path = mypath / "stuff" # Will use GCS
other_path_2 = local_path / "stuff" # Will NOT use GCS
```
or
```python
from transparentpath import TransparentPath as Path
mypath = Path("foo", fs='gcs', bucket="my_bucket_name") # Will use GCS
local_path = Path("chien", fs="local") # will NOT use GCS
other_local_path = Path("foo2") # will NOT use GCS
```
or
```python
from transparentpath import TransparentPath as Path
mypath = Path("gs://my_bucket_name/foo")  # Will use GCS
other_path = Path("foo2")  # will NOT use GCS
```
No matter whether you are using GCS or your local file system, the following commands are valid:
```python
from transparentpath import TransparentPath as Path
# Path.set_global_fs("gcs", bucket="bucket_name", project="project_name")
# The following lines will also work with the previous line uncommented

# Reading a csv into a pandas DataFrame and saving it as a parquet file
mypath = Path("foo") / "bar.csv"
df = mypath.read(index_col=0, parse_dates=True)
otherpath = mypath.with_suffix(".parquet")
otherpath.write(df)

# Reading and writing a HDF5 file works on GCS and locally:
import numpy as np
mypath = Path("foo") / "bar.hdf5"  # can be .h5 too
with mypath.read() as ifile:
    arr = np.array(ifile["store1"])

# Doing '..' from 'foo/bar.hdf5' will return 'foo'
# Then doing 'foo' + 'babar.hdf5' will return 'foo/babar.hdf5' ('+' and '/' are synonyms)
mypath.cd("..")  # Does not return a path but modifies in place
with (mypath + "babar.hdf5").write(None) as ofile:
    # Note here that we must explicitly give 'None' to the 'write' method in order for it
    # to return the open HDF5 file. We could also give a dict of {arr: "store1"} to directly
    # write the file.
    ofile["store1"] = arr

# Reading a text file. Can also use 'w', 'a', etc. Also works with binaries.
mypath = Path("foo") / "bar.txt"
with open(mypath, "r") as ifile:
    lines = ifile.readlines()

# open is overridden to understand gs://
with open("gs://bucket/file.txt", "r") as ifile:
    lines = ifile.readlines()

mypath.is_file()
mypath.is_dir()  # Specific behavior on GCS. See 'Behavior' below.
files = mypath.parent.glob("*.csv")  # Returns an Iterator[TransparentPath], can be cast to a list
```
As you can see from the previous example, any method that derives a path from a TransparentPath returns a
TransparentPath.
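This mirrors pathlib's semantics, so the same chaining behaviour can be checked with pure pathlib (no GCS involved), where every derived path is again a PurePosixPath:

```python
from pathlib import PurePosixPath

p = PurePosixPath("foo") / "bar.csv"
q = p.with_suffix(".parquet")    # still a PurePosixPath
r = q.parent / "other.csv"       # still a PurePosixPath
print(q)  # foo/bar.parquet
print(r)  # foo/other.csv
```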
### Dask
TransparentPath supports reading and writing Dask dataframes from and to csv, excel, parquet and HDF5, both locally and
remotely. You need to have dask-dataframe and dask-distributed installed, which will be the case if you ran `pip
install transparentpath-nightly[dask]`. Writing Dask dataframes does not require any additional arguments, for
the type is checked before calling the appropriate writing method. Reading, however, requires you to pass
the *use_dask* argument to the `read()` method. If the file to read is HDF5, you will also need to specify
*set_names*, matching the argument *key* of Dask's `read_hdf()` method.

Note that when reading a remote HDF5 file, the file is downloaded to your local tmp directory, then read. If not using
Dask, the file is deleted after being read. But since Dask uses delayed processes, deleting the file might occur before
it is actually read, so the file is kept. It is up to you to empty your /tmp directory if your system does not do it
automatically.
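The remote-HDF5 handling described above boils down to a download-to-temp pattern; a simplified stdlib sketch of the idea (not the library's actual code), where `keep=True` mimics leaving the file around for Dask's delayed reads:

```python
import os
import tempfile

def fetch_to_tmp(data: bytes, keep: bool) -> bytes:
    """Write remote bytes to a local temp file, read them back,
    and delete the temp file only when 'keep' is False."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=".hdf5") as tmp:
        tmp.write(data)          # stands in for the download step
        tmp_path = tmp.name
    with open(tmp_path, "rb") as ifile:
        content = ifile.read()   # stands in for the actual read
    if not keep:                 # non-Dask case: safe to delete right away
        os.remove(tmp_path)
    return content

print(fetch_to_tmp(b"payload", keep=False))
```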
Do not hesitate to read the documentation in **docs/** for more details on each method.
## Behavior
All instances of TransparentPath are absolute, even if created with relative paths.
TransparentPaths are seen as instances of str:
```python
from transparentpath import TransparentPath as Path
path = Path()
isinstance(path, str) # returns True
```
This is required to allow
```python
from transparentpath import TransparentPath as Path
path = Path()
with open(path, "r") as ifile:  # any mode: 'r', 'w', 'a', 'rb', etc.
    ...
```
to work. If you want to check whether path is actually a TransparentPath and nothing else, use
```python
from transparentpath import TransparentPath as Path
path = Path()
type(path) == Path # returns True
```
instead.
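The distinction between `isinstance` and `type` above comes from TransparentPath inheriting from str; the mechanism can be illustrated with a tiny str subclass (`MyPath` is a made-up name, not the real class):

```python
class MyPath(str):
    """Minimal stand-in for a path class that inherits from str."""
    pass

p = MyPath("/tmp/file.txt")
print(isinstance(p, str))   # True: a str subclass passes isinstance checks
print(type(p) is MyPath)    # True: type() identifies the exact class
print(type(p) is str)       # False: so type() can tell the two apart
```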
Note that your script must be able to authenticate to GCS somehow. As mentioned before, you can use a service account
.json file by setting the environment variable
`GOOGLE_APPLICATION_CREDENTIALS=path_to_project_cred.json`
in your .bashrc. You can also do it from within your Python code with
`os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path_to_project_cred.json"`. The last method is:
```python
from transparentpath import TransparentPath as Path
Path.set_global_fs("gcs", bucket="bucket", token="path_to_project_cred.json")
# AND/OR
path = Path("gs://bucket/file", token="path_to_project_cred.json")
```
If your code is running on a VM or pod on GCP, you do not need to provide any credentials.
Since the bucket name is provided in set_global_fs, you **must not** repeat it in your paths unless you also
prefix it with "gs://". You should also never create a path whose first directory has the same name as your current
bucket.
If your directory structure on GCS is the same as the local one up to some root directory, you can do:
```python
from transparentpath import TransparentPath as Path
Path.nas_dir = "/media/SERVEUR" # Example root path that differs between local and GCS architecture
Path.set_global_fs("gcs", bucket="my_bucket")
p = Path("/media/SERVEUR") / "chien" / "chat" # Will be gs://my_bucket/chien/chat
```
If the line *Path.set_global_fs(...)* is not commented out, the resulting path will be *gs://my_bucket/chien/chat*.
If it is commented out, the resulting path will be */media/SERVEUR/chien/chat*.
This allows you to write code that runs identically both locally and on GCS, the only difference being
the *Path.set_global_fs(...)* line.
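The nas_dir translation described above amounts to a prefix swap; a hedged sketch of the idea (the real implementation may differ):

```python
def translate(path: str, nas_dir: str, bucket: str) -> str:
    """If 'path' starts with the local root 'nas_dir', replace that
    root with the GCS bucket prefix; otherwise leave it untouched."""
    if path.startswith(nas_dir):
        return "gs://" + bucket + path[len(nas_dir):]
    return path

print(translate("/media/SERVEUR/chien/chat", "/media/SERVEUR", "my_bucket"))
# gs://my_bucket/chien/chat
```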
Any method or attribute valid for fsspec.implementations.local.LocalFileSystem, gcsfs.GCSFileSystem or pathlib.Path
can be used on a TransparentPath object.
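Forwarding unknown methods to an underlying pathlib or file system object is typically done with `__getattr__` delegation; a minimal sketch of the pattern (not TransparentPath's actual code, `DelegatingPath` is a made-up name):

```python
import pathlib

class DelegatingPath:
    """Wraps a pathlib path and forwards any unknown attribute to it."""

    def __init__(self, path: str):
        self._inner = pathlib.PurePosixPath(path)

    def __getattr__(self, name):
        # Called only when the attribute is not found on DelegatingPath,
        # so pathlib attributes like .suffix or .with_name() "just work".
        return getattr(self._inner, name)

p = DelegatingPath("foo/bar.csv")
print(p.suffix)              # .csv (delegated to PurePosixPath)
print(p.with_name("x.txt"))  # foo/x.txt
```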
## Warnings
### Warnings about GCS behaviour
If you use GCS:
1. Remember that directories are not a thing on GCS.
2. The is_dir() method exists but, on GCS, only makes sense if tested on a part of an existing path,
i.e. not on a leaf.
3. You do not need the parent directories of a file to exist in order to create the file.