[![PyPI version](https://badge.fury.io/py/cloud-files.svg)](https://badge.fury.io/py/cloud-files) [![Test Suite](https://github.com/seung-lab/cloud-files/workflows/Test%20Suite/badge.svg)](https://github.com/seung-lab/cloud-files/actions?query=workflow%3A%22Test+Suite%22)
CloudFiles: Fast access to cloud storage and local FS.
========
```python
from cloudfiles import CloudFiles, dl
results = dl(["gs://bucket/file1", "gs://bucket2/file2", ... ]) # shorthand
cf = CloudFiles('gs://bucket', progress=True) # s3://, https://, and file:// also supported
results = cf.get(['file1', 'file2', 'file3', ..., 'fileN']) # threaded
results = cf.get(paths, parallel=2) # threaded and two processes
file1 = cf['file1']
part = cf['file1', 0:30] # first 30 bytes of file1
cf.put('filename', content)
cf.put_json('filename', content)
cf.puts([{
    'path': 'filename',
    'content': content,
}, ... ]) # automatically threaded
cf.puts(content, parallel=2) # threaded + two processes
cf.puts(content, storage_class="NEARLINE") # apply vendor-specific storage class
cf.put_jsons(...) # same as puts
cf['filename'] = content
for fname in cf.list(prefix='abc123'):
    print(fname)
list(cf) # same as list(cf.list())
cf.delete('filename')
del cf['filename']
cf.delete([ 'filename_1', 'filename_2', ... ]) # threaded
cf.delete(paths, parallel=2) # threaded + two processes
boolean = cf.exists('filename')
results = cf.exists([ 'filename_1', ... ]) # threaded
```
CloudFiles was developed to access files from object storage without ever touching disk. The goal was to reliably and rapidly access a petabyte of image data, broken into tens to hundreds of millions of files, in parallel across thousands of cores. The predecessor of CloudFiles, `CloudVolume.Storage`, the core of which is retained here, has been used to process dozens of images, many of them in the hundreds-of-terabytes range. Storage has reliably read and written tens of billions of files to date.
## Highlights
1. Fast file access with transparent threading and optional multi-processing.
2. Google Cloud Storage, Amazon S3, local filesystems, and arbitrary web servers, making hybrid or multi-cloud access easy.
3. Robust to flaky network connections. Uses exponential random window retries to avoid network collisions on a large cluster. Validates MD5 checksums for GCS and S3.
4. gzip, brotli, and zstd compression (see the example after this list).
5. Supports HTTP Range reads.
6. Supports green threads, which are important for achieving maximum performance on virtualized servers.
7. High efficiency transfers that avoid compression/decompression cycles.
8. High speed gzip decompression using libdeflate (compared with zlib).
9. Bundled CLI tool.
10. Accepts iterator and generator input.
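As a taste of highlights 4 and 5, the sketch below writes an object with a chosen codec and performs a byte-range read. The bucket and filenames are placeholders, and the `compress` keyword is assumed to accept codec names such as `'gzip'`.
```python
from cloudfiles import CloudFiles

cf = CloudFiles('gs://bucket')  # placeholder bucket

# Highlight 4: pick a compression codec at write time (assumed keyword).
cf.put('compressed.bin', b'hello world' * 1000, compress='gzip')

# Highlight 5: HTTP Range read of just the first 11 bytes of an
# uncompressed object, mirroring the cf['file1', 0:30] syntax above.
cf.put('raw.bin', b'hello world')
first_bytes = cf['raw.bin', 0:11]  # b'hello world'
```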
## Installation
```bash
pip install cloud-files
pip install cloud-files[test] # to enable testing with pytest
```
If you run into trouble installing dependencies, make sure you're using at least Python 3.6 and that you have updated pip. On Linux, some dependencies require manylinux2010 or manylinux2014 binaries, which earlier versions of pip do not search for. macOS, Linux, and Windows are supported platforms.
### Credentials
You may wish to install credentials under `~/.cloudvolume/secrets`. CloudFiles is descended from CloudVolume, and for now we'll leave the same configuration structure in place.
You need credentials only for the services you'll use. The local filesystem doesn't need any. Google Storage ([setup instructions here](https://github.com/seung-lab/cloud-volume/wiki/Setting-up-Google-Cloud-Storage)) will attempt to use default account credentials if no service account is provided.
If neither of those two conditions applies, you need a service account credential. `google-secret.json` is a service account credential for Google Storage, `aws-secret.json` is a service account for S3, etc. You can support multiple projects at once by prefixing the credential filename with the bucket you plan to access. `google-secret.json` will be your default service account, but if you also want to access bucket ABC, you can provide `ABC-google-secret.json` and you'll have simultaneous access to your ordinary buckets and ABC. The secondary credentials are looked up by bucket name, not project name.
```bash
mkdir -p ~/.cloudvolume/secrets/
mv aws-secret.json ~/.cloudvolume/secrets/ # needed for Amazon
mv google-secret.json ~/.cloudvolume/secrets/ # needed for Google
mv matrix-secret.json ~/.cloudvolume/secrets/ # needed for Matrix
```
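If you prefer not to install files under `~/.cloudvolume/secrets`, credentials can also be supplied at runtime through the constructor's `secrets` argument (documented below). Here is a minimal sketch that assumes `secrets` accepts a dictionary shaped like the credential file; the path is a placeholder.
```python
import json
from cloudfiles import CloudFiles

# Load (or otherwise construct) the credential dictionary.
with open('/path/to/aws-secret.json') as f:
    aws_secret = json.load(f)

# Hand the credential directly to CloudFiles instead of relying on
# ~/.cloudvolume/secrets.
cf = CloudFiles('s3://bucket', secrets=aws_secret)
```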
#### `aws-secret.json` and `matrix-secret.json`
Create an [IAM user service account](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users.html) that can read, write, and delete objects from at least one bucket.
```json
{
  "AWS_ACCESS_KEY_ID": "$MY_AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY": "$MY_SECRET_ACCESS_TOKEN",
  "AWS_DEFAULT_REGION": "$MY_AWS_REGION"
}
```
`AWS_DEFAULT_REGION` is optional and defaults to `us-east-1`.
#### `google-secret.json`
You can create the `google-secret.json` file [here](https://console.cloud.google.com/iam-admin/serviceaccounts). You don't need to fill in the JSON by hand; the example below shows what the end result should look like. The service account should be able to read, write, and delete objects from at least one bucket.
```json
{
"type": "service_account",
"project_id": "$YOUR_GOOGLE_PROJECT_ID",
"private_key_id": "...",
"private_key": "...",
"client_email": "...",
"client_id": "...",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://accounts.google.com/o/oauth2/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": ""
}
```
## API Documentation
Note that the "Cloud Costs" mentioned below are current as of June 2020 and are subject to change. As of this writing, S3 and Google use identical cost structures for these operations.
### Constructor
```python
# import gevent.monkey # uncomment when using green threads
# gevent.monkey.patch_all(thread=False)
from cloudfiles import CloudFiles
cf = CloudFiles(
    cloudpath, progress=False,
    green=False, secrets=None, num_threads=20,
    use_https=False, endpoint=None, request_payer=None
)
# cloudpath examples:
cf = CloudFiles('gs://bucket/') # google cloud storage
cf = CloudFiles('s3://bucket/') # Amazon S3
cf = CloudFiles('s3://https://s3emulator.com/coolguy/') # alternate s3 endpoint
cf = CloudFiles('file:///home/coolguy/') # local filesystem
cf = CloudFiles('mem:///home/coolguy/') # in memory
cf = CloudFiles('https://website.com/coolguy/') # arbitrary web server
```
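As noted for the `green` parameter below, gevent's monkey patching must run before any other imports. A minimal sketch of a green-threaded configuration (the bucket and filenames are placeholders):
```python
import gevent.monkey
gevent.monkey.patch_all(thread=False)  # must run before other imports

from cloudfiles import CloudFiles

cf = CloudFiles('gs://bucket', green=True, progress=True)
results = cf.get(['file1', 'file2'])  # requests now run on green threads
```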
* cloudpath: The path to the bucket you are accessing. The path is formatted as `$PROTOCOL://BUCKET/PATH`. Files will then be accessed relative to the path. The protocols supported are `gs` (GCS), `s3` (AWS S3), `file` (local FS), `mem` (RAM), and `http`/`https`.
* progress: Whether to display a progress bar when processing multiple items simultaneously.
* green: Use green threads. For this to work properly, you must uncomment the two gevent lines at the top of the constructor example, as in the sketch above.
* secrets: Provide secrets dynamically rather than fetching from the credentials directory `$HOME/.cloudvolume/secrets`.
* num_threads: Number of simultaneous requests to make. Usually 20 per core is pretty close to optimal unless file sizes are extreme.
* use_https: `gs://` and `s3://` require credentials to access their files. However, each has a read-only https endpoint that sometimes requires no credentials. If True, automatically convert `gs://` to `https://storage.googleapis.com/` and `s3://` to `https://s3.amazonaws.com/`.
* endpoint: (s3 only) Provide an endpoint other than the official Amazon servers. This is useful for accessing the various S3 emulators offered by on-premises deployments of object storage.
* request_payer: Specify the account that should be charged for requests towards the bucket, rather than the bucket owner.
* `gs://`: `request_payer` can be any Google Cloud project id. Please refer to the do