PyPI download | camelot-py-0.7.0.tar.gz - CSDN Wenku

Uploaded 2022-01-31 11:24:19 · 35 KB · GZ

30 files in total: 18 `.py`, 5 `.txt`, 2 `PKG-INFO`.

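"PyPI download: camelot-py-0.7.0.tar.gz, a Python library for extracting tables from PDFs"

PyPI (the Python Package Index) is the central repository from which Python developers download and install third-party libraries. This resource is one of those packages, camelot-py 0.7.0, a tool for extracting tabular data from PDF files. It is aimed at projects that need to pull structured information out of large numbers of PDF documents.

camelot-py is an open-source project designed specifically to detect and extract tables from PDF documents. It builds on pdfminer.six for text and layout extraction and on OpenCV for the line detection used by the `lattice` flavor, and it returns every table as a pandas DataFrame.

Version 0.7.0 continues the library's incremental development. The specific changes in this release are listed in the `HISTORY.md` file bundled with the archive.

Installation takes a single command. The README shipped with this release installs the `cv` extra, which adds OpenCV for the default `lattice` flavor, once the Tk and Ghostscript dependencies are in place:

```bash
pip install "camelot-py[cv]"
```

Once installed, camelot can read the tables in a PDF. The following code shows basic usage:

```python
from camelot import read_pdf

# Path to the PDF file; pages is a string of page numbers
tables = read_pdf('example.pdf', pages='1')

# Iterate over the extracted tables
for table in tables:
    # table.data holds the table as a list of rows, each a list of cell strings
    print(table.data)
```

camelot-py accepts several parameters for customization. `pages` selects the pages to process, given as a string such as `'1,3,4'` or `'all'`. `flavor` selects the table detection method: `'lattice'` (the default) for tables drawn with ruling lines, or `'stream'` for tables whose columns are separated only by whitespace. `strip_text` removes the given characters, for example `'\n'`, from the text of each cell.

In practice, camelot-py is useful for data analysis, report automation, and information harvesting. A finance team can use it to extract financial statements automatically, researchers can use it to process experimental data, and it also works well for analyzing data published by governments.

In summary, camelot-py 0.7.0 simplifies extracting tabular data from PDFs, so developers can work with the structured information in a PDF without studying the format's internals. For projects that need to process PDF tables, camelot-py is well worth trying.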
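The parameters described above can be combined in a small wrapper. This is a minimal sketch, not part of Camelot: `build_pages_arg` and `extract_tables` are hypothetical helper names, and the sketch assumes only the `pages`, `flavor`, and `strip_text` keyword arguments of `camelot.read_pdf` mentioned in the text. Camelot is imported lazily, so the page-string helper can be tried without the library installed.

```python
from typing import Iterable, Union

PageSpec = Union[int, str]


def build_pages_arg(pages: Iterable[PageSpec]) -> str:
    """Join page numbers and ranges into the comma-separated string
    that read_pdf's `pages` parameter takes, e.g. '1,3,4-end'."""
    parts = [str(p).strip() for p in pages]
    if not parts:
        raise ValueError("at least one page or range is required")
    return ",".join(parts)


def extract_tables(path: str,
                   pages: Iterable[PageSpec] = (1,),
                   flavor: str = "lattice"):
    """Hypothetical convenience wrapper around camelot.read_pdf."""
    if flavor not in ("lattice", "stream"):
        raise ValueError("flavor must be 'lattice' or 'stream'")
    # Imported here so build_pages_arg works without Camelot installed.
    import camelot
    return camelot.read_pdf(
        path,
        pages=build_pages_arg(pages),
        flavor=flavor,
        strip_text="\n",  # drop newlines inside cells
    )


print(build_pages_arg([1, 3, "4-end"]))  # 1,3,4-end
```

Calling `extract_tables('example.pdf', pages=[1, 2], flavor='stream')` would then request pages 1 and 2 with the whitespace-based parser. Validating `flavor` before the import keeps the failure fast and independent of the installed Camelot version.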


camelot-py-0.7.0.tar.gz (30 files)

- camelot-py-0.7.0/
  - MANIFEST.in (68 B)
  - PKG-INFO (8 KB)
  - HISTORY.md (10 KB)
  - camelot/
    - core.py (23 KB)
    - utils.py (22 KB)
    - cli.py (8 KB)
    - io.py (5 KB)
    - \_\_main\_\_.py (183 B)
    - plotting.py (7 KB)
    - image_processing.py (7 KB)
    - \_\_init\_\_.py (756 B)
    - ext/
      - ghostscript/
        - _gsprint.py (7 KB)
        - \_\_init\_\_.py (3 KB)
      - \_\_init\_\_.py (0 B)
    - handlers.py (6 KB)
    - parsers/
      - stream.py (16 KB)
      - \_\_init\_\_.py (81 B)
      - lattice.py (15 KB)
      - base.py (737 B)
    - \_\_version\_\_.py (721 B)
  - LICENSE (1 KB)
  - camelot_py.egg-info/
    - PKG-INFO (8 KB)
    - requires.txt (361 B)
    - SOURCES.txt (659 B)
    - entry_points.txt (45 B)
    - top_level.txt (8 B)
    - dependency_links.txt (1 B)
  - setup.cfg (209 B)
  - setup.py (2 KB)
  - README.md (6 KB)

<p align="center">
  <img src="https://raw.githubusercontent.com/socialcopsdev/camelot/master/docs/_static/camelot.png" width="200">
</p>

# Camelot: PDF Table Extraction for Humans

[![Build Status](https://travis-ci.org/socialcopsdev/camelot.svg?branch=master)](https://travis-ci.org/socialcopsdev/camelot) [![Documentation Status](https://readthedocs.org/projects/camelot-py/badge/?version=master)](https://camelot-py.readthedocs.io/en/master/)
[![codecov.io](https://codecov.io/github/socialcopsdev/camelot/badge.svg?branch=master&service=github)](https://codecov.io/github/socialcopsdev/camelot?branch=master)
[![image](https://img.shields.io/pypi/v/camelot-py.svg)](https://pypi.org/project/camelot-py/) [![image](https://img.shields.io/pypi/l/camelot-py.svg)](https://pypi.org/project/camelot-py/) [![image](https://img.shields.io/pypi/pyversions/camelot-py.svg)](https://pypi.org/project/camelot-py/) [![Gitter chat](https://badges.gitter.im/camelot-dev/Lobby.png)](https://gitter.im/camelot-dev/Lobby)

**Camelot** is a Python library that makes it easy for *anyone* to extract tables from PDF files!

**Note:** You can also check out [Excalibur](https://github.com/camelot-dev/excalibur), which is a web interface for Camelot!

---

**Here's how you can extract tables from PDF files.** Check out the PDF used in this example [here](https://github.com/socialcopsdev/camelot/blob/master/docs/_static/pdf/foo.pdf).

<pre>
>>> import camelot
>>> tables = camelot.read_pdf('foo.pdf')
>>> tables
&lt;TableList n=1&gt;
>>> tables.export('foo.csv', f='csv', compress=True) # json, excel, html, sqlite
>>> tables[0]
&lt;Table shape=(7, 7)&gt;
>>> tables[0].parsing_report
{
    'accuracy': 99.02,
    'whitespace': 12.24,
    'order': 1,
    'page': 1
}
>>> tables[0].to_csv('foo.csv') # to_json, to_excel, to_html, to_sqlite
>>> tables[0].df # get a pandas DataFrame!
</pre>

| Cycle Name | KI (1/km) | Distance (mi) | Percent Fuel Savings |                 |                 |                |
|------------|-----------|---------------|----------------------|-----------------|-----------------|----------------|
|            |           |               | Improved Speed       | Decreased Accel | Eliminate Stops | Decreased Idle |
| 2012_2     | 3.30      | 1.3           | 5.9%                 | 9.5%            | 29.2%           | 17.4%          |
| 2145_1     | 0.68      | 11.2          | 2.4%                 | 0.1%            | 9.5%            | 2.7%           |
| 4234_1     | 0.59      | 58.7          | 8.5%                 | 1.3%            | 8.5%            | 3.3%           |
| 2032_2     | 0.17      | 57.8          | 21.7%                | 0.3%            | 2.7%            | 1.2%           |
| 4171_1     | 0.07      | 173.9         | 58.1%                | 1.6%            | 2.1%            | 0.5%           |

There's a [command-line interface](https://camelot-py.readthedocs.io/en/master/user/cli.html) too!

**Note:** Camelot only works with text-based PDFs and not scanned documents. (As Tabula [explains](https://github.com/tabulapdf/tabula#why-tabula), "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)

## Why Camelot?

- **You are in control.**: Unlike other libraries and tools which either give a nice output or fail miserably (with no in-between), Camelot gives you the power to tweak table extraction. (This is important since everything in the real world, including PDF table extraction, is fuzzy.)
- *Bad* tables can be discarded based on **metrics** like accuracy and whitespace, without ever having to manually look at each table.
- Each table is a **pandas DataFrame**, which seamlessly integrates into [ETL and data analysis workflows](https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873).
- **Export** to multiple formats, including JSON, Excel, HTML and Sqlite.

See [comparison with other PDF table extraction libraries and tools](https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools).
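Discarding bad tables by metrics might look like the following sketch. It works on plain dictionaries shaped like the `parsing_report` shown above, so it runs without Camelot. The function name `keep_good_tables`, the thresholds of 90 and 30, and the second report are assumptions made for illustration, not values recommended by the project.

```python
from typing import Dict, List


def keep_good_tables(reports: List[Dict[str, float]],
                     min_accuracy: float = 90.0,
                     max_whitespace: float = 30.0) -> List[Dict[str, float]]:
    """Keep reports whose accuracy is high enough and whose
    whitespace percentage is low enough (assumed thresholds)."""
    return [
        r for r in reports
        if r["accuracy"] >= min_accuracy and r["whitespace"] <= max_whitespace
    ]


reports = [
    # Values from the parsing_report in the README example.
    {"accuracy": 99.02, "whitespace": 12.24, "order": 1, "page": 1},
    # Made-up report standing in for a badly detected table.
    {"accuracy": 41.5, "whitespace": 68.0, "order": 2, "page": 1},
]

good = keep_good_tables(reports)
print([(r["page"], r["order"]) for r in good])  # [(1, 1)]
```

With a real `TableList`, the same condition can be applied to each table's `parsing_report` while iterating over `tables`, keeping only the tables that pass.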
## Installation

### Using conda

The easiest way to install Camelot is to install it with [conda](https://conda.io/docs/), which is a package manager and environment management system for the [Anaconda](http://docs.continuum.io/anaconda/) distribution.

<pre>
$ conda install -c conda-forge camelot-py
</pre>

### Using pip

After [installing the dependencies](https://camelot-py.readthedocs.io/en/master/user/install-deps.html) ([tk](https://packages.ubuntu.com/trusty/python-tk) and [ghostscript](https://www.ghostscript.com/)), you can simply use pip to install Camelot:

<pre>
$ pip install camelot-py[cv]
</pre>

### From the source code

After [installing the dependencies](https://camelot-py.readthedocs.io/en/master/user/install.html#using-pip), clone the repo using:

<pre>
$ git clone https://www.github.com/socialcopsdev/camelot
</pre>

and install Camelot using pip:

<pre>
$ cd camelot
$ pip install ".[cv]"
</pre>

## Documentation

Great documentation is available at [http://camelot-py.readthedocs.io/](http://camelot-py.readthedocs.io/).

## Development

The [Contributor's Guide](https://camelot-py.readthedocs.io/en/master/dev/contributing.html) has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.

### Source code

You can check the latest sources with:

<pre>
$ git clone https://www.github.com/socialcopsdev/camelot
</pre>

### Setting up a development environment

You can install the development dependencies easily, using pip:

<pre>
$ pip install camelot-py[dev]
</pre>

### Testing

After installation, you can run tests using:

<pre>
$ python setup.py test
</pre>

## Versioning

Camelot uses [Semantic Versioning](https://semver.org/). For the available versions, see the tags on this repository. For the changelog, you can check out [HISTORY.md](https://github.com/socialcopsdev/camelot/blob/master/HISTORY.md).

## License

This project is licensed under the MIT License, see the [LICENSE](https://github.com/socialcopsdev/camelot/blob/master/LICENSE) file for details.
