PyPI Download | mordl-1.0.7.tar.gz Resource - CSDN Library


Uploaded 2022-01-13 08:31:48 · 165 views · 56 KB · GZ

28 files in total

py: 20

txt: 4

pkg-info: 2

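PyPI Download | mordl-1.0.7.tar.gz: Installing and Using a Python Library

PyPI (the Python Package Index) is the central repository of open-source Python packages. This article covers how to download and install the "mordl" library from PyPI, what the library does, and where it can be applied.

The file "mordl-1.0.7.tar.gz" is the source distribution of version 1.0.7 of mordl as published on PyPI. A tar.gz file is a tar archive compressed with gzip. To unpack it on Unix or Linux, run "tar -zxvf mordl-1.0.7.tar.gz" on the command line; on Windows, a tool such as 7-Zip does the same job. Unpacking produces a directory named "mordl-1.0.7" that contains the library's source code, its README.md, and the packaging metadata. Because the source is included, you can read it to see how the library is implemented.

To install the library, use pip, Python's package manager. Running "pip install ./mordl-1.0.7" from the directory that contains the unpacked folder makes pip build and install the package along with its declared dependencies.

As for the library itself, the README shipped in the archive describes MorDL as a morphological parser: a tool for building a pipeline for POS tagging, lemmatization, morphological feature tagging, and named-entity recognition. The module names in the package (upos_tagger.py, feats_tagger.py, lemma_tagger.py, ne_tagger.py, word_embeddings.py) match that description. It is therefore a natural language processing library, not a general-purpose machine learning toolkit.

For details, start with README.md in the root of the unpacked directory, which contains an overview, installation instructions, and usage examples, and then follow its links to the project's documentation for each tagger.

In practice, mordl is relevant to text-processing tasks such as tagging corpora in CoNLL-U format and extracting named entities. As with any third-party library, check its compatibility with your environment before use and comply with the terms of its open-source license.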
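The unpacking step described above can also be done from Python. The following is a minimal sketch using the standard-library `tarfile` module; the function name `extract_sdist` and the path check are illustrative choices, not part of mordl or pip.

```python
import tarfile
from pathlib import Path


def extract_sdist(archive_path, dest_dir="."):
    """Extract a .tar.gz archive and return its top-level entry names.

    Hypothetical helper for illustration; it is not part of mordl or pip.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    dest_resolved = dest.resolve()

    with tarfile.open(archive_path, "r:gz") as archive:
        members = archive.getmembers()
        # Refuse entries whose path would land outside the destination.
        for member in members:
            target = (dest_resolved / member.name).resolve()
            if target != dest_resolved and dest_resolved not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(dest)

    # Collect the first path component of every entry, e.g. "mordl-1.0.7".
    top_level = set()
    for member in members:
        parts = Path(member.name).parts
        if parts:
            top_level.add(parts[0])
    return sorted(top_level)
```

Calling `extract_sdist("mordl-1.0.7.tar.gz")` in the directory that holds the download should leave a `mordl-1.0.7` folder next to it, which can then be passed to `pip install`.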


mordl-1.0.7.tar.gz (28 files)

mordl-1.0.7/
    setup.cfg 42B
    README.md 7KB
    mordl/
        base_tagger.py 43KB
        base_tagger_model.py 9KB
        feats_tagger.py 32KB
        upos_tagger_model.py 3KB
        feat_tagger_model.py 4KB
        base_tagger_sequence_model.py 8KB
        base_model.py 7KB
        upos_tagger.py 10KB
        _version.py 22B
        word_embeddings.py 40KB
        __init__.py 678B
        deprel_tagger_model.py 4KB
        ne_tagger.py 1KB
        feat_tagger.py 15KB
        lib/
            conll18_ud_eval.py 27KB
            __init__.py 0B
        lemma_tagger.py 25KB
        defaults.py 366B
        deprel_tagger.py 22KB
    mordl.egg-info/
        dependency_links.txt 1B
        PKG-INFO 9KB
        SOURCES.txt 607B
        top_level.txt 6B
        requires.txt 100B
    PKG-INFO 9KB
    setup.py 2KB

<h2 align="center">MorDL: Morphological Parser (POS, lemmata, NER etc.)</h2>

<a name="start"></a>

[![PyPI Version](https://img.shields.io/pypi/v/mordl?color=blue)](https://pypi.org/project/mordl/)
[![Python Version](https://img.shields.io/pypi/pyversions/mordl?color=blue)](https://www.python.org/)
[![License: BSD-3](https://img.shields.io/badge/License-BSD-brightgreen.svg)](https://opensource.org/licenses/BSD-3-Clause)

***MorDL*** is a tool to organize a pipeline for complete morphological sentence parsing (POS-tagging, lemmatization, morphological feature tagging) and Named-entity recognition.

Scores (accuracy) on *SynTagRus*: UPOS: `99.15%`; FEATS: `98.28%` (tokens), `98.86%` (tags); LEMMA: `99.13%`. In all experiments we used `seed=42`. Other `seed` values may help to achieve better results. The models' hyperparameters can also be tuned.

Validation with the [official evaluation script](http://universaldependencies.org/conll18/conll18_ud_eval.py) of the [CoNLL 2018 Shared Task](https://universaldependencies.org/conll18/results.html):

* For inference on the *SynTagRus* test corpus, when the predicted fields were emptied and all other fields were left intact, the scores are the same as outlined above.

* Serial inference with the UPOS - FEATS - LEMMA taggers resulted in the following scores:
    - UPOS: `99.15%`; UFeats: `97.75%`; AllTags: `98.55%`; Lemmas: `98.57%` for the taggers trained on the original *SynTagRus* corpus;
    - UPOS: `99.15%`; UFeats: `97.76%`; AllTags: `98.53%`; Lemmas: `98.58%` for the taggers trained serially on the *SynTagRus* corpus processed by the previous taggers (UPOS tagger for FEATS; UPOS and FEATS taggers for LEMMA).

For completeness, we included that script in our distribution, so you can use it for your model evaluation, too. To simplify its use, we also made a wrapper for it, [`mordl.conll18_ud_eval`](https://github.com/fostroll/mordl/blob/master/doc/README_SUPPLEMENTS.md#conll18).

## Installation

### pip

***MorDL*** supports *Python 3.5* or later.
To install via *pip*, run:

```sh
$ pip install mordl
```

If you currently have a previous version of ***MorDL*** installed, run:

```sh
$ pip install mordl -U
```

### From Source

Alternatively, you can install ***MorDL*** from the source of this *git repository*:

```sh
$ git clone https://github.com/fostroll/mordl.git
$ cd mordl
$ pip install -e .
```

This gives you access to examples that are not included in the *PyPI* package.

## Usage

Our taggers use separate models, so they can be used independently. But to achieve the best results, the FEATS tagger uses UPOS tags during training, and the LEMMA and NER taggers use both UPOS and FEATS tags. Thus, for a fully untagged corpus, the tagging pipeline applies the taggers serially, as shown below (assuming that our goal is NER and we already have trained taggers of all types):

```python
from mordl import UposTagger, FeatsTagger, NeTagger

# Create the taggers and load the previously trained models.
tagger_u, tagger_f, tagger_n = UposTagger(), FeatsTagger(), NeTagger()
tagger_u.load('upos_model')
tagger_f.load('feats_model')
tagger_n.load('misc-ne_model')

# Chain the taggers: UPOS first, then FEATS, then NER.
tagger_n.predict(
    tagger_f.predict(
        tagger_u.predict('untagged.conllu')
    ), save_to='result.conllu'
)
```

Any tagger in our pipeline may be replaced with a better one if you have it.

The weakness of separate taggers is that they take more space. If all models were created with BERT embeddings and you load them into memory simultaneously, they may take up to 9 GB of GPU memory, or even more if you use them as part of a multiprocess server (for example, as part of a *Flask* application). In that case, use the **device** and **dataset_device** params during loading to distribute your models across several GPUs.
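One way to spread the models over several GPUs is to decide the placement first and then pass it to each `.load()` call. The sketch below is a minimal illustration: `assign_devices` and `load_taggers` are hypothetical helpers, the device names are examples, and passing `device` and `dataset_device` as keyword arguments to `.load()` is an assumption based on the paragraph above, so check the ***MorDL*** docs for the exact signature.

```python
from itertools import cycle


def assign_devices(model_names, devices):
    """Map each model name to a device, cycling through the device list."""
    if not devices:
        raise ValueError("at least one device is required")
    return dict(zip(model_names, cycle(devices)))


def load_taggers(placement):
    """Load one tagger per model on its assigned device.

    Assumes mordl is installed and the named models exist on disk.
    The keyword arguments to .load() are an assumption taken from the
    README text, not a verified signature.
    """
    from mordl import UposTagger, FeatsTagger, NeTagger  # imported lazily

    classes = {
        'upos_model': UposTagger,
        'feats_model': FeatsTagger,
        'misc-ne_model': NeTagger,
    }
    taggers = {}
    for name, device in placement.items():
        tagger = classes[name]()
        tagger.load(name, device=device, dataset_device=device)
        taggers[name] = tagger
    return taggers


# Round-robin placement of three models over two (example) GPUs.
placement = assign_devices(
    ['upos_model', 'feats_model', 'misc-ne_model'],
    ['cuda:0', 'cuda:1'],
)
print(placement)
```

With two devices, the first and third models share `cuda:0` and the second goes to `cuda:1`; `load_taggers(placement)` would then perform the actual loading.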
Alternatively, if you just need to tag a corpus once, you may load the models serially:

```python
from mordl import UposTagger, FeatsTagger, NeTagger

# Step 1: UPOS tagging.
tagger = UposTagger()
tagger.load('upos_model')
tagger.predict('untagged.conllu', save_to='result_upos.conllu')
del tagger  # just to be sure the model is released

# Step 2: FEATS tagging on the UPOS-tagged output.
tagger = FeatsTagger()
tagger.load('feats_model')
tagger.predict('result_upos.conllu', save_to='result_feats.conllu')
del tagger

# Step 3: NER tagging on the FEATS-tagged output.
tagger = NeTagger()
tagger.load('misc-ne_model')
tagger.predict('result_feats.conllu', save_to='result.conllu')
del tagger
```

Don't use identical names for the input and output files when you call the `.predict()` methods. Normally this causes no problem, because by default the methods load the whole input file into memory before tagging. But if the input file is large, you may want to use the **split** parameter so that the methods process the file in parts. In that case, the first part of the tagged data is saved before the next part is loaded, so identical names will cause data loss.

The training process is also simple. If you have training corpora and you don't want any experiments, just run:

```python
from mordl import UposTagger

# train_corpus and dev_corpus stand for your training and development corpora.
tagger = UposTagger()
tagger.load_train_corpus(train_corpus)
tagger.load_test_corpus(dev_corpus)

stat = tagger.train('upos_model', device='cuda:0',
                    word_emb_tune_params={})
```

This is the training pipeline for the UPOS tagger; pipelines for the other taggers are identical.

If you want to train the model again without re-training the word embeddings, to possibly achieve better results, set **word_emb_tune_params** to `None`.

For a more complete understanding of ***MorDL*** toolkit usage, refer to the Python notebook with pipeline examples in the `examples` directory of the ***MorDL*** GitHub repository.
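The warning above about identical input and output file names can be enforced with a small guard around `.predict()`. This is a minimal sketch: `predict_to_new_file` is a hypothetical helper, not part of ***MorDL***, and it only assumes that the tagger's `.predict()` accepts an input path and a `save_to` keyword, as in the examples above.

```python
import os


def _canonical(path):
    """Return a normalized absolute path with symlinks resolved."""
    return os.path.normcase(os.path.realpath(path))


def predict_to_new_file(tagger, src_path, dst_path, **predict_kwargs):
    """Run tagger.predict(), refusing to write over the input file.

    Hypothetical wrapper for illustration; extra keyword arguments
    (for example, split) are passed through to .predict() unchanged.
    """
    if _canonical(src_path) == _canonical(dst_path):
        raise ValueError(
            "input and output must be different files: %r" % (src_path,)
        )
    return tagger.predict(src_path, save_to=dst_path, **predict_kwargs)
```

The check compares resolved paths, so `a.conllu` and `./a.conllu` are treated as the same file. It does not detect hard links or two names that differ only by case on a case-insensitive file system on every platform.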
Detailed descriptions are also available in the docs:

* [***MorDL*** Basics](https://github.com/fostroll/mordl/blob/master/doc/README_BASICS.md#start)
* [Part of Speech Tagging](https://github.com/fostroll/mordl/blob/master/doc/README_POS.md#start)
* [Single Feature Tagging](https://github.com/fostroll/mordl/blob/master/doc/README_FEAT.md#start)
* [Multiple Feature Tagging](https://github.com/fostroll/mordl/blob/master/doc/README_FEATS.md#start)
* [Lemmata Prediction](https://github.com/fostroll/mordl/blob/master/doc/README_LEMMA.md#start)
* [Named-entity Recognition](https://github.com/fostroll/mordl/blob/master/doc/README_NER.md#start)
* [Supplements](https://github.com/fostroll/mordl/blob/master/doc/README_SUPPLEMENTS.md#start)

This project was developed with a focus on the Russian language, but the few language-specific nuances we used are unlikely to worsen the quality of processing other languages.

***MorDL*** supports [*CoNLL-U*](https://universaldependencies.org/format.html) (if input/output is a file), or [*Parsed CoNLL-U*](https://github.com/fostroll/corpuscula/blob/master/doc/README_PARSED_CONLLU.md) (if input/output is an object). ***MorDL*** also accepts [***Corpuscula***'s corpora wrappers](https://github.com/fostroll/corpuscula/blob/master/doc/README_CORPORA.md) as input.

## License

***MorDL*** is released under the BSD License. See the [LICENSE](https://github.com/fostroll/mordl/blob/master/LICENSE) file for more details.
