swh-indexer
============

Tools to compute multiple indexes on SWH's raw contents:
- content:
  - mimetype
  - ctags
  - language
  - fossology-license
  - metadata
- revision:
  - metadata

An indexer is in charge of:
- looking up objects
- extracting information from those objects
- storing that information in the swh-indexer db

There are multiple indexers working on different object types:
- content indexer: works with content sha1 hashes
- revision indexer: works with revision sha1 hashes
- origin indexer: works with origin identifiers

Indexing procedure:
- receive a batch of ids
- retrieve the associated data, depending on the object type
- compute an index for each object
- store the result in swh's storage
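
The procedure above can be sketched as a single function. This is only an illustration, not the actual swh-indexer code: `index_batch`, `fetch`, `compute`, and `store` are hypothetical names, and plain Python containers stand in for the object storage and the indexer db.

```python
from typing import Callable, Dict, Iterable, List, Optional


def index_batch(
    ids: Iterable[str],
    fetch: Callable[[str], Optional[bytes]],
    compute: Callable[[bytes], Dict[str, object]],
    store: Callable[[List[Dict[str, object]]], None],
) -> List[Dict[str, object]]:
    """Run one indexing pass over a batch of ids (hypothetical helper)."""
    results: List[Dict[str, object]] = []
    for obj_id in ids:
        data = fetch(obj_id)  # retrieve the associated data
        if data is None:
            continue  # unknown object: nothing to index
        index = compute(data)  # compute some index for that object
        results.append({"id": obj_id, **index})
    store(results)  # store the whole batch in one call
    return results


# Stand-ins for the real storages (assumptions for this example only).
raw_objects = {"a1": b"#!/bin/sh\necho hi\n", "b2": b"print('hi')\n"}
index_db: List[Dict[str, object]] = []

indexed = index_batch(
    ["a1", "b2", "missing"],
    raw_objects.get,
    lambda data: {"length": len(data)},  # toy "index": the content length
    index_db.extend,
)
```

Passing the three steps as callables keeps the loop independent of the object type, which mirrors how the same procedure applies to contents, revisions, and origins.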

Current content indexers:
- mimetype (queue swh_indexer_content_mimetype): detect the encoding and mimetype
- language (queue swh_indexer_content_language): detect the programming language
- ctags (queue swh_indexer_content_ctags): compute tags information
- fossology-license (queue swh_indexer_fossology_license): compute the license
- metadata: translate file into translated_metadata dict
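
As a rough illustration of how a caller might route work to these indexers, the queue names listed above can be kept in a lookup table. The table and the `queue_for` helper are hypothetical and not part of swh-indexer. The metadata indexer is left out because no queue is named for it here.

```python
# Queue names copied from the list above; the mapping itself is an assumption.
CONTENT_INDEXER_QUEUES = {
    "mimetype": "swh_indexer_content_mimetype",
    "language": "swh_indexer_content_language",
    "ctags": "swh_indexer_content_ctags",
    "fossology-license": "swh_indexer_fossology_license",
}


def queue_for(indexer_name: str) -> str:
    """Return the queue name for a content indexer (hypothetical helper)."""
    try:
        return CONTENT_INDEXER_QUEUES[indexer_name]
    except KeyError:
        raise ValueError(f"no queue known for indexer {indexer_name!r}") from None
```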

Current revision indexers:
- metadata: detects files containing metadata, then either retrieves their translated_metadata from the content_metadata table in storage or runs the content indexer to translate the files.
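
The "retrieve from storage or run the content indexer" behaviour can be sketched as a lookup with a fallback. This is a minimal sketch under stated assumptions: a plain dict stands in for the content_metadata table, and `revision_metadata` and `translate` are hypothetical names rather than the library's API.

```python
from typing import Callable, Dict, Iterable


def revision_metadata(
    metadata_file_ids: Iterable[str],
    content_metadata: Dict[str, dict],
    translate: Callable[[str], dict],
) -> Dict[str, dict]:
    """Collect translated_metadata for the metadata files of one revision."""
    translated: Dict[str, dict] = {}
    for file_id in metadata_file_ids:
        if file_id not in content_metadata:
            # Not indexed yet: run the content indexer and keep the result.
            content_metadata[file_id] = translate(file_id)
        translated[file_id] = content_metadata[file_id]
    return translated


calls = []


def fake_translate(file_id: str) -> dict:
    """Toy content indexer that records which files it was asked to translate."""
    calls.append(file_id)
    return {"name": f"package-from-{file_id}"}


table = {"f1": {"name": "already-indexed"}}
result = revision_metadata(["f1", "f2"], table, fake_translate)
```

Only `f2` triggers a translation in this example, since `f1` already has an entry in the stand-in table.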

Resource category: Python library. Language: Python. Full resource name: swh.indexer-0.0.151.tar.gz. Source: official. Installation instructions: https://lanzao.blog.csdn.net/article/details/101784059