# SignalP 6.0
Signal peptide prediction model based on a [BERT protein language model encoder](https://github.com/agemagician/ProtTrans) and a conditional random field (CRF) decoder.
This is the development codebase. If you are looking for the prediction service, go to https://services.healthtech.dtu.dk/service.php?SignalP-6.0.
Installation instructions for the SignalP 6.0 prediction tool can be found [here](https://github.com/fteufel/signalp-6.0/blob/main/installation_instructions.md).
Install in editable mode with `pip install -e ./` to experiment.
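
To give a feel for what a CRF decoder does at prediction time, here is a minimal sketch of Viterbi decoding, the standard way to find the best label path in a linear-chain CRF. It is written in plain Python and is not the code in `src/signalp6/models`. The two labels, all scores, and the `viterbi_decode` helper are illustrative assumptions. Forbidding a transition with a very negative score is one common way to constrain a CRF; whether `--constrain_crf` works this way is not stated here.

```python
NEG = -1e9  # stand-in for "forbidden" (assumption: constraints as large negative scores)


def viterbi_decode(emissions, transitions, start):
    """Return the highest-scoring label path.

    emissions[t][j]   score of label j at position t
    transitions[i][j] score of moving from label i to label j
    start[j]          score of starting in label j
    """
    n_labels = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_labels)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_labels):
            # Best previous label for ending in label j at position t.
            best_prev = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_prev] + transitions[best_prev][j] + emissions[t][j])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(range(n_labels), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy setup: label 0 = signal peptide, label 1 = mature protein (hypothetical labels).
# Once the mature protein starts, returning to the signal peptide is forbidden.
transitions = [[0.0, 0.0], [NEG, 0.0]]
start = [0.0, 0.0]
emissions = [[2.0, 0.0], [1.0, 0.5], [0.0, 2.0], [1.5, 1.0]]

print(viterbi_decode(emissions, transitions, start))  # [0, 0, 1, 1]
```

Taking the best label at each position independently would give `[0, 0, 1, 0]` here, which switches back to the signal peptide after it has ended. The forbidden transition rules that path out.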
## Data
The training dataset as well as the full dataset before homology partitioning are in `data`. The directory additionally contains the extended vocabulary of the ProtTrans `BertTokenizer` used.
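
As a quick sanity check on the files in `data`, the following sketch counts the records in a FASTA file by counting header lines (lines starting with `>`). It makes no assumption about what follows each header. The `count_records` helper is hypothetical and not part of the package.

```python
from pathlib import Path


def count_records(lines):
    """Count FASTA records in an iterable of lines by counting '>' headers."""
    return sum(1 for line in lines if line.startswith(">"))


if __name__ == "__main__":
    path = Path("data/train_set.fasta")
    # Only read the file if the script is run from the repository root.
    if path.exists():
        with path.open() as handle:
            print(f"{path}: {count_records(handle)} records")
```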
## Training
You can find the training script in `scripts/train_model.py`. The PyTorch model is in `src/signalp6/models`. Please refer to the training script source for the meaning of all parameters.
A basic training command looks like this:
```
python3 scripts/train_model.py \
    --data data/train_set.fasta \
    --test_partition 0 \
    --validation_partition 1 \
    --output_dir testruns \
    --experiment_name testrun1 \
    --remove_top_layers 1 \
    --kingdom_as_token \
    --sp_region_labels \
    --region_regularization_alpha 0.5 \
    --constrain_crf \
    --average_per_kingdom
```
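
To train on several test/validation splits, the command above can be generated programmatically. The sketch below only builds and prints the commands, using the same flags as the example. The helper names, the experiment naming scheme, and `N_PARTITIONS = 3` are assumptions for illustration; check the dataset for the actual number of partitions.

```python
import shlex

N_PARTITIONS = 3  # assumption: placeholder value, not taken from the dataset


def build_train_command(test_partition, validation_partition, experiment_name,
                        output_dir="testruns"):
    """Return the training command from the README as an argument list."""
    return [
        "python3", "scripts/train_model.py",
        "--data", "data/train_set.fasta",
        "--test_partition", str(test_partition),
        "--validation_partition", str(validation_partition),
        "--output_dir", output_dir,
        "--experiment_name", experiment_name,
        "--remove_top_layers", "1",
        "--kingdom_as_token",
        "--sp_region_labels",
        "--region_regularization_alpha", "0.5",
        "--constrain_crf",
        "--average_per_kingdom",
    ]


def partition_pairs(n_partitions):
    """Yield every (test, validation) pair with distinct partitions."""
    for test in range(n_partitions):
        for validation in range(n_partitions):
            if validation != test:
                yield test, validation


for test, validation in partition_pairs(N_PARTITIONS):
    # Hypothetical naming scheme for the experiment directories.
    name = f"test_{test}_val_{validation}"
    print(shlex.join(build_train_command(test, validation, name)))
```

Each printed line can be pasted into a shell, or the argument list can be passed to `subprocess.run` to launch the run directly.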
## Other modules in the package
- `training_utils` contains components used to fit the model, e.g. data loading and regularization.
- `utils` contains other utilities, such as functions to calculate metrics or region statistics.