# cidc-schemas
| Branch | Status |
| ----------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| [master](https://cimac-cidc.github.io/cidc-schemas/) | [![Build Status](https://travis-ci.org/CIMAC-CIDC/cidc-schemas.svg?branch=master)](https://travis-ci.org/CIMAC-CIDC/cidc-schemas) |
This repository contains formal definitions of the CIDC metadata model using [json-schema](https://json-schema.org/) syntax and vocabulary.
### View documentation at https://cimac-cidc.github.io/cidc-schemas/
## Installation
To install the latest stable version from the `release` branch, run:
```bash
pip install git+https://github.com/cimac-cidc/cidc-schemas@release
```
To install the latest development version from the `master` branch, run:
```bash
pip install git+https://github.com/cimac-cidc/cidc-schemas
```
## Development
### Project Structure
- **`cidc_schemas/`** - a Python module for generating, validating, and reading manifest and assay templates.
- **`schemas/`** - JSON specifications defining the CIDC metadata model.
  - `templates/` - schemas for generating and validating manifest and assay templates.
  - `assays/` - schemas for defining assay data models.
  - `artifacts/` - schemas for defining artifacts.
- **`docs/`** - the most recent build of the data model documentation, along with templates and scripts for re-generating the documentation.
- **`template_examples/`** - example populated Excel files for the template specifications in `schemas/templates`, along with `.csv` files auto-generated from those `.xlsx` files so that changes to them can be tracked transparently.
- **`tests/`** - tests for the `cidc_schemas` module.
- **`.githooks/`** - git hooks, e.g. for auto-generating the `.csv` files in `template_examples/`.
### Setting up git hooks
This repository contains git hooks in the `.githooks` folder. After cloning the repository,
it's recommended to enable those hooks with:
```bash
git config core.hooksPath .githooks
```
### Running tests
This repository has unit tests in the `tests` folder. After installing dependencies,
run the tests with:
```bash
py.test --cache-clear tests
```
### Building documentation
To build the documentation, run the following commands:
```bash
python setup.py install # install helpers from the cidc_schemas library
python docs/generate_docs.py
```
This will output the generated HTML documents in `docs/docs`. Once the updated docs are pushed and merged
into master, they will be viewable at https://cimac-cidc.github.io/cidc-schemas/.
## Using the Command-Line Interface
This project comes with a command-line interface for validating schemas and generating/validating assay and manifest templates.
### Install the CLI
Clone the repository and `cd` into it:
```bash
git clone git@github.com:CIMAC-CIDC/cidc-schemas.git
cd cidc-schemas
```
Install the `cidc_schemas` package (this adds the `cidc_schemas` CLI to your console)
```bash
python setup.py install
```
Run `cidc_schemas --help` to see available options.
If you're making changes to the module and want those changes to be reflected in the CLI without reinstalling the `cidc_schemas` module every time, run
```bash
python3 -m cidc_schemas.cli [args]
```
### Generate templates
Create a template for a given template configuration.
```bash
cidc_schemas generate_template -m templates/manifests/pbmc_template.json -o pbmc.xlsx
```
### Validate filled-out templates
Check that a populated template file is valid with respect to a template specification.
```bash
cidc_schemas validate_template -m templates/manifests/pbmc_template.json -x template_examples/pbmc_template.xlsx
```
### Validate JSON schemas
Check that a JSON schema conforms to the JSON Schema specifications.
```bash
cidc_schemas validate_schema -f shipping_core.json
```
### Convert between yaml and json
The CLI includes a small utility for converting between YAML and JSON files.
```bash
cidc_schemas convert --to_json <some_yaml_file>
```
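When the commands above need to run from a script (for example, to validate several templates in a row), they can be wrapped in Python. The following is a minimal sketch, not part of the `cidc_schemas` package: the helper names `generate_template_cmd`, `validate_template_cmd`, and `run` are hypothetical, and only the subcommands and flags shown in the examples above are used. Whether the CLI reports failures through its exit status is an assumption.

```python
import shlex
import subprocess


def generate_template_cmd(schema_path, output_path):
    """Build the argument list for `cidc_schemas generate_template`."""
    return ["cidc_schemas", "generate_template", "-m", schema_path, "-o", output_path]


def validate_template_cmd(schema_path, xlsx_path):
    """Build the argument list for `cidc_schemas validate_template`."""
    return ["cidc_schemas", "validate_template", "-m", schema_path, "-x", xlsx_path]


def run(cmd):
    """Run a command; raises CalledProcessError on a non-zero exit status.

    Assumption: the CLI signals failure through its exit status.
    """
    return subprocess.run(cmd, check=True)


cmd = validate_template_cmd(
    "templates/manifests/pbmc_template.json",
    "template_examples/pbmc_template.xlsx",
)
print(shlex.join(cmd))  # the equivalent shell command line
# run(cmd)  # uncomment once the cidc_schemas CLI is installed
```

Building the command as a list rather than a single string avoids shell quoting problems if a path contains spaces.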
## Creating New Excel Templates
### Workflow
A new manifest or assay template has been "added" to the repository once these three things are true:
- A file `schemas/templates/<TEMPLATE TYPE>/<TEMPLATE NAME>.json` exists specifying the template schema.
- A file `template_examples/<TEMPLATE NAME>.xlsx` exists containing a populated example Excel template corresponding to the template schema.
- Running `pytest tests/test_templates.py` generates no errors related to this template.
Here's the recommended workflow for achieving those three things:
1. Create a new git branch and switch to it (naming your branch something like `template-dev-<TEMPLATE NAME>`):
```bash
git checkout -b <YOUR BRANCH NAME>
```
2. On this branch, create a new template schema called `<TEMPLATE NAME>.json` in the `schemas/templates/<template-type>` directory. See the template schema structure section below for guidance.
3. Use the CLI to generate an empty Excel template from your schema, and visually verify that the generated template accords with your expectations. Iteratively edit the schema and regenerate the Excel template until you are satisfied.
4. Fill out the generated Excel template with some valid sample values, and place that file in `template_examples` with the name `<TEMPLATE NAME>.xlsx`.
   - **Note**: by this point, you should have created two files:
     1. `schemas/templates/<TEMPLATE TYPE>/<TEMPLATE NAME>.json`
     2. `template_examples/<TEMPLATE NAME>.xlsx`
5. Ensure that `pytest tests/test_templates.py` raises no errors related to this template.
6. Commit and push your changes.
```bash
# Add the two files you've created
git add schemas/templates/<TEMPLATE TYPE>/<TEMPLATE NAME>.json template_examples/<TEMPLATE NAME>.xlsx
git commit -m "Added template for <TEMPLATE NAME>"
git push -u origin <YOUR BRANCH NAME>
```
7. Navigate to GitHub and create a pull request. Get feedback on your template.
8. Once your pull request is approved, merge your changes into master. All done!
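
Before committing, it can help to confirm that both files from the workflow above are where the repository expects them. The sketch below is a hypothetical helper (`missing_template_files` is not part of `cidc_schemas`) that only checks the two paths named in the workflow; it does not replace running `pytest tests/test_templates.py`. The template name in the usage lines is made up.

```python
from pathlib import Path


def missing_template_files(repo_root, template_type, template_name):
    """Return the expected template files that do not exist yet."""
    root = Path(repo_root)
    expected = [
        root / "schemas" / "templates" / template_type / f"{template_name}.json",
        root / "template_examples" / f"{template_name}.xlsx",
    ]
    return [path for path in expected if not path.is_file()]


# Hypothetical usage from the repository root, with a made-up template name.
for path in missing_template_files(".", "manifests", "my_new_template"):
    print(f"missing: {path}")
```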
### Template Schema Structure
The current template generator can create empty Excel workbooks with arbitrarily many worksheets from JSON schemas. Every worksheet in the template has the same high-level structure, made up of two sections:
1. **`preamble_rows`**: a set of key-value rows appearing at the top of the worksheet. This is a good place to have template users input data that applies to, e.g., the entire batch of samples.
2. **`data_columns`**: a data table that appears below `preamble_rows` in each worksheet. Each column has a header with multiple data entries beneath it. Data columns are grouped into subtables, where a set of column headers shares one header above them (e.g., the shared header "Filled by Biorepository" should appear above all data columns that the biorepository will fill out). This is a good place to have template users input data that will be different for, e.g., each sample.
**Note**: Either of these sections can be omitted from a given worksheet.
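
To make the two-section layout concrete, here is a small Python sketch that builds an illustrative template as a dictionary and lists what each worksheet would contain. Everything in it is an assumption made for illustration: the worksheet, field, and header names are invented, `{"type": "string"}` and `{"type": "number"}` merely stand in for real value schemas, and `summarize_worksheets` is a hypothetical helper, not a function from `cidc_schemas`.

```python
import json

# Illustrative template schema; every name and value below is made up.
template = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": "example_template",
    "title": "Example template",
    "description": "A made-up template used only to illustrate the layout.",
    "properties": {
        "worksheets": {
            "Samples": {
                # Key-value rows at the top of the worksheet.
                "preamble_rows": {
                    "shipment id": {"type": "string"},
                },
                # Data table, grouped into subtables by a shared header.
                "data_columns": {
                    "Filled by Biorepository": {
                        "sample id": {"type": "string"},
                        "sample volume": {"type": "number"},
                    },
                },
            },
        },
    },
}


def summarize_worksheets(schema):
    """Map each worksheet name to its preamble fields and grouped column headers."""
    summary = {}
    for name, sheet in schema["properties"]["worksheets"].items():
        summary[name] = {
            # Either section may be omitted, so default to an empty mapping.
            "preamble_rows": list(sheet.get("preamble_rows", {})),
            "data_columns": {
                header: list(fields)
                for header, fields in sheet.get("data_columns", {}).items()
            },
        }
    return summary


print(json.dumps(summarize_worksheets(template), indent=2))
```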
The template generator expects JSON schemas with the following structure:
```
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": <a unique id for this template>,
  "title": <title appearing at top of every worksheet>,
  "description": <a statement about this template to appear in documentation>,
  "properties": {
    "worksheets": {
      <worksheet name>: {
        "preamble_rows": {
          <field name>: <value schema>,
          <field name>: <value schema>,
          ...
        },
        "data_columns": {
          <subtable header>: {
            <field name>: <value schema>,
            <field name>: <v
```