# CRISPR_Screen_Processing
A basic working example that you can follow through to semi-automate the analysis of pooled CRISPR screen data in a uniform way using MAGeCK and/or DrugZ for enrichment/depletion analysis. See ReadMe for more details. Testing table files are templates, not working examples.
### Contents
1) Main Folder
* CRISPR.sh: bash script to process raw FASTQ files through cutadapt, bowtie 1.3, and MAGeCK (counting/RRA testing)
* drugz.sh: bash script to process count table generated by MAGeCK using the drugz.py script
* MAGeCK_ or DrugZ_ Tests_Table.txt: **tab-delimited tables that tell the above bash scripts how to conduct enrichment/depletion tests (fields in each row, from first to last: Output Name, Treatment Groups, Control Groups)**
* Both bash scripts were run using the default settings indicated at the top of each script. Except for the threads/cores option, all settings use the defaults of the respective packages.
2) Raw_FASTQ: Original (compressed) FASTQ.gz files. **You should put your input files here (ideally with sensible replicate/sample labels)!**
* Nat_Comm_BCBL1_Download.sh downloads some sample data (Brunello plasmid library input and dropout data for 3x replicates each of BCBL1 Cas9 clonal/pooled cell lines) from the Gottwein Lab publication, "Gene essentiality landscape and druggable oncogenic dependencies in herpesviral primary effusion lymphoma", Manzano et al., Nat Comm. 2018.
* To experiment, try comparing the BCBL1 clonal and pooled cell lines to the plasmid input to observe dropout/essentiality using CRISPR.sh, then use drugz.sh to characterize differences between the clonal and pooled Cas9 cell lines. In theory, few genes should be differential.
3) Trim_FASTQ: Cutadapt output (generated after running CRISPR.sh)
4) Libraries: sgRNA sequences/IDs and control sgRNA ID lists for MAGeCK/DrugZ
5) Bowtie: Bowtie 1.3 index files and .bam alignments of trimmed reads to library (generated after running CRISPR.sh)
6) MAGeCK: Output for MAGeCK (generated after running CRISPR.sh)
* Count table is in main folder
* Counts folder contains three subfolders
    * Logs
    * Other--contains median normalized versions of read count table and summary table
    * R_Output--contains scripts generated by MAGeCK for rough analysis/visualization in R/RStudio
* Tests folder contains five subfolders
    * Logs
    * sgRNA_Results--contains tables of sgRNA-level output for each test
    * Gene_Results--contains tables of gene-level output for each test
    * R_Output--contains scripts generated by MAGeCK for rough analysis/visualization in R/RStudio
    * Figures--Figures generated by myself or MAGeCK's R Output (code/potentially copies of data in folder)
7) DrugZ: Output for DrugZ (generated after running drugz.sh --> run CRISPR.sh first)
* Contains only a single output file (DrugZ statistical output)
* Note: You will need to download [drugz.py](https://github.com/hart-lab/drugz/) to run drugz.sh
* Note: DrugZ requires equal numbers of replicates for control/treatment groups to run.
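
As a rough illustration of how a tests table could drive the MAGeCK tests, the sketch below reads one tab-delimited row at a time and prints (rather than runs) a `mageck test` command per row. The function name `print_mageck_tests`, the count-table name, and the sample labels in the usage note are hypothetical, and the actual loop in CRISPR.sh may differ.

```shell
# Hypothetical helper: print one "mageck test" command per row of a tests table.
# Assumed row layout (tab-delimited): output name, treatment samples, control samples.
print_mageck_tests() {
    local table="$1" counts="$2"
    local name treatment control tab
    tab=$(printf '\t')
    # Split each row on tabs only; the || clause keeps a final row
    # that has no trailing newline.
    while IFS="$tab" read -r name treatment control || [ -n "$name" ]; do
        [ -z "$name" ] && continue   # skip blank lines
        echo "mageck test -k $counts -t $treatment -c $control -n $name"
    done < "$table"
}
```

For a row such as `Clonal_vs_Plasmid<TAB>Clonal_1,Clonal_2<TAB>Plasmid_1,Plasmid_2` (made-up labels), calling `print_mageck_tests MAGeCK_Tests_Table.txt counts.txt` would print a single command using those groups. The tab is produced with `printf` so the loop does not depend on bash-only `$'\t'` quoting.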
### CRISPR.sh command-line arguments
Currently this script supports a few optional arguments with default values based on the most common usage scenarios I've come across.
* -p Cores/Threads argument passed to cutadapt, bowtie, and MAGeCK count (default = 1)
* -n Project name/file prefix for this run, passed to MAGeCK for output prefixes. (defaults to the directory name that CRISPR.sh is found in)
* -l CRISPR library to use. (defaults to Brunello)
    * FASTA files/control guide tables have already been included for libraries in use by the Gottwein Lab (Brunello, Human_GeCKOv2_A, Human_GeCKOv2_B, Human_GeCKOv2_Full, Human_SAM).
    * You can also download these files yourself. Files need to be in a tab-delimited format with the columns "sgRNA ID", "Sequence", and "Gene/Target". Downloaded files are often comma-separated, so just convert the commas to tabs (\t).
    * You should also generate a control guide list suffixed with _controls.txt, as shown in Libraries/Controls for Brunello (note: I haven't added control guide lists for GeCKO/SAM yet).
    * If your library does not have control guides, or they are not included for some reason, you will need to modify the script at line 121 *or* provide a blank .txt file named with your library name and the suffix _controls under Libraries/Controls.
* -a single adapter sequence (5' or 3') to trim, passed to cutadapt. (currently defaults to "g cgaaacaccg" which should work for most LentiCRISPRv2-based libraries if sequenced in the forward sense direction relative to transcription)
    * Prefix the desired sequence with g to indicate 5' or a to indicate 3'.
    * The current workflow assumes that your libraries are prepared in such a way that only one end of the read needs adapter trimming and the other can be trimmed to length (20 bp for an sgRNA, see -t).
* -t trimming length, passed to cutadapt for length trimming (defaults to 20 nt)
* -m minimum trimmed read size, passed to cutadapt for min/max read filtering (defaults to 20)
* -x Non-aligned direction, passed to Bowtie 1.3 as the alignment direction to ignore (defaults to rc, equivalent to --norc; can also take fw for --nofw)
* -c Normalization method for MAGeCK (defaults to median)
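
The comma-to-tab conversion mentioned under -l can be sketched as below. The function name `csv_to_tsv` is hypothetical, and the sketch assumes that no field contains a quoted comma; a file with quoted fields would need a real CSV parser instead.

```shell
# Hypothetical helper: convert a comma-separated library file to tab-delimited.
# Assumes no field contains a quoted comma.
csv_to_tsv() {
    tr ',' '\t' < "$1"
}
```

Assuming new libraries follow the same layout as the bundled Brunello files, the result could be saved with `csv_to_tsv MyLibrary.csv > Libraries/MyLibrary.txt` and then selected with something like `./CRISPR.sh -p 8 -n MyScreen -l MyLibrary` (the library and project names here are placeholders).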
### drugz.sh command-line arguments
drugz.sh supports the -l and -n arguments above, to indicate the library used and an output prefix.
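
Because DrugZ needs equal numbers of replicates in the control and treatment groups (see the note under Contents), it can help to check a row of the DrugZ tests table before running. The sketch below is one possible check; `count_samples` and `same_replicate_count` are hypothetical names and are not part of drugz.sh.

```shell
# Hypothetical helper: count the entries in a comma-separated sample list.
count_samples() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -c .
}

# Succeeds only when both groups list the same number of replicates.
same_replicate_count() {
    [ "$(count_samples "$1")" -eq "$(count_samples "$2")" ]
}
```

A call such as `same_replicate_count "$treatment" "$control" || echo "replicate counts differ" >&2` could run before each DrugZ test, where the two variables hold the treatment and control fields of a table row.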
### Dependencies/Other notes
CRISPR.sh relies on the following command-line tools:
* Cutadapt (tested w/ ver 3.1)
* Bowtie (tested w/ ver 1.3.0)
* Samtools (tested w/ ver 1.9)
    * Samtools frequently encounters installation issues on many Linux distros (not sure about MacOS)
    * You can solve this by explicitly installing bzip2 ver 1.0.8
* MAGeCK (tested w/ ver 0.5.9.4; note: MAGeCK is not Windows compatible)
* I also recommend installing pigz for multi-core decompression support if you are sticking with compressed fastq.gz files
DrugZ relies on drugz.py and its dependencies (six, pandas, numpy, scipy) -- most of these except for six are usually present in most data science/bioinformatics python environments.
To set up my environment, I used conda with the following command (the conda-forge and bioconda channels are needed):

conda create -n crisprenv -c conda-forge -c bioconda -c defaults mageck=0.5.9.4 cutadapt=3.1 bowtie=1.3.0 samtools=1.9 bzip2=1.0.8 six pandas scipy numpy
I've only tested this script using the bash and dash shells. It runs properly on bash, but encountered issues between lines 119-122 (mageck test/IFS while loop) on Ubuntu 20.04 using the default dash shell. The #!/bin/bash shebang should be preserved for that reason, and I cannot promise it will run on alternative shells, like zsh.
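
A quick pre-flight check for the dependencies and the shell caveat above might look like the following. This is only a sketch: `is_bash` and `missing_tools` are hypothetical helpers and are not part of CRISPR.sh.

```shell
# Hypothetical guard: succeeds only when the current shell is bash.
# BASH_VERSION is set by bash and left unset by dash.
is_bash() {
    [ -n "${BASH_VERSION:-}" ]
}

# Hypothetical helper: print the name of every listed tool that is not on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

Running `missing_tools cutadapt bowtie samtools mageck pigz` inside the conda environment should print nothing when everything is installed, and `is_bash || echo "please run with bash" >&2` gives an early warning when a script is started under another shell.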