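CRISPR_Screen_Processing: a basic working example that you can follow to semi-automate the uniform analysis of pooled CRISPR screen data, using MAGeCK and/or DrugZ for enrichment/depletion analysis. See the README for more details. The test table files are templates, not working examples.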

Uploaded 2021-03-11 02:21:26 · 4.5 MB ZIP

CRISPR_Screen_Processing-main.zip (13 files)

CRISPR_Screen_Processing-main

drugz.sh 658B

MAGeCK_Tests_Table.txt 30B

CRISPR.sh 6KB

Nat_Comm_BCBL1_Download.sh 1KB

Libraries

Human_GeCKOv2_Full.txt 4.87MB

Controls

Brunello_controls.txt 48KB

Human_GeCKOv2_A.txt 2.6MB

Brunello.txt 4.52MB

Human_SAM.txt 3.29MB

Human_GeCKOv2_B.txt 2.27MB

DrugZ_Tests_Table.txt 30B

LICENSE 1KB

README.md 7KB

# CRISPR_Screen_Processing

A basic working example that you can follow through to semi-automate the analysis of pooled CRISPR screen data in a uniform way, using MAGeCK and/or DrugZ for enrichment/depletion analysis. See ReadMe for more details. Testing table files are templates, not working examples.

### Contents

1) Main Folder
    * CRISPR.sh: bash script to process raw FASTQ files through cutadapt, bowtie 1.3, and MAGeCK (counting/RRA testing)
    * drugz.sh: bash script to process the count table generated by MAGeCK using the drugz.py script
    * MAGeCK_ or DrugZ_ Tests_Table.txt: **tab-delimited tables that inform the above bash scripts on how to conduct enrichment/depletion tests (Row from first to last: Output Name, Treatment Groups, Control Groups)**
    * Both bash scripts were run with the default settings indicated at the top of each script. Apart from the threads/cores option, all settings use the defaults of the respective packages.
2) Raw_FASTQ: Original (compressed) FASTQ.gz files. **You should put your input files here (ideally with sensible replicate/sample labels!)**
    * A script to download some sample data (Brunello plasmid library input and dropout data for 3x replicates each of BCBL1 Cas9 clonal/pooled cell lines) from the Gottwein Lab publication, "Gene essentiality landscape and druggable oncogenic dependencies in herpesviral primary effusion lymphoma", Manzano et al., Nat Comm. 2018, can be found in Nat_Comm_BCBL1_Download.sh
    * To experiment, try comparing BCBL1 clonal & pooled cell lines to the plasmid input to observe dropout/essentiality using CRISPR.sh, then use drugz.sh to characterize differences between clonal/pooled Cas9 cell lines; in theory, few genes should be differential.
3) Trim_FASTQ: Cutadapt output (generated after running CRISPR.sh)
4) Libraries: sgRNA sequences/IDs and control sgRNA ID lists for MAGeCK/DrugZ
5) Bowtie: Bowtie 1.3 index files and .bam alignments of trimmed reads to the library (generated after running CRISPR.sh)
6) MAGeCK: Output for MAGeCK (generated after running CRISPR.sh)
    * Count table is in main folder
    * Counts folder contains three subfolders
        * Logs
        * Other: contains median-normalized versions of the read count table and summary table
        * R_Output: contains scripts generated by MAGeCK for rough analysis/visualization in R/RStudio
    * Tests folder contains five subfolders
        * Logs
        * sgRNA_Results: contains tables of sgRNA-level output for each test
        * Gene_Results: contains tables of gene-level output for each test
        * R_Output: contains scripts generated by MAGeCK for rough analysis/visualization in R/RStudio
        * Figures: figures generated by myself or MAGeCK's R output (code/potentially copies of data in folder)
7) DrugZ: Output for DrugZ (generated after running drugz.sh; run CRISPR.sh first)
    * Contains only a single output file (DrugZ statistical output)
    * Note: You will need to download [drugz.py](https://github.com/hart-lab/drugz/) to run drugz.sh
    * Note: DrugZ requires equal numbers of replicates for control/treatment groups to run.

### CRISPR.sh command-line arguments

Currently this script supports a few optional arguments, with default values based on the most common usage scenarios I've come across.

* -p Cores/Threads argument passed to cutadapt, bowtie, and MAGeCK count (default = 1)
* -n Project name/file prefix for this run, passed to MAGeCK for output prefixes (defaults to the name of the directory that CRISPR.sh is found in)
* -l CRISPR library to use (defaults to Brunello)
    * FASTA files/control guide tables have already been included for libraries in use by the Gottwein Lab (Brunello, Human_GeCKOv2_A, Human_GeCKOv2_B, Human_GeCKOv2_Full, Human_SAM).
    * You can also download these files yourself.
      Files need to be in a tab-delimited format with columns of "sgRNA ID", "Sequence", and "Gene/Target". Downloaded files are often comma-separated, so just convert the commas to \t.
    * You should also generate a control guide list suffixed with _controls.txt, as shown in Libraries/Controls for Brunello (note: I haven't added control guide lists for GeCKO/SAM yet)
    * If your library does not have control guides, or they are not included for some reason, you will need to modify the script at line 121 *or* provide a blank .txt file with your library name and the suffix _controls under Libraries/Controls.
* -a Single adapter sequence (5' or 3') to trim, passed to cutadapt (currently defaults to "g cgaaacaccg", which should work for most LentiCRISPRv2-based libraries if sequenced in the forward sense direction relative to transcription)
    * Prefix the desired sequence with g to indicate 5', or a to indicate 3'
    * The current workflow assumes that your libraries are prepared in such a way that only one end of the read needs adapter trimming and the other can be trimmed to length (20 bp for an sgRNA, see -t)
* -t Trimming length, passed to cutadapt for length trimming (defaults to 20 nt)
* -m Minimum trimmed read size, passed to cutadapt for min/max read filtering (defaults to 20)
* -x Non-aligned direction, passed to Bowtie 1.3 as the alignment direction to ignore (defaults to rc, equivalent to --norc; can also take fw for --nofw)
* -c Normalization method for MAGeCK (defaults to median)

### drugz.sh command-line arguments

drugz.sh supports the -l and -n arguments above, to indicate the library used and an output prefix.
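For the -l option, the comma-to-tab conversion and the blank control list described above could look roughly like the sketch below. The library name `My_Library`, the input file `My_Library.csv`, and the two guide rows are hypothetical placeholders, not files from this repository; the sketch also assumes that no field contains an embedded or quoted comma.

```shell
#!/bin/bash
set -eu

# Hypothetical library name and CSV input; substitute your own.
lib_name="My_Library"
csv_in="${lib_name}.csv"

# Tiny stand-in CSV so the sketch runs end to end (made-up guides, not real library data).
if [ ! -f "$csv_in" ]; then
  printf '%s\n' \
    'sg_GENE1_1,ACGTACGTACGTACGTACGT,GENE1' \
    'sg_GENE2_1,TTGACCTGAAGCTAGCTAGG,GENE2' > "$csv_in"
fi

mkdir -p Libraries/Controls

# Convert commas to tabs: sgRNA ID, Sequence, Gene/Target.
tr ',' '\t' < "$csv_in" > "Libraries/${lib_name}.txt"

# Blank control list for a library that ships without control guides.
: > "Libraries/Controls/${lib_name}_controls.txt"

# Sanity check: count rows that do not have exactly three tab-separated fields.
awk -F'\t' 'NF != 3 { bad++ } END { print bad + 0, "malformed rows" }' "Libraries/${lib_name}.txt"
```

With the files laid out this way, the library would presumably be selected by passing -l My_Library to CRISPR.sh or drugz.sh.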
### Dependencies/Other notes

CRISPR.sh relies on the following command-line tools:

* Cutadapt (tested w/ ver 3.1)
* Bowtie (tested w/ ver 1.3.0)
* Samtools (tested w/ ver 1.9)
    * Samtools frequently encounters installation issues on many Linux distros (not sure about MacOS)
    * You can solve this by explicitly installing bzip2 ver 1.0.8
* MAGeCK (tested w/ ver 0.5.9.4; note: MAGeCK is not Windows compatible)
* I also recommend installing pigz for multi-core decompression support if you are sticking with compressed fastq.gz files

DrugZ relies on drugz.py and its dependencies (six, pandas, numpy, scipy); all of these except six are usually present in most data science/bioinformatics Python environments.

To set up my environment, I used conda with the following command (the conda-forge and bioconda repos are needed):

    conda create -n crisprenv -c conda-forge -c bioconda -c defaults mageck=0.5.9.4 cutadapt=3.1 bowtie=1.3.0 samtools=1.9 bzip2=1.0.8 six pandas scipy numpy

I've only tested this script using the bash and dash shells. It runs properly on bash, but encountered issues between lines 119-122 (mageck test/IFS while loop) on Ubuntu 20.04 using the default dash shell. The #!/bin/bash shebang should be preserved for that reason, as I cannot promise it will run on alternative shells, like zsh.
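To make the role of the Tests_Table files more concrete, here is a minimal sketch of how a tab-delimited table of (output name, treatment groups, control groups) could drive one `mageck test` call per row through an IFS while loop. This is not the code from CRISPR.sh: the file names `Example_Tests_Table.txt` and `Example.count.txt` and the sample labels are hypothetical, and the commands are only printed, not executed.

```shell
#!/bin/bash
set -eu

# Hypothetical file names and sample labels, for illustration only.
table="Example_Tests_Table.txt"
counts="Example.count.txt"
controls="Libraries/Controls/Brunello_controls.txt"

# One test per row: output name, treatment samples, control samples (tab-delimited).
printf 'Clonal_vs_Plasmid\tClonal_1,Clonal_2,Clonal_3\tPlasmid_1,Plasmid_2,Plasmid_3\n' > "$table"
printf 'Pooled_vs_Plasmid\tPooled_1,Pooled_2,Pooled_3\tPlasmid_1,Plasmid_2,Plasmid_3\n' >> "$table"

tab="$(printf '\t')"

# Print (rather than run) one mageck test command per table row.
build_test_commands() {
  name=""; treatment=""; control=""
  # The "|| [ -n ... ]" clause keeps a final row that lacks a trailing newline.
  while IFS="$tab" read -r name treatment control || [ -n "$name" ]; do
    if [ -z "$name" ]; then
      continue   # skip blank lines
    fi
    echo "mageck test -k $counts -t $treatment -c $control -n $name --norm-method median --control-sgrna $controls"
  done < "$1"
}

build_test_commands "$table"
```

Printing the commands first is a cheap way to check that each row of a table is parsed as intended before committing to a full run. Because tab counts as IFS whitespace, an empty field in the middle of a row would shift the later fields, so every row should fill in all three columns.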