Peak calling

Peak calling

Peak calling is performed using Genrich.

Genrich supports biological replicates. Therefore, we first merge compatible technical replicates from the previous alignment steps to produce one alignment file per biosample.

We use Genrich for calling peaks for ATAC-seq and ChIP-seq. The following parameters are in common:

-r: Specifies PCR duplicates should be removed
-e MT: Reads aligning to the mitochondrial genome are excluded
-E: Regions to exclude from peak-calling (we exclude repeat masked regions)
-s: Keep secondary alignments with Alignment Score (AS) >= bestAS - value. We use a value of 20.

ATAC-seq

In addition, ATAC-seq uses:

-j: ATAC-seq mode
-q: We use a q-value of 0.1 as the threshold for peak-calling.

ChIP-seq narrow peaks

Narrow peaks are ChIP-seq transcription factor targets and histone marks H3K27ac, H3K4me2, H3K4me3 and H3K9ac.

-y: Keep unpaired alignments. This is necessary because some ChIP-seq experiments have single-end reads. This does not affect how paired alignments are processed since our alignment step for paired-end uses only properly paired reads.
-q: We use a q-value of 0.1 as the threshold for narrow peaks.

ChIP-seq broad peaks

Broad peaks are ChIP-seq histone marks H3K27me3, H3K36me3 and H3K4me1. For these, we create a gapped peak track, which annotates broad peaks with peaks called with more restrictive parameters.

Broad peaks are called with:

-p: We use a p-value of 0.1 as the threshold for broad peaks.
-g: The maximum distance between significant sites is relaxed from 100 to 200 for broad peaks.
-a: The minimum AUC for a broad peak is increased from 200 to 800.

Narrow peaks within these broad peaks are added based on:

-p: We use a p-value of 0.05 as the threshold for narrow peaks within gapped peaks.