Segmentation

Regulatory Segmentation

Segmentation algorithms partition the genome into regions with distinct epigenomic profiles. Each region is a stretch of the genome that shows a similar signal pattern across a selected set of assays.

To segment the genome for each cell type we currently use either ChromHMM (Ernst et al., 2011) or Segway (Hoffman et al., 2011). These algorithms detect recurring signal patterns, called states, from a collection of genome-wide assays, such as DNase-seq and ChIP-seq, across the different cell types. They then assign a state to each base pair in each epigenome. Following this stage, each of the 25 states is assigned a functional label based on a decision tree. The labels include CTCF, Distal, Heterochromatin, Open Chromatin, Transcription Factor Binding Site, Gene, Predicted Weak Enhancer/Cis-regulatory Element, Proximal, TSS, Poised and Repressed.
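As an illustration of how per-position state calls become genomic regions, the sketch below merges runs of identical states into half-open intervals. It assumes the state calls arrive as one integer per fixed-width bin; the function name `bins_to_segments` and the tuple layout are hypothetical and are not part of ChromHMM, Segway or Ensembl.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def bins_to_segments(
    states: Sequence[int], bin_size: int = 200
) -> List[Tuple[int, int, int]]:
    """Merge consecutive bins that share a state into (start, end, state).

    Coordinates are 0-based and half-open, as in BED files.
    The bin_size default of 200 bp is an assumption for this sketch.
    """
    segments = []
    start_bin = 0
    for state, run in groupby(states):
        run_length = len(list(run))  # number of consecutive bins in this state
        end_bin = start_bin + run_length
        segments.append((start_bin * bin_size, end_bin * bin_size, state))
        start_bin = end_bin
    return segments
```

A functional label could then be attached to each segment through a lookup from state number to label; the decision tree that produces that lookup is not described here, so it is left out of the sketch.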

ChromHMM uses a multivariate hidden Markov model, trained on the binary presence or absence of signal for each assay in 200 base pair bins over the whole genome. Segway runs a dynamic Bayesian network algorithm using real-valued signal data, trained over the ENCODE pilot regions (1% of the genome) and then fitted over the whole genome.
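The binary input described above can be pictured with a small sketch that counts reads per 200 bp bin and marks a bin as 1 when the count reaches a cutoff. This is a simplification under stated assumptions: the fixed `threshold` is a stand-in chosen for illustration and is not ChromHMM's own criterion for calling presence of signal, and the function name `binarize_counts` is hypothetical.

```python
from typing import List, Sequence

BIN_SIZE = 200  # bp, the bin width quoted in the text


def binarize_counts(
    read_starts: Sequence[int],
    chrom_length: int,
    threshold: int,
    bin_size: int = BIN_SIZE,
) -> List[int]:
    """Return one 0/1 value per bin for a single assay on one chromosome.

    read_starts are 0-based positions; reads outside the chromosome are ignored.
    threshold is an illustrative fixed cutoff (an assumption of this sketch).
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    return [1 if count >= threshold else 0 for count in counts]
```

Stacking one such vector per assay gives, for every bin, the combination of present and absent marks that a multivariate hidden Markov model takes as its observation.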

For the current release we use ChromHMM with 25 epigenomic states at 200 bp resolution. The human genome segmentation is based on ENCODE (ENCODE Project Consortium, 2012), Roadmap Epigenomics (Roadmap Epigenomics Consortium, 2015) and BLUEPRINT data. The mouse segmentation is based on Mouse ENCODE (Mouse ENCODE Consortium, 2012) data. The assays were chosen to maximise information content about the state of the genome in each project. These assays (including control input sequencing) were coordinated across all cell types and drawn from three classes of data, which differ across projects due to data availability:

Segmentation                  | Input Data Class      | Description
Human ENCODE/Roadmap ChromHMM | Open chromatin        | DNase1 hypersensitivity
Human ENCODE/Roadmap ChromHMM | Transcription factors | CTCF
Human ENCODE/Roadmap ChromHMM | Histone modifications | H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K27ac, H3K27me3, H3K36me3, H4K20me1
Human BLUEPRINT ChromHMM      | Histone modifications | H3K4me1, H3K4me3, H3K9me3, H3K27ac, H3K27me3, H3K36me3
Mouse ENCODE ChromHMM         | Histone modifications | H3K4me1, H3K4me3, H3K9me3, H3K27ac, H3K27me3, H3K36me3
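
To make the differences between projects easier to compare, the assay sets listed above can be written as a small data structure and intersected. The dictionary below copies the assay names from the table; the variable and function names are illustrative only.

```python
from typing import Dict, List

# Assay names transcribed from the table above, grouped by segmentation.
ASSAYS: Dict[str, List[str]] = {
    "Human ENCODE/Roadmap ChromHMM": [
        "DNase1", "CTCF",
        "H3K4me1", "H3K4me2", "H3K4me3", "H3K9ac",
        "H3K27ac", "H3K27me3", "H3K36me3", "H4K20me1",
    ],
    "Human BLUEPRINT ChromHMM": [
        "H3K4me1", "H3K4me3", "H3K9me3", "H3K27ac", "H3K27me3", "H3K36me3",
    ],
    "Mouse ENCODE ChromHMM": [
        "H3K4me1", "H3K4me3", "H3K9ac", "H3K27ac", "H3K27me3", "H3K36me3",
    ],
}


def shared_assays(assays: Dict[str, List[str]]) -> List[str]:
    """Return the assays present in every segmentation, sorted by name."""
    sets = [set(names) for names in assays.values()]
    return sorted(set.intersection(*sets))
```

Calling `shared_assays(ASSAYS)` returns the five histone marks used by all three segmentations: H3K27ac, H3K27me3, H3K36me3, H3K4me1 and H3K4me3.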

Regulatory Segmentation in the Browser

There is one segmentation track available for each of the cell types in the Ensembl Regulatory Build. These tracks are off by default. To turn on the Segmentation tracks, you need to add them using "Configure this page". They can be found in the Regulation section of "Configure this page", under Features by Cell/Tissue, by selecting Configure Track Display.