Interested in quality control of sequencing data? You should be! And a new blog will answer all the questions you were too afraid to ask. Here is an example:
Since the movie ‘50 Shades of Grey’ is about to be released, I thought this would be the perfect opportunity to introduce everybody to the concept of “grey lists” and to a recent R package developed by Gordon Brown at my institute: the GreyListChIP R package!
ChIP-seq and many other next-generation sequencing experiments (e.g. MNase-seq, DNase-seq, FAIRE-seq) often produce artifactual signal in certain regions of the genome. These so-called blacklisted regions are often found at repeat elements (such as satellite, centromeric and telomeric repeats) and show unstructured, high signal (an excessive pile-up of reads) independently of cell line and experiment type. The ENCODE project generated two sets of human blacklists (the DAC and DUKE regions; see http://genome.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeMapability).
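To make the idea concrete, here is a minimal Python sketch of how a blacklist is typically applied: any peak that overlaps a blacklisted region is dropped before downstream analysis. This is only an illustration and not how GreyListChIP or the ENCODE pipeline work internally. The names `Interval`, `overlaps` and `remove_blacklisted` and all coordinates are made up, and intervals are assumed to be half-open as in BED files.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """A genomic interval, assumed BED-style: 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def remove_blacklisted(peaks: List[Interval], blacklist: List[Interval]) -> List[Interval]:
    """Keep only the peaks that do not overlap any blacklisted region."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]


# Hypothetical coordinates, for illustration only.
blacklist = [Interval("chr1", 1000, 2000)]
peaks = [
    Interval("chr1", 500, 900),    # upstream of the blacklisted region: kept
    Interval("chr1", 1500, 1600),  # inside the blacklisted region: dropped
    Interval("chr1", 1990, 2100),  # partial overlap: dropped
    Interval("chr1", 2000, 2100),  # only touches the end (half-open): kept
    Interval("chr2", 1500, 1600),  # different chromosome: kept
]

kept = remove_blacklisted(peaks, blacklist)
for peak in kept:
    print(peak.chrom, peak.start, peak.end)
```

The pairwise comparison is fine for a handful of regions. For real peak sets an interval tree or a dedicated genomic-ranges library would be the more sensible choice.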
Blacklisted regions are known to present problems for fragment length estimation and signal normalisation between samples, and although often found at repeat elements, reads typically map uniquely to these regions…