Nucleosome positioning data and tools
Below is a manually curated collection of online resources relevant to nucleosome positioning. This list of nucleosome-positioning tools and resources is constantly updated; comments are very welcome. For protein-DNA interactions of non-histone proteins and TFs, see the page "TF-DNA binding". Also, have a look at the "epigenetic modifications" page.
*Disclaimer: Software is listed in the order "newest first".
**How to cite: I am writing a review article based on the collection of resources below. Here is the draft of this manuscript. (Last updated: July 14, 2015). Until it is published, please refer to http://generegulation.info.
This is more than just the nucleosome peak calling initially implemented in DANPOS (mentioned below). It is a suite of several scripts that includes MNase-seq and ChIP-seq analysis. DANPOS was initially described in Chen et al., Genome Research, 2012.
NUCwave is a bioinformatic tool that generates nucleosome occupancy maps from MNase-seq, ChIP-seq and chemical cleavage (CC-seq) data, for both single-end and paired-end reads. It is written in Python and requires input files in Bowtie output format. The program is described in Briefings in Bioinformatics, 2014.
PuFFIN builds genome-wide nucleosome maps using a multi-scale (or multi-resolution) approach. The algorithm relies on a set of nucleosome "landscape" functions at different resolution levels: each function represents the likelihood of each genomic location being occupied by a nucleosome for a particular value of the smoothing parameter. After a set of candidate nucleosomes is computed for each function, PuFFIN produces a consensus set that satisfies non-overlapping constraints and maximizes the number of nucleosomes. PuFFIN is a command line tool for accurate placement of nucleosomes based on paired-end reads. It was designed to place non-overlapping nucleosomes using the extra length information present in paired-end datasets. PuFFIN is written in Python and was released in 2014. It outperforms NOrMAL, previously released by the same authors, and is claimed by the authors to also outperform NSeq, NPS and Template Filtering (described below). It returns nucleosome positions in the following format: <Position of the nucleosome center> <Width of the peak> <Confidence score> <"Fuzziness"> <Level of the curve that was used to detect the nucleosome>
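The idea of choosing a consensus set that is non-overlapping and as large as possible can be illustrated with the classic greedy interval-scheduling rule. This is only a minimal sketch of that general idea, not PuFFIN's actual implementation: the function name `max_non_overlapping`, the half-open interval representation and the example coordinates are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in base pairs (assumed convention)


def max_non_overlapping(candidates: List[Interval]) -> List[Interval]:
    """Return a largest possible subset of mutually non-overlapping intervals."""
    chosen: List[Interval] = []
    last_end: Optional[int] = None
    # Greedy rule: scan candidates by increasing end coordinate and keep
    # every candidate that starts at or after the end of the last kept one.
    for start, end in sorted(candidates, key=lambda iv: iv[1]):
        if last_end is None or start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


# Hypothetical candidate nucleosomes, each 147 bp wide.
candidates = [(0, 147), (100, 247), (150, 297), (290, 437), (300, 447)]
print(max_non_overlapping(candidates))
# → [(0, 147), (150, 297), (300, 447)]
```

Sorting by end coordinate guarantees a maximum-size set for plain intervals; a real caller would additionally weigh confidence scores and resolution levels, which this sketch ignores.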
The authors claim that iNPS allows classifying nucleosomes by shape to reveal their different biological properties (interesting, to be tested). Described in Nature Communications, 2014.
NucPosSimulator identifies non-overlapping nucleosome configurations by combining binary-variable analysis with a Monte Carlo approach and a simulated annealing scheme. This yields specific nucleosome configurations and optimized solutions for the complex positioning patterns in experimental data. The authors apply the method to compare nucleosome positioning at transcription factor binding sites in different mouse cell types. The method can model nucleosome translocations at regulatory genomic elements and generate configurations for simulations of the spatial folding of the nucleosome chain. See details in Schöpflin et al., 2013, Bioinformatics.
NucHunter is an algorithm that uses the data from ChIP-seq experiments to infer positioned nucleosomes. It is a versatile tool that can be used to predict positioned nucleosomes from one or multiple ChIP-seq bam files and it can be also used in conjunction with a control experiment. See details in Mammana et al., Bioinformatics, 2013.
NSeq includes a user-friendly graphical interface written in Java. It computes false discovery rates (FDRs) for candidate nucleosomes from Monte Carlo simulations, plots nucleosome coverage and centers, and exploits the availability of multiple processor cores by parallelizing its computations. The software is described in Nellore et al., 2013, Frontiers in Epigenomics and Epigenetics.
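A false discovery rate estimated from Monte Carlo simulations can be sketched generically as the expected number of null candidates above a score threshold divided by the number of observed candidates above it. The snippet below is a hedged illustration of that general recipe, not NSeq's own code: the function name `empirical_fdr`, the scoring scale and the example numbers are hypothetical.

```python
from typing import Sequence


def empirical_fdr(observed: Sequence[float],
                  simulated_runs: Sequence[Sequence[float]],
                  threshold: float) -> float:
    """Estimate the FDR at `threshold` from scores of null (simulated) runs."""
    if not simulated_runs:
        raise ValueError("at least one simulated run is required")
    n_observed = sum(score >= threshold for score in observed)
    if n_observed == 0:
        return 0.0  # nothing is called, so nothing can be a false discovery
    # Expected number of false positives: mean count over the null runs.
    n_null = sum(
        sum(score >= threshold for score in run) for run in simulated_runs
    ) / len(simulated_runs)
    return min(1.0, n_null / n_observed)


# Hypothetical scores: 3 observed candidates pass the threshold,
# and the two null runs contain 1 and 0 passing candidates.
observed = [5.0, 4.2, 3.9, 1.0, 0.5]
simulated = [[1.2, 0.8, 4.0], [0.3, 2.0, 1.1]]
print(round(empirical_fdr(observed, simulated, threshold=3.5), 3))
# → 0.167
```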
NucleoFinder addresses both the positional heterogeneity across cells and experimental biases by seeking nucleosomes consistently positioned in a cell population and showing a significant enrichment relative to a control sample. This software is written in R. See details in Becker et al., Bioinformatics, 2013.
This is a PDF file containing a listing of the Perl scripts provided by Cole et al., 2011 as supplementary materials for their research article. A more detailed experimental and theoretical protocol from this lab is available in Cole et al., 2012.
NOrMAL is a command line tool for accurate placement of nucleosomes. It was designed to resolve overlapping nucleosomes and extract extra information ("fuzziness", probability, etc.) about nucleosome placement. To achieve this goal, the tool clusters the input tags using an EM learning process. The tool is written in C++ (Polishko et al., Bioinformatics, 2012).
DiNuP compares the nucleosome profiles generated by high-throughput sequencing between different conditions. DiNuP provides a statistical p-value for each identified region of differential nucleosome positioning (RDNP) based on the difference of read distributions. DiNuP also empirically estimates the FDR as a cutoff when two samples have different sequencing depths and differentiates reliable RDNPs from the background noise (Fu et al., Bioinformatics, 2012). DiNuP takes BED files as input.
- PING: Probabilistic inference for Nucleosome Positioning with MNase-based or Sonicated Short-read Data
PING is an R package available from the Bioconductor web site. The authors mention that PING compares favorably to NPS (see below) and TemplateFilter (see below) in scalability, accuracy and robustness to low read density. The method is described in the following paper: Zhang X, Robertson G, Woo S, Hoffman BG, Gottardo R (2012) Probabilistic Inference for Nucleosome Positioning with MNase-Based or Sonicated Short-Read Data. PLoS ONE 7(2): e32095. Also see the PING 2.0 paper in Bioinformatics by the same authors.
These are the source code and executable files of the program from the lab of Nir Friedman, provided as a supplement to the following paper: Weiner, A., Hughes, A., Yassour, M., Rando, O.J. & Friedman, N. High-resolution nucleosome mapping reveals transcription-dependent promoter packaging. Genome Res 20, 90-100 (2010).
DANPOS is designed for genome-wide comparative analysis of nucleosome positioning or histone modifications. It generates occupancy data from sequencing reads and, if background data is provided, performs background subtraction. It normalizes occupancy data among replicates and calls occupancy peaks. If both control and treatment data are provided, it calculates a differential P value at each base pair between the control group and the treatment group and calls differential peaks. It also detects peaks that are shifted in the treatment group relative to the control group and, as a control, reports a group of super-conservative peaks.
nucleR is an R/Bioconductor package for flexible and fast recognition of nucleosome positioning from Next Generation Sequencing (NGS) and Tiling Array experiments. The software is integrated with standard high-throughput genomics R packages and allows in situ visualization as well as export of results to common genome browser formats. A detailed description can be found in the recent paper (Flores and Orozco, 2011).
NucDe is an R package for mapping nucleosome-linker boundaries from both MNase-Chip and MNase-Seq data using a non-homogeneous hidden-state model based on first order differences of experimental data along genomic coordinates (Kuan et al., 2009).
Transcription factor binding events often leave a trace pattern of nucleosome occupancy changes in which nucleosomes flanking the binding site increase in occupancy while those in the vicinity of the binding site itself are displaced. Genome wide information on enhancer proximal nucleosome occupancy can be readily acquired using ChIP-seq targeting enhancer related histone modifications such as H3K4me2. BINOCH is an attempt to use such data to infer the identity of key transcription factors that regulate the response of a cell to a stimulus or determine a program of differentiation.
Skyline is a suite of algorithms and programs for the identification of nucleosome peaks over the genome. It contains datasets from several recent publications.
NPS is a Python software package that can identify nucleosome positions from histone-modification ChIP-seq or nucleosome sequencing data at the nucleosome level. NPS obtains a continuous waveform that represents the enrichment of histone modifications (or nucleosomes) by extending each tag (25 nt, Solexa) to 150 nt in the 3’ direction and taking the middle 75 nt, and detects the positions of nucleosomes based on Laplacian of Gaussian edge detection.
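The tag-extension step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from NPS: the function names `middle_window` and `coverage`, the 0-based half-open coordinates and the convention that a tag is given by its leftmost position plus strand are all hypothetical. The Laplacian of Gaussian detection step is not shown.

```python
from typing import Iterable, List, Tuple


def middle_window(tag_start: int, strand: str, tag_len: int = 25,
                  fragment_len: int = 150, keep: int = 75) -> Tuple[int, int]:
    """Extend a tag to `fragment_len` in its 3' direction and return the
    half-open [start, end) interval of the central `keep` bases."""
    if strand == "+":
        frag_start = tag_start  # 5' end is the leftmost base of the tag
    elif strand == "-":
        # 5' end is the rightmost base, so the fragment extends to the left.
        frag_start = tag_start + tag_len - fragment_len
    else:
        raise ValueError("strand must be '+' or '-'")
    offset = (fragment_len - keep) // 2
    return frag_start + offset, frag_start + offset + keep


def coverage(tags: Iterable[Tuple[int, str]], chrom_len: int) -> List[int]:
    """Pile up the central windows of all tags into a per-base profile."""
    profile = [0] * chrom_len
    for tag_start, strand in tags:
        start, end = middle_window(tag_start, strand)
        for pos in range(max(start, 0), min(end, chrom_len)):
            profile[pos] += 1
    return profile


print(middle_window(100, "+"))  # → (137, 212)
print(middle_window(100, "-"))  # → (12, 87)
profile = coverage([(100, "+"), (175, "-")], chrom_len=300)
print(profile[150])             # → 2
```

Keeping only the central part of each extended fragment concentrates the signal near the presumed nucleosome center, so tags from both strands of the same nucleosome reinforce each other.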
ChIPseqR is an R library included in Bioconductor. ChIPseqR provides functions to identify nucleosome positions in high-throughput sequencing data. A formal description can be found at the Bioconductor web site. A more detailed description of the methods is available here.
MATLAB source code for estimating nucleosome positions from microarray data, assuming that stably positioned nucleosomes are characterized by a well-defined coverage profile. This average nucleosome profile is derived from the genome-wide distribution, and the genome is then scanned to find the peaks matching this profile. Currently works only with microarray data, not with Solexa data.
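Scanning a signal for peaks that match an average profile can be illustrated with a sliding-window Pearson correlation. The sketch below is written in Python rather than MATLAB and is not the authors' code: the function names, the correlation threshold `min_corr` and the toy signal are assumptions chosen for illustration.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if one is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def match_template(signal: Sequence[float], template: Sequence[float],
                   min_corr: float = 0.9) -> List[int]:
    """Return window start offsets where the signal matches the template.

    A window is reported if its correlation with the template is at least
    `min_corr` and is a local maximum among neighbouring offsets.
    """
    width = len(template)
    scores = [pearson(signal[i:i + width], template)
              for i in range(len(signal) - width + 1)]
    hits = []
    for i, score in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("-inf")
        if score >= min_corr and score > left and score >= right:
            hits.append(i)
    return hits


template = [0, 1, 2, 1, 0]  # hypothetical average nucleosome profile
signal = [0, 0, 0, 1, 2, 1, 0, 0, 0, 0, 2, 4, 2, 0, 0]
print(match_template(signal, template))
# → [2, 9]
```

Because Pearson correlation ignores scale, the second peak is found even though it is twice as high as the template; a method that also cares about peak height would need an extra amplitude criterion.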
Ceres currently contains MNase-seq and MNase-microarray nucleosome positioning data from Shivaswamy et al. (2008) and from Lee et al. (2007). It is interesting to see that the nucleosome positions given by these two methods are not identical.
Reference: Yuan GC, Liu YJ, Dion MF, Slack MD, Wu LF, Altschuler SJ, Rando OJ. Genome-scale identification of nucleosome positions in S. cerevisiae. Science 2005;309(5734):626-630 [PDF]
The methodology is based on the sequence-dependent anisotropic bending, which dictates how DNA is wrapped around a histone octamer. This application allows users to specify a number of options such as schemes and parameters for threading calculation and provides multiple layout formats. The web server is described in a recent paper (Alharbi et al., Genomics Proteomics Bioinformatics 12 (2014) 249–253). It seems that this web server replaces a previous release of the source code written in Perl, which required local installation (the code can be found here).
- iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition
“iNuc-PseKNC” was developed for predicting nucleosome positioning in the Homo sapiens, Caenorhabditis elegans, and Drosophila melanogaster genomes. In the new predictor, the samples of DNA sequences were formulated by a novel feature vector called “pseudo k-tuple nucleotide composition”, into which six DNA local structural properties were incorporated.
This model, described in a recent issue of PNAS, captures both dyad position within a few base pairs, and free binding energy within 2 k(B)T, for all the known nucleosome positioning sequences. By applying Percus's equation to the derived energy landscape, the authors isolate sequence effects on genome-wide nucleosome occupancy from other factors that may influence nucleosome positioning. The authors claim that for both in vitro and in vivo systems, three parameters suffice to predict nucleosome occupancy with correlation coefficients of respectively 0.74 and 0.66.
A new predictor, called iNuc-PhysChem, was developed for identification of nucleosomal sequences by incorporating these physicochemical properties into a 1788-D (dimensional) feature vector, which was further reduced to an 884-D vector via the IFS (incremental feature selection) procedure to optimize the feature set. It was observed by a cross-validation test on a benchmark dataset that the overall success rate achieved by iNuc-PhysChem was over 96% in identifying nucleosomal or linker sequences. The program is described in Chen et al., PLoS ONE, 2012.
Nu-OSCAR is a program that can be used to identify binding sites of known transcription factors, which further incorporates nucleosome occupancy around sites on promoter regions. The derivation of the algorithm is based on a biophysical view of interactions between protein factors and nucleosome DNA. The program currently supports only yeast transcription factors. For more details about Nu-OSCAR, see their manuscript self-archived in 2007. This work is missing in Pubmed (?!).
ICM Web allows users to assess nucleosome stability and fold sequences of DNA into putative chromatin templates. It takes a DNA sequence and generates (i) a nucleosome energy level diagram, (ii) coarse-grained representations of free DNA and chromatin and (iii) plots of the helical parameters (Tilt, Roll, Twist, Shift, Slide and Rise) as a function of position.
NucEnergGen is a downloadable C++ code. The algorithm is described here.
The FineStr server allows one to submit a genomic sequence and to detect the positions (centers) of the nucleosomes in it. The analysis is performed using a probe based on the DNA bendability matrix of C. elegans (Gabdank et al., 2009). The probe size is 117 bases, which corresponds to the assumed size of the DNA-histone contact area. Although the probe is based on the C. elegans matrix of bendability, the authors suggest the universality of the pattern.
NuPoP is built upon a duration hidden Markov model, in which the linker DNA length is explicitly modeled. The nucleosome or linker DNA state model can be chosen as either a 4th order or 1st order Markov chain. NuPoP outputs the Viterbi prediction, nucleosome occupancy score (from backward and forward algorithms) and nucleosome affinity score. NuPoP is available in three forms: a web server prediction engine, a stand-alone Fortran program, and an R package. The latter two can predict nucleosome positioning for a DNA sequence of any length. NuPoP_F is the Fortran program that allows customized compiling of Fortran codes.
Users can provide multiple sequences in FASTA format. The length of each sequence must be between 147 bp and 40 kb. The algorithm is based on the assignment of specific weights for the two main features of nucleosome positioning: the repetitions of dinucleotides and the 5-nucleotide motifs. These features have been extracted from genomic data for different species. Boundary conditions are taken as in DNA-ligand binding algorithms, so the distribution depends on the boundaries.
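Before submitting sequences, the length limits above can be checked locally. The snippet below is a small sketch, not part of the server: the function names are hypothetical, and it assumes that "40 kb" means 40,000 bp and that both limits are inclusive.

```python
from typing import Dict, Iterator, List, Optional, Tuple

MIN_LEN = 147      # one nucleosome footprint, from the stated lower limit
MAX_LEN = 40_000   # assumption: "40 kb" is taken to mean 40,000 bp


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def check_lengths(text: str) -> Dict[str, bool]:
    """Map each record header to True if its length is within the limits."""
    return {name: MIN_LEN <= len(seq) <= MAX_LEN
            for name, seq in read_fasta(text)}


fasta = ">short\nACGT\n>ok\n" + "A" * 147 + "\n"
print(check_lengths(fasta))
# → {'short': False, 'ok': True}
```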
A MATLAB code containing a wavelet analysis based model for predicting nucleosome positions from DNA sequence information described in the following references: Yuan GC, Liu JS. Genomic sequence is highly predictive of local nucleosome depletion. PLoS Computational Biology 2007 [paper]; Yuan GC. Targeted recruitment of histone modifications in humans predicted by genomic sequences. J Comput Biol 2009 Feb;16(2):341-355. [paper]
nuScore calculates deformation energy and nucleosome-positioning score
NXSensor is a tool for finding regions of DNA sequences that are likely to be nucleosome-free. The basic idea behind NXSensor is that the DNA sequence which wraps around the nucleosome needs to have a certain degree of flexibility. Regions of DNA that have several rigid sequences close to each other are likely to be nucleosome-free.
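The basic idea of flagging regions where several rigid sequences lie close together can be sketched as below. This is not NXSensor's algorithm or its definition of rigidity: the pattern (runs of at least 10 A or 10 T), the spacing cutoff `max_gap` and the minimum count `min_hits` are hypothetical values chosen only to make the example concrete.

```python
import re
from typing import List, Tuple

# Hypothetical definition of a "rigid" element: a homopolymeric A or T run.
RIGID = re.compile(r"A{10,}|T{10,}")


def rigid_clusters(seq: str, max_gap: int = 50,
                   min_hits: int = 2) -> List[Tuple[int, int]]:
    """Return half-open [start, end) regions containing at least `min_hits`
    rigid elements, each separated from the next by at most `max_gap` bp."""
    hits = [(m.start(), m.end()) for m in RIGID.finditer(seq.upper())]
    clusters: List[Tuple[int, int]] = []
    current: List[Tuple[int, int]] = []
    for hit in hits:
        # Start a new cluster when the gap to the previous element is too large.
        if current and hit[0] - current[-1][1] > max_gap:
            if len(current) >= min_hits:
                clusters.append((current[0][0], current[-1][1]))
            current = []
        current.append(hit)
    if len(current) >= min_hits:
        clusters.append((current[0][0], current[-1][1]))
    return clusters


# Two rigid runs 10 bp apart, then an isolated run 200 bp further on.
seq = "GC" * 10 + "A" * 12 + "GC" * 5 + "T" * 10 + "GC" * 100 + "A" * 11
print(rigid_clusters(seq))
# → [(20, 52)]
```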
- Teif et al., 2012. Genome-wide nucleosome positioning during embryonic stem cell development. (Mouse ESCs, NPCs, MEFs, MNase-seq, paired-end)
- Nekrasov et al., 2012. H2A.Z inheritance during the cell cycle and its impact on promoter organization and dynamics. (Mouse trophoblast stem cells, ChIP-seq).
- Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules. (Human, IMR90 fibroblasts).
- Gaffney et al., 2012. Genome-wide maps of nucleosome occupancy in human lymphoblastoid cell lines. (Human, MNase-seq, paired-end).
- Ku et al., 2012. H2A.Z landscapes and dual modifications in pluripotent and multipotent stem cells underlie complex genome regulatory functions. (Human, mouse; ESCs, NPCs)
- Fenouil et al., 2012. CpG islands and GC content dictate nucleosome depletion in a transcription independent manner at mammalian promoters (MNase-seq)
- Hu et al., 2011. Regulation of nucleosome landscape and transcription factor binding at enhancers by BRG1. (Human hematopoietic stem cells; + GATA1 and TAL1 transcription factors).
- Yao et al., 2011. MNase digestion-sensitive nucleosomes in Saccharomyces cerevisiae
-->not all nucleosomes are equally sensitive to MNase digestion.
- Li et al., 2011. The nucleosome map of the mouse liver cells (+ transcription factors: HP1, CBP (Crebbp), p300 (Ep300), Foxa1, Foxa2)
- Valouev et al., 2011. Genome-wide maps of nucleosome organization in human CD4+ T-cells, CD8+ T-cells, granulocytes, and from in vitro reconstitution.
- Moshkin, Verrijzer et al. (2011). "Remodelers organize cellular chromatin by counteracting intrinsic histone-DNA sequence preferences in a class-specific manner"
- Relationship between nucleosome positioning and DNA methylation. (Arabidopsis: Bisulfite-Seq and MNase-Seq)
- Gilchrist et al. (2010), pausing of RNA polymerase II disrupts DNA-specified nucleosome organization
-->MNase digestion of DNA in the absence of nucleosomes has ~30% correlation with nucleosome positioning data
2009 and before:
- Experimental Drosophila and Saccharomyces nucleosome atlas at U. Penn.
- Field et al. (2008), nucleosome distribution in S. cerevisiae | GEO accession for these data
- Experimental nucleosome positioning data for C. elegans at UCSC Genome Browser
- Valouev et al. (2008), nucleosome positioning in Caenorhabditis elegans.
- Shones et al. (2008), nucleosome distribution in resting/activated human CD4+ T cells
- Whitehouse et al. (2007), genome-wide microarrays for nucleosome positioning in S. cerevisiae
- Nucleosome Explorer at Rockefeller
- Manually curated collection of (mainly) in vitro nucleosome positioning experiments at yeastgenome.org
This special issue includes updated protocols from many leading labs in the field.
This collection contains ~30 protocols from different publications.
This protocol includes purification of human CD4+ T cells from lymphocytes and chromatin fragmentation using micrococcal nuclease (MNase) digestion, followed by chromatin immunoprecipitation (ChIP) and construction of a library for Illumina/Solexa sequencing.
This is the supplementary materials file from the paper of Segal et al. with their methods
A user-friendly protocol, understandable for a beginner