Nucleosome positioning analysis and predictions
Below is a manually curated collection of online tools relevant to nucleosome positioning. This list is constantly updated, and comments are very welcome. The database of experimental nucleosome positioning in different cell types has been moved to a separate page. For protein-DNA interactions of non-histone proteins and TFs, see the section on TF-DNA binding. Also, have a look at the epigenetic modifications section of the site.
- Tiling array analysis (Yuan et al. 2005). Matlab code, which is complemented by the MLM and nucleR packages (see below). Applicable to tiling microarray experiments for nucleosome positioning.
- TemplateFilter: Perl source code and executable files for nucleosome positioning data processing (Weiner et al. 2010). Applicable to single-end NGS sequencing.
- NPS: Nucleosome Positioning from Sequencing (Zhang et al. 2008). This is a Python-based nucleosome peak caller, which is recommended for use together with the BINOCh software from the same group (see below). Applicable to single-end NGS sequencing.
- DANPOS and DANPOS2: Dynamic Analysis of Nucleosome Positioning and Occupancy by Sequencing (Chen et al. 2013). This is a Python package, which reports changes in location, fuzziness, or occupancy for a given nucleosome or any genomic region. It allows generating aggregate profile plots and heatmaps for subsets of genomic regions. Applicable to paired-end sequencing.
- nucleR: Non-parametric nucleosome positioning. This is an R package included in Bioconductor (Flores and Orozco 2011). It handles both NGS and tiling array experiments. The software is integrated with standard genomics R packages and allows in situ visualization as well as export of results to common genome browser formats. Applicable to paired-end sequencing.
- NOrMAL: Accurate nucleosome positioning using a modified Gaussian mixture model. C++ code and executables are provided for download (Polishko et al. 2012). It is a command line tool designed to resolve overlapping nucleosomes and extract extra information ("fuzziness", probability, etc.) of nucleosome placement. Newer software called PuFFIN developed by the same authors is claimed to outperform NOrMAL (see below). Applicable to paired-end sequencing.
- PING and PING 2.0: Probabilistic inference for nucleosome positioning with MNase-based or sonicated short-read data. An R package for nucleosome peak calling integrated in Bioconductor (Zhang et al. 2012; Woo et al. 2013). The authors say that PING compares favorably to NPS and TemplateFilter in scalability, accuracy and robustness.
- BINOCh: Binding Inference from Nucleosome Occupancy Changes (He et al. 2010; Meyer et al. 2011). This is a Python package, which allows identification of putative enhancers by comparing nucleosome occupancy in two cell conditions and analyzing DNA motifs near nucleosome centres and edges. It requires sorted BED files as input and relies for peak calling on the NPS software developed by the same group (see above). Applicable to single- and paired-end sequencing.
- NucPosSimulator: Deriving non-overlapping nucleosome configurations from MNase-seq data (Schopflin et al. 2013). It utilizes a Monte Carlo approach to determine the most probable nucleosome positions from overlapping and ambiguous DNA reads in high-throughput sequencing experiments. In contrast to peak-calling procedures, NucPosSimulator probes many possible solutions and can apply simulated annealing, a heuristic optimization method, to search for an optimal solution to complex positioning problems. Applicable to paired-end sequencing.
- MLM: A Multi-Layer Method to analyze microarray nucleosome positioning data. A Matlab code is available for download (Di Gesu et al. 2009).
- NucDe: Mapping nucleosome-linker boundaries (Kuan et al. 2009). This is an R package mapping nucleosome-linker boundaries from both MNase-ChIP-seq and MNase-seq data using a non-homogeneous hidden-state model based on first order differences of experimental data along genomic coordinates. Applicable to single-end sequencing.
- NucleoFinder: A statistical approach for the detection of nucleosome positions (Becker et al. 2013). This is an R package, which addresses both the positional heterogeneity across cells and experimental biases. Applicable to paired-end sequencing.
- ChIPseqR: Analysis of ChIP-Seq experiments using R; an R package included in Bioconductor (Humburg et al. 2011). ChIPseqR takes mapped reads as input and outputs nucleosome centres and their scores. It allows producing basic statistical graphs using standard R functions. Applicable to single-end sequencing.
- DiNuP: A systematic approach to identify regions of differential nucleosome positioning (Fu et al. 2012). DiNuP compares the nucleosome profiles generated by high-throughput sequencing between different conditions. It provides a statistical P-value for each identified differential region and empirically estimates the False Discovery Rate (FDR) as a cutoff when two samples have different sequencing depths, in order to distinguish differential regions from the background noise.
- NSeq: a multithreaded Java application for finding positioned nucleosomes from sequencing data (Nellore et al. 2012). NSeq includes a user-friendly graphical interface written in Java. It computes FDRs for candidate nucleosomes from Monte Carlo simulations, plots nucleosome coverage and centers, and exploits the availability of multiple processor cores by parallelizing its computations. NSeq analyzes alignment data in BAM, SAM, or BED format. It assumes that the data are single-end.
- Skyline nucleosome browser: a web-based application for the identification of nucleosome peaks over the genome (Belch et al. 2010).
- NucHunter: Inferring nucleosome positions with their histone mark annotation from ChIP-seq data (Mammana et al. 2013). It uses data from histone ChIP-seq experiments to infer positioned nucleosomes, and can predict positioned nucleosomes from one or multiple BAM files, e.g. taking into account a control experiment. Applicable to paired-end sequencing.
- Perl scripts to analyze MNase-seq experiments (Cole et al. 2012). The authors have listed the code in supplementary materials of their publication, which is useful for other developers.
- iNPS: An improved version of the NPS nucleosome peak calling algorithm, which the authors claim outperforms the original (Chen et al. 2014). Applicable to paired-end sequencing.
- NUCwave: Nucleosome occupancy maps from MNase-seq, ChIP-seq and CC-seq (Quintales et al. 2014). It is a Python package which generates nucleosome occupancy maps from MNase-seq, ChIP-seq and chemical cleavage (CC-seq) data. It requires input files in a Bowtie output format. Applicable to single- and paired-end sequencing.
- PuFFIN: A parameter-free method to build genome-wide nucleosome maps from paired-end sequencing data (Polishko et al. 2014). PuFFIN is a command line tool for accurate placement of nucleosomes based on paired-end reads. It was designed to place non-overlapping nucleosomes using the extra length information present in paired-end data sets. PuFFIN is written in Python and was released in 2014. The authors report that it outperforms their earlier NOrMAL as well as NSeq, NPS and Template Filtering. It returns nucleosome positions, the width of the peak, a confidence score and fuzziness. Applicable to paired-end sequencing.
- NucleoATAC: A Python package for calling nucleosomes using ATAC-Seq data (Schep et al. 2015). It requires as input sorted aligned paired-end reads in BAM format, a FASTA file with the genome reference and a sorted BED file with non-overlapping regions for which nucleosome analysis is to be performed. These regions will generally be broad open-chromatin regions. It outputs nucleosome calls and occupancy. Applicable to paired-end sequencing.
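As a rough illustration of what peak calling on paired-end data involves, the Python sketch below takes fragment coordinates, counts fragment midpoints as putative dyads, smooths them, and greedily keeps the strongest non-overlapping positions. It is not the algorithm of any tool listed above. The 147-bp footprint, the triangular smoothing kernel, the score threshold and all function names are assumptions made for the example.

```python
from collections import Counter

NUCLEOSOME_BP = 147  # assumed nucleosome footprint in base pairs


def dyad_counts(fragments):
    """Count fragment midpoints (putative dyads); fragments are (start, end) pairs."""
    counts = Counter()
    for start, end in fragments:
        counts[(start + end) // 2] += 1
    return counts


def smooth(counts, half_window=10):
    """Spread each dyad count over +/- half_window bp with a triangular kernel."""
    smoothed = Counter()
    for pos, n in counts.items():
        for offset in range(-half_window, half_window + 1):
            smoothed[pos + offset] += n * (half_window + 1 - abs(offset))
    return smoothed


def call_nucleosomes(fragments, min_score=20, half_window=10):
    """Greedy calling: strongest positions first, rejecting overlapping calls."""
    scores = smooth(dyad_counts(fragments), half_window)
    calls = []
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    for pos, score in ranked:
        if score < min_score:
            break  # all remaining candidates are weaker
        if all(abs(pos - called) >= NUCLEOSOME_BP for called, _ in calls):
            calls.append((pos, score))
    return sorted(calls)


if __name__ == "__main__":
    example = [(24, 174), (25, 175), (25, 175), (25, 175), (26, 176),
               (225, 375), (225, 375), (225, 375), (225, 375), (75, 225)]
    for dyad, score in call_nucleosomes(example):
        print(dyad, score)
```

The greedy step is the simplest way to enforce non-overlap. Tools such as NucPosSimulator or NOrMAL address the same constraint with Monte Carlo sampling or mixture models instead.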
2) Software to predict preferential nucleosome positions from DNA sequence (the section below awaits a regular update, meanwhile see the latest version in this manuscript).
The methodology is based on the sequence-dependent anisotropic bending, which dictates how DNA is wrapped around a histone octamer. This application allows users to specify a number of options such as schemes and parameters for threading calculation and provides multiple layout formats. The web server is described in a recent paper (Alharbi et al., Genomics Proteomics Bioinformatics 12 (2014) 249–253). It seems that this web server replaces a previous release of the source code written in Perl, which required local installation (the code can be found here).
- iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition
“iNuc-PseKNC” was developed for predicting nucleosome positioning in the Homo sapiens, Caenorhabditis elegans, and Drosophila melanogaster genomes. In the new predictor, the samples of DNA sequences were formulated by a novel feature vector called “pseudo k-tuple nucleotide composition”, into which six DNA local structural properties were incorporated.
This model, described in a recent issue of PNAS, captures both the dyad position within a few base pairs and the binding free energy within 2 k(B)T for all the known nucleosome positioning sequences. By applying Percus's equation to the derived energy landscape, the authors isolate sequence effects on genome-wide nucleosome occupancy from other factors that may influence nucleosome positioning. The authors claim that three parameters suffice to predict nucleosome occupancy with correlation coefficients of 0.74 for in vitro and 0.66 for in vivo systems.
A new predictor, called iNuc-PhysChem, was developed for identification of nucleosomal sequences by incorporating physicochemical properties of DNA into a 1788-D (dimensional) feature vector, which was further reduced to an 884-D vector via the IFS (incremental feature selection) procedure to optimize the feature set. In a cross-validation test on a benchmark dataset, the overall success rate achieved by iNuc-PhysChem was over 96% in identifying nucleosomal or linker sequences. The program is described in Chen et al., PLoS ONE, 2012.
Nu-OSCAR is a program that can be used to identify binding sites of known transcription factors, and which additionally incorporates nucleosome occupancy around sites in promoter regions. The derivation of the algorithm is based on a biophysical view of interactions between protein factors and nucleosomal DNA. The program currently supports only yeast transcription factors. For more details about Nu-OSCAR, see the authors' manuscript self-archived in 2007. This work is not indexed in PubMed.
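To give a feel for the sequence encoding, here is a minimal Python sketch of the plain k-tuple nucleotide composition, i.e. the normalized frequency of every k-mer in a sequence. It deliberately omits the "pseudo" components that add the structural-property correlations, so it is not the iNuc-PseKNC feature vector itself. The function name and the choice to skip k-mers containing non-ACGT symbols are assumptions made for the example.

```python
from itertools import product


def ktuple_composition(seq, k=2):
    """Return normalized k-mer frequencies in lexicographic ACGT order (4**k values)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
            total += 1
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


if __name__ == "__main__":
    vector = ktuple_composition("ACGTACGTTTAA", k=2)
    print(len(vector), round(sum(vector), 6))
```

For k = 2 this gives a 16-dimensional vector. A classifier would be trained on such vectors computed for nucleosomal and linker sequences.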
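The step from an energy landscape to occupancy can be sketched in Python as follows. This is not the authors' three-parameter model or their code. It is the standard equilibrium calculation for non-overlapping particles on a one-dimensional lattice, which is the same physical problem that Percus's equation treats, solved here with forward and backward partition-function recursions. Energies are assumed to be in units of kT. The function name, the `mu` parameter (chemical potential) and the default footprint are assumptions for the example. Plain floating point is used, so this is only suitable for short sequences; genome-scale input would need log-space arithmetic.

```python
import math


def nucleosome_occupancy(energies, footprint=147, mu=0.0):
    """Per-bp occupancy for hard-core particles of length `footprint`.

    energies[i] is the energy (in kT) of a nucleosome whose first bp is i,
    so the sequence length is len(energies) + footprint - 1.
    """
    n_starts = len(energies)
    n_bp = n_starts + footprint - 1
    weights = [math.exp(mu - e) for e in energies]  # Boltzmann weights

    # forward[i]: partition function of the first i base pairs
    forward = [1.0] * (n_bp + 1)
    for i in range(1, n_bp + 1):
        forward[i] = forward[i - 1]
        if i >= footprint:
            forward[i] += forward[i - footprint] * weights[i - footprint]

    # backward[i]: partition function of base pairs i .. n_bp-1
    backward = [1.0] * (n_bp + 1)
    for i in range(n_bp - 1, -1, -1):
        backward[i] = backward[i + 1]
        if i + footprint <= n_bp:
            backward[i] += weights[i] * backward[i + footprint]

    z = forward[n_bp]
    start_prob = [forward[i] * weights[i] * backward[i + footprint] / z
                  for i in range(n_starts)]

    # occupancy[j]: probability that bp j is covered by any nucleosome
    occupancy = [0.0] * n_bp
    running = 0.0
    for j in range(n_bp):
        if j < n_starts:
            running += start_prob[j]
        if j >= footprint:
            running -= start_prob[j - footprint]
        occupancy[j] = running
    return occupancy


if __name__ == "__main__":
    landscape = [0.0] * 50
    landscape[10] = -5.0  # one strongly favourable start position
    print([round(x, 2) for x in nucleosome_occupancy(landscape, footprint=5)])
```

Because neighbouring nucleosomes exclude each other, a single deep energy well also shapes the occupancy on either side of it, which is why occupancy cannot be read directly off the energy landscape.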
ICM Web allows users to assess nucleosome stability and fold any sequence of DNA into a 3D model of chromatin. The model is displayed in JSmol or can be downloaded. ICM takes a DNA sequence and generates (i) a nucleosome energy level diagram, (ii) coarse-grained representations of free DNA and chromatin and (iii) plots of the helical parameters (Tilt, Roll, Twist, Shift, Slide and Rise) as a function of position. Options allow the user to specify a set of nucleosome positions and control the level of thermal variation. See details here.
NucEnergGen is a downloadable C++ code. The algorithm is described here.
The FineStr server allows one to submit a genomic sequence and to detect the positions (centers) of the nucleosomes in it. The analysis is performed using a probe based on the DNA bendability matrix of C. elegans (Gabdank et al., 2009). The probe size is 117 bases, which corresponds to the assumed size of the DNA-histone contact area. Although the probe is based on the C. elegans bendability matrix, the authors suggest that the pattern is universal.
NuPoP is built upon a duration hidden Markov model, in which the linker DNA length is explicitly modeled. The nucleosome or linker DNA state model can be chosen as either a 4th order or 1st order Markov chain. NuPoP outputs the Viterbi prediction, nucleosome occupancy score (from backward and forward algorithms) and nucleosome affinity score. NuPoP is available in three formats: a web server prediction engine, a stand-alone Fortran program, and an R package. The latter two can predict nucleosome positioning for a DNA sequence of any length. NuPoP_F is the Fortran program, which allows customized compiling of the Fortran code.
Users can provide multiple sequences in FASTA format. The length of each sequence must be between 147 bp and 40 kb. The algorithm is based on the assignment of specific weights to the two main features of nucleosome positioning: the repetitions of dinucleotides and the 5-nucleotide motifs. These features have been extracted from genomic data for different species. Boundary conditions are taken as in DNA-ligand binding algorithms, so the distribution depends on the boundaries.
A MATLAB code containing a wavelet analysis based model for predicting nucleosome positions from DNA sequence information described in the following references: Yuan GC, Liu JS. Genomic sequence is highly predictive of local nucleosome depletion. PLoS Computational Biology 2007 [paper]; Yuan GC. Targeted recruitment of histone modifications in humans predicted by genomic sequences. J Comput Biol 2009 Feb;16(2):341-355. [paper]
nuScore calculates deformation energy and nucleosome-positioning score
NXSensor is a tool for finding regions of DNA sequences that are likely to be nucleosome-free. The basic idea behind NXSensor is that the DNA sequence which wraps around the nucleosome needs to have a certain degree of flexibility. Regions of DNA that have several rigid sequences close to each other are likely to be nucleosome-free.
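The idea of flagging clusters of rigid sequences can be sketched in a few lines of Python. This is an illustration only and not the NXSensor implementation. The definition of a rigid sequence used here (runs of at least 10 A or 10 T), the maximum gap between hits and the minimum number of hits per cluster are all assumptions chosen for the example.

```python
import re

# Assumed definition of a "rigid" segment: a homopolymeric A or T run >= 10 bp
RIGID_PATTERN = re.compile(r"A{10,}|T{10,}")


def rigid_clusters(seq, max_gap=50, min_hits=2):
    """Return (start, end) spans where >= min_hits rigid segments lie within max_gap bp."""
    hits = [(m.start(), m.end()) for m in RIGID_PATTERN.finditer(seq.upper())]
    clusters, current = [], []
    for hit in hits:
        # start a new cluster when the gap to the previous hit is too large
        if current and hit[0] - current[-1][1] > max_gap:
            if len(current) >= min_hits:
                clusters.append((current[0][0], current[-1][1]))
            current = []
        current.append(hit)
    if len(current) >= min_hits:
        clusters.append((current[0][0], current[-1][1]))
    return clusters


if __name__ == "__main__":
    sequence = "G" * 20 + "A" * 12 + "GC" * 10 + "T" * 10 + "G" * 200 + "A" * 15
    print(rigid_clusters(sequence))
```

An isolated rigid segment is not reported, since a single stiff stretch can still be accommodated within or next to a nucleosome; only closely spaced segments are treated as candidate nucleosome-free regions.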
4) Experimental protocols for high-throughput nucleosome positioning experiments
This special issue includes updated protocols from many leading labs in the field.
This collection contains ~30 protocols from different publications.
This protocol includes purification of human CD4+ T cells from lymphocytes and chromatin fragmentation using micrococcal nuclease (MNase) digestion, followed by chromatin immunoprecipitation (ChIP) and construction of a library for Illumina/Solexa sequencing.
This is the supplementary materials file from the paper by Segal et al., which describes their methods.
A user-friendly protocol that is understandable for a beginner.