Gene Regulation Info

Quantitative approaches for gene regulation

Epigenetic data & tools online


Protein-DNA binding: data, tools & models

Gene regulation is very complex, but since DNA provides a one-dimensional template for protein binding, most events can be described with one-dimensional lattice models. Basic features of DNA-protein-drug binding encountered in gene regulation include site specificity determined by the DNA sequence; overlapping binding sites; competition between different protein types or different binding modes; interactions between proteins bound to the DNA; multilayer binding (when a protein bound to the DNA presents a lattice for next-layer binding of other proteins); and protein-assisted DNA looping (Teif, NAR 2007; Teif, BJ 2010). In chromatin, additional complex elements such as nucleosomes, remodelers and higher-order chromatin structures should be taken into account (Teif and Rippe, NAR 2009; Teif and Rippe, JPCM 2010).
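To make the lattice picture concrete, here is a minimal sketch of the simplest case: a single ligand type that covers m lattice sites, with site-specific binding constants and overlapping binding sites, but no interactions between bound ligands, no multilayer binding and no looping. The function name `lattice_binding` and its arguments are illustrative assumptions, not taken from the cited papers, which treat far more general models.

```python
def lattice_binding(K, c, m):
    """Partition function and binding map for one ligand type on a 1D lattice.

    K[s] : binding constant (1/M) for a ligand covering sites s .. s+m-1
    c    : free ligand concentration (M)
    m    : number of lattice sites covered by one bound ligand
    Assumes non-interacting ligands (hypothetical simplification).
    """
    n_sites = len(K) + m - 1
    w = [c * k for k in K]  # statistical weight of each possible bound position

    # forward[i]: partition function of the first i lattice sites
    forward = [1.0] * (n_sites + 1)
    for i in range(1, n_sites + 1):
        forward[i] = forward[i - 1]                  # site i-1 is empty
        if i >= m:
            forward[i] += w[i - m] * forward[i - m]  # a ligand ends at site i-1

    # backward[j]: partition function of lattice sites j .. n_sites-1
    backward = [1.0] * (n_sites + 1)
    for j in range(n_sites - 1, -1, -1):
        backward[j] = backward[j + 1]                # site j is empty
        if j + m <= n_sites:
            backward[j] += w[j] * backward[j + m]    # a ligand starts at site j

    Z = forward[n_sites]
    # probability that a ligand is bound with its left end at site s
    p_start = [forward[s] * w[s] * backward[s + m] / Z for s in range(len(K))]
    return Z, p_start
```

For example, `lattice_binding([1e6, 1e8, 1e6, 1e6], c=1e-8, m=3)` returns the partition function and the start-site probabilities for a six-site lattice with one high-affinity position. Because overlapping positions exclude each other, the probabilities are not simply independent Langmuir isotherms.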

Below is an annotated list of online resources where parameters for such calculations may be obtained:

 

Protein-DNA binding databases (thermodynamics & weight matrices):

The JASPAR CORE database contains a curated, non-redundant set of profiles, derived from published collections of experimentally defined transcription factor binding sites for eukaryotes. The prime difference from TRANSFAC is the open data access.

KDBI is a collection of experimentally determined kinetic data of protein-protein, protein-RNA, protein-DNA, protein-ligand, RNA-ligand and DNA-ligand binding events described in the literature. As of February 2010, KDBI contains 19,263 records.

ProNIT currently contains more than 4900 entries. Each entry has the protein and nucleic acid information, experimental conditions and the following binding thermodynamic data: dissociation constant Kd, energies, stoichiometry of binding and activity (Km and kcat).
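When a database entry reports only a dissociation constant, the standard binding free energy follows from the textbook relation ΔG° = RT ln(Kd / 1 M). A minimal sketch (the function name and the default temperature are assumptions for illustration, not part of ProNIT):

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)

def binding_free_energy(kd_molar, temperature_kelvin=298.15):
    """Standard binding free energy (kJ/mol) from a dissociation constant.

    Uses dG = R*T*ln(Kd / 1 M); kd_molar must be given in mol/L.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KJ * temperature_kelvin * math.log(kd_molar)
```

A nanomolar binder (Kd = 1e-9 M) at 25 °C comes out at roughly -51 kJ/mol. Stronger binding (smaller Kd) gives a more negative value.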

UniPROBE contains data on the preferences of proteins for all possible sequence variants ('words') of length k ('k-mers'), as well as position weight matrix (PWM) and graphical sequence logo representations of the k-mer data. In total, the database currently hosts DNA binding data for 391 nonredundant proteins (individual proteins or in some cases heterodimers) from a diverse collection of organisms.
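As an illustration of how a position weight matrix from such a database is typically used, the sketch below converts a count matrix into log-odds scores and scans a sequence for the best-scoring window. The pseudocount, the uniform background and the function names are assumptions chosen for the example. The databases above distribute matrices in their own formats.

```python
import math

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log2-odds PWM.

    counts: list of dicts, one per motif position, with keys A, C, G, T.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2((column[b] + pseudocount) / total / background)
            for b in "ACGT"
        })
    return pwm

def scan(sequence, pwm):
    """Score every window of len(pwm) bases along the sequence."""
    k = len(pwm)
    return [
        sum(pwm[j][sequence[i + j]] for j in range(k))
        for i in range(len(sequence) - k + 1)
    ]

# Toy count matrix for a TATA-like motif (made-up numbers).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
scores = scan("GGTATAGG", log_odds_pwm(toy_counts))
best_start = scores.index(max(scores))
```

Here `best_start` points at the window that matches the toy consensus. In a thermodynamic model the score is often treated as proportional to a binding free energy, but that mapping needs calibration against measured affinities.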

TRANSFAC consists of free and paid sections. I did not check the paid section. Human TF weight matrices may be viewed through the web interface of the UCSC Genome Browser. The binding sites provided are experimentally verified.


Last Updated on Friday, 13 August 2010 20:23
 

Epigenetic modifications databases

Histone post-translational modification and DNA methylation databases:

  • HHMD - Human Histone Modification Database (data sources include ENCODE, Zhao lab and some others)
  • Zhao Lab (NIH) - Genome-wide mapping of histone H3 modifications in human CD4+ T cells:

| Methylation, H2A.Z, CTCF & Pol II occupancies (Barski et al., 2007) | Acetylation data (Wang et al., 2008) |

  • HistoneHits - phenotypes for systematic collections of histone mutants
  • MethDB - the database for DNA methylation and environmental epigenetic effects
  • EpiGRAPH - software for advanced (epi-) genome analysis and prediction
  • ChromatinDB - visualising histone modification data (last update in 2007?)
  • Also have a look here
Last Updated on Tuesday, 03 August 2010 12:23
 

Nucleosome positioning data and prediction tools

In eukaryotes, the wrapping of DNA around the histone octamer complex has long been recognized as a mechanism to regulate DNA access for other proteins. Nucleosome positions are determined by three major contributions: the intrinsic, DNA sequence-dependent binding affinity of the histone octamer; competitive or cooperative binding of other protein factors; and active translocation by ATP-dependent remodeling complexes. The challenge is to take all these contributions into account and predict nucleosome repositioning in gene regulation in vivo (Teif and Rippe, NAR 2009; Teif and Rippe, JPCM 2010). Below is a collection of annotated online resources that deal with nucleosome positioning. This list is constantly updated; comments are very welcome.

 

1) Software to analyze ChIP-Chip and ChIP-Seq experiments on nucleosome positioning:

NPS is a Python software package that identifies nucleosome positions from histone-modification ChIP-seq or nucleosome sequencing data at nucleosome-level resolution. NPS obtains a continuous waveform that represents the enrichment of histone modifications (or nucleosomes) by extending each tag (25 nt, Solexa) to 150 nt in the 3' direction and taking the middle 75 nt, and then detects nucleosome positions using Laplacian of Gaussian edge detection.
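The two steps described above (tag extension, then Laplacian of Gaussian filtering) can be sketched in a few lines of NumPy. This is not the NPS code: the function names, the kernel width `sigma`, the kernel truncation at four sigma and the zero-crossing rule are all assumptions made for illustration.

```python
import numpy as np

def tag_profile(tags, length, extend=150, keep=75):
    """Build a coverage profile from (five_prime_position, strand) tags.

    Each tag is extended to `extend` nt in its 3' direction and only the
    middle `keep` nt of the extended fragment are counted.
    """
    profile = np.zeros(length)
    offset = (extend - keep) // 2
    for pos, strand in tags:
        if strand == "+":
            start = pos + offset              # fragment runs rightwards from pos
        else:
            start = pos - offset - keep + 1   # fragment runs leftwards from pos
        lo, hi = max(start, 0), min(start + keep, length)
        if lo < hi:
            profile[lo:hi] += 1
    return profile

def log_kernel(sigma):
    """Discrete 1D Laplacian of Gaussian, shifted to sum to zero."""
    radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = (x**2 / sigma**4 - 1.0 / sigma**2) * np.exp(-x**2 / (2 * sigma**2))
    return kernel - kernel.mean()

def log_edges(profile, sigma=10.0):
    """Return indices where the LoG response changes sign, and the response."""
    response = np.convolve(profile, log_kernel(sigma), mode="same")
    sign = np.sign(response)
    crossings = np.nonzero(sign[:-1] * sign[1:] < 0)[0]
    return crossings, response

# Toy data: five tags on each strand flanking one 150 nt fragment.
tags = [(100, "+")] * 5 + [(248, "-")] * 5
profile = tag_profile(tags, length=400)
crossings, response = log_edges(profile)
```

In this toy case the sign changes of the filtered signal fall at the two borders of the enriched region, and the response is negative inside it. Real data would need strand-aware deduplication, noise thresholds and a choice of `sigma` matched to nucleosome size.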

ChIPseqR is an R library included in Bioconductor. ChIPseqR provides functions to identify nucleosome positions in high-throughput sequencing data. A formal description can be found at the Bioconductor web site. A more detailed description of the methods is in the author's Ph.D. thesis.

Matlab source code for estimating nucleosome positions from microarray data, assuming that stably positioned nucleosomes are characterized by a well-defined coverage profile. This average nucleosome profile is derived from the genome-wide distribution, and the genome is then scanned to find peaks matching this profile. Currently works only with microarray data, not with Solexa data.

Ceres currently contains MNase-seq and MNase-microarray nucleosome positioning data from Shivaswamy et al. (2008) and from Lee et al. (2007). It is interesting to see that the nucleosome positions given by these two methods are not identical.

 

2) Software to predict preferential nucleosome positions from DNA sequence:

The FineStr server allows one to submit a genomic sequence and to detect the positions (centers) of the nucleosomes in it. The analysis is performed using a probe based on the DNA bendability matrix of C. elegans (Gabdank et al., 2009). The probe size is 117 bases, which corresponds to the assumed size of the DNA-histone contact area. Although the probe is based on the C. elegans bendability matrix, the authors suggest that the pattern is universal.

NuPoP is built upon a duration hidden Markov model, in which the linker DNA length is explicitly modeled. The nucleosome or linker DNA state model can be chosen as either a 4th order or a 1st order Markov chain. NuPoP outputs the Viterbi prediction, a nucleosome occupancy score (from the backward and forward algorithms) and a nucleosome affinity score. NuPoP is available in three forms: a web server prediction engine, a stand-alone Fortran program, and an R package. The latter two can predict nucleosome positioning for a DNA sequence of any length. NuPoP_F is the Fortran program, which allows customized compiling of the Fortran code.

Users can provide multiple sequences in FASTA format. The length of each sequence must be between 147 bp and 40 kb. The algorithm is based on the assignment of specific weights to the two main features of nucleosome positioning: the repetitions of dinucleotides and the 5-nucleotide motifs. These features have been extracted from genomic data for different species. Boundary conditions are taken as in DNA-ligand binding algorithms.

nuScore calculates the deformation energy and a nucleosome-positioning score.

ICM Web allows users to assess nucleosome stability and fold sequences of DNA into putative chromatin templates. It takes a DNA sequence and generates (i) a nucleosome energy level diagram, (ii) coarse-grained representations of free DNA and chromatin and (iii) plots of the helical parameters (Tilt, Roll, Twist, Shift, Slide and Rise) as a function of position.

NXSensor is a tool for finding regions of DNA sequences that are likely to be nucleosome-free. The basic idea behind NXSensor is that the DNA sequence that wraps around the nucleosome needs to have a certain degree of flexibility. Regions of DNA that have several rigid sequences close to each other are likely to be nucleosome-free.

 

3) Experimental ChIP-Chip and ChIP-Seq data on nucleosome positioning:

Last Updated on Saturday, 14 August 2010 17:12