SNPeffect

About


1. Welcome to SNPeffect 4.0

SNPeffect is a database for phenotyping human single nucleotide polymorphisms (SNPs). SNPeffect primarily focuses on the molecular characterization and annotation of disease and polymorphism variants in the human proteome.
We provide a detailed variant analysis using our tools, including:

  • TANGO to predict aggregation-prone regions
  • WALTZ to predict amyloidogenic regions
  • LIMBO to predict Hsp70 chaperone binding sites
  • FoldX to analyze the effect on structural stability

Further, SNPeffect holds per-variant annotations on functional sites, structural features and post-translational modifications.

SNPeffect 4.0 introduces two new features.
The meta-analysis tool enables scientists to carry out large-scale mining of SNPeffect data and to visualize the results in a graph.
It is now also possible to submit custom single protein variants for a detailed phenotypic analysis.

2. Changelog, bugfixes and known bugs

We are determined to provide the best possible user experience with SNPeffect 4.0, so any feedback on bugs or other issues is welcome and will be addressed as soon as possible.
An up-to-date list of bug fixes and improvements can be viewed in the changelog.
Unresolved issues are listed in the known issues section.

3. Methodology

3.1. SNPeffect pipeline for phenotyping coding protein variants

The scheme below shows all the steps involved in phenotyping a variant with SNPeffect. The resulting information is stored in the pre-calculated SNPeffect database.


3.2. Datasets of variants

SNPeffect currently contains more than 60,000 human variants gathered from the humsavar list available on the UniProt website. This list includes three types of variants:

  • Polymorphisms: variants not implicated in a disease
  • Disease mutations: normally these are variants proven to be responsible for a disease, but variants that occur in disease samples and not in controls are also listed here (the role of the mutant in disease sometimes still has to be clarified)
  • Unclassified: these are mostly mutations found in disease samples for which it has not yet been proven that the variant is responsible for the disease; these mutations might also appear in controls
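To illustrate how these three categories map onto the humsavar list, the following minimal Python sketch counts variants per category. It assumes the whitespace-separated humsavar layout in which the fifth column holds the variant category (gene name, Swiss-Prot accession, FTId, amino acid change, category, ...). The function name is hypothetical, and other humsavar releases may label the categories differently.

```python
from collections import Counter
from typing import Iterable

# Category labels as named in the SNPeffect text; other humsavar
# releases may label categories differently (assumption).
CATEGORIES = ("Polymorphism", "Disease", "Unclassified")


def count_variant_categories(lines: Iterable[str]) -> Counter:
    """Count variants per category in humsavar-style text lines.

    Assumes whitespace-separated columns with the variant category in the
    fifth column; header, footer and malformed lines are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 5 and fields[4] in CATEGORIES:
            counts[fields[4]] += 1
    return counts
```

With a local copy of the file, `with open("humsavar.txt") as fh: print(count_variant_categories(fh))` would print the tally (the file name is an assumption).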

3.3. Phenotyping

SNPeffect's variant phenotyping and characterization involves applying several algorithms to each mutant and wild type protein. Below is a summary of the algorithms used.
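In outline, each predictor is run on both the wild type and the mutant sequence and the scores are compared. The Python sketch below is a hypothetical illustration, not SNPeffect code: `apply_substitution` and `score_differences` are made-up helpers, variants are assumed to be written as HGVS-style protein changes (e.g. `p.His3Arg`, 1-based position), and each predictor is assumed to map a sequence to a single number, whereas real TANGO, WALTZ and LIMBO outputs are richer.

```python
import re
from typing import Callable, Dict

# Three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def apply_substitution(seq: str, change: str) -> str:
    """Apply a single substitution such as 'p.His3Arg' (1-based) to seq."""
    m = re.fullmatch(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", change)
    if m is None:
        raise ValueError(f"unsupported change: {change}")
    wt = THREE_TO_ONE.get(m.group(1))
    mut = THREE_TO_ONE.get(m.group(3))
    pos = int(m.group(2))
    if wt is None or mut is None:
        raise ValueError(f"unknown residue code in {change}")
    # Guard against a variant that does not match the wild-type sequence.
    if not 1 <= pos <= len(seq) or seq[pos - 1] != wt:
        raise ValueError(f"wild-type residue mismatch at position {pos}")
    return seq[:pos - 1] + mut + seq[pos:]


def score_differences(
    seq: str, change: str, predictors: Dict[str, Callable[[str], float]]
) -> Dict[str, float]:
    """Return the mutant-minus-wild-type score for each predictor."""
    mutant = apply_substitution(seq, change)
    return {name: fn(mutant) - fn(seq) for name, fn in predictors.items()}
```

Checking the wild-type residue before mutating catches variants that were mapped to the wrong protein isoform or numbering.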

3.3.1. TANGO

TANGO is an algorithm that detects aggregation-prone regions in protein sequences by analyzing hydrophobicity and beta-sheet forming propensity. TANGO has a broader scope than WALTZ, because it detects both amorphous and amyloid aggregation-prone regions.
The presence of TANGO regions does not necessarily imply that the protein readily aggregates. Such regions are normally buried in the protein core, but when they become exposed due to other factors (e.g. a structurally destabilizing mutation), protein aggregation can become more prominent. TANGO score differences between wild type and mutant outside the range of -50 to +50 are considered significant.
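The significance rule above reduces to a simple check. This sketch assumes the difference is computed as mutant minus wild type and that a difference of exactly -50 or +50 counts as not significant, since the text does not specify the boundary; the function name is hypothetical.

```python
# From the text: differences outside -50..+50 are considered significant.
TANGO_THRESHOLD = 50.0


def tango_change_is_significant(wt_score: float, mutant_score: float) -> bool:
    """True if the mutant-minus-wild-type TANGO difference lies outside [-50, +50]."""
    delta = mutant_score - wt_score
    return abs(delta) > TANGO_THRESHOLD
```

For example, a wild-type score of 120 and a mutant score of 35 give a difference of -85, which is flagged as significant, while scores of 10 and 55 (difference +45) are not.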

3.3.2. WALTZ

WALTZ is an algorithm that accurately and specifically predicts amyloid-forming regions in protein sequences. It is thus more specific in terms of aggregate morphology than TANGO.
The WALTZ algorithm was trained with 265 sequences and uses a position-specific scoring matrix to score protein sequences for amyloid-forming propensity.
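The text states only that WALTZ scores sequences with a position-specific scoring matrix. The generic sliding-window PSSM scoring below illustrates that idea; it is not the WALTZ matrix or its exact scoring function. Here the window length equals the number of matrix columns, and any matrix values or threshold you pass in are placeholders, not WALTZ parameters.

```python
from typing import Dict, List, Tuple


def pssm_window_scores(
    seq: str, pssm: List[Dict[str, float]]
) -> List[Tuple[int, float]]:
    """Score every window of len(pssm) residues; returns (1-based start, score)."""
    w = len(pssm)
    scores = []
    for start in range(len(seq) - w + 1):
        window = seq[start:start + w]
        # Sum the per-position scores; residues absent from a column score 0.
        total = sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))
        scores.append((start + 1, total))
    return scores


def regions_above(scores: List[Tuple[int, float]], threshold: float) -> List[int]:
    """Return the window start positions whose score exceeds the threshold."""
    return [start for start, s in scores if s > threshold]
```

Because each column scores one position of the window, the matrix captures position-specific preferences that a simple composition score would miss.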

3.3.3. LIMBO

LIMBO is a chaperone binding site predictor for Hsp70 chaperones, built from peptide binding data and structural modeling. Accurate prediction of Hsp70 binding sites is an essential prerequisite for understanding the precise function of these chaperones and the properties of their substrate proteins. LIMBO currently predicts binding sites only for DnaK, the bacterial Hsp70 homolog. Nevertheless, early experiments have indicated (data not shown) that human and bacterial Hsp70 share a certain degree of similarity in their substrate recognition patterns.

3.3.4. FoldX

If structural information is available, the empirical protein design force field FoldX is used to calculate the change in free energy caused by the mutation (ddG). SNPeffect also reports whether the mutation stabilizes or destabilizes the structure.

The decision is based on the following rules:

  • No effect on stability: ddG is between -0.5 and +0.5 kcal/mol
  • Slightly reduced stability: ddG is between +0.5 and +1 kcal/mol
  • Reduced stability: ddG is between +1 and +5 kcal/mol
  • Severely reduced stability: ddG is higher than +5 kcal/mol
  • Slightly enhanced stability: ddG is between -0.5 and -1 kcal/mol
  • Enhanced stability: ddG is between -1 and -5 kcal/mol
  • Greatly enhanced stability: ddG is lower than -5 kcal/mol
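The rules above translate directly into a classification function. This is a minimal sketch with a hypothetical function name: the text does not say which category an exact boundary value (such as +1.0 kcal/mol) falls into, so the inclusive/exclusive choices below are assumptions.

```python
def classify_ddg(ddg: float) -> str:
    """Map a FoldX ddG (kcal/mol) to the SNPeffect stability categories.

    Boundary handling is an assumption: here an exact boundary value is
    assigned to the category closer to zero.
    """
    if -0.5 <= ddg <= 0.5:
        return "No effect on stability"
    if ddg > 0:
        # Positive ddG: reduced stability.
        if ddg <= 1.0:
            return "Slightly reduced stability"
        if ddg <= 5.0:
            return "Reduced stability"
        return "Severely reduced stability"
    # Negative ddG: enhanced stability.
    if ddg >= -1.0:
        return "Slightly enhanced stability"
    if ddg >= -5.0:
        return "Enhanced stability"
    return "Greatly enhanced stability"
```

For example, `classify_ddg(3.0)` returns "Reduced stability" and `classify_ddg(-0.7)` returns "Slightly enhanced stability".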

A molecular graphics image made with YASARA (http://www.yasara.org) is also shown on the variant page.

3.3.5. Functional sites and structural features

  • Catalytic site: this feature indicates whether the mutated position is part of a catalytic site. Information comes from the Catalytic Site Atlas.
  • Secondary structure annotations were derived from UniProt.
  • Transmembrane topology was derived using TMHMM from CBS.

3.3.6. Cellular processing and post-translational modification

  • Farnesylation, geranylgeranylation, myristoylation, GPI anchor and PTS1 targeting information were analyzed by Dr. Sebastian Maurer-Stroh at Bioinformatics Institute A*STAR Singapore.
  • Subcellular localization was predicted using PSORT.

3.3.7. Domain annotation

For each variant and its wild type, a domain organization cartoon is shown. On the variant page, it is additionally indicated whether the mutated residue falls inside a known domain.

  • SMART information was kindly provided by Dr. Ivica Letunic
  • Pfam information was parsed from the Pfam website
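Checking whether a mutated residue falls inside a known domain amounts to an interval test. The sketch below assumes 1-based, inclusive domain coordinates; the `Domain` type and `domains_at` helper are hypothetical and not part of SNPeffect.

```python
from typing import List, NamedTuple


class Domain(NamedTuple):
    name: str
    start: int  # 1-based, inclusive (assumption)
    end: int    # 1-based, inclusive (assumption)


def domains_at(position: int, domains: List[Domain]) -> List[str]:
    """Names of all annotated domains whose span contains the position."""
    return [d.name for d in domains if d.start <= position <= d.end]
```

Returning a list rather than a single name allows for overlapping annotations, for example when SMART and Pfam both report a domain covering the same residue.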

3.4. Structural analysis of variants

FoldX is used to calculate the free energy difference between wild type and mutant. The result is the ddG (delta delta G, the change in free energy upon mutation) expressed in kcal/mol. The error margin of FoldX is about 0.5 kcal/mol, so changes within that range are considered insignificant.
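Written out, ddG is the difference between the mutant and wild-type free energies. The sign convention below is inferred from the stability rules in section 3.3.4, where a positive ddG means reduced stability; the numbers in the example are hypothetical:

```latex
\Delta\Delta G = \Delta G_{\text{mutant}} - \Delta G_{\text{wild type}}

% Hypothetical example values (kcal/mol):
\Delta\Delta G = (-8.2) - (-10.0) = +1.8
```

A result of +1.8 kcal/mol is larger than the 0.5 kcal/mol error margin and falls in the "Reduced stability" band (+1 to +5 kcal/mol).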
To obtain structural information, we first check the UniProt cross-references to see whether a PDB structure is available for the protein and whether the variant position is actually resolved in the structure. We only consider X-ray crystal structures and perform a side-chain minimization with FoldX to obtain results that approach experimental values.
When no structure with a sequence identical to the protein can be found, we consider crystal structures with at least 90% sequence identity to the protein sequence in order to build a homology model with FoldX. This sequence identity percentage is always indicated on the variant page.
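The structure selection steps described above can be sketched as follows. The `Template` fields, the treatment of an "identical match" as 100% sequence identity, and the choice of the highest-identity template among homologs are all assumptions for illustration; the text does not describe how SNPeffect ranks multiple candidates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Template:
    pdb_id: str
    method: str            # experimental method, e.g. "X-ray" or "NMR"
    identity: float        # sequence identity to the protein, in percent
    covers_position: bool  # is the variant residue resolved in the structure?


MIN_IDENTITY = 90.0  # from the text: at least 90% identity for homology models


def choose_template(candidates: List[Template]) -> Optional[Template]:
    """Prefer an identical X-ray structure, else an X-ray template >= 90% identity."""
    usable = [t for t in candidates if t.method == "X-ray" and t.covers_position]
    identical = [t for t in usable if t.identity == 100.0]
    if identical:
        return identical[0]
    homologs = [t for t in usable if t.identity >= MIN_IDENTITY]
    # Ranking by identity is an assumption; returns None if nothing qualifies.
    return max(homologs, key=lambda t: t.identity, default=None)
```

A `None` result corresponds to the case where no FoldX analysis can be run for the variant.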
Main-chain and side-chain burial are also indicated, on a scale from 0 (completely exposed) to 1 (completely buried).

3.5. Technicalities

SNPeffect is stored as a relational MySQL database and the website is built with Drupal.

4. Downloads

Requesting subsets of the SNPeffect database using the filter and the CSV or XLS buttons works without problems for up to 5000 results. However, downloading more results, or the full database on the Database page, with the CSV or XLS buttons is not possible: the request is too large and will eventually time out the server. Instead, you can download the full SNPeffect web tables directly here:

5. Credits

  • SNPeffect 4.0 was developed by Greet De Baets and Joost Van Durme
  • Previous versions of SNPeffect were developed by Joke Reumers
  • The SNPeffect project was designed by Joost Schymkowitz and Frederic Rousseau

6. Contact support

Please use the contact form for support. We will reply by e-mail as soon as possible.

7. References

SNPeffect 4.0: on-line prediction of molecular and structural effects of protein-coding variants
Greet De Baets, Joost Van Durme, Joke Reumers, Sebastian Maurer-Stroh, Peter Vanhee, Joaquin Dopazo, Joost Schymkowitz and Frederic Rousseau
Nucleic Acids Res. 2012 Jan;40(1):D935-9.

Joint annotation of coding and non coding single nucleotide polymorphisms and mutations in the SNPeffect and PupaSuite databases
Joke Reumers, Lucia Conde, Ignacio Medina, Sebastian Maurer-Stroh, Joost Van Durme, Joaquin Dopazo, Joost Schymkowitz and Frederic Rousseau
Nucleic Acids Res. 2008 Jan;36(Database issue):D825-9.

SNPeffect v2.0: a new step in investigating the molecular phenotypic effects of human non-synonymous SNPs
Joke Reumers, Sebastian Maurer-Stroh, Joost Schymkowitz and Frederic Rousseau
Bioinformatics. 2006;22:2183-5.

SNPeffect: a database mapping molecular phenotypic effects of human non-synonymous coding SNPs
Joke Reumers, Joost Schymkowitz, Jesper Ferkinghoff-Borg, Francois Stricher, Luis Serrano and Frederic Rousseau
Nucleic Acids Res. 2005;33:D527-32.