SNPeffect

Help


1. General site information

1.1. Is registration necessary?

Not for browsing. As an unregistered visitor you can access the database and the meta-analysis tool. However, since May 2013, registration is required to submit jobs to the SNPeffect server. We had to make this decision because of continuing abuse of the server by spammers and bots. Registration is free and provides a private job account: the results of your submitted SNPeffect jobs are stored in your personal account and can only be seen and downloaded by you. You will also receive an e-mail when a job has finished.

2. Introductory video

The kind people at OpenHelix made a great introductory video of SNPeffect4. Take a look!

The original post can be watched on the OpenHelix site at http://blog.openhelix.eu/?p=10257

3. Searching the database

3.1. Explaining the filter

To fine-tune your search in the database, use the filter that is always visible on the right side of the database content list. The options you can set are:


  1. Which pathology you want to filter on. All diseases that are present in the database and match your search are auto-suggested, but you can enter any search term.

  2. Filter on the three possible Mutation types: Disease, Polymorphism or Unclassified. Unclassified is an annotation from UniProt indicating that this variant was found in a disease sample, but it is not yet proven that this variant is responsible for the disease.

  3. Search directly for the protein of interest. The identifier needs to correspond with the UniProt entry name (e.g. P53_HUMAN).

  4. Search by Gene Name (e.g. TP53)

  5. Search by dbSNP identifier (e.g. rs35163653)

  6. TANGO predicts the aggregation-prone regions in a protein. You can search for mutations that increase (dTANGO > 50), decrease (dTANGO < -50) or do not affect aggregation propensity (dTANGO between -50 and 50).

  7. WALTZ predicts the amyloidogenic regions in a protein. You can search for mutations that increase (dWALTZ > 50), decrease (dWALTZ < -50) or do not affect amyloid propensity (dWALTZ between -50 and 50).

  8. LIMBO predicts the Hsp70 chaperone binding sites. You can search for mutations that increase (dLIMBO > 50), decrease (dLIMBO < -50) or do not affect chaperone binding (dLIMBO between -50 and 50). Currently, LIMBO only detects sites recognized by the bacterial Hsp70 (DnaK), but there is evidence (data not shown) that human Hsp70 binding sites and DnaK binding sites share a certain level of similarity. Dedicated human Hsp70 predictors will be implemented in the future.

  9. ddG is the change in free energy caused by the mutation, as calculated by FoldX. Destabilizing mutations increase the ddG (positive values), whereas stabilizing mutations decrease it (negative values). Since the FoldX error margin is around 0.5 kcal/mol, changes within this margin are considered insignificant.
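
The cutoffs used by the filter can be summarized in a few lines of Python. This is only an illustrative sketch, not the code running on the SNPeffect server: the function names and the returned labels are hypothetical, and treating the FoldX error margin as a symmetric window of -0.5 to 0.5 kcal/mol is an assumption based on the description above.

```python
def classify_delta(score_change, threshold=50.0):
    """Classify a dTANGO, dWALTZ or dLIMBO value with the +/-50 cutoffs of the filter."""
    if score_change > threshold:
        return "increases"
    if score_change < -threshold:
        return "decreases"
    return "no effect"


def classify_ddg(ddg, error_margin=0.5):
    """Classify a FoldX ddG in kcal/mol.

    Assumption: values between -error_margin and +error_margin are insignificant.
    """
    if ddg > error_margin:
        return "destabilizing"
    if ddg < -error_margin:
        return "stabilizing"
    return "insignificant"


print(classify_delta(120.0))  # increases
print(classify_delta(-10.0))  # no effect
print(classify_ddg(2.3))      # destabilizing
```

A value that lies exactly on a cutoff (for example dTANGO = 50) is reported as "no effect" here, which matches the "between -50 and 50" wording of the filter.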

3.2. Example

For example, to search for disease mutations that are structurally destabilizing, increase aggregation propensity and are related to cancer, set the following parameters:


By clicking on the variant ID in the result list you get more information about the effect of this mutation. Details are given in the next section.

3.3. Detailed variant information

After clicking on the variant ID, you first see a global summary on the dedicated variant page:


It contains:

  1. the effect of the mutation on the aggregation tendency (amorphous aggregates as well as amyloids) and on chaperone binding. For each predictor, the difference compared to the wild type is shown together with the resulting decision, along with a comparison of the number of predicted zones in the mutant and the wild type.

  2. the location of the mutation in the context of protein domains (SMART and PFAM).

  3. the effect on protein stability (FoldX), including a molecular image of the mutation site and an indication of where the change falls within the distribution of all disease mutations (N=6099) and polymorphisms (N=4266).

Below the summary you can find more details in the feature-specific tabs. These tabs are collapsed by default; click on a tab to expand it and see a detailed overview.


When analyzing the effect on amorphous aggregates (TANGO), the first result is an overview of the short TANGO stretches in the wild type (WT) and mutant (MT) protein. The number of stretches is indicated (red square). If this number differs, it is worth investigating which aggregation stretch has been added or lost due to the variant.


You also get a difference plot for the TANGO score (see below), in which the WT score has been subtracted from the MT score (the dTANGO). A value above 50 indicates an increase in aggregation tendency; a value below -50 indicates a decrease. Large positive peaks therefore indicate increased aggregation propensity due to the mutation, and large negative peaks indicate decreased aggregation propensity.
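
As a rough illustration of how such a difference plot is obtained, the sketch below subtracts a WT profile from an MT profile position by position. It assumes that both profiles are available as equally long lists of per-residue scores; the function name and the example numbers are made up for illustration and are not SNPeffect output.

```python
def difference_profile(wt_scores, mt_scores):
    """Return MT minus WT for every residue position (both lists must be equally long)."""
    if len(wt_scores) != len(mt_scores):
        raise ValueError("WT and MT profiles must have the same length")
    return [mt - wt for wt, mt in zip(wt_scores, mt_scores)]


# Hypothetical per-residue scores for a five-residue stretch.
wt = [0.0, 5.0, 80.0, 90.0, 10.0]
mt = [0.0, 5.0, 20.0, 15.0, 10.0]

print(difference_profile(wt, mt))  # [0.0, 0.0, -60.0, -75.0, 0.0]
```

In this made-up example the negative values at positions 3 and 4 correspond to a negative peak in the plot, i.e. a loss of aggregation propensity in the mutant.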


Finally, there is the full TANGO score profile of the variant and, if available, the colored PDB structure indicating where the TANGO stretches are located in the WT and MT protein.


The same information is available for the WALTZ and LIMBO predictors.

If structural information is present, another available tab is the ‘FOLDX structural profile’. This tab includes information on the PDB structure used, the percentage of homology between the protein sequence and that structure (if no structure is available, we build homology models with no less than 90% sequence identity between target and template) and the free energy change. As extra information, plots are available that visualize the WT and MT residue together with the residues it contacts within a distance of 5 Ångström.


Other available information is contained in the ‘Functional sites, structural features and cellular processing’ tab, which is divided into two subcategories:

  1. Functional sites and structural features: mentions whether the mutated residue is a known catalytic site or belongs to a particular secondary structure or transmembrane topology.
  2. Cellular processing: mentions whether lipid modification and subcellular localization are affected by the mutation.

For both categories, we make use of external predictors. More information is available in the about section.


An overview of all the mutations listed in this protein is given at the bottom of the page.

4. Meta-analysis

The meta-analysis tool enables scientists to carry out large-scale mining of SNPeffect data and to visualize the results in a graph.
First, select the mutation type you want to filter on. The possibilities are: Disease, Polymorphism, Unclassified or All. Only variants of the selected type(s) will be plotted.
Next, you can fine-tune on a particular disease or group of diseases by selecting one or more diseases from the list and/or by specifying a keyword (e.g. cancer, neurodegeneration, …). Fine-tuning Disease type variants searches for variants annotated with that disease. Fine-tuning Unclassified or Polymorphism type variants searches for variants from proteins annotated with that disease.


Next, indicate which features you would like to plot on the X- and Y-axis. The possibilities are TANGO change (dTANGO), WALTZ change (dWALTZ), LIMBO change (dLIMBO) and Stability change (ddG). This results in a scatter plot that shows the relationship between the X and Y variable for the selected dataset. If the number of hits for one of the categories (disease, polymorphism or unclassified) exceeds 500, the average Y is plotted for each X bin.
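
To make the binning step more concrete, here is a minimal Python sketch of how the average Y per X bin could be computed once a category has more than 500 hits. The bin width of 10 and the use of the bin centre as X coordinate are assumptions chosen for illustration; the help text does not state which values the server uses.

```python
import math
from collections import defaultdict


def bin_averages(points, bin_width=10.0, max_points=500):
    """Return the points unchanged, or (bin centre, mean Y) pairs if there are too many.

    points is a list of (x, y) tuples; bin_width is a hypothetical choice.
    """
    if len(points) <= max_points:
        return list(points)

    # Collect the running sum of Y and the number of points for every X bin.
    sums = defaultdict(lambda: [0.0, 0])
    for x, y in points:
        index = math.floor(x / bin_width)
        sums[index][0] += y
        sums[index][1] += 1

    return [
        ((index + 0.5) * bin_width, total / count)
        for index, (total, count) in sorted(sums.items())
    ]


# 600 made-up points with X between 0 and 19 end up in two bins.
points = [(float(i % 20), 1.0) for i in range(600)]
print(bin_averages(points))  # [(5.0, 1.0), (15.0, 1.0)]
```

With 500 points or fewer the function returns the raw points, which corresponds to the ordinary scatter plot.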


If your query does not return any hits, the most likely reason is the following:

  • you chose ddG as the X- or Y-axis, but no structural information is available for the resulting variants, so the ddG cannot be plotted.

E.g. to plot all mutations related to 'cancer' or the 'Ehlers-Danlos syndrome', set the following parameters:


This will result in a dataset of all polymorphisms, disease mutations and unclassified mutations related to 'cancer' or the 'Ehlers-Danlos syndrome'. As X- and Y-axis, we chose TANGO difference (X) and Stability change (Y).

This results in 3 plots:

1) Scatter plot indicating a shift toward decreased stability for the disease mutations.


2) Frequency plot and boxplot for the 'Difference in Tango score' (X-variable) for the different mutation types.


3) Frequency plot and boxplot for the 'Difference in Stability' (Y-variable) for the different mutation types.


5. Submitting jobs

In SNPeffect 4.0 it is now possible to submit custom single protein variants for a detailed phenotypic analysis including TANGO, WALTZ, LIMBO and FoldX analyses.

5.1. Input formats

The first thing you need to do is choose which type of input you are providing:

  • Fasta sequence: protein sequence in FASTA format.
  • PDB file: a PDB file containing atom coordinates. Please note that PDB files should comply with PDB standards.
  • PDB ID: a valid PDB file ID. The PDB file will be automatically retrieved from the PDB.
  • UniProt ID: a valid UniProt ID. The FASTA sequence will be automatically retrieved from UniProt.

5.2. Tips

  • If "UniProt ID" or "Fasta sequence" was chosen as input format, SNPeffect tries to retrieve a matching structure. If this structure has more than one chain, the molecular images and PDB files in the report package will only show the chain carrying the mutation. If the input format was "PDB file" or "PDB ID", all chains are kept in the images and structures.
  • If "UniProt ID" or "Fasta sequence" was chosen as input format and SNPeffect did not find an exact structure, it can select a homologous template structure to build a homology model. In the "Homology" list you can specify the minimum percent sequence identity between the template structure and your protein sequence. If no template was found, you can try lowering this number. Keep in mind, however, that model accuracy tends to increase with sequence identity.
  • Molecules or chains in PDB files should be named. PDB files with nameless chains will not run.
  • PDB filenames may only contain alphanumeric characters and underscores (_).
  • Non-natural or modified amino acids in PDB files will be replaced by Ala.
  • FASTA headers (the name after the > sign) are automatically truncated to 12 characters (for further processing) and should ideally contain only alphanumeric characters and underscores.
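
If you prepare many inputs, it can help to check the naming rules from the tips above before submitting. The Python sketch below is a hypothetical pre-check, not part of SNPeffect: the function names are invented, and it assumes that the ".pdb" extension itself is not counted when a PDB filename is checked for disallowed characters.

```python
import re


def truncate_fasta_header(header, max_length=12):
    """Return the header name as it would look after truncation to 12 characters."""
    name = header.lstrip(">").strip()
    return name[:max_length]


def is_safe_name(name):
    """True if the name contains only ASCII letters, digits and underscores."""
    return re.fullmatch(r"[A-Za-z0-9_]+", name) is not None


def is_valid_pdb_filename(filename):
    """Check a PDB filename; assumption: a trailing '.pdb' extension is ignored."""
    stem = filename[:-4] if filename.lower().endswith(".pdb") else filename
    return is_safe_name(stem)


print(truncate_fasta_header(">sp|P04637|P53_HUMAN Cellular tumor antigen p53"))  # sp|P04637|P5
print(is_safe_name("P53_DBD"))                # True
print(is_valid_pdb_filename("my model.pdb"))  # False
```

The first example shows why a short custom header such as P53_DBD is preferable: a full UniProt header is cut off after 12 characters and contains the | character.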


After selecting the input format and providing your data, click 'Next' to go to the next step. Here you select the residue that you want to mutate and the new residue to put at that position, e.g. if you mutate Gly400 in chain A to His:


After clicking 'Submit job', your job will start on our computing cluster and you will be notified by e-mail when the job has finished. A zip file with the SNPeffect report file (output.pdf) and figures will be available for download in your SNPeffect account for 2 months.

5.3. Issue when no structural data could be found to make a model (when there should in fact be data)

When SNPeffect cannot find reliable structural data in the Protein Data Bank (PDB), no homology model will be constructed and no FoldX stability output will be present in the final report.
Here is a useful tip to still get (partial) structural data, and hence also a FoldX stability report on the mutant.

E.g. selecting UniProt ID as input format and choosing an ID like P53_HUMAN will result in an empty search for structural data with the default homology setting (90%). SNPeffect performs a BLAST search with the complete sequence from UniProt against the PDB. With the homology setting at 90%, no suitable structures will be found because, for example, the PDB only contains structures of the DNA binding domain of P53. But perhaps you are only interested in the DNA binding domain (DBD) of P53 and not in the rest of the protein?

  • In that case, go to http://www.uniprot.org/uniprot/P53_HUMAN
  • Click on 'Sequence annotation' at the top of the page
  • Find the line of the DNA binding region and click on its positions 102 – 292
  • This will give you the sequence of the DBD in a Fasta file, which you can now use as a SNPeffect job input (choose "Fasta sequence" when submitting your job). Just copy/paste the whole Fasta sequence including the header.
  • You should now get structural stability data on the mutant in that domain
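
If you already have the full-length sequence on disk, the same domain can also be cut out locally. The following Python sketch shows one way to do this with 1-based, inclusive positions such as 102 and 292; the helper names and the short toy sequence are hypothetical, so replace the toy sequence with the real sequence downloaded from UniProt.

```python
def extract_region(sequence, start, end):
    """Return residues start..end of a sequence (1-based, inclusive positions)."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("positions must lie within the sequence")
    return sequence[start - 1:end]


def to_fasta(name, sequence, line_width=60):
    """Format a sequence as FASTA text with a short header and wrapped lines."""
    lines = [sequence[i:i + line_width] for i in range(0, len(sequence), line_width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Toy example; for the P53 DBD you would call extract_region(full_sequence, 102, 292).
toy_sequence = "ABCDEFGH"
region = extract_region(toy_sequence, 3, 6)
print(region)                       # CDEF
print(to_fasta("toy_DBD", region))  # prints the header line ">toy_DBD" followed by CDEF
```

Keeping the header short and free of special characters, as in this example, avoids the truncation described in the tips of section 5.2. Note that residue numbering in the SNPeffect job will then start at the first residue of the extracted domain.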