Bioinformatics #1 The analysis of Cd2+ impact on MAPK-signalling through DUSPs. Part 3. Ontological analysis. B. GeneCodis

(One of the “heroes” of the 1st experiment, DUSP2: its catalytic domain (cartoon representation) with the conserved (V)-HC-XX-X-XX-R-(S/T) motif highlighted in magenta. I created the image with PyMOL, an open-source tool for molecular visualization and exploration. Feel free to use the image.)

Welcome back to our journey through the MAPK signalling pathway, DUSPs (dual-specificity phosphatases) and Cd2+.

Finding the biological processes the genes (related to “cellular response to cadmium ion”) are involved in

In this part we are going to visit GeneCodis [1, 2, 3, 4].

Select the organism (Homo sapiens in our case).

Choose “GO Biological Processes” in “Select the annotations” option.

Then paste the list of our genes (29 unique genes) into the input field (the “Paste your lists of genes” option) and click “Submit”. The page will reload several times.

Note that we got 42 genes in the previous part, but only 29 of them are unique. You would get different results if you submitted all 42 genes to GeneCodis (it doesn't seem to remove duplicates).

Also note that GeneCodis assigns a unique URL to each job. You can see the results discussed in this post at http://genecodis.genyo.es/analysis/job-5920922725697
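If you prefer not to remove duplicates by hand, here is a minimal Python sketch of that step. The function name `unique_genes` and the short input list are my own illustration (it is not the real 42-gene list from Part 3. A), and I am assuming that gene symbols differing only in letter case or surrounding spaces are the same gene.

```python
def unique_genes(symbols):
    """Return gene symbols without duplicates, keeping first-seen order."""
    seen = set()
    unique = []
    for symbol in symbols:
        cleaned = symbol.strip()
        key = cleaned.upper()  # assumption: compare symbols case-insensitively
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


# Hypothetical input, NOT the real list from the previous part.
raw = ["MAPK1", "FOS", "mapk1", "JUN", " FOS ", "MAPK8"]
genes = unique_genes(raw)

print(len(raw), "->", len(genes))
print("\n".join(genes))  # one symbol per line, ready to paste into GeneCodis
```

The order of first appearance is kept, so the pasted list stays easy to compare with the original one.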

At the top of the page you’ll see “3 Genes (10.34%)” (highlighted in red), that is, 3 of our 29 genes. There were no annotations for these genes, so GeneCodis uses the remaining 26 genes to build the results. Under “Summary of the user provided list of genes” you can see the list of our genes, along with their names and descriptions.

Now open the “Singular Enrichment Analysis of GO Biological Process” section, where the tag cloud already shows the “stress-activated MAPK cascade (BP)” and “cellular response to cadmium ion (BP)” processes (terms of the “Biological process” ontology). With “Singular Enrichment Analysis of GO Biological Process” we get 1 biological process per group of genes (as opposed to “Modular Enrichment Analysis (all annotations)”, where we would see multiple processes per gene group).

You can also see a table in the “Interactive table” section.

By clicking on the “NG” column we can sort results based on the “Number of annotated genes in the input list”.

Let’s export the results. You can see the “Get results in other formats:” option above the “Interactive table” section. Click the icon below it. You’ll see a page titled “Summary in other formats”, with a graph of the results at the bottom. Click the “Get the results in TAB delimited text format” link to export the results in TSV format.

Now open that file with Notepad, copy the text and paste it into Excel. You’ll see a “Support” column (which, to my knowledge, is equivalent to the “NG” column mentioned above). Sort the results in Excel by the “Support” column values (“Largest to smallest”) using the “Sort & Filter” option.

Delete the “Id”, “Items”, “List size”, “Reference Support”, “Reference size” and “Hyp_c” columns, and keep the first 15 results (groups with 9, 8, 7 and 6 genes), so that we get the following …

(The image above shows the top 15 biological processes that the genes we got with QuickGO are involved in. There are 275 entries in total. The results were obtained with GeneCodis and are presented in Excel. The image was created by me.)

where

the “Items_Details” column contains the biological processes;

the “Support” column contains the number of genes taking part in the specified process;

the “Hyp” column contains the p-values (probability value / significance);

the “Genes” column contains, obviously, the gene names.

(As for the “Hyp_c” column we deleted: to my knowledge, Hyp_c is the p-value corrected with the FDR (false discovery rate) method [5]. I don’t see a big difference between the values in the Hyp_c and Hyp columns in our case, so I deleted the Hyp_c column.)
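The Excel steps above (drop columns, sort by “Support”, keep the top rows) can also be sketched in Python with the standard `csv` module. This is only a sketch under assumptions: the column names are the ones mentioned in this post, but I have not checked that the exported file spells its header exactly this way, and the rows below are made-up placeholders, not real GeneCodis output. The function name `top_processes` is mine.

```python
import csv
import io

# Columns the post deletes by hand in Excel.
DROP = {"Id", "Items", "List size", "Reference Support", "Reference size", "Hyp_c"}


def top_processes(tsv_text, n=15):
    """Drop the unneeded columns, sort by Support (largest first), keep n rows."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = [
        {col: val for col, val in row.items() if col not in DROP}
        for row in reader
    ]
    # list.sort is stable, so rows with equal Support keep their file order.
    rows.sort(key=lambda row: int(row["Support"]), reverse=True)
    return rows[:n]


# Made-up placeholder rows (NOT real GeneCodis output), only to show the shape.
header = ["Id", "Items", "Items_Details", "Support", "List size",
          "Reference Support", "Reference size", "Hyp", "Hyp_c", "Genes"]
placeholder = [
    ["1", "GO:x", "process A (BP)", "2", "26", "0", "0", "0.01", "0.02", "G1,G2"],
    ["2", "GO:y", "process B (BP)", "3", "26", "0", "0", "0.03", "0.04", "G3,G4,G5"],
    ["3", "GO:z", "process C (BP)", "1", "26", "0", "0", "0.05", "0.06", "G6"],
]
sample = "\n".join("\t".join(fields) for fields in [header] + placeholder)

top = top_processes(sample, n=2)
for row in top:
    print(row["Support"], row["Items_Details"], row["Genes"], sep="\t")
```

For the real export you would read the downloaded file into a string first (for example with `open(path, newline="", encoding="utf-8").read()`, where `path` is wherever you saved it) and call `top_processes(text)` with the default `n=15`.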

Results

So you can see that the “stress-activated MAPK cascade (BP)” process, with 6 genes (FOS, MAPK3, MAPK1, JUN, MAPK8, MAPK9), is among the first 15 results we got from GeneCodis. These genes and this process relate to the “cellular response to cadmium ion” term we searched for with QuickGO. Thus we can conclude that cadmium influences the MAPK signalling pathway (and all the other biological processes you see in the table above) even without looking that information up the traditional way, in papers, articles and monographs.

A good question here might be why we don’t see DUSP genes among the ones we found with QuickGO. After all, our first experiment is titled “Bioinformatics experiments. Exp 1. Analysis of mechanisms of cadmium ions impact on MAPK signalling pathway through the members of dual-specificity phosphatases (DUSP) family (or “Catch me (Cd2+) if you can”). Part …”. So it might seem strange that no DUSP genes show up there. Probably that information is simply not available, either in the literature or in those databases. So our experiment, I think, still makes sense.

Anyway, we are not running this first experiment to get data that would allow scientists to develop a new strategy for removing Cd2+ from people contaminated with it. We are exploring bioinformatics tools and trying to figure out how to use them to make some small discoveries.

We said at the beginning of this part (Part 3. A) that ontological analysis could also help us narrow down the list of DUSPs we need to analyse. MAPK1, MAPK3, MAPK8 and MAPK9 are kinases, and the FOS gene product (c-Fos) is a protein that forms a complex with JUN (a transcription factor involved in MAPK signalling) in the nucleus [7, 8].

Well, now we know that MAPK8, MAPK9, MAPK3 and MAPK1 are involved in “cellular response to cadmium ion”. This doesn’t allow us to narrow down the list of DUSPs, because we now know that 2 MAPK signalling pathways (the first involving MAPK1 and MAPK3, the second involving MAPK8 and MAPK9) might be influenced by cadmium (see the first post of Part 3).

But the results do let us exclude the pathway in which p38 (MAPK14) is involved. This might help us when we try to predict the cell fate (death, proliferation…) in response to cadmium in the last post of this series, where we are going to discuss the results of the first experiment.
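The reasoning in the last few paragraphs can be written down as a tiny Python sketch. The pathway labels “ERK” and “JNK” are the common names I am using for the MAPK1/MAPK3 and MAPK8/MAPK9 pathways, and the mapping is deliberately simplified to the genes named in this post (each pathway has more members in reality). The names `MAPK_PATHWAYS` and `pathways_hit` are mine.

```python
# Simplified grouping: only the MAPK genes named in this post.
MAPK_PATHWAYS = {
    "ERK": {"MAPK1", "MAPK3"},
    "JNK": {"MAPK8", "MAPK9"},
    "p38": {"MAPK14"},
}

# Genes annotated with "stress-activated MAPK cascade (BP)" in our results.
hits = ["FOS", "MAPK3", "MAPK1", "JUN", "MAPK8", "MAPK9"]


def pathways_hit(genes):
    """For each pathway, list which of its member genes appear in `genes`."""
    found = set(genes)
    return {
        name: sorted(members & found)
        for name, members in MAPK_PATHWAYS.items()
    }


result = pathways_hit(hits)
for name, members in result.items():
    status = ", ".join(members) if members else "no member in the list"
    print(f"{name}: {status}")
```

An empty list for a pathway only means that none of its genes listed here were in our results; it is the same (simplified) argument as in the text, not an independent check.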

All images (without a license specified) are used under the doctrine known in the USA as “Fair Use” (similar doctrines exist in other countries). For more information visit the US Gov website.

Other posts of this series:

Bioinformatics experiments. Introduction

Bioinformatics #1 The analysis of Cd2+ impact on MAPK-signalling through DUSPs. Part 1. Theory

Bioinformatics #1 The analysis of Cd2+ impact on MAPK-signalling through DUSPs. Part 2. MSA (Multiple Sequence Alignment)

Bioinformatics #1 The analysis of Cd2+ impact on MAPK-signalling through DUSPs. Part 3. Ontological analysis. A. QuickGO

References:

1. GeneCodis3

2. Tabas-Madrid D, Nogales-Cadenas R: GeneCodis3: a non-redundant and modular enrichment analysis tool for functional genomics. Nucleic Acids Research 2012; doi: 10.1093/nar/gks402

3. Nogales-Cadenas R, Carmona-Saez P: GeneCodis: interpreting gene lists through enrichment analysis and integration of diverse biological information. Nucleic Acids Research 2009; doi: 10.1093/nar/gkp416

4. Carmona-Saez P, Chagoyen M: GENECODIS: a web-based tool for finding significant concurrent annotations in gene lists. Genome Biology 2007; 8(1):R3

5. GeneCodis Help page

6. Stanton A. Glantz. Primer of Biostatistics, Fourth edition, McGraw‐Hill Inc., New York, 1997. No. of pages: xvi+473+computer program

7. Proto-oncogene c-Fos

8. Transcription factor AP-1 / JUN
