Bioinformatics #1 The analysis of Cd2+ impact on MAPK-signalling through DUSPs. Part 3. Ontological analysis. A. QuickGO

February 20, 2020

(or “Catch me (Cd2+) if you can”)

In this post we’re going to continue our “journey” with MKPs (MAP kinase phosphatases).

(DUSP2 catalytic domain (surface representation) with its conserved (V)-HC-XX-X-XX-R-(S/T) motif (highlighted in magenta). The 3D structure (PDB file), 1M3G in this case, was obtained from PDB. The image was created by me with the help of PyMOL, an open-source tool for visualizing and exploring molecules. You can use the image if you want.)

Feel free to leave comments related to the topic of this post (or this series of posts). As I mentioned in the introduction post for this series, I’m not a professional bioinformatician/biochemist, so if you see any mistakes or have a question, go ahead.

Before we begin to investigate the actual ways Cd2+ could “use” to influence MKPs (and hence the MAPK signalling pathway), let’s try to investigate the relationship between Cd2+ and MKPs using another approach. We found out in the first part of this series that Cd2+ affects the MAPK signalling pathway [1].

Great. But the MAPK signalling pathway is one of the best studied signalling pathways ever [2]. And you probably remember from the first part how scary it all looks. What if we wanted to narrow down the list of MKPs cadmium could have an impact on, or what if we didn’t have the data we got from the articles/papers/monograph? As mentioned in the first part, MAPK signalling includes MAPK kinase kinases (MKKK or MEKK), MAPK kinases (MKK or MEK) and MAPKs themselves [6].

Knowing exactly which gene products of the MAPK pathway are influenced by Cd2+ would also help us understand whether Cd2+ could influence MKPs at all. If the list of genes/gene products influenced by Cd2+ includes all kinases except for the MAPKs themselves (which, according to that image on KEGG, are the only ones regulated by MKPs), then Cd2+ should not influence MKPs (which might dash our hopes for, or ruin the whole idea of, the first experiment).

We saw in the first part that the range of DUSPs which can regulate ERK, that is MAPK1 (also known as ERK2) and MAPK3 (also known as ERK1) ([3]), includes DUSP1, DUSP2, DUSP4, DUSP5, DUSP6, DUSP7, DUSP8, DUSP9, DUSP10 and, finally, DUSP16 (10 DUSPs in total).

And the range of DUSPs which can regulate JNK (also known as MAPK8), MAPK9, MAPK10 ([4]) and p38 (also known as MAPK14), MAPK11, MAPK12 and MAPK13 ([5]) includes all of the DUSPs above except for DUSP1 and DUSP6 (8 in total).
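The two lists above can be written down as Python sets to double-check the counts. This is just a minimal sketch: the variable names are mine, and the DUSP lists are the ones given above.

```python
# DUSPs that can regulate ERK (MAPK1, MAPK3), as listed above.
erk_dusps = {
    "DUSP1", "DUSP2", "DUSP4", "DUSP5", "DUSP6",
    "DUSP7", "DUSP8", "DUSP9", "DUSP10", "DUSP16",
}

# For JNK and p38 the list is the same, minus DUSP1 and DUSP6.
jnk_p38_dusps = erk_dusps - {"DUSP1", "DUSP6"}

print(len(erk_dusps))                     # 10
print(len(jnk_p38_dusps))                 # 8
print(sorted(erk_dusps - jnk_p38_dusps))  # ['DUSP1', 'DUSP6']
```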

(The image is from Wikipedia. Public domain. MAPKs are highlighted with the green squares. When you hover over/click on the ERK/JNK/p38 elements on KEGG, you get a set of MAPKs. That's how we get the MAPK groups you can see on the right. The image was modified by me. You can use it if you want.)

(The image is from Wikipedia. Public domain. The parts of this image with better resolution are below. I placed it here so that you could get an overall picture of the MAPK pathway. Mitogen-activated Protein Kinase Phosphatases (MKPs) are highlighted with the blue ovals. When you hover over/click on the MKP element on KEGG, you get a set of DUSPs. That's how we get the DUSP list you can see in the top right corner. The image was modified by me. You can use it if you want.)

Well, it doesn’t look like a big difference (10 or 8), but imagine if the difference in your particular case with another signalling pathway and proteins was something like 7 and 54 for example.

So, let’s try to narrow down that list of MKPs we might be interested in (and again, we’re interested in those which could be somehow indirectly (or directly) affected by Cd2+).

The logic here is as follows:

If we know exactly which of the MAPK genes/gene products mentioned above are influenced by Cd2+, we could narrow down the list of MKPs (or, if no MAPK genes are affected, we could exclude the possibility that Cd2+ could influence MKPs at all, which would actually dash our hopes and remove the reason to do this whole first experiment).
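This logic can be sketched in a few lines of Python. It's only an illustration under my own assumptions: the function name and the example input are made up (GENE_X is a placeholder, not a real result), while the MAPK groups and DUSP lists are the ones from the KEGG-based lists above.

```python
# All DUSPs that can regulate ERK, as listed above.
ALL_DUSPS = {
    "DUSP1", "DUSP2", "DUSP4", "DUSP5", "DUSP6",
    "DUSP7", "DUSP8", "DUSP9", "DUSP10", "DUSP16",
}
# JNK and p38 share the same list, minus DUSP1 and DUSP6.
JNK_P38_DUSPS = ALL_DUSPS - {"DUSP1", "DUSP6"}

# MAPK group -> (MAPK genes in the group, DUSPs that can regulate it).
GROUPS = {
    "ERK": ({"MAPK1", "MAPK3"}, ALL_DUSPS),
    "JNK": ({"MAPK8", "MAPK9", "MAPK10"}, JNK_P38_DUSPS),
    "p38": ({"MAPK11", "MAPK12", "MAPK13", "MAPK14"}, JNK_P38_DUSPS),
}


def candidate_mkps(cd_influenced_genes):
    """Return the DUSPs regulating any MAPK group hit by the given genes."""
    genes = set(cd_influenced_genes)
    candidates = set()
    for mapks, dusps in GROUPS.values():
        if mapks & genes:  # at least one MAPK of this group is influenced
            candidates |= dusps
    return candidates


# Made-up inputs, only to show both outcomes of the reasoning.
print(len(candidate_mkps({"MAPK8", "GENE_X"})))  # JNK hit -> 8 candidates
print(candidate_mkps({"GENE_X"}))                # no MAPKs -> empty set
```

An empty result corresponds to the "dashed hopes" case: no MAPKs influenced, so no MKPs worth looking at.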

Of course, we could try to analyze the literature (dozens of papers/articles) searching for the answer – what MAPKs exactly are influenced by Cd2+ (and again, if they are influenced by it at all). But, as it was said in the introduction of this series of posts, bioinformatics might help us to save a lot of time. So, let’s leverage its power and try to figure out what MAPKs/MKPs genes are influenced by Cd2+ by carrying out ontological analysis.

Also, there may simply be no such information in the literature yet (which MAPKs are influenced by Cd2+), but it might be in the databases developed for biologists/bioinformaticians.

And, finally, some literature (especially monographs) might just cost a lot (while all the databases I’ve seen so far were freely available).

Now, let’s try to figure out what ontological analysis is. But before that let’s try to figure out what ontology is. In the philosophical context it’s the study of being (answering the questions like “What is existence?”, “What does it mean to exist?”) [7].

The best definition of “ontology” in the context of science I can give in my own words for now is:

Ontology is the group of concepts/ideas/terms we should use to describe something, which help us to organize information into knowledge and exchange it [8].

And Gene Ontology (GO) is an initiative/project which deals with, obviously, genes and gene products, and tries to provide scientists with those concepts/ideas/terms. For example, it provides the vocabulary we should use to describe proteins (so that scientists can understand each other unambiguously). Those concepts/ideas/terms are grouped into 3 main fields: cellular component, molecular function (what gene products do) and biological process (where those products participate), so that we have 3 ontologies. Each term has a unique identifier, and all terms are organized into a hierarchical structure (graph) with parent-child relationships [9, 10].

All this stuff is provided by the Gene Ontology (GO) Consortium.

And then all that is used by the Gene Ontology Annotation (GOA) project to annotate (document) gene products, both manually and electronically/automatically.

We (biologists) can get access to all that (ontological terms provided by the Gene Ontology (GO) Consortium and annotations provided by Gene Ontology Annotation (GOA)) with the help of a web interface, the QuickGO browser for example (a kind of Google search for bioinformaticians) [11].

(The image was created by me. The question mark was taken from Pixabay. Pixabay License)

And, finally, I could describe ontological analysis as the process of finding information about genes/gene products using those ontologies/web interfaces.

Anyway, all this should become much clearer in practice.

What we are going to do now is go to the QuickGO website and search for “cadmium” [14].

What we are interested in on the results page is the “cellular response to cadmium ion” term (you’ll see a button saying “2,008 annotations” (February, 2020)).

Click that button and wait for results. Now, you can click “GO terms” button. There you’ll see “cellular response to cadmium ion” with an icon (parent-child relationship), click that icon.

A special modal window will appear with the hierarchical structure indicating where “cellular response to cadmium ion” (GO:0071276) term is located.

Then you can click on it, and a page dedicated specifically to that term will open. On that page you can see that this term has child terms like “cellular detoxification of cadmium ion” and “SCF complex disassembly in response to cadmium stress”. You can also see a clear definition of the “cellular response to cadmium ion” term. We chose the “cellular response to cadmium ion” result on the QuickGO results page for the “cadmium” search because it’s the most general term (parent) in our case (well, almost the most general, because we can also see the “response to cadmium ion” term as its parent, but we are interested in the “cellular response”).
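To make the parent-child idea a bit more tangible, here is a minimal sketch of that small piece of the hierarchy in Python. It contains only the relationships mentioned above and assumes one parent per term, which is a simplification on my side (in the real GO graph a term can have several parents). The function names are my own.

```python
# child term -> parent term (only the relationships mentioned above)
PARENT = {
    "cellular response to cadmium ion": "response to cadmium ion",
    "cellular detoxification of cadmium ion":
        "cellular response to cadmium ion",
    "SCF complex disassembly in response to cadmium stress":
        "cellular response to cadmium ion",
}


def ancestors(term):
    """Walk up the hierarchy and return all parents, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def children(term):
    """Return the set of direct child terms of the given term."""
    return {child for child, parent in PARENT.items() if parent == term}


print(ancestors("cellular detoxification of cadmium ion"))
print(children("cellular response to cadmium ion"))
```

Searching annotations for a parent term and its descendants is what makes picking a general term like “cellular response to cadmium ion” (GO:0071276) useful: annotations made to its child terms come along too.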

Now, click “Taxon” button on the page with annotations for “cellular response to cadmium ion” term and choose “Homo sapiens”, click “Apply” (at the bottom of the modal window). As a result our 2,008 annotations narrow down to 59.

Now click the “Aspect” button, choose the “biological process” option and click “Apply”.

As mentioned above, gene products can be annotated manually (by experts) and automatically (by computers). Now click the “Evidence” button. There you can choose the evidence code you need. For more information on this, go to geneontology [12].

Well, I used QuickGO several years ago. At that time there was an “Inferred from Electronic Annotation (IEA)” option in that “Evidence” window [13]. As far as I understand, that option is not there now, and according to the QuickGO FAQ page the “All manual codes” higher-level grouping (the parent for all manual annotation evidence codes) has the identifier ECO:0000352 (“evidence used in manual assertion” in our “Evidence” window). So, we can just choose that to include all manual annotations. Then click “Apply”.

But if you also want to see annotations added automatically by computers, then, as far as I understand, you need to add the “Inferred from Electronic Annotation (IEA)” option (with the “add” button in the “Evidence” window).
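The manual/automatic split can also be done after export, if both kinds of annotations end up in one list. Below is a rough sketch: the records and their keys are made up for illustration, and it relies on my understanding that IEA is the code used for automatic (electronic) annotations, so please check the evidence code guide [12] before relying on it.

```python
# Evidence codes treated as "assigned automatically" (an assumption, see above).
ELECTRONIC_CODES = {"IEA"}


def split_by_evidence(annotations):
    """Split annotation records into (manual, electronic) lists."""
    manual = [a for a in annotations if a["evidence"] not in ELECTRONIC_CODES]
    electronic = [a for a in annotations if a["evidence"] in ELECTRONIC_CODES]
    return manual, electronic


# Made-up records: placeholder gene symbols, real GO evidence code names.
records = [
    {"symbol": "GENE_A", "evidence": "IDA"},
    {"symbol": "GENE_B", "evidence": "IEA"},
    {"symbol": "GENE_C", "evidence": "IMP"},
]

manual, electronic = split_by_evidence(records)
print([a["symbol"] for a in manual])      # ['GENE_A', 'GENE_C']
print([a["symbol"] for a in electronic])  # ['GENE_B']
```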

After applying all those filters we now have just 42 annotations.

All we have to do now is export our data. But before that, let’s click the “Customise” option. In the window that appears, let’s leave just the “Symbol” and “Evidence” options. Click “Customise” again to close the window.

Finally, click “Export” button, choose “Tab-delimited” (tab-separated values/TSV) format and click “Go”.
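The same filters can, in principle, be expressed as a request to the QuickGO REST service instead of clicking through the browser. The sketch below only builds the request URL and does not send anything. The endpoint path and the parameter names (goId, goUsage, taxonId, aspect, limit) are written from my recollection of the QuickGO API and should be treated as assumptions to verify against the official QuickGO API documentation; 9606 is the NCBI taxonomy ID for Homo sapiens.

```python
from urllib.parse import urlencode

# Assumed endpoint of the QuickGO annotation search service (verify in the docs).
QUICKGO_SEARCH = "https://www.ebi.ac.uk/QuickGO/services/annotation/search"


def build_quickgo_url(go_id, taxon_id=9606, aspect="biological_process",
                      limit=100):
    """Build a search URL mirroring the Taxon and Aspect filters used above."""
    params = {
        "goId": go_id,             # e.g. GO:0071276
        "goUsage": "descendants",  # assumed: include child terms as well
        "taxonId": taxon_id,       # 9606 = Homo sapiens
        "aspect": aspect,
        "limit": limit,            # assumed page-size parameter
    }
    return QUICKGO_SEARCH + "?" + urlencode(params)


url = build_quickgo_url("GO:0071276")
print(url)
```

The printed URL can be pasted into a browser or passed to any HTTP client. I left the evidence filter out on purpose, because I'm not sure about the exact parameter for it.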

|| Useful tip

Now, if you open the exported file with the .tsv extension in Excel, you’ll see that all your results are located in just 1 cell (it’s just chaos). But if you open that TSV file in Notepad, copy all the data there and paste them into that same Excel, you’ll see that all pieces of the data are in separate cells. It looks like copying the data to Notepad first removes some formatting information.
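If you'd rather skip Excel altogether, a TSV file can be read with Python's built-in csv module by telling it that the delimiter is a tab. A minimal sketch follows: the sample text stands in for the exported file, its rows are made up, and the “EVIDENCE” column name is my guess (check the header line of your own export).

```python
import csv
import io

# Made-up stand-in for the exported file: a header line, then one row per annotation.
sample_tsv = "SYMBOL\tEVIDENCE\nGENE_A\tIDA\nGENE_B\tIMP\n"


def read_tsv(handle):
    """Read tab-separated text into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_tsv(io.StringIO(sample_tsv))
for row in rows:
    print(row["SYMBOL"], row["EVIDENCE"])
```

With a real export you would pass an open file instead of the sample text, for example `with open("export.tsv", newline="") as f: rows = read_tsv(f)` (the file name here is hypothetical).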

And finally we have a list of genes (“SYMBOL” column) (we see 42 genes, but there are just 29 unique genes) related to “cellular response to cadmium ion”. And we can already see MAPK1, MAPK3, MAPK8 and MAPK9 among them.
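Dropping the duplicates and picking out the MAPKs doesn't have to be done by eye. Here is a small sketch with a shortened, partly made-up symbol list (GENE_A is a placeholder; the MAPK symbols are the ones mentioned above). The function names are mine.

```python
import re


def unique_symbols(symbols):
    """Drop duplicates while keeping the first-seen order."""
    return list(dict.fromkeys(symbols))


def mapk_genes(symbols):
    """Keep only symbols of the form MAPK<number>, e.g. MAPK1 or MAPK14."""
    return [s for s in unique_symbols(symbols) if re.fullmatch(r"MAPK\d+", s)]


# Shortened illustrative list, not the real 42-row export.
symbols = ["MAPK1", "GENE_A", "MAPK3", "MAPK1", "MAPK8", "GENE_A", "MAPK9"]

print(len(symbols), len(unique_symbols(symbols)))  # 7 5
print(mapk_genes(symbols))  # ['MAPK1', 'MAPK3', 'MAPK8', 'MAPK9']
```

The pattern is deliberately strict, so that symbols which merely start with “MAPK” but are not MAPKs themselves are not picked up.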

(The image above shows part of the list of genes related to the “cellular response to cadmium ion” term. The genes were obtained with the help of QuickGO. The image was created by me. You can use it if you want.)

In the next part (Part 3. B) we're going to use GeneCodis to find the biological processes in which the genes we've got in this post are involved, and discuss in detail the results of everything we've done in this (3rd) part.