Bioinformatics #1 The analysis of Cd2+ impact on MAPK-signalling through DUSPs. Part 2. MSA (Multiple Sequence Alignment)

February 7, 2020

(or “Catch me (Cd2+) if you can”)

The objects of the entire 1st experiment of this series of posts are MKPs (MAP kinase phosphatases).

The subject is the possible mechanisms cadmium could “use” to influence MKPs (and thus the MAPK signalling pathway).

The purpose of the 1st experiment is to leverage some bioinformatics tools to find those possible mechanisms.

Let’s try to carry out an MSA (multiple sequence alignment) of one of the MKPs, namely DUSP1.

We are going to use the Clustal Omega tool to do it [1].

Proteins, DNA and RNA are polymers built from repeating monomers (amino acids or nucleotides), so each of them can be written down as a sequence of letters.

An MSA is an alignment of three or more such sequences.

And alignment is the process of arranging sequences so that identical (or similar) monomers end up in the same column, which makes the similarities between them visible.

In a nutshell, we provide the tool with sequences (in our case protein sequences), and it analyses them and tries to find identical (or similar) monomers present in the provided sequences.

(If this concept is new to you, it will probably become much clearer when you look at the image near the bottom of the post.)
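To make the idea concrete, here is a minimal Python sketch. It assumes the sequences are already aligned (gaps written as `-`) and only marks the columns where every sequence has the same residue, similar in spirit to the `*` line in Clustal output. The sequences are made-up toy fragments, and `conservation_line` is a hypothetical helper name, not part of any tool.

```python
def conservation_line(aligned):
    """Return a string with '*' under every column where all sequences
    carry the same residue, and a space everywhere else."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):  # walk through the alignment column by column
        identical = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if identical else " ")
    return "".join(marks)


# Toy, made-up alignment (not real UniProt data); '-' marks a gap.
toy_alignment = [
    "MKV-HCQAG",
    "MRVAHCKAG",
    "MKV-HCQSG",
]

for seq in toy_alignment:
    print(seq)
print(conservation_line(toy_alignment))
```

A real aligner also has to decide where to put the gaps; this sketch only shows how identical columns are read off once the alignment exists.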

You can use this tool, for example:

1. to make a beautiful illustration for your article. Let’s say you know which residues you are looking for and you know that they are conserved (so you could even do the alignment by hand); this tool helps you present that information in an easy-to-understand and less time-consuming way.

2. to compare the sequences of a particular protein from different organisms and figure out whether anything is conserved there. If it is (if there are identical or similar residues), you can assume that those regions are responsible for some important function (structure defines function) and probably represent a protein domain/motif. You can then analyse those proteins more carefully.

As we mentioned in the previous post, a Cys residue is responsible for the catalytic activity of phosphatases [2].

Usually the catalytic motif ((V)-HC-XX-X-XX-R-(S/T) in our case) is highly conserved among different organisms. So it should not be a surprise if the catalytic Cys residues of DUSPs from different organisms all align in 1 column.
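The motif notation above can be turned into a regular expression to locate the catalytic Cys in a sequence. This is only a sketch under stated assumptions: I read “(V)” as an optional Val, each “X” as any single residue, and “(S/T)” as either Ser or Thr. The sequence is a made-up toy fragment, and the function name is hypothetical.

```python
import re

# (V)-HC-XX-X-XX-R-(S/T) read as: optional V, then H, C, any 5 residues, R, S or T.
CATALYTIC_MOTIF = re.compile(r"V?HC.{5}R[ST]")


def find_catalytic_motif(sequence):
    """Return (1-based start position, matched text) for every motif hit."""
    return [(m.start() + 1, m.group()) for m in CATALYTIC_MOTIF.finditer(sequence)]


# Toy fragment for illustration only (not copied from a UniProt entry).
toy_sequence = "MSGAVLVHCQAGISRSATICLAY"

for start, text in find_catalytic_motif(toy_sequence):
    cys_position = start + text.index("HC") + 1  # 1-based position of the Cys
    print(f"motif {text} starts at {start}, catalytic Cys at {cys_position}")
```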

What would be interesting to see (for the purposes of our experiment) is whether there are any other conserved Cys residues in those proteins.

Aside from the catalytic centre, enzymes also have sites where other molecules bind to regulate their activity (allosteric/regulatory sites). In this series of posts (where we are discussing MKPs), such Cys residues are very important, because Cd2+ could possibly influence MKP activity indirectly through them. (We will discuss this in the next posts; this possible “indirect” effect is the main focus of the 1st experiment.)

So, let’s try to align the DUSP1 sequences of different organisms and see if there are any conserved Cys residues there (other than the Cys of the catalytic site).
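Once an alignment is available, the question “which Cys residues are conserved?” can be answered mechanically. The sketch below assumes an already aligned set of sequences (gaps as `-`) and uses made-up toy data with placeholder species names; both helper functions are hypothetical. It lists the columns where every sequence has a Cys and converts each column back to a residue number in the ungapped sequence.

```python
def conserved_cys_columns(aligned):
    """Return 0-based alignment columns where every sequence has a 'C'."""
    return [i for i, column in enumerate(zip(*aligned)) if set(column) == {"C"}]


def column_to_residue_number(aligned_seq, column):
    """Map a 0-based alignment column to the 1-based residue number
    in the ungapped sequence (gaps before the column are not counted)."""
    return len(aligned_seq[: column + 1].replace("-", ""))


# Toy aligned fragments (made up for illustration, not real DUSP1 sequences).
toy_alignment = {
    "species_A": "MC-KLVHCQAGISRSAC",
    "species_B": "MCAKLVHCQAGVSRSAC",
    "species_C": "M-SKLVHCQAGISRSTC",
}

for column in conserved_cys_columns(list(toy_alignment.values())):
    numbers = {
        name: column_to_residue_number(seq, column)
        for name, seq in toy_alignment.items()
    }
    print(f"column {column}: {numbers}")
```

In this toy data one of the reported columns sits inside the catalytic motif, so it would be set aside; the remaining column is the kind of candidate we are interested in.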

First of all, we need to get those sequences. For that we’ll use UniProt.

UniProt (Universal Protein Resource) is the central place for us to get protein sequences and information about them [3].

We’re going to analyse DUSP1 because searching for other MKPs gives us just 1-2 DUSP entries (and searching for DUSP1 gives 5 entries). Just search for “DUSP1” in the search box and you'll see the entries. Then choose the “Reviewed” option in the “Filter by” panel (on the left). This removes “Unreviewed” entries. As a result we get only entries annotated (documented) by experts, rather than automatically generated annotations (for more information on this, go to https://www.uniprot.org/help/about).

(screenshot was taken from UniProt)

There are 68 results. Many of them are entries for proteins which are not actually DUSPs but are somehow related to them. We need only DUSPs, so we’ll choose the first 4 entries (the 4th and 5th entries are almost identical and belong to the same organism, so we’ll use just one of them, the one with the entry identifier Q91790).

We have the sequences from 4 organisms: Xenopus laevis (Amphibians) and Homo sapiens, Mus musculus, Rattus norvegicus (Mammals).

Click the “Column” option to get rid of some unnecessary columns (leave only “Length”, “Organism”, “Entry name”, “Gene name”, “Protein names”).

(screenshot was taken from UniProt)

Click “Save” at the bottom of the modal window. Then check the first 4 entries and download them in ‘FASTA (canonical)’ format using the “Download” option.

(screenshot was taken from UniProt)
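If you prefer a script to clicking through the website, the same download can be sketched in Python. This rests on an assumption: that UniProt serves single entries as FASTA at `https://rest.uniprot.org/uniprotkb/<accession>.fasta` (check the UniProt help pages for the current address). The function names are hypothetical, and Q91790 is the only accession named in this post, so the other three have to be copied from your own search results.

```python
from urllib.request import urlopen

# Assumed URL pattern for fetching one entry in FASTA format.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fasta_url(accession):
    """Build the download URL for one UniProt accession."""
    return UNIPROT_FASTA_URL.format(accession=accession.strip().upper())


def download_fasta(accessions, timeout=30):
    """Fetch every accession and return one combined FASTA text.
    Needs network access; raises urllib.error.URLError on failure."""
    records = []
    for accession in accessions:
        with urlopen(fasta_url(accession), timeout=timeout) as response:
            records.append(response.read().decode("utf-8").strip())
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    # Only builds and prints the URL; call download_fasta([...]) to fetch.
    print(fasta_url("Q91790"))
```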

The file with the results should look like this:

(the image was created by me with Notepad/Paint, and you can use it if you want. Sequences were obtained from UniProt)

|| Useful tip

FASTA format is a text format used to represent DNA/RNA/protein sequences.

*Aside from the sequences themselves, it may also contain some meta-information (in a header line before each sequence), such as the UniProt identifier, species name, full protein name, etc. [4]. This is similar to when we use Markdown at Steemit (it also contains some meta-information aside from the text itself).*
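Since FASTA is plain text, it is easy to read with a few lines of Python. The sketch below is a minimal parser, assuming each record starts with a `>` header line followed by one or more sequence lines, and that UniProt-style headers look like `db|accession|entry_name description`. The two records are toy data with made-up identifiers, and both function names are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):  # a new record begins
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:  # sequence line of the current record
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_uniprot_header(header):
    """Split 'db|accession|entry_name description' into its parts."""
    identifier, _, description = header.partition(" ")
    db, accession, entry_name = identifier.split("|")
    return {"db": db, "accession": accession,
            "entry_name": entry_name, "description": description}


# Toy FASTA text with made-up identifiers (not real UniProt entries).
toy_fasta = """>sp|X00001|TOY1_EXAMPLE Toy phosphatase OS=Example species
MKVHCQAG
ISRSAT
>sp|X00002|TOY2_EXAMPLE Toy phosphatase OS=Another species
MRVHCKAGVSRS
"""

for header, sequence in parse_fasta(toy_fasta):
    info = split_uniprot_header(header)
    print(info["accession"], info["entry_name"], len(sequence), sequence.count("C"))
```

For real work a library such as Biopython does this more robustly; the point here is only to show what the header and the sequence lines contain.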
