Briefly Noted

Notes on topics of general interest

This site has moved! You can find all the content here plus new content at the following address: https://briefly-noted.neocities.org/.

This site will no longer be updated.

Suppose you want to create a biomolecule using nucleic acid sequences (e.g., GCACGAGT). How do you go about doing this? Nucleic acid design involves creating a set of sequences which will then combine into the desired biomolecule. Those interested in accomplishing such a feat are typically working in the fields of DNA computing or DNA nanotechnology.

Constructing a biomolecule with the desired structure using nucleic acid sequences is easier than making a protein (also a biomolecule). This is because it is easier to predict how nucleic acid sequences will arrange themselves.

Examples of applications of nucleic acid design include constructing a DNA walker and DNA origami.


This post is part of a series. The most recent post in the series is “Ruzzo–Tompa algorithm”. Learn when new posts appear by subscribing (RSS). You may also follow @briefly-noted@write.as in Mastodon or subscribe for email updates.

#biology #nucleicaciddesign

The Ruzzo-Tompa algorithm is a linear-time method for finding all maximal-scoring subsequences in a sequence of numbers: contiguous runs whose sums cannot be improved by extending, shortening, or merging them. The difficult part here is ensuring that the subsequences are non-overlapping while still making each one as high-scoring as possible.

Consider the following sequence: $(1, 2, 3, -5, 1, 2, 3)$. One might be tempted to extract the two subsequences $(1, 2, 3)$ and $(1, 2, 3)$, skipping over the -5, for a score of 6 each. The entire sequence, however, sums to 7, so keeping the -5 and joining the two runs scores higher than either run alone. Had the middle value been -7, the full sequence would sum to only 5, and the two runs, which occupy different positions and so do not overlap, would be the answer. As the sequence gets very long this task gets challenging very quickly. The Ruzzo-Tompa algorithm describes a strategy which makes light work of the task, provided you are a computer.
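
Here is a minimal Python sketch of the procedure, written from the published description of the algorithm rather than taken from any particular library. The function name `ruzzo_tompa` and the choice to return `(start, end)` index pairs are assumptions made for illustration. To keep the code short it finds earlier candidates with a plain backward scan, so its worst case is quadratic; the linear-time version described by Ruzzo and Tompa needs extra bookkeeping that is omitted here.

```python
def ruzzo_tompa(scores):
    """Return the maximal-scoring runs of `scores` as (start, end) pairs.

    `end` is exclusive, so scores[start:end] is the run itself.
    """
    # Each candidate is [start, end, left, right], where `left` is the
    # running total just before the run and `right` is the running total
    # at its end. The score of the run is right - left.
    candidates = []
    total = 0
    for index, score in enumerate(scores):
        before = total
        total += score
        if score <= 0:
            continue  # a run never starts or ends on a non-positive value
        current = [index, index + 1, before, total]
        while True:
            # Scan backwards for the nearest candidate that starts from a
            # strictly lower running total than the current run.
            j = len(candidates) - 1
            while j >= 0 and candidates[j][2] >= current[2]:
                j -= 1
            if j < 0 or candidates[j][3] >= current[3]:
                # Nothing to merge with: keep the current run as it is.
                candidates.append(current)
                break
            # Merging with candidate j gives a higher score: extend the
            # current run back to j's start and drop j and everything
            # after it, then check again against earlier candidates.
            current = [candidates[j][0], current[1], candidates[j][2], current[3]]
            del candidates[j:]
    return [(start, end) for start, end, _, _ in candidates]


print(ruzzo_tompa([1, 2, 3, -5, 1, 2, 3]))  # [(0, 7)]: the whole sequence, score 7
print(ruzzo_tompa([1, 2, 3, -7, 1, 2, 3]))  # [(0, 3), (4, 7)]: two runs, score 6 each
```

The two calls show how much depends on the size of the negative value in the middle: a gap of -5 is worth crossing to join the two runs, while a gap of -7 is not.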

The algorithm has applications in bioinformatics. It is particularly useful in the study of DNA. Those working in bioinformatics are often interested in finding similar (long) subsequences in different samples of DNA. Finding similar DNA in different organisms is one thing biologists are interested in, often for the same reasons that they are interested in learning about similar physical characteristics in different species.


This post is part of a series. The most recent post in the series is “FASTA format”.

#biology #bioinformatics

This site offers brief notes on a variety of topics. Many of the topics have some connection to (computational) biology.

Contributors

Andrew Jones is a Boston-based software engineer. Although he studied sociology and computer science in university, he has a long-standing interest in computational biology.

FASTA is a text-based format for storing data about nucleotide or amino acid sequences. Each nucleotide or amino acid is represented by a single ASCII letter.

Sequences can be stored in a text file in FASTA format in the following way. A line beginning with > indicates that the sequence will start on the following line. (The line with the > may contain a name or unique identifier for the sequence.) The sequences themselves consist of single-letter codes, many of which are familiar. For example, in a nucleotide sequence A indicates the presence of adenine.

Here is an example sequence in FASTA format:

>gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
MNSERSDVTLYQPFLDYAIAYMRSRLDLEPYPIPTGFESNSAVVGKGKNQEEVVTTSYAFQTAKLRQIRA
AHVQGGNSLQVLNFVIFPHLNYDLPFFGADLVTLPGGHLIALDMQPLFRDDSAYQAKYTEPILPIFHAHQ
QHLSWGGDFPEEAQPFFSPAFLWTRPQETAVVETQVFAAFKDYLKAYLDFVEQAEAVTDSQNLVAIKQAQ
LRYLRYRAEKDPARGMFKRFYGAEWTEEYIHGFLFDLERKLTVVK
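
The layout described above can be read with a few lines of Python. This is a minimal sketch, not a full parser: the function name `parse_fasta` is an assumption made for illustration, the example records are made up, and the code handles only the basic layout of a `>` line followed by one or more sequence lines.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines are joined, since a sequence may be wrapped
            # over several lines of the file.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up records for illustration only.
example = ">seq1 a short nucleotide sequence\nGCAC\nGAGT\n>seq2\nMNSERSDV\n"
for name, sequence in parse_fasta(example):
    print(name, len(sequence))
```

Joining the wrapped lines matters because a long sequence, like the protein above, is usually split across several lines of fixed width.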

This post is part of a series. The most recent post in the series is “Nanopore sequencing”.

Nanopore sequencing is a method of sequencing DNA and RNA. It is a technology which is positioned to replace (or supplement) sequencing methods that rely on the polymerase chain reaction (PCR) to amplify a sample. (PCR was developed in the 1980s.) Nanopore sequencing is potentially cheaper than these methods. Nanopore sequencing uses an electric field to move a sample through a nanopore, a pore of nanometer size. As the sample passes through the nanopore, one can gather information about it by looking at changes in the electric current flowing through the nanopore.


This post is part of a series. The most recent post in the series is “Vito Volterra”.

Vito Volterra was an Italian mathematician and physicist (1860–1940). He was born in Ancona, a port city in central Italy. His name is familiar today, especially to biology and ecology students, because it appears in the name of the best-known predator-prey model, the Lotka-Volterra model. Volterra came up with the model (independently of Lotka) in the context of studying fish catches in the Adriatic Sea.

Volterra is also remarkable in that he was one of the few professors who opposed the Fascist regime of Benito Mussolini. As a result of his opposition, he had to live abroad, returning to Italy shortly before his death in 1940.


This post is part of a series. The most recent post in the series is “Lipids”.

#biology #ecology #vitovolterra

A lipid is a molecule present in biological organisms. If you have heard the term “lipid” before, you might think that it's synonymous with “fats”. Fats are, however, triglycerides, a proper subset of lipids. What unites lipids is their solubility in nonpolar solvents.

Fatty acids are an important category of lipids. Soaps used in cleaning and bathing are made from fatty acids. Triglycerides, mentioned already, are another significant category of lipids. Vegetable oils are mixtures of triglycerides. Cholesterol is an instance of another large class of lipids, sterol lipids.

#biology #lipids


This post is part of a series. The most recent post in the series is “Modeling DNA evolution”.

Suppose we gather the entire DNA sequence from organisms of the same species once a year for 1,000 years. If we count the frequency of bases (C, G, A, T) in each of these sequences we will observe that the counts are not identical. In some sample sequences C might be more frequent than in others. In other sequences, G might be more prevalent. Evolutionary biologists are interested in studying and, indeed, formally modeling this variation.

Several models have been proposed for DNA evolution. One such model, a Markov model, assumes that the probability of a base being replaced by another base (or staying the same) depends only on what the base is. Replacement, in a Markov model, does not depend on how prevalent other bases are, in the past or in the present. Although the Markov model does not describe the evolution of DNA particularly well, it does do a better job than a model which assumes bases are replaced at random.
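
To make the Markov assumption concrete, here is a minimal Python sketch that applies it to a short sequence. Everything in it is an assumption made for illustration: the names `make_uniform_matrix` and `evolve`, the idea of one replacement step per generation, and the transition probabilities themselves, which are not estimates from real data.

```python
import random

BASES = ["A", "C", "G", "T"]


def make_uniform_matrix(p_change):
    """Build a hypothetical transition matrix.

    Each base stays the same with probability 1 - p_change and otherwise
    changes to one of the other three bases with equal probability.
    """
    return {
        old: {new: (1 - p_change) if new == old else p_change / 3 for new in BASES}
        for old in BASES
    }


def evolve(sequence, matrix, generations, rng):
    """Apply the matrix to every base independently, once per generation."""
    current = list(sequence)
    for _ in range(generations):
        # The next base depends only on what the base currently is,
        # which is the Markov assumption described above.
        current = [
            rng.choices(BASES, weights=[matrix[base][new] for new in BASES])[0]
            for base in current
        ]
    return "".join(current)


rng = random.Random(0)  # fixed seed so that a run can be repeated
matrix = make_uniform_matrix(0.01)
print(evolve("GCACGAGT", matrix, 100, rng))
```

A matrix with unequal entries, for example one that makes some replacements likelier than others, can be passed to `evolve` in the same way, as long as each row sums to 1.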

#biology #dna


This post is part of a series. This is the first post in the series.