What is DNA?
DNA (deoxyribonucleic acid) is the hereditary material in almost all organisms. The structure of DNA was first elucidated in 1953 by the researchers J. Watson and F. Crick. They proposed that DNA consists of two long strands, each built up like a chain. There are four kinds of chain links (nucleotides), each consisting of three subunits: a sugar (deoxyribose), a phosphate group and a chemical base (adenine, guanine, cytosine or thymine). The nucleotides differ from each other only in the base, and the bases are arranged in pairs in the middle of the DNA. Adenine (A) binds only to thymine (T), and cytosine (C) only to guanine (G). Because of this specific base pairing, it is sufficient to list the initial letters of the bases of one strand to state a DNA sequence.
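The pairing rule can be illustrated with a few lines of Python. This is only a sketch: the function name and the example sequence are made up for illustration, and the direction in which real strands are read is ignored.

```python
# Base pairing as described above: A pairs with T, C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

def partner_strand(strand):
    """Return the partner of each base, position by position."""
    return "".join(PAIRING[base] for base in strand)

example = "ATGCCGTA"            # made-up example sequence
partner = partner_strand(example)
print(example)
print(partner)
```

Because every base has exactly one partner, applying the function to the partner strand gives back the original sequence. This is why writing down one strand is enough.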
The structure of DNA is comparable to a ladder, with the sugar and phosphate molecules forming the vertical sidepieces and the base pairs forming the rungs. The ladder is twisted so that it forms a spiral called a double helix. The order (sequence) of the bases is of particular interest to researchers, because it determines the biological information available for building and maintaining an organism.
An important property of DNA is that it can replicate. Each strand of the DNA double helix can serve as a template for making an exact copy of the sequence of bases, which is critical for cell division.
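As a rough sketch of this copying step, the following Python code separates the two strands of a made-up double helix and builds a new partner on each of them. The names are hypothetical and the enzymes involved are left out entirely.

```python
# Each base has exactly one partner: A with T, C with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

def build_partner(template):
    """Build the matching strand for one template strand."""
    return "".join(PAIRING[base] for base in template)

def replicate(helix):
    """Separate the two strands and complete each one to a new double helix."""
    strand_1, strand_2 = helix
    return [
        (strand_1, build_partner(strand_1)),   # old strand 1 + new partner
        (build_partner(strand_2), strand_2),   # new partner + old strand 2
    ]

parent = ("ATGCCGTA", "TACGGCAT")   # made-up example helix
daughters = replicate(parent)
print(daughters)
```

Both resulting double helices carry the same sequence as the parent, and each contains one old and one newly built strand.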
In eukaryotes (organisms whose cells have a nucleus), most DNA is located in the cell nucleus. A small amount of DNA can also be found in the mitochondria (mtDNA).
Human DNA consists of about 3 billion bases, and more than 99 percent of those bases are the same in all humans. The total length of DNA in a single somatic cell is approximately two meters. This is remarkable considering the small size of a cell. If you make a dot on the surface of your hand with a pen, you mark about 1,000 cells. The human body consists of more than 50 trillion cells and thus contains more than 100 billion kilometers of DNA!
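The last figure follows directly from the two numbers given above. The short calculation below uses the rounded values from the text, so the result is an order-of-magnitude estimate only.

```python
# Rounded values taken from the text above.
DNA_PER_CELL_M = 2                 # about two meters of DNA per somatic cell
CELLS_IN_BODY = 50 * 10**12        # about 50 trillion cells

total_m = DNA_PER_CELL_M * CELLS_IN_BODY   # total length in meters
total_km = total_m // 1000                 # total length in kilometers

print(f"{total_km:,} km")
```

The result is 100,000,000,000 km, that is, 100 billion kilometers.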
What is mtDNA?
In addition to the chromosomal DNA present in the nucleus of our cells, there is another type of DNA, which is found in specialized cell structures called mitochondria. Mitochondria are located in the cytoplasm (the fluid that surrounds the nucleus) and are involved in a wide range of processes, such as providing cellular energy, cell death and the synthesis of various enzymes.
In contrast to the linear chromosomal DNA, mitochondrial DNA (mtDNA) is a circular molecule. It is thought to originate from the circular genome of bacteria that were incorporated by the ancestors of eukaryotic cells during evolution (endosymbiotic theory).
Several mtDNA molecules (2-10) are found throughout the mitochondrial network, with a total number of copies ranging from 100 to 10,000 per cell depending on tissue type. Egg cells have many more mtDNA copies, whereas sperm cells contain far fewer. This is one of the reasons why mtDNA is normally inherited from the mother (unlike chromosomal DNA, which is inherited from both parents).
A relatively high mutation rate compared to the chromosomal DNA makes the mtDNA useful for tracking ancestry or in forensic laboratories for identification of human remains.
Mammalian mtDNA contains between 15,000 and 17,000 base pairs and is therefore much smaller than the chromosomal DNA. As mitochondria play a dominant role in energy conversion by oxidation of substrates like glucose, many genes on the mtDNA encode enzymes of the respiratory chain. The other genes encode transfer RNAs (tRNAs), which mediate between the genetic code and the amino acids needed to build proteins, and ribosomal RNAs (rRNAs), which form part of the structure and function of the ribosomes. Interestingly, most multicellular organisms contain the same kinds and numbers of these genes, although the size of the mtDNA may vary considerably (e.g. some plants have huge mtDNA molecules with more than a million base pairs).
The replication of the mtDNA starts in a non-coding area of the mitochondrial DNA molecule, called the control region or D-loop.
Human mtDNA contains 16,569 base pairs and 37 genes: 13 genes encoding proteins of the respiratory chain, 22 genes for tRNAs and 2 genes for rRNAs.
What is a chromosome?
At the beginning of the 20th century, chromosomes were identified as the carriers of hereditary information. Chromosomes are located in the nucleus of each cell and consist of long DNA strands coiled around specific proteins called histones. Only when the DNA is packed especially tightly during cell division does the shape of the chromosomes become visible under a microscope. The constriction point of a chromosome, the centromere, is conspicuous. After duplication of the DNA for an upcoming cell division, the two copies are held together at the centromere. The location of the centromere gives the chromosome its characteristic shape. The end pieces of the chromosomes are called telomeres.
Each somatic cell of the human body has 46 chromosomes arranged in 23 pairs. Each pair is composed of one chromosome from the mother and one from the father. One pair, the sex chromosomes, determines the biological sex: XY in men and XX in women. The remaining 22 pairs of chromosomes are called autosomes.
Genes and gene products
The base pairs are located in the middle of the DNA ladder, and their sequence carries the genetic information. The sequence of the bases can be read like a text that provides instructions for making gene products. Thus, a gene is a section of the DNA that is read and converted into a molecule, usually a protein. Humans have an estimated 20,000 to 25,000 genes, whose size varies from a few hundred DNA bases to more than 2 million bases.
In the first step of making gene products, the DNA double helix is opened between the base pairs. Then the sequence of the bases is transcribed into a molecule called RNA (ribonucleic acid). Like DNA, RNA is built up like a chain and consists of a sugar-phosphate backbone with attached bases. However, RNA is single-stranded and differs from DNA in its sugar, which is ribose instead of deoxyribose. In addition, the base thymine is replaced by the similar base uracil.
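Transcription can be sketched as a base-by-base lookup. The code below assumes that the RNA is built as the partner of one DNA strand (the template), with uracil (U) taking the place of thymine. The function name and the example sequence are hypothetical, and the reading direction of real strands is ignored.

```python
# Pairing between a DNA template base and the RNA base built opposite it.
# RNA uses uracil (U) where DNA would use thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_strand):
    """Build the RNA sequence that pairs with the given DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template_strand)

template = "TACGGCATT"          # made-up DNA template strand
rna = transcribe(template)
print(rna)
```

For this template the sketch prints AUGCCGUAA: the same letters as the opposite DNA strand (ATGCCGTAA), but with U instead of T.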
RNA that contains the instructions for making a protein (messenger RNA, mRNA) carries the information from the DNA out of the nucleus into the cytoplasm (the fluid that surrounds the nucleus), where the sequence is translated into the protein. There the mRNA interacts with a specialized complex called the ribosome, which reads the sequence of the bases. Each sequence of three bases, called a codon, usually codes for one particular amino acid out of 20. There are also codons that mark the beginning and the end of translation.
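Reading an mRNA in steps of three bases can be sketched as follows. The text does not name the start and stop codons; the code assumes the ones of the standard genetic code (AUG as start, UAA, UAG and UGA as stop). The function name and the example sequence are made up.

```python
START_CODON = "AUG"                    # standard start codon (assumption, not named in the text)
STOP_CODONS = {"UAA", "UAG", "UGA"}    # standard stop codons (assumption, not named in the text)

def read_codons(mrna):
    """Return the codons from the first start codon up to the first stop codon."""
    start = mrna.find(START_CODON)
    if start == -1:
        return []                      # no start signal, nothing is read
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break                      # end signal: stop reading
        codons.append(codon)
    return codons

codons = read_codons("GGAUGCCGUUUUAAGC")   # made-up mRNA
print(codons)
```

Here the first two bases are skipped, reading starts at AUG, and the stop codon UAA ends the reading, so three codons remain: AUG, CCG and UUU.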
At the ribosome, the amino acids are strung together with the help of tRNAs, which carry the amino acids to the ribosome. The resulting chain can consist of anywhere from a few to more than 1,000 amino acids and folds into a unique three-dimensional structure, the protein. Proteins can either act alone or as a (sub)unit of other molecules and are required for the structure, function and regulation of the body’s tissues and organs.
A process called alternative splicing, which can occur in the nucleus of eukaryotes after transcription of a gene, allows a single gene to give rise to different proteins. Freshly transcribed mRNA includes sections that are excised (introns). The remaining sections (exons) are joined together and translated into protein. Because different sections of the mRNA can be excised, a small number of genes can encode a large number of proteins.
There are also direct gene products that do not need to be translated from mRNA into protein. Examples are the rRNA, which is part of the ribosome, and the tRNA.
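The idea can be sketched with a transcript that is stored as a list of named sections. Everything in this example is made up for illustration (section names, sequences, and the rule that a section counts as an exon if its name starts with "exon"). In a real cell, splicing is carried out by a dedicated molecular machinery.

```python
# A made-up transcript: exons and introns in the order they were transcribed.
transcript = [
    ("exon1", "AUGGCU"),
    ("intron1", "GUAAGU"),
    ("exon2", "CCGAAA"),
    ("intron2", "GUGAGU"),
    ("exon3", "UUUUAA"),
]

def splice(sections, skipped_exons=()):
    """Remove all introns, optionally skip some exons, and join the rest."""
    return "".join(
        sequence
        for name, sequence in sections
        if name.startswith("exon") and name not in skipped_exons
    )

full_mrna = splice(transcript)                             # all three exons
short_mrna = splice(transcript, skipped_exons={"exon2"})   # exon2 left out
print(full_mrna)
print(short_mrna)
```

The same transcript yields two different mRNAs, and therefore two different proteins, depending on which sections are kept.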
The genetic code
At the ribosome, the mRNA sequence is translated into a chain of amino acids that forms the protein. Translation is mediated by tRNAs, each of which carries one of 20 different amino acids and has a triplet of nucleotides called the anticodon. If the anticodon of a tRNA is complementary to a triplet of nucleotides of the mRNA (the codon), the amino acid carried by this tRNA is released and added to the growing protein.
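The matching of codon and anticodon can be sketched as a lookup. The tRNA pool below contains only three entries chosen for illustration, and the anticodons are written so that they line up with the codon position by position, which is a simplification of how they are normally written. All names are hypothetical.

```python
# Pairing between RNA bases: A with U, C with G.
RNA_PAIRING = {"A": "U", "U": "A", "C": "G", "G": "C"}

# A tiny, illustrative pool of tRNAs: anticodon -> amino acid carried.
TRNA_POOL = {"UAC": "Met", "GGC": "Pro", "AAA": "Phe"}

def matching_anticodon(codon):
    """Return the anticodon that pairs with the codon, base by base."""
    return "".join(RNA_PAIRING[base] for base in codon)

def translate(codons):
    """For each codon, find the tRNA with the matching anticodon and add its amino acid."""
    return [TRNA_POOL[matching_anticodon(codon)] for codon in codons]

chain = translate(["AUG", "CCG", "UUU"])
print(chain)
```

For the three codons in the example, the resulting chain is Met, Pro, Phe.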
The genetic code determines which triplet of bases (codon) encodes which amino acid, and which codons signal the beginning and end of translation. As a codon consists of three nucleotides and each position can hold one of four different bases, there are 4 × 4 × 4 = 4^3 = 64 possible codons to encode 20 different amino acids. The genetic code is therefore redundant: some amino acids are encoded by several codons, but each codon specifies only one amino acid (the signals for the end of translation are not translated into amino acids). As a consequence of this degeneracy, a point mutation in a gene does not necessarily change an amino acid in the protein, and so does not necessarily cause the protein to malfunction.
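The numbers can be checked by listing all triplets. The sketch below pairs them with the standard genetic code written in one-letter amino acid symbols, where "*" marks a stop signal. The compact string follows the common U, C, A, G ordering and is an addition to the text, not part of it.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in U, C, A, G order (one-letter symbols, "*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All possible triplets: 4 choices for each of the 3 positions.
all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
codon_table = dict(zip(all_codons, AMINO_ACIDS))

codons_per_amino_acid = Counter(codon_table.values())
stop_count = codons_per_amino_acid.pop("*")

print(len(all_codons))                  # number of possible codons
print(len(codons_per_amino_acid))       # number of amino acids encoded
print(stop_count)                       # number of stop codons
print(codons_per_amino_acid["L"], codons_per_amino_acid["W"])
```

The table contains 64 codons for 20 amino acids plus 3 stop signals. Leucine (L) is encoded by six different codons, whereas tryptophan (W) has only one.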
Interestingly, the genetic code is in principle the same in all organisms, from bacteria to plants to humans. Nonetheless, there are some exceptions. Mitochondria and some organisms, such as ciliates, algae or yeasts, use slightly different forms of the genetic code. Furthermore, some bacteria and archaea are able to incorporate additional amino acids apart from the 20 standard ones.
Mutations and their effects
Mutations are changes in the hereditary material of organisms, resulting from changes in the sequence of the nucleotides or changes in the number or length of the chromosomes. Mutations change the genetic information in the DNA and can therefore alter the appearance of particular traits.
There are two classes of mutations: small-scale mutations and large-scale mutations. Small-scale mutations are point mutations that affect only one or a few nucleotides through the exchange, insertion or deletion of single nucleotides. Large-scale mutations alter the chromosomal structure through gene duplication, deletion of large chromosomal regions, exchange of segments between different chromosomes or reversal of the orientation of chromosomal segments.
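The three kinds of small-scale changes can be illustrated on a short sequence. The function names, positions and the example sequence are made up. Positions are counted from 0.

```python
def exchange(sequence, position, new_base):
    """Replace the base at the given position with another base."""
    return sequence[:position] + new_base + sequence[position + 1:]

def insertion(sequence, position, new_base):
    """Insert an additional base before the given position."""
    return sequence[:position] + new_base + sequence[position:]

def deletion(sequence, position):
    """Remove the base at the given position."""
    return sequence[:position] + sequence[position + 1:]

original = "ATGCCGTTT"                 # made-up example sequence
exchanged = exchange(original, 4, "A")
inserted = insertion(original, 3, "A")
deleted = deletion(original, 3)
print(original, exchanged, inserted, deleted)
```

An exchange keeps the length of the sequence, whereas an insertion or deletion shifts all bases that follow.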
Mutations can be inherited from a parent (hereditary mutations) or acquired during a person’s lifetime (somatic mutations). Only mutations present in the DNA of egg and sperm cells can be inherited. A mutation passed on in this way is present in every cell of the child’s body. Mutations that occur just after fertilization also appear in every cell of a person’s body, but in this case there is no family history of the disorder.
Somatic mutations occur in the DNA of individual cells. They can be caused by environmental factors such as radiation or toxic chemicals, or they can arise spontaneously. Spontaneous means by chance, which is not surprising considering the billions of building blocks that the DNA of a cell consists of. Many mistakes occur during DNA replication before cell division. But the cell has a number of safeguards: certain enzymes can recognize and repair most of these mistakes. Furthermore, many organisms have mechanisms for eliminating somatic cells that remain mutated.
A mutation in a protein-coding region can cause the protein to malfunction, to function less effectively or to be missing entirely. The impact of this mutation on health and development depends on how essential the altered protein is. If the altered protein plays a critical role in the body, it can disrupt normal development (in severe cases even embryonic development) or cause a medical condition.
However, only a small percentage of all mutations cause genetic disorders. The overwhelming majority of mutations have no significant effect on health or development. An exchange of a nucleotide (point mutation) in a protein-coding region that is not recognized by the cell’s DNA repair machinery does not inevitably lead to malfunction of the protein. The nucleotide triplets (codons) that code for amino acids are redundant, which means that several triplets can encode the same amino acid. A nucleotide within a triplet can therefore be exchanged while the translated amino acid, and thus the resulting protein, stays the same.
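A small example makes this concrete. The table below contains only three codons of the standard genetic code, chosen for illustration, and the function name is hypothetical.

```python
# Three codons of the standard genetic code (illustrative excerpt only).
CODON_TO_AMINO_ACID = {
    "GGU": "Gly",
    "GGC": "Gly",
    "GAU": "Asp",
}

def is_silent(codon_before, codon_after):
    """True if both codons are translated into the same amino acid."""
    return CODON_TO_AMINO_ACID[codon_before] == CODON_TO_AMINO_ACID[codon_after]

print(is_silent("GGU", "GGC"))   # third base exchanged: still glycine
print(is_silent("GGU", "GAU"))   # second base exchanged: aspartic acid instead of glycine
```

The first exchange leaves the protein unchanged, while the second one replaces one amino acid with another.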
Some genetic changes are very rare, while others, called polymorphisms, are very common in the population. Polymorphisms are considered to be normal variations in the DNA and lead to the natural differences between people (such as eye color, hair color and blood type), but they may also influence the risk of developing certain disorders.
A very small percentage of mutations are advantageous for an organism and its descendants. By producing new versions of proteins or simply altering how proteins are regulated, these mutations help the organism adapt better to changes in its environment.