Since it's an extended holiday weekend here in the US, science news has slowed down considerably, allowing me the time to write up a post I hope will bring some clarity to an issue that's popped up more than once, including just last week. The topic is junk DNA: does it exist, and if so, how do we identify it? I'm going to go over a few of the types of potential junk DNA and then wrap up with my own conclusions on the topic, just so I can refer to this in the future without re-starting the debate over whether junk DNA really is junk.
The concept of junk DNA arose when researchers started sequencing large pieces of the genome, and found that very little of it coded for proteins. With the completion of the human genome, we can give a more precise figure on that: only 1.4 percent of the genome is likely to code for an actual protein. But other genomes tell a very different story. For example, the pufferfish (Fugu rubripes) has a genome that's 1/8 the size of the human genome, but has roughly the same number of genes. The difference, it appears, is largely in the junk: Fugu doesn't have much of it, and so serves as a useful point of comparison. Comparing the two genomes can help us identify how junky all the bits of non-coding DNA are. There are a few classes of DNA sequences that appear to have a substantial junk content:
Inter-regulatory sequences: Before a protein gets produced, the DNA that encodes it has to be copied into an RNA message, a process that is regulated by the DNA surrounding the message. The DNA sequences that regulate a gene's expression can reside up to hundreds of kilobases away from the actual gene. But does the DNA in between the regulatory sequences matter? In some cases yes, but Fugu suggests that those are the exceptions. Many of the same regulatory DNA sequences are used in both humans and fish, but in Fugu those sequences are often much closer together, with the intervening sequences eliminated. This suggests that much of the sequence near genes is junk.
Introns: In eukaryotes (organisms whose cells have a nucleus, which includes all multicellular animals), the protein-coding portion of a gene is split up into exons. The intervening DNA (termed introns) is eliminated from the final RNA message. All told, the DNA sequence of introns accounts for about 24 percent of the human genome. These introns contain regulatory sequences that signal for the elimination of the intron from the final message, and can contain sequences that regulate gene expression as well. But these account for a small fraction of the total intron sequence. Many organisms (such as flies and Fugu) have much smaller introns than humans, and small, rapidly dividing eukaryotes such as yeast have gotten rid of the majority of their introns.
Pseudogenes: Large duplications of genetic material go on all the time. Some of the duplicated genes (and their accompanying introns and regulatory regions) develop new functions, but others don't get used, and mutations eventually silence them. The human genome is littered with dead copies of genes, called pseudogenes. It's always possible that further mutation will do something useful with these genes, but in many cases, it's highly unlikely. In the case of odorant receptors, over half the nearly 1,000 present in the human genome are now pseudogenes; it's hard to imagine all of them being put to use in the future.
Disabled retroviruses and transposons: Many viruses reproduce by inserting a copy of themselves into the genome. When this process goes badly, an inactive virus is left behind; such remnants account for approximately 2 percent of the human genome. More significant are the transposons, mobile genetic elements that have hopped around the genome and now account for nearly half of it. Most of these transposon copies are non-functional, and will never hop again. Combined, these disabled parasites account for a significant fraction of what is commonly considered junk.
Other stuff: There are other regions of the genome that appear to simply not contain genes. Those regions are largely absent in the Fugu genome, and do not have any obvious function.
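To put the human/Fugu comparison behind these classes in rough numbers, here is a back-of-envelope sketch in Python. It uses only the two figures quoted above (1.4 percent coding in humans, a Fugu genome 1/8 the size) plus one assumption that is mine rather than a measured result: that both genomes carry about the same absolute amount of protein-coding sequence, since their gene counts are roughly equal. The function name is made up for illustration.

```python
# Figures quoted in the post.
HUMAN_CODING_PERCENT = 1.4   # share of the human genome that codes for protein
FUGU_RELATIVE_SIZE = 1 / 8   # Fugu genome size relative to the human genome


def implied_coding_percent(human_coding_percent, relative_size):
    """Coding share of a genome that packs the same coding DNA into less space.

    Assumption (not from the post): the absolute amount of coding
    sequence is the same in both genomes.
    """
    return human_coding_percent / relative_size


fugu_coding_percent = implied_coding_percent(HUMAN_CODING_PERCENT, FUGU_RELATIVE_SIZE)
human_noncoding_percent = 100 - HUMAN_CODING_PERCENT

# Share of the human genome with no room for a counterpart in a genome 1/8 the size.
human_excess_percent = 100 * (1 - FUGU_RELATIVE_SIZE)

print(f"Implied Fugu coding share: {fugu_coding_percent:.1f}%")
print(f"Human non-coding share: {human_noncoding_percent:.1f}%")
print(f"Human genome in excess of Fugu's size: {human_excess_percent:.1f}%")
```

Under that assumption, the fish still spends most of its genome on non-coding sequence, but far less of it than we do; the gap between the two is a crude upper bound on how much human sequence Fugu manages to live without.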
People tend to refer to all of these classes of DNA elements collectively as junk, which is where much of the problem arises. Although the junk is probably useless on average, there are clear exceptions. I've covered at least three cases where transposons or pieces of them have been used to form a functional gene product. These cases are often announced with press releases proclaiming something along the lines of "a new use for junk DNA is found." This tends to obscure the fact that these transposons are only useful within the context of a normal gene. Even in cases where the actual transposons may be doing something useful, it's far from clear whether any individual element or the huge number of transposons present are actually required for the useful activity.
So, when I refer to junk DNA, I'm not referring to any specific DNA sequence (which may or may not be useful), but to the collective populations of several types of DNA sequences that, on average, appear to be junk. By extension, I'd say that a lot of the genome appears to be junk. Fortunately, we've reached the point where we can begin to test this experimentally; if I'm wrong about much of the genome being junk, you may see a mea culpa here in the future.