Discovery of “Dark Matter” in Livestock Genomes

  • The genomes (DNA) of livestock animals each contain about 20,000 genes that code for proteins.
  • Livestock genomes additionally contain thousands of genes that do not code for proteins, and when active these genes produce long noncoding RNAs.
  • Most long noncoding RNAs are specific to a species; however, about 50 were identified in all five species studied: human, mouse, cow, pig, and chicken.
  • Long noncoding RNAs are speculated to regulate the activities of protein-coding genes.
  • DNA marker-assisted breeding programs aimed at improving livestock production efficiencies, like getting more milk for less feed in dairy cows, could be enhanced by exploiting additional genetic markers directly associated with the controls of gene activity.


Paradoxes are uncomfortable. They remind us of how little we understand. Worse, it sometimes seems the more we know, the less we understand, and that’s a bitter-sweet paradox in itself. Nowhere are paradoxes more apparent than in our understanding of life, and in particular the scientific understanding of the encyclopedia of life—the genome present in every living cell. Many scientists conclude that without understanding these genomic paradoxes, humans cannot fully exploit the amazing potential of genetics to improve human health and enhance the efficiencies of livestock production systems. The latter occurs primarily through DNA marker-assisted selective breeding of livestock [1, 2]. This process exploits the genetic (DNA) variations present in a large population of a livestock species to help select for the high-performing animals that then go into breeding programs. The aim is to improve animal productivity in each generation. It’s a little like how a savings account grows with each year of interest.

Genomic Paradoxes Rule!

The genome contains all of the genetic material in a cell, which includes all genes. It is the blueprint for life, and it shouts out that all life is related. The genome contains a huge amount of encrypted information that is only partially deciphered [3]. But many scientists note that there are glaring paradoxes. First, how does the genome direct the form and function of a complex living organism, like a human or dairy cow, when the genome only contains about 21,000 genes that code for proteins, a similar number to that present in a tiny worm consisting of only a thousand cells [4, 5]? Second, only a minuscule 1.5% of the mammalian genome contains genes that code for proteins [1, 6, 7], the mainstay of cellular functions, and only about 8% of the genome, including these genes, is thought to be functional [8]. Yet, mammals have diligently carried the remaining 92% of their genomes on their respective chromosomal backs for millions of years of evolutionary history. For each mammalian species, the DNA sequence in this big chunk of the genome changed a lot with evolutionary time, but it was always there. Usually, when traveling on a long trek, non-essential baggage gets dumped early. What, if anything, does this 92% of the genome do? Third, the sizes of the genomes of multicellular life forms are unrelated to their biological complexities [9]. Surely, mammals like the dairy cow and in particular humans have some of the largest genomes. Not so. Embarrassingly, the humble single-celled amoeba living the quiet life in a pond has a genome size much larger than that of a cow or human [9]. Adding insult to injury, even toads and onions have genomes slightly larger than those of mammals [9]. The three big paradoxes are likely interrelated and scream the obvious: something is badly amiss in scientists’ understanding of genomes.

Discovering the “Dark Matter” in Livestock Genomes

Recently, a team of eleven scientists from the University of California at Davis applied a powerful new technology, called RNA-Seq, to detect tens of thousands of active genes in eight very different tissues taken from the cow, pig, and chicken [10]. The lead author was Colin Kern and the team’s intriguing results were published in the journal BMC Genomics [10]. In all, the investigators produced a staggering four billion pieces of scientific data! They also discovered more than they bargained for, perhaps much more: the “dark matter” of livestock genomes [11].

What Is an Active Gene?

An active gene is one that produces RNA, the first cousin of DNA. This process is called transcription. Often genes are active in one tissue but not another, or active in very early life but not in mature life (or vice versa). Other genes have mundane but essential housekeeping roles in all cells throughout life. In the transcription process, the coded molecular information in a gene’s DNA is transcribed and then processed into an RNA. It is a little like a wayward monk transcribing the Bible in the Middle Ages who leaves out a few chapters and rearranges some verses here and there, but in the end, his finished book is still amazingly coherent. The molecular codes present in most RNAs contain detailed instructions for making specific proteins. Proteins are used to build structures in cells like molecular machines, communication systems, power plants, regulatory systems (traffic lights), repair systems, and transport systems. They even have a testy management group that keeps things on track and takes no nonsense. Multiple teams of scientists in the 1960s first deciphered the protein code in RNA, a milestone of human achievement [12]. Since that time, many scientists became obsessed with protein-coding genes and their transcribed RNAs; they were lured by the intriguing code of molecular order in the seeming chaos of life. But their obsession hid the larger picture.
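As a rough illustration of the two steps described above, the following Python sketch copies a short, made-up DNA sequence into RNA and then reads the RNA three letters at a time to build a protein. The sequence, the function names, and the deliberately tiny codon table are assumptions for illustration only; a real cell uses the full 64-codon table, and the sketch ignores RNA processing (the monk's missing chapters).

```python
# Partial codon table taken from the standard genetic code.
# Only the codons used in this toy example are listed.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start signal
    "UUU": "Phe",
    "GGC": "Gly",
    "AAA": "Lys",
    "UAA": None,   # stop signal
}

def transcribe(coding_dna: str) -> str:
    """Return the RNA copy of a DNA coding strand (T is replaced by U)."""
    return coding_dna.upper().replace("T", "U")

def translate(rna: str) -> list[str]:
    """Read RNA codon by codon until a stop codon or the end of the sequence.

    Raises KeyError for any codon missing from the partial table above.
    """
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid is None:  # stop codon reached
            break
        protein.append(amino_acid)
    return protein

dna = "ATGTTTGGCAAATAA"  # hypothetical 15-letter gene fragment
rna = transcribe(dna)
protein = translate(rna)
print(rna)                # AUGUUUGGCAAAUAA
print("-".join(protein))  # Met-Phe-Gly-Lys
```

A long noncoding RNA would pass through the first step (transcription) but not the second: it is produced from DNA, yet it is not read as instructions for a protein.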

The RNA World Is Big

Kern and colleagues, as well as many others, demonstrated that there is a great deal more going on in the RNA world than just producing RNAs coding for proteins [1, 2, 10, 13]. The breakthrough in this area of science was due to a new technology, RNA-Seq, that rapidly characterizes tens of thousands of RNAs and their abundances well before a much-anticipated mid-morning espresso the next day. In the past, this type of measurement was laboriously performed for one specific RNA at a time and, importantly, the RNA already had to be known to exist; scientists at that time were oblivious to the bigger RNA world. The new technology measures all RNAs! That’s also an enormous challenge for scientists to digest after their morning espresso when they find their computers full of massive quantities of nondescript data and rude complaints from a data manager.

At first, Kern and colleagues identified the active genes that coded for all of the proteins made in eight tissues from the cow, pig, and chicken. This was relatively easy for the investigators as these genes were already well-documented and many produced highly abundant RNAs in the tested tissues from all three species, i.e. there was a lot of experimental data for these genes. The numbers of protein-coding RNAs discovered were about the same for the cow, pig, and chicken tissues. Other scientists point out that most genes that encode proteins are also present in related animal species, and only a minority are unique to a species [14]. Thus, Kern and colleagues indicated that it is unlikely that the protein-coding genes alone could explain the enormous form and function differences of the cow, pig, and chicken. Kern and colleagues then discovered that there were considerable and unexpected complexities associated with how individual genes produce protein-coding RNAs in each species. Complexity starts to rear its head.

Next, the investigators looked very closely at the remaining RNAs, the ones that did not code for proteins [10]. They restricted their analysis to only long noncoding RNAs (the variety of tiny RNAs present in most tissues was not part of their analysis). The surprise for Kern and colleagues was that there were about 10,000 long noncoding RNAs produced in each of the cow, pig, and chicken species, and about half of these RNAs had never been previously detected in any species. The investigators had discovered the “dark matter” of the genome. They explained that these long noncoding RNAs were hidden for a long time because they were hard to find for a variety of reasons: long noncoding RNAs were present in cells in tiny amounts, many were unique to a particular species, and the few that were common to multiple species showed only weak relatedness in their RNA molecular codes.

By adding the long noncoding RNAs discovered in humans and mice to the cow, pig, and chicken RNAs, Kern and colleagues [10] then discovered a very special collection of about 50 long noncoding RNAs common to all five species. The investigators suggested that this small group of RNAs was very important for regulating the physical packing of the genome within chromosomes and by inference the evolutionarily ancient processes that are common to all of these animal species, i.e. helping to regulate the remarkable biological transformation from a fertilized egg to a complex animal. Kern and colleagues also noted the mystery surrounding the functions of most of the huge number of long noncoding RNAs that were uniquely present in each livestock species. They speculated that many of the long noncoding RNAs could regulate the activities of other genes, particularly when, where, and how much of the protein-coding RNAs are produced. These collective results from Kern and colleagues clearly indicate that a much bigger fraction of the genome than 8% is functional and that there are many new elements of complexity associated with the regulation of gene activity that may underpin the huge variation in the biological complexities of animals.
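The search for long noncoding RNAs common to all five species can be pictured as a set intersection. The Python sketch below assumes each RNA has already been assigned to a cross-species family; the family identifiers are invented for illustration, and the code does not reproduce the sequence-comparison methods actually used by Kern and colleagues.

```python
# Hypothetical lncRNA family identifiers detected in each species.
lncrna_families = {
    "human":   {"fam1", "fam2", "fam3", "fam7"},
    "mouse":   {"fam1", "fam2", "fam4"},
    "cow":     {"fam1", "fam2", "fam5", "fam7"},
    "pig":     {"fam1", "fam2", "fam5"},
    "chicken": {"fam1", "fam2", "fam6"},
}

# Families found in every species: the intersection of all five sets.
shared = set.intersection(*lncrna_families.values())

# Families seen in exactly one species.
species_specific = {
    species: {
        fam for fam in families
        if sum(fam in other for other in lncrna_families.values()) == 1
    }
    for species, families in lncrna_families.items()
}

print(sorted(shared))
print({species: sorted(fams) for species, fams in species_specific.items()})
```

In this toy data set only two families are shared by all five species, mirroring in miniature the finding that the common group (about 50) is tiny compared with the roughly 10,000 long noncoding RNAs found in each livestock species.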

The three genomic paradoxes are starting to crumble and reveal a new biological order. Kern and colleagues and others suggested that long noncoding RNAs primarily regulate the activities of protein-coding genes [1, 2, 10, 11, 13]. The investigators inferred that since many of these newly discovered RNAs were specific to one species, they are likely to be important for the unique form and function of each species, and possibly contribute to some of the genetics-based production differences between individuals within a livestock species population. This genetics-based individual variation in a species population is the bread and butter of selective breeding programs widely used in livestock industries. The revelation inferred from the research of Kern and colleagues is that population variation for complex production traits within a livestock population, like milk quantity, feed efficiency, and muscle deposition, could be mostly about how gene activities are regulated rather than the genes themselves.


The pace of genetic improvement in livestock has markedly accelerated over the last 50 years as producers applied intensive selection pressure in their breeding programs to produce more productive animals. Perhaps the best example is the dairy cow. Scientists at the USDA calculated that from 1980 to 2015, milk production in the USA increased by about 60% and at the same time, the size of the national dairy herd decreased by about 16% [15]. This history of exceptional improvement in dairy industry efficiency also occurred elsewhere in the world and was achieved by better herd management, better nutrition and pastures, improved disease control, and importantly, continued genetic improvement of animals through intensive selective breeding for commercially desirable dairy production traits [16].
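The two USDA figures can be combined into a rough estimate of the change in milk yield per cow. The Python sketch below takes the rounded percentages quoted above at face value, so the result is only approximate.

```python
milk_change = 0.60   # total US milk production up about 60% (1980 to 2015)
herd_change = -0.16  # national dairy herd size down about 16%

# Yield per cow scales as total production divided by the number of cows.
per_cow_ratio = (1 + milk_change) / (1 + herd_change)
per_cow_increase_pct = (per_cow_ratio - 1) * 100

print(f"Milk per cow in 2015 was about {per_cow_ratio:.2f} times the 1980 level")
print(f"That is an increase of roughly {per_cow_increase_pct:.0f}%")
```

With these rounded inputs, the average cow in 2015 produced roughly 1.9 times as much milk as the average cow in 1980, which is why total production could rise while the herd shrank.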

More recently, livestock genetic improvement programs have been accelerated by the application of DNA marker-assisted selective breeding of animals [17], i.e. using DNA variations in the genome to help select specific high-performing animals for entry into breeding programs. Some scientists suggest that this approach is limited as the hundreds of thousands of DNA markers used in these DNA marker-assisted breeding programs are largely not positioned in the genomic regions where all the biological action occurs. You always get a better picture closer to the action.

The research undertaken by Kern and colleagues [10], and others [2, 18, 19], hints that the efficiency of DNA marker-assisted selective breeding programs could be further improved if additional DNA markers were included that represented genetic variation in the very special regions of the genome that regulate the complexities of gene activity and include the multitude of newly discovered genes producing long noncoding RNAs [2]. This strategy could allow the livestock industries, especially the dairy industry, to further improve their efficiencies of production. Generating more from less is the name of the game in the livestock and poultry industries, and this seemingly paradoxical goal may now be even more achievable.

How wonderful that we have met with a paradox. Now we have some hope of making progress. (Niels Bohr)


1. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57-74.

2. Andersson L, Archibald AL, Bottema CD, Brauning R, Burgess SC, Burt DW, et al. Coordinated international action to accelerate genome-to-phenome with FAANG, the Functional Annotation of Animal Genomes project. Genome Biol. 2015;16:57.

3. Tellam R. Family trio sings for genomic supper: International Milk Genomics Consortium; 2018 [Available from:].

4. Pertea M, Shumate A, Pertea G, Varabyou A, Breitwieser FP, Chang YC, et al. CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise. Genome Biol. 2018;19(1):208.

5. Spieth J, Lawson D, Williams G, Howe K. Overview of gene structure in C. elegans. 2014 [Available from:].

6. International Human Genome Sequencing Consortium, Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860-921.

7. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature. 2004;431(7011):931-945.

8. Rands CM, Meader S, Ponting CP, Lunter G. 8.2% of the Human genome is constrained: variation in rates of turnover across functional element classes in the human lineage. PLoS Genet. 2014;10(7):e1004525.

9. Latorre A, Silva F. The size of the genome and the complexity of living beings. 2013 [Available from:].

10. Kern C, Wang Y, Chitwood J, Korf I, Delany M, Cheng H, et al. Genome-wide identification of tissue-specific long non-coding RNA in three farm animal species. BMC Genomics. 2018;19(1):684.

11. Martin L, Chang HY. Uncovering the role of genomic “dark matter” in human disease. J Clin Invest. 2012;122(5):1589-1595.

12. Yanofsky C. Establishing the triplet nature of the genetic code. Cell. 2007;128(5):815-818.

13. Mercer TR, Dinger ME, Mattick JS. Long non-coding RNAs: insights into functions. Nat Rev Genet. 2009;10(3):155-159.

14. Elsik CG, Tellam RL, Worley KC, Gibbs RA, Muzny DM, Weinstock GM, et al. The genome sequence of taurine cattle: a window to ruminant biology and evolution. Science. 2009;324(5926):522-528.

15. USDA. Agricultural production and prices: United States Department of Agriculture Economic Research Service; 2017 [].

16. Agriculture Victoria. Innovation doubles milk production: a review of pre-farm gate RD&E’s contribution 1980-2010. 2017 [Available from:].

17. Silva M, dos Santos D, Boison S, Utsunomiya A, Carmo A, Sonstegard T, et al. The development of genomics applied to livestock breeding. Livestock Science. 2014;166:66-75.

18. Nguyen QH, Tellam RL, Naval-Sanchez M, Porto-Neto LR, Barendse W, Reverter A, et al. Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics, and epigenetics data. Gigascience. 2018;7(3):1-17.

19. Naval-Sanchez M, Nguyen Q, McWilliam S, Porto-Neto LR, Tellam R, Vuocolo T, et al. Sheep genome functional annotation reveals proximal regulatory elements contributed to the evolution of modern breeds. Nat Commun. 2018;9(1):859.


Contributed by
Dr. Ross Tellam (AM)
Research Scientist
Brisbane, Australia