Developing a Better Cattle Reference Genome

  • The original cattle genome has had many research and commercial applications since its release in 2009, but technological limitations meant that it was far from perfect.
  • A new study describes the release of an improved cattle reference genome created using the latest technology.
  • The new genome is 10 times more accurate than the previous one and has fewer gaps, which could help improve both genetic selection and basic research in cattle.


Cattle are among our most important domestic animals, with about 1.4 billion raised for meat and dairy around the world [1]. Humans have long drawn on the existing genetic variation in cattle populations to develop a variety of breeds with useful traits [2]. The sequencing of the cattle genome enhanced this selection by making genomic tools available for selecting traits [3-5].

“Cattle are a very important agricultural species both for beef production and dairy production worldwide,” says Dr. Monique Rijnkels of Texas A&M University. “To understand the biology, but also to be able to select the most productive and efficient animals, having a good genome is important,” she says. “There are so many applications for this kind of selection, whether it’s for production traits like more milk or more fats or more protein, or being tolerant of warmer climates, being resistant to certain diseases, better quality beef, faster growth, all those kinds of things,” says Rijnkels.

“Ever since the first genome was put together, people have really adopted this sort of genomic selection approach,” says Rijnkels. “But if you think a certain genetic marker is somewhere connected to some gene that confers an important trait, but that turns out not to be correct because the genome assembly is not correct, then when you select for that region in the genome you very soon lose the connection with the gene because they were never really connected,” she says.

A new study describes the release of an improved cattle reference genome assembly built using the latest technology [6]. “The original cattle genome project was a large, multi-institutional effort that cost tens of millions of dollars,” says Dr. Benjamin Rosen of the USDA Agricultural Research Service, one of the researchers who conducted the new study. “It was updated over the years, but improvements in sequencing technology and assembly algorithms made it clear that we could start from scratch and generate a better product at a fraction of the cost and effort,” he says. “The new assembly used only a couple of instrument platforms and a budget 400-fold lower, with sequence data collection performed in a single small laboratory,” says Rosen. “My collaborators Juan Medrano and Tim Smith marshalled the resources and generated the data while I drove the more technical aspects of assembling the genome,” he says. The new genome assembly’s name, ARS-UCD1.2, comes from the affiliations of Tim Smith at the Agricultural Research Service (ARS) and Juan Medrano at the University of California, Davis (UCD).

“The first bovine assembly—UMD3.1—was released in 2009, and from that time there have been significant advances in sequencing technology and technologies to assemble a genome, as well as improved bioinformatics platforms that allow the development of significantly improved, more continuous assemblies,” says Medrano. “Having a continuous and reliable genome reference assembly is fundamental for all aspects of genomics research,” he says. Using newer sequencing technology resulted in fewer gaps in the new genome and greater accuracy. “The new assembly was improved by more than 200-fold in continuity and 10-fold in accuracy,” says Medrano.

“Assembling a genome is similar to putting together a very large puzzle,” says Rosen. “The most important difference between the new assembly and the previous reference is in the continuity of the genome, i.e., how many pieces your genome is broken up into,” he says. “The old assembly was made up of more than 72,000 pieces while the new assembly contains 345, making it more than 200-fold more continuous,” says Rosen. “This is a direct result of improvements in sequencing and assembly methods,” he says.

“Before this we had several iterations of an assembly based on sequences that were derived with older technology and there were substantial issues with that assembly,” says Rijnkels, who was not involved in the new study. “There were lots of gaps and there were a lot of mis-assemblies and so they improved significantly on that by basically going back to the drawing board,” she says. “They used new sequence technologies and improved alignment or assembly algorithms to put it all together in a more continuous way and with improved base accuracy,” says Rijnkels. “That has all kinds of advantages over what we had before,” she says.

The improvements enabled better gene annotation and should give researchers more certainty about the location of the genes and genetic markers used for basic research and genetic selection. “New extensive gene expression data and the availability of more and longer transcripts for gene placement and orientation of sequences across gaps had a large impact on significantly improving gene annotation,” says Medrano.

The improved genome has many potential applications. “An improved assembly has benefits for identification, reconstruction and fine mapping of loci important for production and health traits, for the identification of regulatory regions of genes, and for fine mapping genes associated with economically important traits,” says Medrano. “The improved annotation corrected mis-assembled regions of the genome, identified missing genes, and allowed the reconstruction of complex, highly repetitive regions of the genome,” he says [7].

“That has implications for genomic selection because now we have more confidence in where genes and genetic markers are,” says Rijnkels. The new genome enables researchers to create more accurate gene models and more accurately determine the genetic variants associated with different traits that researchers want to select for. “Having a more accurate genome makes it much more reliable now to select for variations and also to then try to understand mechanisms that underlie these variations,” says Rijnkels. “This really will help us understand the biology and help in the genetic selection, and help in every aspect of understanding cattle biology,” she says.

The new reference genome sequence is already available in the GenBank repository. “The new reference has been publicly available and in use since April 2018,” says Rosen. “It is utilized by a very broad and diverse research community across academia, government and industry around the world,” he says. “Millions of animals have been genotyped across various platforms and the improved accuracy of the genome will allow for better translation between the various platforms,” says Rosen.

“Immediately after ARS-UCD1.2 was released with an inherent increased accuracy, it was adopted in December 2018 as the cattle reference genome by the US genomic evaluation system and by the 1000 Bull Genomes Project, which is a large database of genetic variants for genomic prediction, and practically by all the cattle genomics community,” says Medrano [8,9].

“It’s good that now there is a paper that serves as a landmark that this genome is out there for everybody,” says Rijnkels. “The community is really happy with this new reference genome, and the paper allows those maybe not following the field so much to know that we do have an improved assembly and we encourage everybody to use it because it really is better and more accurate,” she says.

The new genome was built using the same animal, the Hereford cow L1 Dominette, that was used for the previous cattle reference genome [5]. “I think that makes it a lot easier to go back and reuse data that was already analyzed against the old genome, so you don’t have to worry about whether differences you see are because of using data from a different animal,” says Rijnkels. “That was a good call,” she says.

The new study also improved on the previous one by choosing a different source of DNA. Where the original Hereford assembly used DNA from blood, the new one uses genomic DNA extracted from frozen lung tissue. Certain genomic regions, particularly those containing important immune function loci, undergo rearrangement in blood cells, which can make them hard to align and organize properly in a genome assembly. “I think that was a good call too to use lung tissue, as the genetic structure is a little bit more stable and it makes research into these immune loci more accessible,” says Rijnkels.

“Extracting DNA from lung rather than blood was done intentionally to help us better assemble immune gene clusters,” says Rosen. “An important outcome of this assembly is the ability to better interrogate immune loci to identify variants affecting health traits,” he says.

By providing a 200-fold improvement in sequence continuity and a 10-fold improvement in per-base accuracy over previous cattle assemblies, the new cattle reference genome promises to serve as a solid foundation for a new era of basic research and genetic selection in cattle.


1. Robinson T.P., Wint G.R., Conchedda G., Van Boeckel T.P., Ercoli V., Palamara E., Cinardi G., D’Aietti L., Hay S.I., Gilbert M. Mapping the global distribution of livestock. PLoS One. 2014 May 29;9(5):e96084.

2. Weigel K.A., VanRaden P.M., Norman H.D., Grosu H. A 100-Year Review: Methods and impact of genetic selection in dairy cattle-From daughter-dam comparisons to deep learning algorithms. J Dairy Sci. 2017 Dec;100(12):10234-50.

3. Saatchi M., Schnabel R.D., Rolf M.M., Taylor J.F., Garrick D.J. Accuracy of direct genomic breeding values for nationally evaluated traits in US Limousin and Simmental beef cattle. Genet Sel Evol. 2012 Dec 7;44:38.

4. García-Ruiz A., Cole J.B., VanRaden P.M., Wiggans G.R., Ruiz-López F.J., Van Tassell C.P. Changes in genetic selection differentials and generation intervals in US Holstein dairy cattle as a result of genomic selection. Proc Natl Acad Sci U S A. 2016 Jul 12;113(28):E3995-4004.

5. The Bovine Genome Sequencing and Analysis Consortium, Elsik C.G., Tellam R.L., Worley K.C. The genome sequence of taurine cattle: a window to ruminant biology and evolution. Science. 2009;324:522-8.

6. Rosen B.D., Bickhart D.M., Schnabel R.D., Koren S., Elsik C.G., Tseng E., Rowan T.N., Low W.Y., Zimin A., Couldrey C., Hall R., Li W., Rhie A., Ghurye J., McKay S.D., Thibaud-Nissen F., Hoffman J., Murdoch B.M., Snelling W.M., McDaneld T.G., Hammond J.A., Schwartz J.C., Nandolo W., Hagen D.E., Dreischer C., Schultheiss S.J., Schroeder S.G., Phillippy A.M., Cole J.B., Van Tassell C.P., Liu G., Smith T.P.L., Medrano J.F. De novo assembly of the cattle reference genome with single-molecule sequencing. GigaScience. 2020 Mar 1;9(3):giaa021.

7. Schwartz J.C., Gibson M.S., Heimeier D., Koren S., Phillippy A.M., Bickhart D.M., Smith T.P., Medrano J.F., Hammond J.A. The evolution of the natural killer complex; a comparison between mammals using new high-quality genome assemblies and targeted annotation. Immunogenetics. 2017 Apr;69(4):255-69.

8. 1000 Bull Genomes Project. Accessed on 11 May 2020.

9. Null D.J., VanRaden P.M., Rosen B.D., O’Connell J.R., Bickhart D.M. Using the ARS-UCD1.2 reference genome in U.S. evaluations. Interbull Bull. 2019;55:30-4.


Contributed by
Dr. Sandeep Ravindran
Freelance Science Writer