- Genomic technologies generate very large volumes of data.
- Putting all these data together is a complex problem that requires mathematics, statistics, computer science, and informatics.
- Collating biological data and extracting the relevant information is referred to as systems biology.
- Using systems biology to analyze animal traits may enhance dairy cow production.
We live in an information-rich world. Each of us is capable of downloading gigabytes of data on our mobile or desktop devices each day. The same upswing in data generation has reached dairy science, which has moved into the big data realm. My students can create more data in an afternoon than I created in an entire PhD project when I was a student. Needless to say, capturing and analyzing these data is both challenging and rewarding. Since genomic data became more accessible, a number of approaches have been developed to bring the data together in useful ways (see, e.g., [1,2]). Gradually these approaches have become more sophisticated and insightful. A recent study by Widmann et al. [3] provides a great example of how integrating different sources of large-scale genomic data can shed light on how dairy cows convert their feed into milk.
Animal feed represents a significant cost on the farm, so making sure that the feed is appropriate for the animal's needs, and that the cow uses the feed efficiently, is a major concern to farmers. Widmann et al. [3] measured feed intake and its impact on animal production traits in an experimental cattle herd. They set out to determine how the cow's utilization of that feed was related to its genetic background. To do this, they developed an approach that used a range of related measurements, such as how much the cows ate above what they required for basic physiological functions, or how much energy was ingested. Using these measurements, they developed an integrated measure of all the things that might affect efficient use of the consumed feed.
The genetic type (genotype) of each cow was also measured with a test that covered the entire genome of each animal. The test generated approximately 130 million data points, alongside nearly 50,000 data points for daily weight gain and feed intake and another 50,000 data points for measurements of metabolites (the by-products of chemical reactions that occur in cells). Each measurement was analyzed against every genotype data point, generating a total of 13,035,000,000,000 (about 1.3×10^13) results. How can such a huge number of analyses be managed and interpreted? The scientists used a series of mathematical and statistical methods to condense the data and extract meaningful information. Generically, this approach to the analysis of biological processes is referred to as systems biology.
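To see where a number of that size comes from, the rounded figures quoted above can simply be multiplied together. This is a minimal sketch that assumes the total is the plain product of genotype data points and measurement data points; the variable names are illustrative only. Because the inputs are rounded, the result lands at about 1.3×10^13 rather than reproducing the exact total quoted.

```python
# Rounded counts quoted in the article (approximate, so the product is too).
genotype_points = 130_000_000   # genotype data points across the herd
growth_feed_points = 50_000     # daily weight gain and feed intake records
metabolite_points = 50_000      # metabolite measurements

# Every measurement is analyzed against every genotype data point.
phenotype_points = growth_feed_points + metabolite_points
total_tests = genotype_points * phenotype_points

print(f"{total_tests:,} analyses (about {total_tests:.1e})")
```

Even at a million analyses per second, working through 1.3×10^13 results one by one would take roughly five months, which is why the data have to be condensed before anyone can interpret them.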
They first analyzed two genes that had previously been implicated in metabolism and growth. Variants of these genes, known as NCAPG and GDF8, showed a significant effect on feed conversion in the tested animals. The scientists then assembled the very large data set for a multi-stage analysis. To do this, they first selected the genetic markers that showed evidence of influencing traits related to feed efficiency. From there, they built a table showing how functionally connected the genes close to those genetic markers were. This information was incorporated into the next stage of analysis, which gave the scientists a list of genes organized in a hierarchy or network indicating the relative importance of the surrounding genetic regions. These networks reflected what was happening within the physiological systems that the animals used to convert feed into body weight (or, by extrapolation, into milk). The method simplified the complex data set, yet the interpretation of the results still captured the complexity of the system.
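The general idea of going from a table of gene connections to a ranked network can be illustrated with a toy example. The sketch below is not the method used in the study: the gene names and effect scores are made up, and a simple correlation with an arbitrary cut-off stands in for the measure of functional connection. It only shows how a table of pairwise similarities can be turned into a network in which well-connected genes stand out.

```python
from itertools import combinations
from math import sqrt

# Made-up effect scores (arbitrary units): one row per gene near an associated
# marker, one column per trait. These are NOT values from the study.
effects = {
    "GENE_A": [1, 1, -1, -1],
    "GENE_B": [7, 1, -1, -7],
    "GENE_C": [1, 7, -7, -1],
    "GENE_D": [1, -1, -1, 1],
}

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

THRESHOLD = 0.7  # arbitrary cut-off for calling two genes "connected"

# Step 1: keep the gene pairs that behave similarly across the traits.
edges = [
    (g1, g2)
    for g1, g2 in combinations(effects, 2)
    if abs(pearson(effects[g1], effects[g2])) >= THRESHOLD
]

# Step 2: count connections per gene; the best-connected genes are
# candidate "hubs" of the network.
degree = {gene: 0 for gene in effects}
for g1, g2 in edges:
    degree[g1] += 1
    degree[g2] += 1

ranking = sorted(degree, key=degree.get, reverse=True)
print("connections:", edges)
print("most connected gene:", ranking[0])
```

In this toy data, GENE_A is linked to both GENE_B and GENE_C and so comes out as the hub, while GENE_D has no connections at all. A real analysis would work with thousands of genes and a statistically justified measure of connection rather than a hand-picked threshold.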
Using this approach, they discovered a highly interconnected network of genes that had a significant impact on the efficient use of feed by the animals in the herd. A close look at the network revealed two main drivers, or controllers, of many key genes: TP53 and TGFβ. This is an interesting finding because we know that TP53 is a gene that regulates many processes inside cells, and because of this powerful role, mutations in it can contribute to cancer. TGFβ is a more surprising finding; it is more enigmatic and has a range of functions that seem to depend on where and when it is activated. Considering both the individual genes and the complete networks, the findings are intriguing and will no doubt stimulate follow-up studies.
Ultimately, developing selective breeding programs that incorporate information based on complex systems will provide dairy cattle breeders and farmers with cows that reach their full potential for the efficient production of milk. This will provide these farmers with improvements that contribute to highly profitable and sustainable dairy farms.
1. Lemay DG, Neville MC, Rudolph MC, Pollard KS, German JB (2007) Gene regulatory networks in lactation: identification of global principles using bioinformatics. BMC Syst Biol 1: 56.
2. Wei J, Ramanathan P, Martin IC, Moran C, Taylor RM, Williamson P (2013) Identification of gene sets and pathways associated with lactation performance in mice. Physiol Genomics 45: 171-181.
3. Widmann P, Reverter A, Weikard R, Suhre K, Hammon HM, Albrecht E, Kuehn C (2015) Systems Biology Analysis Merging Phenotype, Metabolomic and Genomic Data Identifies Non-SMC Condensin I Complex, Subunit G (NCAPG) and Cellular Maintenance Processes as Major Contributors to Genetic Variability in Bovine Feed Efficiency. PLoS One 10: e0124574.
Professor Peter Williamson
Associate Professor, Physiology and Genomics
University of Sydney, Australia