MamExp (MammoXpressionist): an interactive gene expression browser interface for collaborative comparative lactation genomics


 

Christophe Lefèvre

ITRI BioDeakin, Deakin University

 

Executive Summary:

Objectives:

The main objectives were to 1) provide better
integration of the gene expression database to facilitate access
through the IMGC web portal, 2) add new lactation gene expression
data from the public domain, and 3) implement more comprehensive
mapping of orthology relationships to improve the automatic retrieval
of expression data across experiments and species.

 

Progress:

1) Access to the gene expression database through the
IMGC web portal has been implemented. This provides transparent
access for registered users of the IMGC web portal, who can
browse and query the database. Database management is still
provided by the database management team at Deakin. This is
preferable because data management requires detailed technical
knowledge of database management tools and processes. However,
knowledgeable users may request administrative access, which can
be arranged when necessary.

 

2) New lactation-related data have been added to the
gene expression database. These include data from experiments
conducted on human, mouse and bovine species. Where possible, the
publicly available annotation for these data was complemented
with UniGene identifiers to allow the mapping of genes across
species and the retrieval of gene expression across experiments and
species. Further annotation and integration of publicly available
data, including goat data and other experiments, are still ongoing
in our development version of the database. Once new data have been
tested, they will be added and made visible in the database of the
IMGC portal. We anticipate that all lactation-related data available
in the public domain will be made available before the IMGC symposium
in October.

 

3) Orthologous gene mapping is used to retrieve
orthologous genes across species. At the moment, genes from one
species are mapped onto another using the BLAST algorithm to
identify sequence similarities between genomes. Originally this was
conducted independently of the database, and the best-similarity
results were stored in an auxiliary database. This process is now
conducted within the database system, which automates the annotation
and stores the similarity results in the database. One advantage of
this new approach is that it allows the automatic retrieval of gene
expression for gene families rather than only a simple one-to-one
mapping of genes. This has now been implemented in our development
version and is being tested and debugged. When the new code is
validated, the option will be made available on the IMGC database.
We have chosen the UniGene database as the main reference for the
gene mapping process because it is available for most model species
(human, mouse, cow) and microarrays are often designed from these
references. However, it is sometimes not available, for example for
the goat. In such cases, one advantage of the system is that it
allows available sequence information to be mapped onto the UniGene
database of other species, and we are evaluating this process to link
the goat data to other species. Another advantage of the approach is
that it is not absolutely dependent on the UniGene database. In the
future we plan to deploy further mappings using different genome
references such as the Ensembl genome annotation data. The use of
precompiled, curated orthology maps from other providers, such as
the HomoloGene database, is still under consideration. Due to
technical and human resource limitations, we have not yet been able
to identify the proper resources and processes to use, and we are
still investigating this aspect.
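
The difference between one-to-one mapping and gene-family retrieval
described in point 3 can be illustrated with a minimal Python sketch.
It is not the code used in the database system. The gene identifiers,
similarity scores and score threshold are all hypothetical, and the
sketch assumes that similarity results are available as
(query gene, subject gene, score) records.

```python
from collections import defaultdict

# Hypothetical similarity results: (query_gene, subject_gene, score).
# Identifiers and scores are invented for illustration only.
HITS = [
    ("Hs.A", "Mm.A1", 950.0),
    ("Hs.A", "Mm.A2", 720.0),
    ("Hs.A", "Mm.X", 80.0),
    ("Hs.B", "Mm.B1", 610.0),
]


def best_hit_map(hits):
    """One-to-one mapping: keep only the top-scoring subject per query."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def family_map(hits, min_score):
    """One-to-many mapping: keep every subject scoring at least min_score.

    The threshold is an assumption of this sketch, not a value
    taken from the report.
    """
    families = defaultdict(list)
    # Sort by descending score so each family lists its best match first.
    for query, subject, score in sorted(hits, key=lambda h: -h[2]):
        if score >= min_score:
            families[query].append(subject)
    return dict(families)


one_to_one = best_hit_map(HITS)
families = family_map(HITS, min_score=500.0)
print(one_to_one)
print(families)
```

With the best-hit map, a query for Hs.A would retrieve expression data
for a single mouse gene only, whereas the family map would also
retrieve the second high-scoring match while still excluding the weak
hit.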

 

2. The significance and industry benefit from this work

 

The significance of this work for the research community
and the industry lies in the unique resource that the gene expression
database will provide, allowing researchers to access and query a
large and rapidly expanding compendium of gene expression data from
experiments related to lactation in a more integrated and transparent
way. This will facilitate a more critical review of data and will
allow the development of complex queries over multiple experiments or
the comparison of gene expression in different species.

The interactive comparative genomic approach that the
system provides will enable a better understanding of gene regulation
during lactation in mammalian species and allow further exploration
and integration of the biological pathways involved in the lactation
process and the differences between species. This will allow a better
understanding of lactation biology, including milk evolution and
function, as well as specific issues for animal management, milk
production and milk processing. The system will thereby capitalize on
gene expression experiments conducted internationally to provide
valuable information to the dairy industry.