Integration of bioinformatics resources: Critical need today
Thursday, July 31, 2003 08:00 IST
Our Bureau, Mumbai
As the biotechnology industry worldwide grows in size and pace, the resource pool that supplies its vast body of data has become critical. At the same time, as the industry struggles to deal with ever greater volumes of complex and often interrelated data, the need for integration in the management of bioinformatics data is more acute than ever. To maximize the usefulness and reusability of data sources, a code of conduct for data providers has been formulated, and its principles are gaining widespread support globally.
The basic elements now appear to be in place for truly integrated bioinformatics. From a technological point of view, web services over the World Wide Web seem to offer the infrastructure needed to support the integration and accessibility of the data researchers need. Using this as the basic architecture, serious attempts are under way in many countries to build reliable and stable networks.
In India, the Biotechnology Information System Network (BTISNET), recently initiated by the Department of Biotechnology (DBT) and linking 11 prestigious R&D institutions in the country, is one such attempt. This virtual private network is intended to enhance bioinformatics activities and to support nationwide efforts to develop human resources in bioinformatics in a big way.
The DBT effort should be of much help to R&D-driven biotechnology companies pursuing collaborative research with the major distributed information centres linked by the network. According to sources involved in the project, the institutions sharing bioinformatics-related information over this intranet will have access to over 100 databases, allowing more than 10,000 users across the country to access the data. The network can also help distribute study material needed for bioinformatics.
Bioinformatics, which has a key role to play in genomics, proteomics and sequencing-related activities, deals with the whole gamut of biological data. It covers the development of data analysis tools and the modelling of biological macromolecules and other complexes, and has applications in metabolic pathways as well as in the design of new drug molecules, peptide vaccines, proteins and so on.
In this context, a recent proposal by world experts in bioinformatics to establish a Code of Conduct for biological data providers would make the integration of bioinformatics resources easier. The code provides a solid framework within which both providers and consumers of data can exchange data and develop beneficial relationships within the bioinformatics community.
The six tenets of this code of conduct can be summarized as follows: reuse existing code and make use of open-source resources; use existing data formats to avoid reinventing the wheel; design simple, sensible new data formats and avoid proprietary binary data types; treat the interfaces between an application and the data source as formal agreements between the data provider and the data consumer; give data consumers choice when designing interfaces; and support ad hoc queries.
Several technologies are available at present to advance the goals of this code of conduct. Database federations and data warehouses have traditionally been used to integrate disparate data sources, yet the advent of web services offers possibilities above and beyond what these traditional methods are capable of. A database federation can have a global (federation) schema that provides users with a uniform view of all databases in the federation, insulating them from the component databases. For example, if a user runs a query through a federation comprising 10 databases, the results are received as if from a single database, in a common format, rather than as 10 different sets of results.
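The federation idea described above can be sketched in a few lines. This is a minimal, hypothetical illustration using in-memory SQLite databases: the table and column names are invented for the example, and each source registers a query that maps its local layout onto the shared view, so callers see one uniform result set.

```python
import sqlite3

def make_source(ddl, insert_sql, rows):
    """Build an in-memory database standing in for one federation member."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn

# Source A stores gene symbols and species under its own column names.
source_a = make_source(
    "CREATE TABLE genes (symbol TEXT, species TEXT)",
    "INSERT INTO genes VALUES (?, ?)",
    [("BRCA1", "human"), ("TP53", "human")],
)

# Source B uses a different table and column layout for the same concept.
source_b = make_source(
    "CREATE TABLE loci (gene_name TEXT, organism TEXT)",
    "INSERT INTO loci VALUES (?, ?)",
    [("Shh", "mouse")],
)

# The federation schema: each member supplies a query that maps its local
# columns onto the common (name, organism) view.
FEDERATION = [
    (source_a, "SELECT symbol, species FROM genes"),
    (source_b, "SELECT gene_name, organism FROM loci"),
]

def federated_query():
    """Run the mapped query against every member and merge the results,
    so the caller sees one result set rather than one per database."""
    results = []
    for conn, sql in FEDERATION:
        for name, organism in conn.execute(sql):
            results.append({"name": name, "organism": organism})
    return results
```

The key design point is that the mapping queries, not the client, absorb the differences between component schemas, which is what insulates users from the underlying databases.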
A data warehouse represents the materialization of a global schema, in that it is loaded periodically with data from the component databases, organizing these disparate sources into a single warehouse with or without a common schema. Some of the more established examples include the Genomics Unified Schema (GUS) - a data warehouse that attempts to predict protein function based on protein domains - and EnsEMBL, a collaborative project of the European Molecular Biology Laboratory (EMBL, Heidelberg, Germany), the European Bioinformatics Institute (EBI, Cambridge, United Kingdom) and the Sanger Centre (Cambridge, United Kingdom) that automatically tracks sequenced fragments of the human genome and assembles them into longer stretches.
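The contrast with a federation is that the warehouse copies data on a schedule instead of querying sources live. The sketch below, again with invented table and column names on in-memory SQLite databases, shows a periodic load that replaces the previous snapshot with fresh rows transformed into one common schema.

```python
import sqlite3
from datetime import datetime, timezone

# Two stand-in component databases, each with its own local layout.
src1 = sqlite3.connect(":memory:")
src1.execute("CREATE TABLE contigs (acc TEXT, bp INTEGER)")
src1.executemany("INSERT INTO contigs VALUES (?, ?)",
                 [("AC000001", 15000), ("AC000002", 9800)])

src2 = sqlite3.connect(":memory:")
src2.execute("CREATE TABLE fragments (id TEXT, seq_len INTEGER)")
src2.execute("INSERT INTO fragments VALUES ('AF000003', 4200)")

# The warehouse itself: one materialized table with a common schema.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sequences (accession TEXT, length INTEGER, loaded_at TEXT)"
)

# Each extractor maps a source's local columns onto the common schema.
EXTRACTORS = [
    (src1, "SELECT acc, bp FROM contigs"),
    (src2, "SELECT id, seq_len FROM fragments"),
]

def load_warehouse():
    """Periodic load: drop the stale snapshot, then pull fresh rows
    from every component database into the common table."""
    stamp = datetime.now(timezone.utc).isoformat()
    warehouse.execute("DELETE FROM sequences")
    for conn, sql in EXTRACTORS:
        for accession, length in conn.execute(sql):
            warehouse.execute(
                "INSERT INTO sequences VALUES (?, ?, ?)",
                (accession, length, stamp),
            )
    warehouse.commit()

load_warehouse()
```

Queries against the warehouse are fast and uniform, at the cost of the data being only as fresh as the last load - the trade-off that distinguishes this approach from a live federation.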
Web services are intended to enable the exchange of data among heterogeneous systems in human-readable, platform-neutral XML messages. The web services architecture represents an attempt to allow remote access to data and application logic in a loosely coupled fashion. Previous attempts at achieving this, such as Distributed COM (DCOM) and Java Remote Method Invocation (RMI), required tight integration between client and server and used platform- and implementation-specific binary data formats.
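To make the contrast concrete, here is a minimal sketch of the kind of human-readable XML payload such a service might exchange. The element names and the sample accession are illustrative, not taken from any real service; the point is that a consumer on any platform needs only an XML parser, not a shared binary protocol, to read the message back.

```python
import xml.etree.ElementTree as ET

def build_response(accession, organism, sequence):
    """Serialize a sequence record as a platform-neutral XML message."""
    root = ET.Element("sequenceResponse")
    ET.SubElement(root, "accession").text = accession
    ET.SubElement(root, "organism").text = organism
    ET.SubElement(root, "sequence").text = sequence
    return ET.tostring(root, encoding="unicode")

def parse_response(xml_text):
    """Recover the record from the XML text on the consumer side."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

message = build_response("U49845", "Saccharomyces cerevisiae", "ATGACCATGA")
```

Unlike a DCOM or RMI payload, the message is plain text that either side can inspect, log, and validate independently of the other's implementation language.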
The subject area of bioinformatics can be defined as the application of techniques from computer science to solve problems in molecular biology. This exciting area is a relatively young field, and the pace of research is driven by the large and rapidly increasing amount of data being produced from, for example, efforts to sequence the genomes of a variety of organisms. The areas where computer science can be applied range from the assembly of sequence fragments and the analysis of DNA, RNA and protein sequences to the prediction and analysis of protein structure and function and the analysis and simulation of metabolic function and regulation. In the words of Dr. Hwa A. Lim (HAL), Chairperson and CEO of D'Trends, Inc, bioinformatics is certainly not mere number-crunching for molecular biologists, but the application of techniques from computer science such as modelling, simulation, data abstraction, data manipulation and pattern discovery in order to analyse biological data. The data generated by experimental scientists require annotation and detailed analysis in order to be turned into knowledge that can then be applied to improving health care via, for example, new drugs and gene therapy, medical practices and food production - all of which are now high-profile issues nationally.
- (Content Courtesy: D'Trends, Inc and PharmaGenomics)