Designer genes that ensure a desired characteristic: say, blue eyes, genius-level IQ, pearly white teeth or marathon-running capabilities, perhaps? How about therapeutics that are personalised to your genetic makeup in order to be more effective? Ever wondered about growing synthetic organs in the laboratory for transplants? Or the prospect of solving a complex murder case with just one skin cell the culprit happened to leave behind at the crime scene? Or simply, cows that give milk enriched with Vitamin D?
Today biology is moving at such a breakneck pace that none of the above scenarios seems absurd or even distant. In fact, some of them are already in practice.
Bioinformatics is the science of data management and computational analysis of biological data. Vast improvements over the past two decades in the technologies that support biological discovery have enabled the strides described above. However, these technologies produce data outputs that can range from a few gigabytes to over a terabyte. A bioinformaticist, or computational biologist, is someone who knows how to store and use this data optimally and how to analyse it computationally. This is a person who has a basic understanding of the biology they are querying, sound knowledge of database principles and the ability to write computer code to perform their analysis. Besides these, their skill set usually includes in-depth knowledge of the science behind their analysis techniques.
For example, a bioinformaticist who deals mainly with finding associations between genes and diseases should be well versed in statistical methodology, while one who studies the structure of various proteins will require a good understanding of the physics and chemistry behind such studies.
Collaborative biology
The Human Genome Project is a great example of the new collaborative biology that is becoming necessary today. The genetic material of most higher organisms is DNA, which is essentially a long string of nucleotides. DNA is organised into genes that are translated into proteins and “junk” DNA that may contain regulatory elements for the genes. The proteins thus formed are the building blocks of the organism. The human genome has between 3,098 and 3,194 million nucleotide pairs and some 23,000 genes that code for proteins. Sequencing a full genome necessitates breaking it into smaller bits that a sequencing machine is able to handle. Once a molecular biologist has done this and retrieved the sequences of millions of small chunks, a computer scientist can apply techniques from word matching and language processing to piece the chunks together in a meaningful way. A statistician can look at many genomes and provide a list of genes that may be of interest in the study of a disease. A physicist or chemist can computationally predict the structure and function of the proteins formed from the genome.
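To give a flavour of the word-matching step, here is a minimal sketch in Python that stitches short chunks together by repeatedly merging the pair with the longest overlap. The function names, the `min_len` parameter and the toy reads are all assumptions made for illustration. Real genome assemblers are far more elaborate, since they must also cope with sequencing errors, repeated regions and reads from both DNA strands.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for size in range(max_len, min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Toy assembler: keep merging the two reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_len:
                        best_len, best_i, best_j = size, i, j
        if best_len == 0:
            break  # nothing left that overlaps; return the pieces as they are
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three made-up chunks that overlap end to end.
chunks = ["ATGCGT", "CGTACC", "ACCGGA"]
print(greedy_assemble(chunks))
```

With these three toy chunks the pieces join into a single longer sequence. On real data a greedy strategy like this can easily be misled by repeats, which is one reason assembly remains an active area of research.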
Chasing the genes
Say that you are interested in finding a gene that causes disease X in humans. Your hypothesis is that people with disease X have one version of the causative gene while healthy individuals have another version. To prove this hypothesis, scientific process demands that you demonstrate the difference by observation in a sample of humans with and without the disease, and also prove through laboratory studies that converting the gene from the healthy to the diseased version does cause changes in the model system you use in lieu of human subjects. If you have no prior idea of what your gene of interest might be, you can imagine a scientist spending most of their fruitful years chasing after the 23,000 genes in the human genome! Complicating the matter is the fact that most diseases are complex and involve many genes working in tandem. The human body also has fail-safes built in, so that other genes may be able to compensate for one malfunctioning gene, making its discovery even harder. Recent developments enable us to get a complete picture of all the genes at once. Microarray technology captures how almost all the human genes behave in a healthy individual and in one with the disease of interest. We are then able to compare the profiles of these two individuals to find the gene(s) that act differently in disease and even see if and how this set of genes works as a network. Knowledge of the mechanism that causes a disease helps greatly in finding a suitable cure.
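The profile comparison described above can be sketched in a few lines of Python. The gene names, the expression values and the cut-off below are invented for illustration, and the log2 fold change is just one simple, commonly used way to express "acts differently". A real study would measure many individuals and apply proper statistical tests rather than compare two people.

```python
import math


def log2_fold_changes(healthy, disease):
    """Log2 ratio of disease to healthy expression for genes present in both profiles."""
    return {
        gene: math.log2(disease[gene] / healthy[gene])
        for gene in healthy
        if gene in disease
    }


def differential_genes(healthy, disease, threshold=1.0):
    """Genes whose expression differs by at least `threshold` on the log2 scale.

    A threshold of 1.0 means at least a two-fold change in either direction.
    Genes are returned with the largest change first.
    """
    changes = log2_fold_changes(healthy, disease)
    hits = [gene for gene, lfc in changes.items() if abs(lfc) >= threshold]
    return sorted(hits, key=lambda gene: -abs(changes[gene]))


# Hypothetical expression levels (arbitrary units) for three made-up genes.
healthy_profile = {"GENE_A": 100.0, "GENE_B": 200.0, "GENE_C": 50.0}
disease_profile = {"GENE_A": 400.0, "GENE_B": 210.0, "GENE_C": 12.5}

print(differential_genes(healthy_profile, disease_profile))
```

In this made-up example GENE_A is four times more active in disease and GENE_C four times less active, so both are flagged, while GENE_B barely changes and is left out.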
Another example: we are now able to compare genome samples from many different individuals to gain more insight into the differences that make each of us unique. Single points in the genome of an individual that might be different from the majority of the population are known as single nucleotide polymorphisms or SNPs (pronounced “snips”). While getting the whole genome sequence of an individual is possible today, it is still very expensive. For as little as 300 USD, genetic testing companies determine specific SNPs in an individual’s genome. This is enough information for them to determine the race, ethnicity, eye, skin and hair colour, propensity for weight gain, risk of various diseases and many other things about someone without any other information about this person. Those blue eyes in someone with the genetic code for brown eyes may not be far-fetched at all!
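As a rough illustration of the idea behind SNPs, the Python sketch below finds the positions where one person's sequence differs from the most common nucleotide in a small group. The sample names and sequences are invented, and the sequences are assumed to be already aligned and of equal length. Genetic testing companies use purpose-built chips and much stricter population-based definitions, so treat this only as a toy model.

```python
from collections import Counter


def majority_alleles(genomes):
    """Most common nucleotide at each position across all aligned sequences."""
    return [
        Counter(column).most_common(1)[0][0]
        for column in zip(*genomes.values())
    ]


def find_snps(genomes, person):
    """Positions (counted from 0) where `person` differs from the majority.

    Each result is a tuple: (position, majority nucleotide, person's nucleotide).
    """
    consensus = majority_alleles(genomes)
    return [
        (pos, major, allele)
        for pos, (major, allele) in enumerate(zip(consensus, genomes[person]))
        if allele != major
    ]


# Four hypothetical individuals with short, pre-aligned sequences.
samples = {
    "person_1": "ACGTAC",
    "person_2": "ACGTAC",
    "person_3": "ACCTAC",
    "person_4": "ACGTAT",
}

for name in samples:
    print(name, find_snps(samples, name))
```

Here person_3 and person_4 each carry one nucleotide that differs from the rest of the group, while the other two match the majority everywhere.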
Scope for research
My journey into bioinformatics started at Stella Maris College in Chennai with a B.Sc in Zoology. I found that genetics was an ideal platform for combining my twin loves of biology and mathematics. Bioinformatics was still a very new field then and I was fortunate to find a good master’s program at Stella Maris College. I proceeded to Iowa State University, USA for a Ph.D in bioinformatics and computational biology. As part of my doctoral studies, I researched the replication of HIV in the human body through statistical models. To hone my skills as a statistician, I simultaneously got a master’s in Statistics from ISU. I am currently a research associate at UCSF studying the genetics of asthma patients and their response to current and emergent therapies.
While research in itself is a great option for those so inclined, a bioinformaticist is able to find employment at various stages of education. Currently, many Indian institutions offer B.Tech programs in Bioinformatics. Those with such a degree or a master’s in bioinformatics with strong coding skills are sought after by companies specialising in building and selling bioinformatics tools and software. Those with good acumen for statistics can likely find jobs in biotechnology companies as in-house statisticians.
The writer is a San Francisco-based computational biologist specialising in statistical genetics.
Email: mishar@gmail.com