Age of the deciphering of the Homo sapiens genome

An article added by Donis F. on 11/27/2007




“Plastics.” When a family friend whispered this word to Dustin Hoffman’s character in the 1967 film The Graduate, he was advocating not just a novel career choice but an entirely different way of life. If that movie were made today, in the age of the deciphering of the Homo sapiens genome, the magic word might well be “bioinformatics.”

Corporate and government-led scientists have already compiled the three gigabytes of paired A’s, C’s, T’s and G’s that spell out the Homo sapiens genetic code, a quantity of information that could fill more than 2,000 standard computer diskettes. But that is just the initial trickle of the flood of information to be tapped from the Homo sapiens genome. Researchers are generating gigantic databases containing the details of when and in which tissues of the body various genes are turned on, the shapes of the proteins the genes encode, how the proteins interact with one another and the role those interactions play in disease. Add to the mix the data pouring in about the genomes of so-called model organisms such as fruit flies and mice, and you have what Gene Myers, Jr., vice president of informatics research at Celera Genomics in Rockville, Md., calls “a tsunami of information.” The new discipline of bioinformatics, a marriage between computer science and biology, seeks to make sense of it all. In so doing, it is destined to change the face of biomedicine.
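The diskette comparison can be checked with a few lines of arithmetic. The sketch below assumes one byte per letter (which is what "three gigabytes" implies for roughly three billion letters) and a "1.44 MB" diskette holding 1,474,560 bytes; the diskette capacity and the function name are assumptions for illustration, not figures from the article.

```python
# Back-of-the-envelope check of the diskette comparison.
# Assumptions: about three billion letters, and a "1.44 MB" 3.5-inch
# diskette that holds 1,474,560 bytes.
BASES = 3 * 10**9
DISKETTE_BYTES = 1_474_560


def diskettes_needed(n_bases, bits_per_base=8):
    """Whole diskettes needed to store n_bases letters (rounded up)."""
    total_bytes = -(-n_bases * bits_per_base // 8)   # ceiling division
    return -(-total_bytes // DISKETTE_BYTES)         # ceiling division


# One byte per letter, as in a plain text file.
print(diskettes_needed(BASES))
# Two bits per letter is enough to tell four letters apart.
print(diskettes_needed(BASES, bits_per_base=2))
```

At one byte per letter the count lands just above 2,000, in line with the figure quoted above. Packing each letter into two bits would cut the count to roughly a quarter.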

“For the next two to three years, the amount of information will be phenomenal, and everyone will be overwhelmed by it,” Myers predicts. “The race and competition will be who can mine it best. There will be such a wealth of riches.”

A whole host of companies are vying for their share of the gold. Jason Reed of the investment banking firm Oscar Gruss & Son in New York City estimates that bioinformatics could be a $2-billion business by 2005. He has compiled information on more than 50 private and publicly traded companies that offer bioinformatics products and services. These companies plug into the effort at various points: collecting and storing data, searching databases, and interpreting the data. Most sell access to their information to pharmaceutical and biotechnology companies for a hefty subscription price that can run into the millions of dollars.

The reason drug companies are so willing to line up and pay for such services (or to develop their own expensive resources in-house) is that bioinformatics offers the prospect of finding better drug targets earlier in the drug development process. This efficiency could trim the number of potential therapeutics moving through a company’s clinical testing pipeline, significantly decreasing overall costs. It could also create extra profits for drug companies by whittling the time it takes to research and develop a drug, thus lengthening the time a drug is on the market before its patent expires.

“Assume I’m a pharmaceutical company and somebody can get [my] drug to the market one year sooner,” explains Stelios Papadopoulos, managing director of health care at the New York investment banking firm SG Cowen. “It could mean you could grab maybe $500 million in sales you would not have recovered.”

Before any financial windfalls can occur, however, bioinformatics companies must contend with the current plethora of genomic data while constantly refining their technology, research approaches and business models. They must also focus on the real challenge and opportunity: finding out how all the shards of information relate to one another and making sense of the big picture.

“Methods have evolved to the point that you can generate lots of information,” comments Michael R. Fannon, vice president and chief information officer of Human Genome Sciences, also in Rockville. “But we don’t know how important that information is.”

Divining that importance is the job of bioinformatics. The field got its start in the early 1980s with a database called GenBank, which was originated by the U.S. Department of Energy to hold the short stretches of DNA sequence that scientists were just beginning to obtain from a range of organisms. In the early days of GenBank a roomful of technicians sat at keyboards consisting of only the four letters A, C, T and G, tediously entering the DNA-sequence information published in academic journals. As the years went on, new protocols enabled researchers to dial up GenBank and dump in their sequence data directly, and the administration of GenBank was transferred to the National Institutes of Health’s National Center for Biotechnology Information (NCBI). After the advent of the World Wide Web, researchers could access the data in GenBank for free from around the globe.

Once the Human Genome Project (HGP) officially got off the ground in 1990, the volume of DNA-sequence data in GenBank began to grow exponentially. With the introduction in the 1990s of high-throughput sequencing (an approach using robotics, automated DNA-sequencing machines and computers), additions to GenBank skyrocketed. GenBank held the sequence data on more than seven billion units of DNA by July 2000.

Around the time the HGP was taking off, private companies started parallel sequencing projects and established huge proprietary databases of their own. Today companies such as Incyte Genomics in Palo Alto, Calif., can determine the sequence of approximately 20 million DNA base pairs in just one day. And Celera Genomics says that it has 50 terabytes of data storage. That’s equivalent to roughly 80,000 compact discs, which in their plastic cases would take up almost half a mile of shelf space.
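The compact-disc comparison is easy to sanity-check. The sketch below assumes decimal terabytes, 650 MB per disc and a plastic case about 10.4 mm thick; none of these figures comes from the article, and the function names are illustrative.

```python
# Rough check of the storage comparison above.
# Assumptions: decimal terabytes, 650 MB per compact disc,
# and a plastic case about 10.4 mm thick.
TERABYTE = 10**12
CD_BYTES = 650 * 10**6
CASE_THICKNESS_M = 0.0104
METERS_PER_MILE = 1609.344


def cds_needed(total_bytes, cd_bytes=CD_BYTES):
    """Whole discs needed to hold total_bytes (rounded up)."""
    return -(-total_bytes // cd_bytes)  # ceiling division


def shelf_miles(n_cases, thickness_m=CASE_THICKNESS_M):
    """Shelf length, in miles, taken up by n_cases cases side by side."""
    return n_cases * thickness_m / METERS_PER_MILE


discs = cds_needed(50 * TERABYTE)
print(discs, round(shelf_miles(discs), 2))
```

Under these assumptions the count comes out a little under 80,000 discs and the shelf a little under half a mile, consistent with the rounded figures in the text.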

But GenBank and its corporate cousins are only part of the bioinformatics picture. Other public and private databases contain information on gene expression (when and where genes are turned on), tiny genetic differences among individuals called single-nucleotide polymorphisms (SNPs), the structures of various proteins, and maps of how proteins interact.

Mixing and Matching

One of the most basic operations in bioinformatics involves searching for similarities, or homologies, between a newly sequenced piece of DNA and previously sequenced DNA segments from various organisms. Finding near-matches allows researchers to predict the type of protein the new sequence encodes. This not only yields leads for drug targets early in drug development but also weeds out many targets that would have turned out to be dead ends.
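As a toy illustration of the idea, the sketch below ranks database entries by how many short subsequences (k-mers) they share with a new sequence. This is a deliberately simplified stand-in and not how production search tools work; the sequences, entry names and function names are all made up for the example.

```python
def kmers(seq, k=4):
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=4):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def best_matches(query, database, k=4, top=3):
    """Rank database entries by k-mer similarity to the query, best first."""
    scored = [(similarity(query, seq, k), name) for name, seq in database.items()]
    scored.sort(reverse=True)
    return [(name, round(score, 3)) for score, name in scored[:top]]


# Hypothetical entries; real databases hold far longer sequences.
database = {
    "protease_like": "ATGGCTTGGAAACTGCTGCTGGTTCTG",
    "kinase_like": "ATGCCGGAATTCGGCAAGTACCGTGAA",
    "unrelated": "TTTTTTTTTTTTTTTTTTTTTTTTTTT",
}
query = "ATGGCTTGGAAACTGCTGTTGGTTCTG"  # differs from protease_like by one letter

print(best_matches(query, database))
```

The near-match floats to the top of the list, which is the hint a researcher would use to guess what kind of protein the new sequence encodes.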

A popular set of software programs for comparing DNA sequences is BLAST (for Basic Local Alignment Search Tool), which first emerged in 1990. BLAST is part of a suite of DNA- and protein-sequence search tools accessible in various customized versions from many database providers or directly through NCBI. NCBI also offers Entrez, a so-called metasearch tool that covers most of NCBI’s databases, including those housing three-dimensional protein structures, the complete genomes of organisms such as yeast, and references to scientific journals that back up the database entries.
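To give a feel for what "local alignment" means, here is a minimal sketch of the classic Smith-Waterman dynamic program, which finds the best-scoring matching stretch between two sequences. It is not BLAST: BLAST is a much faster heuristic that approximates this kind of search. The scoring values (match, mismatch, gap) and the function name are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b.

    Smith-Waterman with a linear gap penalty; the scoring values are
    illustrative assumptions, not the defaults of any real tool.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two sequences that share only the stretch "ACGT".
print(local_alignment_score("TTTACGTTTT", "GGACGTGG"))
```

Because scores are clamped at zero, the unrelated flanking letters do not drag down the score of the shared stretch, which is what makes the alignment "local" rather than end to end.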

An early example of the utility of bioinformatics is cathepsin K, an enzyme that might turn out to be an important target for treating osteoporosis, a crippling disease caused by the breakdown of bone. In 1993 researchers at SmithKline Beecham, based in Philadelphia, asked scientists at Human Genome Sciences to help them analyze some genetic material they had isolated from the osteoclast cells of people with bone tumors. (Osteoclasts are cells that break down bone in the normal course of bone replenishment; they are thought to be overactive in individuals with osteoporosis.)

Human Genome Sciences scientists sequenced the sample and conducted database homology searches to look for matches that would give them a clue to the proteins that the sample’s gene sequences encoded. Once they found near-matches for the sequences, they carried out further analyses and discovered that one sequence in particular was overexpressed by the osteoclast cells and that it matched those of a previously identified class of molecules: cathepsins.

For SmithKline Beecham, that exercise in bioinformatics yielded in just weeks a promising drug target that standard laboratory experiments could not have found without years and a pinch of luck. Company researchers are now trying to find a potential drug that blocks the cathepsin K target. Searches for compounds that bind to and have the desired effect on drug targets still take place mainly in a biochemist’s traditional “wet” lab, where evaluations for activity, toxicity and absorption can take years. But with new bioinformatics tools and growing amounts of data on protein structures and biomolecular pathways, some researchers say, this aspect of drug development will also shift to computers, in what they term “in silico” biology.

