How to Find a Partial cDNA Sequence

An article added by Donis F. on 11/27/2007



Researchers find partial cDNA sequences by chemically breaking down copies of a cDNA molecule to create an array of fragments that differ in length by one base. In this process, the base at one end of each fragment is attached to one of four fluorescent dyes, the color of the dye depending on the identity of the base in that position. Machines then sort the labeled fragments according to size. Finally, a laser excites the dye labels one by one. The result is a sequence of colors that can be read electronically and that corresponds to the order of the bases at one end of the cDNA being analyzed. Partial sequences hundreds of bases in length can be pieced together in a computer to produce complete gene sequences.
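The read-out step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dye colors in `DYE_TO_BASE` and the function name `read_sequence` are hypothetical, and a real instrument works from noisy fluorescence signals rather than clean (length, color) pairs.

```python
# Hypothetical dye-to-base mapping; actual sequencing machines use their own dye sets.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def read_sequence(fragments):
    """Turn labeled fragments into a base sequence.

    fragments: iterable of (length, dye_color) pairs, one per fragment.
    Fragments differ in length by one base, so sorting by length puts
    the end-labels in the order the bases occur.
    """
    ordered = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(DYE_TO_BASE[color] for _, color in ordered)


# Fragments arrive unsorted; the "machine" sorts them by size first.
fragments = [(3, "yellow"), (1, "green"), (4, "green"), (2, "red")]
print(read_sequence(fragments))
```

Sorting by fragment length stands in for the size-sorting step, and the dictionary lookup stands in for reading each dye color with the laser.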

Once we have made a cDNA, we can copy it to produce as much as we want. That means we will have enough material for determining the order of its bases. Because we know the rules that cells use to turn DNA sequences into the sequences of amino acids that constitute proteins, the ordering of bases tells us the amino acid sequence of the corresponding protein fragment. That sequence, in turn, can be compared with the sequences in proteins whose structures are known. This maneuver often tells us something about the function of the complete protein, because proteins containing similar sequences of amino acids often perform similar tasks.
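The rules for turning a base sequence into an amino acid sequence are the standard genetic code, which can be written as a small lookup table. The sketch below is illustrative rather than a description of any particular software: the function name `translate` is hypothetical, and it assumes the sequence is given on the coding strand and starts in the correct reading frame, which a real partial sequence may not.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
# One-letter amino acid codes; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA sequence codon by codon until a stop codon or the end."""
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTGGTAA"))  # Met-Ala-Trp, then a stop codon
```

The resulting string of one-letter codes is the kind of sequence that can then be compared with proteins whose structures are known.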

Analyzing cDNA sequences used to be extremely time-consuming, but in recent years biomedical instruments have been developed that can perform the task reliably and automatically. Another development was also necessary to make our strategy feasible. Sequencing equipment, when operated on the scale we were contemplating, produces gargantuan amounts of data. Happily, computer systems capable of handling the resulting megabytes are now available, and we and others have written software that helps us make sense of this wealth of genetic detail.

Assembling the Puzzle

Our technique for identifying the genes used by a cell is to analyze a sequence of 300 to 500 bases at one end of each cDNA molecule. These partial cDNA sequences act as markers for genes and are sometimes referred to as expressed sequence tags. We have chosen this length for our partial cDNA sequences because it is short enough to analyze fairly quickly but still long enough to identify a gene unambiguously. If a cDNA molecule is like a chapter from a book, a partial sequence is like the first page of the chapter: it can identify the book and even give us an idea of what the book is about. Partial cDNA sequences, likewise, can tell us something about the gene they derive from. At HGS, we produce about a million bases of raw sequence data every day.

Our method is proving successful: we have identified thousands of genes, many of which may play a part in illness. Other companies and academic researchers have also initiated programs to generate partial cDNA sequences.

HGS’s computers recognize many of the partial sequences we produce as deriving either from one of the genes researchers have already identified by other means or from a gene we have previously found ourselves. When we cannot definitely assign a newly generated partial sequence to a known gene, things get more interesting. Our computers then scan through our databases as well as public databases to see whether the new partial sequence overlaps something someone has logged before.

When we find a clear overlap, we piece together the overlapping partial sequences into ever lengthening segments called contigs. Contigs correspond, then, to incomplete sequences we infer to be present somewhere in a parent gene. This process is somewhat analogous to fishing out the phrases “a midnight dreary, while I pondered” and “while I pondered, weak and weary/Over many a. . . volume” and combining them into a fragment recognizable as part of Edgar Allan Poe’s “The Raven.”
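The piecing-together step can be illustrated with a minimal Python sketch. It is not the assembly software itself: the function name `merge_overlap` and the `min_overlap` parameter are hypothetical, and the sketch requires the overlap to match exactly, whereas real assembly programs must tolerate sequencing errors and consider both DNA strands.

```python
def merge_overlap(left, right, min_overlap=3):
    """Join two fragments if the end of `left` matches the start of `right`.

    Tries the longest possible overlap first and returns the merged
    fragment, or None when no overlap of at least `min_overlap` exists.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


# The same idea applied to the two phrases from the poem.
first = "a midnight dreary, while I pondered"
second = "while I pondered, weak and weary"
print(merge_overlap(first, second))

# And to two short, made-up DNA fragments.
print(merge_overlap("ATGGCGTAC", "GTACCTGA"))
```

Repeating the merge with each newly overlapping fragment is what lengthens a contig over time.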

At the same time, we attempt to deduce the likely function of the protein corresponding to the partial sequence. Once we have predicted the protein’s structure, we classify it according to its similarity to the structures of known proteins. Sometimes we find a match with another human protein, but often we notice a match with one from a bacterium, fungus, plant or insect: other organisms produce many proteins similar in function to those of humans. Our computers continually update these provisional classifications.

In 1994, for example, we predicted that genes containing four specific contigs would each produce proteins similar to those known to correct mutations in the DNA of bacteria and yeast. Because researchers had learned that failure to repair mutations can cause colon cancer, we started to work out the full sequences of the four genes. When a prominent colon cancer researcher later approached us for help in identifying genes that might cause that illness (he already knew about one such gene), we were able to tell him that we were already working with three additional genes that might be involved.

Subsequent research has confirmed that mutations in any one of the four genes can cause life-threatening colon, ovarian or endometrial cancer. As many as one in every 200 people in North America and Europe carries a mutation in one of these mismatch repair genes, as they are called. Knowing this, scientists can develop tests to assess the mismatch repair genes in people who have relatives with these cancers. If the people who are tested display a genetic predisposition to illness, they can be monitored closely. Prompt detection of tumors can lead to lifesaving surgery, and such tests have already been used in clinical research to identify people at risk.

Our database now contains more than a million cDNA-derived partial gene sequences, sorted into 170,000 contigs. Overall, more than half of the new genes we identify have a resemblance to known genes that have been assigned a probable function. As time goes by, this proportion is likely to increase.

If a tissue gives rise to an unusually large number of cDNA sequences that derive from the same gene, it provides an indication that the gene in question is producing copious amounts of mRNA. That generally happens when the cells are producing large amounts of the corresponding protein, suggesting that the protein may be doing a particularly vital job. HGS also pays particular attention to genes that are expressed only in a narrow range of tissues, because such genes are most likely to be useful for intervening in diseases affecting those tissues. Of the many genes we have discovered, we have identified a percentage that seem especially likely to be medically important.
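The reasoning here amounts to counting how many partial sequences from a tissue map to each gene. A minimal Python sketch follows; the gene labels and the function name `tag_frequencies` are hypothetical placeholders, not entries from any real database.

```python
from collections import Counter


def tag_frequencies(tag_assignments):
    """Return the fraction of partial sequences assigned to each gene.

    tag_assignments: one gene label per partial cDNA sequence read
    from a single tissue library.
    """
    counts = Counter(tag_assignments)
    total = sum(counts.values())
    return {gene: count / total for gene, count in counts.items()}


# Hypothetical assignments for eight partial sequences from one tissue.
tags = ["geneA", "geneB", "geneA", "geneC", "geneA", "geneB", "geneA", "geneA"]
for gene, fraction in sorted(tag_frequencies(tags).items()):
    print(gene, fraction)
```

A gene that accounts for an unusually large share of a library's partial sequences is a candidate for being highly expressed in that tissue.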

New Genes, New Medicines

Using the partial cDNA sequence technique for gene discovery, researchers have for the first time been able to assess how many genes are devoted to each of the main cellular functions, such as defense, metabolism and so on. The vast store of unique information from partial cDNA sequences offers new possibilities for medical science. These opportunities are now being systematically explored.

Databases such as ours have already proved their value for finding proteins that are useful as signposts of disease. Prostate cancer is one example. A widely used test for detecting prostate cancer measures levels in the blood of a protein called prostate-specific antigen. Patients who have prostate cancer often exhibit unusually high levels. Unfortunately, slow-growing, relatively benign tumors as well as malignant tumors requiring aggressive therapy can cause elevated levels of the antigen, and so the test is ambiguous.

HGS and its partners have analyzed mRNAs from multiple samples of healthy prostate tissue as well as from benign and malignant prostate tumors. We found about 300 genes that are expressed in the prostate but in no other tissue; of these, about 100 are active only in prostate tumors, and about 20 are expressed only in tumors rated by pathologists as malignant. We and our commercial partners are using these 20 genes and their protein products to devise tests to identify malignant prostate disease. We have similar work under way for breast, lung, liver and brain cancers.
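The successive narrowing described above (prostate only, then tumors only, then malignant tumors only) is essentially a series of set differences. The Python sketch below uses invented gene identifiers and sample names purely for illustration; it says nothing about the actual data or tools involved.

```python
# Hypothetical expression data: which genes were detected in which sample type.
expression = {
    "healthy_prostate": {"g1", "g2", "g3"},
    "benign_tumor": {"g1", "g2", "g4", "g5"},
    "malignant_tumor": {"g1", "g4", "g6", "g7"},
    "liver": {"g1", "g7", "g8"},
}
PROSTATE_SAMPLES = ["healthy_prostate", "benign_tumor", "malignant_tumor"]


def genes_in(samples):
    """Union of the genes detected in any of the named samples."""
    return set().union(*(expression[sample] for sample in samples))


prostate_genes = genes_in(PROSTATE_SAMPLES)
other_genes = genes_in(s for s in expression if s not in PROSTATE_SAMPLES)

# Expressed in prostate samples but in no other tissue.
prostate_specific = prostate_genes - other_genes
# Of those, not seen in healthy prostate tissue (tumors only).
tumor_only = prostate_specific - expression["healthy_prostate"]
# Of those, not seen in benign tumors (malignant tumors only).
malignant_only = tumor_only - expression["benign_tumor"]

print(sorted(prostate_specific), sorted(tumor_only), sorted(malignant_only))
```

Each step keeps only the genes that survive a stricter filter, which is why the counts shrink at every stage.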

Databases of partial cDNA sequences can also help find genes responsible for rare diseases. Researchers have long known, for example, that a certain form of blindness in children is the result of an inherited defect in the chemical breakdown of the sugar galactose. A search of our database revealed two previously unknown human genes whose corresponding proteins were predicted to be structurally similar to known galactose-metabolizing enzymes in yeast and bacteria. Investigators quickly confirmed that inherited defects in either of these two genes cause this type of blindness. In the future, the enzymes or the genes themselves might be used to prevent the affliction.

Partial cDNA sequences are also establishing an impressive record for helping researchers to find smaller molecules that are candidates to be new treatments. Methods for creating and testing small-molecule drugs, the most common type, have improved dramatically in the past few years. Automated equipment can rapidly screen natural and synthetic compounds for their ability to affect a human protein involved in disease, but the limited number of known protein targets has delayed progress. As more human proteins are investigated, progress should accelerate. Our work is now providing more than half of SmithKline Beecham’s leads for potential products.

Databases such as ours do more than make it easier to screen molecules randomly for useful activity. Knowing a protein’s structure also enables scientists to custom-design drugs to interact in a specific way with the protein. This technique, known as rational drug design, was used to create some of the new protease inhibitors that are proving effective against HIV (although our database was not involved in this particular effort). We are confident that partial cDNA sequences will allow pharmacologists to make more use of rational drug design.

One example of how our database has already proved useful concerns cells known as osteoclasts, which are normally present in bone; these cells produce an enzyme capable of degrading bone tissue. The enzyme appears to be produced in excess in some disease states, such as osteoarthritis and osteoporosis. We found in our computers a sequence for a gene expressed in osteoclasts that appeared to code for the destructive enzyme; its sequence was similar to that of a gene known to give rise to an enzyme that degrades cartilage. We confirmed that the osteoclast gene was responsible for the degradative enzyme and also showed that it is not expressed in other tissues. Those discoveries meant we could invent ways to thwart the gene’s protein without worrying that the methods would harm other tissues. We then made the protein, and SmithKline Beecham has used it to identify possible therapies by a combination of high-throughput screening and rational drug design. The company has also used our database to screen for molecules that might be used to treat atherosclerosis.

One extremely rich lode of genes and proteins, from a medical point of view, is a class known as G-protein coupled receptors. These proteins span the cell’s outer membrane and convey biological signals from other cells into the cell’s interior. It is likely that drugs able to inhibit such vital receptors could be used to treat diseases as diverse as hypertension, ulcers, migraine, asthma, the common cold and psychiatric disorders. HGS has found more than 70 new G-protein coupled receptors. We are now testing their effects by introducing receptor genes we have discovered into cells and evaluating how the cells that make the encoded proteins respond to various stimuli. Two genes that are of special interest produce proteins that seem to be critically involved in hypertension and in adult-onset diabetes. Our partners in the pharmaceutical industry are searching for small molecules that should inhibit the biological signals transmitted by these receptors.

Last but not least, our research supports our belief that some of the human genes and proteins we are now discovering will, perhaps in modified form, themselves constitute new therapies. Many human proteins are already used as drugs; insulin and clotting factor for hemophiliacs are well-known examples. Proteins that stimulate the production of blood cells are also used to speed patients’ recovery from chemotherapy.

The proteins of some 200 of the full-length gene sequences HGS has uncovered have possible applications as medicines. We have made most of these proteins and have instituted tests of their activity on cells. Some of them are also proving promising in tests using experimental animals. The proteins include several chemokines, molecules that stimulate immune system cells.

Developing pharmaceuticals will never be a quick process, because medicines, whether proteins, genes or small molecules, have to be extensively tested. Nevertheless, partial cDNA sequences can speed the discovery of candidate therapies. HGS allows academic researchers access to much of its database, although we ask for an agreement to share royalties from any ensuing products.

The systematic use of automated and computerized methods of gene discovery has yielded, for the first time, a comprehensive picture of where different genes are expressed: the anatomy of human gene expression. In addition, we are starting to learn about the changes in gene expression in disease. It is too early to know exactly when physicians will first successfully use this knowledge to treat disease. Our analyses predict, however, that a number of the resulting therapies will form mainstays of 21st-century medicine.

