Physical mapping of genomes

A further increase in mapping resolution is accomplished by manipulating cloned DNA fragments directly. Because DNA is the physical material of the genome, the procedures are generally called physical mapping. One goal of physical mapping is to identify a set of overlapping cloned fragments that together encompass an entire chromosome or an entire genome. The resulting physical map is useful in three ways. First, the genetic markers carried on the clones can be ordered and hence contribute to the overall genome mapping process. Second, when the contiguous clones have been obtained, they represent an ordered library of DNA sequences that can be exploited for future genetic analysis, for example, to correlate mutant phenotypes with disruptions of specific molecular regions. Third, these clones form the raw material that will be sequenced in large-scale genome projects.

In the preparation of physical maps of genomes, vectors that can carry very large inserts are naturally the most useful. Cosmids, YACs (yeast artificial chromosomes), BACs (bacterial artificial chromosomes), and PACs (phage P1-based artificial chromosomes) have been the main types. Cosmids and YACs were introduced in Chapters 12 and 13. BACs (Figure 14-12) are based on the 7-kb F plasmid of E. coli. Recall that F can carry large fragments of E. coli DNA as F′ derivatives (Chapter 7). In a similar manner, when used as cloning vectors, BACs can carry inserts of foreign DNA as large as 300 kb, although the average is about 100 kb. PACs are produced by a similar type of engineering of phage P1; they carry inserts comparable to those of BACs.

Although the maximum insert sizes of BACs and PACs are not as large as those of YACs, the former types have several advantages over YACs. First, they can be amplified in bacteria and isolated and manipulated simply with basic bacterial plasmid technology. Second, BACs and PACs form fewer hybrid inserts than YACs do. Hybrid inserts are composed of several different fragments; their presence can thwart attempts to order the clones.

However, despite these useful vectors, the task of genomic cloning is a daunting one. Even so-called small genomes still contain huge amounts of DNA. Consider, for example, the 100-Mb genome of the tiny nematode Caenorhabditis elegans; because an average cosmid insert is about 40 kb, at least 2500 cosmids would be required to embrace this genome, and many more would have to be isolated and characterized to arrive at such a complete set. YACs can contain on the order of 1 Mb, so here the task is somewhat simpler.
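The arithmetic behind these clone counts can be sketched as follows. The first function reproduces the minimum quoted in the text (inserts laid end to end with no overlap). The second uses the standard Clarke-Carbon library-size estimate, which is not part of the text and is brought in here only to illustrate why "many more" random clones are needed; the 99 percent target is an assumed value.

```python
import math

def minimum_clones(genome_bp, insert_bp):
    """Clones needed if inserts tiled the genome end to end with no overlap."""
    return math.ceil(genome_bp / insert_bp)

def clones_for_coverage(genome_bp, insert_bp, probability):
    """Clarke-Carbon estimate (an assumption, not from the text):
    N = ln(1 - P) / ln(1 - f), where f is the insert size as a
    fraction of the genome and P is the chance that any given
    sequence is present in a library of N random clones."""
    fraction = insert_bp / genome_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))

GENOME_BP = 100_000_000   # C. elegans, about 100 Mb (from the text)
COSMID_BP = 40_000        # average cosmid insert (from the text)
YAC_BP = 1_000_000        # order of magnitude for a YAC insert (from the text)

print(minimum_clones(GENOME_BP, COSMID_BP))             # 2500
print(minimum_clones(GENOME_BP, YAC_BP))                # 100
print(clones_for_coverage(GENOME_BP, COSMID_BP, 0.99))  # several times the minimum
```

Under these assumptions the random library has to be several times larger than the minimal tiling set, which is the practical reason why larger inserts simplify the task.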

Cloning a whole genome begins by amassing a large number of randomly cloned inserts. The contents of these clones must be characterized in some way, and overlaps must be determined. A set of overlapping clones is called a contig. In the early phases of a genome project, contigs are numerous and represent cloned “islands” of the genome. But, as more and more clones are characterized, contigs enlarge and merge into one another, and eventually the project should end up with a set of contigs that equals the number of chromosomes.
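The way contigs enlarge and merge as overlaps are discovered can be illustrated with a small sketch. It assumes that overlaps have already been detected by some method and are supplied as pairs of clone names; the clone names and the union-find bookkeeping are illustrative choices, not a description of any particular genome project's software.

```python
def count_contigs(clones, overlaps):
    """Count contigs, treating each group of clones connected by
    overlaps as one contig (union-find over clone names)."""
    parent = {clone: clone for clone in clones}

    def find(clone):
        # Follow parent links to the group representative,
        # shortening the path along the way.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    for a, b in overlaps:
        parent[find(a)] = find(b)  # merge the two groups

    return len({find(clone) for clone in clones})

clones = ["A", "B", "C", "D", "E"]          # hypothetical clone names
early = [("A", "B")]                         # few overlaps known: many islands
later = early + [("B", "C"), ("D", "E")]     # islands grow
final = later + [("C", "D")]                 # islands merge into one contig

print(count_contigs(clones, early))   # 4
print(count_contigs(clones, later))   # 2
print(count_contigs(clones, final))   # 1
```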




Figure 14-12. Structure of a bacterial artificial chromosome (BAC), used for cloning large fragments of donor DNA. CMR is a selectable marker for chloramphenicol resistance. oriS, repE, parA, and parB are F genes for replication and regulation of copy number. cosN is the cos site from λ phage. HindIII and BamHI are cloning sites at which foreign DNA is inserted. The two promoters are for transcribing the inserted fragment. The NotI sites are used for cutting out the inserted fragment.

Ordering by FISH.

If good chromosomal landmarks are known, fluorescence in situ hybridization (FISH) analysis can be used to locate the approximate positions of the large inserts. Figure 14-14 shows results of a FISH analysis that generates a rough ordering of BAC and PAC clones in human chromosomes. FISH is thus a method for placing BAC and PAC clones in order.

Ordering by clone fingerprints.

The genomic insert carried by a vector has its own unique sequence, which can be used to generate a DNA fingerprint. For example, a multiple restriction-enzyme digestion can generate a set of bands whose number and positions are a unique “fingerprint” of that clone. The different bands generated by separate clones can be aligned either visually or by using a computer program to determine if there is any overlap between the inserted DNAs. In this way, the contig can be built up.
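A computer comparison of two fingerprints might look something like the following sketch. It assumes that each fingerprint is a list of restriction-fragment sizes in base pairs and that two bands count as the same if their sizes agree within a small relative tolerance; the 2 percent tolerance and the band sizes are invented for illustration.

```python
def shared_bands(bands_a, bands_b, tolerance=0.02):
    """Count bands of clone A that can be paired one-to-one with a band
    of clone B whose size agrees within the given relative tolerance."""
    remaining = sorted(bands_b)
    shared = 0
    for size in sorted(bands_a):
        for index, other in enumerate(remaining):
            if abs(size - other) <= tolerance * max(size, other):
                shared += 1
                del remaining[index]  # each band may be matched only once
                break
    return shared

# Hypothetical fragment sizes (bp) from digests of three clones.
clone_a = [1200, 2500, 4100, 7300]
clone_b = [2510, 4100, 7290, 9800]
clone_c = [500, 3300, 6000]

print(shared_bands(clone_a, clone_b))  # 3 shared bands: likely overlap
print(shared_bands(clone_a, clone_c))  # 0 shared bands: no evidence of overlap
```

In practice a few bands can match by chance, so a threshold on the number or proportion of shared bands would be needed before declaring an overlap.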

Ordering by sequence-tagged sites.

Unique short sequences of large cloned inserts can be used as tags to align the various clones into contigs. For example, if clone A has tags 1 and 2 and clone B has tags 2 and 3, clones A and B must overlap in the region of tag 2. The practical procedure is to amass a large set of random clones with small genomic inserts (say, in λ phage) and sequence short regions of each. From these sequences, pairs of PCR primers are designed that will amplify the short specific sequence of DNA flanked by the primers. These short DNA sequences are known as sequence-tagged sites (STSs). Even though initially the location of these STSs in the genome is not known, a panel of many STSs can be used to characterize clones with large genomic inserts (such as YAC clones). The clones that are shown to have specific STSs in common must have overlapping inserts and therefore can be aligned into contigs. An example of this process is shown in Figure 14-15 .
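The logic of STS-content mapping can be sketched in a few lines. The example assumes that PCR screening has already produced, for each large-insert clone, the set of STSs it contains; clones A and B follow the example in the text, whereas clones C and D and their tags are added for illustration.

```python
from itertools import combinations

def sts_overlaps(sts_content):
    """Return {(clone1, clone2): shared STSs} for every pair of clones
    that has at least one STS in common."""
    overlaps = {}
    for a, b in combinations(sorted(sts_content), 2):
        shared = sts_content[a] & sts_content[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps

sts_content = {
    "A": {1, 2},   # from the text
    "B": {2, 3},   # from the text
    "C": {3, 4},   # hypothetical
    "D": {7},      # hypothetical clone sharing no STS with the others
}

for pair, shared in sts_overlaps(sts_content).items():
    print(pair, sorted(shared))
# ('A', 'B') [2]
# ('B', 'C') [3]
```

Clones A, B, and C therefore form one contig in the order A, B, C, whereas clone D remains an isolated island until an STS linking it to another clone is found.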

Short stretches of sequence are sometimes obtained from cDNA clones. These stretches are known as expressed sequence tags (ESTs). ESTs are obtained by sequencing into the cDNA insert by using a primer based on the vector sequence. They can be used to align the cDNAs on the contig, thus anchoring the gene map to the physical map. Further, if part of the open reading frame (ORF) of the transcript is contained within the EST, the “virtual” translation of the ORF can provide a “sneak preview” of the function of the protein encoded by the mRNA from which the cDNA was derived.

The combination of these physical methods has resulted in the cloning of whole genomes of several organisms. For example, the C. elegans genome is now available as sets of cosmid or YAC contigs. Furthermore, the DNA of the contigs has been arranged on nitrocellulose filters in ordered arrays; so, to find out where a specific piece of DNA of interest lies in the genome, that DNA is used as a probe on the contig filters, and a positive hybridization signal announces the precise location of the DNA (Figure 14-16).


Figure 14-15. Using sequence-tagged sites (STSs) to order overlapping clones (YACs, in this example) into a contig. Five different YACs are tested to determine which STSs they contain (top), and these data are used to assemble a physical map (bottom).


An example: cloning and mapping the human Y chromosome.

Several of the smaller human chromosomes have been fully cloned as overlapping sets of YAC clones (contigs). We shall examine the cloning of the Y chromosome as an example because it illustrates several of the techniques of physical mapping. The STS map of the Y chromosome was in fact obtained by two different methods: YAC alignment and deletion analysis.

YAC alignment.  Flow sorting yielded a sample of Y chromosomes, from which λ clones were made. From clones that did not contain repetitive DNA, STS primers were designed. In all, 160 primer pairs were made. A Y chromosome YAC library of 10,368 clones was obtained in which the average insert size was 650 kb. From these numbers, each point on the Y chromosome was estimated to have been sampled an average of four times. The YAC clones were divided into 18 pools of 576 YACs, and the pools were screened with the STS primers. Subdivision of positive pools led rapidly to the assignment of a particular STS to specific YACs. The total STS content of each YAC was assessed, and overlaps between the YACs were determined in the same way as that shown in the generalized example in Figure 14-15 .
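The saving achieved by pooling can be illustrated with a small simulation. The library size and pool size are those given in the text (18 pools of 576 clones). The way positive pools are subdivided is not specified in the text; repeated halving is assumed here, and the clone numbers carrying the STS are invented.

```python
def find_positive_clones(clone_ids, has_sts, pool_size=576):
    """Screen pools by PCR and subdivide positive pools by halving
    (an assumed scheme). Returns (positive clones, number of PCR tests)."""
    tests = 0
    positives = []
    # Initial pools of fixed size.
    stack = [clone_ids[i:i + pool_size]
             for i in range(0, len(clone_ids), pool_size)]
    while stack:
        pool = stack.pop()
        tests += 1  # one PCR on the pooled DNA
        if not any(has_sts(clone) for clone in pool):
            continue  # negative pool: every clone in it is excluded at once
        if len(pool) == 1:
            positives.append(pool[0])
            continue
        half = len(pool) // 2
        stack.append(pool[:half])
        stack.append(pool[half:])
    return sorted(positives), tests

library = list(range(10_368))           # 18 pools x 576 clones (from the text)
carriers = {17, 5000, 9999, 10_367}     # hypothetical YACs carrying one STS

found, tests = find_positive_clones(library, lambda clone: clone in carriers)
print(found)   # [17, 5000, 9999, 10367]
print(tests)   # far fewer PCR tests than the 10,368 needed to test every clone
```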

Deletion analysis.  Various types of Y chromosome deletions occur naturally. For example, some XX males contain truncated fragments of the Y, whereas some XY females have deletions of the region containing the maleness (testis-determining) gene (see Chapters 2 and 23 ). These Y deletions were maintained in cell culture and formed the basis for aligning the Y chromosome STSs. Each deletion was tested for STS content. Because by nature the deletions were nested sets, the STS content could be used not only to develop an STS map, but also to map the coverage of the deletions. The principle is illustrated in Figure 14-17 . The STS maps produced by YAC alignment and by deletion analysis were identical.
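The principle of ordering STSs with nested deletions can be sketched as follows. The sketch assumes an idealized case in which every deleted chromosome retains a block of STSs starting from the same end, so that an STS retained by more chromosomes lies closer to that end; the deletion names and STS names are hypothetical, and STSs retained by the same chromosomes cannot be ordered relative to one another by this method.

```python
def order_sts_by_nested_deletions(retained):
    """Order STSs from the retained end outward, given a mapping of
    chromosome name -> set of STSs that chromosome still carries."""
    sets = sorted(retained.values(), key=len)
    # Verify the nesting assumption: each smaller set lies inside the next.
    for small, large in zip(sets, sets[1:]):
        if not small <= large:
            raise ValueError("deletions are not nested")
    all_sts = set().union(*sets)
    counts = {sts: sum(sts in kept for kept in sets) for sts in all_sts}
    # STSs present in more chromosomes come first; ties are broken by name.
    return sorted(all_sts, key=lambda sts: (-counts[sts], sts))

retained = {
    "del1": {"sts1"},
    "del2": {"sts1", "sts2", "sts3"},
    "del3": {"sts1", "sts2"},
    "intact": {"sts1", "sts2", "sts3", "sts4"},
}

print(order_sts_by_nested_deletions(retained))
# ['sts1', 'sts2', 'sts3', 'sts4']
```

The same table also maps the deletions themselves: each breakpoint lies between the last STS retained and the first STS lost.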

Genome sequencing

Several different strategies have been successfully applied to genome projects. Their advantages and disadvantages depend on the size and complexity of the genome. Of particular importance is the frequency of repetitive DNA in the genome.

Random clone sequencing.

The first genome of a free-living organism to be sequenced was that of the bacterium Haemophilus influenzae. Genomic DNA was mechanically sheared and used to obtain a large number of random clones that were presumed to overlap each other in numerous ways. Primers based on adjacent vector DNA were used to sequence short regions at the ends of the cloned Haemophilus inserts. Then these short sequences were used (much like sequence-tagged sites) to align the genomic clones. Because so many random short sequences were obtained, together they encompassed most of the Haemophilus genome. Gaps were filled in by “primer walking”; that is, by using the end of a cloned sequence as a primer to sequence into adjacent uncloned fragments.
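The idea of aligning random short sequences by their overlaps can be shown with a toy greedy assembler. This is a deliberately simplified sketch, not the method used in the Haemophilus project: it assumes error-free reads, all from the same strand, with no read contained inside another and no repeats. The reads are invented.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b
    (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return reads  # nothing left to merge: these are the contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]

reads = ["GTACGTTAGC", "TAGCCTAGGA", "ATGGCGTACG"]  # hypothetical end sequences
print(greedy_assemble(reads))
# ['ATGGCGTACGTTAGCCTAGGA']
```

If two groups of reads share no overlap, the function returns more than one sequence; such gaps correspond to the regions that had to be closed by primer walking.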

Sequencing ordered clones.

Most genomic sequencing programs start with a set of ordered clones. We have seen that an ordered set of YAC clones was developed for the human Y chromosome and other human chromosomes. However, YAC clones are not suitable for sequencing directly. Instead, YACs are subcloned into overlapping BACs or PACs. The BACs or PACs are again aligned into contigs by using STSs or the alignment of clone fingerprints. The BAC or PAC clones are in turn subcloned into smaller inserts for sequencing. At this level, multiple overlapping clones are sequenced randomly (without establishing clone alignment) so that any BAC or PAC clone is sequenced as many as five times in all.

Sequencing unordered clones.

One current strategy is to sequence the two ends of cloned genomic fragments from sequencing primers at the ends of the vector. If the sequenced stretches and the cloned fragments are sufficiently long, these sequences can be compiled to create long contiguous stretches of sequence that can extend over repetitive DNAs contained within the genome (see Chapters 3 and 20 for a discussion of transposable elements and other repetitive DNAs). The advantage of such a strategy is that the time- and labor-intensive process of clone mapping is avoided. This strategy is currently being tested for the Drosophila and human genomes.

Isolating human disease genes by positional cloning.

We shall follow the methods used to identify the genomic sequence of the cystic fibrosis (CF) gene as an example. No primary biochemical defect was known at the time that the gene was isolated, so it was very much a gene in search of a function. Linkage to molecular markers had localized the gene to the long arm of chromosome 7, between bands 7q22 and 7q31.1. The CF gene was thought to be inside this region, flanked by the gene met (a proto-oncogene; see Chapter 22) at one end and a molecular marker, D7S8, at the other end. But between these markers lay 1.5 centimorgans (map units) of DNA, a vast uncharted terrain of 1.5 million bases. Additional markers within the region were obtained by using new probes derived from a chromosome 7 library made by flow sorting.

However, the two key techniques that were used to traverse the huge genetic distances were chromosome walking ( Chapter 13 ) and a related technique called chromosome jumping. The latter technique provides a way of jumping across potentially unclonable areas of DNA and generates widely spaced landmarks along the sequence that can be used as initiation points for multiple bidirectional chromosomal walks.

Chromosome jumping is illustrated in Figure 14-19 . In this procedure, large fragments are created by partial restriction cleavage of the DNA in the region believed to contain the gene of interest. Each DNA fragment is then circularized, thus bringing the beginning and end of the fragment together. This junction is cut out and cloned into a phage vector, which together with the other junction segments make up a jumping library. A probe from the beginning of the stretch of DNA under investigation can be used to screen the jumping library to find the clone that contains the beginning sequence. When this clone is found, the other end of the junction sequence is excised and used to screen the library again to make a second jump. From each jump position, chromosome walks can be made in both directions to search for genelike sequences.

A restriction map of the overall region was obtained with rare-cutting restriction enzymes, and the restriction sites were used to position and orient the sequences obtained from jumping and walking. When enough sequencing had been done to cover representative parts of the overall region, the hunt for any genes along this stretch began. Genes were sought by several techniques. First, human genes were known to be generally preceded at the 5′ end by clusters of cytosines and guanines, called CpG islands, and several of these clusters were found. Second, it was reasoned that a gene would show homology to the DNA of other animals, because of evolutionary conservation, so candidate sequences were used to probe what were called zoo blots of genomic DNA from a range of animals. Third, genes should have appropriate start and stop signals. Fourth, genes should be transcribed, and transcripts should be found.
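The first of these criteria, the search for CpG islands, lends itself to a simple computational check. The sketch below scores a stretch of sequence by its G + C content and by the ratio of observed to expected CpG dinucleotides. The thresholds (G + C above 50 percent, ratio above 0.6) are commonly used values assumed here for illustration; they are not taken from the text, and both test sequences are artificial.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a sequence.
    Expected CpG count is (number of C x number of G) / length."""
    seq = seq.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / length if length else 0.0
    if c_count and g_count:
        obs_exp = cpg_count * length / (c_count * g_count)
    else:
        obs_exp = 0.0
    return gc_fraction, obs_exp

def looks_like_cpg_island(seq, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed thresholds to one window of sequence."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_obs_exp

island_like = "CG" * 50                  # artificial CpG-rich window
at_rich = "ATTTACAGTTAACAGGTT" * 5       # artificial A+T-rich window

print(looks_like_cpg_island(island_like))  # True
print(looks_like_cpg_island(at_rich))      # False
```

In a real scan the test would be applied to windows sliding along the sequence, usually with a minimum window length as a further criterion.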

Ultimately, a strong candidate gene was found spanning 250 kb of the region. Some CF symptoms are expressed in sweat glands; so, from cultured sweat gland cells, cDNA was prepared, and a 6500-nucleotide cDNA homologous to the candidate gene was detected. When this cDNA was sequenced from normal individuals and from CF patients, the cDNA of the patients showed a deletion of three base pairs, eliminating a phenylalanine from the protein. Therefore it was very likely that this was the CF coding sequence. Thus the CF gene had been found. From its cDNA nucleotide sequence, an amino acid sequence was inferred. In turn, from this inferred sequence, the three-dimensional structure of the protein was predicted. This protein is structurally similar to ion-transport proteins in other systems, suggesting that a transport defect is the primary cause of CF. When used to transform mutant cell lines from CF patients, the wild-type gene restored normal function; this phenotypic “rescue” was the final confirmation that the isolated sequence was in fact the CF gene.
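Why a deletion of exactly three base pairs removes one amino acid and leaves the rest of the protein unchanged can be seen in a short sketch. The sequence below is a made-up toy cDNA chosen for illustration, not the actual CF gene sequence, and the codon table lists only the codons that the toy sequence uses.

```python
# Partial codon table covering only the codons in the toy sequence.
CODONS = {
    "ATG": "M", "AAA": "K", "ATC": "I",
    "TTT": "F", "GGT": "G", "GTT": "V",
}

def translate(cdna):
    """Translate a coding sequence codon by codon, ignoring any
    incomplete codon at the end."""
    usable = len(cdna) - len(cdna) % 3
    return "".join(CODONS[cdna[i:i + 3]] for i in range(0, usable, 3))

def delete_bases(cdna, start, length=3):
    """Remove `length` bases beginning at 0-based position `start`."""
    return cdna[:start] + cdna[start + length:]

normal = "ATGAAAATCTTTGGTGTT"        # toy sequence: Met Lys Ile Phe Gly Val
patient = delete_bases(normal, 9)    # removes the TTT (phenylalanine) codon

print(translate(normal))   # MKIFGV
print(translate(patient))  # MKIGV
```

Because the deletion is a multiple of three bases, the reading frame downstream is preserved; the patient protein differs from the normal one only by the missing phenylalanine.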

Figure 14-19. Manipulating cloned genomic fragments for chromosome jumping, a modified type of chromosome walking that can bypass regions difficult to clone, such as those containing repetitive DNA (see text).


Summary

Genetics focuses on the nature of genes, and a major goal is to characterize their structure and function. Recombinant DNA technology has allowed individual genes to be isolated in a test tube and then characterized at the molecular level. The technology is based on restriction enzymes, which cut DNA into defined fragments. Restriction target sites can be mapped and act as DNA landmarks. Restriction fragments often have sticky ends, enabling them to be inserted into a vector capable of replicating in a bacterial cell. Such molecular hybrids are known as recombinant DNA. Bacteria amplify a single recombinant DNA molecule to form a DNA clone. Common vectors are plasmids, phages, and cosmids. An entire genome can be cloned in a set of clones known as a library. A specific clone can be found in a library by using a probe that specifically binds to the DNA or to the protein of the desired clone. Specific clones can also be isolated by their ability to transform null mutants. Tagging also is useful for cloning a gene: transforming DNA or a transposon is used to cause a mutation by insertion, and the DNA adjacent to the tag is isolated. Chromosome walking provides a way of isolating a gene by sequential isolation of overlapping clones, starting from a marker linked to the desired gene. Cloned DNA can be sequenced by several methods, including the arrest of DNA chain growth by dideoxynucleotides. The polymerase chain reaction uses primers to amplify DNA sequences. It is a way of rapidly isolating DNA whose sequence is already partly known and of detecting small amounts of one specific type of DNA. Gel electrophoresis separates variously sized DNA or RNA molecules from a mixture. Probes can detect specific DNA or RNA molecules on the gel, in procedures known as Southern and Northern analyses, respectively.




