We have developed a computerized scanning system that permits quantitation of the expression of each of many thousands of cloned sequences in colonic biopsy specimens. We have used this system with biopsy samples of flat mucosa, premalignant adenoma, and carcinoma from well-defined populations that vary in their risk (both genetic and dietary) for, and pattern of development of, colon cancer. We have thus begun to develop a large computerized database on the expression of each of 4000 sequences from an HT-29 human colon carcinoma cDNA library, and of a number of the proto-oncogenes, in human tissue during the progression to colon cancer. From these data, we have identified a number of sequences that may characterize various stages in the progression of the disease, and molecular analysis of some of these, including nucleotide sequencing, has begun. We propose to use biopsy material from our extensive and well-defined patient population, together with this method of analysis, to further define the changes in gene expression that characterize premalignant alterations, including an expansion of the proliferative compartment that we have documented in the flat mucosa of high-risk individuals. In addition, the heterogeneity of gene expression in tumors will be studied to identify markers that distinguish early from late adenomas, and others that may be indicative of invasive or metastatic potential. Using methodology related to that which we have reported, the type, number, and location of cells within the colonic crypt or tumor that express these sequences will be analyzed by in situ hybridization of labeled probes to frozen, fixed sections of biopsy tissue taken at various stages from patients at differing risk for development of colon cancer. Finally, the molecular analysis of sequences identified in these experiments will be continued.