Structure and Analysis of Eukaryotic Genes

  • Split genes
  • Multigene families
  • Functional analysis of eukaryotic genes

Split genes and introns

  • The mRNA-coding portion of a gene can be split by DNA sequences that do not encode mature mRNA.
  • Exons code for mRNA; introns are segments of genes that do not encode mRNA.
  • Introns are found in most genes in eukaryotes.
  • They are also found in some bacteriophage genes and in some genes in archaea.

R-loops can reveal introns
mRNA coding regions (exons) separated (by introns) on the chromosome

Examples of R-loops in mammalian hemoglobin genes

Examples of R-loops

Types of exons

RT-PCR to detect RNA

Finding exons with computers

  • Ab initio computation
    – Uses an explicit, sophisticated model of gene structure, splice site properties, etc. to predict exons
  • Compare cDNA sequence with genomic sequence
    – BLAST2 alignments between cDNA and genomic sequences
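The cDNA-versus-genomic comparison can be illustrated with a minimal sketch. It is not BLAST2: it uses greedy exact string matching on made-up toy sequences, and the function name `find_exons`, the `min_len` parameter, and all sequences are assumptions for illustration only. Each stretch of the cDNA that matches the genomic sequence is reported as an exon, and the gaps between matches are the introns.

```python
def find_exons(genomic, cdna, min_len=4):
    """Return exon intervals (0-based start, end-exclusive) on the genomic sequence.

    Greedy exact matching: at each step, take the longest prefix of the
    remaining cDNA that occurs in the genomic sequence downstream of the
    previous exon. Real data would need an alignment tool that tolerates
    mismatches and resolves ambiguous splice junctions.
    """
    exons = []
    g_pos = 0  # search the genomic sequence from here onward
    c_pos = 0  # first cDNA base not yet assigned to an exon
    while c_pos < len(cdna):
        hit = None
        # Try the longest remaining prefix first, then shorten it.
        for length in range(len(cdna) - c_pos, min_len - 1, -1):
            start = genomic.find(cdna[c_pos:c_pos + length], g_pos)
            if start != -1:
                hit = (start, start + length)
                break
        if hit is None:
            raise ValueError("cDNA position %d cannot be placed on the gene" % c_pos)
        exons.append(hit)
        g_pos = hit[1]
        c_pos += hit[1] - hit[0]
    return exons


# Toy gene: three exons separated by two introns that begin with GT and end with AG.
exon1, exon2, exon3 = "ATGGTGCACCTG", "ACTCCTGAGGAG", "AAGTCTGCCTAA"
intron1 = "GTAAGTCCCCCCTTTTTTCAG"
intron2 = "GTGAGTAAAAAACCCCTTAG"
genomic = "CCCC" + exon1 + intron1 + exon2 + intron2 + exon3 + "GGGG"
cdna = exon1 + exon2 + exon3

exons = find_exons(genomic, cdna)
for number, (start, end) in enumerate(exons, start=1):
    print("exon", number, start, end, genomic[start:end])
```

Because the matching is greedy, an exon boundary can be placed a few bases too far downstream whenever the start of an intron happens to match the start of the next exon, which is one reason real comparisons rely on alignment programs and splice-site signals.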

Eukaryotic and prokaryotic gene structure 

Find exons for HBB

  • Sequence for the human beta-globin gene (HBB):
    – Accession number L48217
    – Thalassemia variant
  • Sequence for HBB mRNA:
    – NM_000518
  • Retrieve those from GenBank at NCBI
    – Get the files in FASTA format
  • Run Genscan and BLAST2 sequences

Genscan analysis of the HBB gene

BLAST2: HBB gene vs. cDNA

HBB gene vs. cDNA

Introns are removed by splicing RNA precursors

Alternative splicing can generate multiple polypeptides from a single gene

The mRNA for Protein A is made by splicing together exons 1, 2 and 3:

Alternative splicing can generate multiple polypeptides from a single gene, part 2

Or, by an alternative splicing pathway that skips exon 2, Protein B can be made:
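The two splicing pathways can be sketched in a few lines. The exon sequences, the helper names `splice` and `translate`, and the deliberately partial codon table are all hypothetical, chosen only so that the toy mRNAs stay in frame. The sketch shows how including or skipping exon 2 yields two different polypeptides from one gene.

```python
# Toy exons (RNA), keyed by exon number. Invented sequences, not a real gene.
EXONS = {1: "AUGGCU", 2: "GAAUUC", 3: "CCGUAA"}

# Partial codon table covering only the codons used in this toy example.
TOY_CODONS = {"AUG": "M", "GCU": "A", "GAA": "E", "UUC": "F", "CCG": "P", "UAA": "*"}


def splice(exons, keep):
    """Join the exons whose numbers are listed in `keep`, in the given order."""
    return "".join(exons[number] for number in keep)


def translate(mrna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = TOY_CODONS[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna_a = splice(EXONS, [1, 2, 3])  # exons 1, 2 and 3
mrna_b = splice(EXONS, [1, 3])     # exon 2 skipped

print("Protein A:", translate(mrna_a))
print("Protein B:", translate(mrna_b))
```

In this toy case exon 2 is a multiple of three bases long, so skipping it removes two amino acids without shifting the reading frame. An exon of another length would change every downstream codon.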

Multigene families, e.g. encoding hemoglobin

Blot-hybridization analysis showing multiple beta-like globin genes in mammals

  • A: clones, gel
  • B: clones, blot hybridization
  • C: genomic DNA, blot hybridization

Functional analysis of isolated genes

Gene Expression: where and how much?

  • A gene is expressed when a functional product is made from it.
  • One wants to know many things about how a gene is expressed, e.g.
    – In which tissues?
    – At what developmental stages?
    – In response to which environmental conditions?
    – At which stages of the cell cycle?
    – How much product is made?

RNA blot-hybridizations = Northern

RNA blot-hybridization: Stage specificity

RT-PCR to detect RNA

In situ hybridization and immunoreactions

Sequence everything, find function later

  • Determine the sequence of hundreds of thousands of cDNA clones from libraries constructed from many different tissues and stages of development of organisms of interest.
  • Initially, the sequences are partial and are referred to as expressed sequence tags (ESTs).
  • Use these cDNAs in high-throughput screening and testing, e.g. expression microarrays (next presentation).

Massively parallel screening of high-density chip arrays

  • Once the sequence of an entire genome has been determined, a diagnostic sequence can be generated for all the genes.
  • Synthesize this diagnostic sequence (a tag) for each gene on a high-density array on a chip, e.g. 6000 to 20,000 gene tags per chip.
  • Hybridize the chip with labeled cDNA from each of the cellular states being examined.
  • Measure the level of hybridization signal from each gene under each state.
  • Identify the genes whose expression level differs between states. The genes are already available.
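The last two steps, measuring the signal per gene and picking out the genes that differ between states, can be sketched as a log2 ratio with a cutoff. The gene names, signal values, pseudocount, and twofold threshold below are all assumptions for illustration. Real array data would also need normalization and replicate measurements, which this sketch omits.

```python
import math


def log2_ratios(state_a, state_b, pseudocount=1.0):
    """log2(signal in state B / signal in state A) for every gene.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no detectable signal.
    """
    return {
        gene: math.log2((state_b[gene] + pseudocount) / (state_a[gene] + pseudocount))
        for gene in state_a
    }


def differential(state_a, state_b, threshold=1.0):
    """Genes whose signal changes at least 2**threshold-fold in either direction."""
    ratios = log2_ratios(state_a, state_b)
    return {gene: ratio for gene, ratio in ratios.items() if abs(ratio) >= threshold}


# Hypothetical hybridization signals for four gene tags in two cellular states.
state_1 = {"gene_a": 99, "gene_b": 199, "gene_c": 799, "gene_d": 149}
state_2 = {"gene_a": 399, "gene_b": 199, "gene_c": 99, "gene_d": 199}

for gene, ratio in sorted(differential(state_1, state_2).items()):
    direction = "up" if ratio > 0 else "down"
    print(gene, direction, round(ratio, 2))
```

Working in log2 units makes a fourfold increase and a fourfold decrease symmetric (+2 and -2), which is why ratios are usually compared on that scale.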

Expression profiling using microarrays

Find clusters of co-regulated genes

Search the databases

  • What can be learned from the DNA sequence of a novel gene or polypeptide?
  • Many metabolic functions are carried out by proteins conserved from bacteria or yeast to humans – one may find a homolog with a known function.
  • Many sequence motifs are associated with a specific biochemical function (e.g. kinase, ATPase). A match to such a motif identifies a potential class of reactions for the novel polypeptide
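A motif search of this kind can be sketched with a regular expression. The example uses the Walker A (P-loop) consensus for ATP/GTP binding, written here as `[AG]-x(4)-G-K-[ST]`. The peptide sequence, the `MOTIFS` dictionary, and the function name `find_motifs` are illustrative assumptions, and a match suggests only a possible class of function that still has to be tested.

```python
import re

# Motif name -> regular expression. Only one motif is included in this sketch:
# [AG], any four residues, then G, K and S or T.
MOTIFS = {
    "P-loop (Walker A, ATP/GTP binding)": r"[AG].{4}GK[ST]",
}


def find_motifs(protein, motifs=MOTIFS):
    """Return (motif name, 0-based start, matched residues) for each match."""
    hits = []
    for name, pattern in motifs.items():
        for match in re.finditer(pattern, protein):
            hits.append((name, match.start(), match.group()))
    return hits


# Illustrative peptide for a "novel" polypeptide.
novel_protein = "MTEYKLVVVGAGGVGKSALTIQ"

for name, start, residues in find_motifs(novel_protein):
    print(name, "at residue", start + 1, residues)
```

Short patterns like this one also match by chance in unrelated proteins, so a hit is best read alongside homology searches and expression data.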

Databases, cont’d

  • One may find a match to other genes with no known function, but their pattern of expression may be known.
  • Types of databases:
    – Whole and partial genomic DNA sequences
    – Partial cDNAs from tissues (ESTs = expressed sequence tags)
    – Databases on gene expression
    – Genetic maps

Express the protein product

  • Express the protein in large amounts
    – In bacteria
    – In mammalian cells
    – In insect cells (baculovirus vectors)
  • Purify it
  • Assay for various enzymatic or other activities, guided by (e.g.)
    – The way you screened for the clone
    – Sequence matches

The phenotype of directed mutation

  • Mutate the gene in the organism of interest, and then test for a phenotype
  • Gain of function
    – Over-expression
    – Ectopic expression (where the gene is normally silent)
  • Loss of function
    – Knock out expression of the endogenous gene (homologous recombination, antisense)
    – Express dominant-negative alleles
    – Conditional loss of function, e.g. knock-out by recombination only in selected tissues

Production of biological processes

Localization on a gene map

  • E.g., use gene-specific probes for in situ hybridizations to mitotic chromosomes.
  • Align the hybridization pattern with the banding pattern
  • Are there any previously mapped genes in this region that provide some insight into your gene?

