Eukaryotic and prokaryotic gene structure
Introduction Gene structure
Genes contain the information necessary for living cells to survive and reproduce. In most organisms, genes are made of DNA, where the particular DNA sequence determines the function of the gene. A gene is transcribed (copied) from DNA into RNA, which can either be non-coding (ncRNA) with a direct function, or an intermediate messenger (mRNA) that is then translated into protein.
Each of these steps is controlled by specific sequence elements, or regions, within the gene. Every gene, therefore, requires multiple sequence elements to be functional. This includes the sequence that actually encodes the functional protein or ncRNA, as well as multiple regulatory sequence regions. These regions may be as short as a few base pairs, up to many thousands of base pairs long.
Much of gene structure is broadly similar between eukaryotes and prokaryotes.
These common elements largely result from the shared ancestry of cellular life in organisms over 2 billion years ago. Key differences in gene structure between eukaryotes and prokaryotes reflect their divergent transcription and translation machinery. Understanding gene structure is the foundation of understanding gene annotation, expression, and function
Access to freely available diagrams is important for scientists, medical professions and the general public.However, current open-access gene structure figures are limited in their scope (Supplementary figures 1-4), typically showing one or a few aspects (e.g. exon splicing, or promoter regions). Information-rich diagrams that use a consistent color scheme and layout should help the comprehension of complex concepts.
This work provides two diagrams that summarise the complex structure and terminology of genes. Common elements of gene structure are presented in a consistent layout and format to highlight the relationships between components. Key differences between eukaryotes and prokaryotes are indicated.
Common gene structure features
The structures of both eukaryotic and prokaryotic genes involve several nested sequence elements. Each element has a specific function in the multi-step process of gene expression. The sequences and lengths of these elements vary, but the same general functions are present in most genes. Although DNA is a double-stranded molecule, typically only one of the strands encodes information that the RNA polymerase reads to produce protein-coding mRNA or non-coding RNA. This ‘sense’ or ‘coding’ strand, run in the 5′ to 3′ direction where the numbers refer to the carbon atoms of the backbone’s ribose sugar. The open reading frame (ORF) of a gene is therefore usually represented as an arrow indicating the direction in which the sense strand is read
The structure of eukaryotic genes includes features not found in prokaryotes (Figure 1). Most of these relate to post-transcriptional modification of pre-mRNAs to produce mature mRNA ready for translation into protein. Eukaryotic genes typically have more regulatory elements to control gene expression compared to prokaryotes. This is particularly true in multicellular eukaryotes, humans for example, where gene expression varies widely among different tissues.
A key feature of the structure of eukaryotic genes is that their transcripts are typically subdivided into exon and intron regions. Exon regions are retained in the final mature mRNA molecule, while intron regions are spliced out (excised) during post-transcriptional processing. Indeed, the intron regions of a gene can be considerably longer than the exon regions. Once spliced together, the exons form a single continuous protein-coding region, and the splice boundaries are not detectable. Eukaryotic post-transcriptional processing also adds a 5′ cap to the start of the mRNA and a poly-adenosine tail to the end of the mRNA. These additions stabilize the mRNA and direct its transport from the nucleus to the cytoplasm, although neither of these features is directly encoded in the structure of a gene.
The overall organization of prokaryotic genes is markedly different from that of the eukaryotes (Figure 2). The most obvious difference is that prokaryotic ORFs are often grouped into a polycistronic operon under the control of a shared set of regulatory sequences. These ORFs are all transcribed onto the same mRNA and so are co-regulated and often serve related functions. Each ORF typically has its own ribosome binding site (RBS) so that ribosomes simultaneously translate ORFs on the same mRNA. Some operons also display translational coupling, where the translation rates of multiple ORFs within an operon are linked. This can occur when the ribosome remains attached at the end of an ORF and simply translocates along to the next without the need for a new RBS. Translational coupling is also observed when the translation of an ORF affects the accessibility of the next RBS through changes in RNA secondary structure. Having multiple ORFs on a single mRNA is only possible in prokaryotes because their transcription and translation take place at the same time and in the same subcellular location.[
The operator sequence next to the promoter is the main regulatory element in prokaryotes. Repressor proteins bound to the operator sequence physically obstructs the RNA polymerase enzyme, preventing transcription.Riboswitches are another important regulatory sequence commonly present in prokaryotic UTRs. These sequences switch between alternative secondary structures in the RNA depending on the concentration of key metabolites. The secondary structures than either block or reveal important sequence regions such as RBSs. Introns are extremely rare in prokaryotes and therefore do not play a significant role in prokaryotic gene regulation.[
Click here to download a pdf form of notes