|
Comments to Table S1:
Essentiality assertions for all E. coli ORFs determined by
genetic footprinting
under conditions of aerobic logarithmic cell growth in rich medium.
This table lists all protein-coding E. coli genes in the order they occur along the chromosome.
Genes are identified by their common names, unique identifiers in the ERGO database (Overbeek et al.,
2003), b-numbers (Blattner et al., 1997), as well as Swiss-Prot IDs and functional annotations (as of July 2002). Positions of
start codons are based on ORF calling within the ERGO database and may not correspond to those in
other databases. Every ORF is assigned to one of twelve broad functional categories (AAM, Amino
acids metabolism; CHM, Carbohydrate metabolism; NCM, Nucleotide and cofactor metabolism; LPC,
Lipids, lipopolysaccharides, lipoproteins, peptidoglycan, cell wall; NAM, Nucleic acid metabolism;
PMS, Protein metabolism and secretion; MSM, Miscellaneous metabolism; BEN, Bioenergetics; SMC,
Signalling, motility, chemotaxis; RCD, Expression regulators and cell cycle/division; MTR, Membrane
transporters; and PHT, Phage and transposase related). ORFs lacking specific functional annotations in
Swiss-Prot as of July 2002 were considered uncategorized (UNC). The evolutionary retention index (ERI)
for each ORF was determined as described in Experimental procedures.
Essentiality assertions were determined automatically, based on the number of transposon insertions
detected within an ORF and on the relative intensity of the electrophoretic bands corresponding to each
transposition event. These automatic calls were then curated manually using a graphical
chromosomal viewer
(examples).
All ORFs were sorted into four categories based on the following
criteria:
E
= essential. Includes genes with no detectable insertions within their coding
sequence (cds), and genes with only a few insertions within the 3'-most 20% or 5'-most 5%
of the gene.
A numerical measure of confidence in the assertion was calculated for each essential ORF (column "assertion error").
The assertion error is the probability that an ORF would receive no insertions by chance if insert locations were completely random.
It was calculated as follows: p0(L) = exp(-rL), where r is the local insertion density (inserts per base pair) and L
is the length of the ORF in base
pairs. Here, r was determined by counting the inserts within a 10 kb-long region centered on each ORF,
excluding the ORF's own coding sequence as well as all essential ORFs and unanalyzed regions (gaps) in that region
(see also Materials and Methods in the main article).
N
= non-essential. Genes with one or more insertions located between 5% and
80% of the cds length were considered non-essential, except for a few relatively long
ORFs (>1000 bp). These long ORFs were asserted as "ambiguous" if the insertion density within the
cds was significantly below the genomic average (3.2 inserts per 1000 bp).
X
= not covered. Includes genes for which no reliable
PCR data could be obtained for various technical reasons.
?
= ambiguous. ORFs for which the experimental evidence was insufficient to draw a
specific conclusion about essentiality were asserted as ambiguous.
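The four categories above can be sketched as a simple decision rule. This is a minimal illustration under stated assumptions, not the procedure actually used: the function name classify_orf, the inclusive 5% and 80% boundaries, and the low_density_factor cutoff standing in for "significantly below the genomic average" are all hypothetical, and neither the "only a few insertions" limit nor the band-intensity and manual curation steps are modeled.

```python
GENOMIC_AVG_PER_BP = 3.2 / 1000  # genomic average quoted in the text


def classify_orf(fractions, cds_length_bp, covered=True, low_density_factor=0.5):
    """Return 'E', 'N', '?' or 'X' for one ORF.

    fractions: insert positions as fractions (0..1) of the cds length,
        measured from the start codon.
    covered: False if no reliable PCR data were obtained for the ORF.
    low_density_factor: HYPOTHETICAL cutoff; an ORF longer than 1000 bp
        whose insert density is below this fraction of the genomic average
        is called ambiguous. The source does not give the actual test.
    """
    if not covered:
        return "X"
    # Inserts between 5% and 80% of the cds count against essentiality.
    central = [f for f in fractions if 0.05 <= f <= 0.80]
    if not central:
        # No inserts at all, or only in the 5'-most 5% / 3'-most 20%.
        return "E"
    density = len(fractions) / cds_length_bp
    if cds_length_bp > 1000 and density < low_density_factor * GENOMIC_AVG_PER_BP:
        return "?"
    return "N"


# Hypothetical ORFs, for illustration only.
print(classify_orf([], 900))            # no inserts
print(classify_orf([0.02, 0.90], 900))  # inserts only near the ends
print(classify_orf([0.50], 900))        # one central insert, short ORF
print(classify_orf([0.50], 3000))       # one central insert, long ORF
```

Any real use would need to replace the cutoff with a proper significance test and to review the calls by eye, as the automatic calls in Table S1 were.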
For a more detailed description of these criteria and a discussion of potential sources of
erroneous assertions, follow this link.
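The assertion error reported for essential ORFs can be reproduced with a few lines of code. The sketch below assumes that r is normalized by the portion of the 10 kb window that remains after the exclusions (the source states what is excluded but not the normalization); the function names and the example numbers are hypothetical.

```python
import math


def local_insertion_density(n_inserts, window_bp, excluded_bp):
    """Inserts per bp in the window around an ORF.

    ASSUMPTION: the count is divided by the window length left after
    removing the ORF itself, essential ORFs, and unanalyzed gaps.
    """
    effective_bp = window_bp - excluded_bp
    if effective_bp <= 0:
        raise ValueError("no analyzable sequence left in the window")
    return n_inserts / effective_bp


def assertion_error(r, orf_length_bp):
    """exp(-r * L): chance of seeing no inserts in L bp at density r."""
    return math.exp(-r * orf_length_bp)


# Hypothetical example: 24 inserts in a 10 kb window with 2 kb excluded,
# and an insert-free ORF of 900 bp.
r = local_insertion_density(24, 10_000, 2_000)  # 0.003 inserts per bp
print(assertion_error(r, 900))                  # exp(-2.7), roughly 0.067
```

A short ORF in a sparsely hit region gives a large assertion error, which is why an essential call on such an ORF deserves less confidence than the same call on a long ORF in a densely hit region.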
For consistency, all essentiality calls in Table S1 are based
exclusively on our experimental data, without any corrections by context or otherwise. For example,
accD is called nonessential and pdxJ is called essential regardless of convincing arguments to the
contrary. Table S1 includes raw genetic footprinting data: the number of transposon insertions
within each ORF and their locations (in amino acid coordinates) relative to the translational start.
These data can be used to refine essentiality assessments. More detailed information, including
inserts in non-coding regions, relative intensities of PCR products, and raw experimental data
(primer positions, gel images), is available from the authors upon request.
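Readers who want to re-examine a call from the raw footprinting data first need to place each insert within its ORF. A minimal sketch follows, assuming that an amino acid coordinate a corresponds to nucleotide offset 3a from the start codon and that the cds length is given in base pairs; the function names and the example values are hypothetical.

```python
def insert_fraction(aa_position, cds_length_bp):
    """Fractional position (0..1) of an insert along the cds.

    ASSUMPTION: amino acid coordinate a maps to nucleotide offset 3 * a
    from the translational start.
    """
    fraction = 3 * aa_position / cds_length_bp
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("insert lies outside the coding sequence")
    return fraction


def insert_region(fraction):
    """Label the part of the cds an insert falls in, using the 5% and 80%
    boundaries from the criteria above (boundary handling is an assumption)."""
    if fraction < 0.05:
        return "5' end"
    if fraction > 0.80:
        return "3' end"
    return "central"


# Hypothetical ORF of 900 bp with inserts at amino acids 10, 150 and 280.
for aa in (10, 150, 280):
    f = insert_fraction(aa, 900)
    print(aa, round(f, 3), insert_region(f))
```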
References:
Blattner, F. R., Plunkett, G., Bloch, C. A., Perna, N. T., Burland, V., Riley, M., Collado-Vides, J.,
Glasner, J. D., Rode, C. K., Mayhew, G. F., et al. (1997). The complete genome sequence of
Escherichia coli K-12. Science 277, 1453-1462.
Overbeek, R., Larsen, N., Walunas, T., D'Souza, M., Pusch, G., Selkov, E. J., Liolios, K., Joukov,
V., Kaznadzey, D., Anderson, I., et al. (2003). The ERGO(TM) genome analysis and discovery
system. Nucleic Acids Res 31, 164-171.
|