Table 1: The observed N-terminus of 223 E. coli genes

Locus Predicted N-terminus (a) Observed N-terminus (b)
accA m*slnfldfeqpiael slnfldfeqpia
accC mldkivianrgeia mldkivianrge
aceE m*serfpndvdpietr serfpndvdpie
aceF m*aieikvpdigadev aieikvpdigad
adk mriillgapgagkg mriillgapgag
agp mnktliaaavagivllasnaqa*qtvpegyqlqqvlm qtvpegyqlqqv
ahpC m*slintkikpfknqa slintkikpfkn
ahpC (67)y*avstdthfthka(104) avstdthfthka
aldA M*SVPVQHPMYIDGQF SVPVQ(KG)PMYXD
araF mhkftkalaaiglaavmsqsama*enlklgflvkqpee enlklgflvkqp
arcA mqtphilivedelvt mqtphilivede
argD M*AIEQTAITRATFDE AIEQTAI(FS)RATF
argG m*ttilkhlpvgqrig ttilkhlpvgqr
argI m*sgfyhkhflklld sgfyXkXflkGL
argT mkksilalsllvglstaassya*alpetvrigtdttya plpetvrigtdt
aroG mnyqnddlrikeik mnyqnddlrike
aroK mrfqfmscrrslseaglsltnslsstekm*aekrniflvgpmg aekrniflvgpm
artI MKKVLIAALIAGFSLSATA*AETIRFATEASYPP AETIRFATEASY
artJ mkklvlaallasftfgasa*aekinfgvsatypp aekinfgvsaty
asd mknvgfigwrgmvg mknvgfigwrgm
asnS m*svvpvadvlqgr AVvpvadvlq
aspC MFENITAAPADPIL MFENITAAPADP
atpA mqlnsteiselikq mqlnsteiseli
atpD m*atgkivqvigavvd atgkivqvigav
atpF MNLNATILGQAIAF
bcp mnplkagdiapkfs mnplkagdiapk
btuB mikkaslltacsvtafsawa*qdtspdtlvvtanr qdtspdtlvvta
carA liksallvledgtq miksallvledg
cpdB mikfsatllatliaasvna*atvdlrimettdlh atvdlrimettd
crr m*glfdklkslvsddk glfdklkslvsd
cspC m*akikgqvkwfnesk akikgqvkXfne
cysI msm*sekhpgplvvegkl sekhpgplvveg
cysK m*skifednsltight skifednsltig
cysP mavnllkknslalvaslllaghvqa*tellnssydvsrel tellnsxydvsr
dapA mftgsivaivtpmd mftgsivaivtp
dapD mqqlqniietafer mqqlqniietaf
dkaA MQEGQNRKTSSL MQENQNRK(PS)FXL
dnaK m*gkiigidlgttnsc gkiigidlgttn
dppA mrislkksgmlklglslvamtvaasvqa*ktlvycsegspegf ktlvyXsegspe
dps m*staklvkskatnll staklvkskatn
dsbA MKKIWLALAGLVLAFSASA*AQYEDGKQYTTLEK AQYEDGKQYTTL
dsbC mkkgfmlftllaafsgfaqa*ddaaiqqtlakmgikssdiq ddaaiqqtlakm
eco mktilpavlfaafattsawa*aesvqplekiapyp aesvqplekiap
efp m*atyysndfraglki atyysndfragl
eno m*skivkiigreiids skivkiigreii
fabD m*tqfafvfpgqgs tqfafvfpgq
fabI mgflsgkrilvtgv mgflsgkrilvt
fabI m*gflsgkrilvtgva gflsgkrilvtg
fba m*skifdfvkpgvitg skifdfvkpgvi
fklB m*ttptfdtieaqasy ttptfdtieaqa
fkpA MKSLFKVTLLATTMAVALHAPITFA*AEAAKPATAADSKA AEAAKPATAADS
fliC m*aqvintnslslitq aqvintnslsli
fliY MKLAHLGRQALMGVMAVALVAGMSVKSFA*DEGLLNKVKERGTL DEGLLNKVKERG
folE M*PSLSKEAALVHEAL PSLSKEAALVTE
frr misdirkdaevrmd misdirkdaevr
ftsZ mfepmeltndavik mfepmeltndav
fumA m*snkpfhyqapfpl snkpfhyqapf
fusA m*arttpiaryrnigi arttpiaryrni
gadA (378)f*klkdgedpgytl(72) Llkdgedpgytl
galU m*aaintkvkkavipv aaintkvkkavi
galU maaint*kvkkavipvaglgt ANLkavipvagl
galU maain*tkvkkavipvaglgt TINLkavipvag
gapA m*tikvgingfgrigr tikvgingfgri
gcvT M*AQQTPLYEQHTLCG AQQTPLYEQHTL
gdhA mdqtyslesflnhv mdqtysleXfln
glnA m*saehvltmlneh saehvltmln
glnH mksvlkvslaaltlafavssha*adkklvvatdtafv adkklvvatdta
glnS m*seaearptnfirqi seaearptnfir
gltD M*SQNVYQFIDLQRVD SQNVYQFIDLQR
glyA mlkremniadydae mlkremniadyd
glyS m*sektflveigteel sektflveigte
gpmA m*avtklvlvrhgesq avtklvlvrhge
guaB mlriakealtfddv mlriakealtfd
guaC mrieedlklgfkdv mrieedlklgfL
hdeA mkkvlgvilggllllpvvsna*adaqkaadnkkpvn adaqkaadnkkp
hdeB MGYKMNISSLRKAFIFMGAVAALSLVNAQSALA*ANESAKDMTCQEFI ANESAKDMTHQE
hemX MTEQEKTSAVVEET MTEQEKTSAAXE
hisD M*SFNTIIDWNSCT SFNTIIDPNX(PYEK)T
hisJ MKKLVLSLSLVLAFSSATAAFA*AIPQNIRIGTDPTY AIPQNIRIGTDP
hlpA vkkwllaaglglalatsaqa*adkiaivnmgslfq adkiaivnmXsl
hmpA mldaqtiatvkati mldaqtiatvka
hns m*sealkilnnirtlr sealkilnnirt
htpG mkgqetrgfqse mkgqetXgfq
hupA mnktqlidviaeka mnktqlidviae
hupB mnksqlidkiaaga mnksqlidkiaa
icdA meskvvvpaqgkki meskvvvpaqgk
ilvC M*ANYFNTLNLRQQLA ANYFNTLNLRQQ
ilvI memlsgaemvvrsl memlsgaemvvr
imp MKKRIPTLLATMIATALYSQQGLA*ADLASQC ADLAS
kdsA mkqkvvsigdinva m(GM)qkvvsigdin
leuA m*sqqviifdttlrdg sqqviifdttlr
leuB M*SKNYHIAVLPGDGI SKNYHIAVLPGD
leuC m*aktlyeklfdahvv aktlyeklfdah
livJ mnikgkallagcialafsnmala*edikvavvgamsgp edikvavvgams
livK mkrnaktiiagmialaishtama*ddikvavvgamsgp ddikvavvgams
lolA MMKKIAITCALLSSLVASSVWA*DAASDLKSRLDKVS DAASDLKSRLDK
lpdA mm*steiktqvvvlgag steiktqvvvlg
malE mkiktgarilalsalttmmfsasala*kieegklviwingd kieegklviXin
manX v*tiaivigthgwaaet tiaivigthgwa
mdh mkvavlgaaggigq mkvavlgaaggi
mdoG MMKMRWLSAAVMLTLYTSSSWA*FSIDDVAKQAQSLA FXIDDVAKQAXS
metE m*tilnhtlgfprvgl tilnhtlgfprv
mglB mnkkvltlsavmasmlfgaaaha*adtrigvtiykydd adtrigvtiyky
minD m*ariivvtsgkggvg ariivvtsgkgg
mopA m*aakdvkfgndarvk aakdvkfgndar
mopB mnirplhdrvivkr mnirplhdrviv
mreB mlkkfrgmfsndls mlkkfrgmfsnd
nadE mtlqqqiikalgen mtlqqqiikalg
nfnB mdiisvalkrhstk mdiisvalkrhs
nuoB mdytltridpngen mdytltridpng
nuoG m*atihvdgkeyevng atihvdgkeyev
nuoI mtlkellvgfgtqv mtlXellvgfgt
nusA mnkeilavveavsn mnXeilavveXv
ompA MKKTAIAIAVALAGFATVAQA*APKDNTWYTGAKLG APKDNTWYTGAK
ompC mkvkvlsllvpallvagaana*aevynkdgnkl aevynkdgn
ompF mmkrnilavivpallvagtana*aeiynkdgnkvdly aeiynkdgnkvd
ompF (36)k*avglhyfsk(313) avglhyfsk
oppA mtnitkrslvaagvlaalmagnvala*advpagvtlaekqt advpagvtlaekq
osmC M*TIHKKGQAHWEGDI TIHKKGQAHIEG
osmY mtmtrlkisktllavmltsavatgsaya*ennaqttnesagqk ennaqttnesag
pal MQLNKVLKGLMIALPVMAIAA**CSSNKNASNDGS
panB mkpttisllqkykq mkpttiSLLQXY
pckA mrvnngltpqelea mrvnngltpqel
pgk m*svikmtdldlagkr svikmtdldlag
pnp llnpivrkfqygqh mlnpivrkfqyg
potD mkkwsrhllaagalalgmsaaha*ddnntlyfynwtey ddnntlyfynXt
potF MTALNKKWLSGLVAGALMAVSVGTLA*AEQKTLHIYNW AENKTLXIYNV
ppa m*sllnvpagkdlped sllnvpagkdlp
ppa (92)l*kmtdeagedakl(68) kmtdeagedakl
ppiB mvtfhtnhgdivik mvtfhtnhgdiv
proS mrtsqyllstlket mrtsqyllstlk
prsA vpdmklfagnatpe (PA)pdmklfagnat
pstS mkvmrttvatvvaatlsmsafsvfa*easltgagatfpap easltgagatfp
ptsH mfqqevtitapngl mfqqevtitapn
ptsI misgilaspgiaf misgilaXpgi
purA M*GNNVVVLGTQWGDE GNNVXXLGTQXA(VL)
purC mqkqaelyrgkakt mqkqaelyrgka
purH mqqrrpvrrallsv mqqrrpvrrall
purM M*TDKTSLSYKDAGV TDKTSLSXXDD
pykF mkktkivctigpkt mkktkivAtigp
pyrB m*anplyqkhiisin anplyqkhiis
pyrC m*tapsqvlkirrpdd tapsqvlkirrp
pyrG m*ttnyifvtggvvss ttnyifvtggvv
pyrI mthdnklqveaikr mthdnklqveai
pyrI m*thdnklqveaikrg thdnklqveaik
rbsB mnmkklatlvsavalsatvsanama*kdtialvvstlnnp kdtialvvstln
rfaD miivtggagfigsn miivtggagfig
rho mnltelkntpvsel mnltelkntpvs
rpiA mtqdelkkavgwa mtqdelkkavg
rplA m*akltkrmrvirekv akltkrmrviIe
rplC MIGLVGKKVGMT MIGLVGKKVG
rplD MELVLKDAQSALTV MELVLKDAQSAL
rplF m*srvakapvvvpagv srvakapvvvpa
rplI mqvilldkvanlgs mqvilldkvanl
rplL M*SITKDQIIEAVAAM SITKDXIIEXV
rplM (34)R*RLRGKHKAEYTP(96) RLRGKHKAEYTP
rplY MFTINAEVRKEQGK MFTINAEVRREQ
rpoA mqgsvteflkprlvd mqgsvteflkprl
rpsA MTESFAQLFEESLK MTESFAQLFEES
rpsA m*tesfaqlfees tesfaqlfe
rpsB m*atvsmrdmlkagvh atvsmrdmlkag
rpsF mrhyeivfmvhpdq mrhyeivfmvXp
rpsJ mqnqririrlkafd mqnqriXirlLa
rpsP MVTIRLARHGAKKR MVTIRLAR(EA)GA(VP)
sbp mnkwgvgltfllaatsvma*kdiqllnvsydptr kdiqllnvsydp
sdhA mklpvrefdavvig mklpvrefdavv
sdhB mrlefsiyrynpd mrlefsiyryn
serA m*akvslekdkikfll akvslekdkikf
serC m*aqifnfssgpamlp aqifnfssgpam
slp MNMTKGALILSLSFLLAA**CSSIPQNIKGNN
sodA m*sytlpslpyaydal sytlpslpyayd
sodB m*sfelpalpyakdal sfelpalpyakd
sseA M*STTWFVGADWLAEH STTXFVGADDXA
sspA m*avaankrsvmtlfs avaankrsvmtl
sucB M*SSVDILVPDLPESV SSVDILVPDLPE
sucC mnlheyqakqlfar mnlheyqakqlf
sucD m*silidkntkvicqg silidkntkvic
sufI mslsrrqfiqasgialcagavplkasa*agqqqplpvpplle agqqqplpvppl
surA mknwktlllgiamiantsfa*apqvvdkvaavvnn apqvvdkvaavv
talB m*tdkltslrqyttvv tdkltslrqytt
thrC mklynlkdhneqvs mklynlkdhneq
tig mqvsvettqglgrr mqvsvettqglg
tig (35)k*kvridgfrkgkv(381) kvridgfrkgkv
tig (42)r*kgkvpmnivaq(374) kgkvpmnivaq
tig (43)r*kgkvpmnivaqr(373) kgkvpmnivaqr
tig (44)k*gkvpmnivaqry(372) gkvpmnivaqry
tnaA menfkhlpepfrir menfkhlpepfr
tolC mkkllpiliglslsgfsslsqa*enlmqvyqqarlsn enlmqvyqqarl
tpiA mrhplvmgnwklng mrXplvmgnXkl
tpx M*SQTVHFQGNPVTVA SQTVHFQGNPVT
trpA meryeslfaqlker meryeslfaqlk
trpB M*TTLLNPYFGEFGGM TTLLNPYFGEFG
tsf m*aeitaslvkelrer aeitaGlvkelr
tufA/B M**SKEKFERTKPHVNV
tufA/B (308)y*ilskdeggrhtp(70) ilskdeggrhtp
upp mkivevkhplvkhk mkivevkhplvk
ushA mkllqrgvalallttftlasetala*yeqdktykitvlht yeqdktykitvl
uspA m*aykhiliavdlspe aykhiliavdls
valS mektynpqdieqpl meFtynpqdieq
xylF mkiknilltlctsllltnvaaha*kevkigmaiddlrl kevkigmaiddl
yacI VLEEYRKHVAERAA MLEEYRKHVAER
yacK mqrrdflkysvalgvasalplwsravfa*aerptlpipdlltt aerptlpipTll
yaeT mamkklliasllfssatvyg*aegfvvkdihfegl aegfvvkdihfe
yaeT (350)R*KIRFEGNDTSKD(448) KIRFEGNDTSXD
yajG MFKKILFPLVALFMLAG**CAKPPTTIEVSP
ybdQ MYKTIIMPVDVFEM MYKTIIMPVDVF
ybiS MNMKLKTLFAAAFAVVGFCSTASA*VTYPLPTDGSRLVG VTYPLPTDGSRL
yceI mkksllgltfaslmfsagsava*adykidkegqhafv adykidkegqha
ychF mgfkcgivglpnvg XXfkXgivglpn
ychF m*gfkcgivglpnvgk (AS)fkXgivglpnv
ydcG MDRRRFIKGSMAMAAVCGTSGIASLFSQAAFA*ADSDIADGQTQRFD ADSDIADGQTQR
ydfG MIVLVTGATAGFGE MIVLVTGATAGF
yeaD MKLKDCV*MIKKIFALPVIEQI MIKKIFALPVIE
yebL MKCYNITLLIFITIIGRIMLHKKTLLFAALSAALWGGATQAADA*AVVASLKPVGFIAS AVVASLKPVGFI
yeeQ (104)A*ADIVVHPGETV(976) ADIVVHPGTTT
yfiA m*tmnitskqmeitpa tmnitskqmeiF
ygaG MPLLDSFTVDH (A)PLLDSFTV
ygaG M*PLLDSFTVDHT PLLDSFTVD
ygaU M*GLFNFVKDAGEKLW GLFNFVKDAGEK
ygfZ M*AFTPFPPRQPTASA AFTPFPPRQPTA
yggX M*SRTIFCTFLQREAA SRTIFXTFLQIE
ygiN MLTVIAEIRTRPGQ MLTVIAEIRTRP
yhbG m*atltaknlakaykg atltaknlaXay
yhbN MKFKTNKLSLNLVLASSLLAASIPAFA*VTGDTDQPIHIESD VTGDTDQPIHIE
yhfO MMYGVYRA*MKLPIYLDYSATTP MKLPIYLDYSAT
yhjJ mqgtkirllaggllmmatagyvqa*dalqpdpawqqgtl dalqpdpaXqqg
yhjW (228)a*rvdessdnnsll(245) rvdessdnnsll
yiaE MERS*MKPSVILYKALP M(NI)PSVI(NVD)YTAIP
yifE M*AESFTTTNRYFDNK AESFTTTNRYFD
yigW MKKFAAVIAVMALCSAPV*MAAEQGGFSGPSATQS AAAEQGGFSGPSA
yihK vieklrniaiiahv Mieklrniaiia
yjbJ MNKDEAGGNWKQFK MNKDEAGGNXKQ
yjbP mrkitqaisavcllfalnssavala*sspsplnpgtnvar sspsplnpgtnv
yjbP MRKITQAISAVCLLFALNSSAVA*LASSPSPLNPGTNV LASSPSPLNPGT
yjgF M*SKTIATENAPAAIG SKTIATENAPAA
yjjK M*AQFVYTMHRVGK A(DE)FVYTMXRV(LI)(GA)
ynaF MNSVITQKVSSGVTLYADTKTGGF*MNRTILVPIDISDS MNRTILVPIDIS
yphF MPTKMRTTRNLLLMATLLGSALFARA*AEKEMTIGAIYLDT AEKEMTIGAIYL
ytfJ mtlrkilaltclllpmmasa*hqfetgqrvppigi hqfetgqrvppi
ytfQ MWKRLLIVSAVSAAMSSMALA*APLTVGFSQVGSES APLTVGFSQVGS

  a. The predicted N-terminal sequence is from the completed genome of E. coli K-12 strain MG1655.

A "*" in the protein sequence marks the observed start site based on the N-terminal sequence tag.

A "**" in the protein sequence marks the predicted N-terminus of the mature protein based on published literature. For observed N-termini matching an internal region of an E. coli gene, the numbers of amino acids between the observed N-terminus and the predicted N- and C-termini of the conceptual protein are shown in parentheses.

  b. The observed N-termini are based on the data in Tables a1-a10. An "X" indicates that no amino acid was identified for that position during Edman sequencing. Parentheses indicate that two amino acids were observed for that position during Edman sequencing. An underlined amino acid indicates a discrepancy between the predicted and observed protein sequence. For genes with no sequence data, the observed N-terminus of the expressed gene was blocked to Edman sequencing. These proteins were identified by sequencing internal peptides of the protein.
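The notation described in the footnotes can be read mechanically. The following is a minimal Python sketch, not part of the original analysis, of how one row might be parsed and checked. The function names (parse_predicted, tokenize_observed, discrepancies) are hypothetical, and the case-insensitive comparison is an assumption, since the footnotes do not say what upper and lower case signify.

```python
import re

# Optional "(n)" offset, residues before the marker, "*" or "**",
# residues after the marker, optional "(n)" offset.
_PRED = re.compile(
    r"(?:\((\d+)\))?([A-Za-z]*)(\*{0,2})([A-Za-z]*)(?:\((\d+)\))?"
)

def parse_predicted(pred):
    """Split a 'Predicted N-terminus' entry into its annotated parts."""
    m = _PRED.fullmatch(pred)
    if m is None:
        raise ValueError(f"unrecognised entry: {pred!r}")
    n_off, before, marker, after, c_off = m.groups()
    if not marker:
        # No marker: the observed start is the predicted start itself.
        before, after = "", before
    return {
        "n_offset": int(n_off) if n_off else None,  # residues to predicted N-terminus
        "c_offset": int(c_off) if c_off else None,  # residues to predicted C-terminus
        "removed": before,   # residues preceding the marked start site
        "marker": marker,    # "", "*" (observed) or "**" (literature)
        "after": after,      # residues from the marked start site onward
    }

def tokenize_observed(obs):
    """Turn an observed tag into one token per sequencing cycle.

    A token is a set of residues (more than one if the position was
    written in parentheses) or None for an unidentified 'X' position.
    """
    tokens = []
    for group, single in re.findall(r"\(([A-Za-z]+)\)|([A-Za-z])", obs):
        residues = (group or single).upper()
        tokens.append(None if residues == "X" else set(residues))
    return tokens

def discrepancies(pred, obs):
    """List (position, predicted residue, observed residues) mismatches.

    Positions are 1-based from the marked start site; 'X' positions are
    skipped because nothing was called there. Assumption: letter case
    is ignored.
    """
    expected = parse_predicted(pred)["after"].upper()
    found = []
    for pos, (token, residue) in enumerate(
        zip(tokenize_observed(obs), expected), start=1
    ):
        if token is not None and residue not in token:
            found.append((pos, residue, "".join(sorted(token))))
    return found

# Example rows copied from the table (aldA and carA).
print(discrepancies("M*SVPVQHPMYIDGQF", "SVPVQ(KG)PMYXD"))
print(discrepancies("liksallvledgtq", "miksallvledg"))
```

Under these assumptions, a row whose observed tag agrees with the predicted sequence yields an empty list, and an internal-fragment entry such as the second ahpC row exposes its two parenthesised offsets as n_offset and c_offset.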