PPLRE Data Examples

From GM-RKB

The PPLRE Data Examples page contains example Passages drawn from the PPLRE Corpus that are likely to help illustrate different issues encountered in the PPLRE Project.



BINARY RELATION (examples)


ONE SENTENCE (binary relation)


With one binary relation; Short-distance between entities (4 words)


With one binary relation; Long-distance (13 words)

  • Reference: PPLRE Corpus 7181.a.5
    • "This suggests that [PROTEIN MmpL7] acts in complex with the synthesis machinery to efficiently transport [OTHER PDIM] across the [LOCATION cell membrane]."
  • Notes:

With one binary relation (but not an experimentally validated one)


With two distinct binary relations


With three related binary relations

  • Reference: PPLRE Corpus 30.a.1
    • "During phenylethylamine-dependent growth, aromatic [PROTEINa amine dehydrogenase], [PROTEINb azurin], and a single [PROTEINc cytochrome c] were localized in the [LOCATION periplasm]."
  • Notes:

TWO ADJACENT SENTENCES (binary relation)


One relation

... tbd..

Two relations (one with coreference)

  • Reference: PPLRE Corpus 99.2-3
    • "Western blotting of lysates of wild-type 8325-4 and Newman and the corresponding [PROTEINa ebpS] mutants showed that [PROTEINb EbpS] migrated with an apparent molecular mass of 83 kDa."
    • "The protein was found exclusively in [LOCATIONa cytoplasmic membrane] fractions purified from protoplasts or lysed cells, in contrast to the clumping factor [PROTEINc ClfA], which was [LOCATIONb cell-wall-associated]."
  • Notes:
    • Relation(s):
      • OPL(PROTEINb, LOCATIONa)
      • OPL(PROTEINc, LOCATIONb)
    • The first relation requires coreference resolution of "The protein", and may then be analyzable with a Text Graph-based approach.
    • An applicable Dependency Relation Recognition Classifier for the second relation:
      • PROTEIN ⇒ LOCATION

Two relations (one negative relation; coreference required; long pattern; SRL enhanceable)

  • Reference: PPLRE Corpus 7373.4-5
    • [PROTEIN TcpQ] is a predicted [LOCATIONa periplasmic] protein required for TCP biogenesis.
    • Fractionation studies revealed that the protein is not localized to the [LOCATIONa periplasm] but is associated predominantly with the [LOCATIONb outer membrane] fraction.
  • Notes:
    • Relation(s):
      • OPL(PROTEIN, LOCATIONa) ⇒ FALSE
      • OPL(PROTEIN, LOCATIONb) ⇒ TRUE
    • Candidate Dependency Grammar-based Classifiers
      • PROTEIN == (protein (== predicted) (== LOCATIONa))
      • LOCATIONa == (localized (== not) (PROTEIN)) == revealed == associated == with == fraction == LOCATIONb
    • Candidate SRL-enhanced Dependency Grammar-based Classifier
      • PROTEIN == associated == with == fraction == LOCATIONb
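The negative relation in this passage hinges on the word "not" attached to "localized". A minimal sketch of that polarity check follows. It assumes a hand-written fragment of a dependency parse (not real parser output), an assumed list of negation cues, and a hypothetical function name `relation_polarity`.

```python
# Assumed negation cues; a real system would need a fuller list.
NEGATION_CUES = {"not", "no", "never"}

# Hand-written head -> dependents fragment for the second sentence of
# passage 7373.4-5 (an assumption for illustration, not parser output).
DEPENDENTS = {
    "localized": ["protein", "not", "LOCATIONa"],
    "associated": ["protein", "predominantly", "LOCATIONb"],
}

def relation_polarity(trigger, dependents=DEPENDENTS):
    """Return False when the trigger word has a negation cue among its dependents."""
    return not any(dep in NEGATION_CUES for dep in dependents.get(trigger, []))

print(relation_polarity("localized"))   # polarity for OPL(PROTEIN, LOCATIONa)
print(relation_polarity("associated"))  # polarity for OPL(PROTEIN, LOCATIONb)
```

Under these assumptions the check reproduces the FALSE/TRUE labels listed in the notes, but it says nothing about how the trigger words themselves would be found.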

MULTIPLE SENTENCES (binary relation)


One Relation; Strengthened Prediction

  • Reference: PPLRE Corpus 7181.a.2,4-5
    • "Transport of [OTHER PDIM] requires [PROTEIN MmpL7], a member of the MmpL family of RND permeases."
    • "Overexpression of the interaction domain of [PROTEINa MmpL7] acts as a dominant negative to [LIPID PDIM] synthesis by poisoning the interaction between [PROTEINb synthase] and transporter."
    • "This suggests that [PROTEINa MmpL7] acts in complex with the synthesis machinery to efficiently transport [LIPID PDIM] across the [LOCATION cell membrane]."
  • Notes:
    • Relation(s):
      • OPL(PROTEINa, LOCATION)
    • In this example a sentence-level algorithm would likely have made a prediction with only medium confidence.
    • One opportunity is to strengthen the prediction based on the context provided by a Text Graph representation such as:
      • PROTEIN <= requiresPDIM
      • PROTEIN <= actsPDIM
      • PROTEIN <= actsPDIM ⇒ LOCATION
      • PL(<PROTEIN_1>,<LOCATION>) ⇒ confidence=0.69
    • A Text Graph Relation Recognition Classifier would strengthen the confidence of the relation because of the strengthened association between PROTEIN_1 and the PDIM named entity.
    • PDIM happens to be a Lipid.

  • Reference: PPLRE Corpus 200.a.0,2,6,7,9
    • <sent=0> "Two proteins, [PROTEIN PS1] and [PROTEIN PS2], were detected in the culture medium of [ORGANISM Corynebacterium glutamicum] and are the major proteins secreted by this bacterium."
    • <sent=2> "The gene encoding [PROTEIN PS1], csp1, was cloned in lambda gt11 using polyclonal antibodies raised against [PROTEIN PS1] to screen for producing clones."
    • <sent=6> "It had the same [OTHER M(r)] as the [PROTEIN PS1] protein band detected in the supernatant of [ORGANISM C. glutamicum] cultures and presumably corresponds to the mature form of [PROTEIN PS1]."
    • <sent=7> "The minor protein band appears to be the precursor form of [PROTEIN PS1]."
    • <sent=9> "This is consistent with the [OTHER M(r)] determined for [PROTEIN PS1] from [ORGANISM C. glutamicum] culture supernatant and [ORGANISM E. coli] whole-cell extracts."
  • Notes:
    • Relation of interest
      • OP(C. glutamicum, PS1)
    • An example where the "shortest sentence-level distance" heuristic benefits from counting instances. One sentence happens to contain a confounding organism (E. coli). By counting, however, we can infer that the connection to "C. glutamicum" is stronger for protein "PS1".
    • Note that sentence six (6) has two mentions of the protein (PS1) and one mention of an organism (C. glutamicum). How should this sentence be counted: as a single co-occurrence or as two? The easiest choice for now is two (2).
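The counting heuristic in these notes can be sketched as follows. This is a minimal illustration, not the PPLRE implementation: the per-sentence mention lists are transcribed by hand from passage 200.a (with the full organism name in sentence 0 normalized to "C. glutamicum"), the function name `count_cooccurrences` is hypothetical, and every (organism mention, protein mention) pair within a sentence counts once, so sentence 6 contributes two.

```python
from collections import Counter
from itertools import product

# Entity mentions per sentence of passage 200.a, transcribed by hand.
# Keys are sentence numbers; a repeated name is a repeated mention.
MENTIONS = {
    0: {"ORGANISM": ["C. glutamicum"], "PROTEIN": ["PS1", "PS2"]},
    2: {"ORGANISM": [], "PROTEIN": ["PS1", "PS1"]},
    6: {"ORGANISM": ["C. glutamicum"], "PROTEIN": ["PS1", "PS1"]},
    7: {"ORGANISM": [], "PROTEIN": ["PS1"]},
    9: {"ORGANISM": ["C. glutamicum", "E. coli"], "PROTEIN": ["PS1"]},
}

def count_cooccurrences(mentions):
    """Count organism/protein mention pairs that share a sentence."""
    counts = Counter()
    for entities in mentions.values():
        for org, prot in product(entities["ORGANISM"], entities["PROTEIN"]):
            counts[(org, prot)] += 1
    return counts

counts = count_cooccurrences(MENTIONS)
print(counts[("C. glutamicum", "PS1")])
print(counts[("E. coli", "PS1")])
```

With per-mention counting, the C. glutamicum pairing outnumbers the confounding E. coli pairing. Counting sentence 6 once instead would lower the first count by one without changing the ranking.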

One Relation; Brand New Prediction

  • tbd: need to find an example from our multiple sentence ones. E.g. a lone organism.


TERNARY RELATION (examples)


ONE SENTENCE (ternary relation)


One ternary relation (O/P/L order); Two simple binary dep.grammar patterns

  • Reference: PPLRE Corpus 8201.a.0
    • [ORGANISM Helicobacter pylori] [PROTEIN urease] is an [LOCATION extracellular], cell-bound enzyme with a molecular weight of approximately 600,000 (600K enzyme) comprising six 66K and six 31K subunits.
  • Notes:
    • ORGANISM <= PROTEIN
    • PROTEIN ⇒ enzyme ⇒ LOCATION

One ternary relation (P/L/O order); Two simple binary dep.grammar patterns

  • Reference: PPLRE Corpus 4990.a.1
    • [PROTEIN Proteinase P1], the main [LOCATION extracellular] heat-stable proteinase fraction of [ORGANISM P. fluorescens P1], has been purified to homogeneity.
  • Notes:

One ternary relation (P/L/O order); Two binary dep.grammar patterns

  • Reference: PPLRE Corpus 610.a.0
    • "[PROTEIN Protein K] is an [LOCATION outer membrane] protein found in pathogenic encapsulated strains of [ORGANISM Escherichia coli]."
  • Notes:
    • Two Dependency Relation Recognition Classifiers could be:
      • PROTEIN <= protein ⇒ LOCATION
      • PROTEIN <= protein ⇒ foundin ⇒ strainsof ⇒ ORGANISM
    • This is an example where discourse analysis can help. The OP() relation is simpler to identify in the fifth sentence of the abstract: "These data suggest that [PROTEIN protein K] is a functional porin in [ORGANISM E. coli]."

One (ternary) Relation (P/L/O order); Two medium binary dep.grammar patterns

  • Reference: PPLRE Corpus 44.a.0
    • "SGAP is an [PROTEIN aminopeptidase] present in the [LOCATION extracellular] fluid of [ORGANISM Streptomyces griseus] cultures."
  • Notes:

One ternary relation (P/L/O order); Solvable by One Ternary Pattern


One ternary relation (P/L/O order); with one obfuscatory protein entity

  • Reference: PPLRE Corpus 4151.a.0
    • "A protein ([PROTEIN1 NosA]) in the [LOCATION outer membrane] of [ORGANISM Pseudomonas stutzeri] that is required for copper to be inserted into [PROTEIN2 N2O reductase] has been extracted and purified to homogeneity."
  • Notes:

One ternary relation (P/O/L order); in short range but with an extraneous protein

  • Reference: PPLRE Corpus 4501.a.0
    • [PROTEINa Exotoxin A] of [ORGANISM Pseudomonas aeruginosa] is a [LOCATION secreted] bacterial toxin capable of translocating a catalytic domain into mammalian cells and inhibiting protein synthesis by the ADP-ribosylation of [PROTEINb cellular elongation factor 2].
  • Notes:
    • Binary Dependency Relation Recognition Classifiers could be:
      • PROTEINa of ORGANISM
      • PROTEINa is toxin LOCATION
      • ORGANISM of PROTEINa toxin capable of translocating inhibiting by ADP of PROTEINb
      • PROTEINb of ADP by inhibiting translocating of capable toxin LOCATION

One ternary relation (O/P/L order); with one obfuscatory location named entity

  • Reference: PPLRE Corpus 25.a.0
    • "A [PROTEIN membrane proteinase] from [ORGANISM Pseudomonas aeruginosa], called [PROTEIN insulin-cleaving membrane proteinase] ([PROTEIN ICMP]), was located in the [LOCATIONa outer membrane] leaflet of the [LOCATIONb cell envelope]."
  • Notes:
    • Relation
      • OPL(ORGANISM,PROTEIN,LOCATIONa)
    • It is unclear how to avoid the incorrect relation OPL(ORGANISM,PROTEIN,LOCATIONa). Note that the dependency grammar based shortest path seems reasonable and is very similar to the correct path below.
      • LOCATIONa leaflet in located PROTEIN
    • From visual inspection the correct relation appears to require a path that involves the incorrect location LOCATIONa; in fact, the shortest Dependency Grammar pattern skips over it.
      • LOCATIONb of leaflet in located PROTEIN

One ternary relation (O/P/L order); from many possible permutations and with negation

  • Reference: PPLRE Corpus 6061.a.6
    • "Isolated [ORGANISM T. pallidum] [LOCATIONa outer membrane] was devoid of the [PROTEINa 19-kDa 4D] protein and the normally abundant [PROTEINb 47-kDa] lipoprotein known to be associated with the [LOCATIONb cytoplasmic membrane] ; only trace amounts of the [LOCATIONc periplasmic endoflagella] were detected."
  • Notes:
    • Relations:
      • OPL(ORGANISM, PROTEINb, LOCATIONb) ⇒ true
    • Contains only one relation out of the six possible permutations.
    • Patterns:
      • ORGANISM LOCATIONa devoid of protein PROTEINa
      • ORGANISM LOCATIONa devoid of protein lipoprotein PROTEINb
      • LOCATIONb with associated known lipoprotein protein PROTEINa
      • LOCATIONb with associated known lipoprotein PROTEINb
      • LOCATIONc of amounts detected devoid of protein PROTEINa
      • LOCATIONc of amounts detected devoid of protein lipoprotein PROTEINb
      • LOCATIONa devoid of protein PROTEINa
      • LOCATIONa devoid of protein lipoprotein PROTEINb
    • One possible fix is to split the sentence in two at the semicolon.
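A rough sketch of how splitting at the semicolon shrinks the candidate space. It assumes the bracketed annotation style used on this page, and `candidate_triples` is a hypothetical helper that simply enumerates every organism/protein/location label combination in a span of text.

```python
import re
from itertools import product

SENTENCE = (
    "Isolated [ORGANISM T. pallidum] [LOCATIONa outer membrane] was devoid of "
    "the [PROTEINa 19-kDa 4D] protein and the normally abundant "
    "[PROTEINb 47-kDa] lipoprotein known to be associated with the "
    "[LOCATIONb cytoplasmic membrane] ; only trace amounts of the "
    "[LOCATIONc periplasmic endoflagella] were detected."
)

# Matches tags such as [PROTEINa 19-kDa 4D]: type, optional index letter, text.
TAG = re.compile(r"\[(ORGANISM|PROTEIN|LOCATION)([a-z]?) ([^\]]+)\]")

def candidate_triples(text):
    """All (organism, protein, location) label combinations found in text."""
    by_type = {"ORGANISM": [], "PROTEIN": [], "LOCATION": []}
    for etype, index, _ in TAG.findall(text):
        by_type[etype].append(etype + index)
    return list(product(by_type["ORGANISM"], by_type["PROTEIN"], by_type["LOCATION"]))

whole = candidate_triples(SENTENCE)
per_clause = [t for clause in SENTENCE.split(";") for t in candidate_triples(clause)]
print(len(whole), len(per_clause))
```

Splitting drops the two candidates that involve LOCATIONc, because the second clause contains no organism or protein. The remaining candidates still include incorrect ones, so the "devoid of" negation has to be handled separately.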

Two ternary relations (L/P/O order); broken one-to-many assumption

  • Reference: PPLRE Corpus 8798.a.1
    • "The EPS levan is synthesized by the [LOCATION extracellular] enzyme [PROTEIN levansucrase] in [ORGANISMa Pseudomonas syringae], [ORGANISMb Erwinia amylovora], and other bacterial species."
  • Notes:
    • Dependency Grammar Shortest Paths
      • ORGANISMb ORGANISMa in PROTEIN
      • ORGANISMa in PROTEIN
      • LOCATION PROTEIN
    • One challenge faced is that the protein is related to more than one organism. The one-to-many assumption for the OP() relation is violated.
    • Another challenge is to associate the two organisms. Notice how in the classifiers above the path from PROTEIN to ORGANISMb artificially goes through ORGANISMa.
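One way to relax the one-to-many assumption, sketched under the assumption that coordinated organisms show up as consecutive ORGANISM labels on a shortest path: emit an OP() pair for every organism label on a path that ends at the protein. The path lists are copied from the notes above; `op_relations` is a hypothetical name.

```python
# Shortest paths from the notes, written as token lists ending at the protein.
PATHS = [
    ["ORGANISMb", "ORGANISMa", "in", "PROTEIN"],
    ["ORGANISMa", "in", "PROTEIN"],
    ["LOCATION", "PROTEIN"],
]

def op_relations(paths):
    """Emit one (organism, protein) pair per organism label found on each path."""
    relations = set()
    for path in paths:
        if not path[-1].startswith("PROTEIN"):
            continue
        for token in path[:-1]:
            if token.startswith("ORGANISM"):
                relations.add((token, path[-1]))
    return relations

print(sorted(op_relations(PATHS)))
```

This treats the intermediate organism as a coordinated sibling rather than as an obstacle, which suits this sentence but could over-generate when two organisms on a path are not actually conjoined.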

Two ternary relations (L/P/O order); broken one-to-many assumption

  • Reference: PPLRE Corpus 10230.a.3
    • "We confirm and extend previous studies by demonstrating that [PROTEIN Lst] is located in the [LOCATION outer membrane] and is surface exposed in both [ORGANISMa N. gonorrhoeae] and [ORGANISMb N. meningitidis]."
  • Notes:
    • DG Classifiers
      • ORGANISMa in exposed surface located PROTEIN
      • ORGANISMb ORGANISMa in exposed surface located PROTEIN
      • LOCATION in located PROTEIN
    • This example confuses the discourse-based nearest neighbor algorithm because the more distant organism (N. meningitidis) is associated with a different protein in another sentence.

Two ternary relations (O/L/PP order)

  • Reference: PPLRE Corpus 24.a.1
    • "A [ORGANISM Y. pestis] dsbA mutant [LOCATION secreted] reduced amounts of the [PROTEINa V antigen] and [PROTEINb Yops] and expressed reduced amounts of the full-sized [PROTEINc YscC] protein."
  • Notes:
    • Relations
    • In this example the organism and the first protein are seven words apart, which exceeds the optimal threshold of the nearest neighbor classifier v1.0.
    • Candidate Dependency Grammar Patterns
      • ORGANISM LOCATION reduced amounts of PROTEIN
      • ORGANISM LOCATION reduced amounts of PROTEIN PROTEIN
      • ORGANISM LOCATION reduced expressed amounts of protein PROTEIN
      • LOCATION reduced amounts of PROTEIN
      • LOCATION reduced amounts of PROTEIN PROTEIN
      • LOCATION reduced expressed amounts of protein PROTEIN
    • Candidate Surface Patterns
      • ORGANISM dsbA mutant LOCATION reduced amounts of the PROTEIN
      • ORGANISM dsbA mutant LOCATION reduced amounts of the PROTEIN and PROTEIN
      • ORGANISM dsbA mutant LOCATION reduced amounts and expressed reduced amounts of the PROTEIN
      • LOCATION reduced amounts of the PROTEIN
      • LOCATION reduced amounts of the PROTEIN PROTEIN
      • LOCATION reduced amounts of the PROTEIN PROTEIN and expressed reduced amounts of the PROTEIN
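The word distances quoted on this page can be reproduced with a small helper. This is a sketch that assumes the bracketed annotation style used here and whitespace tokenization; `words_between` is a hypothetical name, and the classifier's actual threshold is not given on this page, so none is hard-coded.

```python
import re

# Matches an annotated entity such as [PROTEINa V antigen] and captures its label.
TAG = re.compile(r"\[([A-Z]+[a-z]?) [^\]]+\]")

def words_between(sentence, label_a, label_b):
    """Number of tokens strictly between two tagged entities.

    Each tagged entity is collapsed to its label so that a multi-word
    name counts as a single token.
    """
    tokens = TAG.sub(r" \1 ", sentence).split()
    i, j = tokens.index(label_a), tokens.index(label_b)
    return abs(i - j) - 1

Y_PESTIS = (
    "A [ORGANISM Y. pestis] dsbA mutant [LOCATION secreted] reduced amounts of "
    "the [PROTEINa V antigen] and [PROTEINb Yops] and expressed reduced amounts "
    "of the full-sized [PROTEINc YscC] protein."
)
MMPL7 = (
    "This suggests that [PROTEIN MmpL7] acts in complex with the synthesis "
    "machinery to efficiently transport [OTHER PDIM] across the "
    "[LOCATION cell membrane]."
)

print(words_between(Y_PESTIS, "ORGANISM", "PROTEINa"))
print(words_between(MMPL7, "PROTEIN", "LOCATION"))
```

The first call measures the organism-to-protein gap discussed in the notes above; the second measures the protein-to-location gap in passage 7181.a.5 from the binary relation examples.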

Several ternary relations

  • Reference: PPLRE Corpus 491.a.2
    • "In this study, [ORGANISM Escherichia coli] [PROTEINa TonB] was found to be distributed in sucrose density gradients approximately equally between the [LOCATIONa cytoplasmic membrane] and the [LOCATIONb outer membrane] fractions, while two proteins with which it is known to interact, [PROTEINb ExbB] and [PROTEINc ExbD], as well as the [PROTEINd NADH] oxidase activity characteristic of the [LOCATIONa cytoplasmic membrane], were localized in the [LOCATIONa cytoplasmic membrane] fraction."
  • Notes:
    • Relations
      • OP(ORGANISM,PROTEINa), OP(ORGANISM,PROTEINb), OP(ORGANISM,PROTEINc), OP(ORGANISM,PROTEINd)
      • PL(PROTEINa, LOCATIONa), PL(PROTEINa, LOCATIONb)
      • PL(PROTEINb, LOCATIONa)
      • PL(PROTEINc, LOCATIONa)
      • PL(PROTEINd, LOCATIONa)
    • It is a long sentence.
    • There are many relations: 9
    • Not all permutations are valid: 3 (11?) are invalid.
    • Candidate Classifiers
      • ORGANISM PROTEINa
      • ORGANISM PROTEINa found distributed localized proteins PROTEINb characteristic PROTEINd
      • ORGANISM PROTEINa found distributed localized proteins PROTEINb
      • ORGANISM PROTEINa found distributed localized proteins PROTEINb PROTEINc
      • LOCATIONa between distributed found PROTEINa
      • LOCATIONa of characteristic PROTEINd
      • LOCATIONa of characteristic PROTEINb
      • LOCATIONa of characteristic PROTEINb PROTEINc
      • LOCATIONb fractions LOCATIONa between distributed found PROTEINa
      • LOCATIONb fractions LOCATIONa between distributed localized proteins PROTEINb characteristic PROTEINd
      • LOCATIONb fractions LOCATIONa between distributed localized proteins PROTEINb
      • LOCATIONb fractions LOCATIONa between distributed localized proteins PROTEINb PROTEINc
    • The problem may be simplified by inducing a model that groups entities, e.g. "ExbB and ExbD, as well as the NADH oxidase" into a single group.
    • This sentence is an example where discourse analysis helps the nearest neighbor algorithm. In the sentence that follows, for example, the PL(PROTEINa,LOCATIONa) relation is simpler to infer. The sentence starts "Neither the N-terminus of TonB nor the cytoplasmic membrane pmf...", from which we can see the simpler pattern [LOCATIONa pmf PROTEINa] emerge.
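The count of invalid permutations can be checked by enumeration. The sketch below assumes the count refers to protein/location pairs; the valid PL() set is copied from the relations listed in the notes.

```python
from itertools import product

PROTEINS = ["PROTEINa", "PROTEINb", "PROTEINc", "PROTEINd"]
LOCATIONS = ["LOCATIONa", "LOCATIONb"]

# PL() relations listed in the notes for passage 491.a.2.
VALID_PL = {
    ("PROTEINa", "LOCATIONa"), ("PROTEINa", "LOCATIONb"),
    ("PROTEINb", "LOCATIONa"),
    ("PROTEINc", "LOCATIONa"),
    ("PROTEINd", "LOCATIONa"),
}

# Every protein/location pairing the sentence makes available.
all_pl = set(product(PROTEINS, LOCATIONS))
invalid_pl = sorted(all_pl - VALID_PL)

print(len(all_pl), len(invalid_pl))
print(invalid_pl)
```

Under this reading the invalid pairs are exactly those that place ExbB, ExbD or the NADH oxidase activity in the outer membrane.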

Several ternary relations (L/O/P order)

  • Reference: PPLRE Corpus 560.a.8
    • "Purified [LOCATION secreted] proteases of [ORGANISM P. aeruginosa] (i.e. [PROTEINa elastase], the [PROTEINb lysine-specific protease], and [PROTEINc alkaline proteinase]) converted [PROTEINd proLasA] to the active enzyme."
  • Notes:
    • Relations
      • Three of the four possible permutations.
    • In surface-based Nearest-Neighbor classification the use of multi-sentence analysis helps because the distance between the location and the protein is much closer in other sentences; close enough to be within the optimal threshold.
    • Candidate DG Classifiers
      • ORGANISM of proteases PROTEINa PROTEINc
      • ORGANISM of proteases converted PROTEINd
      • ORGANISM of proteases PROTEINa PROTEINb
      • ORGANISM of proteases PROTEINa
      • LOCATION proteases PROTEINa PROTEINc
      • LOCATION proteases converted PROTEINd
      • LOCATION proteases PROTEINa PROTEINb
      • LOCATION proteases PROTEINa

Several ternary relations; Compound Entity Mention

  • Reference: PPLRE Corpus 8767.a.2
    • The [PROTEIN frcBCA] genes encode the characteristic components of an ATP-binding cassette transporter ([PROTEIN FrcB], a [LOCATION periplasmic] substrate binding protein, [PROTEIN FrcC], an integral [LOCATION membrane] permease, and [PROTEIN FrcA], an ATP-binding [LOCATION cytoplasmic] protein), which is the unique high-affinity (Km of 6 M) fructose uptake system in [ORGANISM S. meliloti].

Inappropriate passage for support

  • Reference: PPLRE Corpus 4371.a.0
    • The [PROTEIN phospholipase C] (PLC) gene of [ORGANISM Pseudomonas aeruginosa] encodes a heat-labile [LOCATION secreted] hemolysin which is part of a Pi-regulated operon.
  • Notes:
    • This sentence contains the appropriate three entities, and they do in fact form a true relation; however, according to the experts, this sentence does not support the ternary relation.
    • The sentence does support the OP(P. aeruginosa, phospholipase C) binary relation.
    • Patterns:
      • ORGANISM of gene PROTEIN
      • LOCATION hemolysin encodes gene PROTEIN

No ternary relation

  • Reference: PPLRE Corpus 2461.a.3
    • Using the two PA14-isogenic mutants [PROTEIN plcS] and [PROTEIN dsbA], we show that [INVERTEBRATE Drosophila] loss-of-function mutants of [PROTEIN Spatzle], the [LOCATION extracellular] ligand of [PROTEIN Toll], and [PROTEIN Dorsal] and [PROTEIN Dif], two NF-κB-like transcription factors, allow increased [ORGANISM P. aeruginosa] infectivity within fly tissues.
  • Notes:
    • This sentence is an example where the context is wrong. The experiment is about a bacterium infecting an invertebrate (Drosophila, i.e. the fruit fly).
    • It is unclear how to remedy this. The presence of an "Invertebrate" entity may be a useful cue.
    • This example can lead to many false positives.
    • Pattern:
      • ORGANISM infectivity allow PROTEIN
    • The word "infectivity" may be associated with false positives

TWO ADJACENT SENTENCES (ternary relation)


One ternary relation

  • Reference: PPLRE Corpus 357.a.0-1
    • A 2.5 kb DNA fragment containing a gene encoding a [PROTEINa phospho-alpha-(1-1)-glucosidase] ([PROTEINb phosphotrehalase]), designated [PROTEIN treA], was isolated from a [ORGANISM Bacillus subtilis] chromosomal library by complementation of the tre-12 mutation.
    • The major [PROTEIN TreA] activity was found in the [LOCATION cytoplasm].


  • Notes:
    • Two relevant patterns. The ORG/PROT one is long.
      • PROTEIN <= encoding <= gene <= isolated ⇒ from ⇒ library ⇒ ORGANISM
      • PROTEINa <= activity <= found ⇒ in ⇒ LOCATION
    • There are three protein name synonyms in the first sentence. However, the mentions treA in the first sentence and TreA in the second sentence can be linked through simple coreference resolution.
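A minimal sketch of the simple coreference step for this passage, assuming that matching names while ignoring case is enough to link the gene-style and protein-style spellings. `group_mentions` is a hypothetical helper; the mention list is transcribed from the two sentences above.

```python
from collections import defaultdict

def group_mentions(mentions):
    """Group (sentence_index, surface_form) mentions whose names match ignoring case."""
    groups = defaultdict(list)
    for sent_idx, name in mentions:
        groups[name.casefold()].append((sent_idx, name))
    return dict(groups)

# Protein mentions from passage 357.a.0-1: (sentence index, surface form).
mentions = [
    (0, "phospho-alpha-(1-1)-glucosidase"),
    (0, "phosphotrehalase"),
    (0, "treA"),
    (1, "TreA"),
]

groups = group_mentions(mentions)
print(groups["trea"])
```

Case folding links the two spellings across the sentences. It does not merge the other two synonyms from the first sentence, which would need an additional cue such as the parenthetical and "designated" constructions or a synonym lexicon.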



One Relation; Lone Location Entity

  • Reference: PPLRE Corpus 200.a.5-6
    • "The major protein band, of lower M(r), was detected in the [LOCATION periplasmic fraction]."
    • "It had the same M(r) as the [PROTEINa PS1] protein band detected in the supernatant of [ORGANISM C. glutamicum] cultures and presumably corresponds to the mature form of [PROTEINa PS1]."
  • Notes:
    • Two relevant patterns on the two sentences. Notice that the concept band is a candidate connector.
      • PROTEIN <= band ⇒ detected ⇒ in ⇒ supernatant ⇒ of ⇒ cultures ⇒ ORGANISM
      • band <= detected ⇒ in ⇒ LOCATION
    • The permutation arising from the two protein mentions with identical names can be resolved via simple Co-reference Resolution.

One Relation; Lone Location Entity

  • Reference: PPLRE Corpus 1160.a.0
    • "Secretins, a superfamily of multimeric [LOCATION outer membrane] proteins, mediate the transport of large macromolecules across the [LOCATION outer membrane] of Gram-negative bacteria."
    • "Limited proteolysis of [PROTEIN secretin PulD] from the [ORGANISM Klebsiella oxytoca] pullulanase secretion pathway showed that it consists of an N-terminal domain and a protease-resistant C-terminal domain that remains multimeric after proteolysis."
  • Notes:
    • A simple pattern can discover the ORG/PROT relation.
      • PROTEIN ⇒ from ⇒ ORGANISM
    • The location however stands alone in the earlier sentence.

Three Relations; Lone Organism Entity

  • Reference: PPLRE Corpus 30.a.0-1
    • "A lysozyme-osmotic shock method is described for fractionation of [ORGANISM Alcaligenes faecalis] which uses glucose to adjust osmotic strength and multiple osmotic shocks."
    • "During phenylethylamine-dependent growth, aromatic [PROTEINa amine dehydrogenase], [PROTEINb azurin], and a single [PROTEINc cytochrome c] were localized in the [LOCATION periplasm]."
  • Notes:
    • Contains three relations
      • OPL(A. faecalis, amine dehydrogenase, periplasm)
      • OPL(A. faecalis, azurin, periplasm)
      • OPL(A. faecalis, cytochrome c, periplasm).
    • A dependency grammar pattern can be used to identify the PROTEIN/LOCATION relation.
      • PROTEINc ^ PROTEINb <= PROTEINa <= localizedin ⇒ LOCATION
    • No shallow semantic connection appears to exist between the two sentences and so the ORGANISM stands alone.
    • A Nearest Entity approach could work.
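The Nearest Entity idea can be sketched as follows, assuming sentence distance is the only signal and that ties go to the earlier sentence. `nearest_organism` is a hypothetical helper and the entity lists are transcribed from passage 30.a.0-1.

```python
def nearest_organism(sentences, target_idx):
    """Return the organism mention closest (in sentences) to target_idx, or None.

    `sentences` is a list of dicts mapping an entity type to a list of names.
    Ties are broken in favour of the earlier sentence.
    """
    best = None
    for idx, entities in enumerate(sentences):
        for name in entities.get("ORGANISM", []):
            distance = abs(idx - target_idx)
            if best is None or distance < best[0]:
                best = (distance, name)
    return best[1] if best else None

PASSAGE = [
    {"ORGANISM": ["Alcaligenes faecalis"]},
    {"PROTEIN": ["amine dehydrogenase", "azurin", "cytochrome c"],
     "LOCATION": ["periplasm"]},
]

# Attach the nearest organism to every protein/location pair in sentence 1.
organism = nearest_organism(PASSAGE, target_idx=1)
triples = [(organism, protein, location)
           for protein in PASSAGE[1]["PROTEIN"]
           for location in PASSAGE[1]["LOCATION"]]
for triple in triples:
    print(triple)
```

Under these assumptions the lone organism is attached to all three protein/location pairs, matching the three relations listed in the notes. A passage with several organisms would need a better tie-breaking rule than sentence distance alone.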

MULTIPLE SENTENCES (ternary relation)


One Relation; Lone Organism Entity; Weak-coreference

  • Reference: PPLRE Corpus 7181.a.t,1-2,5-6
    • "Phthiocerol dimycocerosate (PDIM), a surface-exposed polyketide lipid necessary for [ORGANISM Mycobacterium tuberculosis] virulence, is the product of several polyketide synthases including PpsE."
    • "Transport of [LIPID PDIM] requires [PROTEIN MmpL7], a member of the MmpL family of RND permeases."
    • ...
    • ...
    • ...
    • "This suggests that [PROTEIN MmpL7] acts in complex with the synthesis machinery to efficiently transport PDIM across the [LOCATION cell membrane]."
    • "Coordination of synthesis and transport may not only be a feature of MmpL-mediated transport in [ORGANISM M. tuberculosis], but may also represent a general mechanism of polyketide export in many different microorganisms."
  • Notes:

Two Ternary Relations; Lone Organism Entity

  • Reference: PPLRE Corpus 8611.a.0-1
    • "Plant signal molecules such as acetosyringone and certain monosaccharides induce the expression of [ORGANISM Agrobacterium tumefaciens] virulence (vir) genes, which are required for the processing, transfer, and possibly integration of a piece of the bacterial plasmid DNA (T-DNA) into the plant genome."
    • "Two of the vir genes, [PROTEINa virA] and [PROTEINb virG], belonging to the bacterial two-component regulatory system family, control the induction of vir genes by plant signals."
    • "[PROTEINa virA] encodes a [LOCATIONa membrane-bound] sensor kinase protein and [PROTEINb virG] encodes a [LOCATIONb cytoplasmic] regulator protein."
  • Notes:
    • Also illustrates the case where the organism is infrequently mentioned.
    • The organism and one of the proteins are mentioned in the title.
    • The relations are:
      • OPL(A. tumefaciens, virA, membrane)
      • OPL(A. tumefaciens, virG, cytoplasm).
    • The organism and the protein, however, are never mentioned in the same sentence.
    • The organism name is not even in a sentence that neighbors the sentence with the PL() relation.




Unclassified

  1. The location is very far from the organism. The protein has redundant entries hemolysin/shlA

PROTEIN_NAME=hemolysin ORGANISM_NAME=S. marcescens ORGANISM_ID=615 LOCATION_NAME=outer membrane LOCATION_ID=GO0009279 TUPLE_ID=152 PSID=4555 Log-phase cells of Serratia marcescens cultured at 30 degrees C were approximately 10-fold more hemolytic than those grown at 37 degrees C. By using a cloned gene fusion of the promoter-proximal part of the hemolysin gene (shlA) to the Escherichia coli alkaline phosphatase gene (phoA), hemolysin gene expression as a function of alkaline phosphatase activity was measured at 30 and 37 degrees C. No difference in alkaline phosphatase activity was observed as a function of growth temperature, although more hemolysin was detectable immunologically in whole-cell extracts of cells grown at 30 degrees C. The influence of temperature was, however, growth phase dependent, because the hemolytic activities of cells cultured to early log phase at 30 and 37 degrees C were comparable. Given the outer membrane location of the hemolysin, lipopolysaccharide (LPS) was examined as a candidate for mediating the temperature effect on hemolytic activity.

  1. Example of a sentence with redundant localization entities results in two redundant relation cases.

PROTEIN_NAME=Filamentous hemagglutinin ORGANISM_NAME=B. pertussis ORGANISM_ID=520 LOCATION_NAME=cell surface LOCATION_ID=GO0005618 LOCATION_NAME=secreted LOCATION_ID=GO0005576 LOCATION_NAME=extracellular LOCATION_ID=GO0005576 TUPLE_ID=304 PSID=8398 Filamentous hemagglutinin (FHA) is the major adhesin of B. pertussis. It is a protein of approximately 220 kDa, found both associated at the bacterial cell surface and secreted into the extracellular milieu.

  1. Example

FourSentencePassage:1

 PSID=17422    PMID=12912902
 OrganismId=[562]      NCBIOrganismName=Escherichia coli
 LocationId=[GO0005737]        PPLRELocationName=cytoplasm
 ProteinId=[P69910]    SprotProteinName=Glutamate decarboxylase beta
  ProteinNamesCount=2 ProteinNamesInPassage=glutamate decarboxylase    GadB

1:In Escherichia coli, expression of glutamate decarboxylase (GadB ), a 330kDa hexamer, is induced to maintain the physiological pH under acidic conditions, like those of the passage through the stomach en route to the intestine . 2:GadB, together with the antiporter GadC, constitutes the gad acid resistance system, which confers the ability for bacterial survival for at least 2h in a strongly acidic environment . 3:GadB undergoes a pH-dependent conformational change and exhibits an activity optimum at low pH. We determined the crystal structures of GadB at acidic and neutral pH. They reveal the molecular details of the conformational change and the structural basis for the acidic pH optimum . 4:We demonstrate that the enzyme is localized exclusively in the cytoplasm at neutral pH, but is recruited to the membrane when the pH falls . DEBUG: Location[GO0005886, cytoplasmic membrane]

  1. Example

OneSentencePassage:2

 PSID=13846    PMID=11134504
 OrganismId=[319]      NCBIOrganismName=Pseudomonas syringae pv. phaseolicola
 LocationId=[GO0005576]        PPLRELocationName=extracellular
 ProteinId=[Q9F0B0]    SprotProteinName=Harpin hrpZ
  ProteinNamesCount=1 ProteinNamesInPassage=hrpZ

13846 319 Pseudomonas syringae pv. phaseolicola;P. syringae pv. phaseolicola GO0005576 secreted Q9F0B0 hrpZ <sent=0>Here, we show that the hrpZ gene product from the bean halo-blight pathogen, Pseudomonas syringae pv. phaseolicola (HrpZPsph ), is secreted in an hrp-dependent manner in P. syringae pv. phaseolicola and exported by the type III secretion system in the mammalian pathogen Yersinia enterocolitica .</sent>


  1. Example for a "Yeast" (non Prokaryote) organism

PSID=14741.5 (PUBMED 8643535) Using a specific antibody, we demonstrated that Smf1p is located in the yeast plasma membrane.

This example comes from Mark Craven's dataset http://www.biostat.wisc.edu/~craven/ie/
5 NP_SEGMENT:PROTEIN smf1p{UNK:PROTEIN}
8 NP_SEGMENT:LOCATION the{ART} yeast{N} plasma_membrane{N:LOCATION}

  1. Of a document with two passages containing the semantic relation

<sent=0>In Neisseria meningitidis, translocation of capsular polysaccharides to the cell surface is mediated by a transport system that fits the characteristics of ABC (ATP-binding cassette ) transporters .</sent> <sent=1>One protein of this transport system, termed CtrA, is located in the outer membrane .</sent> <sent=2>By use of a CtrA-specific monoclonal antibody, we could demonstrate that CtrA occurs exclusively in N. meningitidis and not in other pathogenic or nonpathogenic Neisseria species .</sent> <sent=3>Nucleotide sequence comparison of the ctrA gene from different meningococcal serogroups indicated that CtrA is strongly conserved in all meningococcal serogroups, independent of the chemical composition of the capsular polysaccharide .</sent> <sent=4>Secondary structure analysis revealed that CtrA is anchored in the outer membrane by eight membrane-spanning amphipathic beta strands, a structure of proteins that function as porins .</sent>


  • Example of a two sentence ternary (O/LP)

1482 ecfE 8412 1 Escherichia coli 8412 0 inner membrane 8412 1 1 0.0257 3 P0AEH1 562 Gram-negative GO0005886 <sent=0>We have identified a new protease in Escherichia coli, which is required for its viability under normal growth conditions .</sent> <sent=1>This protease is anchored in the inner membrane and the gene encoding it has been named ecfE, since it is transcribed by EE polymerase .</sent> already in ePSORTdb, add this PMID find


  • Example of two proteins with the same name
    • note different localizations

1474 alkaline phosphatase P00634 18476 0 Escherichia coli 18476 2 periplasm 18476 0 1 0.9691 98 562 Gram-negative GO0030288 <sent=0>We isolated a collection of mutants defective in the export of alkaline phosphatase to the periplasm .</sent> <sent=1>Two classes of mutants were obtained : one class with lesions unlinked to the phoA gene and a second class harboring linked mutations .</sent> <sent=2>Among the former class, one mutant is cold sensitive for growth and may be defective in a component of the Escherichia coli secretory apparatus .</sent>

60 alkaline phosphatase Q05205 3704 0 Lysobacter enzymogenes 3704 0 secreted 3704 0 1 0.4668 50 69 GO0005576 Lysobacter enzymogenes produces an alkaline phosphatase which is secreted into the medium.