This page provides an overview of many of the external resources that we plan to use for the PPLRE Project.

PSORTdb

See: PPLRE ePSORTdbOrganismProteinLocalization Table
See: PPLRE cPSORTdbOrganismProteinLocalization Table

PubMed

See: PubMed
See: PubMed Central

Stanford Parser

See: Stanford Parser
See: PPLRE Stanford Parser

NCBI

See: NCBI.

It is the source of our OrganismID (see the “NCBIOrganismName and NCBIOrganismTreeNodes Tables” section of the detailed data design).

Swiss-Prot

See: Swiss-Prot.
It is the data source for the SProtProteinProkaryote table. (see section 5.1.16)

TrEMBL (Translation of EMBL)

“TrEMBL is automatically generated (from annotated EMBL coding sequences (CDS)) and annotated using software tools. Contains all of what is not in SWISS-PROT. SWISS-PROT + TrEMBL = all known protein sequences. Once in SWISS-PROT, the entry is no more in TrEMBL, but still in EMBL (archive).”

UniProtKB (Universal Protein Knowledge Base)

“UniProtKB is the central hub for the collection of functional information on proteins, with accurate, consistent, and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (principally, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data. Created by merging the data in Swiss-Prot, TrEMBL and PIR-PSD, individual UniProt Knowledgebase entries may contain more information than was available in any given separate source database. The UniProt Knowledgebase consists of two sections: a section containing manually-annotated records with information extracted from literature and curator-evaluated computational analysis, and a section with computationally analyzed records that await full manual annotation. For the sake of continuity and name recognition, the two sections are referred to as ‘Swiss-Prot’ and ‘TrEMBL’, respectively.”

UniProt Knowledgebase Release 7.0

The UniProt consortium, European Bioinformatics Institute (EBI), Swiss Institute of Bioinformatics (SIB) and Protein Information Resource (PIR), is pleased to announce UniProt Knowledgebase (UniProtKB) Release 7.0 (07-Feb-2006). UniProt (Universal Protein Resource) is a comprehensive catalog of information on proteins. UniProtKB Release 7.0 consists of 2,812,716 entries (UniProtKB/Swiss-Prot: 207,132 entries and UniProtKB/TrEMBL: 2,605,584 entries).

UniProt databases can be accessed from the web at <a href="http://www.uniprot.org/">http://www.uniprot.org</a> and downloaded from <a href="http://www.uniprot.org/database/download.shtml">http://www.uniprot.org/database/download.shtml</a>.

Detailed release statistics for the TrEMBL and Swiss-Prot sections of the UniProt Knowledgebase can be viewed at <a href="http://www.ebi.ac.uk/swissprot/sptr_stats/index.html">http://www.ebi.ac.uk/swissprot/sptr_stats/index.html</a> and <a href="http://www.expasy.org/sprot/relnotes/relstat.html">http://www.expasy.org/sprot/relnotes/relstat.html</a> respectively.

ExPASy (Expert Protein Analysis System)

ExPASy interface. The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein sequences and structures as well as 2-D PAGE.

GUI Examples

Data

Complete non-redundant sets of complete proteomes in UniProtKB.
ftp://ca.expasy.org/README
ftp://ca.expasy.org/databases/complete_proteomes/entries/bacteria/PSEAE.dat (PSEAE appears to be a short form of PSEudomonas AEruginosa)

UMLS (Unified Medical Language System)

The PPLRE project plans to use UMLS to assist with linguistic concept identification and Named Entity Recognition (NER). Specifically, the Annotator’s Conceptualizer module will make use of the Metathesaurus, the Semantic Network and the MMTx tool. UMLS may also become a source for organism instances.
It is a very large ontology that covers more than 100 source vocabularies, including GO, NCBI Taxonomy, MeSH, HUGO, etc.
It provides several linguistics-oriented tools; one of them, MMTx, performs named-entity annotation and is being used as a pre-processing step for our NER module.
Website: http://www.nlm.nih.gov/research/umls/
“The purpose of the National Library of Medicine's (NLM’s) UMLS® is to facilitate the development of computer systems that behave as if they "understand" the meaning of the language of biomedicine and health. To that end, NLM produces and distributes the UMLS Knowledge Sources (databases) and associated software tools (programs) for use by system developers in building or enhancing electronic information systems that create, process, retrieve, integrate, and/or aggregate biomedical and health data and information, as well as in informatics research.” http://www.nlm.nih.gov/research/umls/about_umls.html.
UMLS consists of three components:

1.Metathesaurus:

A large multi-lingual vocabulary database that includes biomedical and health related concepts, their various terms and relationships among them. Includes more than 100 vocabulary sources, such as: MeSH, GO and SNOMED CT.

“The UMLS Metathesaurus is a very large, multi-purpose, and multi-lingual vocabulary database that contains information about biomedical and health related concepts, their various names, and the relationships among them. Designed for use by system developers, the Metathesaurus is built from the electronic versions of many different thesauri, classifications, code sets, and lists of controlled terms used in patient care, health services billing, public health statistics, indexing and cataloging biomedical literature, and/or basic, clinical, and health services research.” <a href="http://www.nlm.nih.gov/pubs/factsheets/umlsmeta.html">http://www.nlm.nih.gov/pubs/factsheets/umlsmeta.html</a>

2.Semantic Network:

An ontology of concepts and their relationships.
“The Semantic Network consists of (1) a set of broad subject categories, or Semantic Types, that provide a consistent categorization of all concepts represented in the UMLS Metathesaurus®, and (2) a set of useful and important relationships, or Semantic Relations, that exist between Semantic Types. This section of the documentation provides an overview of the Semantic Network, and describes the files of the Semantic Network. Sample records illustrate structure and content of these files.” <a href="http://www.nlm.nih.gov/pubs/factsheets/umlssemn.html">http://www.nlm.nih.gov/pubs/factsheets/umlssemn.html</a>

3.SPECIALIST:

Lexical information of names
“The SPECIALIST lexicon has been developed to provide the lexical information needed for the SPECIALIST Natural Language Processing System (NLP). It is intended to be a general English lexicon that includes many biomedical terms. Coverage includes both commonly occurring English words and biomedical vocabulary. The lexicon entry for each word or term records the syntactic, morphological, and orthographic information needed by the SPECIALIST NLP System.” <a href="http://www.nlm.nih.gov/pubs/factsheets/umlslex.html">http://www.nlm.nih.gov/pubs/factsheets/umlslex.html</a>

Some

Organism names: UMLS has 383,064 organism names in its Metathesaurus. This

Protein names: 330,192 names under semantic type "Amino Acid, Peptide or

Prokaryote-protein relations: 40,263 pairs, most are co-occurrence

UMLS has a web-based query interface: <a href="http://umlsks.nlm.nih.gov/">http://umlsks.nlm.nih.gov/</a>

MetaMap Transfer (MMTx): <a href="http://mmtx.nlm.nih.gov/">http://mmtx.nlm.nih.gov/</a>

Protein/Gene Named Entity Recognition, NLP Research

·There is a significant amount of recent research on correctly identifying genes/proteins within natural language text (see sample listing of papers below). Unfortunately, most research papers do not appear to be accompanied by openly available programs. So, instead of developing these research solutions from scratch, we plan to stick with freely available, executable programs (see GENIA above).

Entity Types

1.GENIA Ontology

<a href="http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/topics/Corpus/genia-ontology.html">http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/topics/Corpus/genia-ontology.html</a>

PROTEIN
domain or region of DNA
CELL_COMPONENT
Tasks/Datasets

1.Bio-Entity Recognition Task at BioNLP/NLPBA 2004

<a href="http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/ERtask/report.html">http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/ERtask/report.html</a>
Held from March to April 2004
Demos:

1.<a href="http://nlp.i2r.a-star.edu.sg/demo_bioner.html">http://nlp.i2r.a-star.edu.sg/demo_bioner.html</a>

Papers:
Contextual weighting for Support Vector Machines in literature mining: an application to gene versus protein name disambiguation. T. Pahikkala, et al. <a href="http://www.biomedcentral.com/1471-2105/6/157">http://www.biomedcentral.com/1471-2105/6/157</a>

Recognition of protein/gene names from text using an ensemble of classifiers. G. Zhou, et al. <a href="http://www.biomedcentral.com/1471-2105/6/S1/S7">http://www.biomedcentral.com/1471-2105/6/S1/S7</a>

Exploring the boundaries: gene and protein identification in biomedical text. J. Finkel, et al. <a href="http://www.biomedcentral.com/1471-2105/6/S1/S5">http://www.biomedcentral.com/1471-2105/6/S1/S5</a>

A simple approach for protein name identification: prospects and limits. K. Fundel, et al. <a href="http://www.biomedcentral.com/1471-2105/6/S1/S15">http://www.biomedcentral.com/1471-2105/6/S1/S15</a>

ProMiner: rule-based protein and gene entity recognition. D. Hanisch, et al. <a href="http://www.biomedcentral.com/1471-2105/6/S1/S14">http://www.biomedcentral.com/1471-2105/6/S1/S14</a>

Gene/protein name recognition based on support vector machine using dictionary as features. T. Mitsumori, et al. <a href="http://www.biomedcentral.com/1471-2105/6/S1/S8">http://www.biomedcentral.com/1471-2105/6/S1/S8</a>

Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts. A. Cohen, et al. <a href="http://www.biomedcentral.com/1471-2105/6/103">http://www.biomedcentral.com/1471-2105/6/103</a>


Snowball

See: PPLRE Snowball

GenBank

See: GenBank.

One challenge with GenBank is that it contains many redundant entries and unconfirmed sequences. That said, GenBank IDs are used more often than TrEMBL IDs. The non-redundant curated set of IDs can be found within folders in /home/shared/NCBI_Genomes/curated/Bacteria. The .faa files contain the IDs and protein names/descriptions (along with protein sequences, which can be ignored).
The other set of GI numbers (with redundancies) that people sometimes use in recent literature can be found at ftp://ftp.ncbi.nih.gov/genbank/. It is unclear which files are the most useful for this project, though. The livelists folder contains a list of GIs and accession numbers for all the entries in GenBank. There are also the gbbct1.seq.gz to gbbct13.seq.gz files, which contain too much information (full GenBank flatfiles; it is unclear whether these are just DNA or DNA plus proteins). Apparently there are supposed to be index files that contain less information (just the accession and GI IDs), but according to the release notes, they had trouble generating them for this release (152).
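
As a rough illustration of how the IDs and protein descriptions could be pulled out of a .faa file while ignoring the sequences, here is a minimal Python sketch. It assumes each header line looks like `>gi|<gi-number>|<db>|<accession>| <description>`; that layout, the function name `read_faa_headers` and the sample records are assumptions made for illustration, not taken from the actual files.

```python
import re

# Assumed header shape: >gi|<gi-number>|<db>|<accession>| <description>
HEADER_RE = re.compile(r"^>gi\|(\d+)\|(\w+)\|([^|]+)\|\s*(.*)$")


def read_faa_headers(lines):
    """Yield (gi, db, accession, description) per header; skip sequence lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith(">"):
            continue  # sequence data is not needed for ID extraction
        match = HEADER_RE.match(line)
        if match:
            yield match.groups()


# Made-up records, only to exercise the function.
sample = [
    ">gi|00000001|ref|NP_000001.1| hypothetical protein [Example organism]",
    "MKTAYIAKQR",
    ">gi|00000002|ref|NP_000002.1| example transporter [Example organism]",
    "MSLLNVPAGK",
]
records = list(read_faa_headers(sample))
```

In practice `lines` would be an open file handle for one of the .faa files; headers that do not match the assumed pattern are silently skipped, which may or may not be the desired behaviour.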

Bio-Acronym Databases

Acronyms are regularly used in biomedical articles. The following datasets may help us resolve the meaning of the abbreviations that we encounter in our task.

ARGH (<a href="http://invention.swmed.edu/argh/">invention.swmed.edu/argh</a>): about 221,000 unique acronyms. Zhongmin has the entire database (attributes: acronym, full form, accuracy, context, etc.)

Acromed (<a href="http://medstract.med.tufts.edu/acro1.1">medstract.med.tufts.edu/acro1.1</a>): 481,531 acronyms. Zhongmin has the database (similar attributes as ARGH)

Stanford Abbr. (<a href="http://abbreviation.stanford.edu/">abbreviation.stanford.edu</a>): 2,074,367 abbreviations, program accessible. An example of searching "CPR":

Ind	Abbr.	Long Form	Quality (Score)	#Docs
1	CPR	Cardio-Pulmonary Resuscitation	Excellent (0.91)	1,154
2	CPR	Computer Based Patient Records	Excellent (0.59)	65
3	CPR	C peptide immunoreactivity	Good (0.33)	52
4	CPR	Cefpirome	Good (0.34)	32
5	CPR	C-Peptide	Good (0.13)	29
6	CPR	Computerised Patient Record	Excellent (0.91)	18
7	CPR	chicken progesterone receptor	Excellent (0.91)	14
8	CPR	NADPH--cytochrome P450 reductase	Excellent (0.91)	13
9	CPR	C-peptide reactivity	Excellent (0.86)	10
10	CPR	Cefpirome sulfate	Good (0.07)	10
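
To show how such a result table might be used to shortlist expansions for an abbreviation, here is a small Python sketch. It assumes the rows are tab-separated exactly as in the table above; the three rows are copied from that table, while the function names and the `min_score` cut-off of 0.5 are hypothetical choices, not part of the Stanford service.

```python
import re

# Three rows copied from the CPR example (tab-separated).
ROWS = """\
1\tCPR\tCardio-Pulmonary Resuscitation\tExcellent (0.91)\t1,154
3\tCPR\tC peptide immunoreactivity\tGood (0.33)\t52
8\tCPR\tNADPH--cytochrome P450 reductase\tExcellent (0.91)\t13
"""


def parse_row(row):
    """Turn one tab-separated result row into a dictionary."""
    _, abbr, long_form, quality, docs = row.split("\t")
    label, score = re.match(r"(\w+) \(([\d.]+)\)", quality).groups()
    return {
        "abbr": abbr,
        "long_form": long_form,
        "quality": label,
        "score": float(score),
        "docs": int(docs.replace(",", "")),
    }


def candidates(rows, abbr, min_score=0.5):
    """Return expansions of `abbr` above a score threshold, best first."""
    hits = [parse_row(r) for r in rows.splitlines() if r.strip()]
    hits = [h for h in hits if h["abbr"] == abbr and h["score"] >= min_score]
    # Sort by score, then by number of supporting documents.
    return sorted(hits, key=lambda h: (-h["score"], -h["docs"]))


shortlist = [h["long_form"] for h in candidates(ROWS, "CPR")]
```

Ranking by score and document count alone would favour the medical sense of "CPR"; choosing the protein sense would still need context from the surrounding text.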

FASTA File Format

·<a href="http://www.ncbi.nlm.nih.gov/BLAST/fasta.shtml">http://www.ncbi.nlm.nih.gov/BLAST/fasta.shtml</a>

·“A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters in length. An example sequence in FASTA format is:”

>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
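
To make the format concrete, the following Python sketch reads records laid out as described in the quote above: a ">" description line followed by sequence lines. The function name `read_fasta` is our own, and the embedded example is abridged to the first and last sequence rows of the record shown above.

```python
def read_fasta(lines):
    """Yield (description, sequence) pairs from an iterable of FASTA lines."""
    description, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between sequence rows
        if line.startswith(">"):
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line[1:], []
        elif description is not None:
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


# Abridged copy of the example record (first and last sequence rows only).
example = """\
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT

LAAVEAQQQMLKLTIWGVK
"""
fasta_records = list(read_fasta(example.splitlines()))
```

The reader joins all sequence rows of a record into one string, so the line length of the input file does not matter.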

From <a href="http://en.wikipedia.org/wiki/FASTA_format">http://en.wikipedia.org/wiki/FASTA_format</a>: “The FASTA defline format is not formally defined, but generally uses the following abbreviations”:

Database	Defline format
GenBank	gi|gi-number|gb|accession|locus
EMBL Data Library	gi|gi-number|emb|accession|locus
DDBJ, DNA Database of Japan	gi|gi-number|dbj|accession|locus
NBRF PIR	pir||entry
Protein Research Foundation	prf||name
SWISS-PROT	sp|accession|name
Brookhaven Protein Data Bank (1)	pdb|entry|chain
Brookhaven Protein Data Bank (2)	entry:chain|PDBID|CHAIN|SEQUENCE
Patents	pat|country|number
GenInfo Backbone Id	bbs|number
General database identifier	gnl|database|identifier
NCBI Reference Sequence	ref|accession|locus
Local Sequence identifier	lcl|identifier
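
The defline conventions listed above can be turned into a small parser. The Python sketch below is one possible reading of that list: it records how many "|"-separated fields follow each database tag and splits a description line accordingly. The names `FIELD_COUNTS` and `parse_defline` are our own, and since the defline format is not formally defined, real files may deviate from these assumptions.

```python
# Number of "|"-separated fields that follow each tag, per the list above.
FIELD_COUNTS = {
    "gi": 1, "gb": 2, "emb": 2, "dbj": 2, "pir": 2, "prf": 2, "sp": 2,
    "pdb": 2, "pat": 2, "bbs": 1, "gnl": 2, "ref": 2, "lcl": 1,
}


def parse_defline(defline):
    """Split a defline into ({tag: [fields]}, free-text description)."""
    text = defline.lstrip(">")
    head, _, tail = text.partition(" ")  # identifiers end at the first space
    tokens = head.split("|")
    ids, i = {}, 0
    while i < len(tokens) and tokens[i] in FIELD_COUNTS:
        count = FIELD_COUNTS[tokens[i]]
        ids[tokens[i]] = tokens[i + 1 : i + 1 + count]
        i += 1 + count
    leftover = "|".join(tokens[i:])  # anything not recognised stays as text
    description = " ".join(part for part in (leftover, tail) if part)
    return ids, description


ids, description = parse_defline(">gi|532319|pir|TVFV2E|TVFV2E envelope protein")
sp_ids, sp_description = parse_defline("sp|P01588|EPO_HUMAN Erythropoietin precursor")
```

The second call uses a SWISS-PROT style defline assembled from the EPO_HUMAN values shown later on this page; it is an illustrative string, not a line copied from a real file.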

Gene Ontology (GO) Cellular Component Ontology

“The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism. The three organizing principles of GO are molecular function, biological process and cellular component. … The cellular component ontology describes locations, at the levels of subcellular structures and macromolecular complexes. Examples of cellular components include nuclear inner membrane, with the synonym inner envelope, and the ubiquitin ligase complex, with several subtypes of these complexes represented. Generally, a gene product is located in or is a subcomponent of a particular cellular component. The cellular component ontology includes multi-subunit enzymes and other protein complexes, but not individual proteins or nucleic acids. Cellular component also does not include multicellular anatomical terms.”
“GO Accession ID” is one of the fields in ePSORTdb (see ePSORTdb)
<a href="http://www.geneontology.org/GO.component.guidelines.shtml">http://www.geneontology.org/GO.component.guidelines.shtml</a>
The Gene Ontology (GO) is one of the more popular ontology sources used by biologists. E.g. the cellular component ontology is used by the following biology ontologies: BioPax <owl:ObjectProperty rdf:about="#CELLULAR-LOCATION"> <a href="http://www.biopax.org/release/biopax-level2.owl">http://www.biopax.org/release/biopax-level2.owl</a>; and INOH (Integrating Network Objects with Hierarchies).
Example #1) extracellular region

<a href="http://www.godatabase.org/cgi-bin/amigo/go.cgi?view=details&query=GO:0005576">http://www.godatabase.org/cgi-bin/amigo/go.cgi?view=details&query=GO:0005576</a>

Accession: GO:0005576
Ontology: cellular_component
Synonyms:

1.exact: extracellular

Definition: The space external to the

Example #2) plasma membrane

<a href="http://www.godatabase.org/cgi-bin/amigo/go.cgi?view=details&query=GO:0005886">http://www.godatabase.org/cgi-bin/amigo/go.cgi?view=details&query=GO:0005886</a>

Accession: GO:0005886
Ontology: cellular_component
Synonyms:

1.related: plasma membrane cation-transporting ATPase

2.related: plasma membrane long-chain fatty acid transporter

3.narrow: bacterial inner membrane

4.exact: cell membrane

5.exact: cytoplasmic membrane

6.exact: plasmalemma

7.broad: juxtamembrane

Definition:

1. The membrane surrounding a cell that separates the cell from its external environment. It consists of a phospholipid bilayer and associated proteins.

Example #3) periplasmic space (sensu Proteobacteria)

<a href="http://www.godatabase.org/cgi-bin/amigo/go.cgi?view=details&query=GO:0030288">http://www.godatabase.org/cgi-bin/amigo/go.cgi?view=details&query=GO:0030288</a>

Accession: GO:0030288
Ontology: cellular_component
Synonyms:

exact: periplasmic space (sensu Gram-negative bacteria)
broad: periplasm
broad: periplasmic space

Definition: The region between the inner (cytoplasmic) membrane and outer membrane. As in, but not restricted to, the Gram-negative bacteria (Proteobacteria, ncbi_taxonomy_id:1224).

Example #4) cytoplasm

<a href="http://www.godatabase.org/cgi-bin/amigo/go.cgi?view=details&query=GO:0005737">http://www.godatabase.org/cgi-bin/amigo/go.cgi?view=details&query=GO:0005737</a>

Accession: GO:0005737
Ontology: cellular_component
Synonyms: None
Definition: All of

Example #5) cellular_component

<a href="http://www.godatabase.org/cgi-bin/amigo/go.cgi?view=details&query=GO:0005575">http://www.godatabase.org/cgi-bin/amigo/go.cgi?view=details&query=GO:0005575</a>

Accession: GO:0005575
Ontology: cellular_component
Synonyms: None
Definition: The part
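
The synonym scopes in these examples (exact, narrow, broad, related) matter when mapping a location mention in text to a GO accession. The Python sketch below builds a simple lookup from term names and synonyms. The accessions are taken from the AmiGO URLs in the examples above, the "related" synonyms of plasma membrane are left out for brevity, and the structure and the names `build_index` and `lookup` are illustrative assumptions rather than an existing API.

```python
# Terms copied from the examples above (related synonyms omitted).
GO_TERMS = {
    "GO:0005576": {"name": "extracellular region",
                   "synonyms": {"exact": ["extracellular"]}},
    "GO:0005886": {"name": "plasma membrane",
                   "synonyms": {"exact": ["cell membrane", "cytoplasmic membrane",
                                          "plasmalemma"],
                                "narrow": ["bacterial inner membrane"],
                                "broad": ["juxtamembrane"]}},
    "GO:0030288": {"name": "periplasmic space (sensu Proteobacteria)",
                   "synonyms": {"exact": ["periplasmic space (sensu Gram-negative bacteria)"],
                                "broad": ["periplasm", "periplasmic space"]}},
    "GO:0005737": {"name": "cytoplasm", "synonyms": {}},
}


def build_index(terms, scopes=("exact",)):
    """Map lower-cased names and synonyms of the given scopes to accessions."""
    index = {}
    for accession, term in terms.items():
        index.setdefault(term["name"].lower(), set()).add(accession)
        for scope in scopes:
            for synonym in term["synonyms"].get(scope, []):
                index.setdefault(synonym.lower(), set()).add(accession)
    return index


def lookup(index, mention):
    """Return the sorted accessions whose name or synonym equals the mention."""
    return sorted(index.get(mention.strip().lower(), ()))


strict_index = build_index(GO_TERMS)
loose_index = build_index(GO_TERMS, scopes=("exact", "broad"))
```

Restricting the index to exact synonyms keeps precision high; adding broad synonyms lets a mention such as "periplasm" resolve, at the cost of possibly less specific matches.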

NLM (National Library of Medicine)

·“The National Library of Medicine (NLM), on the campus of the National Institutes of Health (NIH) in Bethesda, Maryland, is the world's largest medical library. The Library collects materials in all areas of biomedicine and health care, as well as works on biomedical aspects of technology, the humanities, and the physical, life, and social sciences. The collections stand at more than 7 million items--books, journals, technical reports, manuscripts, microfilms, photographs and images.”

·Participates in: NCBI, MeSH, UMLS

·<a href="http://www.nlm.nih.gov/">http://www.nlm.nih.gov/</a>

PDB (Protein Data Bank)

“The Protein Data Bank (PDB) uses macromolecular Crystallographic Information File (mmCIF) data dictionaries to describe the information content of PDB entries. The RCSB PDB provides a variety of tools and resources for studying the structures of biological macromolecules and their relationships to sequence, function, and disease. The RCSB (Research Collaboratory for Structural Bioinformatics) is a member of the wwPDB whose mission is to ensure that the PDB archive remains an international resource with uniform data.”

<a href="http://www.pdb.org/">http://www.pdb.org</a>
<a href="ftp://ftp.rcsb.org/pub/pdb/">ftp://ftp.rcsb.org/pub/pdb/</a>
The Worldwide Protein Data Bank (wwPDB) consists of three member organizations that act as deposition, data processing and distribution centers for PDB data. The founding members are RCSB PDB (USA), MSD-EBI (Europe) and PDBj (Japan) [1]. The mission of the wwPDB is to maintain a single Protein Data Bank Archive of macromolecular structural data that is freely and publicly available to the global community.
[1] H. Berman, et al (2003): Announcing the worldwide Protein Data Bank. Nature Structural Biology 10 (12), p. 980. <a href="http://www.wwpdb.org/">http://www.wwpdb.org/</a>

SWISS-PROT Accession Format

Line code	Content	Occurrence in an entry
ID	Identification	One; starts the entry
AC	Accession number(s)	One or more
DT	Date	Three times
DE	Description	One or more
GN	Gene name(s)	Optional
OS	Organism species	One or more
OG	Organelle	Optional
OC	Organism classification	One or more
RN	Reference number	One or more
RP	Reference position	One or more
RC	Reference comment(s)	Optional
RX	Reference cross-reference(s)	Optional
RA	Reference authors	One or more
RT	Reference title	Optional
RL	Reference location	One or more
CC	Comments or notes	Optional
DR	Database cross-references	Optional
KW	Keywords	Optional
FT	Feature table data	Optional
SQ	Sequence header	One
(blanks)	Amino Acid Sequence	One
//	Termination line	One; ends the entry
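
As a sketch of how these line codes could be used when loading entries, the Python below groups the lines of one flat-file entry by their two-letter code and stops at the termination line. It assumes the usual flat-file layout in which the code occupies the first two columns and the data starts at column six. The sample entry is an abridged reconstruction built from the EPO_HUMAN values shown on this page, not a verbatim copy of the database record, and `parse_entry` is a name of our own.

```python
from collections import defaultdict

# Abridged, reconstructed sample (not the verbatim database entry).
SAMPLE_ENTRY = """\
ID   EPO_HUMAN
AC   P01588; Q549U2; Q9UDZ0; Q9UEZ5; Q9UHA0;
DE   Erythropoietin precursor (Epoetin).
GN   Name=EPO;
OS   Homo sapiens (Human).
CC   -!- SUBCELLULAR LOCATION: Secreted protein.
//
"""


def parse_entry(text):
    """Group the lines of one entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        code, content = line[:2], line[5:].strip()
        if code == "//":
            break  # termination line ends the entry
        if code.strip():  # sequence rows have a blank code and are skipped
            fields[code].append(content)
    return dict(fields)


entry = parse_entry(SAMPLE_ENTRY)
accessions = [a.strip() for a in " ".join(entry["AC"]).split(";") if a.strip()]
```

For the PPLRE tables, the AC, DE, GN, OS and CC lines are likely the most relevant; a fuller loader would also need to split a multi-entry file on the termination lines.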

General information about the UniProtKB/Swiss-Prot entry

<a href="http://www.expasy.org/sprot/userman.html#ID_line">Entry name</a>

EPO_HUMAN

<a href="http://www.expasy.org/sprot/userman.html#AC_line">Primary accession number</a>

P01588

<a href="http://www.expasy.org/sprot/userman.html#AC_line">Secondary accession numbers</a>

Q549U2 Q9UDZ0 Q9UEZ5 Q9UHA0

<a href="http://www.expasy.org/sprot/userman.html#DT_line">Integrated into UniProtKB/Swiss-Prot</a>

21-JUL-1986

<a href="http://www.expasy.org/sprot/userman.html#DT_line">Sequence was last modified </a>

21-JUL-1986, version 1

<a href="http://www.expasy.org/sprot/userman.html#DT_line">Entry was last modified </a>

21-MAR-2006, version 68

<a href="http://www.expasy.org/sprot/userman.html#DE_line">Protein description</a>

<a href="http://www.expasy.org/sprot/userman.html#DE_line">Protein name</a>

Erythropoietin precursor

Synonyms

Epoetin

Origin of the protein

Gene name

EPO

<a href="http://www.ebi.uniprot.org/entry/search.do?library=ST&linetype=os&linetypedetail=scientificname&queryoperator=equals&querytext=Homo%20sapiens">Homo sapiens</a> (Human)

[<a href="http://www.expasy.org/sprot/userman.html#OX_line">TaxID</a>:<a href="http://www.ebi.ac.uk/newt/display?search=9606">9606</a>]

<a href="http://www.expasy.org/sprot/userman.html#OC_line">Taxonomy</a>

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae; Homo.

<a href="http://www.expasy.org/sprot/userman.html#Ref_line">References</a>

[1]

NUCLEOTIDE SEQUENCE [GENOMIC DNA / MRNA].

MEDLINE=85137899; PubMed=3838366; [<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=3838366&dopt=Abstract">NCBI</a>, <a href="http://www.expasy.org/cgi-bin/medline_local.pl?3838366">ExPASy</a>, <a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-view+MedlineFull+%5bmedline-PMID:3838366%5d">EBI</a>, <a href="http://bip.weizmann.ac.il/cgi-bin/getpm?3838366">Israel</a>, <a href="http://www.genome.ad.jp/dbget-bin/www_bget?pubmed+3838366">Japan</a>]

Jacobs K., Shoemaker C., Rudersdorf R., Neill S.D., Kaufman R.J., Mufson A., Seehra J., Jones S.S., Hewick R., Fritsch E.F., Kawakita M., Shimizu T., Miyake T.;

"Isolation and characterization of genomic and cDNA clones of human erythropoietin.";

Nature 313:806-810(1985).

[2]

NUCLEOTIDE SEQUENCE [GENOMIC DNA].

MEDLINE=86067948; PubMed=3865178; [<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=3865178&dopt=Abstract">NCBI</a>, <a href="http://www.expasy.org/cgi-bin/medline_local.pl?3865178">ExPASy</a>, <a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-view+MedlineFull+%5bmedline-PMID:3865178%5d">EBI</a>, <a href="http://bip.weizmann.ac.il/cgi-bin/getpm?3865178">Israel</a>, <a href="http://www.genome.ad.jp/dbget-bin/www_bget?pubmed+3865178">Japan</a>]

Lin F.-K., Suggs S., Lin C.-H., Browne J.K., Smalling R., Egrie J.C., Chen K.K., Fox G.M., Martin F., Stabinsky Z., Badrawi S.M., Lai P.-H., Goldwasser E.;

"Cloning and expression of the human erythropoietin gene.";

<a href="http://www.pnas.org/cgi/lookup?vol=82&fp=7580&view=abstract">Proc. Natl. Acad. Sci. U.S.A. 82:7580-7584(1985).</a>

<a href="http://www.expasy.org/sprot/userman.html#CC_line">Comments</a>

<a href="http://www.expasy.org/sprot/userman.html#CC_line">FUNCTION</a>

Erythropoietin is the principal hormone involved in the regulation of erythrocyte differentiation and the maintenance of a physiological level of circulating erythrocyte mass.

<a href="http://www.expasy.org/sprot/userman.html#CC_line">SUBCELLULAR LOCATION</a>

Secreted protein.

<a href="http://www.expasy.org/sprot/userman.html#CC_line">TISSUE SPECIFICITY</a>

Produced by kidney or liver of adult mammals and by liver of fetal or neonatal mammals.

<a href="http://www.expasy.org/sprot/userman.html#CC_line">PHARMACEUTICAL</a>

Used for the treatment of anemia. Available under the names Epogen (Amgen), Epogin (Chugai), Epomax (Elanex), Eprex (Janssen-Cilag), NeoRecormon or Recormon (Roche), and Procrit (Ortho Biotech). Variations in the glycosylation pattern of EPO distinguishes these products. Epogen, Epogin, Eprex and Procrit are generically known as epoetin alfa, NeoRecormon and Recormon as epoetin beta and Epomax as epoetin omega.

<a href="http://www.expasy.org/sprot/userman.html#CC_line">SIMILARITY</a>

Belongs to the EPO/TPO family.

<a href="http://www.expasy.org/sprot/userman.html#CCDB">DATABASE</a>

NAME	R&D Systems' cytokine source book: EPO
WWW	"<a href="http://www.rndsystems.com/asp/g_sitebuilder.asp?bodyId=197">http://www.rndsystems.com/asp/g_sitebuilder.asp?bodyId=197</a>"

<a href="http://www.expasy.org/sprot/userman.html#DR_line">Cross-references</a>

X02158; CAA26095.1; -; Genomic_DNA.	[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?X02158">EMBL</a>/<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=X02158&doptcmdl=GenBank">GenBank</a>/<a href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?X02158">DDBJ</a>][<a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:CAA26095*%5d">CoDingSequence</a>]
X02157; CAA26094.1; -; mRNA.	[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?X02157">EMBL</a>/<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=X02157&doptcmdl=GenBank">GenBank</a>/<a href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?X02157">DDBJ</a>][<a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:CAA26094*%5d">CoDingSequence</a>]
M11319; AAA52400.1; -; Genomic_DNA.	[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?M11319">EMBL</a>/<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=M11319&doptcmdl=GenBank">GenBank</a>/<a href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?M11319">DDBJ</a>][<a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAA52400*%5d">CoDingSequence</a>]
AF053356; AAC78791.1; -; Genomic_DNA.	[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF053356">EMBL</a>/<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AF053356&doptcmdl=GenBank">GenBank</a>/<a href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF053356">DDBJ</a>][<a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAC78791*%5d">CoDingSequence</a>]
AF202308; AAF23132.1; -; Genomic_DNA.	[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF202308">EMBL</a>/<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AF202308&doptcmdl=GenBank">GenBank</a>/<a href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF202308">DDBJ</a>][<a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAF23132*%5d">CoDingSequence</a>]
AF202306; AAF23132.1; JOINED; Genomic_DNA.	[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF202306">EMBL</a>/<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AF202306&doptcmdl=GenBank">GenBank</a>/<a href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF202306">DDBJ</a>][<a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAF23132*%5d">CoDingSequence</a>]
AF202307; AAF23132.1; JOINED; Genomic_DNA.	[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF202307">EMBL</a>/<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AF202307&doptcmdl=GenBank">GenBank</a>/<a href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF202307">DDBJ</a>][<a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAF23132*%5d">CoDingSequence</a>]
AF202310; AAF23133.1; -; Genomic_DNA.	[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF202310">EMBL</a>/<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AF202310&doptcmdl=GenBank">GenBank</a>/<a href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF202310">DDBJ</a>][<a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAF23133*%5d">CoDingSequence</a>]
AF202309; AAF23133.1; JOINED; Genomic_DNA.	[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF202309">EMBL</a>/<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AF202309&doptcmdl=GenBank">GenBank</a>/<a href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF202309">DDBJ</a>][<a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAF23133*%5d">CoDingSequence</a>]
AF202311; AAF17572.1; -; Genomic_DNA.	[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF202311">EMBL</a>/<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AF202311&doptcmdl=GenBank">GenBank</a>/<a href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF202311">DDBJ</a>][<a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAF17572*%5d">CoDingSequence</a>]
AF202314; AAF23134.1; -; Genomic_DNA.	[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF202314">EMBL</a>/<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AF202314&doptcmdl=GenBank">GenBank</a>/<a href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF202314">DDBJ</a>][<a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAF23134*%5d">CoDingSequence</a>]
AF202312; AAF23134.1; JOINED; Genomic_DNA.	[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF202312">EMBL</a>/<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AF202312&doptcmdl=GenBank">GenBank</a>/<a href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF202312">DDBJ</a>][<a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAF23134*%5d">CoDingSequence</a>]
AF202313; AAF23134.1; JOINED; Genomic_DNA.	[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF202313">EMBL</a>/<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AF202313&doptcmdl=GenBank">GenBank</a>/<a href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF202313">DDBJ</a>][<a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAF23134*%5d">CoDingSequence</a>]
AC009488; AAP22357.1; -; Genomic_DNA.	[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AC009488">EMBL</a>/<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AC009488&doptcmdl=GenBank">GenBank</a>/<a href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AC009488">DDBJ</a>][<a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAP22357*%5d">CoDingSequence</a>]
BC093628; AAH93628.1; -; mRNA.	[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?BC093628">EMBL</a>/<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=BC093628&doptcmdl=GenBank">GenBank</a>/<a href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?BC093628">DDBJ</a>][<a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAH93628*%5d">CoDingSequence</a>]
S65458; AAD13964.1; -; mRNA.	[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?S65458">EMBL</a>/<a href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=S65458&doptcmdl=GenBank">GenBank</a>/<a href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?S65458">DDBJ</a>][<a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAD13964*%5d">CoDingSequence</a>]
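The EMBL cross-reference rows above share one shape: four semicolon-separated fields ending in a period, followed by a tab and the bracketed links. A minimal sketch of how such a row might be split into fields is shown below. The field names (`accession`, `protein_id`, `status`, `molecule_type`) are labels chosen here for illustration, not names taken from the Swiss-Prot manual, and treating `-` as "no status" is an assumption based only on how the rows appear in this listing.

```python
from typing import NamedTuple, Optional


class EmblXref(NamedTuple):
    """One EMBL cross-reference row (field names are illustrative)."""
    accession: str           # nucleotide accession, e.g. "X02158"
    protein_id: str          # protein identifier, e.g. "CAA26095.1"
    status: Optional[str]    # None when the row shows "-", else e.g. "JOINED"
    molecule_type: str       # e.g. "Genomic_DNA" or "mRNA"


def parse_embl_xref(line: str) -> EmblXref:
    """Split a row such as 'X02158; CAA26095.1; -; Genomic_DNA.' into fields."""
    # Drop everything after the first tab (the bracketed link section).
    text = line.split("\t", 1)[0].strip()
    # Remove the terminating period, then split on semicolons.
    fields = [part.strip() for part in text.rstrip(".").split(";")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    accession, protein_id, status, molecule_type = fields
    return EmblXref(accession, protein_id,
                    None if status == "-" else status,
                    molecule_type)


if __name__ == "__main__":
    for row in ("X02158; CAA26095.1; -; Genomic_DNA.\t[EMBL/GenBank/DDBJ]",
                "AF202306; AAF23132.1; JOINED; Genomic_DNA."):
        print(parse_embl_xref(row))
```

Keeping the protein identifier as its own field makes it easy to see that several nucleotide accessions (for example AF202308, AF202306 and AF202307) point at the same protein record.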

<a href="http://pir.georgetown.edu/cgi-bin/nbrfget?uid=A01855&xref=1">A01855</a>; ZUHU.

1BUY; NMR; A=28-193.

[<a href="http://www.expasy.org/cgi-bin/get-pdb.pl?1BUY">ExPASy</a>/<a

     href="http://www.rcsb.org/pdb/cgi/explore.cgi?pdbId=1BUY">RCSB</a>/<a

href="http://www.ebi.ac.uk/msd-srv/atlas?id=1BUY">EBI</a>]

1CN4; X-ray; C=28-193.

[<a href="http://www.expasy.org/cgi-bin/get-pdb.pl?1CN4">ExPASy</a>/<a

     href="http://www.rcsb.org/pdb/cgi/explore.cgi?pdbId=1CN4">RCSB</a>/<a

href="http://www.ebi.ac.uk/msd-srv/atlas?id=1CN4">EBI</a>]

1EER; X-ray; A=28-193.

[<a href="http://www.expasy.org/cgi-bin/get-pdb.pl?1EER">ExPASy</a>/<a

     href="http://www.rcsb.org/pdb/cgi/explore.cgi?pdbId=1EER">RCSB</a>/<a

href="http://www.ebi.ac.uk/msd-srv/atlas?id=1EER">EBI</a>]

<a href="http://www.expasy.org/cgi-bin/dbxref?GlycoSuiteDB">GlycoSuiteDB</a>

P01588; -.

<a href="http://www.expasy.org/cgi-bin/dbxref?Ensembl">Ensembl</a>

ENSG00000130427; Homo sapiens.

[<a

     href="http://www.ensembl.org/Homo_sapiens/geneview?gene=ENSG00000130427">Entry</a>/<a

href="http://www.ensembl.org/Homo_sapiens/contigview?gene=ENSG00000130427">Contig</a>]

<a href="http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/get_data.pl?hgnc_id=3415">HGNC:3415</a>; EPO.

133170; gene.

[<a

     href="http://www.ncbi.nih.gov/entrez/dispomim.cgi?id=133170">MIM</a>/<a

href="http://srs6.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+%5bomim-id:133170%5d">EBI</a>]

Cellular component	extracellular space	<a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0005615">GO:0005615</a>	traceable author statement
Biological process	circulation	<a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0008015">GO:0008015</a>	non-traceable author statement
Biological process	response to stress	<a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0006950">GO:0006950</a>	traceable author statement
Biological process	signal transduction	<a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007165">GO:0007165</a>	non-traceable author statement
[<a href="http://www.ebi.ac.uk/ego/GSearch?mode=protein&ontology=all_ont&query=P01588">QuickGO</a>]
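The GO rows above are tab-separated: ontology aspect, term name, GO identifier (wrapped in a link), and evidence. As a rough sketch, assuming the rows are available as strings exactly in that four-column form, they could be reduced to plain records like this. The function names and dictionary keys are hypothetical helpers introduced only for this example.

```python
import re
from collections import defaultdict

GO_ID = re.compile(r"GO:\d{7}")   # GO identifiers are "GO:" plus seven digits
TAG = re.compile(r"<[^>]+>")      # crude HTML tag stripper, enough for these rows


def parse_go_row(row):
    """Return a dict for a four-column GO row, or None for anything else."""
    cells = [TAG.sub("", cell).strip() for cell in row.split("\t")]
    if len(cells) != 4 or not GO_ID.fullmatch(cells[2]):
        return None  # e.g. the trailing "[QuickGO]" link row
    aspect, term, go_id, evidence = cells
    return {"aspect": aspect, "term": term, "go_id": go_id, "evidence": evidence}


def group_by_aspect(rows):
    """Map each ontology aspect to the list of GO ids annotated under it."""
    grouped = defaultdict(list)
    for row in rows:
        record = parse_go_row(row)
        if record is not None:
            grouped[record["aspect"]].append(record["go_id"])
    return dict(grouped)
```

For the PPLRE use case, the "Cellular component" group is the interesting one, since it is the part of the annotation that speaks to protein localization.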

<a href="http://www.expasy.org/cgi-bin/dbxref?InterPro">InterPro</a>

<a href="http://www.ebi.ac.uk/interpro/IEntry?ac=IPR009079">IPR009079</a>; 4_helix_cytokine.
<a href="http://www.ebi.ac.uk/interpro/IEntry?ac=IPR012351">IPR012351</a>; Cytokine_4_hlx.
<a href="http://www.ebi.ac.uk/interpro/IEntry?ac=IPR001323">IPR001323</a>; EPO_TPO.
<a href="http://www.ebi.ac.uk/interpro/IEntry?ac=IPR003013">IPR003013</a>; Erythroptn.
<a href="http://www.ebi.ac.uk/interpro/ISpy?mode=single&ac=P01588">Graphical view of the domain structure</a>

<a href="http://www.expasy.org/cgi-bin/dbxref?PANTHER">PANTHER</a>

<a href="http://www.pantherdb.org/panther/family.do?clsAccession=PTHR10370">PTHR10370</a>; Erythroptn; 1.

<a href="http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00758">PF00758</a>; EPO_TPO; 1.

<a

     href="http://www.sanger.ac.uk/cgi-bin/Pfam/swisspfamget.pl?name=P01588">Pfam

graphical view of domain structure</a>

<a href="http://www.expasy.org/cgi-bin/dbxref?PIRSF">PIRSF</a>

<a href="http://pir.georgetown.edu/cgi-bin/ipcSF?id=PIRSF001951">PIRSF001951</a>; EPO; 1.

<a href="http://www.expasy.org/cgi-bin/dbxref?PRINTS">PRINTS</a>

<a href="http://www.bioinf.man.ac.uk/cgi-bin/dbbrowser/sprint/searchprintss.cgi?display_opts=Prints&category=None&queryform=false&prints_accn=PR00272">PR00272</a>; ERYTHROPTN.

<a href="http://www.expasy.org/cgi-bin/dbxref?PROSITE">PROSITE</a>

<a href="http://www.expasy.org/cgi-bin/nicedoc.pl?PS00817">PS00817</a>; EPO_TPO; 1.

<a href="http://www.expasy.org/sprot/userman.html#KW_line">Keywords</a>

<a href="http://www.expasy.org/cgi-bin/get-entries?KW=3D-structure">3D-structure</a>

<a

   href="http://www.expasy.org/cgi-bin/get-entries?KW=Direct%20protein%20sequencing">Direct

protein sequencing</a>

<a

   href="http://www.expasy.org/cgi-bin/get-entries?KW=Erythrocyte%20maturation">Erythrocyte

maturation</a>

<a href="http://www.expasy.org/cgi-bin/get-entries?KW=Glycoprotein">Glycoprotein</a>

<a href="http://www.expasy.org/cgi-bin/get-entries?KW=Hormone">Hormone</a>

<a href="http://www.expasy.org/cgi-bin/get-entries?KW=Pharmaceutical">Pharmaceutical</a>

<a href="http://www.expasy.org/cgi-bin/get-entries?KW=Polymorphism">Polymorphism</a>

<a href="http://www.expasy.org/cgi-bin/get-entries?KW=Signal">Signal</a>

<a href="http://www.expasy.org/sprot/userman.html#FT_line">Features</a>

Type

<a href="http://www.expasy.org/sprot/userman.html#FT_position">FromTo</a>

Length

Description

<a href="http://www.expasy.org/sprot/userman.html#FTID">Feature ID</a>

<a href="http://www.expasy.org/sprot/userman.html#FT_SIGNAL">SIGNAL</a>

27

<a href="http://www.expasy.org/sprot/userman.html#FT_CHAIN">CHAIN</a>

166

Erythropoietin

PRO_0000008401

<a href="http://www.expasy.org/sprot/userman.html#FT_PROPEP">PROPEP</a>

4

Removed in mature form (Probable)

<a name="ftidEPO_HUMANPRO_0000008401"></a>PRO_0000008402

<a href="http://www.expasy.org/sprot/userman.html#FT_CARBOHYD">CARBOHYD</a>

N-linked (GlcNAc...)

CAR_000052

<a href="http://www.expasy.org/sprot/userman.html#FT_CARBOHYD">CARBOHYD</a>

N-linked (GlcNAc...)

CAR_000166

<a href="http://www.expasy.org/sprot/userman.html#FT_CARBOHYD">CARBOHYD</a>

N-linked (GlcNAc...)

CAR_000192

<a href="http://www.expasy.org/sprot/userman.html#FT_CARBOHYD">CARBOHYD</a>

O-linked (GalNAc...).

<a href="http://www.expasy.org/sprot/userman.html#FT_DISULFID">DISULFID</a>

<a href="http://www.expasy.org/sprot/userman.html#FT_DISULFID">DISULFID</a>

<a href="http://www.expasy.org/sprot/userman.html#FT_VARIANT">VARIANT</a>

SL -> NF (in an hepatocellular carcinoma)

VAR_009870

<a href="http://www.expasy.org/sprot/userman.html#FT_VARIANT">VARIANT</a>

P -> Q (in an hepatocellular carcinoma)

<a href="http://www.expasy.org/cgi-bin/get-sprot-variant.pl?VAR_009871">VAR_009871</a>

<a href="http://www.expasy.org/sprot/userman.html#FT_CONFLICT">CONFLICT</a>

E -> Q (in Ref. <a

   href="http://www.ebi.uniprot.org/entry/EPO_HUMAN#cit1EPO_HUMAN">1</a>; <a

href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:CAA26095*%5d">CAA26095</a>).

<a href="http://www.expasy.org/sprot/userman.html#FT_CONFLICT">CONFLICT</a>

Q -> QQ (in Ref. <a href="http://www.ebi.uniprot.org/entry/EPO_HUMAN#cit7EPO_HUMAN">7</a>).

<a href="http://www.expasy.org/sprot/userman.html#FT_CONFLICT">CONFLICT</a>

G -> R

               (in Ref. <a href="http://www.ebi.uniprot.org/entry/EPO_HUMAN#cit1EPO_HUMAN">1</a>; <a

href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:CAA26095*%5d">CAA26095</a>).

<a href="http://www.expasy.org/sprot/userman.html#FT_STRAND">STRAND</a>

1

<a href="http://www.expasy.org/sprot/userman.html#FT_HELIX">HELIX</a>

3

<a href="http://www.expasy.org/sprot/userman.html#FT_HELIX">HELIX</a>

7

2

<a href="http://www.expasy.org/sprot/userman.html#FT_STRAND">STRAND</a>

1

<a href="http://www.expasy.org/sprot/userman.html#FT_STRAND">STRAND</a>

5

<a href="http://www.expasy.org/sprot/userman.html#FT_HELIX">HELIX</a>

13

1

<a href="http://www.expasy.org/sprot/userman.html#FT_HELIX">HELIX</a>

10

<a href="http://www.expasy.org/sprot/userman.html#SQ_line">Sequence information</a>

<a href="http://www.expasy.org/sprot/userman.html#SQ_line">Length</a>

193 AA

<a href="http://www.expasy.org/sprot/userman.html#SQ_line">Molecular weight</a><a name=seqP01588></a>

21307 Da

CRC64: C91F0E4C26A52033 [This is a checksum on the sequence]

MGVHECPAWL WLLLSLLSLP LGLPVLGAPP RLICDSRVLE RYLLEAKEAE 50

NITTGCAEHC SLNENITVPD TKVNFYAWKR MEVGQQAVEV WQGLALLSEA 100

VLRGQALLVN SSQPWEPLQL HVDKAVSGLR SLTTLLRALG AQKEAISPPD 150

AASAAPLRTI TADTFRKLFR VYSNFLRGKL KLYTGEACRT GDR 193
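The sequence is displayed in blocks of ten residues with running position counters, so it needs a little cleanup before it can be loaded into a table or compared against the stated length of 193 AA. The sketch below strips everything that is not a letter and checks the length. Slicing off the first 27 residues as the signal peptide is an assumption made here: it matches the SIGNAL length of 27 and the CHAIN length of 166 listed in the Features section, but the feature positions themselves are not shown on this page.

```python
import re

# Sequence text as displayed in the entry, including position counters.
DISPLAYED = """
MGVHECPAWL WLLLSLLSLP LGLPVLGAPP RLICDSRVLE RYLLEAKEAE 50
NITTGCAEHC SLNENITVPD TKVNFYAWKR MEVGQQAVEV WQGLALLSEA 100
VLRGQALLVN SSQPWEPLQL HVDKAVSGLR SLTTLLRALG AQKEAISPPD 150
AASAAPLRTI TADTFRKLFR VYSNFLRGKL KLYTGEACRT GDR 193
"""


def residues_from_display(text: str) -> str:
    """Drop spaces, newlines and position counters, keeping only residue letters."""
    return re.sub(r"[^A-Za-z]", "", text).upper()


sequence = residues_from_display(DISPLAYED)

# Assumption: the 27-residue signal peptide occupies positions 1-27,
# so the remainder is the 166-residue chain listed under Features.
SIGNAL_LENGTH = 27
after_signal = sequence[SIGNAL_LENGTH:]

if __name__ == "__main__":
    print(len(sequence), len(after_signal))
```

A length check like this is a cheap guard against copy-and-paste damage (for example a counter fused onto the last block) before the sequence is stored.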

PDF file conversion

About an hour of experimentation with a few tools led to the selection of Xpdf:

<a href="http://www.foolabs.com/xpdf/">http://www.foolabs.com/xpdf/</a>
Xpdf 3.01pl2 was released on 2006-02-08.
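The page does not say how Xpdf is invoked. A plausible sketch, assuming the goal is PDF-to-plain-text conversion with the `pdftotext` command-line tool that ships with Xpdf, and that `pdftotext` is on the `PATH`, is shown below. The helper names and the default of passing `-layout` (which asks `pdftotext` to preserve the physical page layout) are choices made for this example, not project settings.

```python
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path=None, keep_layout=True):
    """Build the argument list for: pdftotext [-layout] PDF-file text-file."""
    pdf = Path(pdf_path)
    # Default output name: same path with a .txt suffix.
    txt = Path(txt_path) if txt_path is not None else pdf.with_suffix(".txt")
    command = ["pdftotext"]
    if keep_layout:
        command.append("-layout")
    command += [str(pdf), str(txt)]
    return command


def convert_pdf_to_text(pdf_path, txt_path=None, keep_layout=True):
    """Run pdftotext and return the path of the text file it was asked to write."""
    command = build_pdftotext_command(pdf_path, txt_path, keep_layout)
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(command, check=True)
    return Path(command[-1])
```

Separating command construction from execution keeps the argument handling testable on machines where Xpdf is not installed.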

Protein Information Resource (PIR) Center

iProLink

WebSite: <a href="http://pir.georgetown.edu/pirwww/iprolink/protname.shtml">http://pir.georgetown.edu/pirwww/iprolink/protname.shtml</a>
Data: <a href="ftp://ftp.pir.georgetown.edu/pir_databases/iprolink/biothesaurus.dist">ftp://ftp.pir.georgetown.edu/pir_databases/iprolink/biothesaurus.dist</a>

BTW,

TextMining on Bio

<a href="http://blimp.cs.queensu.ca/cateR_1.html">http://blimp.cs.queensu.ca/cateR_1.html</a>
Contains several review papers on text mining in biomedicine:

KDDCup08 Proposal

* KDDCup08 Proposal

PPLRE Resources

PSORTdb

PubMed

NCBI

Swiss-Prot

TrEMBL (Translation of EMBL)

UniProtKB (Universal Protein Knowledge Base)

ExPASy (Expert Protein Analysis System)

GUI Examples

Data

UMLS (Unified Medical Language System)

Metathesaurus

Semantic Network

Protein/Gene Named Entity Recognition, NLP Research

Snowball

Genbank

Bio-Acronym Databases

FASTA File Format

Gene Ontology (GO) Cellular Component Ontology

NLM (National Library of Medicine)

SWISS-PROT Accession Format

PDF file conversion

Protein Information Resource (PIR) Center

TextMining on Bio

KDDCup08 Proposal
