PPLRE Resources

From GM-RKB

This page provides an overview of many of the external resources that we plan to use for the PPLRE Project.

PSORTdb

See: PPLRE ePSORTdbOrganismProteinLocalization Table See: PPLRE cPSORTdbOrganismProteinLocalization Table

PubMed

See: PubMed See: PubMed Central

Stanford Parser

See: Stanford Parser See: PPLRE Stanford Parser

NCBI

See: NCBI.

  • It is the source of our OrganismID (See: “NCBIOrganismName and NCBIOrganismTreeNodes Tables” section of detailed data design)

Swiss-Prot

  • See: Swiss-Prot.
  • It is the data source for the SProtProteinProkaryote table. (see section 5.1.16)

TrEMBL (Translation of EMBL)

  • “TrEMBL is automatically generated (from annotated EMBL coding sequences (CDS)) and annotated using software tools. Contains all of what is not in SWISS-PROT. SWISS-PROT + TrEMBL = all known protein sequences. Once in SWISS-PROT, the entry is no more in TrEMBL, but still in EMBL (archive).”

UniProtKB (Universal Protein Knowledge Base)

  • “UniProtKB is the central hub for the collection of functional information on proteins, with accurate, consistent, and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (principally, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data. Created by merging the data in Swiss-Prot, TrEMBL and PIR-PSD, individual UniProt Knowledgebase entries may contain more information than was available in any given separate source database. The UniProt Knowledgebase consists of two sections: a section containing manually-annotated records with information extracted from literature and curator-evaluated computational analysis, and a section with computationally analyzed records that await full manual annotation. For the sake of continuity and name recognition, the two sections are referred to as ‘Swiss-Prot’ and ‘TrEMBL’, respectively.”

  • UniProt Knowledgebase Release 7.0: The UniProt consortium (European Bioinformatics Institute (EBI), Swiss Institute of Bioinformatics (SIB) and Protein Information Resource (PIR)) is pleased to announce UniProt Knowledgebase (UniProtKB) Release 7.0 (07-Feb-2006). UniProt (Universal Protein Resource) is a comprehensive catalog of information on proteins. UniProtKB Release 7.0 consists of 2,812,716 entries (UniProtKB/Swiss-Prot: 207,132 entries and UniProtKB/TrEMBL: 2,605,584 entries).

  • UniProt databases can be accessed from the web at http://www.uniprot.org and downloaded from http://www.uniprot.org/database/download.shtml.

  • Detailed release statistics for the TrEMBL and Swiss-Prot sections of the UniProt Knowledgebase can be viewed at http://www.ebi.ac.uk/swissprot/sptr_stats/index.html and http://www.expasy.org/sprot/relnotes/relstat.html respectively.

ExPASy (Expert Protein Analysis System)

  • ExPASy interface. The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein sequences and structures as well as 2-D PAGE.

GUI Examples

Data

UMLS (Unified Medical Language System)

  • The PPLRE project plans to use UMLS to assist with linguistic concept identification and Named Entity Recognition (NER). Specifically, the Annotator’s Conceptualizer module will make use of the Metathesaurus, the Semantic Network, and the MMTx tool. UMLS may also become a source of organism instances.
  • It is a very large ontology that covers more than 100 source vocabularies, including GO, NCBI Taxonomy, MeSH, and HUGO.
  • It provides several linguistics-oriented tools. One of them, MMTx, performs named-entity annotation and is used as the pre-processing step of our NER module.
  • Website: http://www.nlm.nih.gov/research/umls/
  • “The purpose of the National Library of Medicine's (NLM’s) UMLS® is to facilitate the development of computer systems that behave as if they "understand" the meaning of the language of biomedicine and health. To that end, NLM produces and distributes the UMLS Knowledge Sources (databases) and associated software tools (programs) for use by system developers in building or enhancing electronic information systems that create, process, retrieve, integrate, and/or aggregate biomedical and health data and information, as well as in informatics research.” http://www.nlm.nih.gov/research/umls/about_umls.html
  • UMLS consists of three components:

1. Metathesaurus

  • A large multi-lingual vocabulary database that includes biomedical and health related concepts, their various terms, and the relationships among them. It includes more than 100 vocabulary sources, such as MeSH, GO and SNOMED CT.

  • “The UMLS Metathesaurus is a very large, multi-purpose, and multi-lingual vocabulary database that contains information about biomedical and health related concepts, their various names, and the relationships among them. Designed for use by system developers, the Metathesaurus is built from the electronic versions of many different thesauri, classifications, code sets, and lists of controlled terms used in patient care, health services billing, public health statistics, indexing and cataloging biomedical literature, and/or basic, clinical, and health services research.” http://www.nlm.nih.gov/pubs/factsheets/umlsmeta.html

2. Semantic Network

  • An ontology of concepts and their relationships.

  • “The Semantic Network consists of (1) a set of broad subject categories, or Semantic Types, that provide a consistent categorization of all concepts represented in the UMLS Metathesaurus®, and (2) a set of useful and important relationships, or Semantic Relations, that exist between Semantic Types. This section of the documentation provides an overview of the Semantic Network, and describes the files of the Semantic Network. Sample records illustrate structure and content of these files.” http://www.nlm.nih.gov/pubs/factsheets/umlssemn.html

3. SPECIALIST

  • Lexical information of names

  • “The SPECIALIST lexicon has been developed to provide the lexical information needed for the SPECIALIST Natural Language Processing System (NLP). It is intended to be a general English lexicon that includes many biomedical terms. Coverage includes both commonly occurring English words and biomedical vocabulary. The lexicon entry for each word or term records the syntactic, morphological, and orthographic information needed by the SPECIALIST NLP System.” http://www.nlm.nih.gov/pubs/factsheets/umlslex.html

    • Some relevant facts:
      • Organism names: UMLS has 383,064 organism names in its Metathesaurus, including 92,448 prokaryote names. In terms of taxonomy, it has separate categories (semantic types) for bacteria and archaea. The NCBI taxonomy has roughly the same number of prokaryote names: 84,192 out of 385,145 organism names.
      • Protein names: 330,192 names under the semantic type "Amino Acid, Peptide or Protein".
      • Prokaryote-protein relations: 40,263 pairs, most of which are co-occurrence relationships.
    • UMLS has a web-based query interface at http://umlsks.nlm.nih.gov/ (requires free registration, which takes a few days to process).
    • MetaMap Transfer (MMTx), http://mmtx.nlm.nih.gov/: MetaMap maps arbitrary text to concepts in the UMLS Metathesaurus; or, equivalently, it discovers Metathesaurus concepts in text.

Protein/Gene Named Entity Recognition, NLP Research

  • There is a significant amount of recent research on correctly identifying genes/proteins within natural language text (see the sample listing of papers below). Unfortunately, most research papers do not appear to be accompanied by openly available programs. So, instead of developing these research solutions from scratch, we plan to stick with freely available, executable programs (see GENIA above).

  • Entity Types

1. GENIA Ontology

  • http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/topics/Corpus/genia-ontology.html

  • PROTEIN

  • domain or region of DNA

  • CELL_COMPONENT

  • Tasks/Datasets

1. Bio-Entity Recognition Task at BioNLP/NLPBA 2004

1. http://nlp.i2r.a-star.edu.sg/demo_bioner.html

  • Papers:

  • Contextual weighting for Support Vector Machines in literature mining: an application to gene versus protein name disambiguation. T. Pahikkala et al. http://www.biomedcentral.com/1471-2105/6/157

  • Recognition of protein/gene names from text using an ensemble of classifiers. G. Zhou et al. http://www.biomedcentral.com/1471-2105/6/S1/S7

  • Exploring the boundaries: gene and protein identification in biomedical text. J. Finkel et al. http://www.biomedcentral.com/1471-2105/6/S1/S5

  • A simple approach for protein name identification: prospects and limits. K. Fundel et al. http://www.biomedcentral.com/1471-2105/6/S1/S15

  • ProMiner: rule-based protein and gene entity recognition. D. Hanisch et al. http://www.biomedcentral.com/1471-2105/6/S1/S14

  • Gene/protein name recognition based on support vector machine using dictionary as features. T. Mitsumori et al. http://www.biomedcentral.com/1471-2105/6/S1/S8

  • Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts. A. Cohen et al. http://www.biomedcentral.com/1471-2105/6/103

</HTML>

Snowball

See: PPLRE Snowball

Genbank

  • One challenge with Genbank is that it contains many redundant entries and unconfirmed sequences. That said, Genbank IDs are used more often than TrEMBL IDs. The non-redundant curated set of IDs can be found within folders in /home/shared/NCBI_Genomes/curated/Bacteria. The .faa files contain the IDs and protein names/descriptions (along with protein sequences, which can be ignored).
  • The other set of GI numbers (with redundancies) that people sometimes use in recent literature can be found at ftp://ftp.ncbi.nih.gov/genbank/. It is unclear which files are the most useful for this project, though. The livelists folder contains a list of GIs and accession numbers for all the entries in Genbank. There are also the gbbct1.seq.gz to gbbct13.seq.gz files, which contain too much information (full Genbank flatfiles; it is unclear whether these are just DNA or DNA plus proteins). There are supposed to be index files that contain less information (just the accession and GI IDs), but according to the release notes, there was trouble generating them for this release (152).

Bio-Acronym Databases

Acronyms are regularly used in biomedical articles. The following datasets may help us resolve the meaning of the abbreviations that we encounter in our task.

  • ARGH (http://invention.swmed.edu/argh/): about 221,000 unique acronyms. Zhongmin has the entire database (attributes: acronym, full form, accuracy, context, etc.)

  • Acromed (http://medstract.med.tufts.edu/acro1.1): 481,531 acronyms. Zhongmin has the database (similar attributes to ARGH)

  • Stanford Abbr. (http://abbreviation.stanford.edu/): 2,074,367 abbreviations, program accessible. An example of searching "CPR":

Ind   Abbr.   Long Form                          Quality (Score)    #Docs
1     CPR     Cardio-Pulmonary Resuscitation     Excellent (0.91)   1,154
2     CPR     Computer Based Patient Records     Excellent (0.59)   65
3     CPR     C peptide immunoreactivity         Good (0.33)        52
4     CPR     Cefpirome                          Good (0.34)        32
5     CPR     C-Peptide                          Good (0.13)        29
6     CPR     Computerised Patient Record        Excellent (0.91)   18
7     CPR     chicken progesterone receptor      Excellent (0.91)   14
8     CPR     NADPH--cytochrome P450 reductase   Excellent (0.91)   13
9     CPR     C-peptide reactivity               Excellent (0.86)   10
10    CPR     Cefpirome sulfate                  Good (0.07)        10
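
The "CPR" result above can be treated as a small candidate list when resolving an abbreviation. The following is only a minimal sketch and not part of the PPLRE code base: the `LongForm` record and the `rank_long_forms` helper are hypothetical names, the rows are copied from the first five results above, and ranking by document count with a score cut-off is an assumed heuristic, not the method used by the Stanford service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LongForm:
    """One candidate expansion of an abbreviation (hypothetical record)."""
    abbr: str
    long_form: str
    score: float
    docs: int


# First five rows of the "CPR" example shown above.
CPR_CANDIDATES = [
    LongForm("CPR", "Cardio-Pulmonary Resuscitation", 0.91, 1154),
    LongForm("CPR", "Computer Based Patient Records", 0.59, 65),
    LongForm("CPR", "C peptide immunoreactivity", 0.33, 52),
    LongForm("CPR", "Cefpirome", 0.34, 32),
    LongForm("CPR", "C-Peptide", 0.13, 29),
]


def rank_long_forms(candidates, min_score=0.0):
    """Keep candidates with score >= min_score, most documents first."""
    kept = [c for c in candidates if c.score >= min_score]
    # Assumed heuristic: document count first, score as the tie-breaker.
    return sorted(kept, key=lambda c: (c.docs, c.score), reverse=True)


if __name__ == "__main__":
    for cand in rank_long_forms(CPR_CANDIDATES, min_score=0.3):
        print(f"{cand.long_form}: score={cand.score}, docs={cand.docs}")
```

In practice the surrounding sentence context would also be needed to choose between candidates; the sketch only orders them.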

FASTA File Format

  • http://www.ncbi.nlm.nih.gov/BLAST/fasta.shtml

  • “A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters in length. An example sequence in FASTA format is:”

>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK

“[...] defined, but generally uses the following abbreviations”:

Database name                       Identifier syntax
GenBank                             gi|gi-number|gb|accession|locus
EMBL Data Library                   gi|gi-number|emb|accession|locus
DDBJ, DNA Database of Japan         gi|gi-number|dbj|accession|locus
NBRF PIR                            pir||entry
Protein Research Foundation         prf||name
SWISS-PROT                          sp|accession|name
Brookhaven Protein Data Bank (1)    pdb|entry|chain
Brookhaven Protein Data Bank (2)    entry:chain|PDBID|CHAIN|SEQUENCE
Patents                             pat|country|number
GenInfo Backbone Id                 bbs|number
General database identifier         gnl|database|identifier
NCBI Reference Sequence             ref|accession|locus
Local Sequence identifier           lcl|identifier
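
The format described above (a ">" description line followed by sequence lines, with pipe-delimited identifiers at the start of the description) can be read with a few lines of code. This is a minimal sketch under stated assumptions, not the project's parser: `read_fasta` and `parse_defline` are hypothetical helper names, the sample is the first two sequence lines of the example above, and the sketch assumes the identifier block ends at the first space, which not every database guarantees.

```python
def read_fasta(lines):
    """Yield (description, sequence) pairs from an iterable of FASTA lines."""
    description, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line[1:], []
        elif description is not None:
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


def parse_defline(description):
    """Split a description into (identifier fields, free-text title).

    Assumption: pipe-delimited identifiers come first and the title
    follows the first space.
    """
    ident, _, title = description.partition(" ")
    return ident.split("|"), title


# Truncated copy of the example record shown above (first two sequence lines).
EXAMPLE = """\
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
"""

if __name__ == "__main__":
    for desc, seq in read_fasta(EXAMPLE.splitlines()):
        fields, title = parse_defline(desc)
        print(fields, title, len(seq))
```

The same loop applies to the .faa files mentioned in the Genbank section, where only the identifier fields and title are needed and the sequence can be discarded.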

Gene Ontology (GO) Cellular Component Ontology

1. exact: extracellular

      • Definition: The space external to the outermost structure of a cell. For cells without external protective or external encapsulating structures this refers to space outside of the plasma membrane. This term covers the host cell environment outside an intracellular parasite.

1. related: plasma membrane cation-transporting ATPase
2. related: plasma membrane long-chain fatty acid transporter
3. narrow: bacterial inner membrane
4. exact: cell membrane
5. exact: cytoplasmic membrane
6. exact: plasmalemma
7. broad: juxtamembrane

      • Definition: The membrane surrounding a cell that separates the cell from its external environment. It consists of a phospholipid bilayer and associated proteins.

1. exact: periplasmic space (sensu Gram-negative bacteria)
2. broad: periplasm
3. broad: periplasmic space

      • Definition: The region between the inner (cytoplasmic) membrane and outer membrane. As in, but not restricted to, the Gram-negative bacteria (Proteobacteria, ncbi_taxonomy_id:1224).

NLM (National Library of Medicine)

  • “The National Library of Medicine (NLM), on the campus of the National Institutes of Health (NIH) in Bethesda, Maryland, is the world's largest medical library. The Library collects materials in all areas of biomedicine and health care, as well as works on biomedical aspects of technology, the humanities, and the physical, life, and social sciences. The collections stand at more than 7 million items--books, journals, technical reports, manuscripts, microfilms, photographs and images.”

  • Participates in: NCBI, MeSH, UMLS

  • http://www.nlm.nih.gov/

PDB (Protein Data Bank)

  • “The Protein Data Bank (PDB) uses macromolecular Crystallographic Information File (mmCIF) data dictionaries to describe the information content of PDB entries. The RCSB PDB provides a variety of tools and resources for studying the structures of biological macromolecules and their relationships to sequence, function, and disease. The RCSB (Research Collaboratory for Structural Bioinformatics) is a member of the wwPDB whose mission is to ensure that the PDB archive remains an international resource with uniform data.”

  • The wwPDB consists of three member organizations that act as deposition, data processing and distribution centers for PDB data. The founding members are RCSB PDB (USA), MSD-EBI (Europe) and PDBj (Japan). The mission of the wwPDB is to maintain a single Protein Data Bank Archive of macromolecular structural data that is freely and publicly available to the global community. H. Berman et al. (2003): Announcing the worldwide Protein Data Bank. Nature Structural Biology 10 (12), p. 980. http://www.wwpdb.org/

SWISS-PROT Accession Format

Line code   Content                        Occurrence in an entry
ID          Identification                 One; starts the entry
AC          Accession number(s)            One or more
DT          Date                           Three times
DE          Description                    One or more
GN          Gene name(s)                   Optional
OS          Organism species               One or more
OG          Organelle                      Optional
OC          Organism classification        One or more
RN          Reference number               One or more
RP          Reference position             One or more
RC          Reference comment(s)           Optional
RX          Reference cross-reference(s)   Optional
RA          Reference authors              One or more
RT          Reference title                Optional
RL          Reference location             One or more
CC          Comments or notes              Optional
DR          Database cross-references      Optional
KW          Keywords                       Optional
FT          Feature table data             Optional
SQ          Sequence header                One
(blank)     Amino Acid Sequence            One
//          Termination line               One; ends the entry
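
The line codes listed above lend themselves to a simple reader that groups the lines of each entry by code. The block below is a minimal sketch rather than a complete Swiss-Prot parser: `parse_entries` is a hypothetical helper name, the sample is a hand-written, abbreviated stand-in built from the EPO_HUMAN values on this page (not a verbatim record), and it assumes the usual flat-file layout of a two-character code, three blanks, then the data, with sequence lines starting with blanks and "//" ending the entry.

```python
from collections import defaultdict

# Hand-written, abbreviated stand-in for an entry (illustrative only).
SAMPLE_ENTRY = """\
ID   EPO_HUMAN
AC   P01588; Q549U2; Q9UDZ0; Q9UEZ5; Q9UHA0;
DE   Erythropoietin precursor (Epoetin).
GN   Name=EPO;
OS   Homo sapiens (Human).
//
"""


def parse_entries(lines):
    """Yield one dict per entry, mapping line code -> list of line contents."""
    entry = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            # Termination line: emit the entry collected so far.
            if entry:
                yield dict(entry)
            entry = defaultdict(list)
        elif line.startswith("  "):
            # Sequence data lines carry no line code; "SEQ" is a made-up key.
            entry["SEQ"].append(line.strip())
        elif len(line) >= 2:
            # Assumed layout: code in columns 1-2, data from column 6.
            entry[line[:2]].append(line[5:].strip())


if __name__ == "__main__":
    for record in parse_entries(SAMPLE_ENTRY.splitlines()):
        print(record["ID"], record["AC"])
```

Keeping a list per code preserves codes that may occur more than once (for example AC, DE, or the reference lines) without any special handling.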

General information about the UniProtKB/Swiss-Prot entry

Entry name: EPO_HUMAN
Primary accession number: P01588
Secondary accession numbers: Q549U2 Q9UDZ0 Q9UEZ5 Q9UHA0
Integrated into UniProtKB/Swiss-Prot: 21-JUL-1986
Sequence was last modified: 21-JUL-1986, version 1
Entry was last modified: 21-MAR-2006, version 68

Protein description

Protein name: Erythropoietin precursor
Synonyms: Epoetin

Origin of the protein

Gene name: EPO
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae; Homo.

References

[1]
NUCLEOTIDE SEQUENCE [GENOMIC DNA / MRNA].
MEDLINE=85137899; PubMed=3838366;
Jacobs K., Shoemaker C., Rudersdorf R., Neill S.D., Kaufman R.J., Mufson A., Seehra J., Jones S.S., Hewick R., Fritsch E.F., Kawakita M., Shimizu T., Miyake T.;
"Isolation and characterization of genomic and cDNA clones of human erythropoietin.";
Nature 313:806-810(1985).

[2]
NUCLEOTIDE SEQUENCE [GENOMIC DNA].
MEDLINE=86067948; PubMed=3865178;
Lin F.-K., Suggs S., Lin C.-H., Browne J.K., Smalling R., Egrie J.C., Chen K.K., Fox G.M., Martin F., Stabinsky Z., Badrawi S.M., Lai P.-H., Goldwasser E.;
"Cloning and expression of the human erythropoietin gene.";
Proc. Natl. Acad. Sci. U.S.A. 82:7580-7584(1985).

Comments

FUNCTION: Erythropoietin is the principal hormone involved in the regulation of erythrocyte differentiation and the maintenance of a physiological level of circulating erythrocyte mass.

SUBCELLULAR LOCATION: Secreted protein.

TISSUE SPECIFICITY: Produced by kidney or liver of adult mammals and by liver of fetal or neonatal mammals.

PHARMACEUTICAL: Used for the treatment of anemia. Available under the names Epogen (Amgen), Epogin (Chugai), Epomax (Elanex), Eprex (Janssen-Cilag), NeoRecormon or Recormon (Roche), and Procrit (Ortho Biotech). Variations in the glycosylation pattern of EPO distinguishes these products. Epogen, Epogin, Eprex and Procrit are generically known as epoetin alfa, NeoRecormon and Recormon as epoetin beta and Epomax as epoetin omega.

SIMILARITY: Belongs to the EPO/TPO family.

DATABASE: NAME=R&D Systems' cytokine source book: EPO; WWW="http://www.rndsystems.com/asp/g_sitebuilder.asp?bodyId=197".

Cross-references

EMBL

X02158; CAA26095.1; -; Genomic_DNA.
X02157; CAA26094.1; -; mRNA.
M11319; AAA52400.1; -; Genomic_DNA.
AF053356; AAC78791.1; -; Genomic_DNA.
AF202308; AAF23132.1; -; Genomic_DNA.
AF202306; AAF23132.1; JOINED; Genomic_DNA.

AF202307; AAF23132.1; JOINED; Genomic_DNA.

[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF202307">EMBL</a>/<a

     href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AF202307&doptcmdl=GenBank">GenBank</a>/<a
     href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF202307">DDBJ</a>][<a
href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAF23132*%5d">CoDingSequence</a>]

AF202310; AAF23133.1; -; Genomic_DNA.

[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF202310">EMBL</a>/<a

     href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AF202310&doptcmdl=GenBank">GenBank</a>/<a
     href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF202310">DDBJ</a>][<a
href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAF23133*%5d">CoDingSequence</a>]

AF202309; AAF23133.1; JOINED; Genomic_DNA.

[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF202309">EMBL</a>/<a

     href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AF202309&doptcmdl=GenBank">GenBank</a>/<a
     href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF202309">DDBJ</a>][<a
href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAF23133*%5d">CoDingSequence</a>]

AF202311; AAF17572.1; -; Genomic_DNA.

[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF202311">EMBL</a>/<a

     href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AF202311&doptcmdl=GenBank">GenBank</a>/<a
     href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF202311">DDBJ</a>][<a
href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAF17572*%5d">CoDingSequence</a>]

AF202314; AAF23134.1; -; Genomic_DNA.

[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF202314">EMBL</a>/<a

     href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AF202314&doptcmdl=GenBank">GenBank</a>/<a
     href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF202314">DDBJ</a>][<a
href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAF23134*%5d">CoDingSequence</a>]

AF202312; AAF23134.1; JOINED; Genomic_DNA.

[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF202312">EMBL</a>/<a

     href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AF202312&doptcmdl=GenBank">GenBank</a>/<a
     href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF202312">DDBJ</a>][<a
href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAF23134*%5d">CoDingSequence</a>]

AF202313; AAF23134.1; JOINED; Genomic_DNA.

[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF202313">EMBL</a>/<a

     href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AF202313&doptcmdl=GenBank">GenBank</a>/<a
     href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF202313">DDBJ</a>][<a
href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAF23134*%5d">CoDingSequence</a>]

AC009488; AAP22357.1; -; Genomic_DNA.

[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?AC009488">EMBL</a>/<a

     href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AC009488&doptcmdl=GenBank">GenBank</a>/<a
     href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AC009488">DDBJ</a>][<a
href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAP22357*%5d">CoDingSequence</a>]

BC093628; AAH93628.1; -; mRNA.

[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?BC093628">EMBL</a>/<a

     href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=BC093628&doptcmdl=GenBank">GenBank</a>/<a
     href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?BC093628">DDBJ</a>][<a
href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAH93628*%5d">CoDingSequence</a>]

S65458; AAD13964.1; -; mRNA.

[<a href="http://www.ebi.ac.uk/cgi-bin/expasyfetch?S65458">EMBL</a>/<a

     href="http://www.ncbi.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=S65458&doptcmdl=GenBank">GenBank</a>/<a
     href="http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?S65458">DDBJ</a>][<a
href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:AAD13964*%5d">CoDingSequence</a>]

<a href="http://www.expasy.org/cgi-bin/dbxref?PIR">PIR</a>

<a href="http://pir.georgetown.edu/cgi-bin/nbrfget?uid=A01855&xref=1">A01855</a>; ZUHU.

<a href="http://www.expasy.org/cgi-bin/dbxref?PDB">PDB</a>

1BUY; NMR; A=28-193.

[<a href="http://www.expasy.org/cgi-bin/get-pdb.pl?1BUY">ExPASy</a>/<a

     href="http://www.rcsb.org/pdb/cgi/explore.cgi?pdbId=1BUY">RCSB</a>/<a
href="http://www.ebi.ac.uk/msd-srv/atlas?id=1BUY">EBI</a>]

1CN4; X-ray; C=28-193.

[<a href="http://www.expasy.org/cgi-bin/get-pdb.pl?1CN4">ExPASy</a>/<a

     href="http://www.rcsb.org/pdb/cgi/explore.cgi?pdbId=1CN4">RCSB</a>/<a
href="http://www.ebi.ac.uk/msd-srv/atlas?id=1CN4">EBI</a>]

1EER; X-ray; A=28-193.

[<a href="http://www.expasy.org/cgi-bin/get-pdb.pl?1EER">ExPASy</a>/<a

     href="http://www.rcsb.org/pdb/cgi/explore.cgi?pdbId=1EER">RCSB</a>/<a
href="http://www.ebi.ac.uk/msd-srv/atlas?id=1EER">EBI</a>]

<a href="http://www.expasy.org/cgi-bin/dbxref?GlycoSuiteDB">GlycoSuiteDB</a>

P01588; -.

<a href="http://www.expasy.org/cgi-bin/dbxref?Ensembl">Ensembl</a>

ENSG00000130427; Homo sapiens.

[<a

     href="http://www.ensembl.org/Homo_sapiens/geneview?gene=ENSG00000130427">Entry</a>/<a
href="http://www.ensembl.org/Homo_sapiens/contigview?gene=ENSG00000130427">Contig</a>]

<a href="http://www.expasy.org/cgi-bin/dbxref?HGNC">HGNC</a>

<a href="http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/get_data.pl?hgnc_id=3415">HGNC:3415</a>; EPO.

<a href="http://www.expasy.org/cgi-bin/dbxref?MIM">MIM</a>

133170; gene.

[<a

     href="http://www.ncbi.nih.gov/entrez/dispomim.cgi?id=133170">MIM</a>/<a
href="http://srs6.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+%5bomim-id:133170%5d">EBI</a>]

<a href="http://www.expasy.org/cgi-bin/dbxref?GO">GO</a>

Cellular component

extracellular space

<a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0005615">GO:0005615</a>

traceable author statement

Biological process

circulation

<a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0008015">GO:0008015</a>

non-traceable author statement

Biological process

response to stress

<a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0006950">GO:0006950</a>

traceable author statement

Biological process

signal transduction

<a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007165">GO:0007165</a>

non-traceable author statement

[<a href="http://www.ebi.ac.uk/ego/GSearch?mode=protein&ontology=all_ont&query=P01588">QuickGO</a>]

<a href="http://www.expasy.org/cgi-bin/dbxref?InterPro">InterPro</a>

<a href="http://www.ebi.ac.uk/interpro/IEntry?ac=IPR009079">IPR009079</a>; 4_helix_cytokine.

<a href="http://www.ebi.ac.uk/interpro/IEntry?ac=IPR012351">IPR012351</a>; Cytokine_4_hlx.

<a href="http://www.ebi.ac.uk/interpro/IEntry?ac=IPR001323">IPR001323</a>; EPO_TPO.

<a href="http://www.ebi.ac.uk/interpro/IEntry?ac=IPR003013">IPR003013</a>; Erythroptn.

<a

     href="http://www.ebi.ac.uk/interpro/ISpy?mode=single&ac=P01588">Graphical
view of the domain structure</a>

<a href="http://www.expasy.org/cgi-bin/dbxref?PANTHER">PANTHER</a>

<a href="http://www.pantherdb.org/panther/family.do?clsAccession=PTHR10370">PTHR10370</a>; Erythroptn; 1.

<a href="http://www.expasy.org/cgi-bin/dbxref?Pfam">Pfam</a>

<a href="http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00758">PF00758</a>; EPO_TPO; 1.

<a

     href="http://www.sanger.ac.uk/cgi-bin/Pfam/swisspfamget.pl?name=P01588">Pfam
graphical view of domain structure</a>

<a href="http://www.expasy.org/cgi-bin/dbxref?PIRSF">PIRSF</a>

<a href="http://pir.georgetown.edu/cgi-bin/ipcSF?id=PIRSF001951">PIRSF001951</a>; EPO; 1.

<a href="http://www.expasy.org/cgi-bin/dbxref?PRINTS">PRINTS</a>

<a href="http://www.bioinf.man.ac.uk/cgi-bin/dbbrowser/sprint/searchprintss.cgi?display_opts=Prints&category=None&queryform=false&prints_accn=PR00272">PR00272</a>; ERYTHROPTN.

<a href="http://www.expasy.org/cgi-bin/dbxref?PROSITE">PROSITE</a>

<a href="http://www.expasy.org/cgi-bin/nicedoc.pl?PS00817">PS00817</a>; EPO_TPO; 1.
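The EMBL cross-reference lines above each carry four semicolon-separated fields. As a minimal sketch, they could be split and grouped by protein ID as follows. The field names (`accession`, `protein_id`, `status`, `molecule_type`) and the function name are illustrative assumptions, not names taken from the PPLRE data design.

```python
from collections import defaultdict

# A few lines copied from the EMBL cross-references above.
EMBL_LINES = [
    "X02158; CAA26095.1; -; Genomic_DNA.",
    "AF202308; AAF23132.1; -; Genomic_DNA.",
    "AF202306; AAF23132.1; JOINED; Genomic_DNA.",
    "AF202307; AAF23132.1; JOINED; Genomic_DNA.",
    "BC093628; AAH93628.1; -; mRNA.",
]


def parse_embl_xref(line):
    """Split one cross-reference line into four named fields.

    The field names are hypothetical labels chosen for this sketch.
    """
    fields = [f.strip() for f in line.strip().rstrip(".").split(";")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    accession, protein_id, status, molecule_type = fields
    return {
        "accession": accession,
        "protein_id": protein_id,
        "status": status,
        "molecule_type": molecule_type,
    }


# Group nucleotide accessions by the protein ID they point to, so that
# entries marked JOINED end up next to the entry they belong with.
by_protein = defaultdict(list)
for line in EMBL_LINES:
    record = parse_embl_xref(line)
    by_protein[record["protein_id"]].append(record["accession"])

for protein_id, accessions in by_protein.items():
    print(protein_id, accessions)
```

Grouping by protein ID makes it easy to see that several nucleotide accessions can refer to the same coding sequence.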

<a href="http://www.expasy.org/sprot/userman.html#KW_line">Keywords</a>

<a href="http://www.expasy.org/cgi-bin/get-entries?KW=3D-structure">3D-structure</a>

<a

   href="http://www.expasy.org/cgi-bin/get-entries?KW=Direct%20protein%20sequencing">Direct
protein sequencing</a>

<a

   href="http://www.expasy.org/cgi-bin/get-entries?KW=Erythrocyte%20maturation">Erythrocyte
maturation</a>

<a href="http://www.expasy.org/cgi-bin/get-entries?KW=Glycoprotein">Glycoprotein</a>

<a href="http://www.expasy.org/cgi-bin/get-entries?KW=Hormone">Hormone</a>

<a href="http://www.expasy.org/cgi-bin/get-entries?KW=Pharmaceutical">Pharmaceutical</a>

<a href="http://www.expasy.org/cgi-bin/get-entries?KW=Polymorphism">Polymorphism</a>

<a href="http://www.expasy.org/cgi-bin/get-entries?KW=Signal">Signal</a>

<a href="http://www.expasy.org/sprot/userman.html#FT_line">Features</a>

Type

<a href="http://www.expasy.org/sprot/userman.html#FT_position">From-To</a>

Length

Description

<a href="http://www.expasy.org/sprot/userman.html#FTID">Feature ID</a>

<a href="http://www.expasy.org/sprot/userman.html#FT_SIGNAL">SIGNAL</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@SIGNAL@1@27">1-27</a>

27

<a href="http://www.expasy.org/sprot/userman.html#FT_CHAIN">CHAIN</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@CHAIN@28@193">28-193</a>

166

Erythropoietin

PRO_0000008401

<a href="http://www.expasy.org/sprot/userman.html#FT_PROPEP">PROPEP</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@PROPEP@190@193">190-193</a>

4

Removed in mature form (Probable)

<a name="ftidEPO_HUMANPRO_0000008401"></a>PRO_0000008402

<a href="http://www.expasy.org/sprot/userman.html#FT_CARBOHYD">CARBOHYD</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@CARBOHYD@51@51">51-51</a>

N-linked (GlcNAc...)

CAR_000052

<a href="http://www.expasy.org/sprot/userman.html#FT_CARBOHYD">CARBOHYD</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@CARBOHYD@65@65">65-65</a>

N-linked (GlcNAc...)

CAR_000166

<a href="http://www.expasy.org/sprot/userman.html#FT_CARBOHYD">CARBOHYD</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@CARBOHYD@110@110">110-110</a>

N-linked (GlcNAc...)

CAR_000192

<a href="http://www.expasy.org/sprot/userman.html#FT_CARBOHYD">CARBOHYD</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@CARBOHYD@153@153">153-153</a>

O-linked (GalNAc...).

<a href="http://www.expasy.org/sprot/userman.html#FT_DISULFID">DISULFID</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@DISULFID@34@188">34-188</a>

<a href="http://www.expasy.org/sprot/userman.html#FT_DISULFID">DISULFID</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@DISULFID@56@60">56-60</a>

<a href="http://www.expasy.org/sprot/userman.html#FT_VARIANT">VARIANT</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@VARIANT@131@132">131-132</a>

SL -> NF (in an hepatocellular carcinoma)

VAR_009870

<a href="http://www.expasy.org/sprot/userman.html#FT_VARIANT">VARIANT</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@VARIANT@149@149">149-149</a>

P -> Q (in an hepatocellular carcinoma)

<a href="http://www.expasy.org/cgi-bin/get-sprot-variant.pl?VAR_009871">VAR_009871</a>

<a href="http://www.expasy.org/sprot/userman.html#FT_CONFLICT">CONFLICT</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@CONFLICT@40@40">40-40</a>

E -> Q (in Ref. <a

   href="http://www.ebi.uniprot.org/entry/EPO_HUMAN#cit1EPO_HUMAN">1</a>; <a
href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:CAA26095*%5d">CAA26095</a>).

<a href="http://www.expasy.org/sprot/userman.html#FT_CONFLICT">CONFLICT</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@CONFLICT@85@85">85-85</a>

Q -> QQ (in Ref. <a href="http://www.ebi.uniprot.org/entry/EPO_HUMAN#cit7EPO_HUMAN">7</a>).

<a href="http://www.expasy.org/sprot/userman.html#FT_CONFLICT">CONFLICT</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@CONFLICT@140@140">140-140</a>

G -> R (in Ref. <a href="http://www.ebi.uniprot.org/entry/EPO_HUMAN#cit1EPO_HUMAN">1</a>; <a
href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+%5b%7bEMBL%7d-ProteinID:CAA26095*%5d">CAA26095</a>).

<a href="http://www.expasy.org/sprot/userman.html#FT_STRAND">STRAND</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@STRAND@31@31">31-31</a>

1

<a href="http://www.expasy.org/sprot/userman.html#FT_HELIX">HELIX</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@HELIX@32@34">32-34</a>

3

<a href="http://www.expasy.org/sprot/userman.html#FT_HELIX">HELIX</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@HELIX@141@147">141-147</a>

7

<a href="http://www.expasy.org/sprot/userman.html#FT_TURN">TURN</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@TURN@148@149">148-149</a>

2

<a href="http://www.expasy.org/sprot/userman.html#FT_STRAND">STRAND</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@STRAND@151@151">151-151</a>

1

<a href="http://www.expasy.org/sprot/userman.html#FT_STRAND">STRAND</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@STRAND@160@164">160-164</a>

5

<a href="http://www.expasy.org/sprot/userman.html#FT_HELIX">HELIX</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@HELIX@165@177">165-177</a>

13

<a href="http://www.expasy.org/sprot/userman.html#FT_TURN">TURN</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@TURN@178@178">178-178</a>

1

<a href="http://www.expasy.org/sprot/userman.html#FT_HELIX">HELIX</a>

<a href="http://www.expasy.org/cgi-bin/sprot-ft-details.pl?P01588@HELIX@179@188">179-188</a>

10
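The Length column in the feature table above appears to follow from the From and To positions as an inclusive range, that is, `to - from + 1`. The sketch below checks that relationship for the rows that list a length. The tuple layout and the function name are illustrative assumptions.

```python
# (type, from, to, listed_length) for rows of the table above that show a length.
FEATURES = [
    ("SIGNAL", 1, 27, 27),
    ("CHAIN", 28, 193, 166),
    ("PROPEP", 190, 193, 4),
    ("HELIX", 32, 34, 3),
    ("HELIX", 141, 147, 7),
    ("TURN", 148, 149, 2),
    ("STRAND", 160, 164, 5),
    ("HELIX", 165, 177, 13),
    ("HELIX", 179, 188, 10),
]


def feature_length(start, end):
    """Length of a feature whose From and To positions are both inclusive."""
    return end - start + 1


# Collect any row whose listed length disagrees with the computed one.
mismatches = [
    (ftype, start, end, listed)
    for ftype, start, end, listed in FEATURES
    if feature_length(start, end) != listed
]
print("mismatches:", mismatches)

# The signal peptide and the chain together should cover the full 193 residues.
print(feature_length(1, 27) + feature_length(28, 193))
```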

<a href="http://www.expasy.org/sprot/userman.html#SQ_line">Sequence information</a>

<a href="http://www.expasy.org/sprot/userman.html#SQ_line">Length</a>

193 AA

<a href="http://www.expasy.org/sprot/userman.html#SQ_line">Molecular weight</a><a name=seqP01588></a>

21307 Da

<a href="http://www.expasy.org/sprot/userman.html#SQ_line">CRC64</a>

C91F0E4C26A52033 [This is a checksum on the sequence]

MGVHECPAWL WLLLSLLSLP LGLPVLGAPP RLICDSRVLE RYLLEAKEAE 50
NITTGCAEHC SLNENITVPD TKVNFYAWKR MEVGQQAVEV WQGLALLSEA 100
VLRGQALLVN SSQPWEPLQL HVDKAVSGLR SLTTLLRALG AQKEAISPPD 150
AASAAPLRTI TADTFRKLFR VYSNFLRGKL KLYTGEACRT GDR 193
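A pasted sequence block like the one above can be sanity-checked against the rest of the entry. The sketch below strips position counters and leftover markup, confirms the 193 AA length, and checks that the residues at the annotated glycosylation and disulfide positions are the expected amino acids. It also computes a CRC64, assuming the checksum is the ISO 3309-style CRC-64 with generator polynomial x^64 + x^4 + x^3 + x + 1 described in the Swiss-Prot user manual. All function and variable names are illustrative.

```python
import re

RAW = """
MGVHECPAWL WLLLSLLSLP LGLPVLGAPP RLICDSRVLE RYLLEAKEAE 50
NITTGCAEHC SLNENITVPD TKVNFYAWKR MEVGQQAVEV WQGLALLSEA 100
VLRGQALLVN SSQPWEPLQL HVDKAVSGLR SLTTLLRALG AQKEAISPPD 150
AASAAPLRTI TADTFRKLFR VYSNFLRGKL KLYTGEACRT GDR 193
"""
LISTED_CRC64 = "C91F0E4C26A52033"  # value shown in the entry above


def clean_sequence(raw):
    """Remove markup such as <BR>, then everything that is not A-Z."""
    without_tags = re.sub(r"<[^>]*>", "", raw)
    return re.sub(r"[^A-Z]", "", without_tags)


def residue(seq, position):
    """Return the residue at a 1-based position, as used in the feature table."""
    return seq[position - 1]


# Assumed polynomial (reflected form) for the Swiss-Prot style CRC64.
POLY = 0xD800000000000000


def _crc64_table():
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _crc64_table()


def crc64(seq):
    """Table-driven CRC-64 with initial value 0, returned as 16 hex digits."""
    crc = 0
    for ch in seq:
        crc = _TABLE[(crc ^ ord(ch)) & 0xFF] ^ (crc >> 8)
    return f"{crc:016X}"


SEQ = clean_sequence(RAW)

# Positions taken from the CARBOHYD and DISULFID rows of the feature table.
EXPECTED_RESIDUES = {51: "N", 65: "N", 110: "N", 153: "S",
                     34: "C", 188: "C", 56: "C", 60: "C"}

print("length:", len(SEQ))
for pos, aa in sorted(EXPECTED_RESIDUES.items()):
    print(pos, residue(SEQ, pos), "ok" if residue(SEQ, pos) == aa else "MISMATCH")
print("CRC64 matches listed value:", crc64(SEQ) == LISTED_CRC64)
```

If the CRC64 comparison fails, the polynomial assumption or the pasted sequence should be rechecked before drawing any conclusion about the entry.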

PDF file conversion

An hour of experimentation with a few tools led to the selection of Xpdf.
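As a rough sketch of how a conversion step might be scripted, the following wraps Xpdf's `pdftotext` command-line tool. The page does not record the invocation actually used in the project, so the `-layout` option, the function names, and the file paths are assumptions for illustration only.

```python
import subprocess


def pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Build the argument list for Xpdf's pdftotext (hypothetical helper)."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to preserve the physical page layout.
        cmd.append("-layout")
    cmd += [pdf_path, txt_path]
    return cmd


def convert(pdf_path, txt_path):
    """Run pdftotext; raises CalledProcessError if the tool reports failure."""
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)


# Example of the command that would be run for one (hypothetical) article.
print(pdftotext_command("article.pdf", "article.txt"))
```

Calling `convert` requires `pdftotext` to be installed and on the `PATH`.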

Protein Information Resource (PIR) Center

TextMining on Bio

KDDCup08 Proposal

* KDDCup08 Proposal