Snowball Internal Parameters

From GM-RKB

See: Snowball

  • initialMaxDocuments
    • Value=30
  • secondaryMaxDocuments
    • Value=30
  • tertiaryMaxDocuments
    • Value=5000


  • leftWindowSize
    • The number of terms on the left side of the pattern
    • Default=2
  • rightWindowSize
    • The number of terms on the right side of the pattern
    • Default=2
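
One way to read the two window sizes, as a minimal sketch: given a token list and the positions of the two tagged entities, keep up to leftWindowSize tokens before the first entity and up to rightWindowSize tokens after the second. The function name split_context, its arguments, and the sample sentence are illustrative assumptions, not taken from Snowball's code.

```python
def split_context(tokens, first_entity, second_entity,
                  left_window_size=2, right_window_size=2):
    """Split the tokens around two tagged entities into left, middle, right.

    first_entity and second_entity are token indexes (hypothetical inputs),
    with first_entity < second_entity.
    """
    # Up to left_window_size tokens immediately before the first entity.
    left = tokens[max(0, first_entity - left_window_size):first_entity]
    # Everything strictly between the two entities.
    middle = tokens[first_entity + 1:second_entity]
    # Up to right_window_size tokens immediately after the second entity.
    right = tokens[second_entity + 1:second_entity + 1 + right_window_size]
    return left, middle, right


tokens = "the company ORGANIZATION is based in LOCATION , near the river".split()
left, middle, right = split_context(tokens, 2, 6)
print(left, middle, right)
```

With the defaults of 2, the left and right contexts never hold more than two terms each, while the middle context is not limited by either window.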


  • stopwords
    • Values=growth,>, <, /,PROTEIN,ORGANISM,PERSON,ENAMEX,ORGANIZATION,LOCATION,LOCALIZATION,DOCID,plainText,TEXT,</timex>,<timex,</TIMEX>,</enamex>,<text>,</text>,</docid>,<docid>, </TEXT>, <TEXT>, <NUMEX, </NUMEX>, <TIMEX, a,ability,about, above, accordingly, after, again, ...


  • proximityWindowSize
    • If there are more than proximityWindowSize characters between the start and end tags, the relationship is considered invalid and is ignored.
    • Default=500
  • leftTermWeight
    • Weight for the left terms
    • Default=0.2
  • middleTermWeight
    • Weight for the middle terms
    • Default=0.6
  • rightTermWeight
    • Weight for the right terms
    • Default=0.2
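
The three term weights and the proximity check could combine roughly as follows. This is a sketch under assumptions: it scores two contexts as a weighted sum of per-part cosine similarities over raw term counts, which may differ from how Snowball actually builds and compares its term vectors. The names cosine, weighted_match, and within_proximity are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_match(ctx_a, ctx_b, left_w=0.2, middle_w=0.6, right_w=0.2):
    """Weighted sum of the left, middle, and right similarities (assumed form)."""
    weights = {"left": left_w, "middle": middle_w, "right": right_w}
    return sum(
        w * cosine(Counter(ctx_a[part]), Counter(ctx_b[part]))
        for part, w in weights.items()
    )


def within_proximity(start_tag_offset, end_tag_offset, proximity_window_size=500):
    """True unless the tags are more than proximity_window_size characters apart."""
    return abs(end_tag_offset - start_tag_offset) <= proximity_window_size
```

Because the default weights sum to 1.0, two identical contexts score about 1.0 under this form, and contexts that agree only in the middle terms score about 0.6.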


  • PatternExtractor.proximityWindowSize
    • Value=500


  • Clusterer.minSupport
    • The minimum number of tuples to support a pattern
    • Value=1
  • Clusterer.minSimilarity
    • The minimum similarity required for two patterns to be placed in the same cluster. If no existing cluster reaches this similarity, a new cluster is created.
    • Default=0.6
  • Clusterer.minTermSupport
    • The minimum term support in a clustered pattern
    • Default=2
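
The role of Clusterer.minSimilarity can be illustrated with a single-pass threshold clustering sketch. Several things here are assumptions rather than Snowball's confirmed behavior: Jaccard overlap stands in for the real similarity measure, each cluster is represented by its first member, and min_support is applied to the number of member patterns rather than to supporting tuples. Clusterer.minTermSupport is not modeled.

```python
def jaccard(a, b):
    """Stand-in similarity: overlap of two term collections (an assumption)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_patterns(patterns, similarity=jaccard, min_similarity=0.6, min_support=1):
    """Assign each pattern to its most similar cluster, or start a new one."""
    clusters = []
    for pattern in patterns:
        best, best_score = None, 0.0
        for cluster in clusters:
            # Compare against the cluster's first member as its representative.
            score = similarity(pattern, cluster[0])
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= min_similarity:
            best.append(pattern)
        else:
            # No existing cluster is similar enough: create a new cluster.
            clusters.append([pattern])
    # Drop clusters with too few members.
    return [c for c in clusters if len(c) >= min_support]
```

Raising min_similarity produces more, smaller clusters; raising min_support then discards the clusters that stayed small.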


  • TupleExtractor.minSimilarity
    • The minimum similarity necessary between a pattern and a tuple.
    • Default=0.05
  • TupleExtractor.minSupport
    • The minimum number of patterns needed to support a tuple.
    • Default=1
  • TupleExtractor.proximityWindowSize
    • Default=500
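
A minimal sketch of how TupleExtractor.minSimilarity and TupleExtractor.minSupport might interact: a candidate tuple's context is compared with every pattern, patterns that reach the similarity threshold count as supporters, and the tuple is kept only if it has enough supporters. The function names and the Jaccard stand-in for the similarity measure are assumptions for illustration.

```python
def jaccard(a, b):
    """Stand-in similarity between two term collections (an assumption)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def supporting_patterns(candidate_context, patterns,
                        similarity=jaccard, min_similarity=0.05):
    """Patterns whose similarity to the candidate context reaches the threshold."""
    return [p for p in patterns if similarity(candidate_context, p) >= min_similarity]


def accept_tuple(candidate_context, patterns,
                 similarity=jaccard, min_similarity=0.05, min_support=1):
    """Keep the candidate tuple only if enough patterns support it."""
    supporters = supporting_patterns(candidate_context, patterns,
                                     similarity, min_similarity)
    return len(supporters) >= min_support
```

With the defaults listed above (0.05 and 1), a single weakly matching pattern is enough to accept a tuple under this reading, which leaves most of the filtering to the later reweighting step.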


  • Reweighter.patternWeight
    • Default=0.5
  • Reweighter.minPatternRank
    • The minimum rank for a pattern to be kept in the system
    • Default=0.2
  • Reweighter.tupleWeight
    • The weight applied to the tuple rank from the previous iteration.
    • Default=0.5
  • Reweighter.minTupleRank
    • The minimum rank for a tuple to be kept in the system
    • Default=0.06
  • Reweighter.minSeedConf
    • Default=0.3
  • Reweighter.numTopSeed
    • Default=200
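
The Reweighter parameters suggest a blend-then-filter step, sketched below under explicit assumptions. Only the description of tupleWeight comes from this page; treating patternWeight the same way, and reading minSeedConf and numTopSeed as "keep tuples ranked at least minSeedConf, then take the top numTopSeed as seeds", are guesses based on the parameter names. All function names and the sample tuples are hypothetical.

```python
def blend_rank(previous_rank, current_rank, previous_weight=0.5):
    """Mix the previous iteration's rank with the current one."""
    return previous_weight * previous_rank + (1.0 - previous_weight) * current_rank


def reweight(previous, current, previous_weight, min_rank):
    """Blend ranks per item and drop items that fall below min_rank."""
    updated = {}
    for key, current_rank in current.items():
        # Items with no previous rank simply keep their current rank.
        rank = blend_rank(previous.get(key, current_rank), current_rank, previous_weight)
        if rank >= min_rank:
            updated[key] = rank
    return updated


def select_seeds(tuple_ranks, min_seed_conf=0.3, num_top_seed=200):
    """Assumed seed selection: confident tuples, best first, capped in number."""
    eligible = [(t, r) for t, r in tuple_ranks.items() if r >= min_seed_conf]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [t for t, _ in eligible[:num_top_seed]]


previous = {("Microsoft", "Redmond"): 0.8, ("Foo", "Bar"): 0.1}
current = {("Microsoft", "Redmond"): 0.6, ("Foo", "Bar"): 0.0,
           ("Apple", "Cupertino"): 0.4}
# Tuple defaults from the list above: tupleWeight=0.5, minTupleRank=0.06.
tuples = reweight(previous, current, previous_weight=0.5, min_rank=0.06)
seeds = select_seeds(tuples)
```

For patterns, the same reweight call would be used with patternWeight=0.5 and minPatternRank=0.2, again assuming the two weights play symmetric roles.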