Research Glossary

• THIS PAGE IS OUT-OF-DATE. ONE DAY IT WILL BE AUTOMATICALLY GENERATED

O

• Ontology: An Ontology is a formal representation of knowledge within a specific domain.

S

• Statistically Independent: Two Events are Statistically Independent if the probability that they both occur is equal to the product of the probabilities that each occurs individually.
• AKA: Stochastically Independent
• Context:
• In symbols, events A and B are statistically independent if P(A ∧ B) = P(A)·P(B), where P(A ∧ B) is the probability that both occur.
• Alternatively, two events are independent if the discovery that one of the events has occurred does not help you determine whether the other event has also occurred: i.e., P(A|B) = P(A), provided P(B) > 0.
• See: Probability, Conditional Probability, Independent Variable.
• Structured Data:.
• Subsumption: A subsumption relation specifies the relative generality of two concepts. More formally, concept [math]A[/math] subsumes concept [math]B[/math] if the definitions of [math]A[/math] and [math]B[/math] logically imply that members of [math]B[/math] must also be members of [math]A[/math].
• Synonym: A word [math]x[/math] is a Synonym of another word [math]y[/math] if they are both similar enough in meaning that they can be interchanged in some situations without loss of meaning.
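
For the Statistically Independent entry above, the product rule can be checked by brute force on a small sample space. The following is a minimal sketch, assuming two fair six-sided dice as the example; the event names and helper functions are illustrative and not part of the glossary.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: all ordered outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate over outcomes), uniform distribution."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

def independent(a, b):
    """True if P(A and B) equals P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

# Illustrative events (hypothetical names).
first_is_even = lambda o: o[0] % 2 == 0
second_is_six = lambda o: o[1] == 6
sum_is_twelve = lambda o: o[0] + o[1] == 12

print(independent(first_is_even, second_is_six))  # True
print(independent(first_is_even, sum_is_twelve))  # False
```

Exact fractions are used instead of floats so that the equality test is not affected by rounding.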

T

• Table: In relational databases, a structure (table) that contains a set of records (tuples).
• See: Predicate, Relation.
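
As a small illustration of a table holding a set of records, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name, columns, and rows are made up for the example.

```python
import sqlite3

# In-memory database; the "glossary" table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE glossary (term TEXT PRIMARY KEY, letter TEXT)")
conn.executemany(
    "INSERT INTO glossary VALUES (?, ?)",
    [("Table", "T"), ("Taxonomy", "T"), ("XML", "X")],
)

# Each row returned is one record (tuple) of the table.
rows = conn.execute(
    "SELECT term FROM glossary WHERE letter = 'T' ORDER BY term"
).fetchall()
print(rows)  # [('Table',), ('Taxonomy',)]
```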

• Taxonomy: A hierarchical classification of concepts, typically for a specific domain. The primary semantic organizing principle of taxonomies is class inclusion (is-a, or subsumption, relationships). Examples of taxonomies include the tree of life and library catalogues.
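
The is-a organization of a taxonomy can be sketched as a parent map, with subsumption checked by walking up from the more specific concept. The concepts below are a hypothetical toy example, and treating a concept as subsuming itself is an assumption of this sketch.

```python
# Hypothetical toy taxonomy: each concept maps to its direct parent (is-a link).
PARENT = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
    "animal": None,  # root of the taxonomy
}

def subsumes(general, specific):
    """True if `general` is `specific` itself or one of its ancestors."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT[node]
    return False

print(subsumes("animal", "dog"))      # True
print(subsumes("mammal", "sparrow"))  # False
```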

• Template: (AKA: forms or 'frames') The common table-like structure that is filled in during information extraction tasks. The elements of templates are often referred to as slots. Occasionally, information extraction is referred to as a "template filling" or "slot filling" exercise.
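
A template with slots can be sketched as a record whose fields start empty and are filled from text. The seminar-announcement slots and the regular expressions below are assumptions made for illustration, not a description of any particular extraction system.

```python
import re
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class SeminarTemplate:
    """Hypothetical template; each field is one slot."""
    speaker: Optional[str] = None
    location: Optional[str] = None
    start_time: Optional[str] = None

# Illustrative patterns, one per slot (assumed input format "Label: value").
PATTERNS = {
    "speaker": re.compile(r"Speaker:\s*(.+)"),
    "location": re.compile(r"Location:\s*(.+)"),
    "start_time": re.compile(r"Time:\s*(\d{1,2}:\d{2})"),
}

def fill_template(text):
    """Fill every slot whose pattern matches; unmatched slots stay None."""
    template = SeminarTemplate()
    for slot, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            setattr(template, slot, match.group(1).strip())
    return template

announcement = "Speaker: Ada Lovelace\nTime: 14:30\n"
filled = fill_template(announcement)
print(asdict(filled))
# {'speaker': 'Ada Lovelace', 'location': None, 'start_time': '14:30'}
```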

• Text Mining: The automated discovery of interesting patterns from human-readable sources. Typical sources include the Web, email, corporate databases with text information, and publication databases such as Citeseer and MEDLINE. Text mining is sometimes referred to as data mining on unstructured text data.
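• TF-IDF: TF-IDF (term frequency-inverse document frequency) is a weighting function that estimates how well a term describes a document. The weight grows with the term's frequency in the document and shrinks as the term appears in more documents of the collection.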
• Top-Down Learning: A learning technique that starts from a general rule and proceeds by specializing it.
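
For the TF-IDF entry above, one common variant multiplies a length-normalized term frequency by the logarithm of the inverse document frequency. The sketch below assumes that variant and a made-up three-document collection; other weightings and smoothing schemes exist.

```python
import math
from collections import Counter

# Hypothetical toy collection, already tokenized.
docs = {
    "d1": "the cat sat on the mat".split(),
    "d2": "the dog chased the cat".split(),
    "d3": "the stock market fell".split(),
}

def tf_idf(term, doc_id):
    """tf = count / document length; idf = log(N / document frequency)."""
    words = docs[doc_id]
    tf = Counter(words)[term] / len(words)
    df = sum(1 for w in docs.values() if term in w)
    if df == 0:
        return 0.0  # term never occurs in the collection
    idf = math.log(len(docs) / df)
    return tf * idf

print(tf_idf("the", "d1"))                        # 0.0
print(tf_idf("mat", "d1") > tf_idf("cat", "d1"))  # True
```

A term that occurs in every document ("the") scores zero, while a term confined to one document ("mat") scores higher than one shared by two ("cat").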

• TREC: Text Retrieval Conference. A conference sponsored by NIST with tracks in Information Extraction, Question Answering, and other NLP tasks.

U

• Unstructured Data: Unstructured Data is Data that is not in a format that is amenable to computer processing.
• Context:
• It is usable by humans.
• It can be a Document.
• It can be an audio file.
• It can be a video file.
• Example(s):
• Wikipedia
• A television news program
• An internet news story
• A telephone conversation
• See: Data, Structured Data

W

• WebKB: A project started in the late 1990s by Tom M. Mitchell to develop a knowledge base that mirrors the content of the Web.
• See: [Craven et al. 1998]

• Word Play: Word Play is the intentional use of words that connote multiple meanings.
• Example(s): The phrase "Just for the pun of it"; the title of the play "The Importance of Being Earnest".
• See: Word Sense Disambiguation, Polysemous, Homonym

X

• XML: XML (Extensible Markup Language) is a standard that facilitates the exchange of structured data.
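
As a minimal sketch of exchanging structured data through XML, the example below serializes a record to text and parses it back with Python's standard xml.etree.ElementTree module. The element and attribute names (entry, term, definition) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Build a small record; element and attribute names are illustrative.
entry = ET.Element("entry", attrib={"term": "XML"})
ET.SubElement(entry, "definition").text = (
    "A standard that facilitates the exchange of structured data."
)

# Serialize to a string, as a sender would.
xml_text = ET.tostring(entry, encoding="unicode")

# Parse it back, as a receiver would.
parsed = ET.fromstring(xml_text)
print(parsed.get("term"))              # XML
print(parsed.find("definition").text)
```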