Sentence Embedding Space

A Sentence Embedding Space is a text-item embedding space composed of sentence embedding items (for sentence items) created by a sentence embedding encoder.
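Once a sentence embedding encoder has mapped sentences to vectors, the space is usually queried by vector similarity. Below is a minimal sketch that assumes cosine similarity as the metric. The three-dimensional vectors are hypothetical stand-ins for real encoder output, and `SPACE`, `cosine`, and `nearest` are illustrative names, not part of any library.

```python
import math

# Hypothetical hand-made vectors standing in for the output of a real sentence
# embedding encoder (real models produce hundreds of dimensions).
SPACE = {
    "The contract terminates on 31 December.": [0.9, 0.1, 0.0],
    "This agreement ends at the close of the year.": [0.8, 0.2, 0.1],
    "The cat sat on the mat.": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query_vec, space, k=1):
    """Return the k sentences whose embeddings are most cosine-similar to query_vec."""
    ranked = sorted(space, key=lambda s: cosine(query_vec, space[s]), reverse=True)
    return ranked[:k]


# Query with the first sentence's embedding; its paraphrase should rank right after it.
query = SPACE["The contract terminates on 31 December."]
print(nearest(query, SPACE, k=2))
```

The paraphrase lands close to the query while the unrelated sentence lands far away. That proximity structure is what makes such a space useful for retrieval and clustering.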

Context:
- It can (typically) be associated with a Sentence Encoding System (that uses a sentence embedding model).
- …
Example(s):
- One associated with OpenAI's "text-embedding-3-small" embedding model.
- One associated with an S-BERT Model (SBERT).
- A Domain-Specific Sentence Embedding Space, such as:
  - a Contract Sentence Embedding Space.
- ...
Counter-Example(s):
See: Lexical Co-Occurrence Matrix, Distributional Word Vector, Vector Space Model.

References

2024

(Wikipedia, 2024) ⇒ https://en.wikipedia.org/wiki/Sentence_embedding Retrieved:2024-2-10.
- In natural language processing, a sentence embedding refers to a numeric representation of a sentence in the form of a vector of real numbers which encodes meaningful semantic information. ^[1] ^[2] ^[3] State-of-the-art embeddings are based on the learned hidden-layer representations of dedicated sentence transformer models. BERT pioneered an approach involving the use of a dedicated [CLS] token prepended to the beginning of each sentence input into the model; the final hidden state vector of this token encodes information about the sentence and can be fine-tuned for use in sentence classification tasks. In practice, however, BERT's sentence embedding with the [CLS] token achieves poor performance, often worse than simply averaging non-contextual word embeddings. SBERT later achieved superior sentence embedding performance by fine-tuning BERT's [CLS] token embeddings through the use of a siamese neural network architecture on the SNLI dataset. Other approaches are loosely based on the idea of distributional semantics applied to sentences. Skip-Thought trains an encoder-decoder structure for the task of predicting neighboring sentences, though this has been shown to achieve worse performance than approaches such as InferSent or SBERT. An alternative direction is to aggregate word embeddings, such as those returned by Word2vec, into sentence embeddings. The most straightforward approach is to simply compute the average of word vectors, known as continuous bag-of-words (CBOW). However, more elaborate solutions based on word vector quantization have also been proposed. One such approach is the vector of locally aggregated word embeddings (VLAWE), which demonstrated performance improvements in downstream text classification tasks.
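The quantization-based aggregation mentioned above can be illustrated with a VLAD-style sketch. Each word vector is assigned to its nearest centroid in a codebook. The residuals (word vector minus centroid) are summed per centroid, and the sums are concatenated into one sentence vector. The sketch assumes a precomputed codebook, for example from k-means over word vectors. The toy codebook and vectors are hypothetical. The published VLAWE method's exact normalization choices are not reproduced; only a plain L2 normalization is applied.

```python
import math


def nearest_centroid(vec, codebook):
    """Index of the codebook centroid closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])),
    )


def vlad_sentence_embedding(word_vectors, codebook):
    """Aggregate a sentence's word vectors into one VLAD-style vector.

    Assumption: `codebook` is precomputed (e.g. k-means centroids of word vectors).
    Each word is assigned to its nearest centroid. For every centroid the residuals
    (word vector minus centroid) of its assigned words are summed. The k per-centroid
    sums are concatenated (length k * dim) and L2-normalized.
    """
    dim = len(codebook[0])
    residual_sums = [[0.0] * dim for _ in codebook]
    for vec in word_vectors:
        i = nearest_centroid(vec, codebook)
        for d in range(dim):
            residual_sums[i][d] += vec[d] - codebook[i][d]
    flat = [x for block in residual_sums for x in block]
    norm = math.sqrt(sum(x * x for x in flat))
    return [x / norm for x in flat] if norm > 0 else flat


# Toy example: a hypothetical 2-centroid codebook over 2-d word vectors.
codebook = [[0.0, 0.0], [1.0, 1.0]]
sentence = [[0.1, 0.0], [0.9, 1.2], [1.0, 0.8]]
print(vlad_sentence_embedding(sentence, codebook))
```

Unlike plain averaging, the output length grows with the codebook size (k × dim). It records how the words deviate from each centroid, not just their overall mean.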

2018

(Wolf, 2018b) ⇒ Thomas Wolf. (2018). “The Current Best of Universal Word Embeddings and Sentence Embeddings.” Blog post.
- QUOTE: Word and sentence embeddings have become an essential part of any Deep-Learning-based natural language processing systems. They encode words and sentences 📜 in fixed-length dense vectors 📐 to drastically improve the processing of textual data. A huge trend is the quest for Universal Embeddings: embeddings that are pre-trained on a large corpus and can be plugged into a variety of downstream task models (sentiment analysis, classification, translation…) to automatically improve their performance by incorporating some general word/sentence representations learned on the larger dataset. It’s a form of transfer learning. Transfer learning has been recently shown to drastically increase the performance of NLP models on important tasks such as text classification. …
  … There are currently many competing schemes for learning sentence embeddings. While simple baselines like averaging word embeddings consistently give strong results, a few novel unsupervised and supervised approaches, as well as multi-task learning schemes, have emerged in late 2017-early 2018 and led to interesting improvements. Let’s go quickly through the four types of approaches currently studied: from simple word vector averaging baselines to unsupervised/supervised approaches and multi-task learning schemes (as illustrated above). There is a general consensus in the field that the simple approach of directly averaging a sentence’s word vectors (the so-called Bag-of-Words approach) gives a strong baseline for many downstream tasks. A good algorithm for computing such a baseline is detailed in the work of Arora et al. published last year at ICLR, A Simple but Tough-to-Beat Baseline for Sentence Embeddings: use popular word embeddings of your choice, encode a sentence as a linear weighted combination of the word vectors, and perform a common component removal (remove the projection of the vectors on their first principal component). This general method has deep and powerful theoretical motivations that rely on a generative model which uses a random walk on a discourse vector to generate text.
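The Arora et al. recipe quoted above has two steps: a weighted average of word vectors, then removal of a common component. Below is a minimal pure-Python sketch. It assumes the smooth-inverse-frequency weight a / (a + p(w)) from that paper, where p(w) is an estimated unigram probability. The toy vectors and probabilities, the default a = 1e-3, and the function names are hypothetical. The common component is taken here as the top singular vector of the uncentered matrix of sentence embeddings, found by power iteration.

```python
import math


def sif_weighted_average(tokens, word_vectors, word_prob, a=1e-3):
    """Average of word vectors, each weighted by a / (a + p(w)).

    Frequent words (large p(w)) get small weights. Tokens without a vector are skipped.
    """
    known = [t for t in tokens if t in word_vectors]
    dim = len(next(iter(word_vectors.values())))
    v = [0.0] * dim
    for t in known:
        weight = a / (a + word_prob.get(t, 0.0))
        for d in range(dim):
            v[d] += weight * word_vectors[t][d]
    return [x / len(known) for x in v] if known else v


def remove_first_component(embeddings, iters=100):
    """Subtract each embedding's projection on the top singular vector u.

    u is found by power iteration on X^T X, where X has the embeddings as rows
    (no mean-centering). Returns (new_embeddings, u).
    """
    dim = len(embeddings[0])
    u = [1.0] * dim
    for _ in range(iters):
        scores = [sum(e[d] * u[d] for d in range(dim)) for e in embeddings]  # X u
        u = [sum(s * e[d] for s, e in zip(scores, embeddings)) for d in range(dim)]  # X^T X u
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            return embeddings, u
        u = [x / norm for x in u]
    out = []
    for e in embeddings:
        proj = sum(e[d] * u[d] for d in range(dim))
        out.append([e[d] - proj * u[d] for d in range(dim)])
    return out, u


# Hypothetical toy vectors and unigram probabilities.
word_vectors = {
    "the": [1.0, 0.0, 0.1],
    "cat": [0.2, 0.9, 0.0],
    "dog": [0.1, 0.8, 0.3],
    "sat": [0.0, 0.3, 0.9],
}
word_prob = {"the": 0.05, "cat": 0.001, "dog": 0.001, "sat": 0.002}
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]

embeddings = [sif_weighted_average(s, word_vectors, word_prob) for s in sentences]
sif, u = remove_first_component(embeddings)
print(sif)
```

Power iteration only keeps the sketch dependency-free. With NumPy, the same direction could be taken from `numpy.linalg.svd`.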

2017

(Nikhil, 2017) ⇒ Nishant Nikhil. (2017). “Sentence Embedding.”
- QUOTE: … One way to get a representation of a sentence is to add up the vectors of all the words it contains; this is termed the word centroid. The similarity between two sentences can then be computed from the distance between their centroids. The same approach can be extended to paragraphs and documents. However, this method neglects a lot of information, such as word order, and can give misleading results. For example, the following two sentences contain exactly the same words, so they get identical centroids despite having opposite meanings:
  - You are going there to teach not play.
  - You are going there to play not teach.
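The word-order problem in the two example sentences can be checked directly. This is a minimal sketch that assumes a hypothetical toy table of 2-dimensional word vectors. It averages the vectors rather than summing them, which does not change the cosine similarity. Because both sentences contain the same words, their centroids coincide, and the similarity comes out as 1 (up to floating-point rounding).

```python
import math

# Hypothetical 2-d word vectors; any fixed lookup table shows the same effect.
WORD_VECTORS = {
    "you": [0.1, 0.3], "are": [0.2, 0.1], "going": [0.5, 0.2],
    "there": [0.3, 0.3], "to": [0.1, 0.1], "teach": [0.9, 0.1],
    "not": [-0.4, 0.2], "play": [0.1, 0.8],
}


def centroid(sentence):
    """Average the vectors of a sentence's words (the 'word centroid')."""
    tokens = sentence.lower().rstrip(".").split()
    vectors = [WORD_VECTORS[t] for t in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


a = centroid("You are going there to teach not play.")
b = centroid("You are going there to play not teach.")
# Same multiset of words, so the centroids match (up to float rounding)
# and the cosine similarity is 1 even though the meanings are opposite.
print(cosine(a, b))
```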

2015

(Kiros et al., 2015) ⇒ Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. (2015). “Skip-thought Vectors.” In: Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS-2015).
- QUOTE: We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage. Sentences that share semantic and syntactic properties are thus mapped to similar vector representations. We next introduce a simple vocabulary expansion method to encode words that were not seen as part of training, allowing us to expand our vocabulary to a million words. After training our model, we extract and evaluate our vectors with linear models on 8 tasks: semantic relatedness, paraphrase detection, image-sentence ranking, question-type classification and 4 benchmark sentiment and subjectivity datasets.
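The "continuity of text" idea amounts to turning running text into (previous, current, next) sentence triples. The encoder reads the current sentence, and the decoders try to reconstruct its neighbors. This minimal sketch covers only that data-preparation step. The sentences and the function name `skip_thought_triples` are made up for illustration, and the encoder-decoder network itself is not shown.

```python
def skip_thought_triples(sentences):
    """Build (previous, current, next) training tuples from consecutive sentences.

    The encoder would read `current`; the decoders would try to reconstruct
    `previous` and `next`. Only this data-preparation step is shown here.
    """
    return [
        (sentences[i - 1], sentences[i], sentences[i + 1])
        for i in range(1, len(sentences) - 1)
    ]


# Made-up stand-in for a few consecutive sentences from a book.
book = [
    "I got back home.",
    "I could see the cat on the steps.",
    "This was strange.",
    "It had never come to the house before.",
]
for prev_s, cur_s, next_s in skip_thought_triples(book):
    print(cur_s, "->", prev_s, "|", next_s)
```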

[1] Paper Summary: Evaluation of sentence embeddings in downstream and linguistic probing tasks

[2] The Current Best of Universal Word Embeddings and Sentence Embeddings

[3] Sanjeev Arora, Yingyu Liang, and Tengyu Ma. (2016). “A Simple but Tough-to-Beat Baseline for Sentence Embeddings.” openreview:SyK00v5xx.
