HCCL Semantic Word Similarity System

An HCCL Semantic Word Similarity System is a multilingual and cross-lingual semantic word similarity system that combines a word embedding system with a machine translation system.





We use skip-gram word embeddings directly for the monolingual subtask. For the cross-lingual subtask, we use English as the pivot language and train multilingual word embeddings using monolingual corpora and sentence-aligned parallel data. A translation model is also trained by our statistical machine translation system. Subsequently, we translate the words in the test set into English and look up their word embeddings. For words that are missing from the English word embeddings, we fall back to their embeddings in the original language.
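
The cross-lingual lookup described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`lookup`, `similarity`), the toy two-dimensional vectors, and the dictionary standing in for the translation model are all hypothetical. The sketch also assumes that similarity is scored with cosine similarity and that the original-language vectors share a space with the English ones, as multilingual embeddings would; the source does not state either detail.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def lookup(word, lang, translate, embeddings):
    """Return a vector for `word`, preferring the English pivot embedding.

    `translate` is a hypothetical stand-in for the translation model:
    a dict mapping (lang, word) to an English word.
    """
    english = word if lang == "en" else translate.get((lang, word))
    if english is not None and english in embeddings["en"]:
        return embeddings["en"][english]
    # Fallback: the word has no usable English embedding, so try the
    # original-language table (assumed to share the same vector space).
    return embeddings.get(lang, {}).get(word)

def similarity(word1, lang1, word2, lang2, translate, embeddings):
    """Score a word pair, or return None if either word has no vector."""
    v1 = lookup(word1, lang1, translate, embeddings)
    v2 = lookup(word2, lang2, translate, embeddings)
    if v1 is None or v2 is None:
        return None
    return cosine(v1, v2)

# Toy data, invented for illustration only.
embeddings = {
    "en": {"cat": [1.0, 0.0], "dog": [0.8, 0.6]},
    "de": {"Katze": [1.0, 0.0], "Hund": [0.6, 0.8]},
}
translate = {("de", "Katze"): "cat"}  # "Hund" deliberately has no translation

pivot_score = similarity("Katze", "de", "dog", "en", translate, embeddings)
fallback_score = similarity("Hund", "de", "cat", "en", translate, embeddings)
missing_score = similarity("Vogel", "de", "cat", "en", translate, embeddings)
```

Here `pivot_score` goes through the English pivot, `fallback_score` uses the original-language vector because no translation is available, and `missing_score` is `None` because the word appears in neither table.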