2017 HCCLatSemEval2017Task2Combining

From GM-RKB

Subject Headings: HCCL; SemEval-2017; SemEval-2017 Task 2; Semantic Word Similarity System; Semantic Word Similarity Benchmark Task; Multilingual And Cross-Lingual Semantic Word Similarity System; Machine Translation.

Notes

Cited By

Quotes

Abstract

In this paper, we introduce an approach that combines word embeddings and machine translation for multilingual semantic word similarity, Task 2 of SemEval-2017. Thanks to the unsupervised transliteration model, our cross-lingual word embeddings encounter fewer out-of-vocabulary words (OOVs). Our results are produced using only monolingual Wikipedia corpora and a limited amount of sentence-aligned data. Although relatively few resources are used, our system ranked 3rd in the monolingual subtask and would rank 6th in the cross-lingual subtask.

1. Introduction

...

In this task, we adopt different strategies for the two subtasks. We use word2vec for Subtask 1 (monolingual word similarity). For Subtask 2 (cross-lingual word similarity), we use jointly optimized cross-lingual word representations together with a transliteration model. We build a cross-lingual word embedding system and a specialized machine translation system. Our approach has the following characteristics:

We constructed a naive system and, given the limited time, did not tune the parameters of the embedding and translation models.
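For the monolingual subtask, a word pair is typically scored by the cosine similarity of its two word vectors. The sketch below illustrates that scoring step only; it is a minimal, hypothetical example, not the authors' code. It assumes the skip-gram (word2vec) vectors have already been trained and loaded into a plain dictionary, and the toy vectors in the usage example are invented for illustration. The names `cosine` and `word_similarity` are hypothetical.

```python
import math
from typing import Dict, List, Optional

# Assumption: word2vec skip-gram vectors already trained and loaded into a dict.
Embeddings = Dict[str, List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_similarity(w1: str, w2: str, emb: Embeddings) -> Optional[float]:
    """Score a word pair by the cosine of its vectors; None if either word is OOV."""
    if w1 not in emb or w2 not in emb:
        return None
    return cosine(emb[w1], emb[w2])


if __name__ == "__main__":
    # Toy vectors invented for illustration; real ones would come from word2vec.
    toy_emb = {
        "car": [0.9, 0.1, 0.0],
        "automobile": [0.85, 0.15, 0.05],
        "banana": [0.0, 0.2, 0.95],
    }
    print(word_similarity("car", "automobile", toy_emb))  # close to 1
    print(word_similarity("car", "banana", toy_emb))      # close to 0
    print(word_similarity("car", "zebra", toy_emb))       # None: "zebra" is OOV
```

Returning `None` for OOV pairs keeps missing coverage visible. A real submission would still need a fallback score for such pairs, which is where reducing OOVs matters.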

2. Our Approach

We use skip-gram word embeddings directly for the monolingual subtask. For the cross-lingual subtask, we use English as the pivot language and train multilingual word embeddings using monolingual corpora and sentence-aligned parallel data. A translation model is also trained with our statistical machine translation system. We then translate the words in the test set into English and look up their word embeddings. For words not covered by the English word embeddings, we look them up in the original-language word embeddings.
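The pivot lookup with its fallback can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. The SMT system is replaced by a plain per-language translation dictionary (`translations`), and the original-language vectors are assumed to live in the same shared cross-lingual space as the English ones, so that cosine similarity across them is meaningful. All function and parameter names are hypothetical.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def lookup_vector(word: str, lang: str,
                  translations: Dict[str, Dict[str, str]],
                  en_emb: Dict[str, Vector],
                  lang_emb: Dict[str, Dict[str, Vector]]) -> Optional[Vector]:
    """Translate into the English pivot and use its vector; otherwise fall back
    to the original-language vector (assumed to share the cross-lingual space)."""
    if lang == "en":
        return en_emb.get(word)
    # Hypothetical stand-in for the SMT system: a simple word-to-word table.
    english = translations.get(lang, {}).get(word)
    if english is not None and english in en_emb:
        return en_emb[english]
    return lang_emb.get(lang, {}).get(word)


def cross_lingual_similarity(w1: str, lang1: str, w2: str, lang2: str,
                             translations: Dict[str, Dict[str, str]],
                             en_emb: Dict[str, Vector],
                             lang_emb: Dict[str, Dict[str, Vector]]) -> Optional[float]:
    """Cosine similarity of a cross-lingual word pair; None if either word is uncovered."""
    v1 = lookup_vector(w1, lang1, translations, en_emb, lang_emb)
    v2 = lookup_vector(w2, lang2, translations, en_emb, lang_emb)
    if v1 is None or v2 is None:
        return None
    return cosine(v1, v2)


if __name__ == "__main__":
    # Toy data invented for illustration.
    en = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}
    tr = {"de": {"Hund": "dog"}}
    le = {"es": {"perro": [1.0, 0.0]}, "de": {"Katze": [0.0, 1.0]}}
    print(cross_lingual_similarity("Hund", "de", "cat", "en", tr, en, le))    # via translation
    print(cross_lingual_similarity("perro", "es", "Hund", "de", tr, en, le))  # via fallback
```

Trying the English pivot first and falling back second mirrors the order described in the paragraph above. A word missing from both sources still comes back as `None`, which is the OOV case the transliteration model is meant to shrink.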

...

3. Experiments

4. Results

5. Conclusion

Acknowledgments

Footnotes


References

BibTeX

@inproceedings{2017_HCCLatSemEval2017Task2Combining,
  author    = {Junqing He and
               Long Wu and
               Xuemin Zhao and
               Yonghong Yan},
  editor    = {Steven Bethard and
               Marine Carpuat and
               Marianna Apidianaki and
               Saif M. Mohammad and
               Daniel M. Cer and
               David Jurgens},
  title     = {HCCL at SemEval-2017 Task 2: Combining Multilingual Word Embeddings
               and Transliteration Model for Semantic Similarity},
  booktitle = {Proceedings of the 11th International Workshop on Semantic Evaluation
               (SemEval@ACL 2017)},
  pages     = {220--225},
  publisher = {Association for Computational Linguistics},
  year      = {2017},
  url       = {https://doi.org/10.18653/v1/S17-2033},
  doi       = {10.18653/v1/S17-2033},
}


2017 HCCLatSemEval2017Task2Combining
Author: Junqing He; Long Wu; Xuemin Zhao; Yonghong Yan
Title: HCCL at SemEval-2017 Task 2: Combining Multilingual Word Embeddings and Transliteration Model for Semantic Similarity
Year: 2017