Bilingual Embeddings with Random Walks over Multilingual Wordnets

Goikoetxea, J.; Soroa, A.; Agirre, E.

doi:10.1016/j.knosys.2018.03.017

arXiv:1804.08316 (cs)

[Submitted on 23 Apr 2018]

Title: Bilingual Embeddings with Random Walks over Multilingual Wordnets

Authors: J. Goikoetxea, A. Soroa, E. Agirre

Abstract: Bilingual word embeddings represent words from two languages in a shared space, allowing knowledge to be transferred from one language to the other without machine translation. The main approach is to train monolingual embeddings first and then map them into a common space using bilingual dictionaries. In this work, we present a novel method to learn bilingual embeddings based on multilingual knowledge bases (KBs) such as WordNet. Our method extracts bilingual information from multilingual wordnets via random walks and learns a joint embedding space in a single step. We further reinforce cross-lingual equivalence by adding bilingual constraints to the loss function of the popular skipgram model. Our experiments involve twelve cross-lingual word similarity and relatedness datasets in six language pairs covering four languages, and show that: 1) random walks over multilingual wordnets improve results over using dictionaries alone; 2) multilingual wordnets on their own outperform text-based systems on similarity datasets; 3) the good results are consistent for large wordnets (e.g. English, Spanish), smaller wordnets (e.g. Basque) and loosely aligned wordnets (e.g. Italian); 4) the combination of wordnets and text yields the best results, above mapping-based approaches. Our method can be applied to richer KBs such as DBpedia or BabelNet, and can easily be extended to multilingual embeddings. All software and resources are open source.
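The abstract only summarises the random-walk step. As a rough illustration, the sketch below generates a bilingual pseudo-corpus by walking a tiny hand-made graph and emitting one word per visited synset, then extracts the (center, context) pairs a skip-gram model would train on. The synsets, the English/Spanish word lists, the stop probability of 0.15 and the rule of sampling each word uniformly from either language are hypothetical choices made for this example, not details taken from the paper.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy multilingual wordnet: synset id -> neighbouring synset ids.
GRAPH: Dict[str, List[str]] = {
    "dog.n.01": ["canine.n.02", "pet.n.01"],
    "canine.n.02": ["dog.n.01", "carnivore.n.01"],
    "pet.n.01": ["dog.n.01", "animal.n.01"],
    "carnivore.n.01": ["canine.n.02", "animal.n.01"],
    "animal.n.01": ["pet.n.01", "carnivore.n.01"],
}

# Hypothetical lexicalizations of each synset in English ("en") and Spanish ("es").
LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "dog.n.01": [("en", "dog"), ("es", "perro")],
    "canine.n.02": [("en", "canine"), ("es", "cánido")],
    "pet.n.01": [("en", "pet"), ("es", "mascota")],
    "carnivore.n.01": [("en", "carnivore"), ("es", "carnívoro")],
    "animal.n.01": [("en", "animal"), ("es", "animal")],
}


def random_walk(start: str, stop_prob: float, max_len: int,
                rng: random.Random) -> List[str]:
    """Walk from `start`, emitting one language-tagged word per visited synset.

    After each emission the walk ends with probability `stop_prob`
    (assumed value); otherwise it moves to a uniformly chosen neighbour.
    """
    sentence: List[str] = []
    node = start
    while len(sentence) < max_len:
        # Assumption: pick a lexicalization uniformly from BOTH languages,
        # so a single pseudo-sentence can mix English and Spanish words.
        lang, word = rng.choice(LEXICON[node])
        sentence.append(f"{lang}:{word}")
        if rng.random() < stop_prob:
            break
        node = rng.choice(GRAPH[node])
    return sentence


def pseudo_corpus(n_walks: int, stop_prob: float = 0.15, max_len: int = 20,
                  seed: int = 0) -> List[List[str]]:
    """Generate `n_walks` pseudo-sentences from uniformly chosen start synsets."""
    rng = random.Random(seed)
    nodes = sorted(GRAPH)
    return [random_walk(rng.choice(nodes), stop_prob, max_len, rng)
            for _ in range(n_walks)]


def skipgram_pairs(sentence: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return the (center, context) pairs a skip-gram model would see."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(sentence):
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs


if __name__ == "__main__":
    for sent in pseudo_corpus(3):
        print(" ".join(sent))
```

Pairs that mix languages, such as ("en:dog", "es:mascota"), appear whenever nearby steps of a walk emit words in different languages, which is how cross-lingual co-occurrence enters the training data in this sketch. The pseudo-corpus could then be passed to an off-the-shelf skip-gram trainer (for example gensim's `Word2Vec` with `sg=1`); the bilingual constraints that the abstract adds to the skip-gram loss are not modelled here.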

Comments:	Preprint version, Knowledge-Based Systems (ISSN: 0950-7051). (2018)
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:1804.08316 [cs.CL]
	(or arXiv:1804.08316v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1804.08316
Related DOI:	https://doi.org/10.1016/j.knosys.2018.03.017

Submission history

From: Josu Goikoetxea
[v1] Mon, 23 Apr 2018 10:02:29 UTC (55 KB)
