Word Similarity: A Website Interface for 89 Languages Word2Vec Models

I launched WordSimilarity in April; it computes the similarity between two words using word2vec models trained on Wikipedia data. The website hosts the English word2vec model for English word similarity (Exploiting Wikipedia Word Similarity by Word2Vec), the Chinese word2vec model for Chinese word similarity (Training a Chinese Wikipedia Word2Vec Model by Gensim and Jieba), the Japanese word2vec model for Japanese word similarity (Training a Japanese Wikipedia Word2Vec Model by Gensim and Mecab), and models for other languages such as Korean, Russian, French, German, and Spanish (Update Korean, Russian, French, German, Spanish Wikipedia Word2Vec Model for Word Similarity).

Based on the Wikipedia word2vec training script and the Wikipedia dump data, I have recently trained word2vec models for more than eighty languages, following the “List of ISO 639-1 codes” and filtering out languages whose Wikipedia dump is smaller than 10 megabytes.

You can search for any word on the WordSimilarity website, or compute the similarity of any two words via the Word Similarity API. The following is a partial list of the language word2vec models supported by Word Similarity:

1. English Word2Vec Model for Word Similarity
2. Chinese Word2Vec Model for Word Similarity
3. German Word2Vec Model for Word Similarity
4. Russian Word2Vec Model for Word Similarity
5. French Word2Vec Model for Word Similarity
6. Swedish Word2Vec Model for Word Similarity
7. Polish Word2Vec Model for Word Similarity
8. Korean Word2Vec Model for Word Similarity
9. Spanish Word2Vec Model for Word Similarity
10. Ukrainian Word2Vec Model for Word Similarity
11. Italian Word2Vec Model for Word Similarity
12. Dutch Word2Vec Model for Word Similarity
13. Hungarian Word2Vec Model for Word Similarity
14. Japanese Word2Vec Model for Word Similarity
15. Finnish Word2Vec Model for Word Similarity
16. Czech Word2Vec Model for Word Similarity
17. Portuguese Word2Vec Model for Word Similarity
18. Catalan Word2Vec Model for Word Similarity
19. Arabic Word2Vec Model for Word Similarity
20. Norwegian Word2Vec Model for Word Similarity
21. Hebrew Word2Vec Model for Word Similarity
22. Serbian Word2Vec Model for Word Similarity
23. Turkish Word2Vec Model for Word Similarity
24. Thai Word2Vec Model for Word Similarity
25. Romanian Word2Vec Model for Word Similarity
26. Esperanto Word2Vec Model for Word Similarity
27. Croatian Word2Vec Model for Word Similarity
28. Estonian Word2Vec Model for Word Similarity
29. Bulgarian Word2Vec Model for Word Similarity
30. Greek Word2Vec Model for Word Similarity
31. Slovak Word2Vec Model for Word Similarity
32. Indonesian Word2Vec Model for Word Similarity
33. Danish Word2Vec Model for Word Similarity
34. Slovenian Word2Vec Model for Word Similarity
35. Armenian Word2Vec Model for Word Similarity
36. Lithuanian Word2Vec Model for Word Similarity
37. Vietnamese Word2Vec Model for Word Similarity
38. Basque Word2Vec Model for Word Similarity
39. Belarusian Word2Vec Model for Word Similarity
40. Persian Word2Vec Model for Word Similarity
41. Azerbaijani Word2Vec Model for Word Similarity
42. Galician Word2Vec Model for Word Similarity
43. Georgian Word2Vec Model for Word Similarity
44. Kazakh Word2Vec Model for Word Similarity
45. Norwegian Nynorsk Word2Vec Model for Word Similarity
46. Macedonian Word2Vec Model for Word Similarity
47. Latvian Word2Vec Model for Word Similarity
48. Bosnian Word2Vec Model for Word Similarity
49. Latin Word2Vec Model for Word Similarity
50. Albanian Word2Vec Model for Word Similarity
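Under the hood, a word2vec similarity score between two words is the cosine similarity of their vectors, which is the standard word2vec measure. A minimal numpy sketch (the vectors here are hypothetical; in practice they come from one of the trained models above):

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 4-dimensional word vectors for illustration only.
vec_king = np.array([0.5, 0.1, 0.9, 0.3])
vec_queen = np.array([0.45, 0.2, 0.85, 0.35])

print(round(cosine_similarity(vec_king, vec_queen), 4))
```

Scores close to 1 mean the two words occur in very similar contexts in the training corpus.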

Posted by TextMiner


Comments


  1. I’m a psychology researcher and want to use your word similarity API to build a similarity matrix for a list of Chinese characters. Is it possible that I can input the whole list (30 characters) and get a pairwise similarity matrix? Thank you.
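The pairwise matrix the commenter asks about is straightforward to build from the word vectors: normalize each vector to unit length, then one matrix product gives all pairwise cosine similarities at once. A sketch with hypothetical vectors (in practice each row would come from the Chinese model):

```python
import numpy as np

def similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity for the rows of an (n_words, dim) matrix."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T

# Hypothetical 3-dimensional vectors for three characters, for illustration.
words = ["山", "川", "海"]
vecs = np.array([[0.2, 0.8, 0.1],
                 [0.3, 0.7, 0.2],
                 [0.9, 0.1, 0.4]])

sim = similarity_matrix(vecs)
# The diagonal is 1.0 (each word with itself); the off-diagonal entries
# are the pairwise similarities, so a 30-word list yields a 30x30 matrix.
print(np.round(sim, 3))
```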
