Word Similarity: A Website Interface for 89 Languages Word2Vec Models

I launched WordSimilarity in April. It computes the similarity between two words using a word2vec model trained on Wikipedia data. The website has the English Word2Vec Model for English Word Similarity: Exploiting Wikipedia Word Similarity by … Continue reading →
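
As a rough illustration of what "similarity between two words" means here: word2vec similarity is usually scored as the cosine of the angle between the two word vectors. The sketch below assumes that scoring and uses made-up 3-dimensional vectors (`toy_vectors` is hypothetical, not taken from any trained model), so it shows only the arithmetic, not the website's actual implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors for illustration only; real word2vec
# vectors typically have a few hundred dimensions.
toy_vectors = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "banana": [0.1, 0.0, 0.9],
}

royal = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
fruit = cosine_similarity(toy_vectors["king"], toy_vectors["banana"])

# Vectors pointing in similar directions score closer to 1.
print(royal > fruit)
```

With a trained model, the same score would be computed over the learned vectors for the two query words.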

Update Korean, Russian, French, German, Spanish Wikipedia Word2Vec Model for Word Similarity

I launched WordSimilarity in April. It computes the similarity between two words using a word2vec model trained on Wikipedia data. The website has the English Word2Vec Model for English Word Similarity: Exploiting Wikipedia Word Similarity by … Continue reading →

Training a Japanese Wikipedia Word2Vec Model by Gensim and Mecab

After “Training a Chinese Wikipedia Word2Vec Model by Gensim and Jieba”, we continue with “Training a Japanese Wikipedia Word2Vec Model by Gensim and Mecab”, using the “Wikipedia_Word2vec” scripts. As before, first download the latest Japanese Wikipedia dump: https://dumps.wikimedia.org/jawiki/latest/jawiki-latest-pages-articles.xml.bz2. You can use … Continue reading →
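
The dump URL above follows a regular pattern, so the same step can be repeated for other languages. The helper below is a minimal sketch of that pattern; `wikipedia_dump_url` is a hypothetical name introduced here, and it assumes a plain ASCII-letter language code such as `ja`, `zh`, or `ko` (codes containing hyphens are not handled).

```python
def wikipedia_dump_url(lang_code):
    """Build the 'latest pages-articles' dump URL for a Wikipedia language code.

    Assumes a simple ASCII-letter code such as 'ja' or 'zh'.
    """
    code = lang_code.strip().lower()
    if not (code.isascii() and code.isalpha()):
        raise ValueError(f"unexpected language code: {lang_code!r}")
    db = code + "wiki"  # e.g. 'ja' -> 'jawiki'
    return (
        f"https://dumps.wikimedia.org/{db}/latest/"
        f"{db}-latest-pages-articles.xml.bz2"
    )


print(wikipedia_dump_url("ja"))
```

For `ja` this reproduces the Japanese dump URL given in the excerpt.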

Training a Chinese Wikipedia Word2Vec Model by Gensim and Jieba

We have posted two methods for training a word2vec model on English Wikipedia data: “Training Word2Vec Model on English Wikipedia by Gensim” and “Exploiting Wikipedia Word Similarity by Word2Vec”. Based on that pipeline and the related Wikipedia_Word2vec scripts, we can train … Continue reading →

Exploiting Wikipedia Word Similarity by Word2Vec

We previously wrote “Training Word2Vec Model on English Wikipedia by Gensim”, which got a lot of attention. Recently, I reviewed Word2Vec-related materials again and tested a new method to process the English Wikipedia data and train Word2Vec … Continue reading →