Training a Japanese Wikipedia Word2Vec Model by Gensim and Mecab

After “Training a Chinese Wikipedia Word2Vec Model by Gensim and Jieba”, we continue with “Training a Japanese Wikipedia Word2Vec Model by Gensim and Mecab”, using the “Wikipedia_Word2vec” related scripts. As before, first download the latest Japanese Wikipedia dump data: https://dumps.wikimedia.org/jawiki/latest/jawiki-latest-pages-articles.xml.bz2. You can use … Continue reading →

Training a Chinese Wikipedia Word2Vec Model by Gensim and Jieba

We have posted two methods for training a word2vec model on English Wikipedia data: “Training Word2Vec Model on English Wikipedia by Gensim” and “Exploiting Wikipedia Word Similarity by Word2Vec”. Based on the pipeline and related scripts, Wikipedia_Word2vec, we can train … Continue reading →

Dive Into NLTK, Part XI: From Word2Vec to WordNet

This is the eleventh article in the series “Dive Into NLTK”. Here is an index of all the articles in the series that have been published to date: Part I: Getting Started with NLTK; Part II: Sentence Tokenize and Word … Continue reading →

Exploiting Wikipedia Word Similarity by Word2Vec

We previously wrote “Training Word2Vec Model on English Wikipedia by Gensim”, which got a lot of attention. Recently, I reviewed Word2Vec-related materials again and tested a new method to process the English Wikipedia data and train Word2Vec … Continue reading →

Dive Into NLTK, Part X: Play with Word2Vec Models based on NLTK Corpus

This is the tenth article in the series “Dive Into NLTK”. Here is an index of all the articles in the series that have been published to date: Part I: Getting Started with NLTK; Part II: Sentence Tokenize and Word … Continue reading →

Dive Into TensorFlow, Part VI: Beyond Deep Learning

This is the sixth article in the series “Dive Into TensorFlow”. Here is an index of all the articles in the series that have been published to date: Part I: Getting Started with TensorFlow; Part II: Basic Concepts; Part III: … Continue reading →

Dive Into TensorFlow, Part V: Deep MNIST

This is the fifth article in the series “Dive Into TensorFlow”. Here is an index of all the articles in the series that have been published to date: Part I: Getting Started with TensorFlow; Part II: Basic Concepts; Part III: … Continue reading →

fastText for Fast Sentiment Analysis

fastText is a library for fast text representation and classification, recently released by the facebookresearch team. The related papers are “Enriching Word Vectors with Subword Information” and “Bag of Tricks for Efficient Text Classification”. I’m very interested in the text … Continue reading →

Dive Into TensorFlow, Part IV: Hello MNIST

This is the fourth article in the series “Dive Into TensorFlow”. Here is an index of all the articles in the series that have been published to date: Part I: Getting Started with TensorFlow; Part II: Basic Concepts; Part III: … Continue reading →

Dive Into NLTK, Part IX: From Text Classification to Sentiment Analysis

This is the ninth article in the series “Dive Into NLTK”. Here is an index of all the articles in the series that have been published to date: Part I: Getting Started with NLTK; Part II: Sentence Tokenize and Word … Continue reading →