Training a Japanese Wikipedia Word2Vec Model by Gensim and Mecab

After “Training a Chinese Wikipedia Word2Vec Model by Gensim and Jieba“, we continue “Training a Japanese Wikipedia Word2Vec Model by Gensim and Mecab” with “Wikipedia_Word2vec” related scripts. Still, download the latest Japanese Wikipedia dump data first: https://dumps.wikimedia.org/jawiki/latest/jawiki-latest-pages-articles.xml.bz2. You can use … Continue reading →

Training a Chinese Wikipedia Word2Vec Model by Gensim and Jieba

We have posted two methods for training a word2vec model based on English wikipedia data: “Training Word2Vec Model on English Wikipedia by Gensim” and “Exploiting Wikipedia Word Similarity by Word2Vec“. Based on the pipeline and related scripts: Wikipedia_Word2vec,we can train … Continue reading →

Exploiting Wikipedia Word Similarity by Word2Vec

We have written “Training Word2Vec Model on English Wikipedia by Gensim” before, and got a lot of attention. Recently, I have reviewed Word2Vec related materials again and test a new method to process the English wikipedia data and train Word2Vec … Continue reading →

Adding spaCy Demo and API into TextAnalysisOnline

I have added spaCy demo and api into TextAnalysisOnline, you can test spaCy by our scaCy demo and use spaCy in other languages such as Java/JVM/Android, Node.js, PHP, Objective-C/i-OS, Ruby, .Net and etc by Mashape api platform. Here is the … Continue reading →

Getting Started with spaCy

Update: Almost since one year after writing this article, spaCy now has been upgraded to version 1.2, and new data and new features are added in it. I fix some problems in this article for spacy install and test, especially … Continue reading →

Deep Learning for Text Mining from Scratch

Here is a list of courses or materials for you to learn deep learning for text mining from scratch。 Mathematics Everything start from mathematics. 1. Pre-Calculus About the Course Through this course, students will acquire a solid foundation in algebra … Continue reading →