Update Korean, Russian, French, German, Spanish Wikipedia Word2Vec Model for Word Similarity

I launched WordSimilarity in April. It focuses on computing the similarity between two words with a word2vec model trained on Wikipedia data. The website has the English Word2Vec Model for English Word Similarity: Exploiting Wikipedia Word Similarity by … Continue reading →

Exploiting Wikipedia Word Similarity by Word2Vec

We previously wrote “Training Word2Vec Model on English Wikipedia by Gensim”, which got a lot of attention. Recently, I reviewed Word2Vec-related materials again and tested a new method to process the English Wikipedia data and train Word2Vec … Continue reading →

Dive Into NLTK, Part X: Play with Word2Vec Models based on NLTK Corpus

This is the tenth article in the series “Dive Into NLTK”. Here is an index of all the articles in the series that have been published to date: Part I: Getting Started with NLTK; Part II: Sentence Tokenize and Word … Continue reading →

Dive Into NLTK, Part IX: From Text Classification to Sentiment Analysis

This is the ninth article in the series “Dive Into NLTK”. Here is an index of all the articles in the series that have been published to date: Part I: Getting Started with NLTK; Part II: Sentence Tokenize and Word … Continue reading →

CourseMiner – Mining Course for World

I have launched CourseMiner, a website for mining open courses. It uses simple text mining methods such as tag extraction (keyword or keyphrase extraction) and document or text similarity computation. So far it covers only the Coursera and edX platforms, and … Continue reading →

Getting Started with Text Processing or Natural Language Processing

Text processing is one of the most common tasks in the world. This article focuses on processing natural language text by computer, which is commonly referred to as NLP. According to Wikipedia, text processing is defined … Continue reading →

Getting Started with Keyword Extraction

Recently, I surveyed some keyword extraction tools, papers, and documents, and I record them here as a starting point for keyword extraction. According to Wikipedia, keyword extraction is defined like this: Keyword extraction is tasked with the automatic identification of terms that … Continue reading →

Deep Learning for Text Mining from Scratch

Here is a list of courses and materials for learning deep learning for text mining from scratch. Especially recommended: the Deep Learning Specialization by Andrew Ng. About This Specialization: If you want to break into AI, this Specialization will … Continue reading →

Getting Started with Sentiment Analysis and Opinion Mining

According to Wikipedia, sentiment analysis is defined like this: Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials. Generally speaking, sentiment … Continue reading →

NLTK Wordnet Word Lemmatizer API for English Word with POS Tag Only

We have shown how to use the NLTK WordNet lemmatizer in Python in Dive Into NLTK, Part IV: Stemming and Lemmatization, and implemented it in our Text Analysis API: NLTK Wordnet Lemmatizer. We have preprocessed the English text with POS … Continue reading →