Dive Into NLTK, Part IX: From Text Classification to Sentiment Analysis

This is the ninth article in the series “Dive Into NLTK”. Here is an index of all the articles in the series that have been published to date: Part I: Getting Started with NLTK; Part II: Sentence Tokenize and Word … Continue reading →

CourseMiner – Mining Course for World

I have launched a website, CourseMiner, for mining open courses. It uses simple text mining methods such as tag extraction (keyword or keyphrase extraction) and document (text) similarity computation. So far it has added only the Coursera and edX platforms, and … Continue reading →

Getting Started with Text Processing or Natural Language Processing

Text processing is one of the most common tasks in the world. This article focuses on processing natural language text by computer, which is commonly referred to as NLP. According to Wikipedia, text processing is defined … Continue reading →

Getting Started with Keyword Extraction

Recently, I surveyed some keyword extraction tools, papers and documents, and I record them here as a starting point for keyword extraction. According to Wikipedia, keyword extraction is defined like this: Keyword extraction is tasked with the automatic identification of terms that … Continue reading →

Deep Learning for Text Mining from Scratch

Here is a list of courses and materials for learning deep learning for text mining from scratch. Mathematics: Everything starts from mathematics. 1. Pre-Calculus. About the Course: Through this course, students will acquire a solid foundation in algebra … Continue reading →

Getting Started with Sentiment Analysis and Opinion Mining

According to Wikipedia, sentiment analysis is defined like this: Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials. Generally speaking, sentiment … Continue reading →

NLTK Wordnet Word Lemmatizer API for English Word with POS Tag Only

We have shown how to use the NLTK WordNet lemmatizer in Python in Dive Into NLTK, Part IV: Stemming and Lemmatization, and implemented it in our Text Analysis API: NLTK Wordnet Lemmatizer. We have preprocessed the English text with POS … Continue reading →

Training Word2Vec Model on English Wikipedia by Gensim

After learning about word2vec and GloVe, a natural next step is to train a model on a larger corpus, and English Wikipedia is an ideal choice for this task. After googling related keywords like “word2vec wikipedia”, “gensim … Continue reading →

Text Analysis Online no longer provides NLTK Stanford NLP API Interface

Text Analysis Online no longer provides the NLTK Stanford NLP API interface, but keeps the related demos for testing: NLTK Stanford POS Tagger: http://textanalysisonline.com/nltk-stanford-postagger; NLTK Stanford Named Entity Recognizer: http://textanalysisonline.com/nltk-stanford-ner; NLTK Stanford Named Entity Recognizer for 7Class: http://textanalysisonline.com/nltk-stanford-ner-7class; NLTK Stanford … Continue reading →

Getting Started with Word2Vec and GloVe

Word2Vec and GloVe are two recently popular word embedding algorithms used to construct vector representations of words. These vector representations can be used to compute the semantic similarity between words mathematically. The C/C++ tools for … Continue reading →