Update Korean, Russian, French, German, Spanish Wikipedia Word2Vec Model for Word Similarity

I launched WordSimilarity in April. It focuses on computing the similarity between two words with a Word2Vec model trained on Wikipedia data. The website has the English Word2Vec model for English word similarity: Exploiting Wikipedia Word Similarity by … Continue reading →
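As a rough illustration of what "similarity between two words" means with word vectors, here is a minimal sketch using cosine similarity, the measure most commonly used with Word2Vec embeddings. The excerpt does not say which measure WordSimilarity uses, so treat that as an assumption. The vectors in `toy_vectors` are invented 3-dimensional values, not output from any trained model, and the names `cosine_similarity` and `word_similarity` are hypothetical helpers written for this example.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy "embeddings", made up for illustration only (assumption, not real data).
# A real Word2Vec model would supply vectors with hundreds of dimensions.
toy_vectors = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def word_similarity(w1, w2, vectors=toy_vectors):
    """Look up both words and compare their vectors by cosine similarity."""
    return cosine_similarity(vectors[w1], vectors[w2])


if __name__ == "__main__":
    print(word_similarity("king", "queen"))
    print(word_similarity("king", "apple"))
```

With these toy values, "king" scores closer to "queen" than to "apple", which is the kind of ranking a trained model is expected to produce for related words.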

Exploiting Wikipedia Word Similarity by Word2Vec

We previously wrote “Training Word2Vec Model on English Wikipedia by Gensim”, which got a lot of attention. Recently, I reviewed Word2Vec-related materials again and tested a new method to process the English Wikipedia data and train Word2Vec … Continue reading →

Getting Started with Text Processing or Natural Language Processing

Text processing is one of the most common tasks in the world. This article focuses on natural language text processing by computer, which is commonly referred to as NLP. According to Wikipedia, text processing is defined … Continue reading →

Getting Started with Pattern

We have talked about NLTK and TextBlob; now it’s time for “Getting Started with Pattern”. About Pattern: according to the official Pattern website, Pattern is a web mining module for the Python programming language. It has tools for data mining (Google, Twitter … Continue reading →

We have launched the Text Analysis API on Mashape

We have launched the Text Analysis API on Mashape: TextAnalysis API. The TextAnalysis API provides customized text analysis or text mining services such as Word Tokenize, Part-of-Speech (POS) Tagging, Stemmer, Lemmatizer, Chunker, Parser, Key Phrase Extraction (Noun Phrase Extraction), Sentence Segmentation (Sentence Boundary Detection), Grammar … Continue reading →