Getting Started with TextBlob

TextBlob is a new Python natural language processing toolkit that stands on the shoulders of giants such as NLTK and Pattern. It provides text mining, text analysis, and text processing modules for Python developers. Here I will introduce the basics of TextBlob and show the text processing results alongside our demo website, Text Analysis Online.

About TextBlob

The official TextBlob website describes the library as follows:

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

Installing TextBlob

TextBlob reuses the NLTK corpora. If you have already installed NLTK, things are simple: TextBlob will use your local NLTK installation instead of its bundled version. The following installation steps were tested with Python 2.7 on my Mac OS machine and on an Ubuntu 12.04 VPS; they have not been tested on Windows. TextBlob supports Python >= 2.6 or >= 3.3.

The simplest way to install TextBlob is from PyPI with pip:

$ pip install -U textblob
$ python -m textblob.download_corpora

This installs TextBlob and downloads the necessary NLTK corpora. If you already have NLTK and its corpora installed, you do not need to download the corpora again.

Another way to install TextBlob is from source; the project is actively developed on GitHub.

You can clone the public repo:

$ git clone https://github.com/sloria/TextBlob.git

Or download the source as a tarball or a zipball.

Once you have the source, you can install it into your site-packages with:

$ python setup.py install
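
Whichever route you take, it can help to confirm that Python can actually locate the package before you open the interpreter. The sketch below is not part of TextBlob; it assumes Python 3.4 or later (for importlib.util.find_spec), and the helper name is_installed is made up for illustration:

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Report on the two packages this tutorial relies on.
for name in ("textblob", "nltk"):
    print(name, "found" if is_installed(name) else "missing")
```

If either package is reported as missing, repeat the installation step for the interpreter you are actually running.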

Test TextBlob

After installing TextBlob, you can test it in the Python interpreter:

>>> from textblob import TextBlob
>>> text = "Natural language processing (NLP) deals with the application of computational models to text or speech data. Application areas within NLP include automatic (machine) translation between languages; dialogue systems, which allow a human to interact with a machine using natural language; and information extraction, where the goal is to transform unstructured text into structured (database) representations that can be searched and browsed in flexible ways. NLP technologies are having a dramatic impact on the way people interact with computers, on the way people interact with each other through the use of language, and on the way people access the vast amount of linguistic data now in electronic form. From a scientific viewpoint, NLP involves fundamental questions of how to structure formal models (for example statistical models) of natural language phenomena, and of how to design algorithms that implement these models."
>>> blob = TextBlob(text)
>>> blob.tags
[(u'Natural', u'NNP'), (u'language', u'NN'), (u'processing', u'NN'), (u'NLP', u'NN'), (u'deals', u'NNS'), (u'with', u'IN'), (u'the', u'DT'), (u'application', u'NN'), (u'of', u'IN'), (u'computational', u'JJ'), (u'models', u'NNS'), (u'to', u'TO'), (u'text', u'NN'), (u'or', u'CC'), (u'speech', u'NN'), (u'data', u'NNS'), (u'Application', u'NNP'), (u'areas', u'NNS'), (u'within', u'IN'), (u'NLP', u'NN'), (u'include', u'VBP'), (u'automatic', u'JJ'), (u'machine', u'NN'), (u'translation', u'NN'), (u'between', u'IN'), (u'languages', u'NNS'), (u'dialogue', u'NN'), (u'systems', u'NNS'), (u'which', u'WDT'), (u'allow', u'VB'), (u'a', u'DT'), (u'human', u'JJ'), (u'to', u'TO'), (u'interact', u'VBP'), (u'with', u'IN'), (u'a', u'DT'), (u'machine', u'NN'), (u'using', u'VBG'), (u'natural', u'JJ'), (u'language', u'NN'), (u'and', u'CC'), (u'information', u'NN'), (u'extraction', u'NN'), (u'where', u'WRB'), (u'the', u'DT'), (u'goal', u'NN'), (u'is', u'VBZ'), (u'to', u'TO'), (u'transform', u'VB'), (u'unstructured', u'JJ'), (u'text', u'NN'), (u'into', u'IN'), (u'structured', u'VBN'), (u'database', u'NN'), (u'representations', u'NNS'), (u'that', u'IN'), (u'can', u'MD'), (u'be', u'VB'), (u'searched', u'VBD'), (u'and', u'CC'), (u'browsed', u'VBN'), (u'in', u'IN'), (u'flexible', u'JJ'), (u'ways', u'NNS'), (u'NLP', u'NN'), (u'technologies', u'NNS'), (u'are', u'VBP'), (u'having', u'VBG'), (u'a', u'DT'), (u'dramatic', u'JJ'), (u'impact', u'NN'), (u'on', u'IN'), (u'the', u'DT'), (u'way', u'NN'), (u'people', u'NNS'), (u'interact', u'VBP'), (u'with', u'IN'), (u'computers', u'NNS'), (u'on', u'IN'), (u'the', u'DT'), (u'way', u'NN'), (u'people', u'NNS'), (u'interact', u'VBP'), (u'with', u'IN'), (u'each', u'DT'), (u'other', u'JJ'), (u'through', u'IN'), (u'the', u'DT'), (u'use', u'NN'), (u'of', u'IN'), (u'language', u'NN'), (u'and', u'CC'), (u'on', u'IN'), (u'the', u'DT'), (u'way', u'NN'), (u'people', u'NNS'), (u'access', u'NN'), (u'the', u'DT'), (u'vast', u'JJ'), (u'amount', u'NN'), (u'of', u'IN'), (u'linguistic', u'JJ'), (u'data', u'NNS'), (u'now', u'RB'), (u'in', u'IN'), (u'electronic', u'JJ'), (u'form', u'NN'), (u'From', u'IN'), (u'a', u'DT'), (u'scientific', u'JJ'), (u'viewpoint', u'NN'), (u'NLP', u'NN'), (u'involves', u'VBZ'), (u'fundamental', u'JJ'), (u'questions', u'NNS'), (u'of', u'IN'), (u'how', u'WRB'), (u'to', u'TO'), (u'structure', u'NN'), (u'formal', u'JJ'), (u'models', u'NNS'), (u'for', u'IN'), (u'example', u'NN'), (u'statistical', u'JJ'), (u'models', u'NNS'), (u'of', u'IN'), (u'natural', u'JJ'), (u'language', u'NN'), (u'phenomena', u'NNS'), (u'and', u'CC'), (u'of', u'IN'), (u'how', u'WRB'), (u'to', u'TO'), (u'design', u'NN'), (u'algorithms', u'NNS'), (u'that', u'IN'), (u'implement', u'VB'), (u'these', u'DT'), (u'models', u'NNS')]
>>> blob.noun_phrases
WordList([u'natural language processing', 'nlp', u'computational models', u'speech data.', 'application', 'nlp', u'dialogue systems', u'natural language', u'information extraction', 'nlp', u'dramatic impact', u'way people interact', u'way people interact', u'way people access', u'vast amount', u'linguistic data', u'electronic form.', u'scientific viewpoint', 'nlp', u'fundamental questions', u'formal models', u'statistical models', u'natural language phenomena', u'design algorithms'])
>>> blob.sentences
[Sentence("Natural language processing (NLP) deals with the application of computational models to text or speech data."), Sentence("Application areas within NLP include automatic (machine) translation between languages; dialogue systems, which allow a human to interact with a machine using natural language; and information extraction, where the goal is to transform unstructured text into structured (database) representations that can be searched and browsed in flexible ways."), Sentence("NLP technologies are having a dramatic impact on the way people interact with computers, on the way people interact with each other through the use of language, and on the way people access the vast amount of linguistic data now in electronic form."), Sentence("From a scientific viewpoint, NLP involves fundamental questions of how to structure formal models (for example statistical models) of natural language phenomena, and of how to design algorithms that implement these models.")]
>>> len(blob.sentences)
4
>>> for sentence in blob.sentences:
...     print(sentence.sentiment.polarity)
...
0.1
0.05
-0.114583333333
0.1
>>> blob.translate(to="fr")
TextBlob("Traitement du langage naturel (NLP ) traite de l'application de modèles de calcul au texte ou de la parole données . Les domaines d'application au sein de la PNL sont automatique (machine) traduction entre les langues ; les systèmes de dialogue , qui permettent à un être humain d'interagir avec un ordinateur en utilisant un langage naturel ; et l'extraction de l'information , dont l'objectif est de transformer un texte non structuré dans structurées ( bases de données) des représentations qui peuvent être recherchées et parcourues de manière flexible . Technologies PNL ont un impact dramatique sur la façon dont les gens interagissent avec les ordinateurs , sur la façon dont les gens interagissent les uns avec les autres grâce à l'utilisation de la langue , et sur la façon dont les gens accèdent à la grande quantité de données linguistiques maintenant sous forme électronique . D'un point de vue scientifique , la PNL soulève des questions fondamentales sur la façon de structurer les modèles formels (par exemple les modèles statistiques ) des phénomènes de langage naturel , et de la façon de concevoir des algorithmes qui implémentent ces modèles .")

Dive into TextBlob

Building on NLTK, Pattern, and other NLP tools, TextBlob supports the following text processing features:

Word Tokenization
Sentence Tokenization
Part-of-speech tagging
Noun phrase extraction
Sentiment analysis
Word Pluralization
Word Singularization
Spelling correction
Parsing
Classification (Naive Bayes, Decision Tree)
Language translation and detection powered by Google Translate
Word and phrase frequencies
n-grams
Word inflection (pluralization and singularization) and lemmatization
JSON serialization
Add new models or languages through extensions
WordNet integration

Now it’s time to introduce them one by one. First, create a TextBlob:

>>> from textblob import TextBlob
>>> nlpblob = TextBlob("Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages.")

1) Word Tokenization

You can tokenize TextBlobs into words:

>>> nlpblob.words
WordList(['Natural', 'language', 'processing', 'NLP', 'is', 'a', 'field', 'of', 'computer', 'science', 'artificial', 'intelligence', 'and', 'linguistics', 'concerned', 'with', 'the', 'interactions', 'between', 'computers', 'and', 'human', 'natural', 'languages'])
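
A token list like this is the starting point for simple statistics such as word frequencies and n-grams, both of which appear in the feature list above. The sketch below uses only the standard library and the tokens from the nlpblob.words output; the ngrams helper is written here for illustration and is not TextBlob's own implementation:

```python
from collections import Counter

# Tokens copied from the nlpblob.words output.
words = [
    "Natural", "language", "processing", "NLP", "is", "a", "field", "of",
    "computer", "science", "artificial", "intelligence", "and", "linguistics",
    "concerned", "with", "the", "interactions", "between", "computers", "and",
    "human", "natural", "languages",
]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Case-insensitive word frequencies.
counts = Counter(word.lower() for word in words)
print(counts["natural"], counts["and"])  # 2 2

trigrams = ngrams(words, 3)
print(len(trigrams))  # 22
print(trigrams[0])    # ('Natural', 'language', 'processing')
```

TextBlob ships its own helpers for both tasks, so in practice you would normally use those; the point here is only to show what they compute.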

2) Sentence Tokenization

TextBlob can also segment a paragraph of text into sentences:

>>> nlpblob.sentences
[Sentence("Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages.")]
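
To make the idea of sentence segmentation concrete, here is a deliberately naive splitter that breaks text after '.', '!' or '?' when whitespace follows. It is only a sketch: the function name and the sample text are made up, and this regex rule is far cruder than what a real sentence tokenizer does, since it will stumble over abbreviations such as "e.g.":

```python
import re


def naive_sentences(text):
    """Split text after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop the empty string that an empty input would produce.
    return [part for part in parts if part]


# Hypothetical sample text, not taken from the TextBlob examples.
sample = "NLP deals with text. It also deals with speech! Is that all?"
for sentence in naive_sentences(sample):
    print(sentence)
```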

3) Part-of-Speech Tagging

TextBlob also performs part-of-speech (POS) tagging; the result is available through the tags property:

>>> nlpblob.tags
[(u'Natural', u'NNP'), (u'language', u'NN'), (u'processing', u'NN'), (u'NLP', u'NN'), (u'is', u'VBZ'), (u'a', u'DT'), (u'field', u'NN'), (u'of', u'IN'), (u'computer', u'NN'), (u'science', u'NN'), (u'artificial', u'JJ'), (u'intelligence', u'NN'), (u'and', u'CC'), (u'linguistics', u'NNS'), (u'concerned', u'VBN'), (u'with', u'IN'), (u'the', u'DT'), (u'interactions', u'NNS'), (u'between', u'IN'), (u'computers', u'NNS'), (u'and', u'CC'), (u'human', u'JJ'), (u'natural', u'JJ'), (u'languages', u'NNS')]
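
Because the tags property is an ordinary list of (word, tag) pairs, it is easy to post-process. The following sketch copies the pairs from the nlpblob.tags output into a plain list, counts the tags, and pulls out the nouns. Treating every tag that starts with "NN" as a noun follows the Penn Treebank convention; the variable names are only illustrative:

```python
from collections import Counter

# (word, tag) pairs copied from the nlpblob.tags output.
tagged = [
    ("Natural", "NNP"), ("language", "NN"), ("processing", "NN"),
    ("NLP", "NN"), ("is", "VBZ"), ("a", "DT"), ("field", "NN"),
    ("of", "IN"), ("computer", "NN"), ("science", "NN"),
    ("artificial", "JJ"), ("intelligence", "NN"), ("and", "CC"),
    ("linguistics", "NNS"), ("concerned", "VBN"), ("with", "IN"),
    ("the", "DT"), ("interactions", "NNS"), ("between", "IN"),
    ("computers", "NNS"), ("and", "CC"), ("human", "JJ"),
    ("natural", "JJ"), ("languages", "NNS"),
]

# How often each tag occurs.
tag_counts = Counter(tag for _, tag in tagged)

# NN, NNS, NNP and NNPS are the Penn Treebank noun tags.
nouns = [word for word, tag in tagged if tag.startswith("NN")]

print(tag_counts["NN"], tag_counts["NNS"])  # 7 4
print(len(nouns))                           # 12
```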

4) Noun Phrase Extraction

You can also extract noun phrases through the noun_phrases property:

>>> nlpblob.noun_phrases
WordList([u'natural language processing', 'nlp', u'computer science', u'artificial intelligence'])

5) Sentiment Analysis

Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials. TextBlob exposes the result through the sentiment property:

>>> nlpblob.sentiment
Sentiment(polarity=-0.1, subjectivity=0.475)
>>> nlpblob.sentiment.polarity
-0.1
>>> nlpblob.sentiment.subjectivity
0.475
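
Polarity ranges from -1.0 (negative) to 1.0 (positive), and subjectivity from 0.0 (objective) to 1.0 (subjective). If you need a coarse label rather than a number, one option is to threshold the polarity. The helper below is a sketch, not a TextBlob feature; the function name and the 0.05 cutoff are arbitrary assumptions that you would tune for your own data:

```python
def polarity_label(polarity, threshold=0.05):
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment label."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Scores of the kind returned by sentiment.polarity.
for score in (-0.1, 0.05, 0.1):
    print(score, polarity_label(score))
```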

6) Word Singularization

>>> nlpblob.words[19]
'computers'
>>> nlpblob.words[19].singularize()
'computer'

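7) Word Pluralization

>>> nlpblob.words[21]
'human'
>>> nlpblob.words[21].pluralize()
'humen'

Note that the rule-based pluralizer is not always right: here it returns 'humen', although the correct plural is 'humans'.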

8) Word Lemmatization

Words can be lemmatized with the lemmatize method. Note that TextBlob’s lemmatize method relies on the NLTK WordNet lemmatizer, whose default POS tag is “n” (noun); to lemmatize a word with a different part of speech, you need to specify it:

>>> from textblob import Word
>>> w = Word("octopi")
>>> w.lemmatize()
u'octopus'
>>> w = Word("is")
>>> w.lemmatize()
'is'
>>> w.lemmatize("v")
u'be'
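
If you already have POS tags from the tags property, you can derive the argument for lemmatize from them instead of typing it by hand. The mapping below is a common convention rather than part of TextBlob: it converts Penn Treebank tags to the single-letter WordNet codes ('n', 'v', 'a', 'r') and falls back to 'n', mirroring the lemmatizer's default. The function name is hypothetical:

```python
def penn_to_wordnet(tag):
    """Convert a Penn Treebank POS tag to a WordNet POS letter."""
    if tag.startswith("JJ"):
        return "a"  # adjective
    if tag.startswith("VB"):
        return "v"  # verb
    if tag.startswith("RB"):
        return "r"  # adverb
    return "n"      # noun, and the fallback for everything else


for tag in ("VBZ", "NNS", "JJ", "RB", "DT"):
    print(tag, penn_to_wordnet(tag))
```

With this helper, passing penn_to_wordnet("VBZ") to lemmatize is the same as passing "v" directly.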

9) Spelling Correction

TextBlob’s spelling correction is based on Peter Norvig’s “How to Write a Spelling Corrector”, as implemented in the pattern library:

>>> b = TextBlob("I havv good speling!")
>>> b.correct()
TextBlob("I have good spelling!")

Word objects also have a spellcheck() method that returns a list of (word, confidence) tuples with spelling suggestions:

>>> from textblob import Word
>>> w = Word('havv')
>>> w.spellcheck()
[(u'have', 1.0)]
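
To give a feel for the approach, here is a stripped-down sketch of the idea in Norvig’s essay: generate every string one edit away from the input and pick the known word with the highest frequency. It is not the code that TextBlob or pattern ships; the tiny WORD_COUNTS table is invented for this example, and candidates two edits away are left out for brevity:

```python
import string

# Hypothetical toy frequency table; a real corrector is trained on a large corpus.
WORD_COUNTS = {"i": 50, "have": 40, "good": 30, "spelling": 10, "hive": 2}


def edits1(word):
    """Return all strings that are one edit away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the words that appear in the frequency table."""
    return {w for w in words if w in WORD_COUNTS}


def correct(word):
    """Return the most frequent known candidate, or the word itself."""
    word = word.lower()
    candidates = known([word]) or known(edits1(word)) or {word}
    return max(candidates, key=lambda w: WORD_COUNTS.get(w, 0))


print(correct("havv"))     # have
print(correct("speling"))  # spelling
```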

10) Parsing

TextBlob’s parse method is based on the pattern parser. Each token in the output has the form word/POS tag/chunk tag/PNP tag:

>>> nlpblob.parse()
u'Natural/NNP/B-NP/O language/NN/I-NP/O processing/NN/I-NP/O (/(/O/O NLP/NN/B-NP/O )/)/O/O is/VBZ/B-VP/O a/DT/B-NP/O field/NN/I-NP/O of/IN/B-PP/B-PNP computer/NN/B-NP/I-PNP science/NN/I-NP/I-PNP ,/,/O/O artificial/JJ/B-NP/O intelligence/NN/I-NP/O ,/,/O/O and/CC/O/O linguistics/NNS/B-NP/O concerned/VBN/B-VP/O with/IN/B-PP/B-PNP the/DT/B-NP/I-PNP interactions/NNS/I-NP/I-PNP between/IN/B-PP/B-PNP computers/NNS/B-NP/I-PNP and/CC/O/O human/JJ/B-ADJP/O (/(/O/O natural/JJ/B-ADJP/O )/)/O/O languages/NNS/B-NP/O ././O/O'

11) Translation and Language Detection

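TextBlob’s translation and language detection features are based on the Google Translate API, so they need network access. Later TextBlob releases deprecated and then removed translate() and detect_language(), so the calls below apply to older versions: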

>>> nlpblob.translate(to="es")
TextBlob("Procesamiento del lenguaje natural (NLP ) es un campo de la informática , la inteligencia artificial, y la lingüística que se ocupan de las interacciones entre las computadoras y los lenguajes humanos (naturales) .")
>>> nlpblob.translate(to="zh")
TextBlob("自然语言处理(NLP )是计算机科学,人工智能和语言学关注计算机和人类(自然)语言之间的相互作用的一个领域。")
>>> nlpblob.detect_language()
u'en'
>>> nlpblob.translate(to="zh").detect_language()
u'zh-CN'

Beyond the features covered here, TextBlob offers other text processing capabilities, including training a text classification model to build your own text classifier. The official guide covers these and other text analysis features.

Using TextBlob in Other Programming Languages

TextBlob is a Python text mining tool and only supports the Python environment. Even there, you have to install the NLTK corpora and other dependencies before you can use it, which is tedious. So we have integrated TextBlob into our Text Analysis API on Mashape; you can find the details on our demo website, Text Analysis Online. Through Mashape, you can use TextBlob features from Java/JVM/Android, Node.js, PHP, Python, Objective-C/iOS, Ruby, .NET, and other programming languages. You can start with the free plan of the Text Analysis API, which is limited to 1000 requests/day. If you need more requests, you can pay a small amount of money, which helps support our web server. Mashape is the Cloud API Marketplace, and getting started takes just three steps:

1. Register a Mashape account;
2. Go to the Text Analysis API page on Mashape and subscribe to it;
3. Start using the TextBlob methods through the Text Analysis API.

That’s all. You can use TextBlob either through a local installation or through our text analysis service, Text Analysis Online.

Posted by TextMiner

Comments

  1. Can I extend TextBlob to use a specific NLTK corpus? When I use the NaiveBayes analyzer for sentiment analysis in TextBlob, it uses the movie database. How can I make it use some other corpus?
