Getting Started with MBSP
MBSP is a Python text analysis tool, similar to NLTK, TextBlob and Pattern.
About MBSP for Python
According to the official MBSP website:
MBSP is a text analysis system based on the TiMBL and MBT memory based learning applications developed at CLiPS and ILK. It provides tools for Tokenization and Sentence Splitting, Part of Speech Tagging, Chunking, Lemmatization, Relation Finding and Prepositional Phrase Attachment.
Installing MBSP
MBSP works with Python 2.x and is bundled with three required dependencies written in C/C++ (TiMBL, MBT and MBLEM). Binaries have been precompiled for Mac OS X 10.5; if they do not work on your machine, you need to compile them manually from the source code. The simplest way to install MBSP is with setup.py: the module comes with a setup.py script that compiles the C/C++ binaries automatically. After downloading the MBSP package, you just need:
> cd MBSP
> python setup.py install
If this works for you, you're in luck: no manual compilation is necessary. Otherwise you need to install TiMBL, MBT and MBLEM manually by following the installation steps on the official website.
Text Analysis by MBSP
MBSP uses a client-server architecture for some text analysis tasks. After importing MBSP in Python, you will see several servers start, each on its own port on your local machine:
>>> import MBSP
Starting server 'chunk' at localhost:6061....
Starting server 'lemma' at localhost:6062..
Starting server 'relation' at localhost:6063............
Starting server 'preposition' at localhost:6064..................
>>>
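If you want to confirm that the four servers are really listening, a quick check with Python's standard socket module is enough. This is only a minimal sketch and not part of MBSP: the helper name port_is_open is made up for this example, and the port numbers are simply the ones printed in the startup messages above, so yours may differ.

```python
import socket

# Ports copied from the startup messages above; adjust them if your install differs.
SERVERS = {"chunk": 6061, "lemma": 6062, "relation": 6063, "preposition": 6064}

def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
    except socket.error:
        return False
    finally:
        sock.close()

if __name__ == "__main__":
    for name, port in sorted(SERVERS.items(), key=lambda item: item[1]):
        state = "up" if port_is_open("localhost", port) else "down"
        print("%-12s localhost:%d  %s" % (name, port, state))
```

Run it in a second terminal while the Python session that imported MBSP is still open.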
Now you can use MBSP's text processing functions.
1) Word Tokenize
MBSP provides basic word tokenization through the tokenize() function:
>>> MBSP.tokenize("this's a text analysis test")
u"this 's a text analysis test"
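The tokenizer returns a single string, so in practice you will usually split it into a list of tokens. The sketch below works on the output string from the example above and does not call MBSP itself. The helper name to_token_lists is made up, and the idea that longer input comes back with one sentence per line is an assumption you should check against your own output.

```python
# Output string copied from the tokenize() example above.
tokenized = u"this 's a text analysis test"

def to_token_lists(tokenized_text):
    """Split tokenizer output into one list of tokens per line.

    Assumption: tokens are separated by spaces and, for longer input,
    each sentence sits on its own line.
    """
    return [line.split() for line in tokenized_text.splitlines() if line.strip()]

sentences = to_token_lists(tokenized)
print(sentences)
```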
2) POS Tagging
MBSP provides basic part-of-speech tagging through the tag() function:
>>> MBSP.tag("this's a text analysis test")
u"this/DT 's/VBZ a/DT text/NN analysis/NN test/NN"
You can find the details of the tags on the official MBSP site, under the Penn Treebank II tag set.
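Since the tagger output is a plain word/TAG string, a few lines of Python turn it into (word, tag) pairs that are easier to filter. This is a small sketch that works on the output string shown above rather than calling MBSP; split_tagged is a made-up helper name.

```python
# Output string copied from the tag() example above.
tagged = u"this/DT 's/VBZ a/DT text/NN analysis/NN test/NN"

def split_tagged(tagged_text):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) tuples."""
    pairs = []
    for token in tagged_text.split():
        # Split on the last slash so the tag is always the final field.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs

pairs = split_tagged(tagged)

# Keep only the nouns: Penn Treebank noun tags all start with "NN".
nouns = [word for word, tag in pairs if tag.startswith("NN")]
print(nouns)
```

For this sentence the noun filter keeps text, analysis and test.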
3) Lemmatize
You can use the MBSP lemmatizer through the lemmatize() function:
>>> MBSP.lemmatize("The cats were spleeping.", tokenize=True)
u'the cat be spleep .'
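To see what the lemmatizer actually changed, you can line the lemmas up against the original tokens. The sketch below uses the lemma string from the example above; the token list is written out by hand to match it, and neither line calls MBSP.

```python
# Tokens of the example sentence, written out by hand (punctuation split off).
tokens = u"The cats were spleeping .".split()
# Lemma string copied from the lemmatize() example above.
lemmas = u"the cat be spleep .".split()

# Pair each token with its lemma and keep the ones that differ
# by more than letter case.
changed = [(token, lemma) for token, lemma in zip(tokens, lemmas)
           if token.lower() != lemma]

for token, lemma in changed:
    print("%s -> %s" % (token, lemma))
```

Note that the misspelled word "spleeping" is still reduced to "spleep" in the output above.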
4) Chunk
MBSP provides the chunk() function for phrase extraction:
>>> MBSP.chunk("The cats were spleeping.")
u'The/DT/I-NP/O cats/NNS/I-NP/O were/VBD/I-VP/O spleeping/VBG/I-VP/O ././O/O'
>>>
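Each token in the chunker output carries four slash-separated fields. The sketch below groups neighbouring tokens into phrases based on the third field. It works on the output string shown above and rests on a few assumptions: that the third field is an IOB-style chunk tag (I-NP, B-NP, O), that the fourth field marks prepositional noun phrases, and that words never contain a raw slash. group_chunks is a made-up helper name.

```python
# Output string copied from the chunk() example above.
chunked = (u"The/DT/I-NP/O cats/NNS/I-NP/O were/VBD/I-VP/O "
           u"spleeping/VBG/I-VP/O ././O/O")

def group_chunks(chunked_text):
    """Group tokens into (chunk_type, [words]) using the chunk tag field."""
    chunks = []
    previous_type = None
    for token in chunked_text.split():
        # Assumed field order: word, POS tag, chunk tag, PNP tag.
        word, pos, chunk_tag, pnp_tag = token.split("/")
        if chunk_tag == "O":
            # Token is outside any chunk (here: the final period).
            previous_type = None
            continue
        prefix, chunk_type = chunk_tag.split("-", 1)
        if prefix == "B" or chunk_type != previous_type:
            chunks.append((chunk_type, [word]))   # start a new chunk
        else:
            chunks[-1][1].append(word)            # continue the current chunk
        previous_type = chunk_type
    return chunks

phrases = group_chunks(chunked)
for chunk_type, words in phrases:
    print("%s: %s" % (chunk_type, " ".join(words)))
```

For the example sentence this yields one noun phrase ("The cats") and one verb phrase ("were spleeping").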
5) Parse
The parse() function runs the previous steps in one call and returns each token together with its part-of-speech tag, chunk tag, relation tag and lemma:
>>> MBSP.parse("The cats were spleeping.")
u'The/DT/I-NP/O/NP-SBJ-1/O/the cats/NNS/I-NP/O/NP-SBJ-1/O/cat were/VBD/I-VP/O/VP-1/O/be spleeping/VBG/I-VP/O/VP-1/O/spleep ././O/O/O/O/.'
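The parse output packs seven slash-separated fields into every token, which is hard to read by eye. A rough sketch for unpacking it into dictionaries follows. It works on the output string shown above and does not call MBSP. The field names in FIELDS are my reading of the output (word, POS tag, chunk tag, PNP tag, relation, anchor, lemma) and should be treated as an assumption; parse_tokens is a made-up helper name.

```python
# Output string copied from the parse() example above.
parsed = (u"The/DT/I-NP/O/NP-SBJ-1/O/the cats/NNS/I-NP/O/NP-SBJ-1/O/cat "
          u"were/VBD/I-VP/O/VP-1/O/be spleeping/VBG/I-VP/O/VP-1/O/spleep "
          u"././O/O/O/O/.")

# Assumed meaning of the seven fields, in order.
FIELDS = ("word", "pos", "chunk", "pnp", "relation", "anchor", "lemma")

def parse_tokens(parsed_text):
    """Turn parse output into a list of dicts, one per token."""
    return [dict(zip(FIELDS, token.split("/"))) for token in parsed_text.split()]

tokens = parse_tokens(parsed)

# Words whose relation tag marks them as part of the sentence subject.
subject = [t["word"] for t in tokens if "SBJ" in t["relation"]]
# Lemmas of the words in the verb phrase.
verb_lemmas = [t["lemma"] for t in tokens if t["relation"].startswith("VP")]

print("subject: %s" % " ".join(subject))
print("verb lemmas: %s" % " ".join(verb_lemmas))
```

With the tokens in dictionaries, questions like "what is the subject?" become simple list filters instead of string slicing.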
If you want to use MBSP from other programming languages such as Java, PHP or Ruby, we strongly recommend that you try our Text Analysis API on Mashape.
Posted by TextMiner