Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify elements in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
Most research on NER systems has been structured as taking an unannotated block of text, such as this one:
Jim bought 300 shares of Acme Corp. in 2006.
And producing an annotated block of text that highlights the names of entities:
[Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.
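As a rough illustration of this input/output format, the sketch below renders a list of labeled text segments in the bracketed style shown above. The `annotate` function and the segment representation are assumptions made up for this example; they are not part of Stanford NER or NLTK.

```python
def annotate(segments):
    """Render (text, label) segments; a label of None means plain text."""
    parts = []
    for text, label in segments:
        if label is None:
            parts.append(text)
        else:
            # Entities are wrapped in brackets and followed by their category.
            parts.append("[{}]{}".format(text, label))
    return " ".join(parts)


# Hand-written segments for the example sentence (not produced by a tagger).
segments = [
    ("Jim", "Person"),
    ("bought 300 shares of", None),
    ("Acme Corp.", "Organization"),
    ("in", None),
    ("2006", "Time"),
]
annotated = annotate(segments)
print(annotated)
```

A real NER system has to discover the segments and labels itself; this sketch only shows the shape of the result.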
Stanford NER (Named Entity Recognizer) is one of the most popular named-entity recognition tools, and it is implemented in Java. That raises a question: how do you use Stanford NER from other languages such as Python, Ruby, or PHP?
Luckily, NLTK provides an interface to Stanford NER in its module for interfacing with the Stanford taggers. Using Stanford NER from Python through NLTK looks like this:
>>> from nltk.tag.stanford import NERTagger
>>> st = NERTagger('/usr/share/stanford-ner/classifiers/all.3class.distsim.crf.ser.gz',
...                '/path/to/stanford-ner.jar')  # placeholder: use the location of your stanford-ner.jar
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION')]
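The tagger labels each token separately, so a multi-word name such as "Stony Brook University" comes back as three tagged tokens. A minimal sketch of one way to merge adjacent tokens that share a label into whole entities follows. The helper name `group_entities` is hypothetical, and the tagged list is copied by hand from the output above rather than produced by running the tagger.

```python
from itertools import groupby


def group_entities(tagged_tokens, outside_label="O"):
    """Merge runs of adjacent tokens with the same label into (phrase, label) pairs."""
    entities = []
    for label, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if label == outside_label:
            continue  # 'O' marks tokens that are not part of any entity
        phrase = " ".join(token for token, _ in run)
        entities.append((phrase, label))
    return entities


# Tagger output from the example above, copied by hand.
tagged = [
    ("Rami", "PERSON"), ("Eid", "PERSON"), ("is", "O"), ("studying", "O"),
    ("at", "O"), ("Stony", "ORGANIZATION"), ("Brook", "ORGANIZATION"),
    ("University", "ORGANIZATION"), ("in", "O"), ("NY", "LOCATION"),
]
entities = group_entities(tagged)
print(entities)
```

One limitation of this simple grouping: two different entities of the same type that appear directly next to each other would be merged into a single phrase.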
The stanford-ner.jar file and the classifier model “all.3class.distsim.crf.ser.gz” can be downloaded here: Download Stanford Named Entity Recognizer version 3.4
The download is a 66M zipped file (mainly consisting of classifier data objects). If you unpack that file, you should have everything needed. It includes batch files for running under Windows or Unix/Linux/MacOSX, a simple GUI, and the ability to run as a server. Stanford NER requires Java v1.6+.
Note that you need to install a Java JRE first. On Ubuntu 12.04, for example, installing the Java JRE and JDK is very simple:
Installing default JRE/JDK
This is the recommended and easiest option. This will install OpenJDK 6 on Ubuntu 12.04 and earlier and on 12.10+ it will install OpenJDK 7.
Installing Java with apt-get is easy. First, update the package index:
sudo apt-get update
Then, check whether Java is already installed:
java -version
If it returns “The program java can be found in the following packages”, Java hasn’t been installed yet, so execute the following command:
sudo apt-get install default-jre
This will install the Java Runtime Environment (JRE). If you instead need the Java Development Kit (JDK), which is usually needed to compile Java applications (for example Apache Ant, Apache Maven, Eclipse and IntelliJ IDEA), execute the following command:
sudo apt-get install default-jdk
That is everything that is needed to install Java.
All other steps are optional and must only be executed when needed.
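Before constructing the tagger, it can help to confirm from Python that the pieces described above are in place: a `java` executable on the PATH, the classifier file, and the jar. The sketch below is one possible pre-flight check, assuming Python 3.3 or later (for `shutil.which`). The function name `missing_requirements` and the placeholder paths are hypothetical; substitute the locations where you unpacked the download.

```python
import os
import shutil


def missing_requirements(classifier_path, jar_path, java_cmd="java"):
    """Return a list of problems found; an empty list means everything was located."""
    problems = []
    # shutil.which returns None when the command is not on PATH.
    if shutil.which(java_cmd) is None:
        problems.append("'{}' was not found on PATH".format(java_cmd))
    for label, path in (("classifier", classifier_path), ("jar", jar_path)):
        if not os.path.isfile(path):
            problems.append("{} file not found: {}".format(label, path))
    return problems


# Placeholder paths for illustration only; replace them with your own.
problems = missing_requirements(
    "/path/to/classifiers/all.3class.distsim.crf.ser.gz",
    "/path/to/stanford-ner.jar",
)
for problem in problems:
    print(problem)
```

This only checks that the files and the command exist. It does not verify the Java version, which Stanford NER requires to be 1.6 or later.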
Now you can use Stanford NER in your Python project through NLTK and the Java JRE.
If you want to use Stanford NER from other programming languages such as Java/JVM/Android, Node.js, PHP, Python, Objective-C/iOS, Ruby, or .NET, the best way is to use the REST API of our Text Analysis API on the Mashape platform, which provides Stanford NER as an online service. You can test it on our demo here: NLTK Stanford Named Entity Recognizer. Hope you enjoy it!
Posted by TextMiner.