Dive Into NLTK, Part XI: From Word2Vec to WordNet

This is the eleventh article in the series “Dive Into NLTK”. Here is an index of all the articles in the series that have been published to date:

Part I: Getting Started with NLTK
Part II: Sentence Tokenize and Word Tokenize
Part III: Part-Of-Speech Tagging and POS Tagger
Part IV: Stemming and Lemmatization
Part V: Using Stanford Text Analysis Tools in Python
Part VI: Add Stanford Word Segmenter Interface for Python NLTK
Part VII: A Preliminary Study on Text Classification
Part VIII: Using External Maximum Entropy Modeling Libraries for Text Classification
Part IX: From Text Classification to Sentiment Analysis
Part X: Play With Word2Vec Models based on NLTK Corpus
Part XI: From Word2Vec to WordNet (this article)

About WordNet

WordNet is a lexical database for English:

WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short definitions and usage examples, and records a number of relations among these synonym sets or their members. WordNet can thus be seen as a combination of dictionary and thesaurus. While it is accessible to human users via a web browser, its primary use is in automatic text analysis and artificial intelligence applications. The database and software tools have been released under a BSD style license and are freely available for download from the WordNet website. Both the lexicographic data (lexicographer files) and the compiler (called grind) for producing the distributed database are available.

For details on installing and testing WordNet, we recommend: Getting started with WordNet

WordNet in NLTK

NLTK provides a convenient Python interface for looking up words in WordNet: WordNet Interface. The source code can be found here: Source code for nltk.corpus.reader.wordnet. We can use NLTK to explore WordNet from an interactive session:

Python 2.7.6 (default, Jun  3 2014, 07:43:23) 
Type "copyright", "credits" or "license" for more information.
 
IPython 3.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
 
In [1]: from nltk.corpus import wordnet as wn
 
In [2]: wn.synsets('book')
Out[2]: 
[Synset('book.n.01'),
 Synset('book.n.02'),
 Synset('record.n.05'),
 Synset('script.n.01'),
 Synset('ledger.n.01'),
 Synset('book.n.06'),
 Synset('book.n.07'),
 Synset('koran.n.01'),
 Synset('bible.n.01'),
 Synset('book.n.10'),
 Synset('book.n.11'),
 Synset('book.v.01'),
 Synset('reserve.v.04'),
 Synset('book.v.03'),
 Synset('book.v.04')]
 
In [3]: wn.synsets('book', pos=wn.NOUN)
Out[3]: 
[Synset('book.n.01'),
 Synset('book.n.02'),
 Synset('record.n.05'),
 Synset('script.n.01'),
 Synset('ledger.n.01'),
 Synset('book.n.06'),
 Synset('book.n.07'),
 Synset('koran.n.01'),
 Synset('bible.n.01'),
 Synset('book.n.10'),
 Synset('book.n.11')]
 
In [4]: wn.synsets('book', pos=wn.VERB)
Out[4]: 
[Synset('book.v.01'),
 Synset('reserve.v.04'),
 Synset('book.v.03'),
 Synset('book.v.04')]
 
In [5]: wn.synset('book.n.01')
Out[5]: Synset('book.n.01')
 
In [6]: print(wn.synset('book.n.01').definition())
a written work or composition that has been published (printed on pages bound together)
 
In [7]: print(wn.synset('book.v.01').definition())
engage for a performance
 
In [8]: len(wn.synset('book.n.01').examples())
Out[8]: 1
 
In [9]: print(wn.synset('book.n.01').examples()[0])
I am reading a good book on economics
 
In [10]: len(wn.synset('book.v.01').examples())
Out[10]: 1
 
In [11]: print(wn.synset('book.v.01').examples()[0])
Her agent had booked her for several concerts in Tokyo
 
In [12]: wn.synset('book.n.01').lemmas()
Out[12]: [Lemma('book.n.01.book')]
 
In [13]: wn.synset('book.v.01').lemmas()
Out[13]: [Lemma('book.v.01.book')]
 
In [14]: [str(lemma.name()) for lemma in wn.synset('book.n.01').lemmas()]
Out[14]: ['book']
 
In [15]: [str(lemma.name()) for lemma in wn.synset('book.v.01').lemmas()]
Out[15]: ['book']
 
In [16]: wn.lemma('book.n.01.book').synset()
Out[16]: Synset('book.n.01')
 
In [17]: book = wn.synset('book.n.01')
 
In [18]: book.hypernyms()
Out[18]: [Synset('publication.n.01')]
 
In [19]: book.hyponyms()
Out[19]: 
[Synset('appointment_book.n.01'),
 Synset('authority.n.07'),
 Synset('bestiary.n.01'),
 Synset('booklet.n.01'),
 Synset('catalog.n.01'),
 Synset('catechism.n.02'),
 Synset('copybook.n.01'),
 Synset('curiosa.n.01'),
 Synset('formulary.n.01'),
 Synset('phrase_book.n.01'),
 Synset('playbook.n.02'),
 Synset('pop-up_book.n.01'),
 Synset('prayer_book.n.01'),
 Synset('reference_book.n.01'),
 Synset('review_copy.n.01'),
 Synset('songbook.n.01'),
 Synset('storybook.n.01'),
 Synset('textbook.n.01'),
 Synset('tome.n.01'),
 Synset('trade_book.n.01'),
 Synset('workbook.n.01'),
 Synset('yearbook.n.01')]
 
In [20]: book.member_holonyms()
Out[20]: []
 
In [21]: book.root_hypernyms()
Out[21]: [Synset('entity.n.01')]
 
In [22]: man = wn.synset('man.n.01')
 
In [23]: man.lemmas()
Out[23]: [Lemma('man.n.01.man'), Lemma('man.n.01.adult_male')]
 
In [24]: man.lemmas()[0]
Out[24]: Lemma('man.n.01.man')
 
In [25]: man.lemmas()[0].antonyms()
Out[25]: [Lemma('woman.n.01.woman')]

You can browse “book.n.01” and “man.n.01” on WordNet Online.
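
The hypernym calls in the session above walk "is-a" links upward until they reach a root. The following is a minimal sketch of that idea on a hand-built toy graph, not NLTK's implementation. Only the textbook -> book and book -> publication links and the entity root are taken from the session output; the direct publication -> entity link is a hypothetical shortcut that collapses the intermediate WordNet levels, and the names TOY_HYPERNYMS, hypernym_paths and root_hypernyms are chosen here for illustration.

```python
# Toy hypernym graph: each name maps to its direct hypernyms ("is-a" parents).
# ASSUMPTION: publication -> entity is a shortcut; real WordNet has several
# intermediate synsets between them.
TOY_HYPERNYMS = {
    "textbook.n.01": ["book.n.01"],
    "book.n.01": ["publication.n.01"],
    "publication.n.01": ["entity.n.01"],
    "entity.n.01": [],
}


def hypernym_paths(name, graph=TOY_HYPERNYMS):
    """Return every path from `name` up to a root, as lists of names."""
    parents = graph[name]
    if not parents:
        # A node without hypernyms is a root, so the path ends here.
        return [[name]]
    paths = []
    for parent in parents:
        for tail in hypernym_paths(parent, graph):
            paths.append([name] + tail)
    return paths


def root_hypernyms(name, graph=TOY_HYPERNYMS):
    """Return the distinct roots reachable from `name`, sorted by name."""
    return sorted({path[-1] for path in hypernym_paths(name, graph)})


print(hypernym_paths("textbook.n.01"))
print(root_hypernyms("book.n.01"))
```

A synset with several hypernyms would simply produce several paths, which is why the function returns a list of lists.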

Word Similarity Interface by WordNet

In [43]: cat = wn.synset('cat.n.01')
 
In [44]: dog = wn.synset('dog.n.01')
 
In [45]: man = wn.synset('man.n.01')
 
In [46]: woman = wn.synset('woman.n.01')
 
In [47]: hit = wn.synset('hit.v.01')
 
In [48]: kick = wn.synset('kick.v.01')
 
In [49]: cat.path_similarity(cat)
Out[49]: 1.0
 
In [50]: cat.path_similarity(dog)
Out[50]: 0.2
 
In [51]: man.path_similarity(woman)
Out[51]: 0.3333333333333333
 
In [52]: hit.path_similarity(kick)
Out[52]: 0.3333333333333333
 
In [53]: cat.lch_similarity(dog)
Out[53]: 2.0281482472922856
 
In [54]: man.lch_similarity(woman)
Out[54]: 2.538973871058276
 
In [55]: hit.lch_similarity(kick)
Out[55]: 2.159484249353372
 
In [56]: cat.wup_similarity(dog)
Out[56]: 0.8571428571428571
 
In [57]: man.wup_similarity(woman)
Out[57]: 0.6666666666666666
 
In [58]: hit.wup_similarity(kick)
Out[58]: 0.6666666666666666

You can browse “cat.n.01”, “dog.n.01”, “man.n.01”, “woman.n.01”, “hit.v.01” and “kick.v.01” on WordNet Online.
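
To make the three scores above easier to read, here is a rough sketch of the standard formulas behind them, written as plain functions rather than NLTK calls. Path similarity is 1 / (d + 1) for a shortest path of d edges, Leacock-Chodorow is -log((d + 1) / (2 * D)) for a taxonomy depth D, and Wu-Palmer is 2 * depth(lcs) / (depth(a) + depth(b)). The distances (4 for cat/dog, 2 for man/woman and hit/kick) and the depth values (19 for nouns, 13 for verbs) are assumptions inferred from the outputs in the session, not values read out of NLTK; the function names simply mirror the method names.

```python
import math


def path_similarity(distance):
    """1 / (d + 1), where d is the shortest path length in edges."""
    return 1.0 / (distance + 1)


def lch_similarity(distance, taxonomy_depth):
    """Leacock-Chodorow: -log((d + 1) / (2 * D))."""
    return -math.log((distance + 1) / (2.0 * taxonomy_depth))


def wup_similarity(lcs_depth, depth_a, depth_b):
    """Wu-Palmer: 2 * depth(lcs) / (depth(a) + depth(b))."""
    return 2.0 * lcs_depth / (depth_a + depth_b)


# ASSUMPTION: distances and depths below are inferred from the session
# outputs above, not taken from NLTK itself.
NOUN_DEPTH = 19
VERB_DEPTH = 13

print(path_similarity(4))             # cat vs. dog
print(path_similarity(2))             # man vs. woman
print(lch_similarity(4, NOUN_DEPTH))  # cat vs. dog
print(lch_similarity(2, VERB_DEPTH))  # hit vs. kick

# Hypothetical depths, only to show the shape of the Wu-Palmer formula.
print(wup_similarity(3, 4, 4))
```

Path similarity always falls in (0, 1], while the Leacock-Chodorow score depends on the taxonomy depth, so noun and verb scores are not directly comparable.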

References:
Getting started with WordNet by Text Processing
WordNet Interface by NLTK
WordNet and ImageNet
Open Multilingual Wordnet
Wordnet with NLTK
Tutorial: What is WordNet? A Conceptual Introduction Using Python
Dive into WordNet with NLTK

Posted by TextMiner

