Dive Into NLTK, Part XI: From Word2Vec to WordNet
This is the eleventh article in the series “Dive Into NLTK”. Here is an index of all the articles published in the series to date:
Part I: Getting Started with NLTK
Part II: Sentence Tokenize and Word Tokenize
Part III: Part-Of-Speech Tagging and POS Tagger
Part IV: Stemming and Lemmatization
Part V: Using Stanford Text Analysis Tools in Python
Part VI: Add Stanford Word Segmenter Interface for Python NLTK
Part VII: A Preliminary Study on Text Classification
Part VIII: Using External Maximum Entropy Modeling Libraries for Text Classification
Part IX: From Text Classification to Sentiment Analysis
Part X: Play With Word2Vec Models based on NLTK Corpus
Part XI: From Word2Vec to WordNet (this article)
About WordNet
WordNet is a lexical database for English:
WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short definitions and usage examples, and records a number of relations among these synonym sets or their members. WordNet can thus be seen as a combination of dictionary and thesaurus. While it is accessible to human users via a web browser, its primary use is in automatic text analysis and artificial intelligence applications. The database and software tools have been released under a BSD style license and are freely available for download from the WordNet website. Both the lexicographic data (lexicographer files) and the compiler (called grind) for producing the distributed database are available.
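To make the "dictionary plus thesaurus" idea concrete, here is a minimal sketch in plain Python. The three entries are copied by hand from the NLTK session in the next section; the dict layout and the names `SYNSETS` and `build_index` are illustrative assumptions, not how WordNet actually stores its data.

```python
from collections import defaultdict

# Toy stand-in for a few synsets (hand-copied, NOT the real database).
# Each synset groups the lemmas (words) that share one sense.
SYNSETS = {
    "book.n.01": {
        "lemmas": ["book"],
        "definition": "a written work or composition that has been "
                      "published (printed on pages bound together)",
    },
    "book.v.01": {
        "lemmas": ["book"],
        "definition": "engage for a performance",
    },
    "man.n.01": {
        "lemmas": ["man", "adult_male"],
        "definition": None,  # gloss not quoted in this article
    },
}


def build_index(synsets):
    """Invert synset -> lemmas into word -> list of synset names."""
    index = defaultdict(list)
    for name in sorted(synsets):
        for lemma in synsets[name]["lemmas"]:
            index[lemma].append(name)
    return dict(index)


INDEX = build_index(SYNSETS)

# Dictionary direction: one word can have several senses.
print(INDEX["book"])
# Thesaurus direction: one sense can be expressed by several words.
print(SYNSETS["man.n.01"]["lemmas"])
```

The real database adds usage examples and the relations between synsets (hypernyms, hyponyms, antonyms and so on), which the NLTK interface exposes as methods.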
For more on installing and testing WordNet, we recommend: Getting started with WordNet
WordNet in NLTK
NLTK provides a Python interface for looking up words and their relations in WordNet, documented at WordNet Interface; its source code is available at Source code for nltk.corpus.reader.wordnet. We can use NLTK to explore WordNet:
Python 2.7.6 (default, Jun 3 2014, 07:43:23)
Type "copyright", "credits" or "license" for more information.

IPython 3.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from nltk.corpus import wordnet as wn

In [2]: wn.synsets('book')
Out[2]:
[Synset('book.n.01'),
 Synset('book.n.02'),
 Synset('record.n.05'),
 Synset('script.n.01'),
 Synset('ledger.n.01'),
 Synset('book.n.06'),
 Synset('book.n.07'),
 Synset('koran.n.01'),
 Synset('bible.n.01'),
 Synset('book.n.10'),
 Synset('book.n.11'),
 Synset('book.v.01'),
 Synset('reserve.v.04'),
 Synset('book.v.03'),
 Synset('book.v.04')]

In [3]: wn.synsets('book', pos=wn.NOUN)
Out[3]:
[Synset('book.n.01'),
 Synset('book.n.02'),
 Synset('record.n.05'),
 Synset('script.n.01'),
 Synset('ledger.n.01'),
 Synset('book.n.06'),
 Synset('book.n.07'),
 Synset('koran.n.01'),
 Synset('bible.n.01'),
 Synset('book.n.10'),
 Synset('book.n.11')]

In [4]: wn.synsets('book', pos=wn.VERB)
Out[4]:
[Synset('book.v.01'),
 Synset('reserve.v.04'),
 Synset('book.v.03'),
 Synset('book.v.04')]

In [5]: wn.synset('book.n.01')
Out[5]: Synset('book.n.01')

In [6]: print(wn.synset('book.n.01').definition())
a written work or composition that has been published (printed on pages bound together)

In [7]: print(wn.synset('book.v.01').definition())
engage for a performance

In [8]: len(wn.synset('book.n.01').examples())
Out[8]: 1

In [9]: print(wn.synset('book.n.01').examples()[0])
I am reading a good book on economics

In [10]: len(wn.synset('book.v.01').examples())
Out[10]: 1

In [11]: print(wn.synset('book.v.01').examples()[0])
Her agent had booked her for several concerts in Tokyo

In [12]: wn.synset('book.n.01').lemmas()
Out[12]: [Lemma('book.n.01.book')]

In [13]: wn.synset('book.v.01').lemmas()
Out[13]: [Lemma('book.v.01.book')]

In [14]: [str(lemma.name()) for lemma in wn.synset('book.n.01').lemmas()]
Out[14]: ['book']

In [15]: [str(lemma.name()) for lemma in wn.synset('book.v.01').lemmas()]
Out[15]: ['book']

In [16]: wn.lemma('book.n.01.book').synset()
Out[16]: Synset('book.n.01')

In [17]: book = wn.synset('book.n.01')

In [18]: book.hypernyms()
Out[18]: [Synset('publication.n.01')]

In [19]: book.hyponyms()
Out[19]:
[Synset('appointment_book.n.01'),
 Synset('authority.n.07'),
 Synset('bestiary.n.01'),
 Synset('booklet.n.01'),
 Synset('catalog.n.01'),
 Synset('catechism.n.02'),
 Synset('copybook.n.01'),
 Synset('curiosa.n.01'),
 Synset('formulary.n.01'),
 Synset('phrase_book.n.01'),
 Synset('playbook.n.02'),
 Synset('pop-up_book.n.01'),
 Synset('prayer_book.n.01'),
 Synset('reference_book.n.01'),
 Synset('review_copy.n.01'),
 Synset('songbook.n.01'),
 Synset('storybook.n.01'),
 Synset('textbook.n.01'),
 Synset('tome.n.01'),
 Synset('trade_book.n.01'),
 Synset('workbook.n.01'),
 Synset('yearbook.n.01')]

In [20]: book.member_holonyms()
Out[20]: []

In [21]: book.root_hypernyms()
Out[21]: [Synset('entity.n.01')]

In [22]: man = wn.synset('man.n.01')

In [23]: man.lemmas()
Out[23]: [Lemma('man.n.01.man'), Lemma('man.n.01.adult_male')]

In [24]: man.lemmas()[0]
Out[24]: Lemma('man.n.01.man')

In [25]: man.lemmas()[0].antonyms()
Out[25]: [Lemma('woman.n.01.woman')]
You can browse “book.n.01” and “man.n.01” on WordNet Online.
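The session above looks up one property at a time. As a rough sketch, the same calls can be gathered into a small helper that returns everything for a word in one pass. The function name `summarize_word` and the shape of the returned dicts are our own assumptions; the only NLTK calls used are the ones already shown above (`synsets`, `name`, `definition`, `examples`, `lemmas`, `hypernyms`).

```python
def summarize_word(word, wn, pos=None):
    """Return one dict per synset of `word`.

    `wn` is expected to be NLTK's WordNet reader, obtained with
    `from nltk.corpus import wordnet as wn`. It is passed in as a
    parameter so that this helper does not import NLTK itself.
    `pos` may be None (all parts of speech) or e.g. wn.NOUN / wn.VERB.
    """
    rows = []
    for synset in wn.synsets(word, pos=pos):
        rows.append({
            "name": synset.name(),
            "definition": synset.definition(),
            "examples": list(synset.examples()),
            # Lemma names are the individual words that share this sense.
            "lemmas": [str(lemma.name()) for lemma in synset.lemmas()],
            # Direct hypernyms only (one step up the is-a hierarchy).
            "hypernyms": [hyper.name() for hyper in synset.hypernyms()],
        })
    return rows
```

With NLTK and the WordNet corpus installed, `summarize_word('book', wn, pos=wn.VERB)` should return four rows, one for each synset listed in Out[4] above.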
Word Similarity Interface by WordNet
In [43]: cat = wn.synset('cat.n.01')

In [44]: dog = wn.synset('dog.n.01')

In [45]: man = wn.synset('man.n.01')

In [46]: woman = wn.synset('woman.n.01')

In [47]: hit = wn.synset('hit.v.01')

In [48]: kick = wn.synset('kick.v.01')

In [49]: cat.path_similarity(cat)
Out[49]: 1.0

In [50]: cat.path_similarity(dog)
Out[50]: 0.2

In [51]: man.path_similarity(woman)
Out[51]: 0.3333333333333333

In [52]: hit.path_similarity(kick)
Out[52]: 0.3333333333333333

In [53]: cat.lch_similarity(dog)
Out[53]: 2.0281482472922856

In [54]: man.lch_similarity(woman)
Out[54]: 2.538973871058276

In [55]: hit.lch_similarity(kick)
Out[55]: 2.159484249353372

In [56]: cat.wup_similarity(dog)
Out[56]: 0.8571428571428571

In [57]: man.wup_similarity(woman)
Out[57]: 0.6666666666666666

In [58]: hit.wup_similarity(kick)
Out[58]: 0.6666666666666666
You can browse “cat.n.01”, “dog.n.01”, “man.n.01”, “woman.n.01”, “hit.v.01” and “kick.v.01” on WordNet Online.
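The three measures used above all work on the hypernym (is-a) hierarchy: path similarity is based on the shortest path between two senses, Leacock-Chodorow (lch) scales that path by the depth of the taxonomy, and Wu-Palmer (wup) compares the depth of the two senses with the depth of their lowest common ancestor. The following is a simplified sketch of those formulas on a tiny hand-built tree. The tree, the node names and the helper functions are hypothetical and are not WordNet data or NLTK's implementation, which also has to deal with synsets that have several hypernyms.

```python
import math

# Hypothetical hand-built hypernym tree (child -> parent); NOT WordNet data.
PARENT = {
    "animal": "entity",
    "carnivore": "animal",
    "feline": "carnivore",
    "canine": "carnivore",
    "cat": "feline",
    "dog": "canine",
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    """Number of nodes on the path from the root to `node` (root = 1)."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("no common ancestor")


def distance(a, b):
    """Number of edges on the shortest path between a and b."""
    lcs = lowest_common_subsumer(a, b)
    return (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))


def path_similarity(a, b):
    return 1.0 / (distance(a, b) + 1)


def lch_similarity(a, b, taxonomy_depth):
    # taxonomy_depth is the maximum depth of the whole hierarchy.
    return -math.log((distance(a, b) + 1) / (2.0 * taxonomy_depth))


def wup_similarity(a, b):
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


print(path_similarity("cat", "cat"))
print(path_similarity("cat", "dog"))
print(lch_similarity("cat", "dog", taxonomy_depth=5))
print(wup_similarity("cat", "dog"))
```

In this toy tree "cat" and "dog" are four edges apart, so the path similarity is 1 / (4 + 1) = 0.2. The lch and wup values will not match the NLTK output above, because both depend on how deep the hierarchy is and the real WordNet noun hierarchy is much deeper than this example.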
References:
Getting started with WordNet by Text Processing
WordNet Interface by NLTK
WordNet and ImageNet
Open Multilingual Wordnet
Wordnet with NLTK
Tutorial: What is WordNet? A Conceptual Introduction Using Python
Dive into WordNet with NLTK
Posted by TextMiner