Lab 3: POS
Complete the code in the given file. There are 5 questions in the code. Please code or answer all 5. You can put your answers in the code as comments (lines starting with the # sign).
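import nltk
from nltk.corpus import brown

# Requires the Brown corpus data, e.g. via nltk.download('brown').
# Load the tagged news sentences and hold out the last 10% for testing.
brown_tagged_sents = brown.tagged_sents(categories='news')
size = int(len(brown_tagged_sents) * 0.9)
train_sents = brown_tagged_sents[:size]
test_sents = brown_tagged_sents[size:]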
# train the tagger
unigram_tagger = nltk.UnigramTagger(train_sents)
# calculate the accuracy on the training set and on the test set
print("Results on training set {0}".format(unigram_tagger.evaluate(train_sents)))
print("Results on test set {0}".format(unigram_tagger.evaluate(test_sents)))
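The accuracy reported above is a per-token score: the share of tokens whose predicted tag equals the gold tag. The following is a minimal sketch of that calculation in plain Python, not NLTK's own implementation. The helper name token_accuracy and the two toy sentences are hypothetical and exist only for illustration.

```python
def token_accuracy(gold_sents, predicted_sents):
    """Return the fraction of tokens whose predicted tag matches the gold tag.

    Hypothetical helper for illustration; both arguments are lists of
    sentences, each sentence a list of (word, tag) pairs of equal length.
    """
    correct = 0
    total = 0
    for gold, pred in zip(gold_sents, predicted_sents):
        for (_, gold_tag), (_, pred_tag) in zip(gold, pred):
            correct += gold_tag == pred_tag  # True counts as 1
            total += 1
    return correct / total if total else 0.0


# Toy data (made up): one sentence, one of three tags predicted wrongly.
gold = [[("the", "AT"), ("dog", "NN"), ("runs", "VBZ")]]
pred = [[("the", "AT"), ("dog", "NN"), ("runs", "NN")]]
score = token_accuracy(gold, pred)
print("Toy accuracy {0}".format(score))
```

Because every token counts equally, frequent words dominate the score.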
#1) Why is the training accuracy higher than the testing accuracy?
#2) Why is the training accuracy not perfect (100%)?
# unigram tagger that falls back to tagging every unknown word as "NN"
def_tagger = nltk.DefaultTagger("NN")
uni_tagger = nltk.UnigramTagger(train_sents, backoff=def_tagger)
print("Results on training set {0}".format(uni_tagger.evaluate(train_sents)))
print("Results on test set {0}".format(uni_tagger.evaluate(test_sents)))
#3) Why does the accuracy on the training data not go up, while it does on the test data?
#4) Create two new taggers: a BigramTagger that has no backoff and a BigramTagger that uses a unigram tagger as backoff. Report the accuracies. Why is one so much lower than the other?
#5) Repeat #4 with a TrigramTagger using a BigramTagger as backoff.
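To make the mechanics of a backoff chain concrete before building the NLTK taggers, here is a toy sketch in plain Python. It is not NLTK's implementation and it is not a solution to the questions: the functions train_unigram, train_bigram, and tag_sentence, and the two training sentences, are all hypothetical. It assumes a bigram context made of the previous tag plus the current word, which only loosely mirrors how an n-gram tagger looks up its context.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def train_bigram(tagged_sents):
    """Map each (previous tag, word) context to its most frequent tag."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        prev = None  # None marks the start of a sentence in this toy model
        for word, tag in sent:
            counts[(prev, word)][tag] += 1
            prev = tag
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def tag_sentence(words, bigram, unigram=None, default=None):
    """Tag words with the bigram table, then unigram, then a default tag."""
    tagged = []
    prev = None
    for word in words:
        tag = bigram.get((prev, word))
        if tag is None and unigram is not None:
            tag = unigram.get(word)  # first backoff level
        if tag is None:
            tag = default  # last resort; stays None if no default is given
        tagged.append((word, tag))
        prev = tag
    return tagged


# Toy training data (made up for this sketch).
train = [
    [("the", "AT"), ("dog", "NN"), ("barks", "VBZ")],
    [("a", "AT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
uni = train_unigram(train)
bi = train_bigram(train)

words = ["cat", "barks"]
no_backoff = tag_sentence(words, bi)
with_backoff = tag_sentence(words, bi, unigram=uni, default="NN")
print(no_backoff)
print(with_backoff)
```

Comparing the two printed results on a word sequence whose contexts were never seen in training shows what each level of the chain contributes.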