1. (25 points)
a) Define NER and explain what it is useful for:
b) Give 3 examples of boundary issues in NER
c) Run NLTK’s NER model on the following sentence: “Trump is going to Paris on France’s express train with John Kelly”
i) Show your code here:
ii) Calculate the accuracy of the model on this sentence
2. (25 points)
a) What is the difference between polarity sentiment analysis and categorical sentiment analysis? Give examples
b) Give 3 sentences for which it would be hard to get the correct sentiment. Each one should show a different language issue.
i)
ii)
iii)
c) Write Python code to compute and report the polarity sentiment for the following sentence: “This is the best exam in the world!”
i) Show your code here:
ii) Report the score:
3. (25 points)
a) What is the main use of TFIDF? Give two examples of use cases for it
b) Why doesn’t TFIDF return the most frequent word in a document as the most important in all cases?
c) What is the downside to using TFIDF? What does it not do well?
4. (25 points)
a) Explain a Zipf curve and what it detects.
b) Does a Zipf curve look the same in all languages? What would cause a Zipf curve to look abnormal?
c) Draw a Zipf curve. Label the X and Y axes and show example values