Nowadays, you can find a tag cloud at every turn on the Internet. A tag cloud is a visual representation of the frequency with which words appear in a text. Words that are more common are displayed in a larger font, so the key words of the text immediately catch the eye. The visual effect can be enhanced by varying the colors and the placement of the words. Usually a list of stop words is used as well. These are words that occur frequently in most texts. Words that appear in the stop word list do not appear in the tag cloud.

Example of a tag cloud.

Assignment

Write a function tagcloud that returns the frequency of the words that occur in a given text file, in the form of a dictionary. Each word from the text that is not in a given stop word list is used as a key in the dictionary, and the corresponding value is the number of occurrences of that word in the text. The words of the text are formed by the longest possible sequences of letters and the apostrophe ('). The text fragment "Don't say that word!" thus consists of four words.
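The word definition above can be sketched with a regular expression. This is only an illustration, not a required approach: the helper name extract_words is hypothetical, and the pattern assumes that "letters" means the ASCII letters a-z and A-Z.

```python
import re

# Assumption: "letters" means ASCII letters; accented letters would need a wider class.
WORD_PATTERN = re.compile(r"[A-Za-z']+")


def extract_words(text):
    """Return the longest possible runs of letters and apostrophes in text.

    Hypothetical helper, not part of the assignment's required interface.
    """
    # The + quantifier is greedy, so each match extends as far as possible.
    return WORD_PATTERN.findall(text)


print(extract_words("Don't say that word!"))
# prints ["Don't", 'say', 'that', 'word']
```

Because the apostrophe is part of the character class, "Don't" stays in one piece, which gives the four words mentioned in the assignment.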

The function takes the locations of two text files as its arguments. The first text file contains a few lines of text. The second text file contains a list of stop words, with each stop word on a separate line.
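Since each stop word sits on its own line, the second file could be loaded into a set for fast membership tests. The sketch below is one way to do that. The helper name read_stop_words is hypothetical, and both the UTF-8 encoding and the conversion to lowercase are assumptions that the assignment does not state.

```python
def read_stop_words(stop_words_file):
    """Return the stop words in the given file as a set of lowercase strings.

    Hypothetical helper; assumes one stop word per line and UTF-8 encoding.
    """
    with open(stop_words_file, encoding="utf-8") as handle:
        # strip() removes the trailing newline; blank lines are skipped.
        return {line.strip().lower() for line in handle if line.strip()}
```

A set is chosen here because checking whether a word is a stop word then takes constant time on average, regardless of the length of the list.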

def tagcloud(textFile, stopWordsFile)

Note: If you correctly implement the function tagcloud, it will be used to generate a tag cloud based on the script of a famous film. Can you figure out which film?

Example

>>> tagcloud('tagcloud.txt', 'stopWords.txt')
{'say': 1, 'word': 2, "don't": 1}
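A minimal sketch of one possible implementation follows. It is not the reference solution. It assumes that words are converted to lowercase (suggested by the key "don't" in the example, but not stated explicitly), that "letters" means ASCII letters, and that both files are UTF-8 encoded.

```python
import re
from collections import Counter


def tagcloud(textFile, stopWordsFile):
    """Count the words in textFile that are not listed in stopWordsFile."""
    # One stop word per line; lowercasing is an assumption based on the example.
    with open(stopWordsFile, encoding="utf-8") as handle:
        stop_words = {line.strip().lower() for line in handle if line.strip()}

    counts = Counter()
    with open(textFile, encoding="utf-8") as handle:
        for line in handle:
            # Longest runs of ASCII letters and apostrophes (assumed word definition).
            for word in re.findall(r"[A-Za-z']+", line.lower()):
                if word not in stop_words:
                    counts[word] += 1

    # Return a plain dictionary, as the assignment asks for.
    return dict(counts)
```

Reading the text line by line is sufficient because a newline is not a letter, so no word can span two lines.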