Bag-of-Words Language Model
Lesson 1 of 1
  1. 1

    “A bag-of-words is all you need,” some NLPers have decreed. The bag-of-words language model is a simple-yet-powerful tool to have up your sleeve when working on natural language processing (NLP)….

  2. 2

    Bag-of-words (BoW) is a statistical language model based on word count. Say what? Let’s start with that first part: a statistical language model is a way for computers to make sense o…

  3. 3

    One of the most common ways to implement the BoW model in Python is as a dictionary with each key set to a word and each value set to the number of times that word appears. Take the example below:…

  4. 4

    Sometimes a dictionary just won’t fit the bill. Topic modelling applications, for example, require an implementation of bag-of-words that is a bit more mathematical: feature vectors. A feat…

  5. 5

    Now that you know what a bag-of-words vector looks like, you can create a function that builds them! First, we need a way of generating a features dictionary from a list of training documents. We…

  6. 6

    Nice work! Time to put that dictionary of vocabulary to good use and build a bag-of-words vector from a new document. In Python, we can use a list to represent a vector. Each index in the list wil…

  7. 7

    Phew! That was a lot of work. It’s time to put […] and […] together and use them in a spam filter we created that uses a Naive Bayes classifier. We’ve slightly modified the two functions f…

  8. 8

    Amazing work! As is the case with many tasks in Python, there’s already a library that can do all of that work for you. For […] , you can approximate the functionality with the […] module’…

  9. 9

    As you can see, bag-of-words is pretty useful! BoW also has several advantages over other language models. For one, it’s an easier model to get started with and a few Python libraries already have …

  10. 10

    Alas, there is a trade-off for all the brilliance BoW brings to the table. Unless you want sentences that look like “the a but for the”, BoW is NOT a great primary model for text prediction. If t…

  11. 11

    You made it! And you’ve learned plenty about the bag-of-words language model along the way: - Bag-of-words (BoW) — also referred to as the unigram model — is a statistical language model based on w…

What you'll create

Portfolio projects that showcase your new skills

Pro Logo

How you'll master it

Stress-test your knowledge with quizzes that help commit syntax to memory

Pro Logo