Scrapy: Storing each type of Item in its own MongoDB collection

I am using Scrapy and I have two different Items. I want to store the entries for each Item type in its own MongoDB collection. For example, let’s assume the items.py file defines two item classes, Student and Course.

I want to store Student items in the student collection and Course items in the course collection. How do we do that?

If you have used Scrapy before, you already know that pipelines are where we store data. A custom MongoPipeline can route each item to a collection named after its type.
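A minimal sketch of such a pipeline is shown below. It assumes PyMongo is installed and that the connection details live in two settings that I am calling MONGO_URI and MONGO_DATABASE here; both names and their fallback defaults are assumptions, not something Scrapy defines.

```python
# pipelines.py (sketch; setting names and defaults are assumptions)


class MongoPipeline:
    """Store each item in a collection named after the item's class."""

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.client = None
        self.db = None

    @classmethod
    def from_crawler(cls, crawler):
        # Read the (assumed) MongoDB settings from the project settings.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "scrapy_items"),
        )

    def open_spider(self, spider):
        # Imported here so this module can be loaded (e.g. in unit tests)
        # on machines where the driver is not installed.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # Student -> "student", Course -> "course"
        collection_name = type(item).__name__.lower()
        # dict(item) gives a plain dictionary that PyMongo can insert.
        self.db[collection_name].insert_one(dict(item))
        return item
```

Returning the item from process_item keeps it flowing to any later pipelines in the chain.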

This is what the pipeline does:

  • We’re using PyMongo as the MongoDB driver.
  • The MongoDB-related configuration lives in the Scrapy settings. The pipeline reads those values, constructs a MongoDB client from them, and selects the database named in a setting.
  • In the process_item function, we get the type of the item and lowercase its name. This type name serves as the MongoDB collection name.
  • We insert the item. We call dict() on the item to get a dictionary representation, which we can save directly using PyMongo.
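For the settings mentioned in the list above, a fragment like this in settings.py should be enough. Everything here is a placeholder: the setting names MONGO_URI and MONGO_DATABASE, the database name school, and the module path myproject.pipelines all depend on your own project.

```python
# settings.py (fragment; all names and values are placeholder assumptions)

# Connection string for a local MongoDB instance
MONGO_URI = "mongodb://localhost:27017"

# Database that will hold the student and course collections
MONGO_DATABASE = "school"

# Enable the pipeline; the number (0-1000) sets its order among pipelines
ITEM_PIPELINES = {
    "myproject.pipelines.MongoPipeline": 300,
}
```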

That’s it. Now if you run your spiders, items of each type will go to their own collection in MongoDB.