Craigslist Auto Ad Scraping into a Google Sheets Document
Let me be up front about the topic: Craigslist is not fond of people scraping their site. They purposely make the site annoying to scrape (well, that or they are terrible at making decent templates), and scraping is against their TOS. If you do it and make money, they will very likely sue you, as they have many others. You have been warned. Tread carefully.
All that said, buying a car isn’t fun. Comparing hundreds or thousands of Craigslist car ads is also not fun, and keeping track of all of them is just as bad. Even with RSS feeds, email notifications, and all the other tools out there to help with this, I still felt the situation was pretty terrible. I thought it would be nice to have a spreadsheet periodically updated with new and changed listings, with no copy-pasting, so that the information from the ads I really cared about was right in front of me. So I decided to scrape CL anyway, since my use was pretty low volume and I wasn’t going to be making any money off of it.
The technology and tools I used to do this are:
- Python 3 (in a virtualenv)
- Scrapy
- Google Drive API
- Google Sheets API
I considered using something akin to PhantomJS for this project, but realized that CL doesn’t really have anything “fancy” in their UI that I needed to worry about. It’s certainly not a JS-dependent SPA or anything close to it.
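To make the "no headless browser needed" point concrete, here is a minimal sketch of pulling listing fields out of static HTML with nothing but a parser. It uses Python's standard-library `html.parser` so it runs without any installs; the actual project uses Scrapy for this. The markup in `SAMPLE_HTML`, the `result`/`title`/`price` class names, and the `parse_listings` helper are all made up for illustration and are not Craigslist's real template.

```python
from html.parser import HTMLParser

# Hypothetical, simplified listing markup; NOT Craigslist's real template.
SAMPLE_HTML = """
<ul>
  <li class="result"><a class="title" href="/cto/1.html">2009 Honda Civic</a><span class="price">$4500</span></li>
  <li class="result"><a class="title" href="/cto/2.html">2012 Ford Focus</a><span class="price">$6200</span></li>
</ul>
"""


class ListingParser(HTMLParser):
    """Collects title, URL, and price for each listing found in static HTML."""

    def __init__(self):
        super().__init__()
        self.listings = []
        self._field = None  # which field the next text node belongs to, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class")
        if tag == "li" and css_class == "result":
            # A new listing starts here.
            self.listings.append({"title": "", "url": "", "price": ""})
        elif tag == "a" and css_class == "title" and self.listings:
            self.listings[-1]["url"] = attrs.get("href", "")
            self._field = "title"
        elif tag == "span" and css_class == "price" and self.listings:
            self._field = "price"

    def handle_data(self, data):
        # Only keep text that falls inside a field we are tracking.
        if self._field and self.listings:
            self.listings[-1][self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None


def parse_listings(html):
    """Return a list of dicts, one per listing, parsed from an HTML string."""
    parser = ListingParser()
    parser.feed(html)
    return parser.listings


if __name__ == "__main__":
    for row in parse_listings(SAMPLE_HTML):
        print(row)
```

Everything the parser needs is already in the HTML the server sends back, so there is no JavaScript to execute and no browser to drive. That is the whole reason a plain HTTP scraper is enough here.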