PROJECTS

Ballot PDF Scraper

Architected and developed a system that scrapes ballot PDF files into structured ballot content. That content could be QA'd and then used for accessible, internationalized sample ballots, augmented with voter guide information, or exported for use in related systems.

The data was scraped from the PDFs by a scalable microservice built on Python 3 and Flask, backed by blob storage (S3 or Azure) and Redis for pub/sub and queue processing. The pdfminer3k API was used heavily for PDF parsing and processing where an XML dump didn’t suffice.
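As a rough illustration of the store-enqueue-process flow described above, here is a minimal sketch, not the production code. It assumes in-memory stand-ins so that it runs without external services: a dict in place of the blob store (S3 or Azure) and a `queue.Queue` in place of the Redis queue. The names `submit_pdf`, `parse_ballot`, and `work_once`, along with the job message shape, are hypothetical.

```python
import json
import queue
import uuid

# Hypothetical in-memory stand-ins: a dict for the blob store (S3 or Azure)
# and a queue.Queue for the Redis job queue.
blob_store = {}
job_queue = queue.Queue()
results = {}


def submit_pdf(pdf_bytes):
    """Store the uploaded PDF and enqueue a scrape job; returns the job id."""
    job_id = uuid.uuid4().hex
    blob_key = f"ballots/{job_id}.pdf"
    blob_store[blob_key] = pdf_bytes
    job_queue.put(json.dumps({"job_id": job_id, "blob_key": blob_key}))
    return job_id


def parse_ballot(pdf_bytes):
    """Placeholder for the real PDF parsing step (pdfminer3k in the project)."""
    return {"size_bytes": len(pdf_bytes), "contests": []}


def work_once():
    """Process one queued job, if any; returns the job id or None."""
    try:
        message = job_queue.get_nowait()
    except queue.Empty:
        return None
    job = json.loads(message)
    pdf_bytes = blob_store[job["blob_key"]]
    results[job["job_id"]] = parse_ballot(pdf_bytes)
    return job["job_id"]
```

Passing only a small JSON message through the queue, with the PDF bytes kept in blob storage, is one way to let workers scale independently of the web front end. In a real deployment the Flask upload handler would play the role of `submit_pdf`.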

After the scraper microservice analyzed and processed a PDF, a custom Drupal 7 module was used to edit and refine the output into accessible, internationalized ballot content for use by other applications.

This system has allowed Democracylive Inc. to serve accessible ballot content to potential voters in many US states since the 2016 general elections, in cases where a jurisdiction has no ballot data that Democracylive Inc. can use other than ballot PDFs.

Technologies

  • Python
  • Flask
  • Redis
  • Azure Blob Storage
  • PHP
  • Drupal 7
  • Docker
  • TypeScript
  • CSS3
  • SASS
  • AWS
  • S3 Blob Storage
  • Git