For Curated.com, a new shopping platform for finding the right product with the help of an expert, I built more than 30 web scrapers to fetch, clean & upload data from different e-commerce sites. The scrapers had to be deployed to the cloud and run at set intervals. To fetch the data I used Scrapy, a powerful web scraping framework written in Python.

 

Scraping data from these websites involved:

  • Understanding the HTTP requests & responses the websites use to fetch data
  • Understanding each website's structure to find common patterns in its product pages
  • Rendering JavaScript on some websites using Splash
  • Extracting specific data patterns with regular expressions
  • Cleaning and formatting the data to match a set template for each product category
  • Inserting the data into the database after fetching
  • Fetching data such as price or availability on request
  • Sending requests through a proxy for some websites
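One common pattern worth looking for in product pages is schema.org `Product` data embedded as JSON-LD. The sketch below assumes that convention (the write-up does not say whether these sites used it) and pulls such blocks out with Python's standard-library `html.parser`. Inside a Scrapy spider, a selector such as `response.css('script[type="application/ld+json"]::text')` would do the same job; `JsonLdParser` and `find_products` are hypothetical names.

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict, List


class JsonLdParser(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def find_products(html: str) -> List[Dict[str, Any]]:
    """Return the JSON-LD objects whose @type is "Product" (assumed convention)."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    products: List[Dict[str, Any]] = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        items = data if isinstance(data, list) else [data]
        products.extend(i for i in items if isinstance(i, dict) and i.get("@type") == "Product")
    return products
```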
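For the sites that needed JavaScript rendering, Splash exposes an HTTP API whose `render.html` endpoint loads a page, waits, and returns the resulting HTML. A minimal sketch, assuming a Splash instance listening on `localhost:8050` (Splash's default port); `scrapy-splash`'s `SplashRequest` wraps the same endpoint inside Scrapy, and the helper names here are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SPLASH_URL = "http://localhost:8050"  # assumption: a locally running Splash instance


def splash_render_url(page_url: str, wait: float = 0.5) -> str:
    """Build a render.html URL; `wait` gives the page's scripts time to run."""
    query = urlencode({"url": page_url, "wait": wait})
    return f"{SPLASH_URL}/render.html?{query}"


def fetch_rendered_html(page_url: str, wait: float = 0.5, timeout: float = 30.0) -> str:
    """Ask Splash to render the page and return the HTML after JavaScript has run."""
    with urlopen(splash_render_url(page_url, wait), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

The rendered HTML can then be parsed like any static page; only the pages that actually need JavaScript have to pay the extra rendering cost.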
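As an illustration of extracting data patterns with regular expressions, here is a sketch that pulls a price out of free text such as "Now only $1,299.00!". The pattern and the set of currency symbols are assumptions chosen for the example, not the expressions used in the project.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Hypothetical pattern: optional currency symbol, then either a number with
# comma thousands separators (at least one comma group) or a plain number.
PRICE_RE = re.compile(
    r"(?P<currency>[$£€])?\s*"
    r"(?P<amount>\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)"
)


def extract_price(text: str) -> Optional[Tuple[Optional[str], Decimal]]:
    """Return (currency symbol or None, amount) for the first price found."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    # Decimal avoids float rounding surprises when storing prices.
    amount = Decimal(match.group("amount").replace(",", ""))
    return match.group("currency"), amount
```

Requiring at least one comma group in the first alternative matters: without it, `\d{1,3}` would stop after three digits of "1299.99" and return 129.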
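Matching a set template per product category can be sketched as a mapping from category to an ordered field list, with every scraped record projected onto it. The categories and field names below are hypothetical placeholders; the write-up does not list the real templates.

```python
from typing import Any, Dict, List

# Hypothetical templates: the real per-category fields are not given in the source.
CATEGORY_TEMPLATES: Dict[str, List[str]] = {
    "tents": ["title", "brand", "price", "capacity"],
    "kitchen_knives": ["title", "brand", "price", "blade_length_cm"],
}


def clean_text(value: Any) -> Any:
    """Collapse runs of whitespace in strings; leave other values unchanged."""
    if isinstance(value, str):
        return " ".join(value.split())
    return value


def to_template(category: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the template's fields, in template order; missing ones become None."""
    template = CATEGORY_TEMPLATES[category]
    return {field: clean_text(raw.get(field)) for field in template}
```

Filling missing fields with `None` rather than dropping them keeps every record in a category the same shape, which simplifies the database insert that follows.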
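Because scrapers run at set intervals, inserting data is easiest as an upsert keyed on the product URL, so each run updates price and availability instead of duplicating rows. The write-up does not name the database; this sketch uses SQLite from the standard library, and the `products` schema is an assumption.

```python
import sqlite3
from typing import Any, Dict


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per product page, keyed by URL.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "url TEXT PRIMARY KEY, title TEXT, price REAL, available INTEGER)"
    )


def save_product(conn: sqlite3.Connection, product: Dict[str, Any]) -> None:
    """Insert a product, or update the existing row if the URL is already stored."""
    conn.execute(
        """
        INSERT INTO products (url, title, price, available)
        VALUES (:url, :title, :price, :available)
        ON CONFLICT(url) DO UPDATE SET
            title = excluded.title,
            price = excluded.price,
            available = excluded.available
        """,
        product,
    )
    conn.commit()
```

`ON CONFLICT ... DO UPDATE` needs SQLite 3.24 or newer; other databases offer equivalent upsert syntax.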
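Sending requests through a proxy for only some websites fits naturally into a Scrapy downloader middleware: Scrapy's built-in `HttpProxyMiddleware` uses whatever is in `request.meta["proxy"]`. A sketch, assuming a single proxy endpoint; the proxy URL, domain list, and class name are hypothetical placeholders.

```python
from typing import Optional
from urllib.parse import urlparse

# Placeholders: the real proxy endpoint and site list are not given in the source.
PROXY_URL = "http://proxy.example.com:8000"
PROXIED_DOMAINS = {"shop-a.example", "shop-b.example"}


def proxy_for(url: str) -> Optional[str]:
    """Return the proxy for `url` (domain or any subdomain), or None to connect directly."""
    host = urlparse(url).hostname or ""
    for domain in PROXIED_DOMAINS:
        if host == domain or host.endswith("." + domain):
            return PROXY_URL
    return None


class SelectiveProxyMiddleware:
    """Hypothetical Scrapy downloader middleware that proxies selected domains."""

    def process_request(self, request, spider):
        proxy = proxy_for(request.url)
        if proxy is not None:
            # HttpProxyMiddleware reads this meta key when it processes the request.
            request.meta.setdefault("proxy", proxy)
        return None  # continue normal request processing
```

To take effect it would be registered in `DOWNLOADER_MIDDLEWARES` with an order below 750, the default position of `HttpProxyMiddleware`, so that it sets the meta key before the proxy middleware runs.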
