For Curated.com, a new shopping platform that helps you find the right product with the help of an expert, I built more than 30 web scrapers to fetch, clean & upload data from different e-commerce sites. The scrapers had to be deployed to the cloud and run at set intervals. I used Scrapy, a powerful web scraping library written in Python, to fetch the data.
Scraping data from these websites involved:
- Understanding HTTP requests & responses used by the websites to fetch data
- Understanding the website structure to find common patterns used in the product pages
- Rendering JavaScript on some websites using Splash
- Finding certain data patterns using regular expressions
- Cleaning and formatting the data to match a set template for each product category
- Inserting the data into the database after fetching
- Fetching data such as price or availability on request
- Sending requests through a proxy for some websites
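
To give a feel for the regular-expression and cleaning steps listed above, here is a minimal sketch using only the Python standard library. It is not the code used in the project: the template fields, the `price_text`, `title` and `availability` keys, and the helper names `parse_price` and `clean_product` are all assumptions made up for illustration, and the price pattern only handles a currency symbol followed by a number with optional thousands separators.

```python
import re
from decimal import Decimal

# Hypothetical template: these field names are assumptions, not the real schema.
TEMPLATE_FIELDS = ("name", "price", "currency", "in_stock")

# Currency symbol, optional whitespace, digits with optional ",ddd" groups and cents.
PRICE_RE = re.compile(
    r"(?P<symbol>[$€£])\s*(?P<amount>\d+(?:,\d{3})*(?:\.\d{1,2})?)"
)
CURRENCY_BY_SYMBOL = {"$": "USD", "€": "EUR", "£": "GBP"}


def parse_price(text):
    """Return (Decimal amount, currency code), or (None, None) if no price is found."""
    match = PRICE_RE.search(text)
    if match is None:
        return None, None
    amount = Decimal(match.group("amount").replace(",", ""))
    return amount, CURRENCY_BY_SYMBOL[match.group("symbol")]


def clean_product(raw):
    """Map a raw scraped dict (assumed keys) onto the hypothetical template."""
    price, currency = parse_price(raw.get("price_text", ""))
    record = {
        # Collapse runs of whitespace left over from the HTML.
        "name": " ".join(raw.get("title", "").split()),
        "price": price,
        "currency": currency,
        "in_stock": "in stock" in raw.get("availability", "").lower(),
    }
    # Emit only the template fields, in a fixed order.
    return {field: record[field] for field in TEMPLATE_FIELDS}
```

In a Scrapy project, a function like `clean_product` would typically be called from an item pipeline, so that every spider produces records in the same shape before they are written to the database.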