How to Automate Multi-Page Scraping for eCommerce Sites Using the Tavily API?

Hi Tavily community,

I’m currently working on a project where I need to scrape product data from an eCommerce website with multiple pages (e.g., a product category with pagination). I want to avoid manually inputting each page URL and instead automate scraping all products across every page of the category.

My key questions are:

  1. Does Tavily’s API support automated crawling of multiple pages from a base URL, like a category or product listing page, or do I need to manually provide each page URL?
  2. If Tavily supports crawling, how can I set it up to handle pagination automatically (e.g., ?page=2, ?page=3, etc.) for eCommerce websites?
  3. If Tavily does not support automated crawling, do you have any suggestions or best practices for automating link discovery (e.g., extracting product links from paginated pages) before passing them to Tavily for scraping?
  4. For JavaScript-heavy websites (e.g., infinite scrolling or dynamically loaded pages), can Tavily handle these scenarios, or should I integrate a tool like Selenium/Playwright to retrieve the full content before using Tavily?

Any guidance or advice on best practices for scaling eCommerce scraping with Tavily would be greatly appreciated!

Thanks in advance!

Hi there!

The Tavily API currently offers two endpoints:

  • /search – This is useful for searching the web using a specific search query. It is not intended to be used on a single web page.
  • /extract – This is useful for extracting content from one or more URLs that you already know.
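
For reference, here is a minimal sketch of calling each endpoint with the tavily-python SDK. The API key handling, the example query, and the example URL are placeholders I chose for illustration, not values from the docs:

```python
import os

from tavily import TavilyClient

# Assumes the tavily-python SDK is installed and an API key
# is available in the TAVILY_API_KEY environment variable.
client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

# /search: query the web with a search phrase.
search_response = client.search(query="best wireless headphones 2024")

# /extract: pull page content from one or more URLs you already know.
extract_response = client.extract(urls=["https://example.com/products?page=1"])
```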

In your case, it seems like you will need a combination of these endpoints along with your own code to navigate the pages you are trying to retrieve. I recommend taking a look at our documentation and our example Jupyter notebooks as a reference.
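
As a rough illustration of that combination, here is a hedged sketch that builds the paginated URLs itself and hands them to /extract in one batch. The ?page=N pattern, the page count, and the response fields are all assumptions about the target site and the current API response shape, so double-check them against the docs:

```python
import os

from tavily import TavilyClient

client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

# Hypothetical category URL and page count -- adjust to the real site.
BASE_URL = "https://shop.example.com/category/widgets"
NUM_PAGES = 5

# Tavily does not discover paginated URLs for you, so generate them here.
page_urls = [f"{BASE_URL}?page={n}" for n in range(1, NUM_PAGES + 1)]

# Extract the content of every known page in a single /extract call.
response = client.extract(urls=page_urls)

for result in response["results"]:
    print(result["url"])
    print(result["raw_content"][:200])  # preview of the extracted content

# Pages that could not be fetched are reported separately.
for failed in response.get("failed_results", []):
    print("failed:", failed)
```

From there, you could parse product links out of each page's raw_content and pass those into a second /extract call (your question 3). For infinite scrolling or dynamically loaded pages that never expose ?page URLs (your question 4), you would still need a headless browser such as Playwright to surface the content first.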

Hope this helps!
