Does Tavily Extract return the same output as raw_content in Tavily Search?

I’ve seen the framework of using both tavily_search and tavily_extract, but wasn’t sure if extract provides an additional benefit if tavily_search already returns content and raw_content.

Can someone let me know if there’s a difference between the raw_content returned by search and the output of extract?

Hello there!

Tavily Extract provides additional benefits over the raw content returned by Tavily Search, especially when you need more customized extraction. While Tavily Search returns raw content, Tavily Extract lets you fine-tune the extraction process by specifying additional parameters.

For example, one of the parameters available in Tavily Extract is include_images, which, when set to True, returns a list of images extracted from the URLs in the response.
You can find the full list of parameters in the Tavily Extract documentation.
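A minimal sketch of such a call, assuming the tavily-python client (TavilyClient.extract with include_images) and assuming each entry in the response's results list carries url and images keys. The helper names extract_with_images and images_by_url are hypothetical, and the exact response shape should be checked against the documentation:

```python
from typing import Any


def extract_with_images(urls: list[str], api_key: str) -> dict[str, Any]:
    """Call Tavily Extract with include_images=True (needs tavily-python and network access)."""
    from tavily import TavilyClient  # imported lazily so the helper below works offline

    client = TavilyClient(api_key=api_key)
    return client.extract(urls=urls, include_images=True)


def images_by_url(response: dict[str, Any]) -> dict[str, list[str]]:
    """Map each successfully extracted URL to its list of image URLs.

    Assumed response shape (hypothetical, verify against the docs):
    {"results": [{"url": ..., "raw_content": ..., "images": [...]}, ...]}
    Results without an "images" key map to an empty list.
    """
    return {item["url"]: item.get("images", []) for item in response.get("results", [])}
```

Because images_by_url only reads the response dict, it can be tried on a saved response without an API key.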

If you are making multiple calls to Tavily Search, it may be more efficient to extract raw content only from the most relevant URLs. Calling Tavily Search with include_raw_content enabled returns the full content for every result, which can increase latency and is not always necessary.
By separating the extraction into another step with Tavily Extract, you can focus on only the most relevant search results and potentially reduce the load on your system.
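The two-step flow could look roughly like this. It is a sketch assuming the tavily-python client and a search response whose results carry url and score fields; top_urls and search_then_extract are hypothetical helper names, and k=3 is an arbitrary cutoff:

```python
from typing import Any


def top_urls(search_response: dict[str, Any], k: int = 3, min_score: float = 0.0) -> list[str]:
    """Return up to k result URLs, highest relevance score first.

    Results below min_score are dropped; a missing score is treated as 0.0.
    """
    results = [r for r in search_response.get("results", []) if r.get("score", 0.0) >= min_score]
    results.sort(key=lambda r: r.get("score", 0.0), reverse=True)
    return [r["url"] for r in results[:k]]


def search_then_extract(query: str, api_key: str, k: int = 3) -> dict[str, Any]:
    """Search without raw content, then extract only the top-k URLs."""
    from tavily import TavilyClient  # lazy import: needs tavily-python and network access

    client = TavilyClient(api_key=api_key)
    # Step 1: a lighter search response, since raw content is not requested.
    search_response = client.search(query, include_raw_content=False, max_results=10)
    # Step 2: fetch full content only for the most relevant hits.
    urls = top_urls(search_response, k=k)
    if not urls:
        return {"results": [], "failed_results": []}
    return client.extract(urls=urls)
```

Keeping the URL selection in a separate function makes it easy to swap the score-based ranking for another relevance rule without touching the API calls.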

Let me know if you need further clarification!

Best,
May

I actually noticed that for websites behind a login, the search method with the ‘raw_content’ flag is able to scrape the page, but the extract method is not, unless you run the extract call after searching with raw_content enabled. In that case it returns a result too, probably from a cache.

E.g. search (with raw_content=False) for a person’s name where the first result is the person’s LinkedIn page.

Then try to extract the content of that page using an extract call. You’ll get “Access denied: Unable to retrieve content from the specified URL”.

However, if you search with raw_content=True, you do nicely get the scraped LinkedIn page’s raw_content.

Hey!

We are continuously working to improve the data behind the scenes, which is why you might have seen the LinkedIn page on the second attempt. In cases where the basic extract fails, the advanced extract method (extract_depth set to "advanced") should let you retrieve the content successfully. You can refer to the documentation for more details on how to use advanced extraction.
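One way to apply this is to try a basic extract first and retry only the failed URLs with advanced extraction. This is a sketch assuming the tavily-python client's extract_depth parameter and a failed_results list whose entries carry a url key; failed_urls, merge_extract_responses and extract_with_fallback are hypothetical helper names:

```python
from typing import Any


def failed_urls(response: dict[str, Any]) -> list[str]:
    """URLs an extract call could not retrieve (assumed shape: failed_results[i]["url"])."""
    return [item["url"] for item in response.get("failed_results", [])]


def merge_extract_responses(basic: dict[str, Any], advanced: dict[str, Any]) -> dict[str, Any]:
    """Combine a basic pass with an advanced retry of its failures.

    Successes from both passes are kept; only URLs that also failed the
    advanced retry remain in failed_results.
    """
    return {
        "results": basic.get("results", []) + advanced.get("results", []),
        "failed_results": advanced.get("failed_results", []),
    }


def extract_with_fallback(urls: list[str], api_key: str) -> dict[str, Any]:
    """Basic extract first, then advanced extract only for the URLs that failed."""
    from tavily import TavilyClient  # lazy import: needs tavily-python and network access

    client = TavilyClient(api_key=api_key)
    basic = client.extract(urls=urls, extract_depth="basic")
    retry = failed_urls(basic)
    if not retry:
        return basic
    advanced = client.extract(urls=retry, extract_depth="advanced")
    return merge_extract_responses(basic, advanced)
```

Retrying only the failures keeps the advanced pass limited to the pages that actually need it.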

Let me know if you have more questions!

Best,
May