List of domains Tavily pulls from?

aiwebb · July 12, 2024, 5:16pm

Is there a list of domains that Tavily surfaces results from when we hit the POST /search endpoint?

We have our own internal list of domains that we consider relevant / authoritative versus junky, and I’d like to see if the overlap with Tavily’s list is enough that we can just use it as-is instead of passing in a long list to include_domains or exclude_domains.

Our internal data science team has QA standards that require us to avoid pulling from known junk domains / ones that routinely put out wrong info. We also don’t want to pass along results to users that cite Jimbo’s Personal Blog or whatever.

carl · July 16, 2024, 6:03pm

Hello!

The Search API prioritizes trustworthy sources and in the vast majority of cases you should not see any junk domains in the results. That being said, if you want a 100% guarantee, we recommend checking the returned URLs manually against your list.

The exclude_domains parameter will not let you pass a large number of domains as it is meant to exclude specific commonly returned ones rather than filter out a list of known junk domains.
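A minimal sketch of that manual check, in Python. It assumes the search response has already been parsed into a list of result dicts that each carry a "url" key; the function names and the sample domains are hypothetical, for illustration only.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercased hostname of a URL, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_blocked(url: str, blocked_domains: set) -> bool:
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = host_of(url)
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def filter_results(results: list, blocked_domains) -> list:
    """Drop every result whose URL falls under a blocked domain.

    Assumption: each result is a dict with a "url" key.
    """
    blocked = {d.lower() for d in blocked_domains}
    return [r for r in results if not is_blocked(r.get("url", ""), blocked)]
```

Matching on the hostname suffix (rather than a plain substring) keeps a domain like notjunk.example from being dropped when only junk.example is on the list. The same helper can be inverted to keep only results from an allow list.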

I hope that makes sense!

tabdon · December 16, 2024, 10:13pm

Hi Carl,

Are the sources a set list, or more dynamic? I’d like to understand this better so that if I’m relying on it to search for something, I know that the sources relevant to my searches are being considered. If the list is proprietary, can you at least share what categories of sites are on the list?

Thanks!

carl · December 17, 2024, 3:05am

Hello,

The list is dynamic. We don’t use a “hard coded” list of domains to pull from; we search the web dynamically, the same way your everyday search engine does!

-Carl

alucarded · March 17, 2025, 1:03am

Hi Carl,

I was interested in finding results from Polish news websites specifically, but I could not get search results from any Polish website. Are Polish websites crawled at all?

Generally, what are the crawling limitations (domain suffixes, geolocations, languages, etc.)?

Thank you,
Tom

maitar · March 19, 2025, 8:22pm

Hi @alucarded, you can use the include_domains parameter to restrict the search to a list of domains of your choice.

For example, by making the following request:

curl -X POST https://api.tavily.com/search \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <your-api-key>' \
  -d '{
    "query": "latest news in Poland",
    "include_domains": ["tvn24.pl"]
  }'

Hope this helps. Let us know if you have any further questions!
Maitar

alucarded · March 20, 2025, 12:51am

Thank you, @maitar.

That only works when "topic": "general"; if I change to "topic": "news", there are 0 results. Looks like a bug.

maitar · March 20, 2025, 1:58pm

Hi @alucarded

Polish news websites are indeed not currently covered under the "topic": "news" category. The "news" topic primarily focuses on politics, sports, and major current events covered by mainstream media sources. If you’re looking for results from Polish news websites, the best approach would be to use "topic": "general" with your predefined list of include_domains.
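As a rough sketch of that approach in Python, the request below sets "topic": "general" together with include_domains. The endpoint, header names, and body fields are the ones used in the curl example earlier in this thread; the function name and the api_key argument are hypothetical, and only the standard library is used.

```python
import json
import urllib.request

TAVILY_SEARCH_URL = "https://api.tavily.com/search"


def build_search_request(query: str, include_domains: list, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /search request restricted to given domains."""
    payload = {
        "query": query,
        "topic": "general",  # "news" returned 0 results for Polish sites in this thread
        "include_domains": list(include_domains),
    }
    return urllib.request.Request(
        TAVILY_SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

To send it, pass the returned object to urllib.request.urlopen and parse the response body with json.loads. Building the request separately from sending it makes the payload easy to inspect before spending an API call.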
