How do I strip the extra irrelevant text from Extract endpoint results?

The raw content returned by Tavily’s extract endpoint includes a lot of text that is not related to the page content, like menu text, ads, etc. How can I extract the main body text from this? I have used a readability library to pull the article out of raw HTML, but Tavily’s result is already plain text, so readability does not work on it. Is there a tool you would recommend for this task?
Or is there an option in the request that I can use to get cleaner, more usable text?

Hey!

Thanks for reaching out.

We’re always working to improve the raw content and ensure it’s returned as clean as possible.
One approach is LLM-based cleaning, where you can prompt an LLM to extract only the main body text and remove menus, ads, and footers.
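
As a rough illustration of the LLM-based cleaning idea, here is a minimal Python sketch. It is not an official Tavily utility: the prompt wording, the function names (`split_into_chunks`, `clean_raw_content`), and the `complete` callable are all assumptions. `complete` stands in for whichever LLM client you use; it should take a prompt string and return the model's text reply.

```python
from typing import Callable

# Hypothetical prompt; adjust the wording for your model and content.
CLEANING_PROMPT = (
    "Below is raw text extracted from a web page. "
    "Return only the main body text, verbatim. "
    "Remove navigation menus, ads, cookie notices, and footers. "
    "Do not summarize or add anything.\n\n"
    "RAW TEXT:\n{raw}"
)


def split_into_chunks(text: str, max_chars: int = 8000) -> list[str]:
    """Group paragraphs (split on blank lines) into chunks of roughly max_chars."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def clean_raw_content(
    raw: str,
    complete: Callable[[str], str],  # assumed wrapper around your LLM client
    max_chars: int = 8000,
) -> str:
    """Send each chunk through the LLM and join the non-empty cleaned parts."""
    cleaned = [
        complete(CLEANING_PROMPT.format(raw=chunk)).strip()
        for chunk in split_into_chunks(raw, max_chars)
    ]
    return "\n\n".join(part for part in cleaned if part)
```

You would call it as `clean_raw_content(raw_text, complete=my_llm_call)`, where `raw_text` is the raw content string from the extract response and `my_llm_call` is your own wrapper. Chunking on blank lines keeps each request within the model's context limit; a single paragraph longer than `max_chars` is still sent as one chunk, so you may want to lower the limit or split further for very long pages.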

If you notice specific domains or sites causing issues, feel free to send them over, and we’d be happy to look into it!

Best,
May