Gal Hadar, Tavily’s Head of Research, says the industry has to settle on a standard metric for information density.
The Web Has a New Reader
Nebius is expanding its Research Grants Program with Tavily credits. Tavily’s agentic search is reaching beyond business intelligence, and scientists are now putting it to work.

An AI agent does not read the web the way a person does. It cannot skim. It ingests the whole page, navigation bar, cookie banner and all, and every token has a price that cascades into every step of reasoning that follows.
Tavily
The pages being parsed have no edges and no order. “The web is a mess. Nothing is organized,” says Gal Hadar, who leads research at Tavily. Imposing structure on something that large, with high recall and accuracy and without spreading the work across millions of GPUs, is most of the problem.
What that layer returns is not pages but the relevant slice of them. Ask Tavily a question and it hands back the passages that answer it, with the surrounding noise already stripped; its research endpoint goes further, running an agent that calls search repeatedly to assemble an answer. For an agent paying by the token, that pre-filtering is the whole point.
Common Tavily use cases include market intelligence, company and people research, and finance. But search is useful to agents in every field, and the structured layer is quickly becoming a fundamental step in automated scientific pipelines. That adoption is what prompted Nebius to add Tavily credits to its Research Grants Program, giving scientists free use of Tavily’s web access layer.
Approved for Science
The examples are piling up. Stanford’s agentic paper reviewer
FROGENT
The integration also reaches the standard toolkits developers build research agents from. This blueprint
“The first wave of AI research focused on the models themselves: pre-training, scaling, alignment,” Hadar says. “The current and next generation is increasingly focused on information access, grounding, and agent interaction in dynamic environments.”

Density Challenge
Supporting the next generation of AI-powered research requires advances in agentic search itself. As Tavily builds a machine-readable web, its R&D team is tackling challenges that have become central to the entire field. One of them is information density: because an agent’s cost is measured in tokens, Tavily optimizes for how much usable information it can return per token.
Send a model an entire page when two sentences would do, Hadar says, and the answer is the same while the downstream token cost and latency can explode. The team optimizes for this directly, within customer and production constraints: token efficiency to hold down cost, topic coverage for the queries customers actually send, and room for customers to bring their own models and data.
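To see why the cost can explode, consider a rough sketch of how retrieved text accumulates in an agent's context. It assumes every step appends one retrieval result and re-reads the whole context, with no caching or truncation; the function name and the token counts (8,000 for a full page, 60 for a short passage) are illustrative assumptions, not Tavily figures.

```python
def total_input_tokens(retrieval_sizes, prompt_tokens=0):
    """Sum input tokens over a run where each step appends one retrieval
    result to the context and the model re-reads the full context.

    Assumption: no caching, summarization, or truncation between steps.
    """
    context = prompt_tokens
    total = 0
    for size in retrieval_sizes:
        context += size   # the new result joins the context
        total += context  # this step reads everything gathered so far
    return total


# Hypothetical five-step task: whole pages versus short passages.
full_pages = total_input_tokens([8_000] * 5)  # 120000
passages = total_input_tokens([60] * 5)       # 900
```

Under these assumptions the gap compounds with every step: the text returned by the first search is read again on each later one.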
The industry has yet to settle on a standard metric for information density. Creating one requires defining consistent benchmarks for efficient answers across a wide range of queries—a problem that is as much about evaluation methodology as it is about engineering. Hadar sees it as a question of effort and standardization rather than a fundamental technical barrier.
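What such a metric might look like can be sketched in a few lines. The version below is only one illustrative definition, not a standard and not Tavily's: it counts how many required facts a text covers and divides by its length. Whitespace splitting stands in for a real tokenizer, and case-insensitive substring matching stands in for a real judgment of whether a fact is covered; both are crude assumptions.

```python
def information_density(text, required_facts):
    """Covered facts per token, under two simplifying assumptions:
    tokens are whitespace-separated words, and a fact counts as
    covered when it appears verbatim (case-insensitive) in the text.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    lowered = text.lower()
    covered = sum(1 for fact in required_facts if fact.lower() in lowered)
    return covered / len(tokens)


facts = ["Deep Research Bench"]
passage = "Tavily topped Deep Research Bench."
page = "Home About Login Accept cookies " + passage

dense = information_density(passage, facts)  # 1 fact over 5 words
noisy = information_density(page, facts)     # same fact over 10 words
```

The hard part the sketch skips is the one Hadar describes: agreeing on what counts as a required fact across a wide range of queries.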
The Leaderboard and the Bill
When Tavily launched its research endpoint, it topped Deep Research Bench, ahead of entries from OpenAI, Anthropic, Google DeepMind, and Perplexity. The result was a welcome validation of the team’s approach, but Hadar is quick to point out that leaderboards capture only part of the story.
“There’s a lot of difference between a research agent that needs to be in production and a research agent that is optimized to win a leaderboard,” he says. A benchmark entry can burn a frontier model on every step; a production API, billed per call, cannot. “Production and high scale environments require a different balance of capability, cost, and latency than benchmark settings.”
The harder issue is that benchmarks sit still while the web moves. A static benchmark is easy to optimize against precisely because it never changes, and search is dynamic by nature. Hadar points to SealQA
And the moment retrieval has a ranking, someone games it. Researchers have already named the successor to SEO — GEO

The Question Behind the Query
“Despite major advances in ranking and semantic understanding, retrieval remains largely query-centric. The engine responds to a request but has limited visibility into what the agent is ultimately trying to achieve,” Hadar says. Semantic search improved retrieval, but the engine still serves agents exactly the way it serves humans: a query goes in, results come out, and the agent’s actual goal never reaches the engine.
The agents, meanwhile, are changing fast, and it shows in the query logs. Unlike humans, who typically search with just a few words, agents tend to issue long, highly specific queries. They frequently search for exact phrases, reformulate requests multiple times, and often use operators such as site: to target particular sources. Search is increasingly being treated as a learned behavior rather than a simple retrieval step.
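The shape of such a query is easy to picture. The helper below is a hypothetical illustration of how an agent might assemble one, combining free terms, a quoted exact phrase, and a site: restriction. It only builds the string; whether a given search engine honors the operators is outside the sketch.

```python
from typing import Optional


def build_agent_query(terms: str,
                      exact_phrase: Optional[str] = None,
                      site: Optional[str] = None) -> str:
    """Assemble a query string with an optional quoted phrase and site: operator."""
    parts = [terms.strip()]
    if exact_phrase:
        # Strip embedded quotes so the phrase stays a single quoted unit.
        parts.append('"' + exact_phrase.replace('"', "") + '"')
    if site:
        parts.append(f"site:{site}")
    return " ".join(p for p in parts if p)


query = build_agent_query(
    "agentic search benchmark results",
    exact_phrase="information density",
    site="arxiv.org",
)
# query == 'agentic search benchmark results "information density" site:arxiv.org'
```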
What interests Hadar most is the next step: retrieval that responds to the agent’s reasoning state. A search engine built for humans receives a query and returns results. One built for agents may need more: what the agent already knows, what it is trying to verify, where it is in a multi-step task. Then retrieval becomes part of the reasoning rather than a tool call that happens before it.
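As a thought experiment, the extra context could travel alongside the query as a small structure. Everything in the sketch below is hypothetical: the ReasoningState fields mirror the three items just listed, and select_passages is a toy that drops passages the agent already holds and moves those mentioning the claim under verification to the front. It describes no existing API.

```python
from dataclasses import dataclass, field


@dataclass
class ReasoningState:
    known: list = field(default_factory=list)  # what the agent already knows
    verifying: str = ""                        # what it is trying to verify
    step: int = 1                              # position in the task (carried, unused here)
    total_steps: int = 1


def select_passages(passages, state):
    """Drop passages already known, then list those that mention
    the claim being verified first (stable order otherwise)."""
    known = {k.lower() for k in state.known}
    fresh = [p for p in passages if p.lower() not in known]
    target = state.verifying.lower()
    if not target:
        return fresh
    # False sorts before True, so matching passages come first.
    return sorted(fresh, key=lambda p: target not in p.lower())


state = ReasoningState(known=["A is old news"], verifying="launch date",
                       step=3, total_steps=5)
ranked = select_passages(
    ["A is old news", "C is unrelated", "B confirms the launch date"], state
)
# ranked == ["B confirms the launch date", "C is unrelated"]
```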
A fundamental step in science is finding out what is already known. Agents now do it mid-task, at machine speed, through a retrieval layer rather than a researcher’s memory. That layer is still being built. The reader has changed; the work now is making a web it can actually use.



