Research Grants

The Web Has a New Reader

Nebius is adding Tavily credits to its Research Grants Program. Tavily’s agentic search is reaching beyond business intelligence, and scientists are now putting it to work.

By Linda Petrini

2026-06-23


An AI agent does not read the web the way a person does. It cannot skim. It ingests the whole page, navigation bar, cookie banner and all, and every token has a price that cascades into every step of reasoning that follows.

Tavily, the web access layer for AI agents that Nebius acquired earlier this year, is rebuilding the internet for the new reader, providing agents with relevant context. It crawls the human web, indexes it, filters spam, and lays a structured, machine-readable layer over the sprawling original. Most of the work happens offline. By the time a query arrives, the structure is already there, so the system rarely has to parse a document in real time.

The pages being parsed have no edges and no order. “The web is a mess. Nothing is organized,” says Gal Hadar, who leads research at Tavily. Imposing structure on something that large, with high recall and accuracy and without spreading the work across millions of GPUs, is most of the problem.

What that layer returns is not pages but the relevant slice of them. Ask Tavily a question and it hands back the passages that answer it, with the surrounding noise already stripped; its research endpoint goes further, running an agent that calls search repeatedly to assemble an answer. For an agent paying by the token, that pre-filtering is the whole point.

Common Tavily use cases include market intelligence, company and people research, and finance. But every agent can use search, and the structured layer is quickly becoming a fundamental step in automated scientific pipelines. That adoption is what prompted Nebius to add Tavily credits to its Research Grants Program, giving scientists free access to Tavily’s web access layer.

Approved for Science

The examples are piling up. Stanford’s agentic paper reviewer, built by Yixing Jiang and Andrew Ng, grounds its critiques by querying Tavily and pulling back only titles, authors, and abstracts. It reads that thin layer, decides which papers matter, and only then spends the tokens to fetch the few that do in full: expensive attention rationed against a cheap retrieval layer. Hadar points to it as a proof of concept that the structured layer is good enough to anchor real scientific work.
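The pattern of reading cheap metadata first and paying for full text only where it matters can be sketched in a few lines. This is a minimal illustration, not the Stanford reviewer's code: the `PaperStub` type, the keyword-overlap score, and the `fetch_full_text` callback are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PaperStub:
    """Cheap metadata layer (hypothetical shape, not a Tavily response schema)."""
    title: str
    authors: tuple[str, ...]
    abstract: str


def relevance(stub: PaperStub, topic_terms: set[str]) -> int:
    """Toy score: how many topic terms appear in the title or abstract."""
    words = set((stub.title + " " + stub.abstract).lower().split())
    return len(words & topic_terms)


def select_and_fetch(
    stubs: list[PaperStub],
    topic_terms: set[str],
    fetch_full_text: Callable[[PaperStub], str],
    max_full_reads: int = 2,
) -> dict[str, str]:
    """Rank on metadata alone, then fetch full text for the top few only."""
    ranked = sorted(stubs, key=lambda s: relevance(s, topic_terms), reverse=True)
    # Skip papers with no overlap at all, even if the budget allows more reads.
    chosen = [s for s in ranked[:max_full_reads] if relevance(s, topic_terms) > 0]
    return {s.title: fetch_full_text(s) for s in chosen}
```

The expensive call, `fetch_full_text`, runs at most `max_full_reads` times no matter how many stubs the search step returns, which is where the token savings come from.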

FROGENT, a drug-design agent, equips its retrieval layer with web search powered by Tavily alongside arXiv and PubMed, so it can validate findings against the open web and stay current with literature that curated databases miss.

The integration also reaches the standard toolkits developers build research agents from. One blueprint for biomedical research agents uses Tavily for its web-search step, letting an agent check the live web alongside a lab’s private data. At GTC 2026, that setup was shown running on Nebius’s own infrastructure.

“The first wave of AI research focused on the models themselves: pre-training, scaling, alignment,” Hadar says. “The current and next generation is increasingly focused on information access, grounding, and agent interaction in dynamic environments.”

Gal Hadar, Tavily’s Head of Research, says the industry has to settle on a standard metric for information density.

Density Challenge

Supporting the next generation of AI-powered research requires advances in agentic search itself. As Tavily builds a machine-readable web, its R&D team is tackling challenges that have become central to the entire field. One of them is information density: because an agent’s cost is measured in tokens, Tavily optimizes for how much usable information it can return per token.

Send a model an entire page when two sentences would do, Hadar says, and the answer is the same while the token cost and latency downstream can explode. The team optimizes for this directly, against customer and production constraints: token efficiency to hold down cost, topic coverage for the queries customers actually send, and room for customers to bring their own models and data.
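A rough worked example shows how the cost compounds. It is a sketch under stated assumptions: the agent is assumed to re-read its full context at every reasoning step, the token counts are illustrative rather than Tavily measurements, and `information_density` is a hypothetical ratio, not an established metric.

```python
def cumulative_input_tokens(
    retrieved_tokens: int, steps: int, base_tokens: int = 500
) -> int:
    """Input tokens billed when retrieved text stays in context for every step."""
    return steps * (base_tokens + retrieved_tokens)


def information_density(useful_tokens: int, returned_tokens: int) -> float:
    """Hypothetical metric: share of returned tokens the answer actually needs."""
    if returned_tokens <= 0:
        raise ValueError("returned_tokens must be positive")
    return useful_tokens / returned_tokens


# Illustrative numbers: an 8,000-token page versus a 60-token passage,
# carried through a 10-step reasoning chain.
full_page_cost = cumulative_input_tokens(retrieved_tokens=8000, steps=10)
passage_cost = cumulative_input_tokens(retrieved_tokens=60, steps=10)
```

With these assumed numbers the full page costs 85,000 input tokens over ten steps against 5,600 for the passage, roughly fifteen times more for the same answer.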

The industry has yet to settle on a standard metric for information density. Creating one requires defining consistent benchmarks for efficient answers across a wide range of queries—a problem that is as much about evaluation methodology as it is about engineering. Hadar sees it as a question of effort and standardization rather than a fundamental technical barrier.

The Leaderboard and the Bill

When Tavily launched its research endpoint, it topped Deep Research Bench, ahead of entries from OpenAI, Anthropic, Google DeepMind, and Perplexity. The result was a welcome validation of the team’s approach, but Hadar is quick to point out that leaderboards capture only part of the story.

“There’s a lot of difference between a research agent that needs to be in production and a research agent that is optimized to win a leaderboard,” he says. A benchmark entry can burn a frontier model on every step; a production API, billed per call, cannot. “Production and high scale environments require a different balance of capability, cost, and latency than benchmark settings.”

The harder issue is that benchmarks sit still while the web moves. A static benchmark is easy to optimize against precisely because it never changes, and search is dynamic by nature. Hadar points to SealQA, a Virginia Tech benchmark that tags each question by how fast its answer decays and updates itself over time. For a system built on a precomputed picture of the web, knowing which facts go stale, and how fast, is the live problem.
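One way to picture the staleness problem is a freshness budget per fact. The sketch below is an illustration only: the three decay classes and their time windows are hypothetical choices, not SealQA's tagging scheme or Tavily's indexing policy.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical decay classes: how long an indexed answer is trusted.
MAX_AGE: dict[str, Optional[timedelta]] = {
    "never": None,                 # e.g. a historical date
    "slow": timedelta(days=365),   # e.g. a company's headcount
    "fast": timedelta(days=1),     # e.g. a stock price
}


def is_stale(indexed_at: datetime, decay_class: str, now: datetime) -> bool:
    """True when a precomputed answer is older than its class allows."""
    max_age = MAX_AGE[decay_class]
    if max_age is None:
        return False
    return now - indexed_at > max_age


now = datetime(2026, 6, 23, tzinfo=timezone.utc)
two_days_old = now - timedelta(days=2)
```

Under these assumptions a two-day-old entry is still usable for a slow-changing fact but has to be refetched for a fast-changing one.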

And the moment retrieval has a ranking, someone games it. Researchers have already named the successor to SEO: GEO, or generative engine optimization. They have shown that citations, statistics, and authoritative phrasing can boost visibility in AI-generated answers by up to forty percent.

The Question Behind the Query

“Despite major advances in ranking and semantic understanding, retrieval remains largely query-centric. The engine responds to a request but has limited visibility into what the agent is ultimately trying to achieve,” Hadar says. Semantic search improved retrieval, but the engine still serves agents exactly the way it serves humans: a query goes in, results come out, and the engine knows nothing about what the agent is actually trying to do.

The agents, meanwhile, are changing fast, and it shows in the query logs. Unlike humans, who typically search with just a few words, agents tend to issue long, highly specific queries. They frequently search for exact phrases, reformulate requests multiple times, and often use operators such as site: to target particular sources. Search is increasingly being treated as a learned behavior rather than a simple retrieval step.

What interests Hadar most is the next step: retrieval that responds to the agent’s reasoning state. A search engine built for humans receives a query and returns results. One built for agents may need more: what the agent already knows, what it is trying to verify, where it is in a multi-step task. Then retrieval becomes part of the reasoning rather than a tool call that happens before it.
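What such a request might carry can be sketched as a data structure. Everything here is hypothetical: `AgentState`, `StatefulQuery`, and `drop_known` are invented names for illustration and do not describe any existing Tavily interface.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical snapshot of where the agent is in its task."""
    known_facts: set[str] = field(default_factory=set)
    claim_to_verify: str = ""
    step: int = 0


@dataclass
class StatefulQuery:
    """A query bundled with the reasoning state behind it."""
    text: str
    state: AgentState


def drop_known(passages: list[str], query: StatefulQuery) -> list[str]:
    """Return only passages the agent has not already recorded as known."""
    return [p for p in passages if p not in query.state.known_facts]
```

Even this trivial filter uses information a query-only engine never sees: the engine can skip what the agent already holds instead of returning it again.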

A fundamental step in science is finding out what is already known. Agents now do it mid-task, at machine speed, through a retrieval layer rather than a researcher’s memory. That layer is still being built. The reader has changed; the work now is making a web it can actually use.