The Deep Research Trap
Why AI sometimes sounds like a junior consultant trying too hard
I’ve been using Gemini’s Deep Research feature to better understand my readers.
The idea was simple. I wanted to know what people worry about when they use AI. What they’re trying to get done. How they’re coping with the change. If I knew that, I could write articles that are useful and relevant, instead of guessing.
So I asked:
I have a Substack newsletter about AI.
I want to conduct deep research on topics that most people are currently interested in, especially those who want to learn about AI and how to use it, but may struggle to find the right resources.
I want to talk about the latest uses for AI, in plain English. Not too technical. I want to go beyond “prompting” and really use AI as a thinking companion, an expert critic, and a knowledgeable counsellor. We can still talk about effective prompting, but by now, users should be engaging AI to write prompts and even build complex cases and projects.
Can you search the internet, especially social media, to find out what attracts readers? Then, curate a list of AI-related subjects and topics I can develop into interesting articles and eBooks.
Gemini returned with an 8,000-word document titled “The Algorithmic Apothecary and the Synthetic Strategist: A Comprehensive Framework for Post-Prompting AI Integration.”
What was that? What the hell is “Algorithmic Apothecary”? It read like programme notes for a scientific-philosophy conference.
Section headings discussed algorithmic companionship and synthetic strategists. Footnotes cited Reddit threads as research, and a supposed “framework” was presented as established. I learned little about my actual question, but I did find mention of Agent-Native Infrastructure.
Initially, I was impressed. Lots of citations. Really deep.
By the sixth time I ran it, I realised I had never used any of it.
That’s the Deep Research trap. Outputs may look impressive, making you feel the tool is working. The real test is whether you can use what you get. Most of the time, you can’t.
Why Deep Research does this
Deep Research, as a category of AI function, has a structural bias toward looking thorough. The tool is rewarded during its training for producing outputs that feel comprehensive: long, structured, heavily cited, and written in the cadence of academic synthesis. That “feature” is baked in.
It doesn’t read your question and decide what depth is appropriate. It runs the same pattern every time. Scan widely. Organise into pillars. Name the pillars in capital letters. Cite forty sources. Conclude by restating the introduction.
This is not unique to Gemini. Every AI company building a research feature has trained it the same way, because the people evaluating these tools also confuse thoroughness with usefulness. Long output looks like hard work. Citations look like “effort”. A nine-section report feels like more than a one-page summary, even when the one-pager is the one that actually answers the question.
The output is designed less to help than to impress. Which is exactly how a junior consultant behaves in their first months. They write the 100-page report because they’re not yet confident enough to send the 1-page email.
What Deep Research gets wrong
I’ve noticed three failure modes that repeat across different tools.
It invents frameworks. Deep Research will confidently present a “TARGET framework” or a “Persona-Task-Constraint model” as though these are established concepts. Sometimes they are. Often, they’re synthesised on the spot and packaged to sound authoritative. If you build content on these, you’ll eventually cite something that doesn’t exist. So be careful.
It has no judgment about the audience. I asked about a newsletter for non-technical professionals aged 50 and up. The tool suggested I write about commercial space logistics and the lunar economy. Not because it fit my reader profile, and not because I had read something about lunar mining and decided it was interesting. It was there because it was trending. Deep Research can identify trends. It cannot tell whether a trend belongs in your life.
The prose is contagious. This one is real. I fell for it too. Read enough of these reports and your own writing starts to drift. You pick up the reflex to organise every thought into threes. McKinsey’s MECE comes to mind: Mutually Exclusive, Collectively Exhaustive. It’s good practice in the corporate world, but it’s not much use when you’re writing a blog for everyday folks. The result sounds confident but says very little. If you read Deep Research outputs too often, you’ll start to absorb a style you don’t actually want.
What Deep Research is actually good at
Deep Research is excellent at one job: gathering raw material. It can scan 200 websites and documents in minutes and bring back examples, quotes, and pain points you wouldn’t have found on your own. That’s very useful.
But it is weak at curating and synthesising that material for a specific audience. Those require judgment about who you’re writing for, what they care about, and what you’d want to say. The tool doesn’t seem to know any of that.
The mistake I was making, and that I suspect many are making, was asking Deep Research to do all three jobs at once. Find the material. Curate it. Tell me what to do. The tool does the first job exceptionally well. The other two are mine, and mine alone.
The two levers
Once I stopped expecting Deep Research to think for me, I started getting useful output. Two levers do most of the work.
Lever one: the prompt. Ask for inputs you can verify, not conclusions you have to take on trust. Instead of “research what my audience cares about and recommend a strategy,” try “find specific examples of people aged 50 and up describing how they use AI in their daily work. Quote them directly. Give me the source for each quote.” The first prompt generates a strategy that you can’t check. The second produces a pile of raw material you can read and judge for yourself.
Other prompts that work better:
Ask for the most-discussed complaints about a topic, in the actual language people use, with sources.
Ask for examples of a specific pattern playing out, with names and dates.
Ask for what’s growing fast in a category, with rough audience numbers.
In each case, you’re asking the tool to tell you what, not why.
Ask for evidence, not synthesis. You do the synthesis. That’s the part the tool can’t do well anyway.
Lever two: the style sheet. Most people don’t know this exists. In Claude, ChatGPT, and Gemini, you can set persistent instructions for how the AI should write back to you. For example: Short paragraphs. No em dashes. No corporate jargon. No “comprehensive overview” introductions. No three-bullet summaries at the end of every response.
While a style sheet won’t fix bad thinking or teach AI to make judgments about your audience, it does control the voice that AI uses with you. After I set mine up, the outputs became shorter, clearer, and more useful, even when the underlying question remained the same.
I wrote about this previously here: What is a Style Sheet.
Most AI advice focuses on prompts. Little of it mentions style sheets. The people getting the best results from these tools are doing both: shaping what the AI does, and telling it how to talk to them.
The wider lesson
Every AI tool has a default failure mode. ChatGPT hedges. Claude over-explains. Deep Research, in any form, overperforms. The tools aren’t bad. They’re optimised for things that aren’t always what you need.
The skill is recognising what each tool defaults to doing badly, and prompting against the grain. That’s an editorial skill. A human skill. You’re the one with judgment about your work and your readers. The AI is the junior consultant who doesn’t yet know how to write a one-page memo.
Your job is to teach it.

