How to Use AI Well, According to Science
Intro
I previously published a blog post about how not to use AI according to science, and I wanted to follow it up with the other side: what does the research say about using AI well?
This post is based heavily on a report generated by Elicit, an AI tool used in academia to search research papers on a topic and synthesise the findings into a report. Think Claude’s Research mode meets Perplexity, but drawing on academic papers rather than the general web. If you want the full deep dive, you can read the complete Elicit report here, including the prompt I used.
Tip 1: Think About What AI Should Do
The best AI practices involve thoughtful decisions about which tasks to delegate versus which should stay with humans. Research found that experts did well when they used AI for repetitive tasks but retained control over complex work requiring nuanced understanding. For well-defined tasks like documentation, code boilerplate, and information gathering, delegating to AI proved effective and didn’t hinder skill development. Experts welcomed this offloading because it freed up cognitive resources for higher-value work.
Routine and Well-Defined Tasks
Routine and well-defined tasks are where AI tends to excel – things that aren’t very bespoke and where AI has seen something similar many times in its training data. But even if something wasn’t in the training data, if it’s something you do repeatedly, it’s worth considering how AI can help.
This extends to cases where you’re doing different tasks but a key part is the same — for example, searching a long output for an error message, tracking down all the relevant parts of code, or telling an agent how to use MCP.
Prompt files and Skills (e.g. Claude Skills) have a lot of potential here. If you find yourself telling AI the same thing repeatedly, that’s a signal it could be captured in a prompt file or Skill instead.
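As a rough illustration, a Claude Skill is a folder containing a SKILL.md file whose frontmatter description tells the model when to load it. The sketch below is hypothetical – the name, description, and steps are mine, invented for the "track down an error message" task mentioned above – but it shows the shape: a precise trigger in the description, and short instructions in the body.

```markdown
---
name: find-error-cause
description: Use when asked to find the cause of an error message in build or test output. Locates the failing step and the relevant source files.
---

# Finding the cause of an error

1. Search the output for the first line containing "error" or a non-zero exit code.
2. Identify any file and line number it references.
3. Open only those files, then summarise the likely cause before proposing a fix.
```

Note the description says specifically *when* to use the Skill – per the warning below, vague descriptions cause the AI to read Skill instructions it doesn’t need.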
A couple of words of warning, however:
If you’re having AI write documentation or prompt files for itself, bear in mind there’s debate about how well AI handles large prompt files, especially ones spanning multiple domains (e.g. asking it to write code and also check for bugs in the same prompt). There’s also a missed opportunity if shorter files could serve as documentation useful to developers, whereas 1,000-line prompts become a wall of text never read by anyone but AI. There may be value in cutting unnecessary context or breaking things into multiple files.
Be cautious about having too many Skills in a project, too. AI generally reads all Skill descriptions to decide what’s relevant, which can add irrelevant context to the chat. And if descriptions aren’t precise about when a Skill should be used, the AI can end up reading Skill instructions unnecessarily.
This isn’t to say large prompt files or many Skills are necessarily bad — just that it’s worth being deliberate rather than defaulting to more = better.
Iteration With Clear Boundaries
Effective AI use often involves iteration, prompt refinement, and experimenting with what AI can do. Setting clear objectives before prompting tends to get the best results while also helping you learn the most as you go. Rather than viewing AI as a one-shot solution provider, the most effective users engaged in ongoing dialogue — refining, evaluating, and iterating through multiple rounds.
Tip 2: Think About What AI Shouldn’t Do
Research showed that AI tools often give an immediate performance boost, but in many cases the benefit doesn’t last. This isn’t inevitable though – the solution lies in how AI is used, rather than how much.
Scaffolding
This aligns with what many developers have been finding: having a period of planning before letting AI act tends to get the best results. Many developers now use their AI tools in planning mode – where it can’t make changes – before switching to agent mode and allowing it to execute.
A similar approach I found works for me is to create the basic interfaces or methods with no code inside and then just have AI fill in the logic. Others do something similar by writing comments in the code where each method should go, telling AI what the method should do. I find this helps me stay in control, builds a mental map of the code, and avoids giving AI unnecessary context that can distract it from more important things. (Typically, if an AI is given instructions to do a broad range of things it becomes less effective and more forgetful well before the context limit is reached).
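A minimal sketch of that stub-first approach in TypeScript (the interface, names, and discount rule are invented for illustration): the human writes the shape and a comment per method, and only the method bodies are delegated to AI.

```typescript
// The human defines the interface and signatures up front,
// keeping control of the design and the mental map of the code.
interface PriceCalculator {
  subtotal(items: { price: number; qty: number }[]): number;
  applyDiscount(subtotal: number, code: string): number;
}

class StorePriceCalculator implements PriceCalculator {
  subtotal(items: { price: number; qty: number }[]): number {
    // AI fills this in: sum price * qty across all items
    return items.reduce((sum, i) => sum + i.price * i.qty, 0);
  }

  applyDiscount(subtotal: number, code: string): number {
    // AI fills this in: 10% off for the code "SAVE10", otherwise unchanged
    return code === "SAVE10" ? subtotal * 0.9 : subtotal;
  }
}
```

The comments double as the prompt: each one tells the AI exactly what logic goes in that body, and nothing else.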
Verification
The ability to evaluate what AI says is especially important because it reflects a fundamental shift in how we work. Before AI, most things had to be researched from scratch. AI changes that – but it shifts the core work from finding information to evaluating it. This means the ability to critically assess things isn’t just a skill that remains important despite AI; it’s actually more important than it was before.
Tip 3: Think About What Makes You Feel Engaged When Using AI
Whether AI is beneficial or harmful in the long term depends less on how much it’s used and more on how it’s used. The best outcomes happen when people are mentally engaged — when AI provides explanations rather than just answers, and when the person is actively involved in the initial problem-solving and planning, even if they later get stuck and lean more heavily on AI.
For me, this recently meant switching from Cursor back to Rider with Claude Code. I find I’m more engaged with that setup because I can navigate the code better, and Cursor would often try to do more than I asked, which left me spending most of my time reviewing and reining it in rather than feeling like I was working with it. That’s what helps me feel engaged, but it’ll be different for everyone. It’s worth giving some thought to what makes you feel engaged when using AI. That may have more lasting benefits than just learning the latest tool.
Tip 4: Self-Confidence Can Be Beneficial
Headlines often focus on AI bolstering people’s egos and reinforcing their existing beliefs. There’s truth to this. But what doesn’t get mentioned is the flip side – some people see AI disagree with them and assume AI must be right, either because they lack confidence generally or because it’s a topic they don’t know much about.
The research shows that having confidence in your own abilities and opinions makes people better at critical thinking and more engaged when using AI. So if you catch yourself thinking AI is probably right because you don’t know much about a topic, press pause. Your instincts may be picking up on something that needs more investigation.
This actually challenged an assumption I’d held about AI. I’d tended to assume that deep knowledge of things is gradually becoming less important as the shift moves toward AI handling the detail and humans handling higher-level thinking. The research suggests that’s not quite right – domain knowledge still matters because it’s what lets you push back when AI is wrong.
Tip 5: Think About Centaurs and Cyborgs
Some researchers describe two modes of working with AI: “centaur behaviour” and “cyborg behaviour.”
The centaur approach involves carefully dividing tasks between yourself and AI, saving higher-risk decisions for human oversight. The cyborg approach has humans and AI collaborating closely through iterations of drafting and refinement. Researchers suggest the cyborg approach can be good for clearly defined, low-risk tasks, and encourage people to get skilled at switching between modes as needed.
This reminds me of a recent team discussion. Some people had noticed that PR reviews were taking longer than in the past, because developers were relying heavily on AI in the earlier stages but then reviewing code largely by hand. This meant PRs were often in a lower-quality state and needed more review rounds before they were ready to deploy.
We discussed changing our approach: having people more involved in the early planning stages of a task, and making sure we still discuss things as a team when needed. That then frees us to use AI more effectively in the areas where it does well – like writing code against a clearly defined plan.
Tip 6: Counterfactuals
This wasn’t mentioned in the Elicit report, but I’ve found it so helpful that I have to mention it.
Counterfactuals are sometimes called “what-if” questions – they ask “under what scenario would you change your mind?” In my experience, asking AI this can be surprisingly powerful, especially when you also ask what would confirm its suggestions.
Here’s an example where I asked AI to suggest state management options for a React app. For each suggestion it added a counterfactual (“this changes if”) that mentions important caveats which could influence what the best decision is. This is valuable on its own, but there’s also research – and my personal experience – showing that asking AI for counterfactuals makes people feel more engaged, which increases both accuracy and enjoyment.
Prompt example
My team is working on a website which makes and sells drum cymbals. The React UI site is getting larger and we are wondering about adding state management as we don’t currently have that. What options do you suggest (ordered by what you think is most likely to be the best solution)?
Include things which would confirm your suggestions and counterfactuals (when your suggestions would change)
AI Response
- Zustand — This is probably your best starting point. It’s lightweight, has minimal boilerplate, and doesn’t require wrapping your app in providers. You define stores as simple hooks, which makes it very intuitive if your team already knows React hooks. For a cymbal e-commerce site (product catalog, cart, filters, maybe user auth), Zustand handles this comfortably without overcomplicating things.
This is confirmed if: your state needs are moderate (cart, filters, auth, UI state), your team values simplicity, and you don’t have deeply nested component trees that all need to react to the same global state simultaneously.
This changes if: you have very complex async data-fetching needs (in which case, pair it with React Query — see below), or your app grows to dozens of developers needing strict patterns and conventions.
- Redux Toolkit (RTK) — The “industry standard” but heavier. RTK has dramatically reduced Redux’s old boilerplate problem, and RTK Query handles server state well. It gives you strict patterns, excellent devtools, and a huge ecosystem.
This is confirmed if: your team is large, you need enforced conventions, you want the best devtools story, or some team members already know Redux.
This changes if: your team is small and you value speed of development — Redux still has more ceremony than Zustand or Jotai, even with RTK.
- React Context + useReducer — The built-in option with zero dependencies. Fine for small amounts of shared state (auth, theme), but it re-renders all consumers on any change to the context value, which becomes a performance issue as state grows.
This is confirmed if: you only need to share a few small pieces of state and want zero added dependencies.
This changes if: you’re putting cart state, product filters, and UI state all in context — you’ll likely hit performance issues and end up refactoring toward one of the above anyway.
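To make the trade-off concrete, the core idea behind stores like Zustand is a single object with getState/setState/subscribe and no providers. The toy sketch below is my own illustration of that pattern in plain TypeScript – it is not the real Zustand API, just the mechanism that makes the "lightweight, no providers" claim above true.

```typescript
// Toy version of the external-store pattern: one store object,
// read with getState, update with setState, react via subscribe.
type Listener = () => void;

function createStore<T extends object>(initial: T) {
  let state = initial;
  const listeners = new Set<Listener>();
  return {
    getState: () => state,
    // Merge a partial update, then notify every subscriber.
    setState: (partial: Partial<T>) => {
      state = { ...state, ...partial };
      listeners.forEach((l) => l());
    },
    // Returns an unsubscribe function, as real store libraries do.
    subscribe: (l: Listener) => {
      listeners.add(l);
      return () => listeners.delete(l);
    },
  };
}

// A cart store for the cymbal-shop example from the prompt above.
const cartStore = createStore({ items: [] as string[] });
cartStore.setState({ items: ["ride-20in"] });
```

Because components subscribe directly to the store, only subscribers re-render on change – which is exactly the behaviour the Context + useReducer option lacks, since context re-renders every consumer.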