Large Language Models learn to collaborate and reason

A December 2024 update on generative AI

January 31, 2025


Key takeaways:

  • Large Language Models (LLMs) have made remarkable advances in reasoning capabilities and as collaborative tools
  • New workspaces for interactive LLM collaboration are transforming how researchers work with AI
  • LLM-powered search is becoming increasingly reliable for research
  • This paper demonstrates over three dozen practical applications of LLMs across seven domains of research
Editor's note:

This paper was originally published by the Journal of Economic Literature in December 2024. It represents an end-of-2024 update of the 2023 report on “Generative AI for Economic Research.”

Executive summary

Large Language Models (LLMs) have seen remarkable advances in speed, cost efficiency, accuracy, and the capacity to process larger amounts of text over the past year, enabling more advanced use cases. Three developments stand out in recent months:

First, new reasoning capabilities, exemplified by OpenAI’s o1 series, are helping to overcome traditional barriers in LLM-based reasoning, enabling AI models to engage in multi-step problem-solving and logical deduction. This advancement opens new avenues for LLM use in research, particularly in areas requiring complex analysis.

Second, workspaces for interactive collaboration, such as Copilot, Claude’s Artifacts, and ChatGPT’s Canvas, are changing how we interact with LLMs. These workspaces shift away from static chat-style exchanges toward a more dynamic, document-oriented collaboration, creating an environment in which users work in tandem with the AI, receive real-time feedback, and iteratively develop and refine their ideas through successive edits.

Third, LLM-powered search, newly integrated into ChatGPT and Gemini and also offered by startups like Perplexity, is becoming a useful tool for answering questions with up-to-date information grounded in facts found on the internet, together with the requisite citations, a crucial capability for researchers.

The paper demonstrates these advances through dozens of practical examples across seven domains: ideation and feedback, writing, background research, coding, data analysis, mathematical derivations, and—newly added—research promotion. The latter includes innovative use cases in automatically generating blog posts, presentation slides, and even podcasts.

Each frontier AI lab has released new models in recent months. Google DeepMind and OpenAI rank first on a range of LLM benchmarks with their updated model versions. Elon Musk’s xAI has risen to the #3 spot. Claude 3.5 Opus excels at writing-related tasks and several other benchmarks. Excellent open-source LLMs from Meta and Alibaba are close in capabilities to the models of the other four labs and offer the greatest level of data security.

These developments have significant implications for research, offering unprecedented opportunities for increased productivity. However, they also raise important questions about the future of research methodology and evaluation. As the production of research becomes easier with AI assistance, the bottleneck may shift from generation to assessment of ideas and results. This highlights the need to develop robust methods for evaluating AI-augmented content while ensuring that these powerful tools enhance rather than diminish the quality and impact of our work.

Download full text PDF

Click here for the Dec. 2023 version of the paper and here for additional resources.

The Brookings Institution is committed to quality, independence, and impact.
We are supported by a diverse array of funders. In line with our values and policies, each Brookings publication represents the sole views of its author(s).