Where are we heading in the world of AI?

Roger Basler de Roca
4 min read · Oct 6, 2024

As AI integrates into critical sectors, ensuring its responsible development and deployment has become more crucial than ever.

The AI Index 2024 Annual Report from Stanford's Institute for Human-Centered AI underscores the importance of privacy, data governance, transparency, security, safety, and fairness. It notes ongoing challenges such as obtaining informed consent for the data used to train large language models, preserving privacy without compromising utility, and achieving fairness in the absence of a universal definition.

Moreover, AI’s contribution to science and medicine is nothing short of revolutionary. From enhancing weather forecasting and discovering new materials to transforming healthcare with advanced diagnostic tools, AI’s potential to benefit humanity is clear.

Yet, alongside this optimism, there remains a cautious awareness of potential risks, such as job displacement and misuse.

Key Trends and Developments in AI Technical Performance in 2023

Here are the key trends and developments in the technical performance of AI systems in 2023, drawn from the Artificial Intelligence Index Report 2024:

  • AI Exceeds Human Performance on Some Tasks: AI has surpassed human performance on benchmarks for tasks such as image classification, visual reasoning, and English language understanding. However, AI still lags behind humans in more complex areas like high-level mathematics, visual commonsense reasoning, and planning.
  • The Rise of Multimodal AI: AI systems have traditionally been limited to specific domains (e.g. language models for text, computer vision models for images). However, recent advancements have led to powerful multimodal models like Google’s Gemini and OpenAI’s GPT-4. These models demonstrate flexibility in processing various input forms, including images, text, and even audio in certain cases.
  • Development of More Challenging Benchmarks: The performance of AI models has begun to plateau on established benchmarks such as ImageNet, SQuAD, and SuperGLUE. This has led researchers to develop more challenging benchmarks to accurately assess the capabilities of new AI models. 2023 saw the emergence of new benchmarks like SWE-bench (coding), HEIM (image generation), MMMU (general reasoning), MoCa (moral reasoning), AgentBench (agent-based behavior), and HaluEval (hallucinations).
  • AI-Generated Data for AI Improvement: New AI models such as Segment Anything and models from Skoltech are being used to create specialised data for tasks like image segmentation and 3D reconstruction. This is significant because data is crucial for AI advancement: using AI to generate more data not only enhances current AI capabilities but also paves the way for future algorithmic improvements, particularly in tackling more difficult tasks.
  • Increasing Importance of Human Evaluation: With generative models producing increasingly high-quality text, images, and other outputs, benchmarking is shifting towards human evaluation, such as the Chatbot Arena Leaderboard, rather than relying solely on computerised benchmarks like ImageNet or SQuAD (a minimal rating sketch follows this list). Public perception of AI is also becoming a more significant factor in tracking AI progress.
  • More Flexible Robots Through LLMs: The integration of language modelling with robotics has resulted in more adaptable robotic systems, such as PaLM-E and RT-2. Beyond exhibiting enhanced robotic capabilities, these models can ask questions, which marks a significant step towards robots that can interact more effectively with the real world.
  • Advancements in Agentic AI Research: Creating AI agents (systems capable of autonomous operation in specific environments) has been a longstanding challenge. However, recent research indicates progress in the performance of autonomous AI agents. These agents are now capable of mastering complex games like Minecraft and performing real-world tasks such as online shopping and research assistance.
  • Superior Performance of Closed LLMs: Closed large language models have demonstrated significantly better performance than open-source models. In evaluations across 10 AI benchmarks, closed models outperformed their open counterparts with a median performance advantage of 24.2% (a short sketch after this list shows how such a median is computed). This performance disparity has important implications for discussions and decisions regarding AI policy.
  • Notable Model Releases in 2023: The report highlights significant model releases in 2023, including GPT-4, a high-performing model with an estimated training cost of $78 million in compute resources, and Gemini Ultra, the first LLM to achieve human-level performance on the Massive Multitask Language Understanding (MMLU) benchmark, with an estimated training cost of $191 million in compute resources.
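
To make the human-evaluation trend concrete, here is a minimal Elo-style rating update of the kind that pairwise leaderboards such as the Chatbot Arena popularised. This is a sketch of the idea rather than Arena's implementation (its published methodology now fits a Bradley-Terry model), and the model names, starting ratings, and K-factor below are illustrative assumptions:

```python
# Minimal Elo-style update from pairwise human votes. A sketch only:
# model names, starting ratings, and the K-factor are illustrative
# assumptions, not Chatbot Arena's actual configuration.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Predicted probability that model A wins a head-to-head vote."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one human vote."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

ratings = {"model_a": 1000.0, "model_b": 1000.0}  # hypothetical models
# One crowd vote: the human preferred model_a's answer over model_b's.
ratings["model_a"], ratings["model_b"] = update(
    ratings["model_a"], ratings["model_b"], a_won=True
)
print(ratings)  # model_a gains exactly the points model_b loses
```

Repeating this update over many thousands of votes is what turns raw human preferences into a leaderboard ranking.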
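
Similarly, the closed-versus-open gap boils down to a median over per-benchmark differences. The sketch below shows that calculation with made-up scores; they are not the report's data, and since the report does not specify whether its 24.2% figure is in percentage points or relative terms, percentage points are assumed here:

```python
from statistics import median

# Hypothetical per-benchmark scores in percent. NOT the report's data.
closed_scores = {"bench_1": 86.0, "bench_2": 74.5, "bench_3": 91.2}
open_scores = {"bench_1": 61.0, "bench_2": 55.3, "bench_3": 66.0}

# Advantage of the closed model on each benchmark, in percentage points.
advantages = [closed_scores[b] - open_scores[b] for b in closed_scores]

print(f"Median closed-model advantage: {median(advantages):.1f} points")
```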

These trends reveal significant progress in AI technical performance throughout 2023, marked by advancements in areas like multimodal capabilities, challenging benchmarks, and the integration of AI into robotics and other domains. The report also underscores the increasing focus on responsible AI development, particularly concerning the disparity in performance between closed and open models and the need for standardised evaluations to assess the robustness and safety of these systems.

The full report is available online, both as a web version and as a PDF, from the Stanford Institute for Human-Centered AI.


Disclaimer: This is a non-commercial summary and presentation of “The AI Index 2024 Annual Report,” AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, Stanford, CA, April 2024, by Nestor Maslej, Loredana Fattorini, Raymond Perrault, Vanessa Parli, Anka Reuel, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Juan Carlos Niebles, Yoav Shoham, Russell Wald, and Jack Clark.

Written by Roger Basler de Roca

With over 25 years of experience in IT and AI, Roger runs an AI consultancy, gives around 100 talks a year, speaks six languages, and is currently pursuing a PhD.
