OpenAI launches GPT-4o mini, which will replace GPT-3.5 in ChatGPT

Lower-cost AI language model will be free for ChatGPT users.

By: arstechnica.com

Jul 19 2024

On Thursday, OpenAI announced the launch of GPT-4o mini, a new, smaller version of its latest GPT-4o AI language model that will replace GPT-3.5 Turbo in ChatGPT, report CNBC and Bloomberg. It will be available today for free users and those with ChatGPT Plus or Team subscriptions and will come to ChatGPT Enterprise next week.

Performance

Predictably, OpenAI says that GPT-4o mini performs well on an array of benchmarks like MMLU (undergraduate-level knowledge) and HumanEval (coding). The problem is that those benchmarks don't mean much, and few of them measure anything useful about how the model behaves in practice. That's because the perceived quality of a model's output often has more to do with style and structure than with raw factual or mathematical capability. This kind of subjective "vibemarking" is one of the most frustrating things in the AI space right now.

A graph by OpenAI shows GPT-4o mini outperforming GPT-4 Turbo on eight cherry-picked benchmarks.
OpenAI

So we'll tell you this: OpenAI says the new model outperformed last year's GPT-4 Turbo on the LMSYS Chatbot Arena leaderboard, which ranks models by user ratings collected after each model is compared against another one chosen at random. But even that metric isn't as useful as the AI community once hoped. People have noticed that even though mini's big brother (GPT-4o) regularly outperforms GPT-4 Turbo on Chatbot Arena, it tends to produce dramatically less useful outputs in general (they tend to be long-winded, for example, or perform tasks you didn't ask for).
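To make the idea of a leaderboard built from head-to-head votes concrete, here is a minimal Python sketch of an Elo-style rating update, which is one common way to turn pairwise preferences into a ranking. Whether this matches Chatbot Arena's exact scoring method is an assumption, and the model names, starting ratings, votes, and the `k` factor are all hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, outcome_a, k=32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    outcome_a is 1.0 if users preferred A, 0.0 if they preferred B,
    and 0.5 for a tie. k (hypothetical value) controls the step size.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (outcome_a - exp_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


# Hypothetical models and votes, for illustration only.
ratings = {"model_small": 1000.0, "model_large": 1000.0}
votes = [
    ("model_small", "model_large", 1.0),  # users preferred model_small
    ("model_small", "model_large", 0.0),  # users preferred model_large
    ("model_small", "model_large", 1.0),
]

for a, b, outcome_a in votes:
    ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], outcome_a)

for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Note that a rating like this only records which answer users preferred. It says nothing about why, which is how a long-winded answer can win votes without being more useful.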

The value of smaller language models

OpenAI isn't the first company to release a smaller version of an existing language model. It's a common practice in the AI industry from vendors such as Meta, Google, and Anthropic. These smaller language models are designed to perform simpler tasks at a lower cost, such as making lists, summarizing, or suggesting words instead of performing deep analysis.

OpenAI’s head of API product, Olivier Godement, told Bloomberg, "In our mission to enable the bleeding edge, to build the most powerful, useful applications, we of course want to continue doing the frontier models, pushing the envelope here. But we also want to have the best small models out there."

Smaller large language models (LLMs) usually have fewer parameters than larger models. Parameters are the numerical values in a neural network that store what the model has learned during training. Having fewer parameters means an LLM has a smaller neural network, which typically limits the depth of an AI model's ability to make sense of context. Larger-parameter models are typically "deeper thinkers" by virtue of the larger number of connections between concepts stored in those numerical parameters.

However, to complicate things, there isn't always a direct correlation between parameter count and capability. The quality of training data, the efficiency of the model architecture, and the training process itself also affect a model's performance, as we've seen recently in more capable small models like Microsoft Phi-3.

Fewer parameters mean fewer calculations are required to run the model. That means it can run on less powerful (and less expensive) GPUs or do less work on existing hardware, leading to lower energy bills and a lower end cost to the user.
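As a rough illustration of how parameter count translates into compute and memory, the Python sketch below uses a common rule of thumb for dense transformer models: about two floating-point operations per parameter for each generated token, plus two bytes per parameter to hold the weights at 16-bit precision. OpenAI has not disclosed the size of GPT-4o mini, so the model names and parameter counts here are hypothetical placeholders, and the rule of thumb ignores attention costs and other overhead.

```python
def forward_flops_per_token(num_params):
    """Approximate FLOPs to generate one token (dense-model rule of thumb)."""
    return 2 * num_params


def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate memory for the weights alone, in gigabytes.

    bytes_per_param=2 assumes 16-bit weights (an assumption for this sketch).
    """
    return num_params * bytes_per_param / 1e9


# Hypothetical parameter counts, not figures for any real OpenAI model.
HYPOTHETICAL_MODELS = {
    "small_model": 8_000_000_000,
    "large_model": 200_000_000_000,
}

for name, num_params in HYPOTHETICAL_MODELS.items():
    gflops = forward_flops_per_token(num_params) / 1e9
    memory = weight_memory_gb(num_params)
    print(f"{name}: ~{gflops:,.0f} GFLOPs per token, ~{memory:,.0f} GB of weights")
```

Under these assumptions, both the per-token compute and the memory footprint scale linearly with parameter count, which is why a smaller model can be served on cheaper hardware.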