Dr Marcel Scharth
OpenAI, the artificial intelligence (AI) research company behind ChatGPT and the DALL-E 2 art generator, has unveiled the highly anticipated GPT-4 model. Excitingly, the company also made it available to the public through a paid service.
GPT-4 is a large language model (LLM), a neural network trained on massive amounts of data to understand and generate text. It's the successor to GPT-3.5, the model behind ChatGPT.
The GPT-4 model introduces a range of enhancements over its predecessors. These include more creativity, more advanced reasoning, stronger performance across multiple languages, the ability to accept visual input, and the capacity to handle significantly more text.
More powerful than the wildly popular ChatGPT, GPT-4 is bound to inspire an in-depth exploration of its capabilities and further accelerate the adoption of generative AI.
Among the many improvements highlighted by OpenAI, what immediately stands out is GPT-4's performance on a range of standardised tests. For example, GPT-4 scores in the top 10% in a simulated US bar exam, whereas GPT-3.5 scores in the bottom 10%.
This table from the OpenAI technical report shows the performance of the model on a range of simulated standardised tests. GPT-4 often performs in the top 20% range. Source: OpenAI
GPT-4 also outperforms GPT-3.5 on a range of writing, reasoning and coding tasks. The following examples illustrate how GPT-4 displays more reliable commonsense reasoning than GPT-3.5.
Another significant development is that GPT-4 is multimodal, unlike previous GPT models. This means it accepts both text and image inputs.
Samples provided by OpenAI reveal GPT-4 is capable of interpreting images, explaining visual humour and providing reasoning based on visual inputs. Such skills are beyond the scope of previous models.
GPT-4 can explain the meaning behind funny memes. Source: OpenAI
This ability to "see" could give GPT-4 a more comprehensive picture of how the world works, just as humans acquire enhanced knowledge through observation. This is thought to be an important ingredient for developing sophisticated AI that could bridge the gap between current models and human-level intelligence.
In fact, GPT-4 isn't the first language model with these capabilities. A few weeks ago, Microsoft released Kosmos-1, a language model that accepts visual inputs the same way GPT-4 does. Google also recently expanded its PaLM language model to take in image data and sensor data collected from robots. Multimodality is a growing trend in AI research.
GPT-4 can take in and generate up to 25,000 words of text, which is much more than ChatGPT's limit of about 3,000 words.
It can handle more complex and detailed prompts, and generate more extensive pieces of writing. This allows for richer storytelling, more in-depth analysis, summaries of long pieces of text and deeper conversational interactions.
GPT-4 answers a question relating to a Wikipedia article on artificial intelligence. Author provided
In the example above, I gave the new ChatGPT (which uses GPT-4) the entire Wikipedia article about artificial intelligence and asked it a specific question, which it answered accurately.
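For readers who want to try something similar programmatically rather than through the ChatGPT interface, below is a minimal sketch of the same long-document question-answering workflow using OpenAI's Python library. The file name, placeholder question and API key are illustrative assumptions rather than details from the example above, and a larger-context GPT-4 variant may be needed for a document of this length.

```python
# Sketch: ask GPT-4 a question about a long document via the OpenAI Python
# library (pre-1.0 interface). File name, question and API key are placeholders.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder, not a real key

# Load a long source document, e.g. a saved copy of the Wikipedia article.
with open("ai_wikipedia_article.txt", encoding="utf-8") as f:
    article = f.read()

response = openai.ChatCompletion.create(
    model="gpt-4",  # a larger-context variant may be required for very long texts
    messages=[
        {"role": "system",
         "content": "Answer the user's question using only the supplied article."},
        {"role": "user",
         "content": f"Article:\n{article}\n\nQuestion: <your question about the article>"},
    ],
    temperature=0,  # keep the answer grounded in the source text
)

print(response["choices"][0]["message"]["content"])
```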
Even though the GPT-4 technical report controversially provides no details about how the model was developed, all signs indicate it's essentially a scaled-up version of GPT-3.5 with safety improvements. In other words, it's not a new paradigm in AI research.
OpenAI has itself said GPT-4 is subject to the same limitations as previous language models, such as being prone to reasoning errors and biases, and making up false information.
That said, OpenAI's results on GPT-4 suggest it's at least more reliable than previous GPT models.
OpenAI used human feedback to fine-tune GPT-4 to produce more helpful and less problematic outputs. GPT-4 is much better at declining inappropriate requests and avoiding harmful content when compared to the initial ChatGPT release.
Its arrival will continue a long-running debate in AI research: whether alternative approaches are required to fundamentally solve issues of truthfulness and reliability, or whether throwing more data and resources at language models will eventually do the job.
One could argue GPT-4 represents only an incremental improvement over its predecessors in many practical scenarios. Results showed human judges preferred GPT-4 outputs over the most advanced variant of GPT-3.5 only about 61% of the time.
GPT-4 also shows no improvement over GPT-3.5 in some tests, including English language and art history exams.
Soon after GPT-4's launch, Microsoft revealed its highly controversial Bing chatbot had been running on GPT-4 all along. The announcement surprised commentators who had noticed the Bing chatbot behaving far less reliably than ChatGPT.
This means Bing provides an alternative way to leverage GPT-4, since it's a search engine rather than just a chatbot.
However, as anyone who follows AI news knows, Bing started to go a bit crazy. But I don't think the new ChatGPT will follow suit, since it seems to have been heavily fine-tuned using human feedback.
In its technical report, OpenAI shows how GPT-4 can indeed go completely off the rails without this human feedback training.
One notable aspect of GPT-4's release has been that, in addition to Bing, it's already being used by companies and organisations such as Duolingo, Stripe, Morgan Stanley, Khan Academy and the Government of Iceland to build new services and tools.
Its commercial deployment will further heat up competition between major AI labs, and fuel growing demand for generative technologies.
This article was first published in The Conversation as 'Evolution not revolution: why GPT-4 is notable, but not groundbreaking'. Dr Marcel Scharth is a Lecturer in Business Analytics at the University of Sydney Business School.