OpenAI announces GPT-4 model that can use images and text as input
OpenAI has announced the latest version of its GPT language model, GPT-4. The main innovation of the new version is that text and images can serve as input. The GPT language model forms the basis for AI chatbots such as ChatGPT and the new Bing.
OpenAI emphasizes that GPT-4 accepts both images and text as input and generates text as output. According to the company, the new model is still less capable than humans in many real-world situations, but it exhibits human-level performance across a variety of professional and academic benchmarks.
The predecessor, GPT-3.5, accepts only text as input. In normal, casual conversation, the differences between GPT-3.5 and GPT-4 can be subtle; OpenAI states that they only really emerge once a task reaches a certain level of complexity. Compared to GPT-3.5, GPT-4 is said to be more reliable, more creative, and capable of handling more nuanced instructions.
OpenAI shows several examples of GPT-4’s capabilities in which a text question is asked about an attached photo, including prompts where the model is asked to explain what is funny about the picture.
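For illustration, below is a minimal sketch of such a combined text-and-image request using the openai Python library. The message format and the image-capable model name follow OpenAI's later public documentation and are assumptions here; at the time of the announcement, image input was not generally available, and the image URL is a placeholder.

```python
# Sketch: asking a text question about an attached photo.
# Assumes the modern `openai` Python SDK and an API key in OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # an image-capable GPT-4-class model (assumption)
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is funny about this picture?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/funny-photo.jpg"}},  # placeholder
        ],
    }],
)
print(response.choices[0].message.content)
```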
According to OpenAI, six months of work went into fine-tuning the performance of the latest version. A year ago, GPT-3.5 was trained as an initial test run of the new training system. Bugs were fixed and the theoretical underpinnings were improved, and on that basis the GPT-4 training run was “unprecedentedly stable,” OpenAI said. GPT-4 thereby became the first OpenAI language model whose training performance could be predicted accurately and ahead of time, according to the company.
GPT-4’s text input capability is being released via ChatGPT and via the new model’s API, for which there is a waiting list. To prepare image input for wider availability, OpenAI is currently working with a single partner, Be My Eyes, a mobile app that makes the world more accessible for the blind and partially sighted.
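For developers who get API access through the waiting list, a text-only request to GPT-4 looks roughly like the sketch below. It assumes the current openai Python SDK and an API key stored in the OPENAI_API_KEY environment variable; the exact SDK syntax may differ from what was available at launch, and the prompt is only an example.

```python
# Sketch: a text-only GPT-4 request via the Chat Completions API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what is new in GPT-4 in two sentences."},
    ],
)
print(response.choices[0].message.content)
```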