Show & Tell: GPT-4
💡 Let's take a look at some examples

📰 TL;DR
In this article, we'll quickly explore GPT-4 and its business implications. By the end, you'll have a comprehensive understanding of what ‘multimodal’ means and why this tech is game-changing.

[ 4 min read ]
Deep Dive 🤿
Why is GPT-4 different?
What does ‘multimodal’ mean?
GPT-4 in action
GPT-4’s context window
GPT-4 business use-cases
The future
*and 5 of the best free ChatGPT prompts this week

Probably the most eventful week AI has ever seen:
Monday:
- Stanford releases Alpaca 7B
- Google announces Med-PaLM 2, a new medical LLM
Tuesday:
- OpenAI releases GPT4
- Anthropic releases Claude
- Google announces the PaLM API & MakerSuite
- Adept raises $350M
- Google adds… twitter.com/i/web/status/1…
— Kris Kashtanova (@icreatelife)
5:39 PM • Mar 16, 2023

So, what’s up with GPT-4?
On Tuesday, March 14th, OpenAI unveiled its highly anticipated GPT-4 model, following the surge of public interest sparked by ChatGPT a few months prior. The new multimodal model can now process both image and text inputs and provide textual responses.

GPT-4 can take in both images and text to generate responses.

What does ‘multimodal’ mean?
The most obvious and exciting new feature of GPT-4 is multimodal input. You can think of modality as how you (or the computer) experience something: for example, vision (image/video), hearing, touch, smell, and taste. GPT-4 supports input from both images and text, while previous generations only supported text input.
The most obvious and exciting new feature of GPT-4 is multimodal input.
This multimodal input capability opens up a tonne of possibilities. You can now send the model photographs, diagrams, and screenshots, and ask questions about those images. GPT-4 can analyse what is in them and synthesise its answer from the context and information in both the images and the text in your prompts.
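To make that concrete, here is a minimal sketch of what a combined image-and-text request could look like, assuming an OpenAI-style chat completions endpoint that accepts image content. Image input was not generally available through the API when GPT-4 launched, so the model name, file name, and message format below are illustrative assumptions rather than a documented interface.

```python
# Illustrative sketch only: the model name, file name, and image message format
# below are assumptions, not a documented interface.
import base64
import requests

API_KEY = "sk-..."  # your OpenAI API key

# Encode a local photo (e.g. of a sketch or an exam page) as base64
with open("exam_page.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "gpt-4-vision",  # hypothetical model name
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Answer question 2 on this exam page and explain your reasoning."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
}

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
print(resp.json()["choices"][0]["message"]["content"])
```

The point is simply that the image and the question about it travel together in a single prompt, and the model's answer comes back as ordinary text.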

Here’s GPT-4 In Action

GPT-4 can answer complex questions from image and text input, such as answering exam questions from this photo of an exam paper. Image from OpenAI.
I just watched GPT-4 turn a hand-drawn sketch into a functional website.
This is insane.
— Rowan Cheung (@rowancheung)
8:47 PM • Mar 14, 2023
In the video above, a sketch of a website is drawn, a photo of the sketch is taken and uploaded to GPT-4, and the author then prompts GPT-4 to code a website based on the structure in the sketch. It works.
GPT-4 does drug discovery.
Give it a currently available drug and it can:
- Find compounds with similar properties
- Modify them to make sure they're not patented
- Purchase them from a supplier (even including sending an email with a purchase order)
twitter.com/i/web/status/1…
— Dan Shipper 📧 (@danshipper)
6:38 PM • Mar 14, 2023
This could be game-changing in democratising drug research and bringing down drug prices.
Can GPT-4 code an entire game for you? Yes, yes it can.
Here's how I recreated a Snake game that runs in your browser using Chat GPT-4 and @Replit, with ZERO knowledge of Javascript all in less than 20 mins 🧵
— Ammaar Reshi (@ammaar)
9:27 PM • Mar 14, 2023
Can’t code? Fear no more!

Passing exams. Image from OpenAI.
The researchers at OpenAI also tested GPT-4 on a variety of benchmark tasks and on many exams designed for humans, like the SAT, LSAT, bar exam, GRE, and AP exams. The questions in these exams include both text and images. They found that GPT-4 can outperform the majority of people on some exams, especially verbal ones: on the GRE verbal section, GPT-4 outperforms 99% of humans.

GPT-4 Has A Larger Context Window
At a maximum, GPT-4 can accept 32K tokens (about 25,000 words, or roughly 52 pages of text). This is a drastic increase from previous versions of GPT, which supported only around 4,000 tokens (about 3,000 words).
With this big context window, you can add more context to your prompts without having to worry about your input exceeding the limit. For example, you can throw an entire customer manual or product spec into GPT-4 and let it answer from that corpus. You can also provide it with more comprehensive and intricate prompts, perfect for multi-step tasks.
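If you want a feel for whether a document will fit, a back-of-the-envelope token estimate is enough before reaching for a real tokenizer. This is a rough sketch only: the four-characters-per-token rule of thumb and the file name are assumptions, and a proper tokenizer (such as OpenAI's tiktoken library) gives exact counts.

```python
# Rough sketch: will this document fit in GPT-4's context window?
# The ~4-characters-per-token figure is a rule of thumb, not an exact count.

GPT4_MAX_TOKENS = 32_000   # the largest GPT-4 variant
GPT3_MAX_TOKENS = 4_000    # roughly what earlier GPT models supported

def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return len(text) // 4

with open("customer_manual.txt") as f:   # hypothetical file
    manual = f.read()

tokens = estimate_tokens(manual)
print(f"Estimated tokens: {tokens}")
print("Fits in GPT-4 (32K):", tokens < GPT4_MAX_TOKENS)
print("Fits in earlier models (4K):", tokens < GPT3_MAX_TOKENS)
```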

GPT-4 Is The New Business Superhero
Since GPT-4 can accept both image and text as input, we can practically infer that it has a built-in ability to perform visual question answering. That opens up a much wider range of use-cases:
- Customer support: ask GPT-4 to answer customer questions straight from your manual.
- User research and product management: ask GPT-4 to summarise pain points and requests from physical user feedback or interview transcripts.
- Design: GPT-4 can analyse and provide feedback on visual designs, such as a website.
- Accounting: GPT-4 can read images of receipts or financial documents and log or analyse them.
The list goes on.
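For the customer-support case, most of the work is prompt construction: paste the manual into the prompt alongside the customer's question and tell the model to answer only from that text. A minimal sketch, with the file name, question, and prompt wording as illustrative assumptions:

```python
# Sketch of a customer-support prompt that leans on the large context window.
# The file name, question, and wording are illustrative assumptions.
with open("customer_manual.txt") as f:
    manual = f.read()

question = "How do I reset the device to factory settings?"

prompt = (
    "You are a support agent. Answer the customer's question using only the "
    "manual below. If the answer is not in the manual, say so.\n\n"
    f"--- MANUAL ---\n{manual}\n\n"
    f"--- QUESTION ---\n{question}"
)
# `prompt` can now be sent as a single user message to the chat completions API.
```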
Other applications of multimodal LLMs could include image captioning, visual question answering, multimodal sentiment analysis, image-to-text translation (imagine reverse-engineering Stable Diffusion prompts from an image), visual storytelling, and more. The applications are bound only by your imagination.

The Future
Multimodality is no doubt going to evolve. GPT-4 only accepts images and text as input for now, but researchers are already exploring multimodal output (like Visual ChatGPT, which can edit images from text instructions) and modalities beyond images, such as video, gesture, and gaze. These modalities would bring more holistic experiences to our interactions with AI.
Moreover, LLMs will start to use tools. Without tools, LLMs are bound to the knowledge they were pre-trained on, but we can probably expect LLMs to start using tools by themselves. For example, Meta unveiled Toolformer, which allows AI models to teach themselves to use APIs. This means they can effectively plug into different websites and interact with them autonomously.
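To illustrate the idea (this is not Toolformer itself), here is a toy sketch of a tool-use loop: the model replies with a tool call in an agreed format, your code executes it, and the result is fed back for a final answer. The TOOL:/ARGS: convention and the lookup_weather helper are made up for this example.

```python
# Toy sketch of a tool-use loop; the TOOL:/ARGS: convention and the
# lookup_weather helper are invented for illustration.
def lookup_weather(city: str) -> str:
    return f"14°C and raining in {city}"  # stand-in for a real API call

TOOLS = {"lookup_weather": lookup_weather}

def run_with_tools(model_reply: str) -> str:
    """If the model asked for a tool, run it; otherwise return the reply as-is."""
    if model_reply.startswith("TOOL:"):
        name, _, args = model_reply[len("TOOL:"):].partition(" ARGS:")
        result = TOOLS[name.strip()](args.strip())
        # In a real loop, `result` would be appended to the conversation and the
        # model asked again; here we just return it.
        return result
    return model_reply

print(run_with_tools("TOOL:lookup_weather ARGS:London"))  # -> 14°C and raining in London
```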
Chatbots based on LLMs could also develop mixed-initiative abilities. Right now, GPT-4 still produces responses only when a human asks a question (the prompt), but in the future, chatbots based on LLMs may start asking you questions or offering help without being told what to do. This could drastically reduce the barriers to using LLMs.
But things could also go wrong. In its technical report, OpenAI illustrated that some AI models may develop “power-seeking” behaviours like creating and acting on long-term plans and accruing internal resources. They tried giving GPT-4 the resources and power to set up a new language model, attack others on the internet, hide on its current server, and employ humans on TaskRabbit to undertake physical work. GPT-4 failed the test, but in the future we may actually see an AI model that succeeds.

Guodong Zhao ✍️
Medium 🌐

The Prompt Collection 🦾
Make Me A Midjourney 🤛
Reddit Spotlight 🔦
Editor's Product Pick 📺

Editor's Product Pick 📺
KAIBER
This product allows you to create videos from text. According to Kaiber, it allows you to “transform your ideas into the visual stories of your dreams.”

Make Me A Midjourney 🤛
Twitter On The Rocks

Prompt: elon musk holds cocktail, dead blue bird swims, laugh, glow --v 5 --ar 4:5
Teach me how — Midjourney

The Prompt Collection 🦾

5 of the best ChatGPT prompts this week.

Reddit Spotlight 🔦
Currently, $250bn of India’s exports are micro-code. Maybe not anymore…?
