ChatGPT gets screen sharing and real-time video analysis, rivaling Gemini 2
OpenAI has finally added long-awaited video and screen-sharing capabilities to its advanced voice mode, allowing users to interact with the chatbot in multiple modalities.
Both capabilities are now available on the iOS and Android mobile apps for ChatGPT Team, Plus and Pro users, and will roll out to ChatGPT Enterprise and Edu subscribers in January. However, users in the EU, Switzerland, Iceland, Norway and Liechtenstein will not have access to the new advanced voice mode features.
OpenAI first teased the feature in May, when the company introduced GPT-4o and discussed ChatGPT learning to “watch” a game and explain what’s going on. Advanced voice mode itself rolled out to users in September.
Users can start a video through a new button on the advanced voice mode screen.
OpenAI’s video mode feels like a FaceTime call, because ChatGPT responds in real time to what users show it on camera. It can see what’s around the user, identify objects and even remember people who introduce themselves. In an OpenAI demo during the company’s “12 Days of Shipmas” event, ChatGPT used the video feature to help make coffee: it saw the coffee accessories, instructed when to place a filter and critiqued the result.
It is also very similar to Google’s recently announced Project Astra, in which users can open a video chat and Gemini 2.0 will answer questions about what it sees, such as identifying a sculpture found on a street in London. In many ways, these features are more advanced versions of what AI devices like the Humane AI Pin and Rabbit r1 were marketed to do: have an AI voice assistant answer questions about what it sees in a video.
Screen sharing
The new screen sharing feature brings ChatGPT out of the app and into the realm of the browser.
For screen sharing, a three-dot menu allows users to exit the ChatGPT app. They can open apps on their phones and ask ChatGPT questions about what it sees. In the demo, the OpenAI researchers triggered screen sharing, then opened the Messages app to ask ChatGPT for help replying to a photo sent via text message.
However, the advanced voice mode screen-sharing feature has similarities with recently released features from Microsoft and Google.
Last week, Microsoft released a pre-release version of Copilot Vision, which allows Pro subscribers to open a Copilot chat while browsing a web page. Copilot Vision can look at photos on a store’s website or even help play the map-guessing game GeoGuessr. Google’s Project Astra can read browsers in the same way.
Both Google and OpenAI have released AI chat features for screen sharing on phones, targeting the user base that might use ChatGPT or Gemini more on the go. But these types of features could signal a way for businesses to collaborate more with AI agents, as the agent can see what a person is looking at on the screen. It may be a precursor to computer-using models, such as Anthropic’s Computer Use, where the AI model not only looks at the screen but actively opens tabs and programs for the user.
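For readers curious what that next step looks like in developer terms, Anthropic exposes its computer use capability as an API beta. The minimal sketch below assumes Anthropic’s documented October 2024 beta identifiers and the official `anthropic` Python SDK; the prompt, model name and screen dimensions are illustrative placeholders, not values from the article.

```python
# Minimal sketch of requesting Anthropic's computer use beta, assuming the
# `anthropic` Python SDK and the tool/beta names from the October 2024 release.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[
        {
            "type": "computer_20241022",  # virtual screenshot/mouse/keyboard tool
            "name": "computer",
            "display_width_px": 1280,     # placeholder screen size
            "display_height_px": 800,
        }
    ],
    messages=[
        {"role": "user", "content": "Open a new browser tab and search for pour-over coffee recipes."}
    ],
    betas=["computer-use-2024-10-22"],
)

# The model replies with tool_use blocks (click, type, screenshot actions) that a
# local agent loop must execute and report back; the API never touches the machine.
print(response.content)
```

The key design point, and the contrast with ChatGPT’s screen sharing, is that the model here issues actions rather than just commenting on what it sees.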
Ho-ho-ho, ask Santa a question
In an attempt at levity, OpenAI has also released “Santa Mode” in advanced voice mode. The new preset voice sounds a lot like the jolly old man in the red suit.
Unlike the new features limited to specific subscription tiers, “Santa Mode” is now available to users with access to advanced voice mode on the mobile apps, the ChatGPT web version and the Windows and macOS apps until early January.
However, conversations with Santa will not be saved in the chat history and will not affect ChatGPT’s memory.
Even OpenAI is feeling the Christmas spirit.