ZDNET Test Finds Gemini Handles 3 Video Types as ChatGPT Needs Codex and Claude Fails
Updated · ZDNet · May 11
Three video tests (a YouTube clip, a 625MB MP4 and a 1.65GB MOV) showed Gemini could directly analyze all three in a browser, while Claude said it could not process video or audio streams.
ChatGPT alone also fell short, unable to use the YouTube link and limited to videos under 500MB, but OpenAI's Codex could handle local files and even script a download-and-analysis workflow for YouTube.
A silent drone clip became the clearest differentiator: Gemini correctly inferred gesture-based drone control even though the drone never appeared on screen, and Codex produced a similar reading after extra setup.
Thumbnail creation exposed a different trade-off: Gemini's video understanding was strongest, but its image handoff produced errors, while the Codex-plus-ChatGPT combination generated a more usable thumbnail with iterative prompting.
ZDNET said the results point to practical uses such as timestamped video summaries, security-footage scanning and creator workflows, with Gemini currently the most convenient all-in-one option.