Will Smith eating spaghetti and other weird AI benchmarks that took off in 2024

When a company releases a new AI video generator, it’s not long before someone uses it to make a video of actor Will Smith eating spaghetti.

It’s become something of a meme as well as a benchmark: seeing whether a new video generator can realistically render Smith slurping down a bowl of noodles. Smith himself parodied the trend in an Instagram post in February.

Google Veo 2 has done it.

We are now eating spaghett at last. pic.twitter.com/AZO81w8JC0

— Jerrod Lew (@jerrod_lew) December 17, 2024

Will Smith and pasta is but one of several bizarre “unofficial” benchmarks to take the AI community by storm in 2024. A 16-year-old developer built an app that gives AI control over Minecraft and tests its ability to design structures. Elsewhere, a British programmer created a platform where AI plays games like Pictionary and Connect 4 against each other.

It’s not like there aren’t more academic tests of an AI’s performance. So why did the weirder ones blow up?

Image Credits: Paul Calcraft

For one, most of the industry-standard AI benchmarks don’t tell the average person very much. Companies often cite their AI’s ability to answer questions on Math Olympiad exams, or figure out plausible solutions to PhD-level problems. Yet most people (yours truly included) use chatbots for things like responding to emails and basic research.

Crowdsourced industry measures aren’t necessarily better or more informative.

Take, for example, Chatbot Arena, a public benchmark many AI enthusiasts and developers follow obsessively. Chatbot Arena lets anyone on the web rate how well AI performs on particular tasks, like creating a web app or generating an image. But raters tend not to be representative (most come from AI and tech industry circles) and cast their votes based on personal, hard-to-pin-down preferences.

The Chatbot Arena interface. Image Credits: LMSYS

Ethan Mollick, a professor of management at Wharton, recently pointed out in a post on X another problem with many AI industry benchmarks: they don’t compare a system’s performance to that of the average person.

“The fact that there are not 30 different benchmarks from different organizations in medicine, in law, in advice quality, and so on is a real shame, as people are using systems for these things, regardless,” Mollick wrote.

Weird AI benchmarks like Connect 4, Minecraft, and Will Smith eating spaghetti are most certainly not empirical, or even all that generalizable. Just because an AI nails the Will Smith test doesn’t mean it’ll generate, say, a burger well.

Mcbench: Note the typo; there’s no such model as Claude 3.6 Sonnet. Image Credits: Adonis Singh

One expert I spoke to about AI benchmarks suggested that the AI community focus on the downstream impacts of AI instead of its ability in narrow domains. That’s sensible. But I have a feeling that weird benchmarks aren’t going away anytime soon. Not only are they entertaining (who doesn’t like watching AI build Minecraft castles?) but they’re easy to understand. And as my colleague Max Zeff wrote about recently, the industry continues to grapple with distilling a technology as complex as AI into digestible marketing.

The only question in my mind is, which odd new benchmarks will go viral in 2025?

TechCrunch has an AI-focused newsletter! Sign up here to get it in your inbox every Wednesday.
