Technology

Can AI really compete with human data scientists? OpenAI’s new benchmark puts it to the test

OpenAI has introduced a new tool to measure artificial intelligence capabilities in machine learning engineering. The benchmark, called MLE-bench, challenges AI systems with 75 real-world data science competitions from Kaggle, a popular platform for machine learning contests.

This benchmark emerges as tech companies intensify efforts to develop more capable AI systems. MLE-bench goes beyond testing an AI’s computational or pattern recognition abilities; it assesses whether AI can plan, troubleshoot, and innovate in the complex field of machine learning engineering.

A schematic representation of OpenAI’s MLE-bench, showing how AI agents interact with Kaggle-style competitions. The system challenges AI to perform complex machine learning tasks, from model training to submission creation, mimicking the workflow of human data scientists. The agent’s performance is then evaluated against human benchmarks. (Credit: arxiv.org)
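The last step in that workflow, comparing an agent's result with human results, can be pictured as ranking a score against a competition leaderboard. The sketch below is illustrative only: the function name `fraction_of_humans_beaten`, the sample scores, and the simple rank-based comparison are assumptions made for this example, not MLE-bench's actual grading code.

```python
from typing import Sequence


def fraction_of_humans_beaten(
    agent_score: float,
    human_scores: Sequence[float],
    higher_is_better: bool = True,
) -> float:
    """Return the share of leaderboard entries the agent strictly beats.

    Hypothetical helper: real competitions use their own metrics and
    medal thresholds, which this sketch does not model.
    """
    if not human_scores:
        raise ValueError("leaderboard must contain at least one score")
    if higher_is_better:
        beaten = sum(1 for s in human_scores if agent_score > s)
    else:
        beaten = sum(1 for s in human_scores if agent_score < s)
    return beaten / len(human_scores)


# Made-up accuracy leaderboard (higher is better).
accuracy_board = [0.91, 0.88, 0.85, 0.80, 0.70]
accuracy_rank = fraction_of_humans_beaten(0.86, accuracy_board)

# Made-up error leaderboard (lower is better).
error_board = [0.2, 0.3, 0.4, 0.5]
error_rank = fraction_of_humans_beaten(0.35, error_board, higher_is_better=False)

print(accuracy_rank, error_rank)
```

The `higher_is_better` flag matters because some competition metrics reward high values (accuracy) while others reward low values (error), so the direction of comparison has to be stated per competition.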

AI takes on Kaggle: Impressive wins and surprising setbacks

The results reveal both the progress and limitations of current AI technology. OpenAI’s most advanced model, o1-preview, when paired with specialized scaffolding called AIDE, achieved medal-worthy performance in 16.9% of the competitions. This performance is notable, suggesting that in some cases, the AI system could compete at a level comparable to skilled human data scientists.
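A quick check on the headline number: 16.9% of 75 competitions is 12.675, which is not a whole number of competitions, so the figure reads more naturally as an average over repeated runs than as a single tally. The sketch below shows how such an averaged medal rate could be computed. The `RunResult` record, the `medal_rate` function, and the toy data are hypothetical and do not reproduce the study's results.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class RunResult:
    competition: str  # hypothetical competition identifier
    run: int          # index of the repeated run
    medal: bool       # whether this attempt reached a medal position


def medal_rate(results: Iterable[RunResult]) -> float:
    """Average, over runs, of the fraction of competitions medaled."""
    by_run: Dict[int, List[bool]] = {}
    for r in results:
        by_run.setdefault(r.run, []).append(r.medal)
    if not by_run:
        raise ValueError("no results supplied")
    per_run = [sum(medals) / len(medals) for medals in by_run.values()]
    return sum(per_run) / len(per_run)


# Toy data: four competitions, two runs each (not real benchmark output).
toy_results = [
    RunResult("comp-a", 0, True),
    RunResult("comp-b", 0, False),
    RunResult("comp-c", 0, False),
    RunResult("comp-d", 0, False),
    RunResult("comp-a", 1, True),
    RunResult("comp-b", 1, True),
    RunResult("comp-c", 1, False),
    RunResult("comp-d", 1, False),
]
toy_rate = medal_rate(toy_results)

# The arithmetic behind the headline figure.
implied_medals = 0.169 * 75

print(f"toy medal rate: {toy_rate:.3f}")
print(f"16.9% of 75 competitions: {implied_medals:.3f}")
```

In the toy data the first run medals in one of four competitions and the second in two of four, so the averaged rate falls between the two single-run rates.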

However, the study also highlights significant gaps between AI and human expertise. The AI models often succeeded in applying standard techniques but struggled with tasks requiring adaptability or creative problem-solving. This limitation underscores the continued importance of human insight in the field of data science.

Machine learning engineering involves designing and optimizing the systems that enable AI to learn from data. MLE-bench evaluates AI agents on various aspects of this process, including data preparation, model selection, and performance tuning.

A comparison of three AI agent approaches to solving machine learning tasks in OpenAI’s MLE-bench. From left to right: MLAB ResearchAgent, OpenHands, and AIDE, each demonstrating different strategies and execution times in tackling complex data science challenges. The AIDE framework, with its 24-hour runtime, shows a more comprehensive problem-solving approach. (Credit: arxiv.org)

From lab to industry: The far-reaching impact of AI in data science

The implications of this research extend beyond academic interest. The development of AI systems capable of handling complex machine learning tasks independently could accelerate scientific research and product development across various industries. However, it also raises questions about the evolving role of human data scientists and the potential for rapid advancements in AI capabilities.

OpenAI’s decision to make MLE-bench open-source allows for broader examination and use of the benchmark. This move may help establish common standards for evaluating AI progress in machine learning engineering, potentially shaping future development and safety considerations in the field.

As AI systems approach human-level performance in specialized areas, benchmarks like MLE-bench provide crucial metrics for tracking progress. They offer a reality check against inflated claims of AI capabilities, providing clear, quantifiable measures of current AI strengths and weaknesses.

The future of AI and human collaboration in machine learning

The ongoing efforts to enhance AI capabilities are gaining momentum. MLE-bench offers a new perspective on this progress, particularly in the realm of data science and machine learning. As these AI systems improve, they may soon work in tandem with human experts, potentially expanding the horizons of machine learning applications.

However, it’s important to note that while the benchmark shows promising results, it also reveals that AI still has a long way to go before it can fully replicate the nuanced decision-making and creativity of experienced data scientists. The challenge now lies in bridging this gap and determining how best to integrate AI capabilities with human expertise in the field of machine learning engineering.


Servers computers

D-Link 42U Network Rack installed

Technology

Tesla’s Optimus bot makes a scene at the robotaxi event

A bunch of Tesla’s humanoid Optimus robots walked out alongside the reveal of Tesla’s new Robovan vehicle at tonight’s Cybercab event. The robot is also seen in a video doing daily human tasks like bringing in a package off the porch and watering your plants.

“The Optimus will walk amongst you,” Tesla CEO Elon Musk quips. “You’ll be able to walk right up to them, and they will serve drinks.”

Musk explains it can basically “do anything” and mentions examples like walking your dog, babysitting your kids, mowing your lawn, serving you drinks, etc. He said it will cost $20,000 to $30,000 “long term.”

“I think this will be the biggest product ever of any kind,” Musk says.

After the presentation, livestream footage showed people interacting with Optimus robots at tables and in crowds. Still, the robots weren’t doing much other than waving in the style of Astro Bot. There was a table of drinks — but the Optimus bot was not seen doing more than holding a cup of ice. However, one bot could hand over small gift bags at another table and play rock paper scissors with guests. And there was an enclosed gazebo with a bunch of dancing robots inside.

Hey, it can do something!
GIF: Tesla

The Tesla bot was not a serious product when Musk first revealed the project in 2021, when a man in a robot suit took the stage to perform a silly dance. But in 2022, the company showed off a crude prototype that gingerly walked onstage.

Musk has loftily promised that Optimus will be a “fundamental transformation for civilization.” He made even bigger promises to investors: that it’ll bring “two orders of magnitude” of potential improvement of economic output and that it can be “made in very high volume, ultimately millions of units.” Musk said it would cost around “$20,000” and allow for “a future where there is no poverty.”

Technology

NYT Strands today — hints, answers and spangram for Friday, October 11 (game #222)

Strands is the NYT’s latest word game after the likes of Wordle, Spelling Bee and Connections – and it’s great fun. It can be difficult, though, so read on for my Strands hints.

Want more word-based fun? Then check out my Wordle today, NYT Connections today and Quordle today pages for hints and answers for those games.

Servers computers

2U 16in Universal Vented Rack Mount Cantilever Shelf – CABSHELFV | StarTech.com

The CABSHELFV 2U 16in Depth Universal Vented Rack Mount Shelf lets you add a compact, 2U shelf to virtually any standard 19-inch server rack or cabinet with front mount options. This TAA compliant product adheres to the requirements of the US Federal Trade Agreements Act (TAA), allowing government GSA Schedule purchases.

Our vented rack shelves improve air flow and help to lower temperatures in the rack. Constructed using SPCC commercial grade cold-rolled steel, this durable fixed rack shelf can hold up to 22kg (50lbs) of equipment – a perfect solution for storing small, non-rackmount equipment, tools, peripherals or accessories in your rack to keep them readily accessible.

Backed by a StarTech.com Lifetime warranty.

To learn more visit StarTech.com

https://www.amazon.com/StarTech-com-Vented-Server-Mount-Shelf/dp/B008X3JHJQ/ref=sr_1_1?dchild=1&keywords=cabshelf22v&qid=1600979374&sr=8-1&th=1 .

Technology

NYT Crossword: answers for Thursday, October 10

The New York Times crossword puzzle can be tough! If you’re stuck, we’re here to help with a list of today’s clues and answers.

Servers computers

Austin Hughes InfraPower, Rack Power Distribution Unit

AUSTIN-HUGHES – InfraPower
Rack Power Distribution Unit

InfraPower provides a complete power management solution from Basic to Intelligent kWh Outlet Measurement PDU :

1. Intelligent PDU (W Series) : Monitored or Switched, Outlet Measurement available

2. Metered PDU (MD Series): Local monitoring via a digital ammeter

3. Basic PDU : Basic Series PDUs are designed for cost-efficient and reliable power distribution in the data center.

Austin Hughes rPDU models are available in 1Phase & 3Phase.

PT. Uni Network Communications is the distributor of AUSTIN-HUGHES.
Austin Hughes is a design and manufacturing group that offers a broad range of solutions based around 19-inch Rack-mounted technology.

Austin Hughes solutions include :
+ InfraSolution® SmartCard access control & monitoring for global branded racks
+ InfraPower® intelligent kWh power management
+ InfraCool® intelligent airflow management
+ InfraGuard rack environmental sensor system
+ CyberView™ dedicated KVM switch & rackmount display and UltraView professional LCD screen.

Please contact us for more information:
PT. Uni Network Communications

Jl. Batu Jajar No.11A, Sawah Besar
Jakarta Pusat – 10120
Phone : 021 3512977
Fax : 021 3512526
Email : sales@abba-rack.com | marketing@unc.co.id

www.abba-rack.com || www.unc.co.id || www.kvm.co.id

Copyright © 2024 WordupNews.com