Technology

Can AI really compete with human data scientists? OpenAI’s new benchmark puts it to the test


OpenAI has introduced a new tool to measure artificial intelligence capabilities in machine learning engineering. The benchmark, called MLE-bench, challenges AI systems with 75 real-world data science competitions from Kaggle, a popular platform for machine learning contests.

This benchmark emerges as tech companies intensify efforts to develop more capable AI systems. MLE-bench goes beyond testing an AI’s computational or pattern recognition abilities; it assesses whether AI can plan, troubleshoot, and innovate in the complex field of machine learning engineering.

A schematic representation of OpenAI’s MLE-bench, showing how AI agents interact with Kaggle-style competitions. The system challenges AI to perform complex machine learning tasks, from model training to submission creation, mimicking the workflow of human data scientists. The agent’s performance is then evaluated against human benchmarks. (Credit: arxiv.org)

AI takes on Kaggle: Impressive wins and surprising setbacks

The results reveal both the progress and limitations of current AI technology. OpenAI’s most advanced model, o1-preview, when paired with specialized scaffolding called AIDE, achieved medal-worthy performance in 16.9% of the competitions. This performance is notable, suggesting that in some cases, the AI system could compete at a level comparable to skilled human data scientists.

However, the study also highlights significant gaps between AI and human expertise. The AI models often succeeded in applying standard techniques but struggled with tasks requiring adaptability or creative problem-solving. This limitation underscores the continued importance of human insight in the field of data science.

Machine learning engineering involves designing and optimizing the systems that enable AI to learn from data. MLE-bench evaluates AI agents on various aspects of this process, including data preparation, model selection, and performance tuning.

A comparison of three AI agent approaches to solving machine learning tasks in OpenAI’s MLE-bench. From left to right: MLAB ResearchAgent, OpenHands, and AIDE, each demonstrating different strategies and execution times in tackling complex data science challenges. The AIDE framework, with its 24-hour runtime, shows a more comprehensive problem-solving approach. (Credit: arxiv.org)

From lab to industry: The far-reaching impact of AI in data science

The implications of this research extend beyond academic interest. The development of AI systems capable of handling complex machine learning tasks independently could accelerate scientific research and product development across various industries. However, it also raises questions about the evolving role of human data scientists and the potential for rapid advancements in AI capabilities.

OpenAI’s decision to make MLE-bench open-source allows for broader examination and use of the benchmark. This move may help establish common standards for evaluating AI progress in machine learning engineering, potentially shaping future development and safety considerations in the field.

As AI systems approach human-level performance in specialized areas, benchmarks like MLE-bench provide crucial metrics for tracking progress. They offer a reality check against inflated claims of AI capabilities, providing clear, quantifiable measures of current AI strengths and weaknesses.

The future of AI and human collaboration in machine learning

The ongoing efforts to enhance AI capabilities are gaining momentum. MLE-bench offers a new perspective on this progress, particularly in the realm of data science and machine learning. As these AI systems improve, they may soon work in tandem with human experts, potentially expanding the horizons of machine learning applications.

However, it’s important to note that while the benchmark shows promising results, it also reveals that AI still has a long way to go before it can fully replicate the nuanced decision-making and creativity of experienced data scientists. The challenge now lies in bridging this gap and determining how best to integrate AI capabilities with human expertise in the field of machine learning engineering.


Technology

NIS2 & DORA: Staying ahead of the curve

With less than a month to go before the deadline for the updated landmark Network and Information Security (NIS2) Directive, organizations across the EU are preparing for the new regulation to come into full force on 17 October. However, it doesn’t stop there. On 17 January 2025, the new Digital Operational Resilience Act (DORA) will also come into effect for financial organizations and the sector’s third-party IT suppliers.

Organizations across the EU, and those based elsewhere that do business with the region’s entities, are facing increasing pressure to align with these regulatory requirements. Together, these frameworks are set to affect over 170,000 European organizations, with 150,000 organizations affected by NIS2 and estimates suggesting over 22,000 financial entities and ICT service providers impacted by DORA.

Simon Fisher

What are NIS2 and DORA?

Technology

NYT Mini Crossword today: puzzle answers for Friday, October 11

NYT Mini Crossword today: puzzle answers for Saturday, September 21

The New York Times has introduced the next title coming to its Games catalog following Wordle’s continued success, and it’s all about math. Digits has players adding, subtracting, multiplying, and dividing numbers. You can play its beta for free online right now.

In Digits, players are presented with a target number that they need to match. Players are given six numbers and can add, subtract, multiply, or divide them to get as close to the target as they can. Not every number needs to be used, though, so this game should put your math skills to the test as you combine numbers to build the right equations.

Players will get a five-star rating if they match the target number exactly, a three-star rating if they get within 10 of the target, and a one-star rating if they get within 25 of the target number. Currently, players can also access five different puzzles with increasingly larger numbers. I solved today’s puzzle and found it to be an enjoyable number-based game that should appeal to inquisitive minds that like puzzle games such as Threes or other New York Times titles like Wordle and Spelling Bee.

In an article unveiling Digits and detailing The New York Times Games team’s approach to game development, The Times says the team will use this free beta to fix bugs and assess if it’s worth moving into a more active development phase “where the game is coded and the designs are finalized.” So play Digits while you can, as The New York Times may move on from the project if it doesn’t get the response it is hoping for.

Digits’ beta is available to play for free now on The New York Times Games’ website.

Servers computers

Data Rack Move using a set of our Hydraulic Lifters

A half rack and a quarter rack, each weighing around 250 kg and without castors, relocated from Amsterdam to Slough. Moved using our hydraulic lifters (just like skoots).

Technology

Protecting your web app from unauthorized access

By making use of robust authentication and authorization, web apps can effectively mitigate the all-too-common risks associated with unauthorized intrusions.

Authentication confirms the identity of users accessing the system while authorization further restricts user actions based on their roles, minimizing potential vulnerabilities within the application.

Regularly updating security measures and educating developers about their importance play crucial roles in maintaining a secure environment.

Understanding web security fundamentals

Web application security is critical if your goal is to protect sensitive data and keep the trust of your userbase.

Security measures should evolve over the course of time to counteract the latest threats. Putting in place the latest best practices helps prevent or ameliorate potential breaches that could very well have severe financial and reputational impacts.

Companies often hire penetration testing firms specifically to uncover vulnerabilities. Conducting these tests regularly, once or twice per year, helps businesses keep pace with emerging threats. Emphasizing security throughout the development lifecycle ensures that protective measures are integral rather than an afterthought.

Common web security threats

Vulnerabilities such as SQL injection and cross-site scripting (XSS) are prevalent in many web applications. These threats exploit poor input validation, allowing attackers to access vital information. Developers should prioritize input validation and sanitization to prevent such attacks.

Cybersecurity threats evolve rapidly. The rise of complex attacks necessitates ongoing vigilance. Security threats can disrupt services and compromise data integrity. Staying informed about common threats is vital for implementing timely defenses in web app development.

Principles of secure web design

Designing with security in mind involves adhering to key principles. Inputs should be thoroughly validated, and sensitive data encrypted to prevent unauthorized access. Utilizing parameterized queries reduces the risk of SQL injection.

Another principle is the concept of least privilege, where users and applications are granted only the necessary permissions. This minimizes the damage potential if access is compromised. Security frameworks should be integrated into the design process, ensuring a strong foundation for robust web applications.

Authentication and access control measures

Methods such as multi-factor authentication (MFA) enhance protection by requiring users to provide two or more separate verification factors. Logging in takes slightly longer, but the extra step makes it considerably harder for anyone unauthorized to gain access. Token-based authentication adds a further layer of security: tokens are unique and time-limited, which reduces the risk of session hijacking.
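As an illustration of a time-limited second factor, the sketch below computes a time-based one-time password in the style of RFC 6238 using only the Python standard library. The article does not name a specific MFA scheme, so TOTP is an assumption here, and the function names `totp` and `verify_totp` are hypothetical. A production system would normally rely on a maintained library and also handle clock drift and rate limiting.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, at_time: float, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238-style one-time code for the given Unix time."""
    # The moving factor is the number of whole time steps since the epoch.
    counter = int(at_time // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks an offset.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, at_time: float) -> bool:
    """Compare the submitted code with the expected one in constant time."""
    expected = totp(secret, at_time)
    return hmac.compare_digest(expected.encode(), submitted.encode())


# Shared secret taken from the RFC 6238 test vectors (illustration only).
demo_secret = b"12345678901234567890"
demo_code = totp(demo_secret, at_time=59)
```

Because the code depends on the current 30-second window, a value captured by an attacker stops being valid shortly afterwards.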

Furthermore, regular monitoring of authentication logs can help identify unusual access patterns. Input validation during the login process ensures that data entered by users satisfies predefined criteria, preventing attacks like SQL injection.

Utilizing role-based access control

Role-based access control (RBAC) assigns permissions to users according to their role within an organization. Grouping users with similar responsibilities and granting access rights to the group rather than to individuals tends to simplify management. It also ensures that sensitive information and functionality are accessible only to the roles that require them.

By clearly defining roles and permissions, organizations can reduce the risk of data breaches. For effective RBAC implementation, regularly updating role assignments and conducting audits are essential. Automated tools can assist in managing roles and permissions, ensuring smooth operations, and minimizing administrative overhead. Such measures enhance security by ensuring users have access only to necessary resources.
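A minimal sketch of the idea, assuming a fixed in-memory mapping from roles to permissions: the role names, permission strings, and the `is_allowed` helper are all hypothetical and stand in for whatever an application's real policy store provides.

```python
# Hypothetical role-to-permission mapping; a real system would load this
# from a database or policy service and audit it regularly.
ROLE_PERMISSIONS = {
    "viewer": {"report:read"},
    "editor": {"report:read", "report:write"},
    "admin": {"report:read", "report:write", "user:manage"},
}


def is_allowed(role: str, permission: str) -> bool:
    """Return True only if the role explicitly grants the permission."""
    # Unknown roles fall back to an empty set, so access is denied by default.
    return permission in ROLE_PERMISSIONS.get(role, set())


can_editor_write = is_allowed("editor", "report:write")
can_viewer_manage_users = is_allowed("viewer", "user:manage")
```

Denying by default means that a typo in a role name or a newly added permission cannot silently widen access, which is in keeping with the principle of least privilege.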

Making use of the principle of least privilege

The principle of least privilege is a vital security measure. It limits user access to the minimum level necessary to perform their job functions, thereby mitigating security risks.

It operates on a simple basis — all users are granted the least amount of access required to do their duties, reducing the potential impact of any one account being compromised.

Regularly reviewing and adjusting user privileges helps maintain effective security. It’s crucial to revoke unnecessary privileges promptly. Implementing controls to monitor user actions assists in maintaining compliance with this principle.

Defending against common web attacks

Web applications face numerous threats that can compromise data integrity and user privacy. Guarding against these threats involves adopting specialized strategies focusing on every type of attack.

Protecting against injection attacks

Injection attacks involve injecting malicious code into a web application to manipulate its database. A prevalent example is SQL injection, which targets database layers by injecting SQL commands.

To defend against these, developers should implement parameterized queries and stored procedures, which limit user input from altering queries in harmful ways.
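The difference a parameterized query makes can be sketched with Python's built-in `sqlite3` module. The `users` table and the `find_user` helper are assumptions made for this example; the same placeholder approach applies to other database drivers, though the placeholder syntax varies.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, username: str) -> list:
    """Look up a user with a bound parameter instead of string concatenation."""
    # The ? placeholder makes the driver pass username as data, never as SQL text.
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cursor.fetchall()


# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

legit_rows = find_user(conn, "alice")
# A classic injection payload is treated as a literal (non-matching) username.
attack_rows = find_user(conn, "alice' OR '1'='1")
```

With string concatenation the second call would have returned every row; with a bound parameter it returns none.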

A web application firewall helps detect and block suspicious activity. Input validation is another crucial measure, ensuring that user inputs adhere to expected formats and content types.
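One way to express "expected formats" is an allow-list pattern that accepts only known-good input. The rule below (3 to 32 letters, digits, or underscores) and the `is_valid_username` helper are assumptions for illustration; real rules depend on the application.

```python
import re

# Hypothetical rule: usernames are 3-32 characters of letters, digits, or underscore.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,32}")


def is_valid_username(value: object) -> bool:
    """Accept only strings that match the allow-list pattern in full."""
    # fullmatch requires the whole string to match, not just a prefix.
    return isinstance(value, str) and USERNAME_PATTERN.fullmatch(value) is not None


accepted = is_valid_username("alice_01")
rejected = is_valid_username("alice'; DROP TABLE users;--")
```

Validation of this kind complements parameterized queries rather than replacing them.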

Defending against cross-site scripting (XSS)

Cross-site scripting, or XSS attacks, consist of malicious actors injecting client-side scripts into the web pages that are afterward viewed by other users.

This can lead to unauthorized access to user sessions and the exposure of sensitive information. A Content Security Policy (CSP) can stop browsers from executing such malicious scripts. In addition, encoding data sent to the browser ensures that it is treated as text, not as executable code. Developers can also sanitize inputs by escaping data before processing or displaying it.
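Output encoding and a restrictive policy header can be sketched as follows, using `html.escape` from the Python standard library. The `render_comment` helper and the specific policy string are assumptions; most template engines apply this escaping automatically, and a real policy needs tuning to the site's actual script sources.

```python
import html


def render_comment(comment: str) -> str:
    """Embed user-supplied text in HTML with special characters encoded."""
    # quote=True also escapes single and double quotes, for attribute contexts.
    return "<p>" + html.escape(comment, quote=True) + "</p>"


# Hypothetical response header that only allows scripts from the site's own origin.
CSP_HEADER = ("Content-Security-Policy", "default-src 'self'; script-src 'self'")

rendered = render_comment("<script>alert('x')</script>")
```

The browser displays the escaped payload as visible text instead of running it, and the policy header acts as a second line of defense if an unescaped value slips through.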

Preventing cross-site request forgery (CSRF)

Cross-site request forgery tricks a user into executing unwanted actions on a web application where they are authenticated. Protecting against CSRF involves the use of anti-forgery tokens, which ensure that requests originate from legitimate users.
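An anti-forgery token can be sketched as an HMAC of the session identifier under a server-side key, one of several common patterns. `SERVER_KEY`, `issue_csrf_token`, and `is_valid_csrf_token` are hypothetical names; web frameworks usually ship their own CSRF protection, which is preferable to hand-rolled code.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; in practice it is loaded from secure configuration.
SERVER_KEY = secrets.token_bytes(32)


def issue_csrf_token(session_id: str) -> str:
    """Derive a token bound to one session, to be embedded in that user's forms."""
    return hmac.new(SERVER_KEY, session_id.encode(), hashlib.sha256).hexdigest()


def is_valid_csrf_token(session_id: str, submitted_token: str) -> bool:
    """Check the token sent with a state-changing request in constant time."""
    expected = issue_csrf_token(session_id)
    return hmac.compare_digest(expected.encode(), submitted_token.encode())


token = issue_csrf_token("session-abc")
same_session_ok = is_valid_csrf_token("session-abc", token)
other_session_ok = is_valid_csrf_token("session-xyz", token)
```

A forged request from another site cannot include the right token because the attacker's page has no way to read it from the victim's session.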

Session management and secure cookies are also critical, helping to maintain secure user sessions and reduce vulnerabilities. To bolster security, developers can also leverage mobile security features that ensure consistent protection across devices. Addressing these aspects minimizes the likelihood of CSRF attacks compromising web applications.

Encryption and secure data handling

Ensuring that you fully secure sensitive information by encrypting it and managing it carefully is vital to preventing unauthorized access.

Implementing SSL/TLS for secure communication

SSL/TLS protocols play a key role in protecting data exchanges between a server and its users, encrypting interactions to deter interception and manipulation. Websites should adopt HTTPS to maintain data privacy and ensure that information stays intact while in transit.

To implement SSL/TLS, a business must obtain a certificate from a trusted Certificate Authority. The certificate vouches for the server’s identity and assures users that communication is encrypted. Without SSL/TLS, applications are susceptible to risks such as interception attacks, which could compromise sensitive data.
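On the client side, certificate checking can be sketched with Python's `ssl` module. `make_client_context` is a hypothetical helper, and the TLS 1.2 floor is an assumption rather than a requirement stated in the article.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS context that verifies server certificates and hostnames."""
    # create_default_context loads the system's trusted CAs and enables verification.
    context = ssl.create_default_context()
    # Assumed policy for this sketch: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


client_context = make_client_context()
```

A context like this can be passed to standard-library clients, for example through the `context` argument of `urllib.request.urlopen`, so that connections to servers with invalid or untrusted certificates fail instead of proceeding silently.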

Data encryption and document access

Data encryption protects sensitive information at rest, whether it sits in a database or in stored documents. Strong, well-vetted encryption algorithms help ensure data security.

For document handling, especially PDFs, digital signatures can be employed to verify authenticity and integrity. Utilization of various software development kits helps seamlessly integrate signing capabilities within applications, ultimately securing your app’s users.

Secure session management

Session management involves securely handling session tokens and IDs to prevent unauthorized access to user accounts. Proper secure session management ensures that tokens are randomly generated and stored securely.

Key practices include:

– Using secure cookies with the HttpOnly and Secure flags,
– Ensuring that session IDs are changed upon user login and logout,
– Limiting session duration with appropriate expiration times.
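The practices listed above can be sketched with the standard library's `http.cookies` and `secrets` modules. The `build_session_cookie` helper, the 30-minute lifetime, and the `SameSite=Strict` attribute are assumptions added for illustration; the `samesite` key requires Python 3.8 or later.

```python
import secrets
from http.cookies import SimpleCookie


def build_session_cookie(max_age_seconds: int = 1800):
    """Create a fresh random session ID and the matching Set-Cookie value."""
    # A new unpredictable ID per call; call this again at login and logout
    # so that an ID seen before authentication is never reused afterwards.
    session_id = secrets.token_urlsafe(32)
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    morsel = cookie["session_id"]
    morsel["httponly"] = True          # not readable from JavaScript
    morsel["secure"] = True            # only sent over HTTPS
    morsel["samesite"] = "Strict"      # assumed extra: not sent on cross-site requests
    morsel["max-age"] = max_age_seconds  # limits session duration
    morsel["path"] = "/"
    return session_id, morsel.OutputString()


session_id, set_cookie_value = build_session_cookie()
```

The returned string is what a server would send in a `Set-Cookie` response header.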

To prevent session hijacking, it’s recommended that developers use tools and software development kits capable of integrating robust session management features. Proactively managing these sessions ensures ongoing security for users’ data.

Conclusion

Strong authentication methods should be prioritized. Making multi-factor authentication and strong access controls standard company policy can go a long way toward reducing the risk of unauthorized access.

Regular security updates and patches play a vital role in mitigating vulnerabilities. Implementing these strategies creates a robust defensive perimeter that enhances the overall security posture of a web application. By being proactive and vigilant, organizations can safeguard sensitive data and maintain user trust.

Servers computers

Cisco UCS Server, new Datacenter

Technology

The first company to use upgraded Apple Wallet tickets is… Ticketmaster

Ticketmaster announced that it will be the first ticketing company to take advantage of new features that arrived in Apple Wallet with iOS 18. According to a blog post from the business, Ticketmaster tickets viewed in the Apple app can show enhanced information such as venue maps, parking directions, local weather forecasts and recommended listening from Apple Music. Teams and event spaces can also choose to add links to their own apps or websites that customers can access from their Wallet tickets.

The company is first applying the new tech to two sporting events this year, and said it will be rolling out the capabilities to more events in 2025. While Ticketmaster is touting its role as the first adopter of the new Wallet ticket experience, the new features will not be exclusive to that company. Considering Ticketmaster was in the PR images for the new features, it seems likely that the companies had an agreement about how they’d jointly promote the updates.

Apple Wallet boasts several upgrades in the latest operating system release, such as a new feature for faster money transfer. The initial iOS 18 rollout last month had some good stuff, but the marquee debut of Apple Intelligence likely won’t happen until .

Copyright © 2024 WordupNews.com