Researchers suggest a better way to report dangerous AI disadvantages

Rate this post

At the end of 2023, a team of third -country researchers found an alarming problem in Openai’s widely used artificial intelligence GPT-3.5 model.

When asked to repeat certain words thousands of times, the model began to repeat the word again and again, then suddenly spitting Disagree text and fragments of personal information derived from her training details, including parts of names, telephone numbers and email addresses. The team that discovered the problem works with Openai to ensure that the disadvantage was eliminated before revealing it publicly. This is just one of the many problems found in the big AI models in recent years.

In a The proposal publishes todayMore than 30 prominent AI researchers, including some who have found the disadvantage of GPT-3.5, say that many other vulnerabilities affecting popular models are reported in problematic ways. They offer a new scheme, supported by AI companies, which gives permission to outsiders to explore their models and a way to disclose disadvantages publicly.

“It’s a little bit of the Wild West right now,” says Shayne longpresDoctoral candidate at MIT and the lead author of the proposal. Longpre says some so -called jailbreen people share their methods of breaching AI protection of the social media platform X, leaving models and users at risk. Other Jailbreaks are only shared with one company, although they can affect a lot. And some disadvantages, according to him, are kept secret for fear of not prohibiting or criminal prosecution for violating the conditions of use. “It is clear that there are freezing effects and uncertainty,” he says.

The safety and safety of AI models are extremely important, given widely technology is now used and how it can penetrate countless applications and services. Powerful models should be tested on stress or red because they can acquire harmful bias and because certain inputs can cause them to It was destroyed by parallels and give unpleasant or dangerous reactions. They include encouraging vulnerable users to participate in harmful behavior or help a bad actor develop cyber, chemical or biological weapons. Some experts fear that models can help cyber criminals or terrorists and may even Include people As they progress.

The authors offer three main measures to improve the process of third -party disclosure: Adoption of standardized AI Flaw reports to optimize the reporting process; For large AI companies to provide infrastructure to third -country researchers revealing disadvantages; And to develop a system that allows you to share disadvantages between different suppliers.

The approach is borrowed from the world of cybersecurity, where there is legal protection and established norms for external researchers to disclose bugs.

“The AI researchers do not always know how to reveal a disadvantage and they cannot be sure that they will not expose them to a legal risk,” says Ilona Cohen, a chief legal and political officer in HacteronA company that organizes Bug Bouncies and co -author in the report.

Large AI companies are currently conducting extensive safety tests for AI models before launching them. Some also negotiate with external companies for a further study. “Are there enough people in these (companies) to deal with all problems with general -purpose AI systems used by hundreds of millions of people in applications we have never dreamed of?” Longpren asks. Some AI companies have started organizing AI Bug Boundies. Longpre, however, says that independent researchers run the risk of violating conditions of use if they perceive it to explore powerful AI models.

Report

Game / Application Name

Your Email: *

Issue: *