The just-released AI Safety Index graded six leading AI companies on their risk assessment efforts and safety procedures… and the top of the class was Anthropic, with an overall score of C. The other five companies—Google DeepMind, Meta, OpenAI, xAI, and Zhipu AI—received grades of D+ or lower, with Meta flat out failing.
“The purpose of this is not to shame anybody,” says Max Tegmark, an MIT physics professor and president of the Future of Life Institute, which put out the report. “It’s to provide incentives for companies to improve.” He hopes that company executives will view the index the way universities view the U.S. News & World Report rankings: They may not enjoy being graded, but if the grades are out there and getting attention, they’ll feel driven to do better next year.
He also hopes to help researchers working in those companies’ safety teams. If a company isn’t feeling external pressure to meet safety standards, Tegmark says, “then other people in the company will just view you as a nuisance, someone who’s trying to slow things down and throw gravel in the machinery.” But if those safety researchers are suddenly responsible for improving the company’s reputation, they’ll get resources, respect, and influence.
The Future of Life Institute is a nonprofit dedicated to helping humanity ward off truly bad outcomes from powerful technologies, and in recent years it has focused on AI. In 2023, the group put out what came to be known as “the pause letter,” which called on AI labs to pause development of advanced models for six months and to use that time to develop safety standards. Big names like Elon Musk and Steve Wozniak signed the letter (and to date, a total of 33,707 people have signed), but the companies did not pause.
This new report may also be ignored by the companies in question. IEEE Spectrum reached out to all the companies for comment, but only Google DeepMind responded, providing the following statement: “While the index incorporates some of Google DeepMind’s AI safety efforts, and reflects industry-adopted benchmarks, our comprehensive approach to AI safety extends beyond what’s captured. We remain committed to continuously evolving our safety measures alongside our technological advancements.”
How the AI Safety Index graded the companies
The Index graded the companies on how well they’re doing in six categories: risk assessment, current harms, safety frameworks, existential safety strategy, governance and accountability, and transparency and communication. It drew on publicly available information, including related research papers, policy documents, news articles, and industry reports. The reviewers also sent a questionnaire to each company, but only xAI and the Chinese company Zhipu AI (which currently has the most capable Chinese-language LLM) filled theirs out, boosting those two companies’ scores for transparency.
The grades were given by seven independent reviewers, including big names like UC Berkeley professor Stuart Russell and Turing Award winner Yoshua Bengio, who have said that superintelligent AI could pose an existential risk to humanity. The reviewers also included AI leaders who have focused on near-term harms of AI like algorithmic bias and toxic language, such as Carnegie Mellon University’s Atoosa Kasirzadeh and Sneha Revanur, the founder of Encode Justice.
And overall, the reviewers were not impressed. “The findings of the AI Safety Index project suggest that although there is a lot of activity at AI companies that goes under the heading of ‘safety,’ it is not yet very effective,” says Russell. “In particular, none of the current activity provides any kind of quantitative guarantee of safety; nor does it seem possible to provide such guarantees given the current approach to AI via giant black boxes trained on unimaginably vast quantities of data. And it’s only going to get harder as these AI systems get bigger. In other words, it’s possible that the current technology direction can never support the necessary safety guarantees, in which case it’s really a dead end.”
Anthropic got the best scores overall and the best single grade, earning the only B- for its work on current harms. The report notes that Anthropic’s models have received the highest scores on leading safety benchmarks. The company also has a “responsible scaling policy” mandating that it will assess its models for their potential to cause catastrophic harms, and will not deploy models it judges too risky.
All six companies scored particularly badly on their existential safety strategies. The reviewers noted that all of the companies have declared their intention to build artificial general intelligence (AGI), but only Anthropic, Google DeepMind, and OpenAI have articulated any kind of strategy for ensuring that the AGI remains aligned with human values. “The truth is, nobody knows how to control a new species that’s much smarter than us,” Tegmark says. “The review panel felt that even the [companies] that had some sort of early-stage strategies, they were not adequate.”
While the report does not issue any recommendations for either AI companies or policymakers, Tegmark feels strongly that its findings show a clear need for regulatory oversight—a government entity equivalent to the U.S. Food and Drug Administration that would approve AI products before they reach the market.
“I feel that the leaders of these companies are trapped in a race to the bottom that none of them can get out of, no matter how kind-hearted they are,” Tegmark says. Today, he says, companies are unwilling to slow down for safety tests because they don’t want competitors to beat them to the market. “Whereas if there are safety standards, then instead there’s commercial pressure to see who can meet the safety standards first, because then they get to sell first and make money first.”