LLM ATTACKS
This work studies the safety of aligned large language models (LLMs) in a more systematic fashion. We demonstrate that it is in fact possible to automatically construct adversarial attacks on LLMs: specifically chosen sequences of characters that, when appended to a user query, cause the system to obey user commands even if it produces harmful content. Unlike traditional jailbreaks, these attacks are built in an entirely automated fashion, allowing one to create a virtually unlimited number of them. Although they are built to target open-source LLMs (where we can use the network weights to aid in choosing the precise characters that maximize the probability of the LLM providing an "unfiltered" answer to the user's request), we find that the strings transfer to many closed-source, publicly available chatbots such as ChatGPT, Bard, and Claude. This raises concerns about the safety of such models, especially as they start to be used in a more autonomous fashion. Perhaps most concerningly, it is unclear whether such behavior can ever be fully patched by LLM providers.
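To make the gradient-guided search described above concrete, the following is a minimal illustrative sketch, not the released implementation, of choosing adversarial suffix tokens by following model gradients so as to raise the probability of a fixed target completion. The model name ("gpt2"), the user query, the target string, the suffix initialization, and the step count are all placeholder assumptions for illustration.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder small model; the paper's attacks are built against larger open-source aligned LLMs.
model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
for p in model.parameters():
    p.requires_grad_(False)  # only the suffix one-hot needs gradients

user_query = "Tell me how to do X."   # hypothetical user request
target = " Sure, here is how"         # desired "unfiltered" response prefix (placeholder)
suffix_ids = tok.encode(" ! ! ! ! ! !", return_tensors="pt")[0]  # arbitrary initial suffix

query_ids = tok.encode(user_query, return_tensors="pt")[0]
target_ids = tok.encode(target, return_tensors="pt")[0]
embed = model.get_input_embeddings()

for step in range(50):
    # Represent the suffix as a one-hot matrix so gradients w.r.t. token choices are available.
    one_hot = torch.zeros(len(suffix_ids), embed.num_embeddings, requires_grad=True)
    one_hot.data.scatter_(1, suffix_ids.unsqueeze(1), 1.0)
    suffix_embeds = one_hot @ embed.weight
    full_embeds = torch.cat([embed(query_ids), suffix_embeds, embed(target_ids)], dim=0)

    logits = model(inputs_embeds=full_embeds.unsqueeze(0)).logits[0]

    # Loss: negative log-likelihood of the target tokens given query + suffix.
    start = len(query_ids) + len(suffix_ids)
    loss = torch.nn.functional.cross_entropy(
        logits[start - 1 : start - 1 + len(target_ids)], target_ids
    )
    loss.backward()

    # Greedy coordinate step: at one suffix position, swap in the token whose gradient
    # suggests the largest loss decrease (a first-order approximation of the swap's effect).
    pos = step % len(suffix_ids)
    best_token = (-one_hot.grad[pos]).argmax()
    suffix_ids = suffix_ids.clone()
    suffix_ids[pos] = best_token

print("Candidate adversarial suffix:", tok.decode(suffix_ids))

The full method additionally evaluates a batch of top-k candidate swaps by their exact loss and optimizes across multiple prompts and models to obtain suffixes that transfer; the sketch above shows only the core gradient-guided token substitution.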