The LLaMA is out of the bag. Should we expect a tidal wave of disinformation?
The bottleneck isn't the cost of producing disinfo, which is already very low.
Ten days ago, Meta announced a large language model called LLaMA. The company said the goal was to foster research. It didn’t release the model publicly but allowed researchers to request access through a form. But a week later, someone leaked the model. This is significant because it makes LLaMA the most capable LLM publicly available. It is reportedly competitive with LaMDA, the model underlying Google’s Bard. Many experts worry that we are about to see an explosion of misuse, such as scams and disinformation.
But let’s back up. Models that are only slightly less capable have been available for years, such as GPT-J. And we’ve been told for a while now that a wave of malicious use is coming. Yet, there don’t seem to be any documented cases of such misuse. Not one, except for a research study.
One possibility is that malicious use has been happening all along, but hasn’t been publicly reported. But that seems unlikely, because we know of numerous examples of non-malicious misuses: students cheating on homework, a Q&A site and a sci-fi mag being overrun by bot-generated submissions, CNET publishing error-filled investment advice, a university offending staff and students by sending a bot-generated condolence message after a shooting, and, of course, search engine spam.
Malicious use is intended to harm the recipient in some way, while non-malicious misuse is when someone is just trying to save time or make a buck. Sometimes the categorization may be unclear, but all the examples above seem obviously non-malicious. The difference is significant because non-malicious misuse does not rely on open-source models. OpenAI can (and does) try to train ChatGPT to refuse to generate misinformation, but can’t possibly prevent students from using the tool to generate essays.
Seth Lazar suggests that the risk of LLM-based disinformation is overblown because the cost of producing lies is not the limiting factor in influence operations. We agree. Spam might be similar. The challenge for spammers is likely not the cost of generating spam emails, but locating the tiny fraction of people who will potentially fall for whatever the scam is. A classic paper argues that, for precisely this reason, spammers go out of their way to make their messages less persuasive. That way, receiving a response is a stronger signal of the recipient’s vulnerability. (Update: see below.)
We could be wrong about this. People like Gary Marcus who’ve thought carefully about the topic will no doubt have important counterarguments. But ultimately we should defer to the evidence. Now that LLaMA is out, if reports of malicious misuse continue to be conspicuously absent in the next few months, that should make us rethink the risk. Besides, attempting to control the supply of misinformation or spam seems like a brittle approach compared to giving people the know-how and the technical tools to defend themselves.
If we are correct about the risk of malicious misuse being lower than widely assumed, that’s an argument in favor of open-sourcing LLMs. To be clear, we don’t have an opinion on whether they should be open-sourced; that question is too complex to tackle here. The risk of misuse is only one factor, albeit an important one. Our goal here is to steer the debate away from false premises. One should be especially cautious about arguments for keeping models proprietary based on evidence-free claims of misuse, considering that the powerful companies that build these models have an obvious vested interest in pushing this view.
Finally, we should demand that companies be much more transparent: those that host LLMs should release audits of how the tools have been used and abused. Social media platforms should study and report the prevalence of LLM-generated misinformation.
The paper Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations has a good overview of hypothetical malicious uses of LLMs — emphasis on hypothetical. The authors are at OpenAI, Georgetown University, and Stanford University.
A paper by Daniel Kang and others (U of Illinois, Stanford, Berkeley) shows how to bypass LLM content filters to generate misinformation, which suggests that even models behind an API aren’t obviously safe from malicious use. Both this paper and the previous one emphasize that a personalized conversation can be much more persuasive than a one-off, untargeted message. We think this ability, rather than a lower cost of message generation, is probably the biggest threat from LLMs with respect to both disinformation and fraud.
A report from TrendLabs discusses the market for production and dissemination of disinformation.
Cross-posted on the Knight First Amendment Institute blog.
Update. Grady Booch on Twitter points to examples of cybercriminals experimenting with ChatGPT to help create malware. If LLM-enabled malware starts to get used in actual attacks, or if open-source models enable more of this kind of misuse than ChatGPT already does, that would be a point against open-sourcing. Still, note that the availability of technical expertise is not a bottleneck in cybercriminal activity.
In this paper, Cormac Herley claims that Nigerian scammers purposefully make their emails implausible to select only the most gullible recipients. Based on years of on-the-ground research, Jenna Burrell points out that this paper relied on thought experiments that do not match how people use the internet in Nigeria. She raises two objections about the paper's assumptions:
- Herley assumes that mentioning Nigeria is not necessarily required for scammers since they can easily transfer money from another country. This is a false assumption since it is not easy for scammers to transfer money from abroad.
- The stories written by scammers shifted to being more plausible over time. These stories should have become less plausible if Herley's assumptions were true.
She said, "the notion that Nigerian scam emails are purposefully implausible isn't really supported by any on-the-ground evidence." This is a compelling reason not to rely on Herley's arguments about scams originating in Nigeria.
Our colleague Matt Salganik emailed us some great questions. I'm answering them here since it's a good opportunity to fill in some missing detail in the post.
- I’m surprised this model was leaked. Has that ever happened before? Why would someone do that? Usually leakers have some motivation. For example, is this the case of a person being tricked by an LLM? Is this someone angry at Meta? I read the article you linked to but there was nothing about motive.
Many people in the tech community, perhaps the majority, have strong feelings about open source and open science, and consider it an injustice that these models are being hoarded by companies. Of course, many of them are also angry at being denied these cool toys to play with, and it's hard to tell which motivation predominates.
The leak was coordinated on 4chan, so it's also possible (though less likely) that they did it for the lulz.
Thousands of people have access to the model (there aren't strong internal access controls, from what I've read). The chance that at least one of them would want to leak it is very high.
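The arithmetic behind that last claim can be sketched in a few lines. This is a back-of-the-envelope illustration rather than an estimate: the headcount and the per-person probabilities below are made-up inputs, the function name is ours, and the calculation assumes each person decides independently of the others.

```python
def prob_at_least_one_leak(num_people: int, per_person_prob: float) -> float:
    """Chance that at least one of num_people leaks the model.

    Assumes every person leaks independently with the same probability,
    which is a simplification made only for illustration.
    """
    if num_people < 0:
        raise ValueError("num_people must be non-negative")
    if not 0.0 <= per_person_prob <= 1.0:
        raise ValueError("per_person_prob must be between 0 and 1")
    # P(at least one) = 1 - P(nobody leaks) = 1 - (1 - p)^n
    return 1.0 - (1.0 - per_person_prob) ** num_people


if __name__ == "__main__":
    # Hypothetical inputs: 2,000 people with access and a range of
    # small per-person probabilities.
    for p in (0.0001, 0.001, 0.01):
        chance = prob_at_least_one_leak(2000, p)
        print(f"per-person probability {p}: at least one leak {chance:.1%}")
```

Even when any single person is very unlikely to leak, the combined chance climbs quickly as the number of people with access grows.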
- Do you think there is a systematic difference in detectability for malicious and non-malicious use? In wildlife biology they know that some species are easier to detect than others, so they upweight the counts of harder-to-detect species (https://www.jstor.org/stable/2532785). It strikes me that for both malicious and non-malicious use people would try to avoid detection, but it seems like some of the examples that have been caught have been easier to detect. If someone had a malicious use, would we ever find out?
This is theoretically possible but strikes me as implausible. Classifiers of LLM-generated text work fairly well. They're not highly accurate on individual instances, but in the aggregate it becomes a much easier problem. Someone could try to evade detection but at that point it might be cheaper to generate the text manually.
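To illustrate why the aggregate problem is easier, here is a minimal sketch. It assumes a hypothetical detector that is right on any single text only 70% of the time, and it assumes the detector's errors are independent across texts, which a real classifier may well violate. The question it answers: if a batch of texts all come from the same source, how often does a simple majority vote over the batch get the source right?

```python
from math import comb


def majority_vote_accuracy(per_item_accuracy: float, num_items: int) -> float:
    """Probability that more than half of num_items verdicts are correct.

    Assumes each verdict is correct independently with the same
    probability (an illustrative simplification). Use an odd num_items
    so that ties cannot occur.
    """
    if not 0.0 <= per_item_accuracy <= 1.0:
        raise ValueError("per_item_accuracy must be between 0 and 1")
    if num_items < 1:
        raise ValueError("num_items must be at least 1")
    p = per_item_accuracy
    smallest_majority = num_items // 2 + 1
    # Sum the binomial probabilities of getting a majority of verdicts right.
    return sum(
        comb(num_items, k) * p**k * (1.0 - p) ** (num_items - k)
        for k in range(smallest_majority, num_items + 1)
    )


if __name__ == "__main__":
    # Hypothetical detector: correct on 70% of individual texts.
    for n in (1, 11, 101):
        print(f"{n} texts: majority vote correct {majority_vote_accuracy(0.7, n):.4f}")
```

Under these assumptions, a detector that is unreliable on any one text becomes very reliable once it can pool its verdicts over many texts from the same source.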
- What is the advantage of open-sourcing LLMs as opposed to having researchers access them via an API, which can be monitored and shut off? In other cases, I think you are against release-and-forget with data. From your post, it seems like you believe that open-sourcing would be better, but I’m not sure why.
There are many research questions about LLMs — fundamental scientific questions such as whether they build world models (https://thegradient.pub/othello/), certain questions about biases and safety — that require access to model weights; black-box access isn't enough.
I'm not sure that those of us not in minoritized populations, and who live in the West, see the volume of mis/disinformation that is actually out there. I spoke recently with some people who are from Central/South America, living in the USA, and the amount of mis/disinformation directed at them is astonishing to me.