Mar 7 (edited Mar 9) · Author

Our colleague Matt Salganik emailed us some great questions. I'm answering them here since it's a good opportunity to fill in some missing detail in the post.

- I’m surprised this model was leaked. Has that ever happened before? Why would someone do that? Usually leakers have some motivation. For example, is this the case of a person being tricked by an LLM? Is this someone angry at Meta? I read the article you linked to but there was nothing about motive.

Many people in the tech community, perhaps the majority, have strong feelings about open source and open science, and consider it an injustice that these models are being hoarded by companies. Of course, many of them are also angry at being denied these cool toys to play with, and it's hard to tell which motivation predominates.

The leak was coordinated on 4chan, so it's also possible (though less likely) that they did it for the lulz.

Thousands of people have access to the model (there aren't strong internal access controls, from what I've read). The chance that at least one of them would want to leak it is very high.

- Do you think there is a systematic difference in detectability for malicious and non-malicious use? In wildlife biology they know that some species are easier to detect than others, so they upweight the counts of harder-to-detect species (https://www.jstor.org/stable/2532785). It strikes me that for both malicious and non-malicious use people would try to avoid detection, but it seems like some of the examples that have been caught have been easier to detect. If someone had a malicious use, would we ever find out?

This is theoretically possible but strikes me as implausible. Classifiers of LLM-generated text work fairly well. They're not highly accurate on individual instances, but in the aggregate it becomes a much easier problem. Someone could try to evade detection but at that point it might be cheaper to generate the text manually.
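To illustrate the aggregation point with a toy sketch (not any specific detector — the 65% per-document accuracy is a hypothetical number): even a weak classifier becomes reliable when you pool its verdicts over many documents from the same source, e.g. by majority vote.

```python
import random

random.seed(42)

P_CORRECT = 0.65  # hypothetical per-document accuracy of a weak detector


def weak_classifier(is_llm: bool) -> bool:
    """Return the true label with probability P_CORRECT, else flip it."""
    return is_llm if random.random() < P_CORRECT else not is_llm


def majority_verdict(n_docs: int = 100) -> bool:
    """Aggregate verdict over n_docs documents from one (LLM-using) source."""
    votes = sum(weak_classifier(True) for _ in range(n_docs))
    return votes > n_docs / 2


# A single document is classified correctly only ~65% of the time, but the
# majority verdict over 100 documents is correct almost every time.
aggregate_accuracy = sum(majority_verdict() for _ in range(1_000)) / 1_000
print(aggregate_accuracy)
```

The same logic is why evading detection is hard at scale: an evader has to fool the classifier consistently across their whole output, not just on any one document.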

- What is the advantage of open-sourcing LLMs as opposed to giving researchers access via an API, which can be monitored and shut off? In other cases, I think you are against release-and-forget with data. From your post, it seems like you believe that open-sourcing would be better, but I’m not sure why.

There are many research questions about LLMs — fundamental scientific questions such as whether they build world models (https://thegradient.pub/othello/), certain questions about biases and safety — that require access to model weights; black-box access isn't enough.


I'm not sure that those of us not in minoritized populations, and who live in the West, see the volume of mis/disinformation that is actually out there. I spoke recently with some people who are from Central/South America, living in the USA, and the amount of mis/disinformation directed at them is astonishing to me.

* https://www.tandfonline.com/doi/full/10.1080/00963402.2021.1912093

* https://www.alethea.com/


Doesn't an LLM have a good ability to predict the factors that determine whether an individual will believe or fall for some scammy thing? Scammers do what they do because it works better on average than the alternatives, but if they could automate the process of personalizing messages to increase conversion, of course they would — and this particular technology seems helpful precisely because of the ineffable nature of what each person is individually motivated by.

So, the risk of LLMs isn't that they unleash a tidal wave of spam/scams, but that they will greatly increase the effectiveness of those scams.


"the cost of producing disinfo, which is already very low."

I think the real risk from these models is not blatant misuse. That will happen, but AI is just a tool and, like any other tool, can be used for malicious purposes.

The risk from advanced tech is more about creating tools that are SO USEFUL that we 'forgive' their subtle flaws. The danger comes not from the tools being terrible, but from their being really good while also having some level of hallucination, bias, or error that seeps in and gets past our guardrails of human judgment. Having tested ChatGPT on US history and other topics, I find it good enough on general knowledge but over-confident on some specifics, and of course it doesn't give sources unless you are careful to prompt properly and check. So it's quite plausible that students might treat it like a Wikipedia for some things, even though it's a black box and not 100% reliable. These flaws can be mitigated, e.g., with verification AIs and prompt reliability engineering, but there's no guarantee that all users and all tools will follow that. We can fix AI, but we can't fix humans deciding to ignore risks and flaws.
