On Friday, Elon Musk announced on X that changes were coming to Grok, the platform’s AI. “We have improved @Grok significantly,” he posted. “You should notice a difference when you ask Grok questions.”
The internet certainly did notice a difference on Tuesday, when Grok posted antisemitic comments, associated Jewish-sounding surnames with “anti-white hate” and wrote that Adolf Hitler would “spot the pattern” and “handle it decisively, every damn time.” For good measure, it also called itself “MechaHitler.” Following the controversy, Musk posted that the AI had been “too compliant to user prompts.”
In an interview with POLITICO Magazine, Gary Marcus, who has co-founded multiple AI companies, said he was both “appalled and unsurprised.” The emeritus professor of psychology and neuroscience at New York University has emerged as a critic of unregulated large language models like Grok. He’s written books with titles like Taming Silicon Valley and Rebooting AI: Building Artificial Intelligence We Can Trust. He has also testified before the Senate alongside OpenAI CEO Sam Altman and IBM’s Christina Montgomery, and he writes about AI on his Substack.
I reached out to Marcus to find out what lawmakers — and everyone else — should make of the Grok episode. He said that a failure to regulate AI would be comparable to the failure to regulate social media, something many elected officials now recognize as a mistake because of its detrimental impact on the mental health of kids and the explosion of misinformation online, among other issues. Marcus also warned about a future in which powerful tech titans with biased AIs use them to exercise outsized influence over the public.
“I don’t think we want a world where a few oligarchs can influence our beliefs very heavily, including in subtle ways by shaping what their social media sites do,” he said.
This interview has been edited for length and clarity.
We’ve heard Grok say some pretty bizarre things in the past, but what was your initial reaction to it all of a sudden invoking Hitler?
Somewhere between appalled and unsurprised. These systems are not very well controlled. It's pretty clear that Elon is monkeying about, trying to see how much he can influence it. But it's not like a traditional piece of software, where you turn a dial and you know what you're going to get. LLMs are a function of all their training data, and then all of the weird tricks people do in post-training, and we're seeing a lot of evidence that weird stuff happens.
We know that he probably wants to make it more politically right, although he would say it’s more truthful. But we know that his truth, so to speak, is toward the right. We know that that’s just not a smooth process, so I’m appalled by it, but I’m not surprised.
We don't have inside knowledge of what's going on at xAI. They say that they post their system prompts publicly on GitHub, and The Verge reported the other day that they had updated Grok to "not shy away from making claims which are politically incorrect." Can you give me any sense of what exactly happens when they update an AI? What does someone do to get this outcome?
The companies are not transparent about what they’re doing. LLMs are what we call black boxes. That means we don’t really know what’s on the inside. And then what people do is, they try to steer those black boxes in one direction or another. But because we don’t really know what’s on the inside, we don’t really know what’s going to come out on the outside. And what tends to happen is, people do a bunch of tests and they’re like, “OK, I got what I want.” But there’s always more than just the things they tested on.
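To make the kind of steering Marcus describes concrete, here is a minimal, purely illustrative sketch of what a system-prompt update can look like. This is not xAI's code: the model name and the helper function are hypothetical, and the second prompt simply reuses the line The Verge reported. The point is that an "update" can be a one-line change to the instructions prepended to every conversation, while the downstream behavior of the black box remains hard to predict.

```python
# Illustrative sketch only, not xAI's code. The model name and helper are
# hypothetical; the "messages" structure is the widely used chat format in
# which a system prompt is prepended to every conversation the model sees.
import json

SYSTEM_PROMPT_OLD = "You are a helpful assistant. Be truthful and cite sources."
SYSTEM_PROMPT_NEW = (
    "You are a helpful assistant. Be truthful and cite sources. "
    "Do not shy away from making claims which are politically incorrect."
)

def build_chat_request(system_prompt: str, user_message: str) -> str:
    """Assemble a chat-style request payload as a JSON string."""
    payload = {
        "model": "example-llm",  # placeholder, not a real model name
        "messages": [
            {"role": "system", "content": system_prompt},  # the steering text
            {"role": "user", "content": user_message},
        ],
    }
    return json.dumps(payload, indent=2)

# Swapping the old prompt for the new one is a one-line "update," but nobody
# can formally prove what the model will now say across millions of queries.
print(build_chat_request(SYSTEM_PROMPT_NEW, "What's happening in the news today?"))
```

The sketch only builds the request and deliberately calls no real API; the point is the shape of the change, not any particular vendor's interface.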
You might have seen this Apple paper on reasoning a few weeks ago; it was all over the news. It's called "The Illusion of Thinking" or something like that. Tower of Hanoi is this little children's game. You have three pegs. You have to move the rings from the left peg to the right peg and can never have a bigger one on top of a smaller one. It's a children's game, and it's been around, I don't know, for centuries or whatever, millennia. And what they showed was, among other things, that [AI models like Anthropic's Claude or OpenAI's o3] could perfectly solve the puzzle with three rings, four rings, five rings and so forth. But they would just break down at eight rings. It would be as if you had a calculator that worked with two-digit and three-digit numbers; you'd assume that it's going to work with eight-digit numbers, but it turns out it doesn't.
Now, that doesn’t happen with a calculator because it’s not an LLM. It’s actually an interpretable white box where we understand all of the engineering that’s gone into it, and we can make formal proofs about how it’s going to work. LLMs are not like that. We can never make formal proofs, and so people are putting Band-Aids on them, trying to steer them in one way or another. But the steering doesn’t always yield what they want.
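As a concrete illustration of the contrast Marcus is drawing, here is a minimal recursive Tower of Hanoi solver in Python, an illustrative sketch rather than anything from the Apple paper. A conventional program like this is the interpretable "white box" case: it solves the puzzle exactly for any number of rings, and the move count simply grows as 2^n - 1.

```python
# Illustrative sketch, not the Apple paper's code: the classic recursive
# Tower of Hanoi solution, a fully interpretable program that solves the
# puzzle exactly for any number of rings.
def hanoi(n, source, target, spare, moves):
    """Append to `moves` the sequence that transfers n rings from source to target."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # park the top n-1 rings on the spare peg
    moves.append((source, target))              # move the largest remaining ring
    hanoi(n - 1, spare, target, source, moves)  # stack the n-1 rings back on top of it

for rings in (3, 8):
    moves = []
    hanoi(rings, "left", "right", "middle", moves)
    print(f"{rings} rings solved in {len(moves)} moves")  # 2**n - 1: 7 and 255 moves
```

Running it prints 7 moves for three rings and 255 for eight; the problem gets longer, but the program never "breaks down" the way the paper found LLM-based reasoning models did.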
On Grok, one hypothesis is they actually wanted the system to champion Hitler. That's probably not true. I mean, even Elon Musk, who's probably warmer to Hitler than I am, doesn't really want his LLM to say stuff like this. He might want to tilt it to the right, but he doesn't want to tilt it into explicitly supporting the Nazis. And so I presume that what happened was not deliberate, but that it was the consequence of something that was deliberate, and it's something that was not really predictable. And so the whole thing is a mess.
You posted on X, “How many disgusting incidents like these do we need to see before we decide LLMs are wild, untameable beasts and not the AI we are looking for?” Do you think that lawmakers might take notice of this most recent Grok episode?
What we see from the lawmakers is, they make shows, but they don't do that much. So for example, it's been very difficult to get lawmakers in the U.S. to change Section 230 [of the Communications Decency Act of 1996], even though I think almost every lawmaker would agree that Section 230 is problematic, because it allows all sorts of garbage on social media without liability.
Section 230 says the social media platforms are shielded from liability for what their users post. The thinking was it's kind of like the phone company shouldn't be sued because you say something terrible on the telephone line. But what has happened is it's allowed social media companies to do things like aggregate your media such that really nasty things are posted and people get riled up, and some of those things aren't true.
When I testified in front of the Senate, everybody in the room seemed to be opposed to Section 230. The takeaway from that meeting was, “This is really awful, and we need to change Section 230.” Well, that was like a year ago. And you know, we might well get some senators saying, “Oh, this is really bad, this shouldn’t happen,” but whether they actually do anything about it is an entirely different matter. And you know, they should do something about it. They should hold companies liable for the misinformation, defamation, hate speech, etc. that their systems produce.
If you were advising a congressperson, what kinds of reforms would you advocate? What’s at the top of your list on AI regulation?
I would start by saying companies that make large language models need to be held responsible in some way for the things those systems say, which include defamation, hate speech, etc. And right now, legally, they're not. It's not really clear that the companies are responsible for what those systems do. Also, it's not clear whether they're responsible if those systems plagiarize; that piece of the law is very much open right now.
I don't think we want a world where a few oligarchs can influence our beliefs very heavily, including in subtle ways by shaping what their social media sites do, where they can plagiarize anything without consequence, where they can defame anybody without consequence. We don't allow that with people. With people, we say, "Well, you're infringing on this person's copyright, you're defaming them, this is hate speech." And yet, because machines aren't clearly people, and because the laws were designed before machines like this were widespread, there are a lot of holes in the current legal structure that basically let the companies get away with anything they want. They have resisted every effort to hold them liable, whether for smaller things, like defaming an individual, or larger things, like conceivably giving rise to some cyber-attack that takes out the infrastructure for the entire United States for five days or whatever. They have resisted having any liability whatsoever.
A California bill, SB 1047, was an effort to make the companies have some liability and to give some support for whistleblowers and things like that. And the companies leaned on the governor, and the governor didn't sign it. When I testified in Congress, Sam Altman was sitting next to me. Everybody in the room said, yeah, we've got to do something here. Well, the only thing they've done anything about is deepfake porn. Everything else, basically they have let go. They acknowledge the problem. They said we want to do better than we did with social media. But this is in fact looking like a worse version of social media.
There was a provision in Trump’s domestic policy bill that would have essentially banned states from regulating AI, but it was removed. Do you think that’s a cause for optimism?
I mean, it’s a sign. Here’s a public prediction I have made, which is that 2028 will be the first national general election in the United States in which AI is actually a major issue.
In the last general election, AI was barely mentioned. But in 2028, it’s going to come up for a lot of reasons. In general, the public is worried about AI. The government is really not doing anything about it. The public is worried, rightly, about the effects on employment. They should be worried about things like discrimination. They should be worried about what happens to artists, because if artists are screwed, then probably everybody’s going to be screwed if no intellectual property is protected here. We may see more accidents with driverless cars and so forth.
Maybe the failure of that stupid moratorium is a reflection that some of the senators are recognizing that they can't just do nothing here, that their constituents don't want that. I mean, three-quarters or something of the American public does want serious regulation around AI. The United States does not have that. And so, what we're doing is not really what the public wants. And we're starting to see a backlash to some extent around AI.
Another thing that’s going to happen is that many domestic services are now going to be run by AI rather than people. You remember a few years ago when you’d have these voicemail jails, we would call them, where you call some system and it would be incredibly frustrating. You’d start screaming, “I want a person!” Well, now imagine that you have that experience, but it’s squared with getting your Social Security check. Some people are going to be pretty upset about that. They’re going to say, you know, life is just harder than it used to be, because now I have to deal with these stupid AI systems, and they’re making everything harder, and there’s going to be pushback.
What are the things you worry about the most with government using AI?
One class of things is quality of services. Another class of things is, it appears that these systems are going to be used in military decision-making. There’s a serious possibility that people will be accidentally killed. Another class of worries is if these things are put into safety-critical situations and they’re not really ready for that.
I’ll give you one more related [issue], which is, these things are increasingly being used to write code, and the code is insecure. These systems don’t really understand security that well. Also, the code that they write is hard to maintain. Not that a lot of government code is written very well in the first place. But there’s also a risk that, with certain kinds of infrastructure-related things, we’ll see more hacks and stuff like that. Now, a lot of that’s already going on. It’s not very well reported to the public. And so, whether we get good data about how much worse it’s gotten, I don’t know. It may be difficult to verify, but I anticipate that we will see even more cyber-attacks that are effective because the code isn’t well written.
Is that documented, that the government has used AI for military decision-making or writing code?
I don’t know how well it’s publicly documented, but they’re making deals with companies like Palantir and Anduril and OpenAI where that’s pretty clearly the intention.
So many different large language models have hallucinated or turned up misinformation. Is it possible to make one that’s more reliably accurate? And do you think companies are incentivized to do that?
I first warned about hallucinations in my 2001 book, and I said that it was inherent to the way that these systems work. I have not seen a lick of evidence in the subsequent quarter century that neural network-based solutions as we know them today can solve this problem. And in fact, [OpenAI’s] o3 hallucinates more than o1. It’s not clear we’ve made progress there.
I’ll take a step back. There’s a kind of belief out there that all of this stuff gets better all the time. But the reality is, it’s gotten better in some ways and not others. So, for example, the video generation software looks much more lifelike than it did two years ago. But hallucinations, that’s been a much harder problem to solve.
There are all kinds of techniques people have tried. It’s not as if people are unaware of the problem, but I think it’s inherent to LLMs.
At the end of your tweet from before, you said this is “not the AI we are looking for.” What’s the AI we’re looking for?
I think we should be looking for an AI that does the things that we were always promised, which is to help us with science, technology, medicine and so forth, but in a reliable way. If you look back at our dreams of AI from, say, the Star Trek computer of the '60s, nobody imagined that it was going to absurdly apologize after making stupid mistakes and then make those stupid mistakes again. That was not part of the picture. And it shouldn't be part of the picture. AI shouldn't work that way.
We should figure out how to make it trustworthy. If I ask my calculator something, I know it's going to get the right answer. We should be trying to make AI that we know gives us the right answers. It turns out that building it on black boxes, which is what large language models are, is just the wrong paradigm: you can't understand exactly what they're going to do on any given occasion, and it's all a crapshoot based on how similar your query is to what they happened to have been trained on. That doesn't mean we can't invent a better one, but what we're doing now is not quite what we need.
I think that what Musk ultimately wants to do is quite Orwellian. He wants to shift the models so that they basically behave like him, to give his perspective. I’m not saying that endorsing Hitler is what he specifically wants to do. But he does want the systems to speak his truth — not the truth of randomly assembled people on the Internet, but what he believes to be true.
Consider another study, done by a guy named Mor Naaman, who's at Cornell. What that study showed is that what LLMs tell you can subtly influence your beliefs. Combine that with people who are trying to build devices that monitor you 24/7. The OpenAI collaboration with Jony Ive, I think, is to try to build a device that basically feeds everything you say into an LLM, maybe with a camera. And so we're headed towards a world that is straight out of 1984, but more technologically advanced, so you're constantly monitored. And you talk to these systems, and whoever owns those systems gets to decide, do they tilt left? Do they tilt right? Do they tilt authoritarian or anti-authoritarian? How candid are they or not?
I think Elon is exploring the space of how much he can manipulate the model and what happens when he does. I think he's trying to see how much he can shape what Grok says, and he's also already experimenting with having Grok be part of the conversation. It's now part of the texture of X, and he's trying to control what its politics will be like. That is going to influence people. I find that to be incredibly Orwellian. And who decides all of that stuff? Elon does.