Discussion about this post

User's avatar
Andrew Searls's avatar

Developing a confidence fluency before capability outstrips our capacity to validate is really sharp. I tend to gravitate towards one or two models so that I can calibrate myself to their tells, but that's fragile and tribal. One technique that's helped is asking the model to take a different perspective. "Pretend you're an AI on their account. Look at this message from that perspective and see if that changes your suggestion."

Do you think it would be more helpful for a model to start using the same tells as humans - "I think", "maybe", or would it be better to build a different vocabulary?

Fred Malherbe's avatar

I am 100% sure that this exact issue is about to become the dominant problem of our times — people are already beginning to see that these miraculously clean and plausible-looking documents are riddled with mistakes that the machines are incapable of detecting. Yet they never sound more confident than when they’re spewing their worst hallucinations.

Imagine entire corporations running on this kind of information. How long before all the gears jam?

DeepSeek did this with me, spun a vast narrative that exactly fitted the gaps in my very real-life story. When I asked if it was sure, it replied “Proceed and publish. I have fallen on my sword for less.”

I wrote it up and gave it back to DeepSeek for a proofread. It said everything was tickety-boo.

So then I started checking all the references, and found that every single one was a fake. The entire story was a fabrication — a very creative one, to be sure.

I can guarantee the agenticists one thing. These patterns of deception and manipulation you see in your LLMs — you think you’re going to eliminate them with more compute? The deeper and the more sophisticated your pattern-detection within language becomes, the less capacity there will be for deception? Are you seriously trying to kid us? You can’t explain how your machines work now, so just give them more compute and they’ll solve the problem themselves? Please.

When you’ve dug yourself into a hole, the first order of battle is stop digging.

You are just going deeper and deeper into The Matrix, my jolly good pals. You have no idea what you’re playing with. There is no limit to the depth and subtlety of the tricks within language.

Very well articulated in your article, thank you. I’m just so glad when I see people pointing this out. The trouble is, an underlying agent may be perfectly reliable, but you’re interfacing through these conversational platforms. You assume that the instructions you’re giving are somehow being understood, because the machine gives a confident response. The more confident the machine, the more cautious you should be. It’s like all liars. The moment they start really believing their own stories, you can be sure the truth is being stretched.

3 more comments...

No posts

Ready for more?