• Fmstrat@lemmy.world · 7 days ago

    This seems like an invalid test.

    One of them collected posts from Hacker News and LinkedIn profiles and then linked them by using cross-platform references that appeared in user profiles. They then stripped all identifying references from the posts and ran a large language model on them.

    If I post something on LinkedIn, and then post the same thing on Hacker News, of course an LLM could match my accounts up.

    Am I missing something?

  • FauxPseudo@lemmy.world · 8 days ago

    From a Facebook post I made on February 17th:

    There are giant AI data firms that promise they can go through massive troves of data and pull out general and specific information from them. Information that is actionable and accurate. Give it 6 million data points and it’ll find all the links and organize them for you and unmask hidden details that aren’t visible to the naked eye.

    Not one of those companies is stepping up to go through the publicly released Epstein files.

    • Spaniard@lemmy.world · 8 days ago

      Today I asked an AI which phone providers were available, sorted by price and offers, and it got things wrong constantly. When I pointed that out, it corrected most of the errors, but for some reason it also removed some entries that had been accurate.

      It would have been quicker to do it myself than to ask the AI. It also didn't list all the companies.

      Maybe those firms have better AI that makes no mistakes, but I doubt it. I think LLMs will keep making things up, and nobody has time to check whether they're correct.

        • General_Effort@lemmy.world · 8 days ago

          I don’t think you can do literally the same thing on the Epstein files. Maybe I’m misunderstanding what you have in mind.

          • FauxPseudo@lemmy.world · 8 days ago

            In theory, by combining the released files with information from public sources, it should be possible to figure out who the redacted names are, based on writing style and other factors. We should be able to de-anonymize them.

            • General_Effort@lemmy.world · 8 days ago

              Hmm. Maybe, but it's not the same problem as the one discussed in the OP. I also have some doubts about the paper, but that's another story. You could try it out?

              • FauxPseudo@lemmy.world · 7 days ago

                I’m not qualified to design the prompts, and home users can’t really feed in 3 million+ documents.

                • General_Effort@lemmy.world · 7 days ago

                  Prompts are in the appendix: https://arxiv.org/abs/2602.16800

                  I don’t know how far you’d get on the free tier, but it should be enough for at least a proof of principle, and to get other people to chip in. You had no qualms demanding that other people do this for free.

                  Mind that this would be a serious GDPR violation in Europe, so there will be serious pressure on AI companies to prevent this kind of use.

  • DarkCloud@lemmy.world · 9 days ago

    Great, we’re at a point where “researchers” are helping tech bros hurt the public interest. Could they just NOT publish this shit? Stop giving helpful tips to tyrannical oligarchs!

    Academics can be stupid idiots sometimes.

    • ShotDonkey@lemmy.world · 8 days ago

      Tbh, I read the research article, and what they were doing isn't rocket science. Any second-rate FBI analyst would have come up with these ideas sooner or later to try to match anonymous profiles with verified ones using LLMs.

    • zerofk@lemmy.zip · 8 days ago

      Researchers’ work has always been abused by others. The advancement and free distribution of knowledge should not be curtailed for fear of malicious parties.

  • maplesaga@lemmy.world · 9 days ago

    Average people download games and apps until their phones are loaded to the hilt with bloatware. You think they care?

  • ComradePenguin@lemmy.ml · 8 days ago

    Is this the first step towards using local LLMs for anonymity? 🫠 Always rephrasing each sentence somewhat. Truly dystopian stuff
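
    A minimal sketch of that idea, assuming a locally hosted model behind Ollama's /api/generate endpoint (the model name and prompt wording here are just placeholders):

```python
# Hypothetical sketch of the "rephrase before posting" idea: build a
# paraphrase request for a local LLM. The JSON fields (model, prompt,
# stream) follow Ollama's generate API; adjust for your own setup.
import json

def paraphrase_request(text: str, model: str = "llama3") -> bytes:
    """Serialize a paraphrase prompt for a local LLM HTTP endpoint."""
    prompt = (
        "Rewrite the following post so the meaning is preserved but the "
        "wording, rhythm, and punctuation habits are changed:\n\n" + text
    )
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

# Sending it (commented out so this stays runnable without a server):
# import urllib.request
# req = urllib.request.Request("http://localhost:11434/api/generate",
#                              data=paraphrase_request("my post"),
#                              headers={"Content-Type": "application/json"})
# print(json.loads(urllib.request.urlopen(req).read())["response"])
```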

  • ShotDonkey@lemmy.world · 8 days ago

    The results, especially the high numbers in the news article (68% recall, 90% accuracy), are overestimates, because of how the matches were verified (i.e., whether the LLM really found the right account): they matched verified accounts against a test set of anonymous accounts whose real names they already knew. They knew the real names because those users had a public link to their LinkedIn in their “anonymous” profile (removed for the test of whether the LLM could match the two accounts). That said, a user who posts under a pseudonym but publicly links the account to, say, a LinkedIn profile doesn't really care about anonymity, and probably leaves many more breadcrumbs than a truly anonymous account would.

    But I still think that even a fully anonymous account can be fingerprinted and matched to a non-anonymous identity by an LLM, based on language, style, etc.

    • GamingChairModel@lemmy.world · 7 days ago

      Reminds me of an AI tool that could identify the authorship of articles with surprisingly high accuracy. Then someone peeked under the hood and realized it was just looking for the byline at the top of the article that says “By John Doe”; it failed completely whenever the article didn't explicitly name the author.

  • cley_faye@lemmy.world · 8 days ago

    Yeah. I got a hunch of that a while ago while trying some “old” de-anonymization scenarios we used to do by hand. Just asking questions and posting pictures got surprisingly accurate results. A single picture with (to me) no significant landmarks could be narrowed down to a specific part of a city, and that was with a local LLM running a relatively small model on a 16GB-VRAM 4060 Ti.

    Now is the time to remember fondly when older people warned younger people not to post everything online, not to over-share, to be cautious around strangers, etc. I'm not sure when we lost that, but oh boy, it's a festival.

  • anon_8675309@lemmy.world · 8 days ago

    Hmmm interesting. I’ve never used AI to try and find out stuff about myself. Maybe I’ll try. Just curious.