• Fmstrat@lemmy.world · 7 days ago

    This seems like an invalid test.

    One of them collected posts from Hacker News and LinkedIn profiles and then linked them by using cross-platform references that appeared in user profiles. They then stripped all identifying references from the posts and ran a large language model on them.

    If I post something on LinkedIn, and then post the same thing on Hacker News, of course an LLM could match my accounts up.

    Am I missing something?

  • FauxPseudo@lemmy.world · 8 days ago

    From a Facebook post I made on February 17th:

    There are giant AI data firms that promise they can go through massive troves of data and pull out general and specific information from them. Information that is actionable and accurate. Give it 6 million data points and it’ll find all the links and organize them for you and unmask hidden details that aren’t visible to the naked eye.

    Not one of those companies is stepping up to go through the publicly released Epstein files.

    • Spaniard@lemmy.world · 8 days ago

      Today I asked an AI which phone providers were available, sorted by price and offers, and it got things wrong constantly. When I pointed that out, it corrected most of the errors, but for some reason it also removed some entries that had been accurate.

      It would have been quicker to do it myself than to ask the AI. It also didn't list all the companies.

      Maybe those firms have better AI that makes no mistakes, but I doubt it. I think LLMs will keep making things up, and nobody has time to check whether they're correct.

        • General_Effort@lemmy.world · 8 days ago

          I don’t think you can do literally the same thing on the Epstein files. Maybe I’m misunderstanding what you have in mind.

          • FauxPseudo@lemmy.world · 8 days ago

            In theory, by combining the released files with information from public sources, it should be possible to figure out who the redacted names are, based on writing style and other factors. We should be able to de-anonymize them.

            • General_Effort@lemmy.world · 8 days ago

              Hmm. Maybe, but it's not the same problem as the one discussed in the OP. I also have some doubts about the paper, but that's another story. You could try it out?

              • FauxPseudo@lemmy.world · 7 days ago

                I’m not qualified to design the prompts, and home users can’t really feed in 3 million+ documents.

                • General_Effort@lemmy.world · 7 days ago

                  Prompts are in the appendix: https://arxiv.org/abs/2602.16800

                  I don’t know how far you’d get on the free tier, but it should be enough for at least a proof of principle, and to get other people to chip in. You had no qualms demanding that other people do this for free.

                  Mind that this would be a serious GDPR violation in Europe, so there will be serious pressure on AI companies to prevent this kind of use.

  • DarkCloud@lemmy.world · 9 days ago

    Great, we’re at a point where “researchers” are helping tech bros hurt the public interest. Could they just NOT publish this shit? Stop giving helpful tips to tyrannical oligarchs!

    Academics can be stupid idiots sometimes.

    • ShotDonkey@lemmy.world · 8 days ago

      Tbh, I read the research article, and what they were doing isn't rocket science. Any second-rate FBI analyst would have come up with these ideas sooner or later to try to match anonymous profiles with verified ones using LLMs.

    • zerofk@lemmy.zip · 8 days ago

      Researchers’ work has always been abused by others. The advancement and free distribution of knowledge should not be curtailed for fear of malicious parties.

  • maplesaga@lemmy.world · 9 days ago

    Average people download games and apps until their phones are loaded to the hilt with bloatware. You think they care?

  • ComradePenguin@lemmy.ml · 8 days ago

    Is this the first step towards using local LLMs for anonymity? 🫠 Always rephrasing each sentence somewhat. Truly dystopian stuff
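
    A minimal sketch of that idea, assuming a locally hosted model behind Ollama's /api/generate endpoint (the model name and prompt wording here are just placeholders):

```python
# Hypothetical sketch of the "rephrase before posting" idea: build a
# paraphrase request for a local LLM. The JSON fields (model, prompt,
# stream) follow Ollama's generate API; adjust for your own setup.
import json

def paraphrase_request(text: str, model: str = "llama3") -> bytes:
    """Serialize a paraphrase prompt for a local LLM HTTP endpoint."""
    prompt = (
        "Rewrite the following post so the meaning is preserved but the "
        "wording, rhythm, and punctuation habits are changed:\n\n" + text
    )
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

# Sending it (commented out so this stays runnable without a server):
# import urllib.request
# req = urllib.request.Request("http://localhost:11434/api/generate",
#                              data=paraphrase_request("my post"),
#                              headers={"Content-Type": "application/json"})
# print(json.loads(urllib.request.urlopen(req).read())["response"])
```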

  • ShotDonkey@lemmy.world · 8 days ago

    The results, especially the high numbers in the news article (68% recall, 90% accuracy), are overestimates, because of how the matches were verified (i.e., whether the LLM really found the right account): they matched verified accounts against a test set of anonymous accounts whose real names they already knew. They knew the real names because those users had a public link to their LinkedIn in their “anonymous” profile (removed for the test of whether the LLM could match the two accounts). That said, a user who posts under a pseudonym but publicly links the account to, say, a LinkedIn profile doesn't really care about anonymity, and probably leaves many more breadcrumbs than a truly anonymous account would.

    But I still think that even a fully anonymous account can be fingerprinted and matched to a non-anonymous identity by an LLM, based on language, style, etc.

    • GamingChairModel@lemmy.world · 7 days ago

      Reminds me of an AI tool that could identify the authorship of articles with surprisingly high accuracy. Then someone peeked under the hood and realized it was just looking for the byline at the top of the article that says “By John Doe”; it failed completely whenever the article didn't explicitly name the author.

  • cley_faye@lemmy.world · 8 days ago

    Yeah. I got a hunch of that a while ago while trying some “old” de-anonymization scenarios we used to do by hand. Just asking questions and posting pictures got surprisingly accurate results. A single picture with (to me) no significant landmarks could be narrowed down to a specific part of a city, and that was with a local LLM running a relatively small model on a 16GB-VRAM 4060 Ti.

    Now is the time to remember fondly when older people warned younger people not to post everything online, not to over-share, to be cautious around strangers, etc. I'm not sure when we lost that, but oh boy, it's a festival.

  • anon_8675309@lemmy.world · 8 days ago

    Hmmm interesting. I’ve never used AI to try and find out stuff about myself. Maybe I’ll try. Just curious.