Note: This post is now archived and as such no longer works.

An external image showing your user-agent and the total "hit count"

  • TriLinderOP
    arrow-up
    202
    arrow-down
    1
    ·
    1 year ago

    This is possible because Lemmy doesn’t proxy external images but instead loads them directly. While not all that bad, this could be used for spy pixels by nefarious posters and commenters.

    Note that the only thing that I willingly log is the “hit count” visible in the image, and I have no intention to misuse the data.

    • targetx@programming.dev
      arrow-up
      60
      arrow-down
      1
      ·
      1 year ago

      Nice example!

      I think proxying everything through Lemmy would have a pretty big bandwidth/scalability impact. I expect the Lemmy clients don’t send any unique user info on these image requests, so I’m not sure how useful it would be as a spy pixel. Maybe I’m missing something :-)

      • Goddard Guryon@sopuli.xyz
        arrow-up
        17
        ·
        edit-2
        1 year ago

        It would be interesting to see just how much info is shared when lemmy requests the image. If there is [potentially] sensitive info being shared, the devs might be interested in working on it too (I have no idea how to check such a thing, this comment is just so I can find the post later when more people have shared their wisdom on it)

        • Muddybulldog@mylemmy.win
          English
          arrow-up
          36
          ·
          edit-2
          1 year ago

          None (by Lemmy), as Lemmy doesn’t actually request the image (that would be proxying). Your browser requests the image directly by URL. Lemmy, technically, doesn’t even know an image exists. It just provides the HTML and lets your browser do the work.

          • A_A@lemmy.world
            arrow-up
            17
            ·
            edit-2
            1 year ago

            Exactly. The text of this post is simply:

            ![An external image showing your user-agent and the total "hit count"](https://trilinder.pythonanywhere.com/image.jpg)
            I get the same result when I browse directly to the link.

            So, if OP links a malicious website we have a problem … (?).
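
To make the point concrete, here is a rough sketch of what happens to that one line of Markdown. This is not Lemmy's actual renderer; `render_images` and the regex are made up for illustration. The point is only that the output is an `<img>` tag whose `src` is the original host, so the viewer's browser fetches it directly.

```python
import re
from html import escape

# Matches Markdown image syntax: ![alt text](url)
MD_IMAGE = re.compile(r'!\[(?P<alt>[^\]]*)\]\((?P<url>[^)\s]+)\)')


def render_images(markdown: str) -> str:
    """Turn each Markdown image into an <img> tag pointing at the original URL."""
    def to_tag(match: re.Match) -> str:
        alt = escape(match.group("alt"), quote=True)
        url = escape(match.group("url"), quote=True)
        return f'<img src="{url}" alt="{alt}">'
    return MD_IMAGE.sub(to_tag, markdown)


post = ('![An external image showing your user-agent and the total "hit count"]'
        '(https://trilinder.pythonanywhere.com/image.jpg)')
rendered = render_images(post)
print(rendered)
```

Nothing in that tag routes through the instance, which is why the image host sees each viewer's request.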

            • Goddard Guryon@sopuli.xyz
              arrow-up
              10
              ·
              1 year ago

              Oh dangit, it’s simpler than I thought. So the only data being sent is…just whatever is sent in your average GET request.

              • newIdentity@sh.itjust.works
                arrow-up
                13
                ·
                1 year ago

                Yes. It’s also a pretty standard way of serving images. A lot of email clients do that too.

                That’s also how these services that show you when an email is read work.

            • newIdentity@sh.itjust.works
              arrow-up
              7
              ·
              edit-2
              1 year ago

              Not really that huge of a problem. When making requests you also usually send a header which includes the user agent.

              The program just logs how many times the image has been requested and it reads the user agent data. No Javascript is actually executed.

              Well, it might be possible to have an XSS somehow, but I haven’t really done much research into this possibility.

              In general it’s a pretty standard way of handling embedded images. Email does this too. That’s how you have these services that can check if someone read a mail.

          • CoderKat@lemm.ee
            English
            arrow-up
            4
            ·
            1 year ago

            Yup. And to add, your browser will send things like:

            1. Your IP address. Technically this is sent by the OS doing networking and is unavoidable. At best, a VPN can hide this, because the VPN sits in the middle.

            2. Various basic request headers, which most notably contain the user agent (identifies the browser) and language headers, both of which you can fake if you want to.

            3. Cookies for that domain (if you have any). Those can track you across multiple requests and thus build up a profile of you.
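
A small sketch of how those three items look from the image host's side, assuming the host runs a Python WSGI app. `visible_to_image_host` is a hypothetical helper and the request values are invented (the IP is from a documentation range); the `REMOTE_ADDR` and `HTTP_*` keys are the standard WSGI names.

```python
from http.cookies import SimpleCookie


def visible_to_image_host(environ: dict) -> dict:
    """Collect the parts of a WSGI request that describe or identify the client."""
    cookies = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    return {
        "ip": environ.get("REMOTE_ADDR"),                  # 1. IP address
        "user_agent": environ.get("HTTP_USER_AGENT"),      # 2. request headers
        "language": environ.get("HTTP_ACCEPT_LANGUAGE"),
        "cookies": {name: morsel.value for name, morsel in cookies.items()},  # 3. cookies
    }


# Invented example of what a browser might send along with an image request.
example_request = {
    "REMOTE_ADDR": "203.0.113.7",
    "HTTP_USER_AGENT": "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
    "HTTP_ACCEPT_LANGUAGE": "en-CA,en;q=0.8",
    "HTTP_COOKIE": "visitor_id=abc123",
}
seen = visible_to_image_host(example_request)
print(seen)
```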

            • odbol@lemmy.world
              arrow-up
              1
              ·
              1 year ago

              That’s why you should use a native app, which won’t send any of that identifying info (except for IP but there’s nothing you can do on that)

    • ono@lemmy.ca
      English
      arrow-up
      24
      ·
      edit-2
      1 year ago

      Notably, this allows remote parties to associate your IP address with your interests, as revealed by the Lemmy communities that you browse.

      One way is for the image host to use the HTTP Referer field. (Standards-respecting web browsers pass the URL of the web page being viewed to the server hosting the image.)

      Another way is by posting an image with a unique URL.

      Even if Referer is withheld and the image is not unique, the image host can still do basic fingerprinting of your client’s request header and your OS’s TCP quirks, and associate that fingerprint with your IP address.
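
A hedged sketch of two of those techniques: handing out a unique image URL per place it is posted, and hashing request headers into a rough fingerprint. All names here (`unique_image_url`, `header_fingerprint`, `img.example`) are invented, the choice of headers is an assumption, and TCP-level quirks are out of scope because they are not visible at this layer.

```python
import hashlib
import secrets


def unique_image_url(base: str, issued: dict, posted_in: str) -> str:
    """Create an image URL with a random token and remember where it was posted."""
    token = secrets.token_urlsafe(8)
    issued[token] = posted_in
    return f"{base}?t={token}"


def header_fingerprint(headers: dict) -> str:
    """Hash a few headers that tend to differ between clients (illustrative choice)."""
    names = ("User-Agent", "Accept", "Accept-Language", "Accept-Encoding")
    joined = "\n".join(headers.get(name, "") for name in names)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


issued = {}
url_a = unique_image_url("https://img.example/pixel.png", issued, "post in c/privacy")
url_b = unique_image_url("https://img.example/pixel.png", issued, "post in c/linux")

firefox = {"User-Agent": "Mozilla/5.0 Firefox/115.0", "Accept-Language": "en-CA"}
jerboa = {"User-Agent": "okhttp/4.11.0", "Accept-Language": "en-US"}
print(url_a, header_fingerprint(firefox))
print(url_b, header_fingerprint(jerboa))
```

When a request for `url_a` arrives, the token alone tells the host which post the viewer was reading, with no Referer needed.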

      An option for Lemmy to proxy media would be very helpful. Small instances could perhaps disable it, although they might not need to, since the additional load would scale with the number of users on that instance.

      • PoliticalAgitator@lemm.ee
        arrow-up
        7
        ·
        edit-2
        1 year ago

        Notably, this allows remote parties to associate your IP address with your interests, as revealed by the Lemmy communities that you browse.

        I suspect with a coordinated pool of posts or multiple comments on the same post, you could narrow that IP address down to an actual user account.

        When a new comment is posted by a user, store, against their username, all IP addresses that visited since the last comment in that thread (by anyone). When a second comment is posted by a user, remove any IP addresses that don’t appear in both lists.

        I suspect you would have a very short list after two comments, and a single address after 3. It would also be extremely easy to both lure someone into viewing an image and bait them into multiple replies. Geolocate that IP and you now know vaguely where that user lives.

        Time to make sure you’re always on a VPN I guess.
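
The intersection idea described above can be sketched roughly as follows. It assumes the attacker can merge two logs into one timeline: image requests from their own server and comment timestamps from the public thread. `narrow_candidates` and the event format are invented for illustration, and the IPs are from a documentation range.

```python
def narrow_candidates(events):
    """events: chronological list of ("view", ip) or ("comment", username).

    Returns username -> set of IPs still consistent with that user.
    """
    candidates = {}
    recent_ips = set()  # IPs that loaded the image since the last comment by anyone
    for kind, value in events:
        if kind == "view":
            recent_ips.add(value)
        elif kind == "comment":
            if value in candidates:
                # Keep only addresses that appear in both lists.
                candidates[value] &= recent_ips
            else:
                candidates[value] = set(recent_ips)
            recent_ips = set()
    return candidates


events = [
    ("view", "198.51.100.1"), ("view", "198.51.100.2"), ("view", "198.51.100.3"),
    ("comment", "alice"),
    ("view", "198.51.100.2"), ("view", "198.51.100.4"),
    ("comment", "bob"),
    ("view", "198.51.100.1"), ("view", "198.51.100.5"),
    ("comment", "alice"),
]
result = narrow_candidates(events)
print(result)
```

In this toy timeline, alice is down to one address after her second comment, while bob still has two candidates.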

        • TriLinderOP
          English
          arrow-up
          4
          ·
          1 year ago

          You could also send the image through a DM if you want to find a particular user

        • ono@lemmy.ca
          English
          arrow-up
          3
          ·
          1 year ago

          Even without that, once your Lemmy interests are sold/shared by IP address, they can be associated with your real identity as soon as you log in to a service that knows who you are.

    • lazylion_ca@lemmy.ca
      arrow-up
      17
      ·
      1 year ago

      Were you expecting otherwise? Loading an external image is no different than loading an external website with images. Lemmy and reddit are link aggregators, not proxies. Having to proxy everything would run up significant bandwidth for instance admins, who are often paying out of pocket for hosting.

    • Anticorp
      arrow-up
      2
      ·
      edit-2
      1 year ago

      How do you get an image to run code? I guess I somehow missed something important in website development.

      Edit: I saw that you said you’re using Pillow to actually render the image from code. That’s neat! …and scary

      • CoderKat@lemm.ee
        English
        arrow-up
        3
        ·
        1 year ago

        Proxying external images means that instead of the image being downloaded from the original link, your Lemmy server would download it and serve it for you. The Lemmy server acts as a proxy.

        But it means generating a lot of extra traffic. And realistically you’d want to cache the image, because otherwise your server will likely get banned for the high volume of requests you send. But caching the images requires more storage and can have potential for legal issues.

        And images are one thing, but literally any content is the problem. Images are just the most obvious because they often load without even having to click on the image and thus you’ll get far higher volume of user data. Literally anything you link to has this issue and you cannot proxy all of it.
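
As a rough illustration of the proxy-plus-cache trade-off, here is a sketch in which the server fetches each image once and serves later requests from memory. `CachingImageProxy` is an invented name and not anything in Lemmy; the fetch function is injected so the example runs without network access (a real one could wrap `urllib.request.urlopen(url).read()`).

```python
import hashlib
from typing import Callable, Dict


class CachingImageProxy:
    """Fetch each external image once, then serve it from an in-memory cache."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch
        self._cache: Dict[str, bytes] = {}
        self.upstream_requests = 0  # how often the origin host was contacted

    def get(self, url: str) -> bytes:
        key = hashlib.sha256(url.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.upstream_requests += 1
            self._cache[key] = self._fetch(url)
        return self._cache[key]


def fake_fetch(url: str) -> bytes:
    # Stand-in for a real HTTP download.
    return b"image bytes for " + url.encode("utf-8")


proxy = CachingImageProxy(fake_fetch)
for _ in range(3):
    data = proxy.get("https://trilinder.pythonanywhere.com/image.jpg")
print(proxy.upstream_requests)
```

The origin host only ever sees the proxy's address, but the cache here grows without bound, which is the storage cost mentioned above.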

    • roon
      English
      arrow-up
      1
      arrow-down
      1
      ·
      1 year ago

      Share source code? I’m curious

      • TriLinderOP
        English
        arrow-up
        2
        ·
        1 year ago

        It’s just a simple Flask server. I parse the user-agent using the user_agents Python library, apply some conditionals upon the result, render the image using Pillow and send it to the user.
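
For anyone wanting to try the idea, a minimal sketch using only the Python standard library. It is not the OP's code: it swaps Flask for a bare WSGI app, skips the user_agents parsing and conditionals, and returns an SVG instead of an image rendered with Pillow. The names `app`, `serve`, and `hit_count` are made up.

```python
from html import escape
from wsgiref.simple_server import make_server

hit_count = 0  # kept in memory only; resets when the process restarts


def app(environ, start_response):
    """Return an SVG showing the caller's User-Agent and the running hit count."""
    global hit_count
    hit_count += 1
    user_agent = environ.get("HTTP_USER_AGENT", "unknown client")
    svg = (
        '<svg xmlns="http://www.w3.org/2000/svg" width="640" height="80">'
        f'<text x="10" y="30">You are using: {escape(user_agent)}</text>'
        f'<text x="10" y="60">Hit count: {hit_count}</text>'
        '</svg>'
    )
    body = svg.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "image/svg+xml"),
        ("Content-Length", str(len(body))),
        ("Cache-Control", "no-store"),  # ask clients not to reuse an old copy
    ])
    return [body]


def serve(port: int = 8000) -> None:
    """Run the app locally; call serve() yourself to start it."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Calling `serve()` and embedding the resulting URL as an image in a post would count one hit per request that reaches the server.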

  • Anticorp
    arrow-up
    56
    ·
    1 year ago

    Oh neat, Jerboa doesn’t identify itself. Cool.

  • rektifier@sh.itjust.works
    arrow-up
    55
    ·
    edit-2
    1 year ago

    I’m fine with this. Instances shouldn’t proxy or cache images because it opens instance owners to a lot more liability than text. A client side setting to not load images in comments by default is better.
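
One way such a client-side setting could work, sketched under the assumption that the client sees the comment as Markdown before rendering: with the setting off, each image becomes a plain link that is only fetched if the user clicks it. `apply_image_setting` is a hypothetical name, not an existing option in any Lemmy client.

```python
import re

# Matches Markdown image syntax: ![alt text](url)
MD_IMAGE = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)\)')


def apply_image_setting(markdown: str, load_images: bool) -> str:
    """With load_images off, replace inline images with ordinary links."""
    if load_images:
        return markdown

    def to_link(match: re.Match) -> str:
        alt = match.group(1) or "image"  # fall back to a label if alt is empty
        return f"[{alt}]({match.group(2)})"

    return MD_IMAGE.sub(to_link, markdown)


comment = "Look: ![cat](https://example.com/cat.jpg)"
print(apply_image_setting(comment, load_images=False))
```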

    • FancyFeaster@lemmy.fail
      English
      arrow-up
      7
      ·
      1 year ago

      Each instance stores post thumbnails locally even if the post was on another server. It actually takes up quite a bit of hdd space.

  • edric@lemm.ee
    arrow-up
    51
    ·
    edit-2
    1 year ago

    • Mlem - knows exactly that it’s Mlem.
    • Memmy - sees Mobile Safari webkit.
    • Voyager - same as Memmy.
    • Thunder - just sees Mobile Client.

  • Zetaphor@zemmy.cc
    English
    arrow-up
    35
    arrow-down
    1
    ·
    1 year ago

    Salient demonstration, but if image proxying were to come to Lemmy I’d hope it was made optional, as it could overburden smaller instances, especially one-person instances (like mine). We also need a simple integrated way of configuring object storage.

    • ReversalHatchery@beehaw.org
      English
      arrow-up
      3
      ·
      1 year ago

      A better solution could be having an image proxy as a separate service, and somehow managing a list of proxies that are used for loading the image. Of course the clients themselves would have to deal with choosing to use the proxy… except if the backend serves the proxied image URL instead of the original one (and maybe that too under a new name)
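
The last variant, where the backend serves a proxied URL in place of the original, might look something like this sketch. The proxy address `https://proxy.example/image` and its `url` query parameter are assumptions, as is the function name `proxy_image_urls`.

```python
import re
from urllib.parse import quote

# Matches Markdown image syntax: ![alt text](url)
MD_IMAGE = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)\)')


def proxy_image_urls(markdown: str, proxy_base: str) -> str:
    """Rewrite every Markdown image so it is fetched via proxy_base instead."""
    def rewrite(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        # Percent-encode the whole original URL so it survives as a query value.
        return f"![{alt}]({proxy_base}?url={quote(url, safe='')})"
    return MD_IMAGE.sub(rewrite, markdown)


original = "![hit counter](https://trilinder.pythonanywhere.com/image.jpg)"
rewritten = proxy_image_urls(original, "https://proxy.example/image")
print(rewritten)
```

Clients would then need no changes at all, since they simply load whatever URL the backend hands them.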

  • Forcen@lemmy.one
    English
    arrow-up
    20
    arrow-down
    1
    ·
    edit-2
    1 year ago

    The easiest way to stop this from happening is to use uBlock Origin to block all third-party requests on your instance.

    One way to do this is via dynamic filtering. This is for advanced users so be sure to read the info page: https://github.com/gorhill/uBlock/wiki/Dynamic-filtering

    (Consider backing up your ublock settings before doing this)

    If you are using lemmy.ml your rule would be this:

    lemmy.ml * 3p block
    

    If you’re using another instance, then change the domain, or use both rules, because you might end up visiting the others as well. Note that adding this rule won’t work unless you enable advanced features in uBlock Origin.

    EDIT: THIS MIGHT BREAK THINGS ON YOUR INSTANCE. It’s recommended to learn how to use dynamic filtering to unbreak it: https://github.com/gorhill/uBlock/wiki/Dynamic-filtering:-quick-guide If it breaks stuff, just remove that rule.

    You could also block it using static filters but I can’t remember how to do that exactly, if you know please reply below.

  • minorsecond@lemm.ee
    English
    arrow-up
    16
    ·
    1 year ago

    I’ll be damned. I tried this from three different platforms and you’ve nailed it.

    • _I_@lemmy.world
      arrow-up
      1
      ·
      1 year ago

      Yeah, I’m using Mullvad with misc DNS blockers enabled so it has nothing on me ᕕ( ᐛ )ᕗ

    • sfgifz@lemmy.world
      arrow-up
      6
      ·
      edit-2
      1 year ago

      It says unknown (mobile?) client for me too, using Sync with Bluetooth and location enabled and Play Store Services installed.

      Whoever wrote that image tracking over-hyped it?

      • TriLinderOP
        English
        arrow-up
        7
        ·
        1 year ago

        The user-agent detection definitely isn’t great; this was just meant as a quick proof of concept for anyone curious.

        • Anticorp
          arrow-up
          3
          ·
          1 year ago

          It successfully identified Firefox when I checked it from the browser. Maybe some of the apps don’t identify themselves in the useragent string?

  • jozo@lemmy.sdf.org
    arrow-up
    13
    ·
    1 year ago

    What does it say? On Jerboa it states that I use an unknown mobile client; with Infinity, an Android client. All I have is AdAway on my phone.

  • ares35@kbin.social
    arrow-up
    12
    ·
    1 year ago

    for a little extra creepiness, modify the image-generating script to add geoip location data and http referer to the image.