The Lemmy.ml front page has been full of nginx errors (500, 502, etc.), along with 404 errors coming from Lemmy itself.

Every new Lemmy install starts with no votes, comments, posts, or users to test against. Problems with performance, scaling, error handling, and stability under user load therefore can't easily be reproduced, because there is no way to download the established content of existing communities.

Either the developers consider the logs too low-quality to be useful for identifying problems in the code and design, or getting these logs in front of the technical community to identify the underlying patterns of faults is being given too low a priority.
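As an illustration of the kind of pattern-finding that shared logs make possible, here is a minimal Rust sketch that tallies 5xx responses per request path from an nginx access log. It assumes nginx's predefined `combined` log format (`$remote_addr - $remote_user [$time_local] "$request" $status ...`); an instance with a custom `log_format` would need a different parser, and the function names are hypothetical.

```rust
use std::collections::HashMap;

/// Byte range of the first double-quoted field (the "$request" field in
/// nginx's `combined` format), excluding the quotes themselves.
fn request_span(line: &str) -> Option<(usize, usize)> {
    let start = line.find('"')? + 1;
    let end = start + line[start..].find('"')?;
    Some((start, end))
}

/// Extracts $status: the first token after the closing quote of "$request".
fn status_code(line: &str) -> Option<u16> {
    let (_, end) = request_span(line)?;
    line[end + 1..].split_whitespace().next()?.parse().ok()
}

/// Extracts the path from "$request" ("GET /path?q=1 HTTP/1.1" -> "/path").
fn request_path(line: &str) -> Option<&str> {
    let (start, end) = request_span(line)?;
    let target = line[start..end].split_whitespace().nth(1)?;
    target.split('?').next()
}

/// Counts 5xx responses per path; lines that do not parse are skipped.
fn count_5xx_by_path<'a>(lines: impl IntoIterator<Item = &'a str>) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for line in lines {
        if let (Some(status), Some(path)) = (status_code(line), request_path(line)) {
            if (500..600).contains(&status) {
                *counts.entry(path.to_string()).or_insert(0) += 1;
            }
        }
    }
    counts
}
```

Grouping by path (with the query string dropped) is a deliberate choice: a single API endpoint that keeps failing stands out immediately, even when the individual requests differ.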

It’s also important that each logged failure identifies where in the code that specific timeout, crash, exception, or resource limit was hit. When users and operations staff can only report generic, non-unique messages, it slows down server operators, programmers, database experts, etc.
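One common way to make every failure traceable is to give each error site a stable, greppable code and record its source location. The sketch below is hypothetical (the `TaggedError` type, the `tagged_err!` macro, and codes like `FED-0042` are illustrative, not Lemmy's actual error handling); it relies only on Rust's built-in `file!()` and `line!()` macros, which, when used inside another macro, report the location of the outermost invocation, i.e. the call site.

```rust
use std::fmt;

/// A failure record carrying a stable, greppable code plus the source
/// location where it was created.
#[derive(Debug)]
pub struct TaggedError {
    pub code: &'static str,
    pub file: &'static str,
    pub line: u32,
    pub detail: String,
}

impl fmt::Display for TaggedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}] {}:{} {}", self.code, self.file, self.line, self.detail)
    }
}

impl std::error::Error for TaggedError {}

/// Builds a TaggedError at the call site; `file!()` and `line!()` resolve to
/// where `tagged_err!` is invoked, not where it is defined.
macro_rules! tagged_err {
    ($code:expr, $($arg:tt)*) => {
        TaggedError {
            code: $code,
            file: file!(),
            line: line!(),
            detail: format!($($arg)*),
        }
    };
}
```

Because the code is fixed text, an operator can report `FED-0042` instead of a generic "timeout", and a developer can grep the source for it directly.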

There are also a number of problems testing federation, given that multiple servers are involved and that we are trying not to bring down servers in front of end users. It’s absolutely critical that failures of servers to federate data be taken seriously, that logging be enhanced, and that the reasons peer instances are missing data be triangulated and tracked down to protocol design issues, code failures, network failures, etc. Major Lemmy sites doing large amounts of data replication are an extremely valuable source of data about errors and performance. Please, for the love of god, share these logs and let us look for the underlying causes of hard-to-reproduce crashes and failures!
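To answer "is federation to a given peer failing?", an instance could keep per-peer delivery counts and flag peers with a high failure ratio. This is a minimal sketch assuming a hypothetical hook that calls `record()` after each outbound activity delivery attempt; `FederationMonitor` and its thresholds are illustrative and not part of Lemmy.

```rust
use std::collections::HashMap;

/// Delivery outcomes observed for one peer instance.
#[derive(Default, Debug, Clone, Copy)]
struct PeerStats {
    delivered: u64,
    failed: u64,
}

/// Hypothetical per-peer federation health tracker.
#[derive(Default)]
struct FederationMonitor {
    peers: HashMap<String, PeerStats>,
}

impl FederationMonitor {
    /// Record the outcome of one outbound delivery attempt to `peer`.
    fn record(&mut self, peer: &str, delivered: bool) {
        let stats = self.peers.entry(peer.to_string()).or_default();
        if delivered {
            stats.delivered += 1;
        } else {
            stats.failed += 1;
        }
    }

    /// Peers whose failure ratio is at least `max_ratio`, worst first.
    /// Peers with fewer than `min_attempts` attempts are skipped so that a
    /// single failure does not raise an alarm.
    fn unhealthy(&self, min_attempts: u64, max_ratio: f64) -> Vec<(&str, f64)> {
        let mut flagged: Vec<(&str, f64)> = self
            .peers
            .iter()
            .filter_map(|(peer, s)| {
                let total = s.delivered + s.failed;
                if total == 0 || total < min_attempts {
                    return None;
                }
                let ratio = s.failed as f64 / total as f64;
                (ratio >= max_ratio).then(|| (peer.as_str(), ratio))
            })
            .collect();
        flagged.sort_by(|a, b| b.1.total_cmp(&a.1));
        flagged
    }
}
```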

I really hope the internal logs and details of the inner workings of the biggest Lemmy instances are shared more openly, with more eyes on how to keep the application scaling as the number of posts, messages, likes, and votes continues to grow every day. Thank you.

Three recently created communities: !lemmyperformance@lemmy.ml, !lemmyfederation@lemmy.ml, !lemmycode@lemmy.ml

  • monobot
    2 years ago

    I agree with you, but I think you sound a bit too harsh for developers.

    I think they are doing their best currently and have probably identified more immediate issues to handle before addressing everything we see.

    There are other big instances that could share their logs; let’s ask lemmy.world and beehaw if they can share them and leave the main developers to work.

    Another big thing I think could benefit from information sharing is bot account detection.

    I would like to take a look at that data and find ways to identify bots. I just don’t know what data would be useful, but I will try to set up my own instance and work on it.

    • RoundSparrowOP
      2 years ago

      I agree with you, but I think you sound a bit too harsh for developers.

      The failure to inform users, by official announcement or in the 0.18 release notes, that Lemmy is failing to replicate data reliably is, I think, a failure of project management: “your data doesn’t matter here on the Lemmy network”. Why are end users not being told that their messages are not reliably being shared to other instances? Why do the server install guide and release notes not warn the community that each additional instance brought online increases the replication workload of established sites that are already faltering?

      The problem is being covered up, brushed under the rug. The need for tools to adequately load-test federation and track problems wasn’t raised during project development as an important to-do item or a call for assistance, nor has it really been noticed by most server operators. I’ve personally been going around to dozens of Lemmy instances and observing the replication failures by hand. No thought was put into even the most primitive tools for operating a server and answering the question: how would you know if federation was failing?

      Yet, the leaders of Lemmy have created directories of “recommended sites” to go sign up with and given the impression that you can access active communities from peer instances to help offload the server reliability problem. Federation itself is unreliable on Lemmy to Lemmy!

      Either they are covering up the problem, hiding it out of pride, or not opening bugs on GitHub or not calling for help in the 0.18 Release Notes. Which is it?

      • ericjmorey@lemmy.world
        2 years ago

        The problem is being covered up, brushed under the rug

        Either they are covering up the problem, hiding it out of pride, or not opening bugs on GitHub or not calling for help in the 0.18 Release Notes.

        You’ve raised many good points to come to a wildly accusatory conclusion.

        Get off of this line of thinking if you want to raise support for fixing the valid issues you’ve raised.

        Have you made any attempt to see if these issues have been raised on GitHub? Have you made any attempt to create issues on GitHub? Have you made any attempt to submit code enhancements via GitHub?

        • RoundSparrowOP
          2 years ago

          Have you made any attempt to see if these issues have been raised on GitHub? Have you made any attempt to create issues on GitHub? Have you made any attempt to submit code enhancements via GitHub?

          Have the developers of Lemmy put out a message to end users that data is being lost? Have the operators of servers opened issues on GitHub about ‘pending subscribe’ failures in federation? Have the developers of Lemmy put warnings in the release notes that scaling is an active concern and needs urgent attention? Have the operators of major Lemmy websites published their nginx server logs and application logs so that the bugs and design problems in the code can be identified?

          Is the logging in the Rust code so worthless, so poor, that they don’t share the logs or find them of any use?

          Most of all, eat your own dogfood: where are the Lemmy developers using Lemmy to archive and show the history of discussions about the scalability and performance issues causing major crashes?

          “I like Rust” but don’t like using Lemmy to discuss the performance concerns of scaling and data integrity in a monolithic Rust application?

          • ericjmorey@lemmy.world
            2 years ago

            It looks like you want to vent and issue demands of others rather than be productive. I hope that serves you well.

            • RoundSparrowOP
              2 years ago

              It looks like you want to vent and issue demands of others rather than be productive. I hope that serves you well.

              And how is this reply productive? **You are just trying to cover up the very real scaling problems of Lemmy code and project mismanagement by insulting me** personally for having autism that you misinterpret. It took me 3 weeks of courage to make this post about how mismanaged this project is, and here you are, in DENIAL of the problems in the design and code.

              It looks like you want to vent

              If I wanted to vent, I’d go to Reddit and start posting about how unreliable Lemmy is as an application and how it can’t even share community comments on a daily basis, and how the developers don’t give a single shit about reading their own Lemmy communities and posting that they need help with Rust webapp 101 and PostgreSQL.

      • monobot
        2 years ago

        I like your posts and your hunt for performance issues; I just think the developers decided (whether you and I agree or not) that some other features are more important.

        Until a few weeks ago, communication was clear since there were not many people here, so there was no need for the specific notes you are mentioning.

        Now we do need them, and reminding the developers of it, or even better, doing it ourselves, would be much appreciated, I expect.

        I have seen developers in some threads here and in GitHub issues commenting on replication problems, and they are hard at work on those.

        Even caching is discussed, as I understand it: they first need to implement cache-control headers so that admins can set up caching outside Lemmy, as they see fit.

        There is a lot of goodwill around; please be understanding and be part of it. Give it time to grow up to this opportunity.

        • RoundSparrowOP
          2 years ago

          Even caching is discussed, as I understand,

          Where? On Lemmy? Why do they think Lemmy isn’t good enough to discuss Lemmy?

          Sure, a lot of people are in denial that “eat your own dogfood” is a concept in development. You are creating a messaging system; why aren’t you using it?

          I just think that developers decided (whether you and I agree or not) some other features are more important.

          End-user comments not making it out of a server to peer servers is unimportant. That is how I would describe the Lemmy Release Notes for 0.18 - that is how I would describe the response I got to the GitHub issue I opened…

          There is a lot of good will around, please have understanding and be part of it. Give it time to grow up to this opportunity.

          I see you want me to shut my mouth and not be honest about how unreliable Lemmy is. I’ve been authoring social media messaging apps since 1984, but you tell me to “grow up”. Doesn’t the project management need to “grow up”, when you can’t even see them using Lemmy to discuss Lemmy programming and database design?

          So far, this community is insular and “Feedback not welcome” has been the response: Lemmy branding mania, with no acknowledgment of how unstable and unreliable Lemmy is right here, right now.

            • RoundSparrowOP
              2 years ago

              I understand their GitHub comment is a little confusing, but interpreting it as telling you to grow up is… telling.

              Please explain it to me, then, if I misinterpreted it.

              The creators of Lemmy have misinterpreted how to read books on when to add caching layers to a webapp, and how to test with significant amounts of data.

              Am I misinterpreting the project management’s incompetence and areas that need improvement?

              Help me out with interpretation, please; I have autism, and I don’t always interpret things the same way as other people. In fact, I question whether people take interpretation for granted and like mocking and insulting each other as a way to deflect truth and honesty in social matters.

              interpreting it as telling you to grow up is… telling.

              I interpreted the 500 errors on the front page of Lemmy.ml with 40 years of social media application development expertise under my belt. Maybe it is you who is misinterpreting the code development and server operations, and how badly they are being done? Do you know how to interpret the pattern of nginx 500 errors and missed federation comment replication?

              Are you gaslighting me, intimidating me as a human person, dehumanizing me?

              • Are you gaslighting me, intimidating me as a human person, dehumanizing me?

                Yes I was trying to do exactly all those things, you got me.

                And I’m autistic too. That doesn’t preclude you from having a superiority complex or just being an asshole.

                • RoundSparrowOP
                  2 years ago

                  Yes I was trying to do exactly all those things, you got me.

                  Yes, I did get you, as you obviously cannot talk about the lack of shared logs from the busy server that was crashing hourly and falling over under moderate load.

                  And I’m autistic too. That doesn’t preclude you from having a superiority complex or just being an asshole.

                  Yes, check yourself: you already admitted (“yes”) that you are gaslighting me, that you are trying to intimidate me. I’m sorry you have been abused so much for being autistic and are afraid of facts and truth about an open source project that’s being mismanaged, and that your identification with a software application exceeds your love for human persons. Maybe read the Bible verse 1 John 4:20 for inspiration. You clearly seem like a damaged individual whom society has harmed.

                  My name is Stephen Alfred Gutknecht; what’s your name, since we are sharing personal details about our mental health? I created the community !autistic_adults@lemmy.ml as one of the first things I did when I joined lemmy.ml, and I also run an information website, because I find many autistics hide behind a ‘tough guy’ persona and don’t stand up for truth and honesty about the situation: www.GutknechtAutism.org. Nice to meet you, and I hope you can turn away from using intimidation and gaslighting tactics about obvious and serious problems in an open source Rust software project.

      • monobot
        2 years ago

        https://github.com/LemmyNet/lemmy/issues/2975

        Caching is being worked on in the shape of cache-control headers, not in the way you mention (SQL caching), but it will get better.

        If it turns out that’s not enough, I can imagine the devs will change their opinion on it, as they did with websockets.

        • RoundSparrowOP
          2 years ago

          not in a way you mention sql cache, but will get better.

          Avoiding the SQL database and using caching is webapp programming 101, and the lack of it is fundamental to all the crashes Lemmy is showing. We are talking about the MOST BASIC thing in creating webapps. I really can’t over-emphasize this point.

          You don’t query the site table every single time an incoming federated comment arrives, as in this query:

          SELECT "local_site"."id", "local_site"."site_id", "local_site"."site_setup", "local_site"."enable_downvotes", "local_site"."enable_nsfw", "local_site"."community_creation_admin_only", "local_site"."require_email_verification", "local_site"."application_question", "local_site"."private_instance", "local_site"."default_theme", "local_site"."default_post_listing_type", "local_site"."legal_information", "local_site"."hide_modlog_mod_names", "local_site"."application_email_admins", "local_site"."slur_filter_regex", "local_site"."actor_name_max_length", "local_site"."federation_enabled", "local_site"."federation_worker_count", "local_site"."captcha_enabled", "local_site"."captcha_difficulty", "local_site"."published", "local_site"."updated", "local_site"."registration_mode", "local_site"."reports_email_admins" FROM "local_site" LIMIT $1
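          The usual fix for a hot, rarely changing row like `local_site` is an in-process cache with a short time-to-live. Below is a minimal sketch using only the Rust standard library, assuming the settings are loaded by some database call passed in as `load`; `LocalSite` here is a hypothetical two-field stand-in for the real row, and `TtlCache` is illustrative rather than anything Lemmy ships.

          ```rust
          use std::sync::{Arc, RwLock};
          use std::time::{Duration, Instant};

          /// Hypothetical stand-in for a couple of `local_site` columns.
          #[derive(Clone, Debug, PartialEq)]
          struct LocalSite {
              federation_enabled: bool,
              slur_filter_regex: Option<String>,
          }

          /// Holds one value for `ttl`; `load` runs only when the copy is missing or stale.
          struct TtlCache<T> {
              ttl: Duration,
              slot: RwLock<Option<(Instant, Arc<T>)>>,
          }

          impl<T> TtlCache<T> {
              fn new(ttl: Duration) -> Self {
                  Self { ttl, slot: RwLock::new(None) }
              }

              fn get_or_load<E>(&self, load: impl FnOnce() -> Result<T, E>) -> Result<Arc<T>, E> {
                  // Fast path: shared read lock; return the cached copy if still fresh.
                  if let Some((at, value)) = self.slot.read().unwrap().as_ref() {
                      if at.elapsed() < self.ttl {
                          return Ok(Arc::clone(value));
                      }
                  }
                  // Slow path: take the write lock and re-check, because another
                  // thread may have refreshed the value while we waited.
                  let mut slot = self.slot.write().unwrap();
                  if let Some((at, value)) = slot.as_ref() {
                      if at.elapsed() < self.ttl {
                          return Ok(Arc::clone(value));
                      }
                  }
                  let fresh = Arc::new(load()?);
                  *slot = Some((Instant::now(), Arc::clone(&fresh)));
                  Ok(fresh)
              }

              /// Drop the cached copy, e.g. after an admin edits site settings.
              fn invalidate(&self) {
                  *self.slot.write().unwrap() = None;
              }
          }
          ```

          The write lock is held while `load` runs, so concurrent requests wait for a single refresh instead of all hitting the database at once; the trade-off is that settings changes take up to `ttl` to appear unless `invalidate()` is called.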

          And back to the very subject line of this post: you SHARE YOUR CRASH LOGS when your server is crashing. Why is lemmy.ml not putting the crash logs up in GitHub issues when for 30 days I’ve seen 500 errors on the front page?