• MalReynolds@slrpnk.net
    English
    arrow-up
    64
    arrow-down
    1
    ·
    edit-2
    5 months ago

    When you need speed in Python, after profiling, checking for errors, and making damn sure you actually need it, you code the slow bit in C and call it.

    When you need speed in C, after profiling, checking for errors, and making damn sure you actually need it, you code the slow bit in Assembly and call it.

    When you need speed in Assembly, after profiling, checking for errors, and making damn sure you actually need it, you’re screwed.

    Which is not to say faster Python is unwelcome, just that IMO its focus is frameworking, prototyping or bashing out quick and perhaps dirty things that work, and that’s a damn good thing.
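For the "after profiling" step, a minimal sketch using the standard library's `cProfile` and `pstats` could look like the following. The function names (`slow_square_sum`, `cheap_setup`, `main`) are made up for illustration; the point is only to show how the slow bit shows up in the report before anyone reaches for C.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    """Deliberately naive hot spot: sums squares one at a time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup():
    """Stand-in for the parts of a program that are already fast enough."""
    return list(range(10))


def main():
    cheap_setup()
    return slow_square_sum(200_000)


# Profile main() and keep the statistics in memory.
profiler = cProfile.Profile()
result = profiler.runcall(main)

# Sort by cumulative time so the most expensive call chains come first.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

In a report like this, the function that dominates cumulative time is the candidate for rewriting in a faster language; everything else can probably stay in Python.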

    • Dave.@aussie.zone
      arrow-up
      21
      ·
      edit-2
      5 months ago

      Generally I bash together the one-off programs in Python and if I discover that my “one off” program is actually being run 4 times a week, that’s when I look at switching to a compiled language.

      Case in point: I threw together a python program that followed a trajectory in a point cloud and erased a box around the trajectory. Found a python point cloud library, swore at my code (and the library code) for a few hours, tidied up a few point clouds with it, job done.

      And then other people in my company also needed to do the same thing and after a few months of occasional use, I rewrote it using C++ and Open3D. A few days of swearing this time (mainly because my C++ is a bit rusty, and Open3D’s C++ interface is a sparsely-documented back end to their main python front end).

      End result though is that point clouds that took 3 minutes to process before in python now take 10 seconds, and now there’s a visualisation widget that shows the effects of the processing so you don’t have to open the cloud in another viewer to see that it was ok.

      But anyway, like you said, python is good for prototyping, and when you hash out your approach and things are fairly nailed down and now you’d like some speed, jump to a compiled language and reap the benefits.

      • sugar_in_your_tea@sh.itjust.works
        arrow-up
        4
        ·
        5 months ago

        Python is also pretty good for production, provided you’re using libraries optimized in something faster. Is there a reason you didn’t use Open3D’s Python library? I’m guessing you’d get close to the performance of the C++ code in a lot less development time.

        That said, if you’re doing an animation in 3D, you should probably consider a game engine. Godot w/ GDScript would probably be good enough, though you’d spend a few days learning the engine (but the next project would be way faster).

        If you’re writing a performance-critical library, something compiled is often the better choice. If you’re just plugging libraries together, something like Python is probably a better use of your time, since the vast majority of CPU time can generally be spent inside a library.

    • souperk@reddthat.com
      arrow-up
      15
      ·
      edit-2
      5 months ago

      While I agree with most of what you say, I have a personal anecdote that highlights the importance of performance as a feature.

      I have a friend that studies economics and uses python for his day to day. Since computer science is not his domain, he finds it difficult to optimize his code, and learning a new language (C in this case) is not really an option.

      Some of his experiments take days to run, and this is becoming a major bottleneck in his workflow. Being able to write faster code without relying on C is going to have a significant impact on his research.

      Of course, there are other ways to achieve similar results. For example, another friend is working on DIAS, a framework that optimizes pandas at runtime. But the point still stands: there are a tonne of researchers relying on Python to get quick and dirty results, and performance plays a significant role in that when the volume of data is huge.

      • MalReynolds@slrpnk.net
        English
        arrow-up
        8
        arrow-down
        1
        ·
        5 months ago

        Sure, I was being mildly facetious, but I was pointing to a better pattern. The nature of Python means it is, barring some extreme development, always going to be an order of magnitude slower than compiled code. If you’re not going to write even a little C, then you need to look for already-written C / FORTRAN / (SQL for data) / whatever that you can adapt to reap those benefits. Perhaps a general understanding of C and a good knowledge of what your Python is doing is enough to get a usable result from an LLM.

      • sugar_in_your_tea@sh.itjust.works
        arrow-up
        6
        ·
        edit-2
        5 months ago

        I have an alternative anecdote.

        My coworker has a Ph.D. in something domain-specific, and he wrote an app to do some complex simulation. The simulation worked well on small inputs (like 10), but took minutes on larger inputs (~100). We wanted to support very large inputs (1000+), but the program would get killed with out-of-memory errors.

        I (CS background) looked over the code and pointed out two issues:

        • bubble sort in a hot path
        • allocated all working memory at the start and used 4D arrays, when 3D arrays and a 1D result array would’ve sufficed (O(n^4) -> O(n^3))

        Both problems would have been solved had they used Python, but they used Fortran because “fast,” even though it doesn’t have a builtin sort or data structures. Python provides classes, sortable lists (with Timsort!), etc, so they could’ve structured their code better and avoided the architectural mistakes that caused runtime and memory to explode. Had they done that, I could’ve solved performance problems by switching lists to numpy arrays and throwing numba on the hot loops and been done in a day, but instead we spent weeks rewriting it (nobody understands Fortran, and that apparently included the original dev).

        Python lets you focus on the architecture. Compiled languages often get you stuck in the weeds, especially if you don’t have a strong CS background and just hack things until it works.
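To give a feel for the bubble sort issue, here is a rough sketch comparing a hand-rolled bubble sort with Python's built-in `sorted()`. The data size (2,000 random floats) and the `bubble_sort` helper are assumptions for illustration, not the original Fortran code.

```python
import random
import timeit


def bubble_sort(values):
    """O(n^2) comparison sort, similar in spirit to the one in the hot path."""
    items = list(values)
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:
            break  # no swaps in this pass, so the list is already sorted
    return items


random.seed(0)
data = [random.random() for _ in range(2_000)]

# Both approaches must agree on the result.
assert bubble_sort(data) == sorted(data)

bubble_seconds = timeit.timeit(lambda: bubble_sort(data), number=1)
builtin_seconds = timeit.timeit(lambda: sorted(data), number=1)
print(f"bubble sort: {bubble_seconds:.4f} s, sorted(): {builtin_seconds:.6f} s")
```

The exact timings depend on the machine, but the gap grows quickly with input size because the built-in sort is O(n log n) while bubble sort is O(n^2).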

    • sugar_in_your_tea@sh.itjust.works
      arrow-up
      13
      ·
      edit-2
      5 months ago

      I’d really like to see Rust fit in where C(++) does now for Python. I know some libraries do it (e.g. Pydantic), but it really should be more common. It should work really well with the GIL… (or the TIL or whatever the new one is)

        • sugar_in_your_tea@sh.itjust.works
          arrow-up
          3
          ·
          5 months ago

          Well, it is happening, I just don’t know how “blessed” it is by Python maintainers (i.e. are Python releases blocked by Rust binding updates?). It’s 100% possible today and there are projects that use Rust bindings, I just don’t know how that fits in with Python development vs the C++ API.

      • barnaclebutt@lemmy.world
        arrow-up
        4
        ·
        edit-2
        5 months ago

        Or you could use Cython, which is much easier to integrate with a Python project. It is only marginally slower than Rust but a little less safe. NumPy-based libraries are usually fast. Numba is a little clunky, but can also speed up code. There are lots of options to speed up Python code.

        • sugar_in_your_tea@sh.itjust.works
          arrow-up
          2
          ·
          5 months ago

          Yup, Cython rocks.

          You can also use numba if you just need to accelerate one part of the app. We did that with a heavy part of the app and our naïve Python (using numpy) was about as fast as our naïve Rust, but only when we turned on parallel processing in numba (I could’ve easily beaten it with parallel Rust, but that requires extra work and wouldn’t fit as nicely into the rest of the app).

  • Diplomjodler@lemmy.world
    arrow-up
    35
    ·
    5 months ago

    In all the stuff I do in Python, runtime is not a consideration at all. Developer productivity is far more of a bottleneck. Having said that, I do of course see the value in these endeavours.

    • FizzyOrange@programming.dev
      arrow-up
      12
      arrow-down
      3
      ·
      5 months ago

      If everyone had a magic lamp that told them whether performance was going to be an issue when they started a project then maybe it wouldn’t matter. But in my experience people start Python projects with “performance doesn’t matter”, write 100k lines of code and then ask “ok it’s too slow now, what do we do”. To which the answer is “you fucked up, you shouldn’t have used Python”.

      • sugar_in_your_tea@sh.itjust.works
        arrow-up
        8
        arrow-down
        1
        ·
        5 months ago

        No, it’s usually “microservices” or “better queries” or something like that. Python performance shouldn’t be an issue in a well-architected application. Source: I work on a project with hundreds of thousands of lines of Python code.

        • Valmond@lemmy.world
          arrow-up
          4
          ·
          5 months ago

          100k lines of code doesn’t mean anything.

          You can make 1k lines of Python bog down your shiny new PC, and you can have 1M lines run just fine.

          • sugar_in_your_tea@sh.itjust.works
            arrow-up
            2
            ·
            edit-2
            5 months ago

            Exactly. We have hundreds of thousands of lines of code that work reasonably well. I think we made the important decisions correctly, so performance issues in one area rarely impact others.

            We rewrote ~1k lines of poorly running Fortran code into well-written Python code, and that worked because we got the important parts right (reduced big-O CPU from O(n^3) to O(n^2 log n) and memory from O(n^4) to O(n^3)). Runtime went from minutes to seconds in medium size data sets, and made large data sets possible to run (those would OOM due to O(n^4) storage in RAM). If you get the important parts right, Python is probably good enough, and you can get linear optimizations from there by moving parts to a compiled language (or use a JIT like numba). Python wasn’t why we could make it fast, it’s just what we prototyped with so we could focus on the architecture, and we stopped optimizing when it was fast enough.
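As a back-of-the-envelope illustration of why the fourth power runs out of memory, the sketch below assumes dense arrays with one 8-byte float per cell and every axis of length n. Both are assumptions for the arithmetic, not details of the actual program.

```python
BYTES_PER_FLOAT64 = 8  # assumption: one 8-byte float per array cell


def array_bytes(n, dimensions):
    """Bytes needed for a dense array with `dimensions` axes of length n."""
    return n ** dimensions * BYTES_PER_FLOAT64


def human_readable(num_bytes):
    """Format a byte count using decimal (power-of-1000) units."""
    units = ["B", "kB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1000 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


for n in (10, 100, 1000):
    four_d = human_readable(array_bytes(n, 4))
    three_d = human_readable(array_bytes(n, 3))
    print(f"n={n:>4}: 4D needs {four_d:>8}, 3D needs {three_d:>8}")
```

Under these assumptions, n = 1000 needs about 8 TB for a 4D array but about 8 GB for a 3D one, which is the difference between "gets killed" and "fits on a decent workstation".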

        • hark@lemmy.world
          arrow-up
          4
          ·
          5 months ago

          That depends on what the application needs to do. There’s a reason why all performance-critical libraries for Python aren’t written in Python.

          • sugar_in_your_tea@sh.itjust.works
            arrow-up
            3
            ·
            5 months ago

            Sure, and we use those, like numpy, scipy, and tensorflow. Python is best when gluing libraries together, so the more you can get out of those libraries, the better.

            Python isn’t fast, but it’s usually fast enough to shuffle data from one library to the next.

            • hark@lemmy.world
              arrow-up
              4
              ·
              5 months ago

              Usually, but when it isn’t then you’ve got a bottleneck. Multithreaded performance is a major weak point if you need to do any processing that isn’t handled by one of the libraries.

              • sugar_in_your_tea@sh.itjust.works
                arrow-up
                2
                ·
                edit-2
                5 months ago

                Then you need to break up your problem into processes. Python doesn’t really do multi-threading (hopefully that changes with the GIL going away), but most things can scale reasonably well in a process pool if you manage the worker queue properly (e.g. RabbitMQ works well).

                It’s not as good as proper threading, but it’s a lot simpler and easier to scale horizontally. You can later rewrite certain parts if hosting costs become a larger issue than dev costs.

                • hark@lemmy.world
                  arrow-up
                  3
                  ·
                  5 months ago

                  A process pool means extra copying of data around, which incurs a huge cost, and this is made worse by the tendency of parallel-processing-friendly workloads to consist of large amounts of data.
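To put a rough number on that copying cost: arguments and results handed to worker processes through `multiprocessing` or `concurrent.futures` are normally serialized with `pickle`. The sketch below only measures that serialization step in a single process; the payload (one million floats) is a made-up stand-in for a large data set.

```python
import pickle
import time

# Hypothetical payload: one million floats standing in for a large data set.
payload = [float(i) for i in range(1_000_000)]

start = time.perf_counter()
encoded = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
decoded = pickle.loads(encoded)
elapsed = time.perf_counter() - start

print(f"serialized size: {len(encoded) / 1e6:.1f} MB")
print(f"round trip took: {elapsed * 1000:.1f} ms")
```

Every task sent to a pool pays this kind of cost on the way in and again on the way out, so it only pays off when the work per task is large compared with the data being shipped.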

        • FizzyOrange@programming.dev
          arrow-up
          4
          arrow-down
          4
          ·
          5 months ago

          Well yeah if by “well architected” you mean “doesn’t use Python”.

          “microservices” or “better queries”

          Not everything is a web service. Most of the slow Python code I encounter is doing real work.

          • sugar_in_your_tea@sh.itjust.works
            arrow-up
            3
            ·
            5 months ago

            We also do “real work,” and that uses libraries that use C(++) under the hood, like scipy, numpy, and tensorflow. We do simulations of seismic waves, particle physics simulations, etc. Most of our app is business logic in a webapp, but there’s heavy lifting as well. All of “our” code is Python. I even pitched using Rust for a project, but we were able to get the Python code “fast enough” with numba.

            We separate expensive logic that can take longer into background tasks from requests that need to finish quickly. We auto-scale horizontally as needed so everything remains responsive.

            That’s what I mean by “architected well,” everything stays responsive and we just increase our hosting costs instead of development costs. If we need to, we could always rewrite parts in a faster language, provided that costs less than the development costs. We really don’t spend much time at all optimizing python code, so we’re not at that point yet.
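A very small sketch of that request/background split, using only the standard library. A thread and an in-memory `queue.Queue` stand in for what would more likely be separate worker processes behind a message broker in a real deployment; `expensive_simulation` and `handle_request` are hypothetical names.

```python
import queue
import threading

jobs = queue.Queue()
results = {}


def expensive_simulation(size):
    """Hypothetical stand-in for slow work such as a simulation run."""
    return sum(i * i for i in range(size))


def worker():
    """Background worker: pulls jobs off the queue until told to stop."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel value: shut down
            jobs.task_done()
            break
        job_id, size = job
        results[job_id] = expensive_simulation(size)
        jobs.task_done()


def handle_request(job_id, size):
    """Request handler: enqueue the heavy work and return right away."""
    jobs.put((job_id, size))
    return {"job_id": job_id, "status": "queued"}


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = handle_request("job-1", 100_000)
print(response)  # the handler does not wait for the simulation

jobs.join()  # a real service would poll or notify instead of blocking
print(results["job-1"])

jobs.put(None)
thread.join()
```

The request path stays fast no matter how long the simulation takes, and adding capacity means adding workers rather than rewriting the handler.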

            That being said, I do appreciate faster-running code. I use Rust for most of my personal projects, but that’s because I don’t have to pay a team to maintain my projects.

            • FizzyOrange@programming.dev
              arrow-up
              1
              ·
              5 months ago

              Matrix code is the very best case for offloading work from Python to something else though.

              Think about something like a build system (e.g. scons) or a package installer (pip). There is no part of them that you can point to and say “that’s the slow bit, write it in C” because the slowness is distributed through the entire thing.

              • sugar_in_your_tea@sh.itjust.works
                arrow-up
                1
                ·
                5 months ago

                Both of those are largely bound by i/o, but with some processing in between, so the best way to speed things up is probably an async i/o loop that feeds a worker pool. In Python, you’d use processes, which can be expensive and a little complicated, but workable.

                And as you pointed out, scons and pip exist, and they’re fast enough. I actually use poetry, and it’s completely fine.

                You could go all out and build something like cargo, but it’s the architecture decisions that matter most in something i/o bound like that.

                • FizzyOrange@programming.dev
                  arrow-up
                  1
                  ·
                  5 months ago

                  they’re fast enough

                  Strong disagree. I switched from pip to uv and it sped my install time up from 58 seconds to 7. Yeah really. If pip is i/o bound where is all that speed up coming from?

      • Martín@lemmy.world
        arrow-up
        5
        arrow-down
        1
        ·
        5 months ago

        Q: what do we do? A: profile and decompose. Should not be that distant as a thought

        • FizzyOrange@programming.dev
          arrow-up
          3
          arrow-down
          3
          ·
          5 months ago

          Profiling is an extremely useful tool for optimising the system that you have. It doesn’t help if you have the wrong system entirely though.

          • sugar_in_your_tea@sh.itjust.works
            arrow-up
            3
            arrow-down
            1
            ·
            5 months ago

            That’s why you need an architect to design the project for the expected requirements. They’ll ask the important questions, like:

            • how many users (if it’s a server)
            • any long-running processes? If so, what will those be doing?
            • how responsive does it need to be? What’s “fast enough”?
            • what’s more important short term, feature delivery or performance? Long term? How far away is “long term”?
            • what platforms do we need to support?
            • is AI or similar in the medium term goals? What’s the use case?
            • how bad is downtime? How much are you willing to spend if downtime isn’t an option?

            You don’t need all the answers up front, but you need enough to design a coherent system. It’s like building a rail system, building a commuter line is much different than a light rail network, and the planners will need to know if those systems need to interact with anything else.

            If you don’t do that, you’re going to end up overspending in some area, and probably significantly.

          • Martín@lemmy.world
            arrow-up
            2
            ·
            5 months ago

            Upfront analysis and design is very close to independent from the technology, particularly at the I/O level

  • erp@lemmy.world
    arrow-up
    26
    arrow-down
    2
    ·
    5 months ago

    Why on earth would we try to make snakes faster? Science has gone too far this time. What’s next, give them arms?

  • Ephera
    arrow-up
    20
    arrow-down
    1
    ·
    5 months ago

    The aim is to offer the speed of C or C++ while retaining the user-friendly feel of Python itself.

    These kinds of claims always annoy me. Like, sure, there’s some room for interpretation there, but at the end of the day, C, C++ and also Rust achieve their speed by having handling baked into the semantics for:

    • non-GC memory management
    • passing by-reference vs. by-value
    • and in the case of Rust, also for handling multi-threaded processing.

    Unless he comes up with a revolutionary new memory management strategy, or achieves a massive jump in static analysis to replace human intelligence, then you simply can’t achieve similar speed while keeping the semantics of Python.

    • FizzyOrange@programming.dev
      arrow-up
      6
      arrow-down
      1
      ·
      5 months ago

      That’s not really true. C# and Java are reference-based, use GC and can be multithreaded, and are very comparable to Rust/C++/C performance. Certainly no more than twice as bad. Whereas Python is probably 50x as bad.

      The real answer is that Python developers have deliberately avoided worrying about performance when designing the language, until maybe 2 years ago. That means it has ended up being extremely dynamic and difficult to optimise, and the CPython implementation itself has also not focused on performance so it isn’t fast.
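A small illustration of what "extremely dynamic" means in practice. The `Money` class and `add` function are made up for the example; the behaviour shown is standard Python.

```python
class Money:
    """Hypothetical user type that defines its own meaning for `+`."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)


def add(a, b):
    # The interpreter cannot know ahead of time what `+` means here:
    # it depends on the types of the operands on each call.
    return a + b


print(add(1, 2))                          # integer addition
print(add("py", "thon"))                  # string concatenation
print(add([1], [2]))                      # list concatenation
print(add(Money(150), Money(250)).cents)  # user-defined __add__

# Behaviour can even change while the program is running.
Money.__add__ = lambda self, other: Money(max(self.cents, other.cents))
print(add(Money(150), Money(250)).cents)
```

A compiler for C or Rust can turn `a + b` into a single machine instruction because the types are fixed; here the same line has to be resolved at run time, which is a big part of why it is hard to optimise.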

      But I agree the aim of offering C/C++ speed is never going to be met with Python syntax.

      • sugar_in_your_tea@sh.itjust.works
        arrow-up
        2
        ·
        5 months ago

        They can probably beat or at least match JavaScript, which has been heavily optimized, but the cap is going to be something like Lua (not LuaJIT) without significant, painful changes.

        If you want faster Python today, you can try numba or Cython. They solve the problem in different ways, with different tradeoffs.