So I’m considering going deep into a data viz library, and I’m wondering what you people think. I’m not asking reddit because I know for a fact that all the hardcore people who know their stuff are on Lemmy.

Here are my requirements:

  • API must at least pretend to be reasonably designed.
    • I know that viz libraries are complex. But I want something with carefully chosen primitives that scale reasonably well from “data goes in, chart goes out” to nit-picky adjustments.
  • Defaults must not be ugly.
    • Or at least there should be an easy way to bypass the default ugliness. I know that design is subjective, but how am I supposed to trust a library that operates on the visual space and yet decides that a bad default is ok?
    • Here it looks like ggplot has the upper hand. But there is a stylesheet that makes matplotlib look like ggplot, so maybe that’s not a big problem.
  • Must have a future.
    • The GitHub contribution chart for matplotlib just keeps going up; it’s insane. ggplot’s, not so much. But maybe it’s hard to compete with the Python hype machine, and that’s that.
  • Bonus points if interactive and renders to web too.
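
On the stylesheet point: matplotlib ships a built-in style named `ggplot`. `plt.style.use("ggplot")` turns it on globally, and `plt.style.context("ggplot")` limits it to one block. Figures can also be written to SVG in memory, which is one low-tech way to get a chart into a web page. Below is a minimal sketch. It assumes matplotlib is installed and uses the headless Agg backend. The helper name `render_svg` and the sample data are made up for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend: render to buffers/files, no GUI window
import matplotlib.pyplot as plt


def render_svg(xs, ys, style="ggplot"):
    """Draw a line chart in a built-in style and return it as SVG text.

    `render_svg` is a hypothetical helper for this example, not a matplotlib API.
    """
    # style.context applies the stylesheet only inside this block;
    # global rcParams are restored when it exits.
    with plt.style.context(style):
        fig, ax = plt.subplots(figsize=(4, 3))
        ax.plot(xs, ys, marker="o")
        ax.set_xlabel("x")
        ax.set_ylabel("y")
        buf = io.StringIO()
        fig.savefig(buf, format="svg")  # SVG is text, so a StringIO buffer works
        plt.close(fig)  # release the figure; pyplot keeps a reference otherwise
    return buf.getvalue()


if __name__ == "__main__":
    svg = render_svg([1, 2, 3, 4], [1, 4, 9, 16])
    print(svg[:80])
```

The returned string can be dropped straight into an HTML page. This gives static web output only, not interactivity.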

Non-requirements:

  • Easy learning curve.
    • I am a hardcore programm0r. I like it rough, as long as it’s worth the effort.
  • Heavy math stuff.
    • I’m not designing rockets or wind turbines. I just want a way to visually represent data as lines, charts, pies, or maps, or maybe violins if I’m feeling fancy.

Thanks

  • liori@lemm.ee
    2 years ago

    Given these criteria, ggplot2 wins by a landslide. The API, thanks to R’s nonstandard evaluation, is crazy good compared to anything available in Python. Not having to use numpy/pandas for input is a bonus as well: somehow pandas managed to duplicate many of the bad features of R’s data frames and introduce its own inconsistencies, without providing many of the good features¹. The styling defaults are decent, definitely much better than matplotlib’s, and it’s much easier to apply custom styling consistently.

    The future of ggplot2 is defined by downstream libraries; ggplot2 itself is just the core of the ecosystem, which at this point is mature and stable. Matplotlib’s activity is mostly because the lack of nonstandard evaluation makes flexible APIs more cumbersome to implement, so everything just takes more work.

    Both have very minimal support for interactive and web output. It’s easier to wrap them in Shiny or Dash than to force either one to do web/interactive stuff on its own. And there, again, I’d pick Shiny over Dash, if for nothing else than R’s nonstandard evaluation.
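
    To make the nonstandard-evaluation point concrete: in ggplot2 you can write `aes(x = year, y = sales / 1000)`. The bare column names and the expression are captured unevaluated and resolved against the data later. Python has no such mechanism. Python plotting APIs therefore usually take column names as strings, some of them parse expression strings, and otherwise you precompute a column or pass a function. The rough pure-Python sketch below shows the string and lambda workarounds side by side. `aes`, `resolve` and the sample rows are hypothetical and do not belong to any real library.

```python
from typing import Any, Callable, Mapping, Sequence, Union

Row = Mapping[str, Any]
# Hypothetical spec type: either a column name or a function of one row
Spec = Union[str, Callable[[Row], Any]]


def aes(**mappings: Spec) -> dict[str, Spec]:
    """Hypothetical aesthetic mapping: aesthetic name -> column name or function."""
    return dict(mappings)


def resolve(mapping: dict[str, Spec], rows: Sequence[Row]) -> dict[str, list]:
    """Evaluate every aesthetic against every row, returning plain value lists."""
    out: dict[str, list] = {}
    for aesthetic, spec in mapping.items():
        if isinstance(spec, str):
            # String spec: plain column lookup, fine for bare columns
            out[aesthetic] = [row[spec] for row in rows]
        else:
            # Callable spec: an arbitrary expression, deferred until data arrives
            out[aesthetic] = [spec(row) for row in rows]
    return out


rows = [
    {"year": 2021, "sales": 1500},
    {"year": 2022, "sales": 2300},
]

if __name__ == "__main__":
    # R: aes(x = year, y = sales)         -> strings are enough for bare columns
    print(resolve(aes(x="year", y="sales"), rows))
    # R: aes(x = year, y = sales / 1000)  -> an expression needs a lambda here
    print(resolve(aes(x="year", y=lambda r: r["sales"] / 1000), rows))
```

    The lambda version works, but it is noisier than R’s bare expression. That ergonomic gap, repeated across a whole plotting API, is roughly what nonstandard evaluation buys you.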

    Note, though, that learning proper R takes time, and if you don’t know it yet, you will underestimate how long it takes to get comfortable with it. Nonstandard evaluation alone is so nonstandard that it gives headaches to people who are otherwise skilled programmers. matplotlib wins hugely on flexibility, which you apparently don’t need, but there’s always that one tiny tweak you wish you could make. Also, it’s usually much easier to just use the default of whatever publishing platform you’re going to use.
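
    For a feel of the styling and “one tiny tweak” points on the matplotlib side, here is a rough sketch. It keeps a house style in a dict and applies it with `plt.rc_context`, which is loosely comparable to setting a theme once with `theme_set()` in ggplot2. It then recolors and labels a single bar directly through its artist object. It assumes matplotlib with the Agg backend. `HOUSE_STYLE`, `bar_chart` and the data are made up for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt

# Hypothetical house style: any valid rcParams keys can go in here
HOUSE_STYLE = {
    "axes.spines.top": False,
    "axes.spines.right": False,
    "axes.grid": True,
    "grid.alpha": 0.3,
    "font.size": 11,
}


def bar_chart(labels, values, highlight=None):
    """Bar chart in the house style; optionally recolor one bar (the 'tiny tweak')."""
    with plt.rc_context(HOUSE_STYLE):
        fig, ax = plt.subplots(figsize=(4, 3))
        # Numeric x positions with text tick labels
        bars = ax.bar(range(len(values)), values, tick_label=labels, color="0.6")
        if highlight is not None:
            # Each bar is its own Rectangle artist, so one can be changed directly
            bars[highlight].set_color("tab:red")
            ax.annotate("this one", xy=(highlight, values[highlight]),
                        ha="center", va="bottom")
    return fig, ax, bars


if __name__ == "__main__":
    fig, ax, bars = bar_chart(["a", "b", "c"], [3, 7, 4], highlight=1)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    print(len(buf.getvalue()), "bytes of PNG")
```

    The style dict is applied when the figure and axes are created. The per-bar tweak is plain method calls on the returned objects, which is where matplotlib’s flexibility tends to pay off.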

    As for me, if I have a choice, I pick ggplot2 by default. So far it has been good enough for the significant majority of my academic and professional work.

    ¹ Admittedly, numpy was not designed for data analysis directly, and pandas does have some nice features missing from R’s data frames.