Spark vs Presto: A Comprehensive Comparison

ericjmorey@programming.dev · 1 year ago


Sem · 1 year ago

Thank you! The conclusion is quite good: use Spark for ETL and Presto (Trino) for analytical queries. But the article looks very outdated.

Spark is not about RDDs. Today Spark is mostly used through the DataFrame API, and that is not just a matter of syntax. The Catalyst optimizer (together with Adaptive Query Execution since Spark 3.0) provides many performance optimizations, such as predicate pushdown at the ORC/Parquet reading level, automatic skew join detection, pruning, etc.
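To make "predicate pushdown at the ORC/Parquet reading level" and pruning concrete, here is a toy Python model. It is not Spark's API or Catalyst's actual code. Each hypothetical row group carries min/max statistics computed once at write time, similar to what Parquet and ORC store in their metadata, so a filter can skip whole groups before reading any of their values, and only the requested columns are materialized. The names `RowGroup`, `write_row_group`, and `scan` are made up for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RowGroup:
    """Hypothetical columnar chunk, loosely modeled on a Parquet/ORC row group."""
    columns: Dict[str, List[int]]
    # (min, max) per column, computed at write time like file metadata statistics
    stats: Dict[str, Tuple[int, int]]


def write_row_group(columns: Dict[str, List[int]]) -> RowGroup:
    """Store the column data together with precomputed min/max statistics."""
    return RowGroup(columns, {name: (min(vals), max(vals)) for name, vals in columns.items()})


def scan(groups: List[RowGroup], needed_cols: List[str],
         filter_col: str, lower_bound: int) -> Tuple[List[dict], int]:
    """Return rows where filter_col > lower_bound, projected to needed_cols.

    Predicate pushdown: a group whose max statistic is <= lower_bound cannot
    contain a matching row, so it is skipped without reading its values.
    Column pruning: only needed_cols (plus the filter column) are touched.
    """
    rows: List[dict] = []
    groups_read = 0
    for rg in groups:
        _, group_max = rg.stats[filter_col]
        if group_max <= lower_bound:
            continue  # skipped using metadata only
        groups_read += 1
        filter_vals = rg.columns[filter_col]
        for i, value in enumerate(filter_vals):
            if value > lower_bound:
                rows.append({c: rg.columns[c][i] for c in needed_cols})
    return rows, groups_read


if __name__ == "__main__":
    data = [
        write_row_group({"id": [1, 2, 3], "amount": [5, 10, 15], "note": [0, 0, 0]}),
        write_row_group({"id": [4, 5, 6], "amount": [100, 200, 300], "note": [1, 1, 1]}),
    ]
    result, read = scan(data, ["id"], "amount", 50)
    print(result, read)
```

In real Spark you would just write something like `df.filter(df.amount > 50).select("id")` and let the optimizer do this; calling `.explain()` on a Parquet-backed DataFrame shows the pushed predicates under `PushedFilters` in the file scan node.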

Also, Presto in this case should be called Trino, because PrestoSQL was rebranded as Trino in 2020.

ericjmorey@programming.dev · 1 year ago

I was questioning the quality of the source; thanks for confirming that it's not a top-quality article.