Should I use Presto or Hive?
Presto is the better choice for most scenarios, but Hive should not be discounted: some workloads are too demanding for Presto.
Presto limits the amount of memory each task can use, so a query that needs more than that limit fails. This fail-fast behavior is acceptable for interactive queries, where the user can rewrite and rerun the query, but it is not suitable for daily or weekly reports that must run reliably. Hive may be a better alternative for such jobs.
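One way to act on this is to try a report on Presto first and rerun it on Hive only when Presto fails on a memory limit. The following is a minimal sketch of that idea, not a definitive implementation. `run_presto` and `run_hive` are hypothetical callables that you would back with your own client code, `QueryFailed` is a hypothetical exception type, and the marker strings are assumptions about what a memory-limit failure message contains; check them against the errors your Presto version actually returns.

```python
from typing import Any, Callable, Tuple

# Assumed substrings of a memory-limit failure message (verify for your cluster).
MEMORY_ERROR_MARKERS = (
    "exceeded_local_memory_limit",
    "exceeded_global_memory_limit",
    "memory limit",
)


class QueryFailed(Exception):
    """Hypothetical error a runner raises when the engine rejects or aborts a query."""


def is_memory_error(exc: Exception) -> bool:
    """Return True if the failure message looks like a memory-limit error."""
    text = str(exc).lower()
    return any(marker in text for marker in MEMORY_ERROR_MARKERS)


def run_report(
    sql: str,
    run_presto: Callable[[str], Any],
    run_hive: Callable[[str], Any],
) -> Tuple[str, Any]:
    """Run sql on Presto; fall back to Hive only on a memory-limit failure.

    Returns (engine_name, result). Any other Presto failure is re-raised,
    because rerunning, say, a syntax error on Hive would only hide the bug.
    """
    try:
        return "presto", run_presto(sql)
    except QueryFailed as exc:
        if not is_memory_error(exc):
            raise
        return "hive", run_hive(sql)
```

The fallback is deliberately narrow: only memory-limit failures trigger the Hive rerun, so the report stays fast when Presto can handle it and stays reliable when it cannot.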
|Hive||Presto|
|Optimized for batch processing of large ETL jobs and batch SQL queries on huge data sets.||Used for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes.|
|Mature SQL – ANSI SQL.||Less mature SQL (still ANSI compliant).|
|Easily extensible.||Some extensibility, but limited compared to Hive.|
|Optimized for query throughput.||Optimized for latency.|
|Needs more resources per query.||Resource-efficient.|
|Suitable for large fact-to-fact joins.||Optimized for star schema joins (1 large fact table and many smaller dimension tables).|
|Suitable for large data aggregations.||Interactive queries and quick data exploration.|
|Rich ecosystem (plenty of resources online).||Smaller ecosystem (but improving, with big users such as Facebook and Netflix).|
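The comparison above can be condensed into a rough routing rule. This is only an illustrative sketch: the `Workload` fields and the `choose_engine` function are hypothetical names, and the rule simply restates the table (Hive for reliable scheduled jobs, heavy ETL, and large fact-to-fact joins; Presto otherwise), not a recommendation tuned to any particular cluster.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a query workload."""
    scheduled_report: bool = False         # daily/weekly job that must finish reliably
    heavy_etl: bool = False                # large batch ETL or very large aggregation
    large_fact_to_fact_join: bool = False  # joins two large fact tables


def choose_engine(workload: Workload) -> str:
    """Return "hive" for throughput- or reliability-bound work, else "presto"."""
    if (
        workload.scheduled_report
        or workload.heavy_etl
        or workload.large_fact_to_fact_join
    ):
        return "hive"
    # Default: interactive queries, quick exploration, star schema joins.
    return "presto"
```

For example, `choose_engine(Workload(scheduled_report=True))` selects Hive, while a default `Workload()` (an ad hoc interactive query) selects Presto.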