Should I use Presto or Hive?
While Presto may be the better choice for most scenarios, Hive should not be discounted, because some workloads are too demanding for Presto.
Presto limits the amount of memory that each task can use, so a query that needs more memory than that limit fails. This behavior (the query fails rather than recovering) is acceptable for interactive queries, but it is not suitable for daily or weekly reports that must run reliably. Hive may be a better alternative for such tasks.
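One way to picture the failure mode described above is a job that tries Presto first and re-runs on Hive when the query hits the memory limit. The following is a minimal sketch under stated assumptions, not a real client API: `MemoryLimitExceeded`, `run_report`, and the two runner callables are hypothetical names, and the fake runners only simulate the engines.

```python
from typing import Callable, List, Tuple

Rows = List[Tuple]


class MemoryLimitExceeded(Exception):
    """Hypothetical error standing in for a Presto per-task memory failure."""


def run_report(sql: str,
               run_presto: Callable[[str], Rows],
               run_hive: Callable[[str], Rows]) -> Rows:
    """Try Presto first; fall back to Hive if the query runs out of memory."""
    try:
        return run_presto(sql)
    except MemoryLimitExceeded:
        # Presto fails the query outright, so the report is re-run on Hive.
        return run_hive(sql)


# Fake runners used only for illustration (assumptions, not real clients).
def fake_presto(sql: str) -> Rows:
    raise MemoryLimitExceeded("query exceeded the per-task memory limit")


def fake_hive(sql: str) -> Rows:
    return [("ok",)]


if __name__ == "__main__":
    print(run_report("SELECT 1", fake_presto, fake_hive))
```

For a report that is known to be memory-heavy, it is probably simpler to schedule it on Hive directly than to pay for a failed Presto attempt first.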
| Hive | Presto |
| --- | --- |
| Optimized for batch processing of large ETL jobs and batch SQL queries on huge data sets. | Used for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. |
| Mature SQL – ANSI SQL. | Less mature SQL (still ANSI compliant). |
| Easily extensible. | Some extensibility, but limited compared to Hive. |
| Optimized for query throughput. | Optimized for latency. |
| Needs more resources per query. | Resource-efficient. |
| Suitable for large fact-to-fact joins. | Optimized for star schema joins (1 large fact table and many smaller dimension tables). |
| Suitable for large data aggregations. | Interactive queries and quick data exploration. |
| Rich ecosystem (plenty of resources online). | Less rich ecosystem (but now improving with big users such as Facebook, Netflix). |
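The comparison above can be read as a rough routing rule. The sketch below encodes a few of its rows as a heuristic; the `Workload` fields and the `choose_engine` function are hypothetical names introduced for illustration, and the rule is a simplification of the guidance here rather than an official policy.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Each flag mirrors one row of the comparison (names are assumptions).
    scheduled_report: bool = False   # daily/weekly job that must run reliably
    fact_to_fact_join: bool = False  # joins two large fact tables
    large_aggregation: bool = False  # aggregates over a very large data set


def choose_engine(w: Workload) -> str:
    """Return "hive" for heavy or must-not-fail batch work, else "presto"."""
    if w.scheduled_report or w.fact_to_fact_join or w.large_aggregation:
        return "hive"
    # Presto may be the better choice for most other scenarios,
    # such as interactive queries and star schema joins.
    return "presto"


if __name__ == "__main__":
    print(choose_engine(Workload()))
    print(choose_engine(Workload(scheduled_report=True)))
```

Treat the result as a starting point: the right engine also depends on cluster sizing and on how much memory a given query actually needs.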