Speaker(s):
Jun-04 11:50-12:20 in Melody

A critical capability of any table format is quickly identifying the files relevant to a query, irrespective of the underlying data volume. This presentation focuses on the job planning process in Apache Iceberg, highlighting its efficiency and its ability to scale to tens of millions of files. The session will explain how the project uses a hybrid strategy for planning jobs, transitioning between local and distributed execution for optimal performance.

Attendees will gain insight into the design of Apache Iceberg metadata and how it underpins effective job execution. This talk will benefit engineers who are considering adopting Apache Iceberg as well as those who already use it and want to optimize their existing production environments.