OptimizationSpec:
+ https://iceberg.apache.org/docs/1.4.2/spark-procedures/
A table that has many files
- should have those files aggregated
  + Given data
    Datum(0,label_0,0,2025-03-28,2025-03-28 13:59:34.403)
    Datum(1,label_1,1,2025-03-27,2025-03-28 13:59:34.603)
    Datum(2,label_2,2,2025-03-26,2025-03-28 13:59:34.803)
    ...
  + And 20000 rows are initially written to table 'polaris.my_namespace.OptimizationSpec'
  + When we execute the SQL
    CALL system.rewrite_data_files(
      table => "polaris.my_namespace.OptimizationSpec",
      options => map('min-input-files','2'))
  + Then the files added to the original 4 are:
    /tmp/polaris/my_namespace/OptimizationSpec/data/00000-92-ddfc702b-aea3-4aa5-a667-4cb744e2cf4c-0-00001.parquet
  + And there are no files deleted from the subsequent 5
  + And these new files contain all the data
- should have snapshots removed when expired
  + Given there are already 5 files for table polaris.my_namespace.OptimizationSpec
  + When we execute the SQL:
    CALL system.expire_snapshots(
      table => "polaris.my_namespace.OptimizationSpec",
      older_than => TIMESTAMP '2025-03-28 13:59:42.466',
      stream_results => true)
  + Then old files have been removed and only 1 remain
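
The two procedure calls exercised by this spec can also be assembled programmatically. The following is a minimal sketch in Python, not the code behind the spec itself: the helper names (`sql_string`, `rewrite_data_files_sql`, `expire_snapshots_sql`) are hypothetical, and the sketch emits single-quoted string literals where the log shows double quotes around the table name. It only builds the SQL text; running it would require a Spark session configured with an Iceberg catalog, which is assumed and not shown.

```python
from datetime import datetime


def sql_string(value: str) -> str:
    """Render a single-quoted SQL string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def rewrite_data_files_sql(table: str, options: dict) -> str:
    """Build the CALL statement for the rewrite_data_files procedure.

    `options` is rendered as map('key', 'value', ...), matching the
    shape used in the spec output above. Hypothetical helper.
    """
    args = [f"table => {sql_string(table)}"]
    if options:
        pairs = ", ".join(
            f"{sql_string(str(k))}, {sql_string(str(v))}"
            for k, v in options.items()
        )
        args.append(f"options => map({pairs})")
    return f"CALL system.rewrite_data_files({', '.join(args)})"


def expire_snapshots_sql(
    table: str, older_than: datetime, stream_results: bool = True
) -> str:
    """Build the CALL statement for the expire_snapshots procedure.

    The timestamp is formatted with millisecond precision, as in the log.
    Hypothetical helper.
    """
    # %f yields six digits (microseconds); drop the last three for milliseconds.
    ts = older_than.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
    return (
        "CALL system.expire_snapshots("
        f"table => {sql_string(table)}, "
        f"older_than => TIMESTAMP '{ts}', "
        f"stream_results => {str(stream_results).lower()})"
    )


if __name__ == "__main__":
    table = "polaris.my_namespace.OptimizationSpec"
    print(rewrite_data_files_sql(table, {"min-input-files": "2"}))
    print(expire_snapshots_sql(table, datetime(2025, 3, 28, 13, 59, 42, 466000)))
    # With a live session (assumed, not shown), each string would be
    # passed to spark.sql(...).
```

Keeping the statement construction separate from execution makes the SQL easy to assert on in a unit test without starting Spark.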