OptimizationSpec:
+ https://iceberg.apache.org/docs/1.4.2/spark-procedures/
A table that has many files
- should have those files aggregated
  + Given data
      Datum(0,label_0,0,2024-11-15,2024-11-15 16:48:19.792)
      Datum(1,label_1,1,2024-11-14,2024-11-15 16:48:19.992)
      Datum(2,label_2,2,2024-11-13,2024-11-15 16:48:20.192)
      ...
  + And 20000 rows are initially written to table 'polaris.my_namespace.OptimizationSpec'
  + When we execute the SQL
      CALL system.rewrite_data_files(
        table => "polaris.my_namespace.OptimizationSpec",
        options => map('min-input-files', '2')
      )
  + Then the files added to the original 4 are:
      /tmp/polaris/my_namespace/OptimizationSpec/data/00000-106-618c1604-8992-4820-9933-957f50ecf870-0-00001.parquet
  + And there are no files deleted from the subsequent 5
  + And these new files contain all the data
- should have snapshots removed when expired
  + Given there are already 5 files for table polaris.my_namespace.OptimizationSpec
  + When we execute the SQL:
      CALL system.expire_snapshots(
        table => "polaris.my_namespace.OptimizationSpec",
        older_than => TIMESTAMP '2024-11-15 16:48:29.713',
        stream_results => true
      )
  + Then old files have been removed and only 1 remain
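
The file counts in the two scenarios (4, then 5, then 1) can look odd at first: compaction adds a file without deleting any, and only snapshot expiry shrinks the directory. The sketch below is a toy model of that bookkeeping, not Iceberg's implementation. The class `ToyTable`, its methods, and the file names `f1`..`f4` and `compacted` are hypothetical names chosen for illustration, and it assumes that expiry keeps only the latest snapshot, which is what the report's final count suggests.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy model: each snapshot is the set of data files live at that point."""
    snapshots: list = field(default_factory=list)  # oldest first
    on_disk: set = field(default_factory=set)      # files physically present

    def append(self, files):
        # An append commits a snapshot holding the current files plus the new ones.
        current = self.snapshots[-1] if self.snapshots else frozenset()
        self.snapshots.append(frozenset(current | set(files)))
        self.on_disk |= set(files)

    def rewrite(self, compacted_file):
        # Compaction commits a snapshot that points only at the merged file.
        # The input files stay on disk because the older snapshot still uses them.
        self.snapshots.append(frozenset({compacted_file}))
        self.on_disk.add(compacted_file)

    def expire_all_but_latest(self):
        # Dropping old snapshots leaves their exclusive files unreferenced,
        # so those files can be deleted. (Assumption: only the latest is kept.)
        self.snapshots = self.snapshots[-1:]
        live = self.snapshots[-1] if self.snapshots else frozenset()
        removed = self.on_disk - set(live)
        self.on_disk -= removed
        return removed


table = ToyTable()
table.append({"f1", "f2", "f3", "f4"})     # the original 4 files
table.rewrite("compacted")                 # stands in for rewrite_data_files
files_after_rewrite = len(table.on_disk)   # 4 originals + 1 compacted file
removed = table.expire_all_but_latest()    # stands in for expire_snapshots
files_after_expiry = len(table.on_disk)    # only the compacted file is left
```

In this model the rewrite step leaves 5 files on disk and the expiry step removes the 4 originals, matching the counts in the report above. A real table tracks manifests and metadata files as well, which this sketch ignores.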