ZOrderingSpec:
  + https://iceberg.apache.org/docs/1.4.2/spark-procedures/
A table with no particular order
- should have z-ordered files after rewriting the data
  + Given 40000 rows of data that look like
      Datum(8871,label_8871,1,2001-01-28,2025-05-13 17:07:33.888)
      Datum(28580,label_28580,0,1947-02-12,2025-05-13 18:13:15.688)
      Datum(24317,label_24317,2,1958-10-15,2025-05-13 17:59:03.088)
      ...
    are initially written to table 'polaris.my_namespace.ZOrderingSpec'
  + When we execute the SQL:
      CALL system.rewrite_data_files(
        table => "polaris.my_namespace.ZOrderingSpec",
        strategy => 'sort',
        sort_order => 'zorder(id, date)',
        options => map('min-input-files','4', 'target-file-size-bytes','49152')
      )
  + Then added to the original 4 files are:
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00002-1131-8745e865-6a71-484c-a5aa-16f67448a645-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00004-1133-8745e865-6a71-484c-a5aa-16f67448a645-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00001-1130-8745e865-6a71-484c-a5aa-16f67448a645-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00002-1131-8745e865-6a71-484c-a5aa-16f67448a645-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00006-1135-8745e865-6a71-484c-a5aa-16f67448a645-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00007-1136-8745e865-6a71-484c-a5aa-16f67448a645-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00000-1129-8745e865-6a71-484c-a5aa-16f67448a645-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00009-1138-8745e865-6a71-484c-a5aa-16f67448a645-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00007-1136-8745e865-6a71-484c-a5aa-16f67448a645-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00007-1136-8745e865-6a71-484c-a5aa-16f67448a645-0-00003.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00003-1132-8745e865-6a71-484c-a5aa-16f67448a645-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00008-1137-8745e865-6a71-484c-a5aa-16f67448a645-0-00003.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00008-1137-8745e865-6a71-484c-a5aa-16f67448a645-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00003-1132-8745e865-6a71-484c-a5aa-16f67448a645-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00005-1134-8745e865-6a71-484c-a5aa-16f67448a645-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00009-1138-8745e865-6a71-484c-a5aa-16f67448a645-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00006-1135-8745e865-6a71-484c-a5aa-16f67448a645-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00001-1130-8745e865-6a71-484c-a5aa-16f67448a645-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00005-1134-8745e865-6a71-484c-a5aa-16f67448a645-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00000-1129-8745e865-6a71-484c-a5aa-16f67448a645-0-00003.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00009-1138-8745e865-6a71-484c-a5aa-16f67448a645-0-00003.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00004-1133-8745e865-6a71-484c-a5aa-16f67448a645-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00002-1131-8745e865-6a71-484c-a5aa-16f67448a645-0-00003.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00003-1132-8745e865-6a71-484c-a5aa-16f67448a645-0-00003.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00008-1137-8745e865-6a71-484c-a5aa-16f67448a645-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00000-1129-8745e865-6a71-484c-a5aa-16f67448a645-0-00001.parquet
  + And there is no overlap in the `id` dimension.
    The ranges of the id look like:
      0 to 1999
      2000 to 3999
      4000 to 4414
      4415 to 6414
      6415 to 8137
      8138 to 10137
      10138 to 12137
      12138 to 12180
      12181 to 14180
      14181 to 16180
      16181 to 16229
      16230 to 18229
      18230 to 20033
      20034 to 22033
      22034 to 23757
      23758 to 25757
      25758 to 27544
      27545 to 29544
      29545 to 31544
      31545 to 31575
      31576 to 33575
      33576 to 35575
      35576 to 35791
      35792 to 37791
      37792 to 39791
      39792 to 39999
- should need its data rewritten to maintain its z-order
  + Given a z-ordered table called 'polaris.my_namespace.ZOrderingSpec'
  + When 40000 rows of new data that look like
      Datum(69191,label_29191,1,1945-06-11,2025-05-13 18:15:17.888)
      Datum(59351,label_19351,1,1972-05-20,2025-05-13 17:42:29.888)
      Datum(58757,label_18757,2,1974-01-04,2025-05-13 17:40:31.088)
      ...
    are appended to table 'polaris.my_namespace.ZOrderingSpec'
  + Then the ranges of the ids overlap in the 4 new files and look like:
      40000 to 79997
      40001 to 79993
      40002 to 79996
      40003 to 79999
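
The two scenarios above turn on whether the per-file `id` ranges overlap. The following is a minimal sketch of how such a check might be written. It is not the spec's own assertion code, and the helper name `has_overlap` is hypothetical. It assumes each range is a closed interval of the form (lowest id, highest id), and it uses only a few of the ranges reported above as sample input.

```python
from typing import Iterable, Optional, Tuple

Range = Tuple[int, int]  # (lowest id, highest id), both inclusive


def has_overlap(ranges: Iterable[Range]) -> bool:
    """Return True if any two closed ranges share at least one value.

    Hypothetical helper for illustration only. Sorting by lower bound lets
    us compare each range against the highest upper bound seen so far.
    """
    highest_upper: Optional[int] = None
    for lower, upper in sorted(ranges):
        if highest_upper is not None and lower <= highest_upper:
            return True
        highest_upper = upper if highest_upper is None else max(highest_upper, upper)
    return False


# First three id ranges reported after the z-order rewrite.
rewritten = [(0, 1999), (2000, 3999), (4000, 4414)]

# The four id ranges reported for the newly appended files.
appended = [(40000, 79997), (40001, 79993), (40002, 79996), (40003, 79999)]

print(has_overlap(rewritten))  # False
print(has_overlap(appended))   # True
```

Because the ranges are treated as inclusive, two files that share a single boundary id would count as overlapping under this sketch.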