ZOrderingSpec:
+ https://iceberg.apache.org/docs/1.4.2/spark-procedures/
A table with no particular order
- should have z-ordered files after rewriting the data
  + Given 40000 rows of data that look like
      Datum(6919,label_6919,4,2005-12-06,2024-11-15 17:11:23.592)
      Datum(12167,label_12167,2,1991-07-25,2024-11-15 17:28:53.192)
      Datum(32111,label_32111,1,1936-12-16,2024-11-15 18:35:21.992)
      ...
    are initially written to table 'polaris.my_namespace.ZOrderingSpec'
  + When we execute the SQL:
      CALL system.rewrite_data_files(table => "polaris.my_namespace.ZOrderingSpec",
        strategy => 'sort',
        sort_order => 'zorder(id, date)',
        options => map('min-input-files','4', 'target-file-size-bytes','49152')
      )
  + Then added to the original 4 files are:
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00006-34-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00005-33-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00003.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00002-30-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00004-32-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00009-37-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00003-31-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00003.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00007-35-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00001-29-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00005-33-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00005-33-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00000-28-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00002-30-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00006-34-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00003-31-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00008-36-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00000-28-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00009-37-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00003.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00007-35-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00001.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00008-36-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00007-35-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00003.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00001-29-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00003-31-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00000-28-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00003.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00009-37-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00004-32-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00002.parquet
      /tmp/polaris/my_namespace/ZOrderingSpec/data/00006-34-a64be0f2-4cf6-430a-bb30-3dde6cbef8f7-0-00003.parquet
  + And there is no overlap in the `id` dimension.
    The ranges of the id look like:
      0 to 1999
      2000 to 3999
      4000 to 4186
      4187 to 6186
      6187 to 8058
      8059 to 10058
      10059 to 11967
      11968 to 13967
      13968 to 15967
      15968 to 16059
      16060 to 18059
      18060 to 20013
      20014 to 22013
      22014 to 24013
      24014 to 24241
      24242 to 26241
      26242 to 28241
      28242 to 28284
      28285 to 30284
      30285 to 32284
      32285 to 32288
      32289 to 34288
      34289 to 35807
      35808 to 37807
      37808 to 39807
      39808 to 39999
- should need its data rewritten to maintain its z-order
  + Given a z-ordered table called 'polaris.my_namespace.ZOrderingSpec'
  + When 40000 rows of new data that look like
      Datum(45226,label_5226,1,2010-07-26,2024-11-15 17:05:44.992)
      Datum(56679,label_16679,4,1979-03-18,2024-11-15 17:43:55.592)
      Datum(60060,label_20060,0,1969-12-14,2024-11-15 17:55:11.792)
      ...
    are appended to table 'polaris.my_namespace.ZOrderingSpec'
  + Then the ranges of the ids overlap in the 4 new files and look like:
      40000 to 79994
      40001 to 79993
      40007 to 79996
      40012 to 79999
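
The overlap claims in the two scenarios can be checked mechanically from the printed ranges. The following is a minimal sketch in Python, not the code the spec itself runs: the names `overlapping_pairs`, `REWRITTEN` and `APPENDED` are hypothetical, and the tuples are simply the inclusive (min id, max id) pairs copied from the output above. How the spec obtains those bounds is not shown in this log.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (min_id, max_id) of one data file


def overlapping_pairs(ranges: List[Range]) -> List[Tuple[Range, Range]]:
    """Return every pair of inclusive ranges that share at least one id."""
    ordered = sorted(ranges)  # sort by lower bound, then upper bound
    pairs = []
    for i, (lo_a, hi_a) in enumerate(ordered):
        for lo_b, hi_b in ordered[i + 1:]:
            if lo_b > hi_a:
                break  # later ranges start even further right, so stop here
            pairs.append(((lo_a, hi_a), (lo_b, hi_b)))
    return pairs


# id ranges of the 26 files written by rewrite_data_files (first scenario)
REWRITTEN: List[Range] = [
    (0, 1999), (2000, 3999), (4000, 4186), (4187, 6186), (6187, 8058),
    (8059, 10058), (10059, 11967), (11968, 13967), (13968, 15967),
    (15968, 16059), (16060, 18059), (18060, 20013), (20014, 22013),
    (22014, 24013), (24014, 24241), (24242, 26241), (26242, 28241),
    (28242, 28284), (28285, 30284), (30285, 32284), (32285, 32288),
    (32289, 34288), (34289, 35807), (35808, 37807), (37808, 39807),
    (39808, 39999),
]

# id ranges of the 4 files appended afterwards (second scenario)
APPENDED: List[Range] = [
    (40000, 79994), (40001, 79993), (40007, 79996), (40012, 79999),
]

if __name__ == "__main__":
    print(len(overlapping_pairs(REWRITTEN)))  # number of overlapping pairs after the rewrite
    print(len(overlapping_pairs(APPENDED)))   # number of overlapping pairs among the new files
```

With these inputs the rewritten files yield no overlapping pairs, while every pair of the four appended files overlaps. That is why a reader filtering on `id` would have to open all four new files until the table is rewritten again.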