IcebergCRUDSpec:
A dataset we CRUD
- should create the appropriate Iceberg files
  + Given data
      Datum(0,label_0,0,2024-11-15,2024-11-15 16:48:19.792)
      Datum(1,label_1,1,2024-11-14,2024-11-15 16:48:19.992)
      Datum(2,label_2,2,2024-11-13,2024-11-15 16:48:20.192)
      ...
  + When writing to table 'polaris.my_namespace.IcebergCRUDSpec'
  + Then reading the table back yields the same data
  + And there is no mention of the table in the metastore

- should support updates with 'update polaris.my_namespace.IcebergCRUDSpec set label='ipse locum''
  + Given SQL update polaris.my_namespace.IcebergCRUDSpec set label = 'ipse locum'
  + When we execute it
  + Then all rows are updated
  + And look like:
      Datum(10,ipse locum,0,2024-11-05,2024-11-15 16:48:21.792)
      Datum(11,ipse locum,1,2024-11-04,2024-11-15 16:48:21.992)
      Datum(12,ipse locum,2,2024-11-03,2024-11-15 16:48:22.192)
      ...

- should be able to have its schema updated
  + Given SQL ALTER TABLE polaris.my_namespace.IcebergCRUDSpec ADD COLUMNS (new_string string comment 'new_string docs')
  + When we execute it
  + Then all rows are updated
  + And look like:
      [10,ipse locum,0,2024-11-05,2024-11-15 16:48:21.792,null]
      [11,ipse locum,1,2024-11-04,2024-11-15 16:48:21.992,null]
      [12,ipse locum,2,2024-11-03,2024-11-15 16:48:22.192,null]
      ...
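
The trailing null in the rows above is consistent with ADD COLUMNS being a metadata-only change: files written before the ALTER simply lack the column, and the reader supplies null for it. The following is a minimal sketch of that projection in plain Python, not Iceberg's implementation. The column names are taken from the table output elsewhere in this report; `project` is a hypothetical helper, and it matches columns by name for brevity, whereas Iceberg resolves them by field ID.

```python
# Columns present in data files written before the ALTER TABLE.
OLD_COLUMNS = ["id", "label", "partitionKey", "date", "timestamp"]
# Columns in the table schema after ADD COLUMNS (new_string string).
NEW_COLUMNS = OLD_COLUMNS + ["new_string"]


def project(record, columns):
    """Read a stored record through a schema; columns the file lacks become None."""
    return [record.get(column) for column in columns]


# One record as it was written before the schema change (values from the report).
stored = dict(zip(OLD_COLUMNS, [10, "ipse locum", 0, "2024-11-05", "2024-11-15 16:48:21.792"]))

# Reading it through the new schema yields the extra null column.
row = project(stored, NEW_COLUMNS)
print(row)
```

No data file is rewritten in this sketch: only the schema used at read time changes, which is why the existing values are untouched.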
- should be able to have rows removed
  + Given SQL DELETE FROM polaris.my_namespace.IcebergCRUDSpec where id < 10
  + And the parquet files:
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/.00001-784-bb026164-920c-42cb-9f5a-1a972d134193-0-00001.parquet.crc
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/00001-784-bb026164-920c-42cb-9f5a-1a972d134193-0-00001.parquet
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/.00000-783-bb026164-920c-42cb-9f5a-1a972d134193-0-00001.parquet.crc
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/.00000-789-276e99a2-e6a8-4159-a380-16c7586af0b3-0-00001.parquet.crc
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/00003-786-bb026164-920c-42cb-9f5a-1a972d134193-0-00001.parquet
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/00002-785-bb026164-920c-42cb-9f5a-1a972d134193-0-00001.parquet
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/.00002-785-bb026164-920c-42cb-9f5a-1a972d134193-0-00001.parquet.crc
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/.00003-786-bb026164-920c-42cb-9f5a-1a972d134193-0-00001.parquet.crc
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/00000-783-bb026164-920c-42cb-9f5a-1a972d134193-0-00001.parquet
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/00000-789-276e99a2-e6a8-4159-a380-16c7586af0b3-0-00001.parquet
  + When we execute it
  + Then there are no longer any rows with ID < 10
  + And no files are deleted but there are the following new parquet files:
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/00000-795-f89adb64-7650-4519-9895-79e6997f23af-0-00001.parquet
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/.00000-795-f89adb64-7650-4519-9895-79e6997f23af-0-00001.parquet.crc
  + And those new files contain just the data with id >= 10

- should be able to have its history queried
  + Given a table that has seen changes
  + When we execute: select * from polaris.my_namespace.IcebergCRUDSpec.history
  + Then we see entries in the history table thus:
    +-----------------------+-------------------+-------------------+-------------------+
    |made_current_at        |snapshot_id        |parent_id          |is_current_ancestor|
    +-----------------------+-------------------+-------------------+-------------------+
    |2024-11-15 16:48:35.557|2135145595533504396|NULL               |true               |
    |2024-11-15 16:48:35.761|963363221342154824 |2135145595533504396|true               |
    |2024-11-15 16:48:36.258|6388286490046277703|963363221342154824 |true               |
    +-----------------------+-------------------+-------------------+-------------------+
  + And we can view a snapshot image with SQL: select * from polaris.my_namespace.IcebergCRUDSpec VERSION AS OF 2135145595533504396
    +---+--------+------------+----------+-----------------------+
    |id |label   |partitionKey|date      |timestamp              |
    +---+--------+------------+----------+-----------------------+
    |0  |label_0 |0           |2024-11-15|2024-11-15 16:48:19.792|
    |1  |label_1 |1           |2024-11-14|2024-11-15 16:48:19.992|
    |2  |label_2 |2           |2024-11-13|2024-11-15 16:48:20.192|
    |3  |label_3 |3           |2024-11-12|2024-11-15 16:48:20.392|
    |4  |label_4 |4           |2024-11-11|2024-11-15 16:48:20.592|
    |5  |label_5 |0           |2024-11-10|2024-11-15 16:48:20.792|
    |6  |label_6 |1           |2024-11-09|2024-11-15 16:48:20.992|
    |7  |label_7 |2           |2024-11-08|2024-11-15 16:48:21.192|
    |8  |label_8 |3           |2024-11-07|2024-11-15 16:48:21.392|
    |9  |label_9 |4           |2024-11-06|2024-11-15 16:48:21.592|
    |10 |label_10|0           |2024-11-05|2024-11-15 16:48:21.792|
    |11 |label_11|1           |2024-11-04|2024-11-15 16:48:21.992|
    |12 |label_12|2           |2024-11-03|2024-11-15 16:48:22.192|
    |13 |label_13|3           |2024-11-02|2024-11-15 16:48:22.392|
    |14 |label_14|4           |2024-11-01|2024-11-15 16:48:22.592|
    |15 |label_15|0           |2024-10-31|2024-11-15 16:48:22.792|
    |16 |label_16|1           |2024-10-30|2024-11-15 16:48:22.992|
    |17 |label_17|2           |2024-10-29|2024-11-15 16:48:23.192|
    |18 |label_18|3           |2024-10-28|2024-11-15 16:48:23.392|
    |19 |label_19|4           |2024-10-27|2024-11-15 16:48:23.592|
    +---+--------+------------+----------+-----------------------+

- should, when vacuumed, have old files removed
  + Given the 12 files are:
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/.00001-784-bb026164-920c-42cb-9f5a-1a972d134193-0-00001.parquet.crc
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/00000-795-f89adb64-7650-4519-9895-79e6997f23af-0-00001.parquet
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/00000-783-bb026164-920c-42cb-9f5a-1a972d134193-0-00001.parquet
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/.00000-783-bb026164-920c-42cb-9f5a-1a972d134193-0-00001.parquet.crc
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/.00000-789-276e99a2-e6a8-4159-a380-16c7586af0b3-0-00001.parquet.crc
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/00003-786-bb026164-920c-42cb-9f5a-1a972d134193-0-00001.parquet
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/00002-785-bb026164-920c-42cb-9f5a-1a972d134193-0-00001.parquet
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/00000-789-276e99a2-e6a8-4159-a380-16c7586af0b3-0-00001.parquet
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/00001-784-bb026164-920c-42cb-9f5a-1a972d134193-0-00001.parquet
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/.00000-795-f89adb64-7650-4519-9895-79e6997f23af-0-00001.parquet.crc
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/.00002-785-bb026164-920c-42cb-9f5a-1a972d134193-0-00001.parquet.crc
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/.00003-786-bb026164-920c-42cb-9f5a-1a972d134193-0-00001.parquet.crc
  + When we call expireSnapshot
  + Then the original 12 files now look like:
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/.00000-795-f89adb64-7650-4519-9895-79e6997f23af-0-00001.parquet.crc
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/00000-789-276e99a2-e6a8-4159-a380-16c7586af0b3-0-00001.parquet
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/00000-795-f89adb64-7650-4519-9895-79e6997f23af-0-00001.parquet
      /tmp/polaris/my_namespace/IcebergCRUDSpec/data/.00000-789-276e99a2-e6a8-4159-a380-16c7586af0b3-0-00001.parquet.crc
  + And there are 8 new files

- should delete all files when dropped
  + Given polaris.my_namespace.IcebergCRUDSpec has 2
  + When we execute: DROP TABLE polaris.my_namespace.IcebergCRUDSpec PURGE
  + Then there are no files
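
The file listings reported for the DELETE and expireSnapshot steps can be compared with plain set arithmetic. The sketch below is illustrative only: it hard-codes the file stems printed in this report instead of listing a real directory, and `listing` is a hypothetical helper that expands each stem into its Parquet file and hidden `.crc` sidecar.

```python
# File stems copied from the listings in this report (directory prefix omitted).
STEMS_BEFORE_DELETE = [
    "00000-783-bb026164-920c-42cb-9f5a-1a972d134193-0-00001",
    "00001-784-bb026164-920c-42cb-9f5a-1a972d134193-0-00001",
    "00002-785-bb026164-920c-42cb-9f5a-1a972d134193-0-00001",
    "00003-786-bb026164-920c-42cb-9f5a-1a972d134193-0-00001",
    "00000-789-276e99a2-e6a8-4159-a380-16c7586af0b3-0-00001",
]
STEM_WRITTEN_BY_DELETE = "00000-795-f89adb64-7650-4519-9895-79e6997f23af-0-00001"
STEMS_AFTER_EXPIRE = [
    "00000-789-276e99a2-e6a8-4159-a380-16c7586af0b3-0-00001",
    "00000-795-f89adb64-7650-4519-9895-79e6997f23af-0-00001",
]


def listing(stems):
    """Expand each stem into its .parquet file plus the hidden .crc sidecar."""
    files = set()
    for stem in stems:
        files.add(f"{stem}.parquet")
        files.add(f".{stem}.parquet.crc")
    return files


before_delete = listing(STEMS_BEFORE_DELETE)
after_delete = listing(STEMS_BEFORE_DELETE + [STEM_WRITTEN_BY_DELETE])
after_expire = listing(STEMS_AFTER_EXPIRE)

# DELETE only adds files; expiring snapshots is what removes the old ones.
added_by_delete = after_delete - before_delete
removed_by_delete = before_delete - after_delete
removed_by_expire = after_delete - after_expire

print(len(added_by_delete), len(removed_by_delete), len(removed_by_expire))
```

Under these listings the DELETE adds two files and removes none, and of the 12 files present before expiry only 4 remain afterwards. The sketch says nothing about files created outside the listed set.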