-
Notifications
You must be signed in to change notification settings - Fork 14
Timezone for partitioning of Iceberg tables #1349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: antalya-25.8
Are you sure you want to change the base?
Timezone for partitioning of Iceberg tables #1349
Conversation
|
@codex review |
|
Codex Review: Something went wrong. Try again later by commenting “@codex review”. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
@codex review |
|
Codex Review: Didn't find any major issues. Chef's kiss. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
arthurpassos
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we should submit this to upstream and get their opinion as well. Otherwise, as far as I can evaluate this PR, it looks good to me.
| { | ||
| String transform_name; | ||
| std::optional<size_t> argument; | ||
| std::optional<String> time_zone; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, the TransformAndArgument struct is a generic struct that holds a function name and a single optional argument. It can hold any function, not only date specific functions. I suggest you add a comment explaining why this optional time_zone field exists.
| DECLARE(Bool, serialize_string_in_memory_with_zero_byte, true, R"( | ||
| Serialize String values during aggregation with zero byte at the end. Enable to keep compatibility when querying cluster of incompatible versions. | ||
| )", 0) \ | ||
| DECLARE(Timezone, iceberg_partition_timezone, "", R"( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a short explanation on how this interacts with server / session timezone settings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not interact. Server or session time zone is used when setting is not set, it is in empty value describe below
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Setting
iceberg_partition_timezonewith timezone, used to create Iceberg tables partitions.Documentation entry for user-facing changes
Solved #1299
When Iceberg table created with third-party tools, data are split on partitions with specific time zone, UTC in most cases.
But ClickHouse tries to make partition pruning based on server time zone, and when server time zone is not UTC, some data can be incorrectly pruned during select queries.
This PR introduces new setting
iceberg_partition_timezone, which contains time zone used for partitions. This time zone can be different from server time zone, session time zone or column time zone.Default value is empty for backward compatibility. Empty value means 'use current time zone' as before.
Support for writing is added but not tested, due to it not working in antalya branch and in upstream master (no related to current PR):
#1350
Try to investigate it later.
CI/CD Options
Exclude tests:
Regression jobs to run: