iceberg Table Function

Provides a read-only table-like interface to Apache Iceberg tables in Amazon S3, Azure, HDFS, or local storage.

Syntax

icebergS3(url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression_method])
icebergS3(named_collection[, option=value [,..]])

icebergAzure(connection_string|storage_account_url, container_name, blobpath [, account_name, account_key] [, format] [, compression_method])
icebergAzure(named_collection[, option=value [,..]])

icebergHDFS(path_to_table [, format] [, compression_method])
icebergHDFS(named_collection[, option=value [,..]])

icebergLocal(path_to_table [, format] [, compression_method])
icebergLocal(named_collection[, option=value [,..]])

Arguments

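The arguments are described in the same way as those of the `s3`, `azureBlobStorage`, `hdfs`, and `file` table functions, respectively. `format` is the format of the data files in the Iceberg table.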

Returned value

A table with the specified structure for reading data from the specified Iceberg table.

Example

SELECT * FROM icebergS3('http://test.s3.amazonaws.com/clickhouse-bucket/test_table', 'test', 'test')
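
The other forms listed in the syntax follow the same pattern. The following is only a sketch: the bucket URL is the placeholder from the example above, and the local path `/data/warehouse/test_table` is a hypothetical location chosen for illustration.

```sql
-- Read from a bucket without signing requests (NOSIGN, as listed in the syntax).
-- The URL is a placeholder, not a real dataset.
SELECT count()
FROM icebergS3('http://test.s3.amazonaws.com/clickhouse-bucket/test_table', NOSIGN);

-- Read an Iceberg table from local storage; the path is hypothetical.
SELECT *
FROM icebergLocal('/data/warehouse/test_table')
LIMIT 10;
```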

Note

ClickHouse currently supports reading v1 and v2 of the Iceberg format through the `icebergS3`, `icebergAzure`, `icebergHDFS`, and `icebergLocal` table functions and the `IcebergS3`, `IcebergAzure`, `IcebergHDFS`, and `IcebergLocal` table engines.

Defining a named collection

Here is an example of configuring a named collection that stores the URL and credentials:

<clickhouse>
    <named_collections>
        <iceberg_conf>
            <url>http://test.s3.amazonaws.com/clickhouse-bucket/</url>
            <access_key_id>test</access_key_id>
            <secret_access_key>test</secret_access_key>
            <format>auto</format>
            <structure>auto</structure>
        </iceberg_conf>
    </named_collections>
</clickhouse>

SELECT * FROM icebergS3(iceberg_conf, filename = 'test_table')
DESCRIBE icebergS3(iceberg_conf, filename = 'test_table')
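
As an alternative sketch, a named collection with the same keys could be created through SQL instead of the XML configuration above. This assumes that SQL-managed named collections are available on the server and that the current user is allowed to create them; the values are the same placeholders as in the XML.

```sql
-- Assumption: named collections can be managed via DDL on this server.
CREATE NAMED COLLECTION iceberg_conf AS
    url = 'http://test.s3.amazonaws.com/clickhouse-bucket/',
    access_key_id = 'test',
    secret_access_key = 'test';

-- Additional options can be passed as option=value pairs, per the syntax section.
SELECT *
FROM icebergS3(iceberg_conf, filename = 'test_table', format = 'Parquet');
```

Passing `format` explicitly is optional here; it is shown only to illustrate the `option=value` form.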

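Schema evolution

ClickHouse can currently read Iceberg tables whose schema has changed over time. Supported changes are adding and removing columns and changing the column order. A column that requires a value can also be changed to one that allows NULL. In addition, the following permitted type casts for simple types are supported: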

int -> long
float -> double
decimal(P, S) -> decimal(P', S), where P' > P.

Currently, it is not possible to change nested structures or the types of elements within arrays and maps.
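
To make this concrete, here is a sketch against a hypothetical table `orders` whose `id` column was widened from int to long and which later gained an optional `note` column. The table name, column names, and evolution history are assumptions for illustration only.

```sql
-- Inspect the schema ClickHouse infers for the (hypothetical) evolved table.
DESCRIBE icebergS3(iceberg_conf, filename = 'orders');

-- Read old and new data files in one query.
-- Under Iceberg semantics, rows written before `note` was added
-- would be expected to return NULL for that column.
SELECT id, note
FROM icebergS3(iceberg_conf, filename = 'orders')
ORDER BY id
LIMIT 10;
```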

Partition pruning

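ClickHouse supports partition pruning during SELECT queries on Iceberg tables, which improves query performance by skipping irrelevant data files. It currently works only with identity transforms and time-based transforms (hour, day, month, year). To enable partition pruning, set `use_iceberg_partition_pruning = 1`.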
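
A minimal sketch of a query that could benefit from pruning, assuming a hypothetical table `events` that is partitioned by day on a hypothetical `event_time` column:

```sql
-- Enable pruning for this query only; the filter is on the partition source column,
-- so data files for other days can be skipped.
SELECT count()
FROM icebergS3(iceberg_conf, filename = 'events')
WHERE event_time >= '2024-01-01 00:00:00'
  AND event_time <  '2024-01-02 00:00:00'
SETTINGS use_iceberg_partition_pruning = 1;
```

The setting can also be applied for the whole session with `SET use_iceberg_partition_pruning = 1`.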

Aliases

The table function `iceberg` is now an alias for `icebergS3`.
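
For illustration, the two queries below should therefore be interchangeable. They reuse the `iceberg_conf` named collection and the `test_table` placeholder from the earlier example.

```sql
-- `iceberg` resolves to `icebergS3`, so both statements read the same table.
SELECT * FROM iceberg(iceberg_conf, filename = 'test_table');
SELECT * FROM icebergS3(iceberg_conf, filename = 'test_table');
```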
