azureBlobStorage Table Function

Provides a table-like interface for selecting and inserting files in Azure Blob Storage. This table function is similar to the s3 function.

Syntax

azureBlobStorage(connection_string|storage_account_url, container_name, blobpath, [account_name, account_key, format, compression, structure])

Arguments

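connection_string|storage_account_url - connection_string includes the account name and key (see Create connection string). Alternatively, you can provide the storage account URL here and pass the account name and account key as separate parameters (see account_name and account_key).
container_name - Container name.
blobpath - File path. When reading (read-only mode), the following wildcards are supported: *, **, ?, {abc,def} and {N..M}, where N and M are numbers and 'abc' and 'def' are strings.
account_name - If storage_account_url is used, the account name can be specified here.
account_key - If storage_account_url is used, the account key can be specified here.
format - The format of the file.
compression - Supported values: none, gzip/gz, brotli/br, xz/LZMA, zstd/zst. By default, the compression method is detected automatically from the file extension (the same as setting it to auto).
structure - Structure of the table, in the format 'column1_name column1_type, column2_name column2_type, ...'.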
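The wildcard rules for blobpath can be illustrated with a small Python sketch that translates a pattern into a regular expression and filters a list of blob names. This is not ClickHouse's implementation: the treatment of `/` (that `*` and `?` do not cross it while `**` does) is an assumption made for the illustration, the blob names are hypothetical, and edge cases such as zero-padded ranges or escaped braces are not handled.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern":
    """Translate a blobpath-style glob into a compiled regular expression.

    Illustrative sketch only: '*' and '?' are assumed not to cross '/',
    while '**' is assumed to match across '/' as well.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")            # any characters, including '/'
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")         # any characters except '/'
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")          # exactly one character other than '/'
            i += 1
        elif pattern[i] == "{":
            end = pattern.index("}", i)   # raises ValueError if '}' is missing
            body = pattern[i + 1:end]
            rng = re.fullmatch(r"(\d+)\.\.(\d+)", body)
            if rng:                       # {N..M}: every integer between N and M
                n, m = int(rng.group(1)), int(rng.group(2))
                alternatives = [str(k) for k in range(min(n, m), max(n, m) + 1)]
            else:                         # {abc,def}: literal alternatives
                alternatives = body.split(",")
            parts.append("(?:" + "|".join(map(re.escape, alternatives)) + ")")
            i = end + 1
        else:
            parts.append(re.escape(pattern[i]))  # ordinary character, matched literally
            i += 1
    return re.compile("".join(parts))


# Hypothetical blob names in a container.
blobs = ["data/file_1.csv", "data/file_2.csv", "data/file_9.csv",
         "data/sub/file_1.csv", "data/file_a.json"]

rx = glob_to_regex("data/file_{1..3}.{csv,json}")
print([b for b in blobs if rx.fullmatch(b)])  # ['data/file_1.csv', 'data/file_2.csv']
```

With these assumptions, "data/*.csv" selects only the three CSV files directly under data/, while "data/**.csv" also picks up data/sub/file_1.csv.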

Returned value

A table with the specified structure for reading data from or writing data to the specified file.

Examples

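As with the AzureBlobStorage table engine, you can use the Azurite emulator for local Azure Storage development. See the Azurite documentation for more details. The examples below assume that Azurite is available at the hostname azurite1 and use Azurite's default development account, devstoreaccount1, with its well-known key.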

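Write data to Azure Blob Storage with the following command. Because the query uses PARTITION BY column3, each distinct value of column3 is written to its own file, with {_partition_id} in the path replaced by that value; here this produces test_3.csv and test_1.csv.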

INSERT INTO TABLE FUNCTION azureBlobStorage('http://azurite1:10000/devstoreaccount1',
    'testcontainer', 'test_{_partition_id}.csv', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==',
    'CSV', 'auto', 'column1 UInt32, column2 UInt32, column3 UInt32') PARTITION BY column3 VALUES (1, 2, 3), (3, 2, 1), (78, 43, 3);
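To see which files the INSERT above produces, the following Python sketch groups the inserted rows by column3 and substitutes each value into the path template. It only models the grouping; the helper name partition_rows is hypothetical, and it assumes that an integer partition key becomes its decimal string in {_partition_id}, which is consistent with the file names read back in the queries that follow.

```python
from collections import defaultdict


def partition_rows(rows, key_index, path_template):
    """Group rows by the partition key and map each group to a file name."""
    files = defaultdict(list)
    for row in rows:
        # Assumption: an integer partition key is rendered as its decimal string.
        partition_id = str(row[key_index])
        files[path_template.replace("{_partition_id}", partition_id)].append(row)
    return dict(files)


# The rows from the INSERT example: (column1, column2, column3).
rows = [(1, 2, 3), (3, 2, 1), (78, 43, 3)]
files = partition_rows(rows, key_index=2, path_template="test_{_partition_id}.csv")
for path, group in files.items():
    print(path, group)
# test_3.csv [(1, 2, 3), (78, 43, 3)]
# test_1.csv [(3, 2, 1)]
```

This matches the results that follow: test_1.csv holds the single row (3, 2, 1), and test_3.csv holds two rows.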

The data can then be read back with the following command:

SELECT * FROM azureBlobStorage('http://azurite1:10000/devstoreaccount1',
    'testcontainer', 'test_1.csv', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==',
    'CSV', 'auto', 'column1 UInt32, column2 UInt32, column3 UInt32');

┌───column1─┬────column2─┬───column3─┐
│     3     │       2    │      1    │
└───────────┴────────────┴───────────┘

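Or use a connection_string. In this case account_name and account_key are omitted, because the connection string already contains them: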

SELECT count(*) FROM azureBlobStorage('DefaultEndpointsProtocol=https;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;EndPointSuffix=core.windows.net',
    'testcontainer', 'test_3.csv', 'CSV', 'auto', 'column1 UInt32, column2 UInt32, column3 UInt32');

┌─count()─┐
│      2  │
└─────────┘
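A connection string is a semicolon-separated list of Key=Value settings, which is how it can carry the account name and key in a single argument. The Python sketch below splits the connection string from the example above into its settings. It is an illustration only, not how ClickHouse or the Azure SDK parses connection strings; the function name is hypothetical, and keys are kept exactly as written.

```python
def parse_connection_string(conn: str) -> dict:
    """Split an Azure Storage connection string into its Key=Value settings."""
    settings = {}
    for part in conn.split(";"):
        if not part:
            continue                          # tolerate an empty segment, e.g. a trailing ';'
        key, _, value = part.partition("=")   # split on the first '=' only
        settings[key] = value                 # so a base64 key ending in '==' stays intact
    return settings


conn = ("DefaultEndpointsProtocol=https;AccountName=devstoreaccount1;"
        "AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;"
        "EndPointSuffix=core.windows.net")
s = parse_connection_string(conn)
print(s["AccountName"])               # devstoreaccount1
print(s["AccountKey"].endswith("==")) # True
```

Splitting on the first = only matters here: the account key is base64-encoded and ends in ==.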

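Virtual Columns

_path - Path to the file. Type: LowCardinality(String).
_file - Name of the file. Type: LowCardinality(String).
_size - Size of the file in bytes. Type: Nullable(UInt64). If the file size is unknown, the value is NULL.
_time - Last modified time of the file. Type: Nullable(DateTime). If the time is unknown, the value is NULL.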

See Also

AzureBlobStorage Table Engine

Hive-style partitioning

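When use_hive_partitioning is set to 1, ClickHouse detects Hive-style partitioning in the path (/name=value/) and lets you use the partition columns as virtual columns in queries. These virtual columns have the same names as in the partitioned path, prefixed with _.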
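The mapping from /name=value/ path segments to _name virtual columns can be sketched in a few lines of Python. This only illustrates the naming rule described above; the function name and the sample path are hypothetical, only directory segments (not the file name) are inspected, and values are kept as strings because type handling in ClickHouse is not modeled here.

```python
def hive_virtual_columns(path: str) -> dict:
    """Extract /name=value/ directory segments from a path as _name columns."""
    columns = {}
    for segment in path.split("/")[:-1]:      # skip the final component (the file name)
        name, sep, value = segment.partition("=")
        if sep and name:                      # only segments of the form name=value
            columns["_" + name] = value
    return columns


path = "data/path/date=2020-05-01/country=Netherlands/code=42/part-0.parquet"
print(hive_virtual_columns(path))
# {'_date': '2020-05-01', '_country': 'Netherlands', '_code': '42'}
```

A path without name=value segments yields no partition columns, so only the regular virtual columns would apply to it.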

Example

Using virtual columns created by Hive-style partitioning:

SELECT * from azureBlobStorage(config, storage_account_url='...', container='...', blob_path='http://data/path/date=*/country=*/code=*/*.parquet') where _date > '2020-01-01' and _country = 'Netherlands' and _code = 42;
