跳至主要内容
跳至主要内容
编辑此页

ClickHouse 中的存储过程和查询参数

如果您来自传统的关系数据库,您可能正在寻找 ClickHouse 中的存储过程和预处理语句。本指南解释了 ClickHouse 对这些概念的处理方式,并提供了推荐的替代方案。

ClickHouse 中存储过程的替代方案

ClickHouse 不支持带有控制流逻辑(IF/ELSE、循环等)的传统存储过程。这是基于 ClickHouse 作为分析数据库的架构所做出的有意设计决策。对于分析数据库来说,循环是不鼓励的,因为处理 O(n) 个简单查询通常比处理更少数量的复杂查询要慢。

ClickHouse 针对

  • 分析型工作负载 - 对大型数据集进行复杂聚合
  • 批量处理 - 高效处理大量数据
  • 声明式查询 - SQL 查询,描述要检索的数据,而不是如何处理它

带有过程逻辑的存储过程与这些优化相悖。相反,ClickHouse 提供了与它的优势相符的替代方案。

用户自定义函数 (UDF)

用户自定义函数允许您封装可重用的逻辑,而无需控制流。ClickHouse 支持两种类型

基于 Lambda 的 UDF

使用 SQL 表达式和 lambda 语法创建函数

示例数据的示例
-- Create the products table
CREATE TABLE products (
    product_id UInt32,
    product_name String,
    price Decimal(10, 2)
)
ENGINE = MergeTree()
ORDER BY product_id;

-- Insert sample data
INSERT INTO products (product_id, product_name, price) VALUES
(1, 'Laptop', 899.99),
(2, 'Wireless Mouse', 24.99),
(3, 'USB-C Cable', 12.50),
(4, 'Monitor', 299.00),
(5, 'Keyboard', 79.99),
(6, 'Webcam', 54.95),
(7, 'Desk Lamp', 34.99),
(8, 'External Hard Drive', 119.99),
(9, 'Headphones', 149.00),
(10, 'Phone Stand', 15.99);
-- Simple calculation function
CREATE FUNCTION calculate_tax AS (price, rate) -> price * rate;

SELECT
    product_name,
    price,
    calculate_tax(price, 0.08) AS tax
FROM products;
-- Conditional logic using if()
CREATE FUNCTION price_tier AS (price) ->
    if(price < 100, 'Budget',
       if(price < 500, 'Mid-range', 'Premium'));

SELECT
    product_name,
    price,
    price_tier(price) AS tier
FROM products;
-- String manipulation
CREATE FUNCTION format_phone AS (phone) ->
    concat('(', substring(phone, 1, 3), ') ',
           substring(phone, 4, 3), '-',
           substring(phone, 7, 4));

SELECT format_phone('5551234567');
-- Result: (555) 123-4567

限制

  • 没有循环或复杂的控制流
  • 无法修改数据 (INSERT/UPDATE/DELETE)
  • 不允许递归函数

请参阅 CREATE FUNCTION 以获取完整的语法。

可执行 UDF

对于更复杂的逻辑,请使用调用外部程序的的可执行 UDF

<!-- /etc/clickhouse-server/sentiment_analysis_function.xml -->
<functions>
    <function>
        <type>executable</type>
        <name>sentiment_score</name>
        <return_type>Float32</return_type>
        <argument>
            <type>String</type>
        </argument>
        <format>TabSeparated</format>
        <command>python3 /opt/scripts/sentiment.py</command>
    </function>
</functions>
-- Use the executable UDF
SELECT
    review_text,
    sentiment_score(review_text) AS score
FROM customer_reviews;

可执行 UDF 可以在任何语言(Python、Node.js、Go 等)中实现任意逻辑。

请参阅 可执行 UDF 以获取详细信息。

参数化视图

参数化视图充当返回数据集的函数。它们非常适合具有动态筛选的重用查询

示例数据示例
-- Create the sales table
CREATE TABLE sales (
  date Date,
  product_id UInt32,
  product_name String,
  category String,
  quantity UInt32,
  revenue Decimal(10, 2),
  sales_amount Decimal(10, 2)
)
ENGINE = MergeTree()
ORDER BY (date, product_id);

-- Insert sample data
INSERT INTO sales VALUES
('2024-01-05', 12345, 'Laptop Pro', 'Electronics', 2, 1799.98, 1799.98),
('2024-01-06', 12345, 'Laptop Pro', 'Electronics', 1, 899.99, 899.99),
('2024-01-10', 12346, 'Wireless Mouse', 'Electronics', 5, 124.95, 124.95),
('2024-01-15', 12347, 'USB-C Cable', 'Accessories', 10, 125.00, 125.00),
('2024-01-20', 12345, 'Laptop Pro', 'Electronics', 3, 2699.97, 2699.97),
('2024-01-25', 12348, 'Monitor 4K', 'Electronics', 2, 598.00, 598.00),
('2024-02-01', 12345, 'Laptop Pro', 'Electronics', 1, 899.99, 899.99),
('2024-02-05', 12349, 'Keyboard Mechanical', 'Accessories', 4, 319.96, 319.96),
('2024-02-10', 12346, 'Wireless Mouse', 'Electronics', 8, 199.92, 199.92),
('2024-02-15', 12350, 'Webcam HD', 'Electronics', 3, 164.85, 164.85);
-- Create a parameterized view
CREATE VIEW sales_by_date AS
SELECT
    date,
    product_id,
    sum(quantity) AS total_quantity,
    sum(revenue) AS total_revenue
FROM sales
WHERE date BETWEEN {start_date:Date} AND {end_date:Date}
GROUP BY date, product_id;
-- Query the view with parameters
SELECT *
FROM sales_by_date(start_date='2024-01-01', end_date='2024-01-31')
WHERE product_id = 12345;

常见用例

-- More complex parameterized view
CREATE VIEW top_products_by_category AS
SELECT
    category,
    product_name,
    revenue,
    rank
FROM (
    SELECT
        category,
        product_name,
        revenue,
        rank() OVER (PARTITION BY category ORDER BY revenue DESC) AS rank
    FROM (
        SELECT
            category,
            product_name,
            sum(sales_amount) AS revenue
        FROM sales
        WHERE category = {category:String}
            AND date >= {min_date:Date}
        GROUP BY category, product_name
    )
)
WHERE rank <= {top_n:UInt32};

-- Use it
SELECT * FROM top_products_by_category(
    category='Electronics',
    min_date='2024-01-01',
    top_n=10
);

请参阅 参数化视图 部分以获取更多信息。

物化视图

物化视图非常适合预先计算传统上在存储过程中完成的昂贵聚合。如果您来自传统数据库,可以将物化视图视为一个 INSERT 触发器,它会在数据插入到源表时自动转换和聚合数据

-- Source table
CREATE TABLE page_views (
    user_id UInt64,
    page String,
    timestamp DateTime,
    session_id String
)
ENGINE = MergeTree()
ORDER BY (user_id, timestamp);

-- Materialized view that maintains aggregated statistics
CREATE MATERIALIZED VIEW daily_user_stats
ENGINE = SummingMergeTree()
ORDER BY (date, user_id)
AS SELECT
    toDate(timestamp) AS date,
    user_id,
    count() AS page_views,
    uniq(session_id) AS sessions,
    uniq(page) AS unique_pages
FROM page_views
GROUP BY date, user_id;

-- Insert sample data into source table
INSERT INTO page_views VALUES
(101, '/home', '2024-01-15 10:00:00', 'session_a1'),
(101, '/products', '2024-01-15 10:05:00', 'session_a1'),
(101, '/checkout', '2024-01-15 10:10:00', 'session_a1'),
(102, '/home', '2024-01-15 11:00:00', 'session_b1'),
(102, '/about', '2024-01-15 11:05:00', 'session_b1'),
(101, '/home', '2024-01-16 09:00:00', 'session_a2'),
(101, '/products', '2024-01-16 09:15:00', 'session_a2'),
(103, '/home', '2024-01-16 14:00:00', 'session_c1'),
(103, '/products', '2024-01-16 14:05:00', 'session_c1'),
(103, '/products', '2024-01-16 14:10:00', 'session_c1'),
(102, '/home', '2024-01-17 10:30:00', 'session_b2'),
(102, '/contact', '2024-01-17 10:35:00', 'session_b2');

-- Query pre-aggregated data
SELECT
    user_id,
    sum(page_views) AS total_views,
    sum(sessions) AS total_sessions
FROM daily_user_stats
WHERE date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY user_id;

可刷新物化视图

用于计划的批量处理(例如夜间存储过程)

-- Automatically refresh every day at 2 AM
CREATE MATERIALIZED VIEW monthly_sales_report
REFRESH EVERY 1 DAY OFFSET 2 HOUR
AS SELECT
    toStartOfMonth(order_date) AS month,
    region,
    product_category,
    count() AS order_count,
    sum(amount) AS total_revenue,
    avg(amount) AS avg_order_value
FROM orders
WHERE order_date >= today() - INTERVAL 13 MONTH
GROUP BY month, region, product_category;

-- Query always has fresh data
SELECT * FROM monthly_sales_report
WHERE month = toStartOfMonth(today());

请参阅 级联物化视图 以获取高级模式。

外部编排

对于复杂的业务逻辑、ETL 工作流或多步骤流程,始终可以在 ClickHouse 外部使用语言客户端来实现逻辑。

使用应用程序代码

这是一个并排比较,显示了 MySQL 存储过程如何转换为带有 ClickHouse 的应用程序代码

DELIMITER $$

CREATE PROCEDURE process_order(
    IN p_order_id INT,
    IN p_customer_id INT,
    IN p_order_total DECIMAL(10,2),
    OUT p_status VARCHAR(50),
    OUT p_loyalty_points INT
)
BEGIN
    DECLARE v_customer_tier VARCHAR(20);
    DECLARE v_previous_orders INT;
    DECLARE v_discount DECIMAL(10,2);

    -- Start transaction
    START TRANSACTION;

    -- Get customer information
    SELECT tier, total_orders
    INTO v_customer_tier, v_previous_orders
    FROM customers
    WHERE customer_id = p_customer_id;

    -- Calculate discount based on tier
    IF v_customer_tier = 'gold' THEN
        SET v_discount = p_order_total * 0.15;
    ELSEIF v_customer_tier = 'silver' THEN
        SET v_discount = p_order_total * 0.10;
    ELSE
        SET v_discount = 0;
    END IF;

    -- Insert order record
    INSERT INTO orders (order_id, customer_id, order_total, discount, final_amount)
    VALUES (p_order_id, p_customer_id, p_order_total, v_discount,
            p_order_total - v_discount);

    -- Update customer statistics
    UPDATE customers
    SET total_orders = total_orders + 1,
        lifetime_value = lifetime_value + (p_order_total - v_discount),
        last_order_date = NOW()
    WHERE customer_id = p_customer_id;

    -- Calculate loyalty points (1 point per dollar)
    SET p_loyalty_points = FLOOR(p_order_total - v_discount);

    -- Insert loyalty points transaction
    INSERT INTO loyalty_points (customer_id, points, transaction_date, description)
    VALUES (p_customer_id, p_loyalty_points, NOW(),
            CONCAT('Order #', p_order_id));

    -- Check if customer should be upgraded
    IF v_previous_orders + 1 >= 10 AND v_customer_tier = 'bronze' THEN
        UPDATE customers SET tier = 'silver' WHERE customer_id = p_customer_id;
        SET p_status = 'ORDER_COMPLETE_TIER_UPGRADED_SILVER';
    ELSEIF v_previous_orders + 1 >= 50 AND v_customer_tier = 'silver' THEN
        UPDATE customers SET tier = 'gold' WHERE customer_id = p_customer_id;
        SET p_status = 'ORDER_COMPLETE_TIER_UPGRADED_GOLD';
    ELSE
        SET p_status = 'ORDER_COMPLETE';
    END IF;

    COMMIT;
END$$

DELIMITER ;

-- Call the stored procedure
CALL process_order(12345, 5678, 250.00, @status, @points);
SELECT @status, @points;

主要区别

  1. 控制流 - MySQL 存储过程使用 IF/ELSEWHILE 循环。在 ClickHouse 中,在您的应用程序代码(Python、Java 等)中实现此逻辑
  2. 事务 - MySQL 支持 BEGIN/COMMIT/ROLLBACK 以进行 ACID 事务。ClickHouse 是一种针对追加型工作负载优化的分析数据库,而不是事务更新
  3. 更新 - MySQL 使用 UPDATE 语句。ClickHouse 倾向于使用 ReplacingMergeTreeCollapsingMergeTree 进行可变数据
  4. 变量和状态 - MySQL 存储过程可以声明变量 (DECLARE v_discount)。使用 ClickHouse,在您的应用程序代码中管理状态
  5. 错误处理 - MySQL 支持 SIGNAL 和异常处理程序。在应用程序代码中,使用您语言的本机错误处理 (try/catch)
提示

何时使用哪种方法

  • OLTP 工作负载(订单、付款、用户帐户)→ 使用带有存储过程的 MySQL/PostgreSQL
  • 分析型工作负载(报告、聚合、时间序列)→ 使用带有应用程序编排的 ClickHouse
  • 混合架构 → 同时使用两者!将 OLTP 中的事务数据流式传输到 ClickHouse 进行分析

使用工作流编排工具

  • Apache Airflow - 计划和监控 ClickHouse 查询的复杂 DAG
  • dbt - 使用基于 SQL 的工作流转换数据
  • Prefect/Dagster - 现代 Python 基于的编排
  • 自定义调度程序 - Cron 作业、Kubernetes CronJobs 等。

外部编排的好处

  • 完整的编程语言功能
  • 更好的错误处理和重试逻辑
  • 与外部系统的集成(API、其他数据库)
  • 版本控制和测试
  • 监控和警报
  • 更灵活的调度

ClickHouse 中预处理语句的替代方案

虽然 ClickHouse 没有 RDBMS 意义上的传统“预处理语句”,但它提供了 查询参数,它们具有相同的目的:安全的参数化查询,可防止 SQL 注入。

语法

有两种定义查询参数的方法

方法 1:使用 SET

示例表和数据
-- Create the user_events table (ClickHouse syntax)
CREATE TABLE user_events (
    event_id UInt32,
    user_id UInt64,
    event_name String,
    event_date Date,
    event_timestamp DateTime
) ENGINE = MergeTree()
ORDER BY (user_id, event_date);

-- Insert sample data for multiple users and events
INSERT INTO user_events (event_id, user_id, event_name, event_date, event_timestamp) VALUES
(1, 12345, 'page_view', '2024-01-05', '2024-01-05 10:30:00'),
(2, 12345, 'page_view', '2024-01-05', '2024-01-05 10:35:00'),
(3, 12345, 'add_to_cart', '2024-01-05', '2024-01-05 10:40:00'),
(4, 12345, 'page_view', '2024-01-10', '2024-01-10 14:20:00'),
(5, 12345, 'add_to_cart', '2024-01-10', '2024-01-10 14:25:00'),
(6, 12345, 'purchase', '2024-01-10', '2024-01-10 14:30:00'),
(7, 12345, 'page_view', '2024-01-15', '2024-01-15 09:15:00'),
(8, 12345, 'page_view', '2024-01-15', '2024-01-15 09:20:00'),
(9, 12345, 'page_view', '2024-01-20', '2024-01-20 16:45:00'),
(10, 12345, 'add_to_cart', '2024-01-20', '2024-01-20 16:50:00'),
(11, 12345, 'purchase', '2024-01-25', '2024-01-25 11:10:00'),
(12, 12345, 'page_view', '2024-01-28', '2024-01-28 13:30:00'),
(13, 67890, 'page_view', '2024-01-05', '2024-01-05 11:00:00'),
(14, 67890, 'add_to_cart', '2024-01-05', '2024-01-05 11:05:00'),
(15, 67890, 'purchase', '2024-01-05', '2024-01-05 11:10:00'),
(16, 12345, 'page_view', '2024-02-01', '2024-02-01 10:00:00'),
(17, 12345, 'add_to_cart', '2024-02-01', '2024-02-01 10:05:00');
SET param_user_id = 12345;
SET param_start_date = '2024-01-01';
SET param_end_date = '2024-01-31';

SELECT
    event_name,
    count() AS event_count
FROM user_events
WHERE user_id = {user_id: UInt64}
    AND event_date BETWEEN {start_date: Date} AND {end_date: Date}
GROUP BY event_name;

方法 2:使用 CLI 参数

clickhouse-client \
    --param_user_id=12345 \
    --param_start_date='2024-01-01' \
    --param_end_date='2024-01-31' \
    --query="SELECT count() FROM user_events
             WHERE user_id = {user_id: UInt64}
             AND event_date BETWEEN {start_date: Date} AND {end_date: Date}"

参数语法

参数使用以下方式引用:{parameter_name: DataType}

  • parameter_name - 参数的名称(没有 param_ 前缀)
  • DataType - 要将参数转换为的 ClickHouse 数据类型

数据类型示例

示例表和数据
-- 1. Create a table for string and number tests
CREATE TABLE IF NOT EXISTS users (
    name String,
    age UInt8,
    salary Float64
) ENGINE = Memory;

INSERT INTO users VALUES
    ('John Doe', 25, 75000.50),
    ('Jane Smith', 30, 85000.75),
    ('Peter Jones', 20, 50000.00);

-- 2. Create a table for date and timestamp tests
CREATE TABLE IF NOT EXISTS events (
    event_date Date,
    event_timestamp DateTime
) ENGINE = Memory;

INSERT INTO events VALUES
    ('2024-01-15', '2024-01-15 14:30:00'),
    ('2024-01-15', '2024-01-15 15:00:00'),
    ('2024-01-16', '2024-01-16 10:00:00');

-- 3. Create a table for array tests
CREATE TABLE IF NOT EXISTS products (
    id UInt32,
    name String
) ENGINE = Memory;

INSERT INTO products VALUES (1, 'Laptop'), (2, 'Monitor'), (3, 'Mouse'), (4, 'Keyboard');

-- 4. Create a table for Map (struct-like) tests
CREATE TABLE IF NOT EXISTS accounts (
    user_id UInt32,
    status String,
    type String
) ENGINE = Memory;

INSERT INTO accounts VALUES
    (101, 'active', 'premium'),
    (102, 'inactive', 'basic'),
    (103, 'active', 'basic');

-- 5. Create a table for Identifier tests
CREATE TABLE IF NOT EXISTS sales_2024 (
    value UInt32
) ENGINE = Memory;

INSERT INTO sales_2024 VALUES (100), (200), (300);
SET param_name = 'John Doe';
SET param_age = 25;
SET param_salary = 75000.50;

SELECT name, age, salary FROM users
WHERE name = {name: String}
  AND age >= {age: UInt8}
  AND salary <= {salary: Float64};

有关在 语言客户端 中使用查询参数的信息,请参阅您感兴趣的特定语言客户端的文档。

查询参数的限制

查询参数不是通用的文本替换。它们具有特定的限制

  1. 它们主要用于 SELECT 语句 - 在 SELECT 查询中的支持最好
  2. 它们作为标识符或字面量起作用 - 它们不能替代任意 SQL 片段
  3. 它们具有有限的 DDL 支持 - 它们受支持在 CREATE TABLE 中,但不受支持在 ALTER TABLE

有效示例

-- ✓ Values in WHERE clause
SELECT * FROM users WHERE id = {user_id: UInt64};

-- ✓ Table/database names
SELECT * FROM {db: Identifier}.{table: Identifier};

-- ✓ Values in IN clause
SELECT * FROM products WHERE id IN {ids: Array(UInt32)};

-- ✓ CREATE TABLE
CREATE TABLE {table_name: Identifier} (id UInt64, name String) ENGINE = MergeTree() ORDER BY id;

无效示例

-- ✗ Column names in SELECT (use Identifier carefully)
SELECT {column: Identifier} FROM users;  -- Limited support

-- ✗ Arbitrary SQL fragments
SELECT * FROM users {where_clause: String};  -- NOT SUPPORTED

-- ✗ ALTER TABLE statements
ALTER TABLE {table: Identifier} ADD COLUMN new_col String;  -- NOT SUPPORTED

-- ✗ Multiple statements
{statements: String};  -- NOT SUPPORTED

安全最佳实践

始终对用户输入使用查询参数

# ✓ SAFE - Uses parameters
user_input = request.get('user_id')
result = client.query(
    "SELECT * FROM orders WHERE user_id = {uid: UInt64}",
    parameters={'uid': user_input}
)

# ✗ DANGEROUS - SQL injection risk!
user_input = request.get('user_id')
result = client.query(f"SELECT * FROM orders WHERE user_id = {user_input}")

验证输入类型

def get_user_orders(user_id: int, start_date: str):
    # Validate types before querying
    if not isinstance(user_id, int) or user_id <= 0:
        raise ValueError("Invalid user_id")

    # Parameters enforce type safety
    return client.query(
        """
        SELECT * FROM orders
        WHERE user_id = {uid: UInt64}
            AND order_date >= {start: Date}
        """,
        parameters={'uid': user_id, 'start': start_date}
    )

MySQL 协议预处理语句

ClickHouse 的 MySQL 接口 包含对预处理语句(COM_STMT_PREPARECOM_STMT_EXECUTECOM_STMT_CLOSE)的最小支持,主要目的是为了与 Tableau Online 等包装查询在预处理语句中的工具兼容。

主要限制

  • 不支持参数绑定 - 您无法使用 ? 占位符和绑定参数
  • 查询在 PREPARE 期间存储但未解析
  • 实现最少,专为特定的 BI 工具兼容性而设计

无效示例

-- This MySQL-style prepared statement with parameters does NOT work in ClickHouse
PREPARE stmt FROM 'SELECT * FROM users WHERE id = ?';
EXECUTE stmt USING @user_id;  -- Parameter binding not supported
提示

改用 ClickHouse 的本机查询参数。 它们提供完整的参数绑定支持、类型安全性和 SQL 注入预防,适用于所有 ClickHouse 接口

-- ClickHouse native query parameters (recommended)
SET param_user_id = 12345;
SELECT * FROM users WHERE id = {user_id: UInt64};

有关更多详细信息,请参阅 MySQL 接口文档关于 MySQL 支持的博客文章

摘要

ClickHouse 存储过程的替代方案

传统存储过程模式ClickHouse 替代方案
简单的计算和转换用户自定义函数 (UDF)
可重用的参数化查询参数化视图
预先计算的聚合物化视图
计划的批量处理可刷新物化视图
复杂的多步骤 ETL链式物化视图或外部编排(Python、Airflow、dbt)
具有控制流的业务逻辑应用程序代码

查询参数的使用

查询参数可用于

  • 防止 SQL 注入
  • 具有类型安全的参数化查询
  • 应用程序中的动态筛选
  • 可重用的查询模板
    © . This site is unofficial and not affiliated with ClickHouse, Inc.