PolarDB-X DML Execution Process: Source Code Analysis

Overview

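Before looking at how the CN handles DML, it helps to understand how an ordinary SQL statement is processed; see the PolarDB-X team's article on how a SQL statement is parsed.

At a high level, the CN consists of three parts: the protocol layer, the optimizer, and the executor. The "life of a SQL statement" begins at the protocol layer, which accepts the user's connection request, sets up the connection context, and converts the packets sent by the user into a SQL statement. The statement is then handed to the optimizer, which produces a physical execution plan. The physical plan contains operators that run locally and physical SQL that is sent down to the DN. The executor first sends the physical SQL to the DN, then gathers the results the DN returns and passes them to the locally executed operators. Finally, the result goes back to the protocol layer, which wraps it in packets according to the MySQL protocol and sends them to the user.

This article follows that SQL processing flow to analyze the DML handling logic in detail.

The following sections walk through parsing, validation, the optimizer, and execution, following the flow chart.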

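Analysis of Each Stage

Parsing and Validation

**SQL parsing is the first step.** After a client connects to a PolarDB-X CN and issues an Insert statement, PolarDB-X receives the statement as a string and begins executing it; see TConnection#executeSQL.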

public ResultSet executeSQL(ByteString sql, Parameters params, TStatement stmt,
                            ExecutionContext executionContext) throws SQLException {
    // Set the optimizer context
    OptimizerContext.setContext(this.dataSource.getConfigHolder().getOptimizerContext());
    ...
}

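As execution of the statement is prepared, ExecutionContext holds the statement's parameters, configuration, and other context. This object accompanies the statement through parsing, validation, the optimizer, and the executor, until it is sent down to polardbx-engine (DN). To execute the statement, PolarDB-X first needs an execution plan; see TConnection#executeQuery.

To avoid parsing similar SQL statements repeatedly, the CN also "templatizes" SQL, pre-validating it in a form similar to a statement with placeholders.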
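The templating idea can be illustrated with a minimal sketch. This is not PolarDB-X's implementation: the class SqlTemplateSketch and its parameterize method are hypothetical names, and the regular expression only recognizes simple numeric literals and single-quoted strings. It shows how two statements that differ only in their literals collapse to the same template, which could then serve as a cache key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not a PolarDB-X class.
final class SqlTemplateSketch {
    // A single-quoted string ('' is an escaped quote) or a standalone number.
    private static final Pattern LITERAL =
        Pattern.compile("'(?:[^']|'')*'|\\b\\d+(?:\\.\\d+)?\\b");

    final String template;      // SQL text with literals replaced by ?
    final List<String> params;  // extracted literals, in order of appearance

    private SqlTemplateSketch(String template, List<String> params) {
        this.template = template;
        this.params = params;
    }

    static SqlTemplateSketch parameterize(String sql) {
        Matcher matcher = LITERAL.matcher(sql);
        StringBuffer out = new StringBuffer();
        List<String> params = new ArrayList<>();
        while (matcher.find()) {
            params.add(matcher.group());
            matcher.appendReplacement(out, "?");
        }
        matcher.appendTail(out);
        return new SqlTemplateSketch(out.toString(), params);
    }

    public static void main(String[] args) {
        SqlTemplateSketch t =
            parameterize("INSERT INTO t1 (id, name) VALUES (1, 'a')");
        System.out.println(t.template + " " + t.params);
    }
}
```

A text-level regex is used here only to keep the sketch short; a real system would more likely derive the template from the parsed statement.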

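The string is first parsed by FastsqlParser into an abstract syntax tree. This step checks for syntax errors and produces a SqlNode. Because this statement is an Insert, it is parsed into a SqlInsert; the execution plan is then derived from the abstract syntax tree.

This completes the conversion from the SQL string to a SqlNode, that is, the parsing stage.

Building on the parse result, the SqlNode is then validated.

Broadly, validation checks two things:

(1) first, whether the insert into xxx part of the statement is correct;

(2) then, whether the SqlInsert.source part is valid.

The source of this statement is Values, so the Values are checked for validity. For an Insert ... Select statement, source is a SqlSelect, and the Select statement is checked instead. If no error is raised, the statement is semantically correct and validation passes.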
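The two-part check can be sketched roughly as follows. Everything here is an assumption made for illustration: InsertStmt is a simplified stand-in for SqlInsert, the catalog is a plain map from table name to column names, and the specific checks (known table, known columns, one value per column) are examples of what such validation might cover rather than the exact rules PolarDB-X applies.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not a PolarDB-X class.
final class InsertValidationSketch {
    /** Simplified stand-in for a parsed INSERT ... VALUES statement. */
    static final class InsertStmt {
        final String table;
        final List<String> columns;
        final List<List<Object>> rows;

        InsertStmt(String table, List<String> columns, List<List<Object>> rows) {
            this.table = table;
            this.columns = columns;
            this.rows = rows;
        }
    }

    /** Throws IllegalArgumentException on the first semantic error found. */
    static void validate(InsertStmt stmt, Map<String, List<String>> catalog) {
        // Part 1: the "insert into xxx" target must name a known table and known columns.
        List<String> tableColumns = catalog.get(stmt.table);
        if (tableColumns == null) {
            throw new IllegalArgumentException("unknown table: " + stmt.table);
        }
        for (String column : stmt.columns) {
            if (!tableColumns.contains(column)) {
                throw new IllegalArgumentException("unknown column: " + column);
            }
        }
        // Part 2: the source (VALUES here) must supply one value per listed column.
        for (List<Object> row : stmt.rows) {
            if (row.size() != stmt.columns.size()) {
                throw new IllegalArgumentException("column count does not match value count");
            }
        }
    }
}
```

For an Insert ... Select statement, the second part would validate the Select instead of the value rows.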

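Optimizer

Before the optimizer runs, the SqlNode (SqlInsert) must be converted into a RelNode, which roughly means turning the SQL syntax tree into a relational expression. The entry point is Planner#getPlan: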

RelNode relNode = converter.toRel(validatedNode, plannerContext);

The conversion itself happens in SqlConverter#toRel:

...
final SqlToRelConverter sqlToRelConverter = new TddlSqlToRelConverter(...);
RelRoot root = sqlToRelConverter.convertQuery(validatedNode, false, true);
...

TddlSqlToRelConverter is PolarDB-X's converter and extends Calcite's SqlToRelConverter. A SqlInsert is converted in TddlSqlToRelConverter#convertInsert(SqlInsert call):

RelNode relNode = super.convertInsert(call);
if (relNode instanceof TableModify) {
    ...
}

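As shown, this calls SqlToRelConverter#convertInsert, which converts the SqlInsert into a LogicalTableModify. Two of that class's fields are worth noting: operation, the operation type, and input, the input source, which is Values for this statement.

PolarDB-X also defines its own RelNode types, so this RelNode is converted once more into a PolarDB-X-defined RelNode. The entry point is again Planner#getPlan: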

ToDrdsRelVisitor toDrdsRelVisitor = new ToDrdsRelVisitor(validatedNode, plannerContext);
RelNode drdsRelNode = relNode.accept(toDrdsRelVisitor);

This conversion into PolarDB-X's own RelNode is not covered in detail here.

Next comes the optimizer stage itself. The code is in Planner#sqlRewriteAndPlanEnumerate:

private RelNode sqlRewriteAndPlanEnumerate(RelNode input, PlannerContext plannerContext) {
    CalcitePlanOptimizerTrace.getOptimizerTracer().get().addSnapshot("Start", input, plannerContext);
    // RBO (rule-based optimization)
    RelNode logicalOutput = optimizeBySqlWriter(input, plannerContext);
    CalcitePlanOptimizerTrace.getOptimizerTracer().get()
        .addSnapshot("PlanEnumerate", logicalOutput, plannerContext);

    // CBO (cost-based optimization)
    RelNode bestPlan = optimizeByPlanEnumerator(logicalOutput, plannerContext);

    // finally we should clear the planner to release memory
    bestPlan.getCluster().getPlanner().clear();
    bestPlan.getCluster().invalidateMetadataQuery();
    return bestPlan;
}

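For Insert, optimization happens mainly in the RBO phase, which defines a set of rules; the CBO phase changes almost nothing for Insert. The RBO rule to focus on is OptimizeLogicalInsertRule. It uses information from GMS (PolarDB-X's metadata management) to decide the execution plan for the statement, and may convert the LogicalInsert into a different RelNode so that different ways of executing SQL are easy to tell apart. It first determines the execution strategy for the statement, of which there are three: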

public enum ExecutionStrategy { 
    /**
     * Foreach row, exists only one target partition.
     * Pushdown origin statement, with function call not pushable (like sequence call) replaced by RexCallParam.
     * Typical for single table and partitioned table without gsi.
     */
    PUSHDOWN,
    /**
     * Foreach row, might exists more than one target partition.
     * Pushdown origin statement, with nondeterministic function call replaced by RexCallParam.
     * Typical for broadcast table.
     */
    DETERMINISTIC_PUSHDOWN,
    /**
     * Foreach row, might exists more than one target partition, and data in different target partitions might be different.
     * Select then execute, with all function call replaced by RexCallParam.
     * Typical for table with gsi or table are doing scale out.
     */
    LOGICAL;
};

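For an ordinary, simple insert statement the strategy is PUSHDOWN and processing is straightforward: an InsertWriter is created, which is responsible for generating the SQL statement sent to the DN, and it is stored in the LogicalInsert.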
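The "Typical for" notes in the enum comments suggest a simple decision order, sketched below. This is a simplification under stated assumptions: TableInfo and its three boolean flags are hypothetical, and the real OptimizeLogicalInsertRule may take other factors into account when choosing a strategy.

```java
// Hypothetical illustration only; not a PolarDB-X class.
final class StrategySketch {
    enum Strategy { PUSHDOWN, DETERMINISTIC_PUSHDOWN, LOGICAL }

    /** Assumed, simplified table metadata; the real metadata comes from GMS. */
    static final class TableInfo {
        final boolean broadcast;
        final boolean hasGsi;
        final boolean scalingOut;

        TableInfo(boolean broadcast, boolean hasGsi, boolean scalingOut) {
            this.broadcast = broadcast;
            this.hasGsi = hasGsi;
            this.scalingOut = scalingOut;
        }
    }

    static Strategy choose(TableInfo table) {
        // Tables with a GSI or in the middle of a scale-out: rows may differ per target.
        if (table.hasGsi || table.scalingOut) {
            return Strategy.LOGICAL;
        }
        // Broadcast tables: every copy must receive identical values.
        if (table.broadcast) {
            return Strategy.DETERMINISTIC_PUSHDOWN;
        }
        // Single tables and partitioned tables without a GSI.
        return Strategy.PUSHDOWN;
    }
}
```

The most restrictive case is checked first, so a table that is both broadcast and scaling out falls back to LOGICAL in this sketch.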

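After the optimizer, the RelNode is still a LogicalInsert, which marks the end of optimization. The final execution plan is produced in PlanCache#getFromCache.

This execution plan also involves computing the target shard from the sharding key (similar to dbproxy).

The Insert statement contains the sharding key and an auto_increment column, so the sharding key alone determines which shard on the DN the statement goes to, and no further work is needed on the CN. The plan is therefore simplified: BuildFinalPlanVisitor#buildSingleTableInsert converts it into a SingleTableOperation, which also keeps the database and table sharding rules.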
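To make the routing step concrete, here is a minimal sketch that maps a sharding key to a shard by hash and modulo. This scheme is an assumption chosen for illustration; PolarDB-X's actual partition functions and physical table naming are not described in this article, and ShardRouterSketch, route, and physicalTable are hypothetical names.

```java
import java.util.Locale;

// Hypothetical illustration only; not a PolarDB-X class.
final class ShardRouterSketch {
    /** Returns a shard index in [0, shardCount) for the given sharding key. */
    static int route(Object shardKey, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(shardKey.hashCode(), shardCount);
    }

    /** Builds an assumed physical table name such as t1_02. */
    static String physicalTable(String logicalTable, Object shardKey, int shardCount) {
        return String.format(Locale.ROOT, "%s_%02d", logicalTable, route(shardKey, shardCount));
    }

    public static void main(String[] args) {
        System.out.println(physicalTable("t1", 10, 4));
    }
}
```

Because the target is a pure function of the sharding key, the CN can pick a single shard without reading any data first, which is what allows the plan to be simplified.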

Executor

Once the execution plan has been generated, the executor runs it. The entry point is in TConnection#executeQuery:

ResultCursor resultCursor = executor.execute(plan, executionContext);

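ExecutorHelper#execute then executes ExecutionPlan.plan, which is the SingleTableOperation from above. The available execution modes are CURSOR, TP_LOCAL, AP_LOCAL, and MPP; Insert statements almost always use CURSOR. The plan is then processed by the Handler that matches its type (see the CommandHandlerFactoryMyImp class): for example, SingleTableOperation is handled by MySingleTableModifyHandler and LogicalInsert by LogicalInsertHandler. The Handler performs the execution and normally returns a Cursor, which wraps the actual execution; calling Cursor.next retrieves the result. For an Insert statement, the result is the number of affected rows. This statement creates a MyPhyTableModifyCursor; the entry point is MySingleTableModifyHandler#handleInner: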

...
MyPhyTableModifyCursor modifyCursor = (MyPhyTableModifyCursor) repo.getCursorFactory().repoCursor(executionContext, logicalPlan);
...
affectRows = modifyCursor.batchUpdate();
...

A MyPhyTableModifyCursor is created from the ExecutionContext and the SingleTableOperation and then executed directly:

public int[] batchUpdate() {
    try {
        return handler.executeUpdate(this.plan);
    } catch (SQLException e) {
        throw GeneralUtil.nestedException(e);
    }
}

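Here this.plan is the SingleTableOperation, and handler is MyJdbcHandler, which handles the interaction between PolarDB-X's CN and DN. It can be regarded as the handler that executes physical plans: it generates the actual physical SQL from the plan and sends it to the DN for execution.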
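The handler lookup and cursor pattern described in this section can be sketched as a small registry keyed by plan type. All types below (Plan, SingleTablePlan, LogicalInsertPlan, Handler, Cursor) are hypothetical stand-ins, not the PolarDB-X classes of similar names, and the real CommandHandlerFactoryMyImp may select handlers differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not a PolarDB-X class.
final class HandlerDispatchSketch {
    interface Plan {}
    static final class SingleTablePlan implements Plan {}
    static final class LogicalInsertPlan implements Plan {}

    /** Wraps the actual execution; returns affected rows per statement. */
    interface Cursor {
        int[] batchUpdate();
    }

    /** Turns a plan into a cursor that performs the work when invoked. */
    interface Handler {
        Cursor handle(Plan plan);
    }

    private final Map<Class<? extends Plan>, Handler> handlers = new HashMap<>();

    void register(Class<? extends Plan> planType, Handler handler) {
        handlers.put(planType, handler);
    }

    int[] execute(Plan plan) {
        // Pick the handler registered for the plan's concrete type.
        Handler handler = handlers.get(plan.getClass());
        if (handler == null) {
            throw new IllegalStateException("no handler for " + plan.getClass().getSimpleName());
        }
        // The handler only builds the cursor; the cursor does the execution.
        return handler.handle(plan).batchUpdate();
    }

    public static void main(String[] args) {
        HandlerDispatchSketch dispatcher = new HandlerDispatchSketch();
        // A stub handler that pretends one row was affected.
        dispatcher.register(SingleTablePlan.class, plan -> () -> new int[] {1});
        System.out.println(dispatcher.execute(new SingleTablePlan())[0]);
    }
}
```

Separating the handler from the cursor keeps plan-type dispatch apart from the work of talking to the storage node.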

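Physical Execution

All interaction between the CN and the DN in PolarDB-X goes through MyJdbcHandler. Taking SingleTableOperation as an example, the interaction looks like this: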

public int[] executeUpdate(BaseQueryOperation phyTableModify) throws SQLException {
    ...
    // Get the database name and parameters of the physical plan
    Pair<String, Map<Integer, ParameterContext>> dbIndexAndParam =
        phyTableModify.getDbIndexAndParam(executionContext.getParams() == null ? null : executionContext.getParams()
            .getCurrentParameter(), executionContext);
    ...
    // Get a connection for that database
    connection = getPhyConnection(transaction, rw, groupName);
    ...
    // Build the SQL string from the parameters
    String sql = buildSql(sqlAndParam.sql, executionContext);
    ...
    // Create a PreparedStatement on the connection
    ps = prepareStatement(sql, connection, executionContext, isInsert, false);
    ...
    // Bind the parameters
    ParameterMethod.setParameters(ps, sqlAndParam.param);
    ...
    // Execute
    affectRow = ((PreparedStatement) ps).executeUpdate();
    ...
}

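The physical plan is sent to the DN for execution. When it finishes, affectRow is returned to the executor, and the result is ultimately returned to the user. This completes the execution of one DML statement.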

