OpenFlow flow tables: practice and analysis

2023-03-29 09:42:21

# Overview of the flow table design for the multi-network access scenario

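In the multi-network access scenario, operations that configure network resources on the cloud platform controller (creating a subnet, creating a port, attaching a port, and so on) are picked up by the cloud platform controller on the DPU, which then programs the corresponding flow tables.

The flow table design resembles the Linux kernel network stack. It provides L2 and L3 forwarding and access to local services, and it has hooks similar to the netfilter framework that implement port/MAC binding, rate limiting in both directions, security groups (including stateful ones), ACLs, and so on. **IPv6 and other protocols are also supported.**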

 

## Flow table pipeline structure

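The pipeline of this flow table design can be roughly divided into the following phases:

phase1: classify by source port type (vxlan port or local port)

phase2: egress rate limiting, both BPS and PPS

phase3: port binding check

phase4: egress security group (stateful supported)

phase5: access to other local services, such as dhcp, arp spoofing for havip, metadata svc

phase6: L2 forwarding, either to a local port or out through a tunnel

phase7: L3 forwarding: traffic addressed to the gateway is routed and the packet is rewritten the way a router would

phase8: L2 forwarding again after the route lookup, now with the new vni and subnet

phase9: ingress security group

phase10: ingress rate limiting, BPS and PPS

phase11: output on the final egress interface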

----

## Key fields

##### Registers

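1.  reg3: ct_label
2.  reg5: tun_id, the tunnel id of the large L2 domain
3.  reg6: ofport number of the input port
4.  reg7: ofport number of the output port
5.  reg8: unknown
6.  reg9: vm id
7.  reg11: ct_mark, which can be used to implement a stateful firewall
8.  metadata: vpc + subnet

**No other registers have shown up in the flows examined so far.**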
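
The sample flows later in this article pair metadata values such as 0xd54dd00000001 with reg5=0xd54dd, which suggests that the vni sits above a 32-bit subnet id. A minimal Python sketch of that packing follows. The 32-bit split is inferred from the samples rather than stated by the platform, and the function names are made up for illustration.

```python
SUBNET_BITS = 32  # assumption: the low 32 bits of metadata carry the subnet id


def split_metadata(metadata):
    """Return (vni, subnet_id) for a metadata value."""
    vni = metadata >> SUBNET_BITS
    subnet_id = metadata & ((1 << SUBNET_BITS) - 1)
    return vni, subnet_id


def make_metadata(vni, subnet_id):
    """Pack a vni and a subnet id back into one metadata value."""
    return (vni << SUBNET_BITS) | subnet_id


vni, subnet_id = split_metadata(0xD54DD00000001)
print(hex(vni), subnet_id)  # 0xd54dd 1
```

Under this reading, the cross-subnet example only changes the low half of metadata (subnet 1 to subnet 2) while reg5 keeps the same vni.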

 

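> In practice, actions mostly refer to registers and fields by their NXM_NX_REG and OXM_OF_ prefixed names. NXM is the Nicira extended match namespace and OXM is the OpenFlow standard one; for example, reg5 is the same field as NXM_NX_REG5, and in_port is the same field as OXM_OF_IN_PORT.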

 

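##### OVS conntrack states

> Only the most commonly used ct states and how OVS implements them are covered here. OVS matches connection state through the ct_state match field and performs the related operations through the ct action. Each state in ct_state is a flag bit that is either set or clear.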

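1.  +trk: set whenever the packet has been through the ct module
2.  +new: after entering ct, no existing connection is found, so the packet starts a new one; set together with +trk
3.  +est: set once packets have been seen in both directions of the same connection; mutually exclusive with +new
4.  +rel: related to another existing connection, for example an icmp unreachable, or the data connection that belongs to an ftp or iperf control connection
5.  +rpl: the packet travels in the reply direction
6.  +inv: invalid ct state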
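
A ct_state match such as +new-est-rel-inv+trk is in effect a value/mask pair over these flag bits. The sketch below parses such a string and tests a packet state against it. The bit positions are the ones listed in ovs-fields(7); the helper names are hypothetical.

```python
import re

# Bit positions of the ct_state flags as documented in ovs-fields(7).
CT_BITS = {"new": 0x01, "est": 0x02, "rel": 0x04,
           "rpl": 0x08, "inv": 0x10, "trk": 0x20}


def parse_ct_state(spec):
    """Turn '+new-est+trk' into (value, mask): '+' wants the bit set, '-' wants it clear."""
    value = mask = 0
    for sign, name in re.findall(r"([+-])([a-z]+)", spec):
        bit = CT_BITS[name]
        mask |= bit
        if sign == "+":
            value |= bit
    return value, mask


def ct_state_matches(state, spec):
    """True if the packet's ct_state bits satisfy the match spec."""
    value, mask = parse_ct_state(spec)
    return (state & mask) == value


first_packet = CT_BITS["new"] | CT_BITS["trk"]
print(ct_state_matches(first_packet, "+new-est-rel-inv+trk"))  # True
print(ct_state_matches(first_packet, "-new+est-rel-inv+trk"))  # False
```

Flags that a spec does not mention, such as rpl in the two specs above, are left out of the mask and therefore ignored.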

 

##### ovs conntrack action

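1.  table: the table in which the packet continues once the ct action has completed
2.  commit: once a +trk packet has matched on ct_state, commits the connection, i.e. moves it from the unconfirmed list to the confirmed table
3.  zone: the ct context; connections in different zones are completely isolated from each other
4.  nat: performs forward snat or dnat, and can also apply the reverse translation
5.  exec([action][,action…]): modifies the ct entry, for example by setting ct_mark
6.  force: forces the commit and recreates the connection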

----

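## Flow table implementation analysis

###### VPC implementation

> A vxlan overlay provides the large L2 domain. Instead of flooding arp, requests are answered locally by an arp responder. Subnet and vpc isolation is implemented in the flow tables through metadata and reg5.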

 

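###### L2 access

> Within the large L2 domain, arp requests for hosts on other nodes (including requests for the gateway mac) are answered locally. A local lookup then decides whether the packet leaves through a tunnel or goes to a representor port on this node. If the destination mac is on this node, the packet is sent to the corresponding representor port.  
> If the destination mac is not local, the packet is encapsulated with the matching vtep and tunid and sent out of the vxlan port.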

 

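###### L3 access

> For L3 access, the L2 lookup finds that the destination mac is the mac of the subnet gateway. The pipeline then checks whether the packet must be dropped because of its ttl, and then checks for acl rules. If none apply, it looks up the destination network while matching the source vni, rewrites the tunid to the new vni, sets the source mac to the gateway mac of the new subnet and the destination mac to the mac of the destination ip, and jumps back to the L2 table for the lookup. The return path works the same way.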

 

###### Rate limiting

> BPS and PPS limits in both directions are implemented with meter instances.

 

###### Security groups (stateful supported)

> Stateless security groups are implemented per port; stateful ones are also supported by using ct state.

 

###### ACL

> Matches on source, destination, protocol number, and so on, then permits or denies the packet.

 

###### NAT

> Access to some services on the local host uses nat. In OVS, nat is implemented through parameters of the ct action.

 

----

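## Main access scenarios

-  Same vpc, same subnet, same host
-  Same vpc, same subnet, across hosts
-  Same vpc, across subnets, same host
-  Same vpc, across subnets, across hosts
-  L3 access across vpcs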

 

# Access scenario examples

### Same subnet, across nodes

##### Topology

###### **Node1 (10.23.10.6) accesses Node2 (10.23.10.4)**

###### Node1:

    
    IP: xx.xx.10.6
    MAC: fa:16:3e:17:4b:9d
    Representor port: port-xxxxxq2py2
    vtep: 10.24.40.67

 

###### Node2:

    IP: xx.xx.10.4
    MAC: fa:16:3e:46:09:25
    Representor port: port-yyyyyv66x7
    vtep: xx.xx.40.70

 

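###### Flow tables on **Node1 (sender)**

###### ARP handling

**table=xx** 

All arp requests are answered by a local responder (**arp is not analysed again below**). It works by rewriting sha, spa, tha, tpa, arp_op, and so on.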

    cookie=0x170a30c1320ce4af, table=xx, priority=100,arp,metadata=0x47d100000000,arp_tpa=xx.xx.100.7,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],set_field:fa:16:3e:0c:02:73->eth_src,set_field:2->arp_op,set_field:xx.xx.100.7->arp_spa,set_field:fa:16:3e:0c:02:73->arp_sha,IN_PORT
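
The field rewrites in this flow can be read as a small function that turns an arp request into its reply. The sketch below mirrors the move and set_field actions one by one on a plain dict. The function name is hypothetical, and the addresses in the usage lines are documentation placeholders, because the article masks the real ones.

```python
def proxy_arp_reply(request, answer_ip, answer_mac):
    """Build the arp reply the responder flow would send back out of IN_PORT."""
    # Match part of the flow: arp_op=1 (request) for the proxied address.
    if request["arp_op"] != 1 or request["arp_tpa"] != answer_ip:
        return None
    reply = dict(request)
    reply["eth_dst"] = request["eth_src"]   # move ETH_SRC -> ETH_DST
    reply["arp_tha"] = request["arp_sha"]   # move ARP_SHA -> ARP_THA
    reply["arp_tpa"] = request["arp_spa"]   # move ARP_SPA -> ARP_TPA
    reply["eth_src"] = answer_mac           # set_field ... -> eth_src
    reply["arp_op"] = 2                     # set_field 2 -> arp_op (reply)
    reply["arp_spa"] = answer_ip            # set_field ... -> arp_spa
    reply["arp_sha"] = answer_mac           # set_field ... -> arp_sha
    return reply


request = {
    "eth_src": "fa:16:3e:17:4b:9d", "eth_dst": "ff:ff:ff:ff:ff:ff",
    "arp_op": 1,
    "arp_sha": "fa:16:3e:17:4b:9d", "arp_spa": "192.0.2.6",   # placeholder ip
    "arp_tha": "00:00:00:00:00:00", "arp_tpa": "192.0.2.7",   # placeholder ip
}
reply = proxy_arp_reply(request, "192.0.2.7", "fa:16:3e:0c:02:73")
print(reply["arp_op"], reply["eth_dst"], reply["arp_sha"])
```

The order matters in the same way as in the flow: the original sender fields are copied into the target fields before the sender fields are overwritten.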

 

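###### IP handling

**table=0** Traffic entry point. Packets are classified by input port and the related registers are set: reg5 is the vni, reg6 is the input port number, reg9 is the vm id, and metadata holds the tunid and subnet id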

    cookie=0x170a32cec8c3e4c1, priority=100,in_port="port-xxxxxq2py2" actions=set_field:0xd54dd->reg5,set_field:0x4->reg6,set_field:0x64->reg9,write_metadata:0xd54dd00000001,goto_table:1

 

**table=1** All ip packets jump to rate limiting

    cookie=0x170a380bc44c5c23, table=1, priority=50 actions=goto_table:5

 

**table=5**  Egress BPS rate limiting; no limit is configured, so nothing applies

    cookie=0x170a380bc44c5b6b, table=5, priority=100 actions=goto_table:6

 

**table=6** Egress PPS rate limiting; no limit is configured

    cookie=0x170a380bc44c5b7f, table=6, priority=100 actions=goto_table:10

 

**table=10** Bind port and mac; reg6 is the ofport number of the port, and dl_src is the source mac on the host side

    cookie=0x170a32cec8c3e501, table=10, priority=1000,ip,reg6=0x4,dl_src=fa:16:3e:17:4b:9d actions=goto_table:20

 

**table=20** Egress pre-CT; icmp packets are sent into ct, with the zone taken from the ofport number of the source port

    cookie=0x170a380bc44c5c9f, table=20, priority=58000,icmp actions=ct(table=25,zone=NXM_NX_REG6[0..15])

 

**table=25**  Egress ct state match. The zone is selected by port number and the state matched is +new+trk; when zone and state are right, the connection is committed

    cookie=0x170a32cec8c3e4ed, table=25, priority=39799,ct_state=+new-est-rel-inv+trk,ip,reg6=0x4 actions=ct(commit,table=30,zone=NXM_NX_REG6[0..15])

 

**table=30** Local service access check; see the full table 30 for the services. Not relevant for this icmp example

    cookie=0x170a3983468be7c9, table=30, priority=50 actions=goto_table:60

 

**table=60**  Selects the tunnel encapsulation by reg5 (vni) and destination mac, and sets the output port to vxlan1; the port mapping can be checked with ovs-ofctl show br-int

    cookie=0x170a3983468be931, table=60, priority=100,reg5=0xd54dd,dl_dst=fa:16:3e:46:09:25 actions=set_field:0xd54dd->tun_id,set_field:xx.xx.40.70->tun_dst,set_field:0x2->reg7,goto_table:80
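
Table 60 acts like a mac table keyed by (vni, destination mac). The sketch below models that lookup in Python. The remote entry is taken from the flow above; the local entry is an assumption modelled on the table=60 flow shown for Node2 further down, which sets reg7 to a local ofport and continues in table 70. The names are hypothetical.

```python
# (vni, destination mac) -> forwarding decision
L2_TABLE = {
    # Remote VM behind Node2: values from the table=60 flow on Node1.
    (0xD54DD, "fa:16:3e:46:09:25"): {"tun_dst": "xx.xx.40.70", "reg7": 0x2},
    # Local VM on this node: assumed entry, no tunnel destination.
    (0xD54DD, "fa:16:3e:17:4b:9d"): {"tun_dst": None, "reg7": 0x4},
}


def l2_lookup(vni, dl_dst):
    """Return the register updates and next table, or None on a miss."""
    entry = L2_TABLE.get((vni, dl_dst))
    if entry is None:
        return None
    # Tunnelled traffic continues in table 80; local delivery goes through
    # the ingress security-group tables first (table 70).
    next_table = 80 if entry["tun_dst"] else 70
    return {"tun_id": vni, "tun_dst": entry["tun_dst"],
            "reg7": entry["reg7"], "next_table": next_table}


print(l2_lookup(0xD54DD, "fa:16:3e:46:09:25"))
```

Keying on the vni as well as the mac is what keeps identical mac addresses in different vpcs apart.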

 

**table=80** Stores the vm id according to the output port

    cookie=0x170a3983468beaf9, table=80, priority=1000,reg7=0x4 actions=set_field:0x64->reg9,goto_table:81

 

**table=81** svc probe; not relevant, skipped

    cookie=0x170a3983468be853, table=81, priority=100 actions=goto_table:85

 

**table=85** Ingress BPS; not relevant

    cookie=0x170a3983468be84b, table=85, priority=100 actions=goto_table:86

 

**table=86** Ingress PPS; not relevant

    cookie=0x170a3983468be8db, table=86, priority=100 actions=goto_table:90

 

**table=90** Sends the packet out of the output port; in this case, the vxlan port

    cookie=0x170a3983468be837, table=90, priority=1000 actions=output:NXM_NX_REG7[]
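
To see the whole sender-side path at a glance, the table chain can be recovered from a flow dump by following goto_table, resubmit and ct(table=...) targets. The sketch below does that for the Node1 flows listed above (cookies and some actions trimmed). It is a rough helper under strong assumptions: it keeps one flow per table and ignores match fields and priorities entirely.

```python
import re

# Node1 (sender) flows from the walkthrough, shortened for readability.
FLOWS = [
    'priority=100,in_port="port-xxxxxq2py2" actions=set_field:0x4->reg6,goto_table:1',
    "table=1, priority=50 actions=goto_table:5",
    "table=5, priority=100 actions=goto_table:6",
    "table=6, priority=100 actions=goto_table:10",
    "table=10, priority=1000,ip,reg6=0x4 actions=goto_table:20",
    "table=20, priority=58000,icmp actions=ct(table=25,zone=NXM_NX_REG6[0..15])",
    "table=25, priority=39799,ct_state=+new-est-rel-inv+trk,ip,reg6=0x4 "
    "actions=ct(commit,table=30,zone=NXM_NX_REG6[0..15])",
    "table=30, priority=50 actions=goto_table:60",
    "table=60, priority=100,reg5=0xd54dd actions=set_field:0x2->reg7,goto_table:80",
    "table=80, priority=1000,reg7=0x4 actions=set_field:0x64->reg9,goto_table:81",
    "table=81, priority=100 actions=goto_table:85",
    "table=85, priority=100 actions=goto_table:86",
    "table=86, priority=100 actions=goto_table:90",
    "table=90, priority=1000 actions=output:NXM_NX_REG7[]",
]

TABLE = re.compile(r"\btable=(\d+)")
NEXT_TABLE = re.compile(r"goto_table:(\d+)|resubmit\(,(\d+)\)|ct\([^)]*table=(\d+)")


def parse_flow(line):
    """Return (table, next_table); table defaults to 0, next_table may be None."""
    match, _, actions = line.partition(" actions=")
    found = TABLE.search(match)
    table = int(found.group(1)) if found else 0
    jump = NEXT_TABLE.search(actions)
    nxt = int(next(g for g in jump.groups() if g)) if jump else None
    return table, nxt


def trace(flows, start=0):
    """Follow the jumps from `start` until a flow has no further table."""
    next_by_table = dict(parse_flow(f) for f in flows)
    path, table = [], start
    while table is not None and table not in path:
        path.append(table)
        table = next_by_table.get(table)
    return path


print(" -> ".join(map(str, trace(FLOWS))))
# 0 -> 1 -> 5 -> 6 -> 10 -> 20 -> 25 -> 30 -> 60 -> 80 -> 81 -> 85 -> 86 -> 90
```

For real troubleshooting, a tool that evaluates matches and priorities is needed; this helper only makes the table order in the walkthrough easier to follow.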

 

 

###### Flow tables on **Node2 (receiver)**

**table=0** Cross-node traffic always arrives on the vxlan port

    cookie=0x170a277782792425, priority=1000,in_port=vxlan1 actions=goto_table:50

**table=50** Matches tunnel id and destination mac, then sets the registers for the receive-side pipeline: metadata holds the tunid and subnet id, reg5 is the vni

    cookie=0x170a2777827925f9, table=50, priority=50,tun_id=0xd54dd,dl_dst=fa:16:3e:46:09:25 actions=set_field:0xd54dd00000001->metadata,set_field:0xd54dd->reg5,resubmit(,30)

**table=30** Local service access check; not relevant

    cookie=0x170a2777827923e7, table=30, priority=50 actions=goto_table:60

**table=60** L2 forwarding lookup, matching tunid and mac

    cookie=0x170a277782792601, table=60, priority=100,reg5=0xd54dd,dl_dst=fa:16:3e:46:09:25 actions=set_field:0xd54dd->tun_id,set_field:0x6->reg7,goto_table:70

**table=70**

    cookie=0x170a277782792393, table=70, priority=58000,icmp actions=ct(table=75,zone=NXM_NX_REG7[0..15])

**table=75** ct state match: the first packet matches +trk+new, later packets match +trk+est

    # first packet, matches +trk+new
    cookie=0x170a2777827925eb, duration=33902.506s, table=75, n_packets=9144, n_bytes=895680, idle_age=4, priority=39819,ct_state=+new-est-rel-inv+trk,ip,reg7=0x6 actions=ct(commit,table=80,zone=NXM_NX_REG7[0..15])
    
    # later packets, match +trk+est
    cookie=0x170a2777827923ed, table=75, priority=60000,ct_state=-new+est-rel-inv+trk actions=goto_table:80
    

**table=80** Matches the output port the packet leaves on, here the interface with ofport=6

    cookie=0x170a27778279262d, table=80, priority=1000,reg7=0x6 actions=set_field:0x64->reg9,goto_table:81

**table=81** probe svc; not relevant

    cookie=0x170a2777827924a3, table=81, priority=100 actions=goto_table:85

**table=85**

    cookie=0x170a277782792431, table=85, priority=100 actions=goto_table:86

**table=86**

    cookie=0x170a27778279241d, table=86, priority=100 actions=goto_table:90

**table=90**

    cookie=0x170a27778279239d, table=90, priority=1000 actions=output:NXM_NX_REG7[]

 

 

### Scenario: cross-subnet access

##### Topology

###### **Node1 (xx.xx.10.6) accesses Node2 (xx.xx.11.4)**

 

###### Node1:

    
    IP: xx.xx.10.6
    MAC: fa:16:3e:17:4b:9d
    Next hop: xx.xx.10.1 (fa:16:3e:ec:22:0d)
    Representor port: port-xxxxxq2py2 (representor of ens4)
    vtep: xx.xx.40.67

 

###### Node2:

    IP: xx.xx.11.4
    MAC: fa:16:3e:74:20:c6
    Next hop: xx.xx.11.1 (fa:16:3e:c4:ed:57)
    Representor port: port-2zbgfw4f26
    vtep: xx.xx.40.70

 

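###### Flow tables on **Node1 (sender)**

**table=0**   Traffic entry point. Packets are classified by input port and the related registers are set: reg5 is the vni, reg6 is the input port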

    cookie=0x170a48c6658cc733, priority=100,in_port="port-xxxxxq2py2" actions=set_field:0xd54dd->reg5,set_field:0x4->reg6,set_field:0x64->reg9,write_metadata:0xd54dd00000001,goto_table:1

**table=1**   All ip packets jump to rate limiting

    cookie=0x170a48c6658cc42d, table=1, priority=50 actions=goto_table:5

**table=5**  Egress BPS rate limiting; no limit is configured, so nothing applies

    cookie=0x170a48c6658cc435, table=5, priority=100 actions=goto_table:6

**table=6** Egress PPS rate limiting; no limit is configured

    cookie=0x170a48c6658cc4d5, table=6, priority=100 actions=goto_table:10

**table=10**  Bind port and mac; reg6 is the ofport number of the port, and dl_src is the source mac on the host side

    cookie=0x170a48c6658cc74d, duration=1362.918s, table=10, n_packets=13516, n_bytes=1469823, idle_age=1, priority=1000,ip,reg6=0x4,dl_src=fa:16:3e:17:4b:9d actions=goto_table:20

**table=20** 

    cookie=0x170a48c6658cc4b5, table=20, priority=58000,icmp actions=ct(table=25,zone=NXM_NX_REG6[0..15])

**table=25** 

    # first packet
    cookie=0x170a48c6658cc73b, table=25, priority=39799,ct_state=+new-est-rel-inv+trk,ip,reg6=0x4 actions=ct(commit,table=30,zone=NXM_NX_REG6[0..15])
    
    # later packets
    cookie=0x170a48c6658cc4fb, table=25, priority=60000,ct_state=-new+est-rel-inv+trk actions=goto_table:30
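
The interplay of zone, commit and the two flows above can be pictured with a small model: a lookup keyed by zone and connection reports +new until the connection has been committed. The Python sketch below is deliberately simplified. It treats every committed connection as established, whereas real conntrack also requires traffic in both directions before setting +est, and the class and its key layout are invented for illustration.

```python
class ZonedConntrack:
    """Toy conntrack table: entries are isolated per zone."""

    def __init__(self):
        self._committed = set()

    @staticmethod
    def _key(zone, src, dst, proto):
        # Direction-independent key so that replies hit the same entry.
        return (zone, proto, frozenset((src, dst)))

    def ct(self, zone, src, dst, proto, commit=False):
        """Return the ct_state flags for this packet; optionally commit."""
        key = self._key(zone, src, dst, proto)
        state = {"trk", "est"} if key in self._committed else {"trk", "new"}
        if commit:
            self._committed.add(key)
        return state


table = ZonedConntrack()
# First packet in zone 4 (reg6=0x4): table 20 tracks it, table 25 commits it.
print(sorted(table.ct(4, "vm-a", "vm-b", "icmp")))            # ['new', 'trk']
table.ct(4, "vm-a", "vm-b", "icmp", commit=True)
# Later packets hit the committed entry and take the +est flow.
print(sorted(table.ct(4, "vm-a", "vm-b", "icmp")))            # ['est', 'trk']
# The same addresses in another zone are a different connection.
print(sorted(table.ct(6, "vm-a", "vm-b", "icmp")))            # ['new', 'trk']
```

Because the zone comes from the port's ofport number, each port effectively gets its own connection table.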

**table=30** 

    cookie=0x170a48c6658cc4a5, table=30, priority=50 actions=goto_table:60

**table=60** 

    cookie=0x170a48c6658cc5fb, table=60, priority=100,metadata=0xd54dd00000001,dl_dst=fa:16:3e:ec:22:0d actions=goto_table:100

**table=100** L3 forwarding entry; the destination ip is not local, so the packet goes to pre-routing

    cookie=0x170a48c6658cc4e1, table=100, priority=50,ip actions=goto_table:110

**table=110**  Before routing: drop the packet if ttl is 0 or 1, otherwise continue

    cookie=0x170a4b2b174d5e1d, table=110, priority=100 actions=goto_table:120

**table=120** acl match; no rules, skipped

    cookie=0x170a4b2b174d5e1f, table=120, priority=50 actions=goto_table:130

**table=130**  The destination falls in xx.xx.0.0/16, so go on to the specific route lookup

    cookie=0x170a4b2b174d5f1d, duration=180.589s, table=130, n_packets=4860, n_bytes=733806, idle_age=1, priority=10016,ip,metadata=0xd54dd00000001,nw_dst=xx.xx.0.0/16 actions=goto_table:140

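**table=140**   Specific route lookup: reg5 selects the large L2 domain and the destination ip selects the route. The flow rewrites metadata to the destination subnet, sets the destination mac (previously the gateway mac) to the mac of the destination ip, sets the source mac to the gateway mac of the destination subnet, decrements ttl, and jumps to postrouting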

    cookie=0x170a4b2b174d5ed3, table=140, priority=100,ip,reg5=0xd54dd,nw_dst=xx.xx.11.4 actions=set_field:0xd54dd00000002->metadata,set_field:fa:16:3e:74:20:c6->eth_dst,set_field:fa:16:3e:c4:ed:57->eth_src,dec_ttl,goto_table:160
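
Taken together, tables 110 and 140 do what a router does on each hop: check ttl, look up the destination, rewrite the L2 header and decrement ttl. The sketch below replays that on a dict. The route entry copies the values from the flow above (the masked ip is kept as a literal string), while the function and field names are hypothetical.

```python
# (reg5 / vni, destination ip) -> header rewrites, from the table=140 flow.
ROUTES = {
    (0xD54DD, "xx.xx.11.4"): {
        "metadata": 0xD54DD00000002,       # destination subnet
        "eth_dst": "fa:16:3e:74:20:c6",    # mac of the destination ip
        "eth_src": "fa:16:3e:c4:ed:57",    # gateway mac of the destination subnet
    },
}


def route(pkt):
    """Return the rewritten packet, or None if it is dropped."""
    if pkt["ttl"] <= 1:                    # table 110: ttl 0 or 1 is dropped
        return None
    rewrite = ROUTES.get((pkt["reg5"], pkt["nw_dst"]))
    if rewrite is None:                    # no specific route
        return None
    out = dict(pkt, **rewrite)
    out["ttl"] = pkt["ttl"] - 1            # dec_ttl
    return out


pkt = {
    "reg5": 0xD54DD, "metadata": 0xD54DD00000001, "nw_dst": "xx.xx.11.4",
    "eth_src": "fa:16:3e:17:4b:9d", "eth_dst": "fa:16:3e:ec:22:0d", "ttl": 64,
}
routed = route(pkt)
print(routed["eth_dst"], routed["eth_src"], routed["ttl"])
# fa:16:3e:74:20:c6 fa:16:3e:c4:ed:57 63
```

After this rewrite the packet looks like ordinary same-subnet traffic in the destination subnet, which is why the pipeline can hand it back to the L2 lookup.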

**table=160**  postrouting has no action here, skipped

    cookie=0x170a4b2b174d5d6b, table=160, priority=50 actions=resubmit(,170)

**table=170**  After routing, L2 forwarding runs again, i.e. the output port is looked up by destination mac

    cookie=0x170a4b2b174d5db3, table=170, priority=50 actions=resubmit(,30)

**table=30**  Not a local service access, so go straight to the mac table

    cookie=0x170a4c013de71d19, table=30, priority=50 actions=goto_table:60

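**table=60**  Tunnel encapsulation by the vni of the large L2 domain and the destination mac (the real mac of the destination ip); note the value assigned to reg7 here, which is the output port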

    cookie=0x170a4c013de71e81, table=60, priority=100,reg5=0xd54dd,dl_dst=fa:16:3e:74:20:c6 actions=set_field:0xd54dd->tun_id,set_field:xx.xx.40.70->tun_dst,set_field:0x2->reg7,goto_table:80

**table=80** 

    cookie=0x170a4c013de71d77, table=80, priority=1000,reg7=0x2 actions=output:vxlan1

 

 

###### Flow tables on **Node2 (receiver)**

**table=0**

    cookie=0x170d7535048f9acb, priority=1000,in_port=vxlan1 actions=goto_table:50

**table=50** l3 lookup

    cookie=0x170d7535048f9cc3, table=50, priority=50,tun_id=0xd54dd,dl_dst=fa:16:3e:c4:ed:57 actions=set_field:0xd54dd00000002->metadata,set_field:0xd54dd->reg5,goto_table:140

**table=140**   Destination gateway lookup

    cookie=0x170d7535048f9cd9, table=140, priority=100,ip,reg5=0xd54dd,nw_dst=xx.xx.11.4 actions=set_field:0xd54dd00000002->metadata,set_field:fa:16:3e:74:20:c6->eth_dst,set_field:fa:16:3e:c4:ed:57->eth_src,dec_ttl,goto_table:160

**table=160** postrouting

    cookie=0x170d7535048f9acf, table=160, priority=50 actions=resubmit(,170)

**table=170** ingress acl 

    cookie=0x170d7535048f9b29, table=170, priority=50 actions=resubmit(,30)

**table=30**

    cookie=0x170d7535048f9afd, table=30, priority=50 actions=goto_table:60

**table=60**

     cookie=0x170d7535048f9e0d, table=60, priority=100,reg5=0xd54dd,dl_dst=fa:16:3e:74:20:c6 actions=set_field:0xd54dd->tun_id,set_field:0x7->reg7,goto_table:70

**table=70**

    cookie=0x170d7535048f9bd5, table=70, priority=58000,tcp actions=ct(table=75,zone=NXM_NX_REG7[0..15])

**table=75**

    cookie=0x170d7535048f9df5, table=75, priority=39819,ct_state=+new-est-rel-inv+trk,ip,reg7=0x6 actions=ct(commit,table=80,zone=NXM_NX_REG7[0..15])

**table=80**

    cookie=0x170d7535048f9dfb, table=80, priority=1000,reg7=0x6 actions=set_field:0x64->reg9,goto_table:81

**table=81**

    cookie=0x170d7535048f9b91, table=81, priority=100 actions=goto_table:85

**table=85**

    cookie=0x170d7535048f9be1, table=85, priority=100 actions=goto_table:86

**table=86**

    cookie=0x170d7535048f9b49, table=86, priority=100 actions=goto_table:90

**table=90**

    cookie=0x170d7535048f9bef, table=90, priority=1000 actions=output:NXM_NX_REG7[]

 

### Scenario: access to local services

**Request** Access to the metadata endpoint

    cookie=0x170b616612e8b709, table=10, priority=2000,tcp,nw_dst=169.254.169.254,tp_dst=8000 actions=goto_table:30
    cookie=0x170b616612e8b781, table=45, priority=100,tcp,nw_dst=169.254.169.254,tp_dst=8000 actions=set_field:fa:16:3e:25:fd:7e->eth_dst,set_field:8111->tcp_dst,goto_table:46
    cookie=0x170b616612e8b759, table=46, priority=2000,tcp,nw_dst=169.254.169.254,tp_dst=8111 actions=move:NXM_NX_REG6[]->NXM_OF_IP_SRC[],set_field:128.0.0.0/16->ip_src,output:1
    

**Return path**

    cookie=0x170b616612e8b859, priority=100,in_port=1 actions=goto_table:47
    cookie=0x170b616612e8bd73, table=47, priority=100,tcp,nw_dst=128.0.0.2,tp_src=8111 actions=set_field:169.254.169.254->ip_src,set_field:xx.xx.100.5->ip_dst,set_field:8000->tcp_src,output:"port-17icfhnrgo"

 

### Scenario: port-based stateless security groups

**table=70**

    cookie=0x170b616612e8bdc1, table=70, priority=39800,ip,reg7=0x4 actions=set_field:0x46->reg8,goto_table:200
    cookie=0x170b616612e8bdc5, table=70, priority=39800,ipv6,reg7=0x4 actions=set_field:0x46->reg8,goto_table:200

 

### Scenario: port-based stateful security groups

    cookie=0x170b616612e8becb, table=75, priority=39800,ct_state=+new-est-rel-inv+trk,ip,reg7=0xd actions=set_field:0x4b->reg8,goto_table:200
    cookie=0x170b616612e8bf11, table=75, priority=39800,ct_state=+new-est-rel-inv+trk,ipv6,reg7=0xd actions=set_field:0x4b->reg8,goto_table:200

 

### Scenario: NAT

    cookie=0x170b616612e8bf69, table=44, priority=2000,ct_state=+new-est-rel-inv+trk,tcp,reg6=0x9,tp_dst=20048 actions=encap(tcp_option(tlv(254,0x0a156b03000047d1))),ct(commit,table=80,nat(src=xx.xx.9.207,random))
    cookie=0x170b616612e8c0f1, table=44, priority=2000,ct_state=+new-est-rel-inv+trk,tcp6,reg6=0xa,tp_dst=20048 actions=encap(tcp_option(tlv(254,0x010000000007000000123d2100080041))),ct(commit,table=80,nat(src=240e:108:4:200:1:2:0:70f,random))

 

### Scenario: rate limiting

    table=85, priority=1000,reg9=0x64 actions=meter:101,goto_table:86
    table=86, priority=1000,reg9=0x64 actions=meter:102,goto_table:90
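
These two flows only reference meters 101 and 102; the meter definitions themselves are not shown. A meter with a drop band behaves much like a token bucket, which the following sketch models. The rate and burst numbers are invented for the example, the class is not OVS code, and time is passed in explicitly so the behaviour is easy to follow.

```python
class DropMeter:
    """Token bucket: packets that find the bucket empty are dropped."""

    def __init__(self, rate, burst):
        self.rate = rate        # tokens added per second (packets or bits)
        self.burst = burst      # bucket capacity
        self.tokens = burst     # start with a full bucket
        self.last = 0.0         # time of the previous update, in seconds

    def allow(self, now, cost=1):
        """Refill for the elapsed time, then try to pay `cost` tokens."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical PPS meter: 2 packets per second with a burst of 2.
pps = DropMeter(rate=2, burst=2)
print([pps.allow(0.0), pps.allow(0.0), pps.allow(0.0)])  # [True, True, False]
print(pps.allow(0.5))                                    # True
```

A BPS meter follows the same logic with the packet size in bits as the cost, which is why the pipeline chains one table per meter type (85 for BPS, 86 for PPS).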

 

### Scenario: ACL

    table=170,priority=55533,icmp,metadata=0x2076370000000a,nw_src=10.2.2.11,nw_dst=10.2.1.11 actions=resubmit(,30)  
    table=170, priority=24535,icmp6,metadata=0x1cc23500000000 actions=set_field:0xaa->reg8,goto_table:200
    

0条评论
0 / 1000
y****n
2文章数
1粉丝数
y****n
2 文章 | 1 粉丝
y****n
2文章数
1粉丝数
y****n
2 文章 | 1 粉丝
原创

openflow流表实践与分析

2023-03-29 09:42:21
173
0

# - 多网络访问场景流表设计概述

多网络访问场景控制器是在云平台控制器上配置网络资源的时候,比如创建子网,创建网口,attach网口等等的操作,会被dpu上的云平台控制器响应去配置对应的流表。

     多网络访问场景流表设计类似linux内核网络模块,具备l2,l3层转发,访问本机服务的能力,且具备类似netfilter框架钩子,实现端口&&mac绑定、出入方向限速,安全组(支持带状态),ACL等。**也支持ipv6等协议**

 

## - 流表pipeline结构

多网络访问场景流表设计的pipeline可以大致的分成:

phase1:根据不同源端口分别处理,vxlan类型和本地port类型

phase2:Egress 流控,包含BPS, PPS

phase3:port绑定检查

phase4:Egress 安全组(支持带状态)

phase5:访问本地其他服务,如dhcp, arp欺骗havip,metadata svc等

phase6: 二层转发 到本地或走隧道转发出去

phase7: 三层转发 访问网关,查路由,类似路由器三层转发对包修改

phase8: 二层转发 查完路由后,换了vni和子网

phase9: Ingress 安全组

phase10: Ingress 流控,BPS,PPS

phase11: Output 最终的转发出接口

----

## 关键fileds说明

##### 寄存器说明

1.  reg3: ct*label*
2.  reg5: tun*id  大二层隧道id*
3.  reg6: inport ofport num 
4.  reg7: outport ofport num 
5.  reg8: 未知
6.  reg9: vm id
7.  reg11: ct*mark  可用来实现状态防火墙等*
8.  metadata: vpc+subnet

**其他寄存器暂且没有太看到**

 

> 实际语法中,actions居多会用NXM*NX*REG和OXM*OF*前缀的寄存器字段和fields,他们只是不同厂商对openflow协议的扩展,比如reg5 等同于 NXM*NX*REG5,  in*port 等同于 OXM*OF*IN*PORT等等

 

##### ovs conntrack状态

> ct状态需要说明的是最常用到的几种状态和ovs的实现。ovs通过match域匹配ct*state来匹配状态,通过ct动作里完成 相关的操作。 ct*state是通过bit位来标志的某个状态是否置位

1.  + trk   但凡进入到ct模块,就置位
2.  + new  进入CT后,查不到已有连接,就新建,与+trk一起置位
3.  + est  同一个方向来去方向都有包后,置该位,与-new互斥
4.  + rel 跟其他已存在的会话有关联,比如icmp unreachable,或ftp,iperf的控制会话与数据面传输会话
5.  + rpl  回程包,反向的回程包等
6.  + inv 无效的ct状态

 

##### ovs conntrack action

1.  table  ct动作完成后,最后跳的目的table
2.  commit  对+trk的包匹配ct*state后,完成commit操作; 即将会话由unconfirm表放到到confirm*table
3.  zone  ct的上下文环境,会话在zone之间是完全隔离的
4.  nat 做正向snat, dnat等, 也可做反向nat
5.  exec([action][,action…])  执行对ct会话的一些修改,比如设置ct*mark*
6.  force  强制commit,重建会话

----

## 流表实现分析

###### vpc实现

> 采用vxlan overlay实现大二层,arp洪泛采用本地arp代答,通过metadata和reg5在流表中实现子网,vpc隔离等

 

###### L2层访问

> 大二层访问,对跨节点arp请求采用本地代答的方式(包括网关的mac请求),通过本地寻址的方式,判定出口是走隧道还是送往本节点代表口。目的mac是本节点,则送往对应的代表口。  
> 如果目的mac不是本地,则根据对应的vtep和tunid封装后从vxlan口发出去。

 

###### L3层访问

> 三层访问,在二层查询后发现目的mac是子网网关的mac地址,然后预检查ttl是否该丢弃, 然后看是否有acl,没有则通过查询目的网段,匹配源vni, 然后修改其相关的tunid为新vni,修改源mac成新vni子网的网关mac,目的mac为目的ip的mac,重新跳到L2表寻址。 返程类似

 

###### 流控

> 通过meters实例实现了出入方向BPS/PPS的限速。

 

###### 安全组(支持带状态)

> 根据端口粒度实现不带状态的安全组,也支持利用ct状态实现带状态的

 

###### ACL

> 根据源,目,协议号等方式匹配,然后选择放行或拒绝

 

###### NAT

> 访问本机的一些服务,用到了nat。 ovs的nat是基于ct动作不同参数实现

 

----

## 主要的访问场景

-  同vpc下同子网同宿主机
-  同vpc下同子网跨宿主机
-  同vpc下跨子网同宿主机
-  同vpc下跨子网跨宿主机
-  跨vpc三层访问 

 

# 访问场景实例

### 同subnet跨节点访问

##### 拓扑

###### **Node1(10.23.10.6) 访问 Node2(10.23.10.4)**

###### Node1:

    
    IP: xx.xx.10.6
    MAC: fa:16:3e:17:4b:9d
    代表口:port-xxxxxq2py2
    vtep: 10.24.40.67

 

###### Node2:

    IP: xx.xx.10.4
    MAC: fa:16:3e:46:09:25
    代表口:port-yyyyyv66x7
    vtep: xx.xx.40.70

 

###### 流表**Node1(发送)**

###### arp处理

**table=xx** 

arp均采用代答的方式(**后面不再分析arp**),实现原理:修改sha, spa, tha ,tpa,arp*op等实现*

    cookie=0x170a30c1320ce4af, table=xx, priority=100,arp,metadata=0x47d100000000,arp_tpa=xx.xx.100.7,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],set_field:fa:16:3e:0c:02:73->eth_src,set_field:2->arp_op,set_field:xx.xx.100.7->arp_spa,set_field:fa:16:3e:0c:02:73->arp_sha,IN_PORT

 

###### ip处理

**table=0** 流量入口,根据 入接口分流,设置相关寄存器;reg5是vni,reg6是入接口port number,reg9是vmid,metadata是tunid*subnetid*

    cookie=0x170a32cec8c3e4c1, priority=100,in_port="port-xxxxxq2py2" actions=set_field:0xd54dd->reg5,set_field:0x4->reg6,set_field:0x64->reg9,write_metadata:0xd54dd00000001,goto_table:1

 

**table=1** ip报文都跳到限速处理

    cookie=0x170a380bc44c5c23, table=1, priority=50,actions=goto_table:5_

 

**table=5**  Egress BPS限速,无限速规则不涉及

    cookie=0x170a380bc44c5b6b, table=5, priority=100 actions=goto_table:6_

 

**table=6** Egress PPS限速,无限速规则

    cookie=0x170a380bc44c5b7f, table=6, priority=100 actions=goto_table:10_

 

**table=10** Bind port and mac;reg6是port的ofport number,mac是host侧ip源ip

    cookie=0x170a32cec8c3e501, table=10, priority=1000,ip,reg6=0x4,dl_src=fa:16:3e:17:4b:9d actions=goto_table:20

 

**table=20** Egress Pre-CT; icmp报文进入到ct,zone由源端口ofport number区分

    cookie=0x170a380bc44c5c9f, table=20, priority=58000,icmp actions=ct(table=25,zone=NXM_NX_REG6[0..15])

 

**table=25**  Egress匹配ct 状态,根据port号筛选zone,匹配ct状态:+new+trk;zone和状态正确,则commit确认ct状态

    cookie=0x170a32cec8c3e4ed, table=25, priority=39799,ct_state=+new-est-rel-inv+trk,ip,reg6=0x4 actions=ct(commit,table=30,zone=NXM_NX_REG6[0..15])

 

**table=30** 是否访问本机服务,服务请查看30表全部流表,本次icmp不涉及

    cookie=0x170a3983468be7c9, table=30, priority=50 actions=goto_table:60

 

**table=60**  根据reg5(vni)、目的mac匹配走哪个隧道封装,并设置出接口为vxlan1; 可以通过ovs-ofctl show br-int查到关系

    cookie=0x170a3983468be931, table=60, priority=100,reg5=0xd54dd,dl_dst=fa:16:3e:46:09:25 actions=set_field:0xd54dd->tun_id,set_field:xx.xx.40.70->tun_dst,set_field:0x2->reg7,goto_table:80

 

**table=80** 根据出接口,存下vmid,

    cookie=0x170a3983468beaf9, table=80, priority=1000,reg7=0x4 actions=set_field:0x64->reg9,goto_table:81

 

**table=81** svc probe, 不涉及,跳过

    cookie=0x170a3983468be853, table=81, priority=100 actions=goto_table:85_

 

**table=85** Ingress BPS, 不涉及

    cookie=0x170a3983468be84b, table=85, priority=100 actions=goto_table:86

 

**table=86** Ingress PPS, 不涉及

    cookie=0x170a3983468be8db, table=86, priority=100 actions=goto_table:90

 

**table=90** 从出接口发出去, 本case是从vxlan口发出去

    cookie=0x170a3983468be837, table=90, priority=1000 actions=output:NXM_NX_REG7[]

 

 

###### 流表**node2(接收)**

**table=0** 跨节点接收,都是从vxlan口收到包

    cookie=0x170a277782792425, priority=1000,in_port=vxlan1 actions=goto_table:50

**table=50** 匹配隧道,目的mac, 设置接收端的pipline里的寄存器,metadata是tunid*subnetid*, reg=vni,

    cookie=0x170a2777827925f9, table=50, priority=50,tun_id=0xd54dd,dl_dst=fa:16:3e:46:09:25 actions=set_field:0xd54dd00000001->metadata,set_field:0xd54dd->reg5,resubmit(,30)

**table=30** 是否访问本服务,不涉及

    cookie=0x170a2777827923e7, table=30, priority=50 actions=goto_table:60

**table=60** 匹配tunid和mac,二层转发查询

    cookie=0x170a277782792601, table=60, priority=100,reg5=0xd54dd,dl_dst=fa:16:3e:46:09:25 actions=set_field:0xd54dd->tun_id,set_field:0x6->reg7,goto_table:70

**table=70**

    cookie=0x170a277782792393, table=70, priority=58000,icmp actions=ct(table=75,zone=NXM_NX_REG7[0..15])

**table=75** 匹配ct状态,首包匹配+trk+new, 后续包匹配+trk+est

    **首包,匹配+trk+new**
    cookie=0x170a2777827925eb, duration=33902.506s, table=75, n_packets=9144, n_bytes=895680, idle_age=4, priority=39819,ct_state=+new-est-rel-inv+trk,ip,reg7=0x6 actions=ct(commit,table=80,zone=NXM_NX_REG7[0..15])**table=**
    
    **后续包匹配+trk+est**
    cookie=0x170a2777827923ed, table=75, priority=60000,ct_state=-new+est-rel-inv+trk actions=goto_table:80 
    

**table=80** 匹配出接口,从某个口发出去,也就从ofport=6的接口发出去

    cookie=0x170a27778279262d, table=80, priority=1000,reg7=0x6 actions=set_field:0x64->reg9,goto_table:81

**table=81** Service probe; not relevant here.

    cookie=0x170a2777827924a3, table=81, priority=100 actions=goto_table:85

**table=85**

    cookie=0x170a277782792431, table=85, priority=100 actions=goto_table:86

**table=86**

    cookie=0x170a27778279241d, table=86, priority=100 actions=goto_table:90

**table=90**

    cookie=0x170a27778279239d, table=90, priority=1000 actions=output:NXM_NX_REG7[]

 

 

### Scenario: cross-subnet access

##### Topology

###### **Node1 (xx.xx.10.6) accesses Node2 (xx.xx.11.4)**

 

###### Node1:

    
    IP: xx.xx.10.6
    MAC: fa:16:3e:17:4b:9d
    Next hop: xx.xx.10.1 (fa:16:3e:ec:22:0d)
    Representor port: port-xxxxxq2py2 (representor of ens4)
    vtep: xx.xx.40.67

 

###### Node2:

    IP: xx.xx.11.4
    MAC: fa:16:3e:74:20:c6
    Next hop: xx.xx.11.1 (fa:16:3e:c4:ed:57)
    Representor port: port-2zbgfw4f26
    vtep: xx.xx.40.70

 

###### Flow tables: **Node1 (sending)**

**table=0** Traffic entry point: classify by ingress port and set the related registers; reg5 is the VNI and reg6 is the ingress port number.

    cookie=0x170a48c6658cc733, priority=100,in_port="port-xxxxxq2py2" actions=set_field:0xd54dd->reg5,set_field:0x4->reg6,set_field:0x64->reg9,write_metadata:0xd54dd00000001,goto_table:1

**table=1** All IP packets jump to rate limiting.

    cookie=0x170a48c6658cc42d, table=1, priority=50 actions=goto_table:5

**table=5** Egress BPS rate limiting; no rate-limit rule is configured, so not relevant here.

    cookie=0x170a48c6658cc435, table=5, priority=100 actions=goto_table:6

**table=6** Egress PPS rate limiting; no rate-limit rule is configured.

    cookie=0x170a48c6658cc4d5, table=6, priority=100 actions=goto_table:10

**table=10** Port and MAC binding check: reg6 is the port's ofport number, and dl_src is the source MAC of the host-side interface.

    cookie=0x170a48c6658cc74d, duration=1362.918s, table=10, n_packets=13516, n_bytes=1469823, idle_age=1, priority=1000,ip,reg6=0x4,dl_src=fa:16:3e:17:4b:9d actions=goto_table:20

**table=20** 

    cookie=0x170a48c6658cc4b5, table=20, priority=58000,icmp actions=ct(table=25,zone=NXM_NX_REG6[0..15])
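
The `zone=NXM_NX_REG6[0..15]` argument takes the conntrack zone from bits 0 through 15 of reg6, so connections are tracked per ingress port. A small Python sketch of that bit-range extraction follows; the function names are illustrative only.

```python
def reg_bits(value, start, end):
    """Extract bits start..end (inclusive, bit 0 is least significant), as in NXM_NX_REG6[0..15]."""
    width = end - start + 1
    return (value >> start) & ((1 << width) - 1)


def ct_zone_for_port(reg6):
    """Conntrack zone derived from the low 16 bits of the ingress-port register."""
    return reg_bits(reg6, 0, 15)


print(ct_zone_for_port(0x4))
```

With reg6=0x4, as set in table=0 of this walk, the packet is tracked in zone 4.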

**table=25** 

    **First packet**
    cookie=0x170a48c6658cc73b, table=25, priority=39799,ct_state=+new-est-rel-inv+trk,ip,reg6=0x4 actions=ct(commit,table=30,zone=NXM_NX_REG6[0..15])
    
    **Subsequent packets**
    cookie=0x170a48c6658cc4fb, table=25, priority=60000,ct_state=-new+est-rel-inv+trk actions=goto_table:30

**table=30** 

    cookie=0x170a48c6658cc4a5, table=30, priority=50 actions=goto_table:60

**table=60** 

    cookie=0x170a48c6658cc5fb, table=60, priority=100,metadata=0xd54dd00000001,dl_dst=fa:16:3e:ec:22:0d actions=goto_table:100

**table=100** L3 forwarding entry: if the destination IP is not a local address, continue to pre-routing.

    cookie=0x170a48c6658cc4e1, table=100, priority=50,ip actions=goto_table:110

**table=110** Pre-routing: drop the packet if its TTL is 0 or 1, otherwise continue.

    cookie=0x170a4b2b174d5e1d, table=110, priority=100 actions=goto_table:120

**table=120** Match ACLs; no rule is configured, so skip.

    cookie=0x170a4b2b174d5e1f, table=120, priority=50 actions=goto_table:130

**table=130** If the destination falls within xx.xx.0.0/16, continue to the specific-route lookup.

    cookie=0x170a4b2b174d5f1d, duration=180.589s, table=130, n_packets=4860, n_bytes=733806, idle_age=1, priority=10016,ip,metadata=0xd54dd00000001,nw_dst=xx.xx.0.0/16 actions=goto_table:140

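**table=140** Specific-route lookup: reg5 selects the large L2 domain and the destination IP selects the route. The flow rewrites metadata to the destination subnet, sets the destination MAC to the MAC of the destination IP (it was the gateway MAC), sets the source MAC to the gateway MAC of the destination subnet, decrements the TTL, and jumps to post-routing.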

    cookie=0x170a4b2b174d5ed3, table=140, priority=100,ip,reg5=0xd54dd,nw_dst=xx.xx.11.4 actions=set_field:0xd54dd00000002->metadata,set_field:fa:16:3e:74:20:c6->eth_dst,set_field:fa:16:3e:c4:ed:57->eth_src,dec_ttl,goto_table:160
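
The rewrite performed by the table=140 flow above can be sketched in Python as follows. This is an illustrative model rather than the controller's code: the `Packet` class and `route` function are hypothetical, the single route entry is copied from the flow above, and the TTL check reflects the table=110 description.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Packet:
    metadata: int
    eth_src: str
    eth_dst: str
    ip_dst: str
    ttl: int


# Destination IP -> fields rewritten by the table=140 flow (values copied from the dump).
ROUTES = {
    "xx.xx.11.4": {
        "metadata": 0xd54dd00000002,     # switch to the destination subnet
        "eth_dst": "fa:16:3e:74:20:c6",  # MAC of the destination IP
        "eth_src": "fa:16:3e:c4:ed:57",  # gateway MAC of the destination subnet
    },
}


def route(pkt: Packet) -> Optional[Packet]:
    """Return the rewritten packet, or None if it is dropped or no route matches."""
    if pkt.ttl <= 1:
        return None  # table=110 is described as dropping TTL 0 or 1
    entry = ROUTES.get(pkt.ip_dst)
    if entry is None:
        return None
    return replace(pkt, ttl=pkt.ttl - 1, **entry)
```

After this step the packet looks as if a router had forwarded it, so the second pass through the L2 tables only needs the new destination MAC.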

**table=160** Post-routing has no action here; skip.

    cookie=0x170a4b2b174d5d6b, table=160, priority=50 actions=resubmit(,170)

**table=170** After routing, go through L2 forwarding again, that is, look up the egress port by destination MAC.

    cookie=0x170a4b2b174d5db3, table=170, priority=50 actions=resubmit(,30)

**table=30** Not a local-service access; go straight to the MAC table lookup.

    cookie=0x170a4c013de71d19, table=30, priority=50 actions=goto_table:60

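**table=60** Encapsulate for the tunnel based on the large-L2 VNI and the destination MAC (the actual MAC of the destination IP). Note the value written to reg7: it is the egress port.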

    cookie=0x170a4c013de71e81, table=60, priority=100,reg5=0xd54dd,dl_dst=fa:16:3e:74:20:c6 actions=set_field:0xd54dd->tun_id,set_field:xx.xx.40.70->tun_dst,set_field:0x2->reg7,goto_table:80
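
Table 60 is visited twice in this walk with different outcomes: the gateway MAC hands the packet to the L3 tables, while the real destination MAC selects a VXLAN tunnel. On the receiving node, the table=60 flow for the same MAC resolves to a local port instead. A minimal Python sketch of that three-way lookup follows, using only values that appear in the flows of this scenario. The dictionary layout, the outcome labels, and `l2_lookup` are assumptions made for illustration; the real gateway flow matches on metadata rather than reg5, which is simplified here.

```python
# (vni, destination MAC) -> (outcome, detail); entries copied from the table=60 flows.
NODE1_TABLE60 = {
    (0xd54dd, "fa:16:3e:ec:22:0d"): ("route", None),            # gateway MAC: continue in table 100
    (0xd54dd, "fa:16:3e:74:20:c6"): ("tunnel", "xx.xx.40.70"),  # remote VM: tun_dst for VXLAN
}
NODE2_TABLE60 = {
    (0xd54dd, "fa:16:3e:74:20:c6"): ("local", 0x7),             # local VM: ofport written to reg7
}


def l2_lookup(table, vni, dst_mac):
    """Return (outcome, detail) for a destination, or ('miss', None) if no flow matches."""
    return table.get((vni, dst_mac), ("miss", None))


print(l2_lookup(NODE1_TABLE60, 0xd54dd, "fa:16:3e:74:20:c6"))
```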

**table=80** 

    cookie=0x170a4c013de71d77, table=80, priority=1000,reg7=0x2 actions=output:vxlan1
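
The sequence of tables in a walk like the one above can be recovered mechanically from the action strings, since each flow continues through `goto_table`, `resubmit`, or the `table=` argument of `ct`. The Python sketch below extracts the next table with regular expressions. It is a rough aid for reading dumps that handles only these three forms, and `next_table` is a hypothetical helper.

```python
import re

# Tried in order; the first pattern that matches decides the next table.
_PATTERNS = (
    r"goto_table:(\d+)",
    r"resubmit\(,(\d+)\)",
    r"ct\([^)]*\btable=(\d+)",
)


def next_table(actions):
    """Return the table number the actions continue to, or None if the pipeline ends."""
    for pattern in _PATTERNS:
        match = re.search(pattern, actions)
        if match:
            return int(match.group(1))
    return None


# Action strings copied from flows in this walk.
WALK = [
    "set_field:0xd54dd->reg5,set_field:0x4->reg6,set_field:0x64->reg9,write_metadata:0xd54dd00000001,goto_table:1",
    "ct(table=25,zone=NXM_NX_REG6[0..15])",
    "ct(commit,table=30,zone=NXM_NX_REG6[0..15])",
    "resubmit(,170)",
    "output:vxlan1",
]
print([next_table(actions) for actions in WALK])
```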

 

 

###### Flow tables: **Node2 (receiving)**

**table=0**

    cookie=0x170d7535048f9acb, priority=1000,in_port=vxlan1 actions=goto_table:50

**table=50** l3 lookup

    cookie=0x170d7535048f9cc3, table=50, priority=50,tun_id=0xd54dd,dl_dst=fa:16:3e:c4:ed:57 actions=set_field:0xd54dd00000002->metadata,set_field:0xd54dd->reg5,goto_table:140

**table=140** Look up the destination gateway.

    cookie=0x170d7535048f9cd9, table=140, priority=100,ip,reg5=0xd54dd,nw_dst=xx.xx.11.4 actions=set_field:0xd54dd00000002->metadata,set_field:fa:16:3e:74:20:c6->eth_dst,set_field:fa:16:3e:c4:ed:57->eth_src,dec_ttl,goto_table:160

**table=160** postrouting

    cookie=0x170d7535048f9acf, table=160, priority=50 actions=resubmit(,170)

**table=170** ingress acl 

    cookie=0x170d7535048f9b29, table=170, priority=50 actions=resubmit(,30)

**table=30**

    cookie=0x170d7535048f9afd, table=30, priority=50 actions=goto_table:60

**table=60**

    cookie=0x170d7535048f9e0d, table=60, priority=100,reg5=0xd54dd,dl_dst=fa:16:3e:74:20:c6 actions=set_field:0xd54dd->tun_id,set_field:0x7->reg7,goto_table:70

**table=70**

    cookie=0x170d7535048f9bd5, table=70, priority=58000,tcp actions=ct(table=75,zone=NXM_NX_REG7[0..15])

**table=75**

    cookie=0x170d7535048f9df5, table=75, priority=39819,ct_state=+new-est-rel-inv+trk,ip,reg7=0x6 actions=ct(commit,table=80,zone=NXM_NX_REG7[0..15])

**table=80**

    cookie=0x170d7535048f9dfb, table=80, priority=1000,reg7=0x6 actions=set_field:0x64->reg9,goto_table:81

**table=81**

    cookie=0x170d7535048f9b91, table=81, priority=100 actions=goto_table:85

**table=85**

    cookie=0x170d7535048f9be1, table=85, priority=100 actions=goto_table:86

**table=86**

    cookie=0x170d7535048f9b49, table=86, priority=100 actions=goto_table:90

**table=90**

    cookie=0x170d7535048f9bef, table=90, priority=1000 actions=output:NXM_NX_REG7[]

 

### Scenario: accessing a local service

**Request** Access to the metadata interface

    cookie=0x170b616612e8b709, table=10, priority=2000,tcp,nw_dst=169.254.169.254,tp_dst=8000 actions=goto_table:30
    cookie=0x170b616612e8b781, table=45, priority=100,tcp,nw_dst=169.254.169.254,tp_dst=8000 actions=set_field:fa:16:3e:25:fd:7e->eth_dst,set_field:8111->tcp_dst,goto_table:46
    cookie=0x170b616612e8b759, table=46, priority=2000,tcp,nw_dst=169.254.169.254,tp_dst=8111 actions=move:NXM_NX_REG6[]->NXM_OF_IP_SRC[],set_field:128.0.0.0/16->ip_src,output:1
    

**Return path**

    cookie=0x170b616612e8b859, priority=100,in_port=1 actions=goto_table:47
    cookie=0x170b616612e8bd73, table=47, priority=100,tcp,nw_dst=128.0.0.2,tp_src=8111 actions=set_field:169.254.169.254->ip_src,set_field:xx.xx.100.5->ip_dst,set_field:8000->tcp_src,output:"port-17icfhnrgo"
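
One plausible reading of the table=46 actions is that the source address handed to the metadata service encodes the ingress port: `move:NXM_NX_REG6[]->NXM_OF_IP_SRC[]` copies reg6 into the source IP, and the masked `set_field:128.0.0.0/16->ip_src` then overwrites only the upper 16 bits. Under that reading, the `nw_dst=128.0.0.2` matched on the return path would identify ofport 2. The Python sketch below illustrates the arithmetic; the interpretation and the helper names are assumptions, not something the flows state explicitly.

```python
import ipaddress

PREFIX = int(ipaddress.IPv4Address("128.0.0.0"))
HIGH_MASK = 0xFFFF0000  # bits covered by the /16 in set_field:128.0.0.0/16->ip_src
LOW_MASK = 0x0000FFFF   # bits left untouched by the masked set_field


def encode_src(reg6):
    """Source IP after copying reg6 into ip_src and forcing the top 16 bits to 128.0."""
    addr = reg6 & 0xFFFFFFFF
    addr = (addr & LOW_MASK) | (PREFIX & HIGH_MASK)
    return str(ipaddress.IPv4Address(addr))


def decode_ofport(ip):
    """Recover the low 16 bits (the assumed ofport) from such an address."""
    return int(ipaddress.IPv4Address(ip)) & LOW_MASK


print(encode_src(2), decode_ofport("128.0.0.2"))
```

Encoding the port this way would let the return flow restore the original client address and output port from the destination IP alone, without keeping per-connection state in the flow table.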

 

### Scenario: port-based stateless security group

**table=70**

    cookie=0x170b616612e8bdc1, table=70, priority=39800,ip,reg7=0x4 actions=set_field:0x46->reg8,goto_table:200
    cookie=0x170b616612e8bdc5, table=70, priority=39800,ipv6,reg7=0x4 actions=set_field:0x46->reg8,goto_table:200

 

### Scenario: port-based stateful security group

    cookie=0x170b616612e8becb, table=75, priority=39800,ct_state=+new-est-rel-inv+trk,ip,reg7=0xd actions=set_field:0x4b->reg8,goto_table:200
    cookie=0x170b616612e8bf11, table=75, priority=39800,ct_state=+new-est-rel-inv+trk,ipv6,reg7=0xd actions=set_field:0x4b->reg8,goto_table:200

 

### Scenario: NAT

    cookie=0x170b616612e8bf69, table=44, priority=2000,ct_state=+new-est-rel-inv+trk,tcp,reg6=0x9,tp_dst=20048 actions=encap(tcp_option(tlv(254,0x0a156b03000047d1))),ct(commit,table=80,nat(src=xx.xx.9.207,random))
    cookie=0x170b616612e8c0f1, table=44, priority=2000,ct_state=+new-est-rel-inv+trk,tcp6,reg6=0xa,tp_dst=20048 actions=encap(tcp_option(tlv(254,0x010000000007000000123d2100080041))),ct(commit,table=80,nat(src=240e:108:4:200:1:2:0:70f,random))

 

### Scenario: rate limiting

    table=85, priority=1000,reg9=0x64 actions=meter:101,goto_table:86
    table=86, priority=1000,reg9=0x64 actions=meter:102,goto_table:90
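
These two flows attach meters to the ingress BPS table (85) and the ingress PPS table (86) for the VM identified by reg9=0x64. As a conceptual aid, the Python sketch below shows a generic token bucket, a common way to implement such limits. It is not OVS's meter implementation, and the class name, rates, and burst sizes are hypothetical.

```python
class TokenBucket:
    """Generic token bucket: 'rate' tokens are added per second, up to 'burst' tokens."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.last = 0.0      # timestamp of the previous call, in seconds

    def allow(self, now, cost):
        """Refill for the elapsed time, then admit the packet if enough tokens remain."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# A PPS-style limit charges 1 token per packet; a BPS-style limit would charge the packet size.
pps = TokenBucket(rate=2, burst=2)
print([pps.allow(0.0, 1) for _ in range(3)])
```

Chaining a size-based bucket and a packet-count bucket mirrors the way the pipeline passes a packet through table 85 and then table 86.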

 

### Scenario: ACL

    table=170,priority=55533,icmp,metadata=0x2076370000000a,nw_src=10.2.2.11,nw_dst=10.2.1.11 actions=resubmit(,30)  
    table=170, priority=24535,icmp6,metadata=0x1cc23500000000 actions=set_field:0xaa->reg8,goto_table:200
    

