Choosing Between ngx.re and the Lua string Library for Pattern Matching in OpenResty

2024-11-13 09:32:06

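0. Background

OpenResty offers two pattern-matching APIs: ngx.re, which uses PCRE regular expressions, and the string library of the Lua language, which uses Lua patterns. Both can match and search text. How do the two differ, and which one should you choose?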
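
To make the API difference concrete, here is a minimal sketch that matches the same Range header with both libraries. string.match returns the captures directly, while ngx.re.match returns a captures table plus an error string. The `ngx` guard is an assumption added so the snippet also runs outside OpenResty.

```lua
local http_range = "bytes=10-65535"

-- Lua pattern: % is the escape character and %d matches one digit.
local first, last = string.match(http_range, "^bytes=(%d*)%-(%d*)$")
print(first, last)

-- PCRE regex via ngx.re: only available inside OpenResty, hence the guard.
if ngx and ngx.re then
    -- "j" enables PCRE JIT, "o" caches the compiled regex.
    local m, err = ngx.re.match(http_range, [[^bytes=(\d*)-(\d*)$]], "jo")
    if m then
        print(m[1], m[2])
    elseif err then
        print("regex error: ", err)
    end
end
```

Both calls capture the strings "10" and "65535" here; only the pattern syntax and the return shape differ.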

1. Performance tests

1.1 Simple loop test

a) Short string and pattern

local http_range = 'bytes=10-65535'
local string_re_p = '^bytes=([%d]*)%-([%d]*)$'
local ngx_re_p    = '^bytes=([\\d]*)?\\-([\\d]*)$'
local loop = 1000000

local t0 =  get_t()
for i = 1, loop do
    local _, _ = string_match(http_range, string_re_p)
end

local t1 =  get_t()
for i = 1, loop do
    local m, err = ngx_re_match(http_range, ngx_re_p, "jo")
end
local t2 =  get_t()

Result (seconds, string.match vs. ngx.re.match): 0.247 vs. 0.32
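
The loop above calls get_t, string_match, and ngx_re_match without showing how they are defined. The sketch below is one possible harness. The definitions are assumptions, not the author's: get_t is taken to be a timer in seconds (os.clock here), and the two match functions are taken to be local aliases of string.match and ngx.re.match. The ngx.re part only runs inside OpenResty.

```lua
-- Hypothetical helpers: the original post does not show these definitions.
local string_match = string.match
local ngx_re_match = ngx and ngx.re and ngx.re.match  -- nil outside OpenResty

-- Timer in seconds. os.clock reports CPU time, which is adequate for a tight loop.
local function get_t()
    return os.clock()
end

-- Calls fn(subject, pattern, opts) `loop` times and returns the elapsed seconds.
-- For string.match the third argument is `init`; passing nil leaves it at its default.
local function bench(fn, subject, pattern, opts, loop)
    local t0 = get_t()
    for i = 1, loop do
        local _ = fn(subject, pattern, opts)
    end
    return get_t() - t0
end

local http_range = 'bytes=10-65535'
local loop = 1000000

local t_string = bench(string_match, http_range, '^bytes=([%d]*)%-([%d]*)$', nil, loop)
print("string.match:", t_string)

if ngx_re_match then
    local t_ngx = bench(ngx_re_match, http_range, '^bytes=([\\d]*)?\\-([\\d]*)$', "jo", loop)
    print("ngx.re.match:", t_ngx)
end
```

Absolute numbers depend on the machine and the OpenResty build, so only the relative ordering is meaningful.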

b) Long string and complex pattern

local http_range = 'dsfds65465fwef bytes=12345757860-4465458586465 ewfsd65sd4fg65fsd'
local string_re_p = '.*bytes=([%d]*)%-([%d]*) .+'
local ngx_re_p    = '.*bytes=([\\d]*)?\\-([\\d]*) .+'
local loop = 1000000

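Result (seconds, string.match vs. ngx.re.match): 1.16 vs. 0.526

The results show that ngx.re pulls ahead as the string and the pattern get more complex: it is roughly 2x faster here, whereas string.match was slightly faster on the short input.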

1.2 Adding JIT disturbance

a) Control group: ipairs does not break JIT (short string and pattern)

local http_range = 'bytes=10-65535'
local string_re_p = '^bytes=([%d]*)%-([%d]*)$'
local ngx_re_p    = '^bytes=([\\d]*)?\\-([\\d]*)$'
local loop = 1000000

local t0 =  get_t()
for i = 1, loop do
    for k, v in ipairs({1,2}) do end
    local _, _ = string_match(http_range, string_re_p)
end

local t1 =  get_t()
for i = 1, loop do
    for k, v in ipairs({1,2}) do end
    local m, err = ngx_re_match(http_range, ngx_re_p, "jo")
end
local t2 =  get_t()

Results (seconds, string.match vs. ngx.re.match):
jit-on:  0.369 vs. 0.326
jit-off: 0.38 vs. 3.265
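
The post does not say how JIT was switched off for the jit-off runs. One way, assuming LuaJIT's built-in jit library, is to call jit.off() and jit.flush() before the timed loop and jit.on() afterwards. The helper name run_with_jit is hypothetical, and the guard lets the sketch run under plain Lua, where the `jit` table does not exist.

```lua
-- The `jit` table exists only under LuaJIT; plain Lua leaves it nil.
local has_jit = type(jit) == "table"

-- Runs fn with the JIT compiler enabled or disabled and returns the elapsed seconds.
local function run_with_jit(enabled, fn)
    if has_jit then
        if enabled then
            jit.on()
        else
            jit.off()
            jit.flush()  -- drop traces that were compiled earlier
        end
    end
    local t0 = os.clock()
    fn()
    local elapsed = os.clock() - t0
    if has_jit then jit.on() end  -- restore the default
    return elapsed
end

-- Same shape as the control-group loop: an ipairs loop plus one string.match call.
local function workload()
    local s = 'bytes=10-65535'
    for i = 1, 1000000 do
        for k, v in ipairs({1, 2}) do end
        local _ = string.match(s, '^bytes=([%d]*)%-([%d]*)$')
    end
end

print("jit-on: ", run_with_jit(true, workload))
print("jit-off:", run_with_jit(false, workload))
```

Inside OpenResty the same wrapper could time an ngx.re.match workload instead.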

b) pairs breaks JIT (short string and pattern)

local http_range = 'bytes=10-65535'
local string_re_p = '^bytes=([%d]*)%-([%d]*)$'
local ngx_re_p    = '^bytes=([\\d]*)?\\-([\\d]*)$'
local loop = 1000000

local t0 =  get_t()
for i = 1, loop do
    for k, v in pairs({a=1,b=2}) do end
    local _, _ = string_match(http_range, string_re_p)
end

local t1 =  get_t()
for i = 1, loop do
    for k, v in pairs({a=1,b=2}) do end
    local m, err = ngx_re_match(http_range, ngx_re_p, "jo")
end
local t2 =  get_t()

Results (seconds, string.match vs. ngx.re.match):
jit-on:  0.394 vs. 1.04
jit-off: 0.395 vs. 3.216

c) pairs + long, complex string

local http_range = 'dsfds65465fwef bytes=12345757860-4465458586465 ewfsd65sd4fg65fsd'
local string_re_p = '.*bytes=([%d]*)%-([%d]*) .+'
local ngx_re_p    = '.*bytes=([\\d]*)?\\-([\\d]*) .+'
local loop = 1000000

Results (seconds, string.match vs. ngx.re.match):
jit-on:  1.31 vs. 1.30
jit-off: 1.307 vs. 2.94

Extra-long string + jit-on:

local http_range = 'dsfds6546vsdvsdfdsfsdfsdfwaasdasdasdas5fwef bytes=12354345345345757860-4465453453453453453453453458586465 ewfsd65safdknsalk;nlkasdnflksdajfhkldashjnfkl;ashfgjklahfg;jlsasd4fg65fsd'

Result (seconds, string.match vs. ngx.re.match): 2.775 vs. 1.739

1.3 Summary of test results

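Case                                     string.match   ngx.re.match   Note
Short string and pattern                 0.247 s        0.32 s         jit-hit
Short string and pattern, with ipairs    0.369 s        0.326 s        jit-hit
Short string and pattern, with pairs     0.394 s        1.04 s
Long string and pattern, with pairs      2.775 s        1.739 s
Short, with pairs + jit-off              0.395 s        3.216 s        jit-off
Short, with ipairs + jit-off             0.38 s         3.265 s        jit-off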
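
To compare the rows at a glance, the timings can be turned into ratios. The short sketch below copies the numbers from the summary table and divides the ngx.re.match time by the string.match time. The row labels are shorthand introduced here for illustration.

```lua
-- Timings in seconds copied from the summary table: {label, string.match, ngx.re.match}.
local rows = {
    {"short",                   0.247, 0.32},
    {"short + ipairs",          0.369, 0.326},
    {"short + pairs",           0.394, 1.04},
    {"long + pairs",            2.775, 1.739},
    {"short + pairs, jit-off",  0.395, 3.216},
    {"short + ipairs, jit-off", 0.38,  3.265},
}

local ratios = {}
for i, row in ipairs(rows) do
    -- A ratio above 1 means ngx.re.match was slower than string.match in that run.
    ratios[i] = row[3] / row[2]
    print(string.format("%-26s %.2fx", row[1], ratios[i]))
end
```

Both jit-off rows come out above 8x, while the long-string row is the only one where ngx.re.match is clearly below 1x.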

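2. Conclusion

The test results show:
1) In general, ngx.re copes better with complex strings and complex patterns, so it is the recommended default.
2) For very simple strings the two differ little and the string library has a slight edge, so use whichever is more convenient to write.
3) ngx.re depends more heavily on JIT: with JIT turned off, or with constructs such as pairs in the loop, it can become a drag (roughly 2.6x to 8.6x slower than string.match in the short-string tests above). A plausible reason, not verified in these tests, is that ngx.re in lua-resty-core is implemented through the LuaJIT FFI, which is slow when the calling code is interpreted.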
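
For the simple, anchored case in point 2, a plain string.match helper may be all that is needed. The following is a minimal sketch rather than a complete Range parser: the function name parse_range is hypothetical, and it handles only a single "bytes=first-last" range.

```lua
-- Hypothetical helper: parses "bytes=<first>-<last>".
-- Returns true plus the two bounds as numbers (nil for an omitted bound),
-- or false when the header does not have this shape.
local function parse_range(header)
    local first, last = string.match(header, "^bytes=(%d*)%-(%d*)$")
    if not first then
        return false
    end
    -- tonumber("") is nil, so an open-ended bound comes back as nil.
    return true, tonumber(first), tonumber(last)
end

print(parse_range("bytes=10-65535"))
print(parse_range("bytes=10-"))
print(parse_range("items=1-2"))
```

If the input grows into multi-range headers or free-form text, the long-string results above suggest moving to ngx.re.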

Author: 王****淋
From the author's personal column: Kingforder