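0. Background
OpenResty offers two pattern-matching APIs: ngx.re, which uses PCRE regular expressions, and Lua's built-in string library, which uses Lua patterns. Both can match and search text, so how do they differ, and which one should you choose?
1. Performance tests
1.1 Simple loop test
a) Short string and short pattern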
local http_range = 'bytes=10-65535'
local string_re_p = '^bytes=([%d]*)%-([%d]*)$'    -- Lua pattern
local ngx_re_p = '^bytes=([\\d]*)?\\-([\\d]*)$'   -- PCRE pattern
local loop = 1000000
local t0 = get_t()
for i = 1, loop do
    local _, _ = string_match(http_range, string_re_p)
end
local t1 = get_t()
for i = 1, loop do
    -- "j" enables PCRE JIT, "o" caches the compiled regex
    local m, err = ngx_re_match(http_range, ngx_re_p, "jo")
end
local t2 = get_t()
-- t1 - t0 is the string.match time, t2 - t1 the ngx.re.match time
Result: string.match 0.247 s vs. ngx.re.match 0.32 s
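The loop above relies on three helpers that the post does not define: get_t, string_match and ngx_re_match. A minimal sketch of how they might look follows. These definitions are assumptions, not the author's code: inside OpenResty, get_t is assumed to refresh and read nginx's cached clock, and outside OpenResty it falls back to os.clock so that the string.match half still runs under plain Lua or LuaJIT. The bench wrapper is also hypothetical.

```lua
-- Hypothetical helper definitions; the original post does not show them.
local string_match = string.match
-- `ngx` only exists inside OpenResty; this stays nil elsewhere.
local ngx_re_match = ngx and ngx.re.match

local function get_t()
    if ngx then
        ngx.update_time()   -- refresh nginx's cached time first
        return ngx.now()    -- seconds, millisecond precision
    end
    return os.clock()       -- CPU seconds; fallback outside OpenResty
end

-- Call fn(...) `loop` times and return the elapsed seconds.
local function bench(fn, loop, ...)
    local t0 = get_t()
    for _ = 1, loop do
        fn(...)
    end
    return get_t() - t0
end

local http_range = 'bytes=10-65535'
local elapsed = bench(string_match, 1000, http_range,
                      '^bytes=([%d]*)%-([%d]*)$')
print(string.format("string.match: %.3f s", elapsed))

if ngx_re_match then
    local re_elapsed = bench(ngx_re_match, 1000, http_range,
                             '^bytes=([\\d]*)?\\-([\\d]*)$', "jo")
    print(string.format("ngx.re.match: %.3f s", re_elapsed))
end
```

Routing each call through bench adds one function call per iteration, so absolute numbers from this sketch should not be expected to match those reported in the post.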
b) Long string and complex pattern
local http_range = 'dsfds65465fwef bytes=12345757860-4465458586465 ewfsd65sd4fg65fsd'
local string_re_p = '.*bytes=([%d]*)%-([%d]*) .+'
local ngx_re_p = '.*bytes=([\\d]*)?\\-([\\d]*) .+'
local loop = 1000000
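Result: string.match 1.16 s vs. ngx.re.match 0.526 s
The results show that ngx.re gains the advantage as the subject string and pattern become more complex: it is slightly slower in the short case but about 2.2 times as fast in the long one.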
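The two pattern strings in these tests express the same match in different syntaxes. Lua patterns use % for character classes and escapes, while ngx.re takes PCRE syntax, whose backslashes must be doubled inside a quoted Lua string. The sketch below reads the captures from each API. The names parse_range_lua and parse_range_re are illustrative, the PCRE pattern is simplified from the one used in the tests (the ? after the first group is redundant, since * already allows zero digits), and the ngx.re half is guarded because ngx exists only inside OpenResty.

```lua
local http_range = 'bytes=10-65535'

-- Lua pattern: %d is a digit, %- is a literal hyphen.
local function parse_range_lua(s)
    local first, last = string.match(s, '^bytes=([%d]*)%-([%d]*)$')
    return first, last
end

-- PCRE via ngx.re: \d is a digit.
-- "j" enables PCRE JIT, "o" caches the compiled regex.
local function parse_range_re(s)
    local m, err = ngx.re.match(s, '^bytes=(\\d*)-(\\d*)$', "jo")
    if not m then
        return nil, err   -- err is nil when the subject simply does not match
    end
    return m[1], m[2]     -- m[0] holds the whole match, captures start at 1
end

print(parse_range_lua(http_range))
if ngx then
    print(parse_range_re(http_range))
end
```

Both functions return the captures as strings, so a caller would still need tonumber before doing arithmetic on the range bounds.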
1.2 Adding JIT interference
a) Control group: ipairs does not break JIT (short string and pattern)
local http_range = 'bytes=10-65535'
local string_re_p = '^bytes=([%d]*)%-([%d]*)$'
local ngx_re_p = '^bytes=([\\d]*)?\\-([\\d]*)$'
local loop = 1000000
local t0 = get_t()
for i = 1, loop do
    for k, v in ipairs({1,2}) do end
    local _, _ = string_match(http_range, string_re_p)
end
local t1 = get_t()
for i = 1, loop do
    for k, v in ipairs({1,2}) do end
    local m, err = ngx_re_match(http_range, ngx_re_p, "jo")
end
local t2 = get_t()
jit-on: string.match 0.369 s, ngx.re.match 0.326 s
jit-off: string.match 0.38 s, ngx.re.match 3.265 s
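The post does not say how the jit-off runs were produced. One way to reproduce them, assuming LuaJIT's built-in jit library, is to call jit.off() before the timed loop and jit.on() afterwards. The sketch below wraps that in a hypothetical with_jit helper; count_matches is likewise only a stand-in workload, and the code degrades to a plain call when no jit library is present.

```lua
-- `jit` is LuaJIT's built-in library; it is absent under plain Lua.
local has_jit = type(jit) == "table"

-- Run fn with the JIT compiler switched on or off, then switch it back on.
-- Assumption: the JIT was enabled before the call (LuaJIT's default).
local function with_jit(enabled, fn, ...)
    if has_jit then
        if enabled then jit.on() else jit.off() end
    end
    local result = fn(...)
    if has_jit then jit.on() end
    return result
end

-- Stand-in workload: count successful matches over `loop` iterations.
local function count_matches(loop)
    local hits = 0
    for _ = 1, loop do
        if string.match('bytes=10-65535', '^bytes=([%d]*)%-([%d]*)$') then
            hits = hits + 1
        end
    end
    return hits
end

if has_jit then
    print("JIT enabled:", (jit.status()))
end
print(with_jit(false, count_matches, 1000))
```

If fn raises an error the JIT stays in whatever state was set, so a more careful version would wrap the call in pcall before restoring it.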
b) pairs breaks JIT (short string and pattern)
local http_range = 'bytes=10-65535'
local string_re_p = '^bytes=([%d]*)%-([%d]*)$'
local ngx_re_p = '^bytes=([\\d]*)?\\-([\\d]*)$'
local loop = 1000000
local t0 = get_t()
for i = 1, loop do
    for k, v in pairs({a=1,b=2}) do end
    local _, _ = string_match(http_range, string_re_p)
end
local t1 = get_t()
for i = 1, loop do
    for k, v in pairs({a=1,b=2}) do end
    local m, err = ngx_re_match(http_range, ngx_re_p, "jo")
end
local t2 = get_t()
jit-on: string.match 0.394 s, ngx.re.match 1.04 s
jit-off: string.match 0.395 s, ngx.re.match 3.216 s
c) pairs + long string and complex pattern
local http_range = 'dsfds65465fwef bytes=12345757860-4465458586465 ewfsd65sd4fg65fsd'
local string_re_p = '.*bytes=([%d]*)%-([%d]*) .+'
local ngx_re_p = '.*bytes=([\\d]*)?\\-([\\d]*) .+'
local loop = 1000000
jit-on: string.match 1.31 s, ngx.re.match 1.30 s
jit-off: string.match 1.307 s, ngx.re.match 2.94 s
Extra-long string, jit-on:
local http_range = 'dsfds6546vsdvsdfdsfsdfsdfwaasdasdasdas5fwef bytes=12354345345345757860-4465453453453453453453453458586465 ewfsd65safdknsalk;nlkasdnflksdajfhkldashjnfkl;ashfgjklahfg;jlsasd4fg65fsd'
Result: string.match 2.775 s vs. ngx.re.match 1.739 s
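The runs above suggest that string.match slows down faster than ngx.re.match as the subject grows. One hedged way to probe that trend is to time the same pattern over subjects of increasing length. The sketch below is illustrative only: make_subject and time_match are hypothetical helpers, the padding lengths are arbitrary, and it times only string.match with os.clock so that it can run outside OpenResty. Inside OpenResty the same loop could be repeated with ngx.re.match for comparison.

```lua
-- Build a subject with `pad_len` filler characters on each side of the range.
local function make_subject(pad_len)
    local pad = string.rep('x', pad_len)
    return pad .. ' bytes=12345757860-4465458586465 ' .. pad
end

-- Time `loop` calls of string.match; returns elapsed CPU seconds.
local function time_match(subject, pattern, loop)
    local t0 = os.clock()
    for _ = 1, loop do
        string.match(subject, pattern)
    end
    return os.clock() - t0
end

local pattern = '.*bytes=([%d]*)%-([%d]*) .+'
for _, pad_len in ipairs({16, 64, 256}) do
    local subject = make_subject(pad_len)
    -- Prints the subject length and the time taken for 10000 matches.
    print(#subject, time_match(subject, pattern, 10000))
end
```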
1.3 Summary of test results
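Case | string.match (s) | ngx.re.match (s) | Notes |
---|---|---|---|
Short string and pattern | 0.247 | 0.32 | jit-hit |
Short string and pattern, with ipairs | 0.369 | 0.326 | jit-hit |
Short string and pattern, with pairs | 0.394 | 1.04 | jit-on, broken by pairs |
Extra-long string, with pairs | 2.775 | 1.739 | jit-on |
Short string and pattern, with pairs | 0.395 | 3.216 | jit-off |
Short string and pattern, with ipairs | 0.38 | 3.265 | jit-off |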
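2. Conclusions
The test results show that:
1) In general, ngx.re copes better with long subject strings and complex patterns, so it is the recommended default.
2) For very simple strings the two differ little, with string.match slightly ahead, so use whichever is more convenient to write.
3) ngx.re depends more heavily on JIT: with JIT turned off, or with constructs such as pairs that break JIT compilation, it can be slowed down considerably (about 10x with JIT off in the short-string tests).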