和MySQL使用 LIMIT 关键字返回只有一页的结果一样,Elasticsearch接受 from 和 size 参数:
size: 结果数,默认10
from: 跳过开始的结果数,默认0
如果想每页显示5个结果,页码从1到3,那请求如下:
GET /_search?size=5
GET /_search?size=5&from=5
GET /_search?size=5&from=10
注意:应该多注意分页太深或者一次请求太多的结果。结果在返回前会被排序。但是记住一个搜索请求常常涉及多个分片。每个分片生成自己排好序的结果,它们接着需要集中起来排序以确保整体排序正确。
当前数据库中的数据如下:
GET http://127.0.0.1:9200/test/user/_search?size=1
#响应结果
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "test",
"_type": "user",
"_id": "QHi1UoIBpyNh4YQ4T1Sq",
"_score": 1.0,
"_source": {
"id": 1001,
"name": "张三",
"age": 20,
"sex": "男"
}
}
]
}
}
GET http://127.0.0.1:9200/test/user/_search?size=2&from=1
# 响应数据
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "test",
"_type": "user",
"_id": "1002",
"_score": 1.0,
"_source": {
"id": 1002,
"name": "李四",
"age": 23,
"sex": "女"
}
},
{
"_index": "test",
"_type": "user",
"_id": "1003",
"_score": 1.0,
"_source": {
"id": 1003,
"name": "王五",
"age": 27,
"sex": "男"
}
}
]
}
}
# 当数据量不够时
GET http://127.0.0.1:9200/test/user/_search?size=6&from=1
#响应数据
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "test",
"_type": "user",
"_id": "1002",
"_score": 1.0,
"_source": {
"id": 1002,
"name": "李四",
"age": 23,
"sex": "女"
}
},
{
"_index": "test",
"_type": "user",
"_id": "1003",
"_score": 1.0,
"_source": {
"id": 1003,
"name": "王五",
"age": 27,
"sex": "男"
}
},
{
"_index": "test",
"_type": "user",
"_id": "1004",
"_score": 1.0,
"_source": {
"id": 1004,
"name": "赵六",
"age": 29,
"sex": "女"
}
}
]
}
}
在集群系统中深度分页
为了理解为什么深度分页是有问题的,让我们假设在一个有5个主分片的索引中搜索。当请求结果的第一 页(结果1到10)时,每个分片产生自己最顶端10个结果然后返回它们给请求节点(requesting node),它再排序这所有的50个结果以选出顶端的10个结果。
现在假设我们请求第1000页——结果10001到10010。工作方式都相同,不同的是每个分片都必须产生顶端的 10010个结果。然后请求节点排序这50050个结果并丢弃50040个!
可以看到在分布式系统中,排序结果的花费随着分页的深入而成倍增长。这也是为什么网络搜索引擎中任何语句不能返回多于1000个结果的原因。如微博前端页面返回的页数只有50页,用工具是可以拿到50页以后的数据的。