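For single-field deduplication in Elasticsearch, see the post: ElasticSearch单字段查询去重详解_IT之一小佬的博客-CSDN博客
For multi-field deduplication in Elasticsearch, see the post: ElasticSearch多字段查询去重过滤详解_IT之一小佬的博客-CSDN博客
This post explains in detail how to deduplicate on multiple fields with elasticsearch_dsl. The sample data is the same as in the single-field post above.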
1. Querying by condition
Example code:
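from elasticsearch_dsl import connections, Search, Q
# Connect to Elasticsearch
es = connections.create_connection(hosts=['192.168.124.49:9200'], timeout=20)
print(es)
s = Search(using=es, index='person_info')
q = Q('match', provience='北京')
res = s.query(q)
# Iterating executes the search; only the first 10 hits are returned by default
for data in res:
    print(data.to_dict())
# count() asks the server for the total number of matching documents
print("Found %d documents in total" % res.count())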
Output:
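For orientation, the search above should correspond to a request body along the following lines. This is a minimal sketch built from plain dictionaries, so it runs without a cluster. The helper name `build_match_body` and its `size` parameter are illustrative assumptions, not part of the original code.

```python
import json


def build_match_body(field, value, size=None):
    """Return a search body holding a single match query (illustrative helper)."""
    body = {"query": {"match": {field: value}}}
    if size is not None:
        # Elasticsearch returns 10 hits by default; "size" changes that limit
        body["size"] = size
    return body


body = build_match_body("provience", "北京")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Passing `size=100`, for example, would ask for up to 100 hits instead of the default 10, which matters whenever the loop is expected to see every matching document.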
2. Multi-field deduplication with a script_fields script
Example code:
from elasticsearch_dsl import connections, Search, Q
# Connect to Elasticsearch
es = connections.create_connection(hosts=['192.168.124.49:9200'], timeout=20)
print(es)
s = Search(using=es, index='person_info')
q = Q('match', provience='北京')
# Variant without labels in the combined value:
# res = s.query(q).script_fields(age_gender_aggs={'script': {'lang': 'painless', 'source': "doc['age'].value + doc['gender'].value"}})
# Compute a combined "age:...,gender:..." value for every hit
res = s.query(q).script_fields(age_gender_aggs={'script': {'lang': 'painless', 'source': "'age:' + doc['age'].value + ',gender:' + doc['gender'].value"}})
count = 0
for data in res:
    print(data.to_dict(), type(data.to_dict()))
    count += 1
print("Found %d documents in total" % count)
Output:
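To make the effect of the script easier to see, here is a small sketch that mirrors the Painless expression in plain Python. The documents are hypothetical stand-ins for the sample data, and `age_gender_key` is an illustrative name. It assumes, as Elasticsearch does for script fields, that each value comes back wrapped in a list.

```python
def age_gender_key(doc):
    """Mirror of the Painless expression: 'age:' + age + ',gender:' + gender."""
    return "age:" + str(doc["age"]) + ",gender:" + str(doc["gender"])


# Hypothetical documents; the real sample data lives in the person_info index
docs = [
    {"name": "A", "age": 25, "gender": "男"},
    {"name": "B", "age": 25, "gender": "男"},
    {"name": "C", "age": 30, "gender": "女"},
]

# Script fields are returned as lists, hence the one-element lists here
hits = [{"age_gender_aggs": [age_gender_key(d)]} for d in docs]
for hit in hits:
    print(hit)
```

Every document still produces one hit. The script only attaches the combined key to each hit, so duplicates have to be removed afterwards, either on the client or with an aggregation.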
3. Multi-field deduplication with script_fields, returning only selected fields
Example code:
from elasticsearch_dsl import connections, Search, Q
# Connect to Elasticsearch
es = connections.create_connection(hosts=['192.168.124.49:9200'], timeout=20)
print(es)
s = Search(using=es, index='person_info')
q = Q('match', provience='北京')
# source([...]) limits the returned document fields to the listed ones
res = s.query(q)\
    .script_fields(age_gender_aggs={'script': {'lang': 'painless', 'source': "'age:' + doc['age'].value + ',gender:' + doc['gender'].value"}})\
    .source(['name', 'age', 'gender', 'address'])
count = 0
for data in res:
    print(data.to_dict(), type(data.to_dict()))
    count += 1
print("Found %d documents in total" % count)
Output:
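The dictionaries printed by the loop combine the selected source fields with the script field. The sketch below approximates that merge on a hypothetical raw hit. `flatten_hit` and the sample values are assumptions made for illustration and may differ from what elasticsearch_dsl does internally.

```python
def flatten_hit(raw_hit):
    """Merge '_source' and 'fields' of a raw hit into one dict (illustrative)."""
    merged = dict(raw_hit.get("_source", {}))
    # Script fields arrive under "fields", each value wrapped in a list
    merged.update(raw_hit.get("fields", {}))
    return merged


# Hypothetical raw hit, shaped like one entry of hits.hits in a search response
raw_hit = {
    "_id": "1",
    "_source": {"name": "A", "age": 25, "gender": "男", "address": "北京"},
    "fields": {"age_gender_aggs": ["age:25,gender:男"]},
}

flat = flatten_hit(raw_hit)
print(flat)
```

Metadata such as `_id` is left out of the merged dictionary in this sketch, which keeps it close to what `to_dict()` prints in the example above.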
4. Multi-field deduplication with script_fields, returning all fields
Example code:
from elasticsearch_dsl import connections, Search, Q
# Connect to Elasticsearch
es = connections.create_connection(hosts=['192.168.124.49:9200'], timeout=20)
print(es)
s = Search(using=es, index='person_info')
q = Q('match', provience='北京')
# An empty source list returns every document field alongside the script field
res = s.query(q)\
    .script_fields(age_gender_aggs={'script': {'lang': 'painless', 'source': "'age:' + doc['age'].value + ',gender:' + doc['gender'].value"}})\
    .source([])\
    .execute()  # optional: iterating the search also executes it
count = 0
for data in res:
    print(data.to_dict(), type(data.to_dict()))
    count += 1
print("Found %d documents in total" % count)
Output:
5. Counting distinct combinations with script_fields
Example code:
from elasticsearch_dsl import connections, Search, Q
# Connect to Elasticsearch
es = connections.create_connection(hosts=['192.168.124.49:9200'], timeout=20)
print(es)
s = Search(using=es, index='person_info')
q = Q('match', provience='北京')
res = s.query(q).script_fields(age_gender_aggs={'script': {'lang': 'painless', 'source': "doc['age'].value + doc['gender'].value"}})
lst = []
for data in res:
    print(data.to_dict(), type(data.to_dict()))
    # dicts are not hashable, so store their string form
    lst.append(str(data.to_dict()))
# A set drops the duplicates on the client side
print(set(lst))
print("Found %d distinct values in total" % len(set(lst)))
Output:
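The client-side counting step can also be written with tuple keys instead of stringified dictionaries. The following is a minimal sketch on hypothetical documents, with `distinct_pairs` as an illustrative name. It also shows why a delimiter is worth adding when two values are concatenated into one key, as the script above does.

```python
from collections import Counter


def distinct_pairs(docs):
    """Count documents per (age, gender) pair."""
    return Counter((d["age"], d["gender"]) for d in docs)


# Hypothetical documents standing in for the query results
docs = [
    {"age": 25, "gender": "男"},
    {"age": 25, "gender": "男"},
    {"age": 30, "gender": "女"},
    {"age": 25, "gender": "女"},
]
pairs = distinct_pairs(docs)
print(len(pairs), dict(pairs))

# Without a delimiter, two different pairs can collapse into the same string
samples = [(1, "23"), (12, "3")]
ambiguous = {str(a) + str(b) for a, b in samples}
delimited = {f"{a}|{b}" for a, b in samples}
print(len(ambiguous), len(delimited))
```

Keep in mind that iterating a search only sees the hits actually returned, which is 10 by default, so counts computed this way cover the whole result set only when all matching documents have been fetched.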
6. Counting distinct combinations with a script inside an aggregation
Example code:
from elasticsearch_dsl import connections, Search, Q, A
# Connect to Elasticsearch
es = connections.create_connection(hosts=['192.168.124.49:9200'], timeout=20)
print(es)
s = Search(using=es, index='person_info')
q = Q('match', provience='北京')
search = s.query(q)
# cardinality is a metric aggregation, so register it with .metric()
search.aggs.metric('age_gender_agg',
                   A('cardinality', script={'lang': 'painless', 'source': "doc['age'].value + doc['gender'].value"}))
ret = search.execute()
print(ret)
print(ret.aggregations.age_gender_agg)
# Approximate number of distinct age/gender combinations
print(ret.aggregations.age_gender_agg.value)
Output:
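As a rough picture of what is sent to the server, the aggregation above should translate into a body like the one built below. This is a sketch with plain dictionaries. `build_cardinality_body` is an illustrative helper, the `size: 0` setting is an addition that the original code does not use, and the response shown is a hypothetical example rather than real output.

```python
def build_cardinality_body(field, value, agg_name, script_source):
    """Return a search body with a scripted cardinality aggregation (illustrative)."""
    return {
        # Assumption: hits are not needed here, only the aggregation result
        "size": 0,
        "query": {"match": {field: value}},
        "aggs": {
            agg_name: {
                "cardinality": {
                    "script": {"lang": "painless", "source": script_source}
                }
            }
        },
    }


body = build_cardinality_body(
    "provience", "北京", "age_gender_agg",
    "doc['age'].value + doc['gender'].value",
)
print(body)

# Hypothetical response fragment; the value 3 is made up for illustration
response = {"aggregations": {"age_gender_agg": {"value": 3}}}
distinct_count = response["aggregations"]["age_gender_agg"]["value"]
print(distinct_count)
```

The cardinality aggregation returns an approximate count, so for large numbers of distinct values the result may differ slightly from an exact client-side count.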
References:
Retrieve selected fields from a search | Elasticsearch Guide [8.5] | Elastic
API Documentation — Elasticsearch DSL 7.2.0 documentation