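Scenario

Configure a custom dictionary for the cluster: set "智能手机" (smartphone) as a main word, "是" (is) as a stop word, and "开心" and "高兴" (both meaning "happy") as synonyms. Then use the configured cluster to run a keyword search against the text "智能手机是很好用" and check the keyword query result, and run a synonym search against the text "我今天获奖了我很开心" and check the synonym query result.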

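Configuring a Custom Dictionary

1. Prepare the dictionary files (text files encoded in UTF-8 without BOM) and upload them to the corresponding OBS path.

The main word dictionary file contains the word "智能手机"; the stop word dictionary file contains the word "是"; the synonym dictionary file contains one synonym group, "开心" and "高兴".

Note
The system's default dictionary already lists common words such as "是" and "的" as stop words, so you do not need to upload them.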
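Step 1 requires files encoded in UTF-8 without a BOM, which some editors do not produce by default. The following Python sketch writes the three dictionaries and checks that none starts with a BOM. The file names, the one-entry-per-line layout, and the comma-separated synonym group are assumptions for illustration; check the format the console actually expects.

```python
import codecs
import os
import tempfile

# Hypothetical file names and layouts; the service may require others.
DICTIONARIES = {
    "main.txt": ["智能手机"],       # assumed: one main word per line
    "stop.txt": ["是"],             # assumed: one stop word per line
    "synonym.txt": ["开心,高兴"],   # assumed: one comma-separated group per line
}


def write_dictionaries(target_dir):
    """Write each dictionary as UTF-8 text without a BOM and return the paths."""
    paths = []
    for name, entries in DICTIONARIES.items():
        path = os.path.join(target_dir, name)
        # The "utf-8" codec never emits a BOM (unlike "utf-8-sig").
        with open(path, "w", encoding="utf-8", newline="\n") as f:
            f.write("\n".join(entries) + "\n")
        paths.append(path)
    return paths


def has_bom(path):
    """Return True if the file starts with the UTF-8 byte order mark."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for p in write_dictionaries(d):
            print(os.path.basename(p), "BOM present:", has_bom(p))
```

Running `has_bom` on files exported from another tool before uploading them is a quick way to catch an encoding mismatch.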

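2. In the Cloud Search Service management console, click "Cluster Management" in the left navigation pane.

3. On the "Cluster Management" page, click the name of the cluster for which you want to configure the custom dictionary to open the cluster's basic information page.

4. In the left navigation pane, choose "Custom Dictionary" and, following the instructions in "Configuring a Custom Dictionary", configure the dictionary files prepared in step 1 for the cluster.

5. After the dictionary configuration takes effect, return to the cluster list and click "Kibana" in the cluster's Operation column to access the cluster.

6. In Kibana, click "Dev Tools" in the left navigation pane to open the console.

7. Run the following commands to see how the custom dictionary affects tokenization under each analyzer.

- Use the ik_smart analyzer to tokenize the text "智能手机是很好用".

Example request: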

POST /_analyze 
{ 
  "analyzer":"ik_smart", 
  "text":"智能手机是很好用" 
}

After the request completes, check the tokenization result:

{ 
  "tokens": [ 
    { 
      "token": "智能手机", 
      "start_offset": 0, 
      "end_offset": 4, 
      "type": "CN_WORD", 
      "position": 0 
    }, 
    { 
      "token": "很好用", 
      "start_offset": 5, 
      "end_offset": 8, 
      "type": "CN_WORD", 
      "position": 1 
    } 
  ] 
}

- Use the ik_max_word analyzer to tokenize the text "智能手机是很好用".

Example request:

POST /_analyze 
{ 
  "analyzer":"ik_max_word", 
  "text":"智能手机是很好用" 
}

After the request completes, check the tokenization result:

{ 
  "tokens" : [ 
    { 
      "token" : "智能手机", 
      "start_offset" : 0, 
      "end_offset" : 4, 
      "type" : "CN_WORD", 
      "position" : 0 
    }, 
    { 
      "token" : "智能", 
      "start_offset" : 0, 
      "end_offset" : 2, 
      "type" : "CN_WORD", 
      "position" : 1 
    }, 
    { 
      "token" : "智", 
      "start_offset" : 0, 
      "end_offset" : 1, 
      "type" : "CN_WORD", 
      "position" : 2 
    }, 
    { 
      "token" : "能手", 
      "start_offset" : 1, 
      "end_offset" : 3, 
      "type" : "CN_WORD", 
      "position" : 3 
    }, 
    { 
      "token" : "手机", 
      "start_offset" : 2, 
      "end_offset" : 4, 
      "type" : "CN_WORD", 
      "position" : 4 
    }, 
    { 
      "token" : "机", 
      "start_offset" : 3, 
      "end_offset" : 4, 
      "type" : "CN_WORD", 
      "position" : 5 
    }, 
    { 
      "token" : "很好用", 
      "start_offset" : 5, 
      "end_offset" : 8, 
      "type" : "CN_WORD", 
      "position" : 6 
    }, 
    { 
      "token" : "很好", 
      "start_offset" : 5, 
      "end_offset" : 7, 
      "type" : "CN_WORD", 
      "position" : 7 
    }, 
    { 
      "token" : "好用", 
      "start_offset" : 6, 
      "end_offset" : 8, 
      "type" : "CN_WORD", 
      "position" : 8 
    }, 
    { 
      "token" : "用", 
      "start_offset" : 7, 
      "end_offset" : 8, 
      "type" : "CN_WORD", 
      "position" : 9 
    } 
  ] 
}
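In the ik_smart result above, the offsets jump from 4 to 5: the character in between is the stop word, which produces no token. The following Python sketch makes that visible by listing the parts of the text that no token covers. The function name is hypothetical, and the token data is copied from the ik_smart response above.

```python
def uncovered_text(text, tokens):
    """Return the pieces of `text` that no token's offset range covers."""
    covered = [False] * len(text)
    for tok in tokens:
        # Offsets are treated as character indices here, which holds for
        # the Basic Multilingual Plane characters used in this example.
        for i in range(tok["start_offset"], tok["end_offset"]):
            covered[i] = True

    pieces, current = [], ""
    for ch, is_covered in zip(text, covered):
        if is_covered:
            if current:
                pieces.append(current)
            current = ""
        else:
            current += ch
    if current:
        pieces.append(current)
    return pieces


text = "智能手机是很好用"
ik_smart_tokens = [
    {"token": "智能手机", "start_offset": 0, "end_offset": 4},
    {"token": "很好用", "start_offset": 5, "end_offset": 8},
]

if __name__ == "__main__":
    print(uncovered_text(text, ik_smart_tokens))  # ['是']
```

Applied to the ik_max_word result, the same check also leaves only "是" uncovered, since that response has no token spanning offset 4 to 5 either.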

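Keyword Search

The commands differ between Elasticsearch versions earlier than 7.x and versions 7.x and later, so the two cases are shown separately.

Versions earlier than 7.x

1. Create the index "book" and configure the analyzers.

In this example, "analyzer" and "search_analyzer" can be set to either "ik_max_word" or "ik_smart" as needed; "ik_max_word" is used here.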

PUT /book 
{ 
    "settings": { 
        "number_of_shards": 2, 
        "number_of_replicas": 1 
    }, 
    "mappings": { 
        "type1": { 
            "properties": { 
                "content": { 
                    "type": "text", 
                    "analyzer": "ik_max_word", 
                    "search_analyzer": "ik_max_word" 
                } 
            } 
        } 
    } 
}

2. Import data by indexing the text into the "book" index.

PUT /book/type1/1 
{ 
  "content":"智能手机是很好用" 
}

3. Search the text with the keyword "智能手机" and check the result.

GET /book/type1/_search 
{ 
  "query": { 
    "match": { 
      "content": "智能手机" 
    } 
  } 
}

Search result:

{ 
  "took" : 20, 
  "timed_out" : false, 
  "_shards" : { 
    "total" : 2, 
    "successful" : 2, 
    "skipped" : 0, 
    "failed" : 0 
  }, 
  "hits" : { 
    "total" : 1, 
    "max_score" : 1.1507283, 
    "hits" : [ 
      { 
        "_index" : "book", 
        "_type" : "type1", 
        "_id" : "1", 
        "_score" : 1.1507283, 
        "_source" : { 
          "content" : "智能手机是很好用" 
        } 
      } 
    ] 
  } 
}

Versions 7.x and later

1. Create the index "book" and configure the analyzers.

In this example, "analyzer" and "search_analyzer" can be set to either "ik_max_word" or "ik_smart" as needed; "ik_max_word" is used here.

PUT /book 
{ 
    "settings": { 
        "number_of_shards": 2, 
        "number_of_replicas": 1 
    }, 
    "mappings": { 
        "properties": { 
            "content": { 
                "type": "text", 
                "analyzer": "ik_max_word", 
                "search_analyzer": "ik_max_word" 
            } 
        } 
    } 
}

2. Import data by indexing the text into the "book" index.

PUT /book/_doc/1  
{  
  "content":"智能手机是很好用"  
}

3. Search the text with the keyword "智能手机" and check the result.

GET /book/_search
{ 
  "query": { 
    "match": { 
      "content": "智能手机" 
    } 
  } 
}

Search result:

{ 
  "took" : 16, 
  "timed_out" : false, 
  "_shards" : { 
    "total" : 2, 
    "successful" : 2, 
    "skipped" : 0, 
    "failed" : 0 
  }, 
  "hits" : { 
    "total" : { 
      "value" : 1, 
      "relation" : "eq" 
    }, 
    "max_score" : 1.7260926, 
    "hits" : [ 
      { 
        "_index" : "book", 
        "_type" : "_doc", 
        "_id" : "1", 
        "_score" : 1.7260926, 
        "_source" : { 
          "content" : "智能手机是很好用" 
        } 
      } 
    ] 
  } 
}
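The two variants above differ in the mapping and in the document path: before 7.x the mapping is nested under a type name ("type1") and documents are addressed by that type, while from 7.x on the mapping is typeless and documents go under "_doc". The following Python sketch generates both request sets from a major version number. The function name and its parameters are hypothetical, and it uses the index-level "/_search" path for both versions, which is an assumption made to keep the sketch simple.

```python
import json


def build_requests(major_version, index="book", type_name="type1",
                   field="content", analyzer="ik_max_word"):
    """Return (method, path, body) tuples: create index, index a doc, search."""
    properties = {
        field: {"type": "text", "analyzer": analyzer, "search_analyzer": analyzer}
    }
    if major_version < 7:
        # Before 7.x the mapping is nested under a mapping type name.
        mappings = {type_name: {"properties": properties}}
        doc_path = f"/{index}/{type_name}/1"
    else:
        # From 7.x on the mapping is typeless and documents go under _doc.
        mappings = {"properties": properties}
        doc_path = f"/{index}/_doc/1"

    settings = {"number_of_shards": 2, "number_of_replicas": 1}
    return [
        ("PUT", f"/{index}", {"settings": settings, "mappings": mappings}),
        ("PUT", doc_path, {field: "智能手机是很好用"}),
        ("GET", f"/{index}/_search", {"query": {"match": {field: "智能手机"}}}),
    ]


if __name__ == "__main__":
    for version in (6, 7):
        print(f"# Elasticsearch {version}.x")
        for method, path, body in build_requests(version):
            print(method, path)
            print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed output can be pasted into the Kibana Dev Tools console; the sketch only builds the requests and does not contact a cluster.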

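Synonym Search

The commands differ between Elasticsearch versions earlier than 7.x and versions 7.x and later, so the two cases are shown separately.

Versions earlier than 7.x

1. Create the index "myindex" and configure the analyzer.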

PUT myindex 
{ 
  "settings": { 
    "analysis": { 
      "filter": { 
        "my_synonym": { 
          "type": "dynamic_synonym" 
        } 
      }, 
      "analyzer": { 
        "ik_synonym": { 
          "filter": [ 
            "my_synonym" 
          ], 
          "type": "custom", 
          "tokenizer": "ik_smart" 
        } 
      } 
    } 
  }, 
  "mappings": { 
    "mytype" :{ 
      "properties": { 
        "desc": { 
          "type": "text", 
          "analyzer": "ik_synonym" 
        } 
      } 
    } 
  } 
}

2. Import data by indexing the text into the "myindex" index.

PUT /myindex/mytype/1 
{ 
    "desc": "我今天获奖了我很开心" 
}

3. Search the text with the synonym "高兴" and check the result.

GET /myindex/_search 
{ 
  "query": { 
    "match": { 
      "desc": "高兴" 
    } 
  } 
}

Search result:

{ 
  "took" : 2, 
  "timed_out" : false, 
  "_shards" : { 
    "total" : 5, 
    "successful" : 5, 
    "skipped" : 0, 
    "failed" : 0 
  }, 
  "hits" : { 
    "total" : 1, 
    "max_score" : 0.49445358, 
    "hits" : [ 
      { 
        "_index" : "myindex", 
        "_type" : "mytype", 
        "_id" : "1", 
        "_score" : 0.49445358, 
        "_source" : { 
          "desc" : "我今天获奖了我很开心" 
        } 
      } 
    ] 
  } 
}
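The query for "高兴" matches a document that contains only "开心" because the analyzer's synonym filter treats the two words as equivalent. The following Python sketch is a toy model of that idea, not the plugin's actual implementation: it expands tokens through synonym groups on both the document side and the query side, then checks for overlap. The document tokenization shown is hypothetical; the real ik_smart output may differ.

```python
# The synonym group configured in this scenario.
SYNONYM_GROUPS = [{"开心", "高兴"}]


def expand(tokens, groups):
    """Return the tokens plus every synonym of each token."""
    expanded = set()
    for token in tokens:
        expanded.add(token)
        for group in groups:
            if token in group:
                expanded |= group
    return expanded


def matches(doc_tokens, query_tokens, groups):
    """Return True if any expanded query token appears in the expanded document."""
    return bool(expand(doc_tokens, groups) & expand(query_tokens, groups))


# Hypothetical tokenization of "我今天获奖了我很开心".
doc_tokens = ["我", "今天", "获奖", "了", "我", "很", "开心"]

if __name__ == "__main__":
    print(matches(doc_tokens, ["高兴"], SYNONYM_GROUPS))  # True
    print(matches(doc_tokens, ["高兴"], []))              # False
```

Without the synonym group, the same query finds nothing, which is the behavior to expect from an index whose analyzer lacks the synonym filter.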

Versions 7.x and later

1. Create the index "myindex" and configure the analyzer.

PUT myindex 
{ 
    "settings": { 
        "analysis": { 
            "filter": { 
                "my_synonym": { 
                    "type": "dynamic_synonym" 
                } 
            }, 
            "analyzer": { 
                "ik_synonym": { 
                    "filter": [ 
                        "my_synonym" 
                    ], 
                    "type": "custom", 
                    "tokenizer": "ik_smart" 
                } 
            } 
        } 
    }, 
    "mappings": { 
        "properties": { 
            "desc": { 
                "type": "text", 
                "analyzer": "ik_synonym" 
            } 
        } 
    } 
}

2. Import data by indexing the text into the "myindex" index.

PUT /myindex/_doc/1 
{ 
    "desc": "我今天获奖了我很开心" 
}

3. Search the text with the synonym "高兴" and check the result.

GET /myindex/_search 
{ 
  "query": { 
    "match": { 
      "desc": "高兴" 
    } 
  } 
}

Search result:

{ 
  "took" : 1, 
  "timed_out" : false, 
  "_shards" : { 
    "total" : 1, 
    "successful" : 1, 
    "skipped" : 0, 
    "failed" : 0 
  }, 
  "hits" : { 
    "total" : { 
      "value" : 1, 
      "relation" : "eq" 
    }, 
    "max_score" : 0.1519955, 
    "hits" : [ 
      { 
        "_index" : "myindex", 
        "_type" : "_doc", 
        "_id" : "1", 
        "_score" : 0.1519955, 
        "_source" : { 
          "desc" : "我今天获奖了我很开心" 
        } 
      } 
    ] 
  } 
}
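The search results above report "hits.total" in two shapes: a plain integer before 7.x, and an object with "value" and "relation" from 7.x on. A client that must work against both can normalize the field, as in the following Python sketch. The helper name is hypothetical, and treating the integer form as an exact count ("eq") is an assumption of this sketch.

```python
def total_hits(response):
    """Return (count, relation) from a search response of either shape."""
    total = response["hits"]["total"]
    if isinstance(total, dict):
        # 7.x and later: {"value": 1, "relation": "eq"}
        return total["value"], total["relation"]
    # Earlier than 7.x: a bare integer, treated here as an exact count.
    return total, "eq"


# Trimmed copies of the two response shapes shown in this document.
pre_7_response = {"hits": {"total": 1, "max_score": 0.49445358}}
post_7_response = {
    "hits": {"total": {"value": 1, "relation": "eq"}, "max_score": 0.1519955}
}

if __name__ == "__main__":
    print(total_hits(pre_7_response))   # (1, 'eq')
    print(total_hits(post_7_response))  # (1, 'eq')
```

Returning the relation alongside the count lets callers distinguish an exact total from a lower bound, which the object form reports as "gte".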
