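Preface

Elasticsearch does not support HDFS as a native storage type (for the supported store types, see https://www.elastic.co/guide/en/elasticsearch/reference/7.17/index-modules-store.html#file-system). The HDFS-based hot-cold separation approach therefore works indirectly. Elasticsearch registers a snapshot repository backed by HDFS, and cold data is moved to HDFS by taking index snapshots.

Note that these snapshots currently cannot be searched. Searchable snapshots are available only in the Enterprise edition.

The other hot-cold separation approach keeps all machines in one cluster. SSD machines act as hot nodes and HDD machines as cold nodes, and the two are distinguished by node attributes or node roles. ILM (index lifecycle management) then moves indices between the hot and cold nodes automatically according to a policy.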

Hot-cold separation based on HDFS

Installing the HDFS plugin

wget https://artifacts.elastic.co/downloads/elasticsearch-plugins/repository-hdfs/repository-hdfs-7.10.2.zip

bin/elasticsearch-plugin install file:///home/elasticsearch/repository-hdfs-7.10.2.zip

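[elasticsearch@es2 plugins]$ ll
total 4
drwxr-xr-x 3 elasticsearch elasticsearch 244 Jun 19 17:09 analysis-ik
drwxr-xr-x 2 root root 4096 Jul 10 15:38 repository-hdfs

The plugin is installed. Its version must match the Elasticsearch version exactly (7.10.2 here), and it has to be installed on every node in the cluster, with each node restarted afterwards.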
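To confirm the installation on a node, the elasticsearch-plugin list subcommand prints one installed plugin per line. The sketch below wraps that check in a hypothetical shell function, has_hdfs_plugin; it assumes the Elasticsearch home directory is passed as an argument or set in ES_HOME.

```shell
# Hypothetical helper: succeeds if "repository-hdfs" appears in the output of
# "<ES home>/bin/elasticsearch-plugin list" (one plugin name per line).
# The ES home is taken from $1, falling back to the ES_HOME variable (an assumption).
has_hdfs_plugin() {
  local es_home="${1:-$ES_HOME}"
  "$es_home/bin/elasticsearch-plugin" list | grep -qx 'repository-hdfs'
}

# Usage, on each node (the path is an example, not taken from the article):
# has_hdfs_plugin /home/elasticsearch/elasticsearch-7.10.2 && echo "repository-hdfs present"
```

Running the check on every node catches the common mistake of installing the plugin on only one of them.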

Installing Hadoop

Download:

https://www.apache.org/dyn/closer.cgi/hadoop/common/

Extract:

tar zxvf hadoop-3.3.6.tar.gz

Configure Java and passwordless SSH. Check that a JDK is available and that ssh es1 logs in without asking for a password:

[root@es1 hadoop]# java -version
openjdk version "1.8.0_372"
OpenJDK Runtime Environment BiSheng (build 1.8.0_372-b11)
OpenJDK 64-Bit Server VM BiSheng (build 25.372-b11, mixed mode)
[root@es1 hadoop]# ssh es1
Last login: Fri May 26 19:29:13 2023 from 192.168.56.105
[root@es1 ~]#

Format the HDFS file system:

bin/hdfs namenode -format

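Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh. Because the daemons are started as root here, Hadoop 3 also requires the *_USER variables; without them start-dfs.sh refuses to start the daemons:

export JAVA_HOME=/usr/bisheng-jdk1.8.0_372
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root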
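Re-running a setup script should not append these exports twice. A minimal sketch of an idempotent edit; ensure_line and configure_hadoop_env are hypothetical names, and the JDK path is the one shown above.

```shell
# Hypothetical helper: append LINE to FILE only if FILE lacks that exact line.
ensure_line() {
  local file="$1" line="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Write the exports from this section into hadoop-env.sh.
# Default path assumed: $HADOOP_HOME/etc/hadoop/hadoop-env.sh (lowercase "hadoop").
configure_hadoop_env() {
  local env_file="${1:-$HADOOP_HOME/etc/hadoop/hadoop-env.sh}"
  ensure_line "$env_file" 'export JAVA_HOME=/usr/bisheng-jdk1.8.0_372'
  ensure_line "$env_file" 'export HDFS_NAMENODE_USER=root'
  ensure_line "$env_file" 'export HDFS_DATANODE_USER=root'
  ensure_line "$env_file" 'export HDFS_SECONDARYNAMENODE_USER=root'
}
```

grep -x with -F matches whole lines literally, so a line is skipped only when an identical export already exists.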

Start the NameNode and DataNode:

sbin/start-dfs.sh

Then open the NameNode web UI at http://192.168.56.104:9870/

Connecting ES to HDFS

Create a directory in HDFS:

hdfs dfs -mkdir /es_snapshots

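Change core-site.xml so that fs.defaultFS points to the NameNode address that Elasticsearch will use (hdfs://es1:9000 in this setup):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://es1:9000</value>
  </property>
</configuration>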

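The same file can be generated from a script. This sketch assumes the NameNode address hdfs://es1:9000 that the snapshot repository uses in this article; write_core_site is a hypothetical helper, and it overwrites any existing core-site.xml.

```shell
# Hypothetical helper: write a minimal core-site.xml that only sets fs.defaultFS.
# Assumption: the NameNode listens on es1:9000, matching the repository "uri".
# Warning: this replaces the whole file.
write_core_site() {
  local target="${1:-$HADOOP_HOME/etc/hadoop/core-site.xml}"
  local default_fs="${2:-hdfs://es1:9000}"
  cat > "$target" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>${default_fs}</value>
  </property>
</configuration>
EOF
}
```

After HDFS is restarted (sbin/stop-dfs.sh, then sbin/start-dfs.sh), bin/hdfs getconf -confKey fs.defaultFS should print the configured address.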
Relax the directory permissions:

./bin/hdfs dfs -chmod -R 777 /es_snapshots

Register the HDFS snapshot repository in ES:

curl -X PUT "localhost:9200/_snapshot/my_hdfs_repository" -H 'Content-Type: application/json' -d'
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://es1:9000",
    "path": "/es_snapshots",
    "conf.dfs.client.read.shortcircuit": "false"
  }
}'

Response:

{"acknowledged":true}
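After registering the repository, Elasticsearch can check that its nodes can actually access it through the verify repository API (POST _snapshot/<repository>/_verify). A minimal sketch, assuming an unauthenticated cluster at ES_URL (default http://localhost:9200); es_curl and verify_repo are hypothetical helpers, and DRY_RUN=1 only prints the curl command instead of sending it.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"   # assumption: local, unauthenticated cluster

# Hypothetical wrapper: run curl, or only print the command when DRY_RUN=1.
es_curl() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'curl %s\n' "$*"
  else
    curl -s "$@"
  fi
}

# Ask the cluster's nodes to check that they can access the repository.
verify_repo() {
  es_curl -X POST "$ES_URL/_snapshot/${1:-my_hdfs_repository}/_verify?pretty"
}
```

A successful call returns a nodes object listing the nodes that could access the repository; an error usually points to HDFS connectivity or directory permissions.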

Storing an index in HDFS

Create a test index:

curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      }
    }
  }
}'

Insert data:

curl -X POST "localhost:9200/my_index/_doc" -H 'Content-Type: application/json' -d'
{
  "name": "John Doe",
  "age": 30
}'

curl -X POST "localhost:9200/my_index/_doc" -H 'Content-Type: application/json' -d'
{
  "name": "Jane Doe",
  "age": 25
}'

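For more than a couple of test documents, the _bulk API accepts newline-delimited JSON: one action line plus one source line per document, with a trailing newline. A sketch assuming an unauthenticated cluster at ES_URL; bulk_body and bulk_index are hypothetical helper names, and the "name:age" argument format is made up for this example.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"   # assumption: local, unauthenticated cluster

# Hypothetical helper: emit an NDJSON bulk body indexing "name:age" pairs into INDEX.
# No JSON escaping is done, so names must not contain double quotes.
bulk_body() {
  local index="$1"; shift
  local pair name age
  for pair in "$@"; do
    name="${pair%%:*}"   # text before the first ':'
    age="${pair#*:}"     # text after the first ':'
    printf '{"index":{"_index":"%s"}}\n' "$index"
    printf '{"name":"%s","age":%d}\n' "$name" "$age"
  done
}

# Send the body; --data-binary keeps the newlines that _bulk requires.
bulk_index() {
  bulk_body "$@" | curl -s -X POST "$ES_URL/_bulk?refresh=true" \
    -H 'Content-Type: application/x-ndjson' --data-binary @-
}

# Usage: bulk_index my_index "John Doe:30" "Jane Doe:25"
```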
Create the snapshot:

curl -X PUT "localhost:9200/_snapshot/my_hdfs_repository/my_snapshot?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
  "indices": "my_index",
  "ignore_unavailable": true,
  "include_global_state": false
}'

Check the snapshot status:

curl -X GET "localhost:9200/_snapshot/my_hdfs_repository/my_snapshot?pretty"
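Without wait_for_completion=true, the snapshot can instead be polled until its state is no longer IN_PROGRESS; the get-snapshot response carries a "state" field such as SUCCESS, IN_PROGRESS, PARTIAL or FAILED. A sketch assuming an unauthenticated cluster at ES_URL; snapshot_state and wait_for_snapshot are hypothetical, and the field is extracted with sed rather than a JSON parser to avoid extra dependencies.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"   # assumption: local, unauthenticated cluster

# Hypothetical helper: read a get-snapshot response for ONE snapshot on stdin
# and print its "state" value (works on compact or ?pretty output).
snapshot_state() {
  sed -n 's/.*"state" *: *"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Poll every 5 seconds until the snapshot is no longer IN_PROGRESS,
# print the final state, and succeed only if it is SUCCESS.
wait_for_snapshot() {
  local repo="${1:-my_hdfs_repository}" snap="${2:-my_snapshot}" state
  while :; do
    state=$(curl -s "$ES_URL/_snapshot/$repo/$snap" | snapshot_state)
    [ "$state" != "IN_PROGRESS" ] && break
    sleep 5
  done
  printf '%s\n' "$state"
  [ "$state" = "SUCCESS" ]
}
```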

List the repository directory in HDFS to see the snapshot files:

[root@es1 bin]# ./hdfs dfs -ls /es_snapshots

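Restore the index. An open index named my_index already exists, so rename_pattern and rename_replacement restore it as restored_my_index instead:

curl -X POST "localhost:9200/_snapshot/my_hdfs_repository/my_snapshot/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "my_index",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "my_index",
  "rename_replacement": "restored_my_index"
}'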

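To check the restore, the document counts of the original and restored indices can be compared with the _count API. A sketch assuming an unauthenticated cluster at ES_URL; count_from_json and doc_count are hypothetical helpers that extract the count with sed.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"   # assumption: local, unauthenticated cluster

# Hypothetical helper: print the "count" value from a _count response on stdin.
count_from_json() {
  sed -n 's/.*"count" *: *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Fetch the document count of INDEX.
doc_count() {
  curl -s "$ES_URL/$1/_count" | count_from_json
}

# Usage: report success only if both indices hold the same number of documents.
# [ "$(doc_count my_index)" = "$(doc_count restored_my_index)" ] && echo "restore OK"
```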
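Hot-cold separation based on ILM

In a three-node ES cluster, add the following line to elasticsearch.yml on the first node:

node.attr.data: hot

On the second node:

node.attr.data: warm

On the third node:

node.attr.data: cold

Then restart the cluster.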
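After the restart, GET _cat/nodeattrs lists the custom attributes of each node, which shows whether every node picked up its hot, warm or cold tag. A sketch; nodes_with_tier is a hypothetical helper that filters the output of _cat/nodeattrs?h=node,attr,value (no header) with awk, and ES_URL is assumed to point at an unauthenticated cluster.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"   # assumption: local, unauthenticated cluster

# Hypothetical helper: read "node attr value" rows on stdin and print the
# nodes whose "data" attribute equals TIER. Built-in attributes such as
# xpack.installed are ignored because only attr == "data" is matched.
nodes_with_tier() {
  awk -v tier="$1" '$2 == "data" && $3 == tier { print $1 }'
}

# Usage against a live cluster:
# curl -s "$ES_URL/_cat/nodeattrs?h=node,attr,value" | nodes_with_tier warm
```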

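Create the ILM policy. In the hot phase, rollover is triggered by whichever of max_size, max_age or max_docs is reached first. Measured from the rollover, the index enters the warm phase immediately (min_age 0m) and is moved to the warm node, enters the cold phase after 20 minutes (frozen and moved to the cold node), and is deleted after one hour:

curl -X PUT "localhost:9200/_ilm/policy/test_policy" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "10kb",
            "max_age": "10m",
            "max_docs": 20
          }
        }
      },
      "warm": {
        "min_age": "0m",
        "actions": {
          "allocate": {
            "require": {
              "data": "warm"
            }
          }
        }
      },
      "cold": {
        "min_age": "20m",
        "actions": {
          "freeze": {},
          "allocate": {
            "require": {
              "data": "cold"
            }
          }
        }
      },
      "delete": {
        "min_age": "1h",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'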
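ILM evaluates policies periodically, controlled by the dynamic cluster setting indices.lifecycle.poll_interval (10m by default), so phase transitions can lag behind the min_age values. For a short test like this one the interval can be lowered. A sketch assuming an unauthenticated cluster at ES_URL; poll_interval_body and set_ilm_poll_interval are hypothetical helpers, and 1m is an arbitrary test value.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"   # assumption: local, unauthenticated cluster

# Hypothetical helper: build the cluster-settings body for a given interval.
poll_interval_body() {
  printf '{"persistent":{"indices.lifecycle.poll_interval":"%s"}}' "$1"
}

# Apply it as a persistent setting (kept across full cluster restarts).
set_ilm_poll_interval() {
  poll_interval_body "${1:-1m}" | curl -s -X PUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' --data-binary @-
}

# Usage: set_ilm_poll_interval 1m   (set it back to 10m after testing)
```

While waiting, GET test-*/_ilm/explain shows which phase and action each index is currently in.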

Create the index template. It places new test-* indices on the hot node and attaches the policy and the rollover alias:

curl -X PUT "localhost:9200/_template/my_template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["test-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index.routing.allocation.require.data": "hot",
    "index.lifecycle.rollover_alias": "test",
    "index.lifecycle.name": "test_policy"
  }
}'

Create the first index with the alias test:

curl -X PUT "localhost:9200/test-000001" -H 'Content-Type: application/json' -d'
{
  "aliases": {
    "test": {}
  }
}'

The index is allocated on the hot node:

$ curl -XGET 'http://localhost:9200/_cat/shards?v'
index       shard prirep state   docs store ip        node
test-000001 0     p      STARTED 7    7.9kb localhost 008

Check again 10 minutes later:

$ curl -XGET 'http://localhost:9200/_cat/shards?v'
index       shard prirep state   docs store  ip        node
test-000002 0     p      STARTED 0    208b   localhost 008
test-000001 0     p      STARTED 10   11.8kb localhost 009

Note that test-000001 was rolled over and moved to 009, the warm node, while a new index, test-000002, was created on the hot node.
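The same check can be scripted by reading the node name from the last column of the _cat/shards?v output. A sketch; shard_nodes is a hypothetical helper, it assumes every shard is assigned (unassigned rows have no node column), and ES_URL is assumed to point at an unauthenticated cluster.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"   # assumption: local, unauthenticated cluster

# Hypothetical helper: read _cat/shards?v output on stdin (header line first,
# node name in the last column) and print the node of each shard of INDEX.
shard_nodes() {
  awk -v idx="$1" 'NR > 1 && $1 == idx { print $NF }'
}

# Usage against a live cluster:
# curl -s "$ES_URL/_cat/shards?v" | shard_nodes test-000001
```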


Hot-Cold Data Separation in Elasticsearch: Two Approaches in Practice
