Two Approaches to Hot-Cold Separation in Elasticsearch, in Practice

2023-07-13 02:16:11

Introduction

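Elasticsearch does not support HDFS as a native storage medium (for the supported store types, see https://www.elastic.co/guide/en/elasticsearch/reference/7.17/index-modules-store.html#file-system). The HDFS-based approach to hot-cold separation therefore works by registering an HDFS-backed snapshot repository in Elasticsearch and then moving cold data to HDFS as index snapshots.

Note that snapshots stored this way cannot be searched directly; an index has to be restored before it can be queried. Searchable snapshots are only available with an Enterprise license.

The other approach uses SSD machines as hot nodes and HDD machines as cold nodes, all deployed in one cluster and distinguished by node attribute or node role. ILM then migrates indices between hot and cold nodes automatically according to a policy.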

 

HDFS-based hot-cold separation

Installing the HDFS plugin

wget https://artifacts.elastic.co/downloads/elasticsearch-plugins/repository-hdfs/repository-hdfs-7.10.2.zip

 

bin/elasticsearch-plugin install file:///home/elasticsearch/repository-hdfs-7.10.2.zip

 

[elasticsearch@es2 plugins]$ ll
total 4
drwxr-xr-x 3 elasticsearch elasticsearch  244 Jun 19 17:09 analysis-ik
drwxr-xr-x 2 root          root          4096 Jul 10 15:38 repository-hdfs

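The plugin is installed. Note that repository-hdfs has to be installed on every node in the cluster, and each node must be restarted afterwards.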
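To confirm that the plugin was actually loaded, the cat plugins API can be queried once the nodes are back up. The following is a minimal sketch: the helper name nodes_with_plugin is hypothetical, and it assumes the API is called with h=name,component so that each line holds a node name followed by a plugin name.

```shell
# Hypothetical helper: reads "node component" pairs on stdin (the output of
# GET _cat/plugins?h=name,component) and prints each node that lists the
# plugin passed as the first argument.
nodes_with_plugin() {
  awk -v plugin="$1" '$2 == plugin { print $1 }' | sort -u
}

# Usage against a live cluster (localhost:9200 as in the rest of this article):
# curl -s "localhost:9200/_cat/plugins?h=name,component" | nodes_with_plugin repository-hdfs
```

Nodes without any plugin do not show up in the cat plugins output at all, so the result is best compared with the node list from GET _cat/nodes?h=name.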

 

Installing Hadoop

Download:

https://www.apache.org/dyn/closer.cgi/hadoop/common/

 

Extract:

tar zxvf hadoop-3.3.6.tar.gz

 

Configure Java and passwordless SSH:

[root@es1 hadoop]# java -version
openjdk version "1.8.0_372"
OpenJDK Runtime Environment BiSheng (build 1.8.0_372-b11)
OpenJDK 64-Bit Server VM BiSheng (build 25.372-b11, mixed mode)

[root@es1 hadoop]# ssh es1
Last login: Fri May 26 19:29:13 2023 from 192.168.56.105
[root@es1 ~]#

 

Format the file system:

bin/hdfs namenode -format

 

Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh:

export JAVA_HOME=/usr/bisheng-jdk1.8.0_372
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root

 

Start the NameNode and DataNode:

sbin/start-dfs.sh

 

Open the NameNode web UI: http://192.168.56.104:9870/

 

Connecting ES to HDFS

Create a directory in HDFS:

hdfs dfs -mkdir /es_snapshots

 

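Change the configuration in core-site.xml to the following (this should be in place before HDFS is started; restart HDFS if it is changed later):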

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://es1:9000</value>
    </property>
</configuration>

 

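Adjust the directory permissions so that the elasticsearch user can write to it (777 is wide open and only acceptable for a test setup):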

./bin/hdfs dfs -chmod -R 777 /es_snapshots

 

Registering the snapshot repository in ES

curl -X PUT "localhost:9200/_snapshot/my_hdfs_repository" -H 'Content-Type: application/json' -d'
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://es1:9000",
    "path": "/es_snapshots",
    "conf.dfs.client.read.shortcircuit": "false"
  }
}'

{"acknowledged":true}

 

Storing an index in HDFS

Create a test index:

curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      }
    }
  }
}'

 

Insert some documents:

curl -X POST "localhost:9200/my_index/_doc" -H 'Content-Type: application/json' -d'
{
  "name": "John Doe",
  "age": 30
}'

 

curl -X POST "localhost:9200/my_index/_doc" -H 'Content-Type: application/json' -d'
{
  "name": "Jane Doe",
  "age": 25
}'

 

Create a snapshot:

curl -X PUT "localhost:9200/_snapshot/my_hdfs_repository/my_snapshot?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
  "indices": "my_index",
  "ignore_unavailable": true,
  "include_global_state": false
}'

 

Check the snapshot status:

curl -X GET "localhost:9200/_snapshot/my_hdfs_repository/my_snapshot?pretty"
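The get-snapshot response contains a "state" field (for example SUCCESS, IN_PROGRESS, PARTIAL or FAILED), which is the value worth checking before relying on the snapshot. A minimal sketch for pulling it out in a script follows; the helper name snapshot_state is hypothetical, and the grep-based extraction is a shortcut that assumes the response shape returned by the request above (a JSON tool such as jq would be more robust where available).

```shell
# Hypothetical helper: reads a get-snapshot JSON response on stdin and
# prints the value of the first "state" field it finds.
snapshot_state() {
  grep -o '"state"[[:space:]]*:[[:space:]]*"[A-Z_]*"' \
    | head -n 1 \
    | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Usage against a live cluster:
# curl -s "localhost:9200/_snapshot/my_hdfs_repository/my_snapshot" | snapshot_state
```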

 

List the directory in HDFS to see the snapshot files:

[root@es1 bin]# ./hdfs dfs -ls /es_snapshots

 

Restore the index:

curl -X POST "localhost:9200/_snapshot/my_hdfs_repository/my_snapshot/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "my_index",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "my_index",
  "rename_replacement": "restored_my_index"
}'

 

ILM-based hot-cold separation

In a 3-node ES cluster, set the following in elasticsearch.yml on one node:

node.attr.data: hot

On the second node:

node.attr.data: warm

And on the third node:

node.attr.data: cold

Restart the cluster.
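After the restart, the custom attribute can be checked with the cat nodeattrs API before any policy depends on it. The sketch below is an illustration only: the helper name nodes_for_tier is hypothetical, and it assumes the API is called with h=node,attr,value so that each line holds a node name, an attribute name and its value.

```shell
# Hypothetical helper: reads "node attr value" triples on stdin (the output
# of GET _cat/nodeattrs?h=node,attr,value) and prints the nodes whose
# "data" attribute equals the tier passed as the first argument.
nodes_for_tier() {
  awk -v tier="$1" '$2 == "data" && $3 == tier { print $1 }' | sort -u
}

# Usage against a live cluster:
# curl -s "localhost:9200/_cat/nodeattrs?h=node,attr,value" | nodes_for_tier warm
```

An empty result for hot, warm or cold means that no node carries that value, in which case shards requiring it would stay unassigned.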

 

Create the policy:

curl -X PUT "localhost:9200/_ilm/policy/test_policy" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "10kb",
            "max_age": "10m",
            "max_docs": 20
          }
        }
      },
      "warm": {
        "min_age": "0m",
        "actions": {
          "allocate": {
            "require": {
              "data": "warm"
            }
          }
        }
      },
      "cold": {
        "min_age": "20m",
        "actions": {
          "freeze": {},
          "allocate": {
            "require": {
              "data": "cold"
            }
          }
        }
      },
      "delete": {
        "min_age": "1h",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'
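ILM does not evaluate these conditions continuously. It checks them at the interval given by the cluster setting indices.lifecycle.poll_interval, which defaults to 10m, so with thresholds as short as the ones in this test policy a transition can lag noticeably. For experiments the interval can be lowered. The sketch below builds the request body; the helper name ilm_poll_interval_body and the value 1m are illustrative assumptions, not part of the original setup.

```shell
# Hypothetical helper: prints a cluster-settings request body that sets the
# ILM poll interval to the value passed as the first argument (e.g. 1m).
ilm_poll_interval_body() {
  printf '{"persistent":{"indices.lifecycle.poll_interval":"%s"}}\n' "$1"
}

# Usage against a live cluster:
# curl -s -X PUT "localhost:9200/_cluster/settings" \
#   -H 'Content-Type: application/json' -d "$(ilm_poll_interval_body 1m)"
```

A short interval is only meant for testing; setting the value back to null in the same request body shape restores the default.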

 

Create the template:

curl -X PUT "localhost:9200/_template/my_template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["test-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index.routing.allocation.require.data": "hot",
    "index.lifecycle.rollover_alias": "test",
    "index.lifecycle.name": "test_policy"
  }
}'

 

Create the index:

curl -X PUT "localhost:9200/test-000001" -H 'Content-Type: application/json' -d'
{
  "aliases": {
    "test": {}
  }
}'

 

The index was created on the hot node:

$ curl -XGET 'http://localhost:9200/_cat/shards?v'
index                shard prirep state   docs store ip          node
test-000001          0     p      STARTED    7 7.9kb localhost  008

 

After about 10 minutes, check again:

$ curl -XGET 'http://localhost:9200/_cat/shards?v'
index                shard prirep state   docs  store ip          node
test-000002          0     p      STARTED    0   208b localhost  008
test-000001          0     p      STARTED   10 11.8kb localhost  009

 

Note that test-000001 has been rolled over and moved to node 009, the warm node, while a new index test-000002 has been created on the hot node.
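Besides looking at shard placement, the ILM explain API reports which phase an index is currently in, which helps when a shard has not moved as expected. The following sketch extracts that field; the helper name ilm_phase is hypothetical, and the grep-based parsing assumes the explain response for a single index (a JSON tool such as jq would be more robust where available).

```shell
# Hypothetical helper: reads an ILM explain JSON response on stdin and
# prints the value of the first "phase" field (e.g. hot, warm, cold).
ilm_phase() {
  grep -o '"phase"[[:space:]]*:[[:space:]]*"[a-z]*"' \
    | head -n 1 \
    | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Usage against a live cluster:
# curl -s "localhost:9200/test-000001/_ilm/explain" | ilm_phase
```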

 

Author: InnerPeace