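Preface
Elasticsearch does not support HDFS as a native storage medium (for the supported store types, see https://www.elastic.co/guide/en/elasticsearch/reference/7.17/index-modules-store.html#file-system). Hot-cold separation on top of HDFS therefore works indirectly: create an HDFS-backed snapshot repository in Elasticsearch, then move cold data to HDFS by taking index snapshots.
Note that snapshots of this kind cannot be searched directly; an index has to be restored before it can be queried. Searchable snapshots are available only with an Enterprise license.
Another approach to hot-cold separation is to use SSD machines as hot nodes and HDD machines as cold nodes, all deployed in the same cluster and distinguished by node attributes or node roles. ILM then migrates indices between hot and cold nodes automatically according to a policy.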
Hot-cold separation based on HDFS
Installing the HDFS plugin
bin/elasticsearch-plugin install file:///home/elasticsearch/repository-hdfs-7.10.2.zip
[elasticsearch@es2 plugins]$ ll
total 4
drwxr-xr-x 3 elasticsearch elasticsearch 244 Jun 19 17:09 analysis-ik
drwxr-xr-x 2 root root 4096 Jul 10 15:38 repository-hdfs
The plugin is installed. Restart the node so that Elasticsearch loads it.
Installing Hadoop
Download:
https://www.apache.org/dyn/closer.cgi/hadoop/common/
Extract:
tar zxvf hadoop-3.3.6.tar.gz
Configure Java and passwordless SSH
[root@es1 hadoop]# java -version
openjdk version "1.8.0_372"
OpenJDK Runtime Environment BiSheng (build 1.8.0_372-b11)
OpenJDK 64-Bit Server VM BiSheng (build 25.372-b11, mixed mode)
[root@es1 hadoop]# ssh es1
Last login: Fri May 26 19:29:13 2023 from 192.168.56.105
[root@es1 ~]#
Format the file system
bin/hdfs namenode -format
Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/usr/bisheng-jdk1.8.0_372
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
Start the NameNode and DataNode
sbin/start-dfs.sh
Open the NameNode web UI: http://192.168.56.104:9870/
Connecting Elasticsearch to HDFS
Create a directory in HDFS
hdfs dfs -mkdir /es_snapshots
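Change the configuration in core-site.xml to the following (if HDFS is already running, restart it so the change takes effect):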
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://es1:9000</value>
</property>
</configuration>
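Open up the directory permissions so that the Elasticsearch user can write to it (777 is acceptable for a test setup only):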
./bin/hdfs dfs -chmod -R 777 /es_snapshots
Register the snapshot repository in Elasticsearch
curl -X PUT "localhost:9200/_snapshot/my_hdfs_repository" -H 'Content-Type: application/json' -d'
{
"type": "hdfs",
"settings": {
"uri": "hdfs://es1:9000",
"path": "/es_snapshots",
"conf.dfs.client.read.shortcircuit": "false"
}
}'
{"acknowledged":true}
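Before taking snapshots, it can help to confirm that the nodes can actually reach the new repository. The following is a minimal sketch built on the snapshot repository verify API; the `ES_URL` variable and the helper names `repo_verify_url` and `verify_repo` are made up for this example and are not part of Elasticsearch.

```shell
#!/usr/bin/env bash
# Assumption: Elasticsearch listens on localhost:9200, as in the commands above.
ES_URL="${ES_URL:-localhost:9200}"

# Build the URL of the verify endpoint for a given repository name.
repo_verify_url() {
  printf '%s/_snapshot/%s/_verify' "$ES_URL" "$1"
}

# Ask the cluster to verify that its nodes can access the repository.
verify_repo() {
  curl -s -X POST "$(repo_verify_url "$1")?pretty"
}
```

After sourcing the file, `verify_repo my_hdfs_repository` should return a `nodes` object listing the nodes that verified the repository; an error response usually points to a connectivity or permission problem on the HDFS side.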
Storing an index in HDFS
Create a test index:
curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "integer"
}
}
}
}'
Insert data:
curl -X POST "localhost:9200/my_index/_doc" -H 'Content-Type: application/json' -d'
{
"name": "John Doe",
"age": 30
}'
curl -X POST "localhost:9200/my_index/_doc" -H 'Content-Type: application/json' -d'
{
"name": "Jane Doe",
"age": 25
}'
Create a snapshot:
curl -X PUT "localhost:9200/_snapshot/my_hdfs_repository/my_snapshot?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
"indices": "my_index",
"ignore_unavailable": true,
"include_global_state": false
}'
Check the snapshot status:
curl -X GET "localhost:9200/_snapshot/my_hdfs_repository/my_snapshot?pretty"
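The response of the status check contains a `state` field (for example `SUCCESS`, `IN_PROGRESS`, `PARTIAL` or `FAILED`). As a rough sketch, the state can be pulled out in a script without extra tools; the helper names `snapshot_state` and `check_snapshot` are hypothetical, and the text matching below is a shortcut rather than a real JSON parser.

```shell
#!/usr/bin/env bash
# Read a "get snapshot" JSON response on stdin and print the first "state" value.
# Handles both compact and ?pretty output; uses grep/sed so that jq is not required.
snapshot_state() {
  grep -o '"state" *: *"[A-Z_]*"' | head -n 1 | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Fetch a snapshot and succeed only if its state is SUCCESS.
# $1 = repository name, $2 = snapshot name.
check_snapshot() {
  local state
  state="$(curl -s "localhost:9200/_snapshot/$1/$2" | snapshot_state)"
  echo "snapshot $2: ${state:-unknown}"
  [ "$state" = "SUCCESS" ]
}
```

For example, `check_snapshot my_hdfs_repository my_snapshot && echo ok` could guard a later step that deletes the original index.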
List the directory in HDFS to see the snapshot files:
[root@es1 bin]# ./hdfs dfs -ls /es_snapshots
Restore the index under a new name (restored_my_index):
curl -X POST "localhost:9200/_snapshot/my_hdfs_repository/my_snapshot/_restore" -H 'Content-Type: application/json' -d'
{
"indices": "my_index",
"ignore_unavailable": true,
"include_global_state": false,
"rename_pattern": "my_index",
"rename_replacement": "restored_my_index"
}'
Hot-cold separation based on ILM
In a three-node Elasticsearch cluster, add the following to elasticsearch.yml on the first node:
node.attr.data: hot
On the second node:
node.attr.data: warm
On the third node:
node.attr.data: cold
Restart the cluster.
Create the policy
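After the restart, it is worth checking that each node really carries the expected `data` attribute. A small sketch using the `_cat/nodeattrs` API follows; the helper names `data_tiers` and `show_tiers` are invented for this example, and the filter assumes node names without spaces.

```shell
#!/usr/bin/env bash
# From `_cat/nodeattrs` output with the columns "node attr value",
# print "node value" for every row whose attribute name is "data".
data_tiers() {
  awk '$2 == "data" { print $1, $3 }'
}

# List which node carries which tier attribute.
show_tiers() {
  curl -s 'localhost:9200/_cat/nodeattrs?h=node,attr,value' | data_tiers
}
```

Running `show_tiers` should print one line per node, pairing each node name with hot, warm or cold.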
curl -X PUT "localhost:9200/_ilm/policy/test_policy" -H 'Content-Type: application/json' -d'
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_size":"10kb",
"max_age":"10m",
"max_docs": 20
}
}
},
"warm": {
"min_age": "0m",
"actions": {
"allocate": {
"require": {
"data": "warm"
}
}
}
},
"cold": {
"min_age": "20m",
"actions": {
"freeze": {},
"allocate": {
"require": {
"data": "cold"
}
}
}
},
"delete": {
"min_age": "1h",
"actions": {
"delete": {}
}
}
}
}
}'
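ILM does not evaluate these conditions continuously; it checks them periodically, controlled by the cluster setting `indices.lifecycle.poll_interval` (10 minutes by default). With thresholds measured in minutes, as in this policy, transitions can therefore lag noticeably. On a test cluster the interval can be lowered; the sketch below assumes that, and the helper names `poll_interval_body` and `set_poll_interval` are hypothetical.

```shell
#!/usr/bin/env bash
# Build the cluster-settings body that sets the ILM poll interval (e.g. "1m").
poll_interval_body() {
  printf '{"persistent":{"indices.lifecycle.poll_interval":"%s"}}' "$1"
}

# Apply the setting. Intended for test clusters only.
set_poll_interval() {
  curl -s -X PUT "localhost:9200/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "$(poll_interval_body "$1")"
}
```

For example, `set_poll_interval 1m` makes ILM check once a minute. Because the setting is persistent, remember to set it back (or to `null`) once the experiment is over.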
Create the template
curl -X PUT "localhost:9200/_template/my_template" -H 'Content-Type: application/json' -d'
{
"index_patterns": ["test-*"],
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0,
"index.routing.allocation.require.data": "hot",
"index.lifecycle.rollover_alias": "test",
"index.lifecycle.name": "test_policy"
}
}'
Create the first index and attach the rollover alias:
curl -X PUT "localhost:9200/test-000001" -H 'Content-Type: application/json' -d'
{
"aliases": {
"test": {}
}
}'
The index is created on the hot node (008):
$ curl -XGET 'http://localhost:9200/_cat/shards?v'
index shard prirep state docs store ip node
test-000001 0 p STARTED 7 7.9kb localhost 008
Check again after 10 minutes:
$ curl -XGET 'http://localhost:9200/_cat/shards?v'
index shard prirep state docs store ip node
test-000002 0 p STARTED 0 208b localhost 008
test-000001 0 p STARTED 10 11.8kb localhost 009
Note that after the rollover, test-000001 was moved to 009, the warm node, and a new index, test-000002, was created on the hot node (008).
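Besides looking at shard placement, the ILM explain API reports which phase each index is currently in. A minimal sketch for extracting that value follows; the helper names `ilm_phases` and `explain_phase` are hypothetical, and the text matching is a shortcut rather than a real JSON parser.

```shell
#!/usr/bin/env bash
# Read an ILM explain JSON response on stdin and print each "phase" value, one per line.
# Only the exact key "phase" is matched, not keys such as "phase_time_millis".
ilm_phases() {
  grep -o '"phase" *: *"[a-z]*"' | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Print the current phase of the given index (or of each index matching a pattern).
explain_phase() {
  curl -s "localhost:9200/$1/_ilm/explain" | ilm_phases
}
```

With the setup above, `explain_phase test-000001` would be expected to report the warm phase once the index has moved to node 009.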