Trying out MemSQL filesystem pipelines
Some of the functionality is similar to Drill, for example reading from sources such as S3 and local files.
Create a file pipeline
- Prepare the file
mkdir -p /opt/db/
touch /opt/db/books.txt
The file has the following content:
The Catcher in the Rye, J.D. Salinger, 1945
Pride and Prejudice, Jane Austen, 1813
Of Mice and Men, John Steinbeck, 1937
Frankenstein, Mary Shelley, 1818
- Create the table
memsql
CREATE DATABASE books;
USE books;
CREATE TABLE classic_books (
  title VARCHAR(255),
  author VARCHAR(255),
  date VARCHAR(255)
);
- Create the pipeline
CREATE PIPELINE library AS LOAD DATA FS '/opt/db/*' INTO TABLE `classic_books` FIELDS TERMINATED BY ',';
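The file preparation step can also be scripted. The following is a minimal sketch, not part of the original walkthrough: `DATA_DIR` is a hypothetical variable that defaults to a scratch directory so the script runs without root, and it should be set to `/opt/db` to match the pipeline above. The `awk` call only imitates a split on `,` to show that, because the sample rows use a comma followed by a space, the second and third fields start with a space; it is an assumption that the pipeline stores them the same way.

```shell
# Sketch only: DATA_DIR is a hypothetical variable, not a MemSQL setting.
# It defaults to a scratch directory; use DATA_DIR=/opt/db for the real pipeline.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"
mkdir -p "$DATA_DIR"

# Write the four sample rows exactly as listed in the walkthrough.
cat > "$DATA_DIR/books.txt" <<'EOF'
The Catcher in the Rye, J.D. Salinger, 1945
Pride and Prejudice, Jane Austen, 1813
Of Mice and Men, John Steinbeck, 1937
Frankenstein, Mary Shelley, 1818
EOF

# Split each row on ',' alone and bracket the fields so that
# leading spaces in the author and date columns become visible.
awk -F',' '{ printf "[%s] [%s] [%s]\n", $1, $2, $3 }' "$DATA_DIR/books.txt"
```

If the leading spaces are unwanted, one option is to write the file with a bare `,` between fields.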
Enable the pipeline
- Start it
START PIPELINE library;
- Check the status
SHOW PIPELINES;
Test results
A few issues
- Paused due to error. Run START PIPELINE or consider setting pipelines_stop_on_error to false
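Change the configuration parameter, then start the pipeline again:
SET GLOBAL pipelines_stop_on_error = false;
- Pay attention to the file permissions. The file must also be present on every node, otherwise no data ever shows up (I overlooked this and only had the file on the master, which caused the problem).
- The following notice appears frequently: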
Data volume has significantly changed since the last time ANALYZE TABLE was run. Run ANALYZE TABLE on each table to improve query performance and refresh schema.
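Solution
Just follow the prompt. This may be because my system has not had its parameters tuned; for details, see the installation best practices in the references below.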
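For the file visibility issue in the list above, copying the file to each node can be scripted. This is a rough sketch under stated assumptions: `NODES`, `DATA_FILE`, `DRY_RUN`, and the host names are all hypothetical placeholders, SSH access to every node is assumed, and making the file world-readable is only one way to let the database process read it. By default the script only prints the commands it would run.

```shell
# Sketch only: NODES, DATA_FILE and DRY_RUN are hypothetical names,
# and the host names are placeholders for the cluster's real nodes.
NODES="${NODES:-leaf1.example.com leaf2.example.com}"
DATA_FILE="${DATA_FILE:-/opt/db/books.txt}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands instead of running them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# For every node: create the directory, copy the file, make it readable.
distribute() {
  for node in $NODES; do
    run ssh "$node" mkdir -p "$(dirname "$DATA_FILE")"
    run scp "$DATA_FILE" "$node:$DATA_FILE"
    run ssh "$node" chmod a+r "$DATA_FILE"
  done
}

distribute
```

Setting `DRY_RUN=0` runs the commands for real; a shared mount visible at the same path on every node would be an alternative to copying.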
References
https://docs.memsql.com/memsql-pipelines/v6.0/filesystem-pipelines-quickstart/
https://docs.memsql.com/memsql-pipelines/v6.0/filesystem-pipelines-overview/
https://docs.memsql.com/tutorials/v6.0/installation-best-practices/