Installing and Configuring Elasticsearch 1.7.0 on Linux


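Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Elasticsearch is written in Java and released as open source under the Apache license, and it is currently a popular enterprise search engine. It is designed for cloud environments and offers real-time search, stability, reliability, speed, and easy installation and use.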

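Use cases:

Wikipedia uses Elasticsearch for full-text search with keyword highlighting, and for search suggestions such as search-as-you-type and did-you-mean.
The Guardian uses Elasticsearch to process visitor logs so that public reaction to individual articles can be fed back to its editors in real time.
StackOverflow combines full-text search with geolocation and related information to surface more-like-this related questions.
GitHub uses Elasticsearch to search more than 130 billion lines of code.
Goldman Sachs uses it to index 5 TB of data every day, and many investment banks use it to analyze stock market movements.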

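Advantages and disadvantages of Elasticsearch:

Advantages:

1. Elasticsearch is distributed. No other components are needed, and replication happens in real time, an approach called "push replication".

2. Elasticsearch fully supports the near-real-time search of Apache Lucene.

3. Multitenancy needs no special configuration, whereas Solr requires more advanced settings.

4. Elasticsearch uses the concept of a gateway, which makes full backups simpler.

5. The nodes form a peer-to-peer network; when some nodes fail, other nodes are automatically assigned to take over their work.

Disadvantages:

1. It is not yet automated enough.

2. Only the JSON format is supported.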


1. First, download the 1.7.0 package from the Elasticsearch website.

# wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.0.tar.gz

2. Extract the elasticsearch-1.7.0.tar.gz package.

# tar zxf elasticsearch-1.7.0.tar.gz

3. Explanation of the es configuration file parameters (a real configuration does not need all of them):

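# The cluster name identifies your cluster and is used for auto-discovery.
# If you run multiple clusters on the same network, make sure each cluster name is unique.
cluster.name: test-elasticsearch

# Node names are generated automatically at startup, so you do not have to set one.
# You can also give the node a specific name.
node.name: "elsearch1"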

# Allow this node to be elected as a master node (enabled by default):
# node.master: true

# Allow this node to store data (enabled by default):
# node.data: true

# You can exploit these settings to design advanced cluster topologies.
#
# 1. You want this node to never become a master node, only to hold data.
#    This will be the "workhorse" of your cluster.
# node.master: false
# node.data: true
#
# 2. You want this node to only serve as a master: to not store any data and
#    to have free resources. This will be the "coordinator" of your cluster.
# node.master: true
# node.data: false

# Use the Cluster Health API [http://localhost:9200/_cluster/health], the
# Node Info API [http://localhost:9200/_nodes] or GUI tools to inspect the cluster state.
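The two endpoints named in the comment above can be queried from the shell. The following is a minimal sketch, assuming curl is installed and a node is listening on the default localhost:9200; the function names and the ES_ADDR variable are made up for this illustration and are not Elasticsearch settings.

```shell
#!/bin/sh
# Address of the node to query. ES_ADDR is an illustrative variable name.
ES_ADDR="${ES_ADDR:-localhost:9200}"

# Build the URL for an API path such as _cluster/health or _nodes.
es_url() {
    echo "http://${ES_ADDR}/$1?pretty"
}

# Cluster Health API: status (green/yellow/red), node count, shard counts.
es_health() {
    curl -s "$(es_url _cluster/health)"
}

# Node Info API: per-node settings, JVM and plugin information.
es_nodes() {
    curl -s "$(es_url _nodes)"
}
```

After sourcing the snippet, es_health prints the cluster status as JSON. For the cluster built later in this article, ES_ADDR would be set to 172.16.2.24:25556 first.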

# A node can have generic attributes associated with it, which can later be used
# for customized shard allocation filtering, or allocation awareness. An attribute
# is a simple key value pair, similar to node.key: value, here is an example:
# node.rack: rack314

# By default, multiple nodes are allowed to start from the same installation location.
# To disable it, set the following:
# node.max_local_storage_nodes: 1

# Set the number of shards (splits) of an index (5 by default):
# index.number_of_shards: 5

# Set the number of replicas (additional copies) of an index (1 by default):
# index.number_of_replicas: 1

# Note that for development on a local machine, with small indices, it usually
# makes sense to "disable" the distributed features:
# index.number_of_shards: 1
# index.number_of_replicas: 0
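The values in elasticsearch.yml act as defaults for new indices; in Elasticsearch 1.x the same two settings can also be passed for a single index when it is created. A minimal sketch, assuming curl and a node on localhost:9200; ES_ADDR, the function names, and the index name in the usage line are hypothetical.

```shell
#!/bin/sh
# Address of the node to talk to (illustrative variable, not an ES setting).
ES_ADDR="${ES_ADDR:-localhost:9200}"

# Print the JSON body that sets shard and replica counts for one index.
# $1 = number of primary shards, $2 = number of replicas
index_settings() {
    printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}\n' "$1" "$2"
}

# Create an index with explicit shard and replica counts.
# $1 = index name, $2 = primary shards, $3 = replicas
create_index() {
    curl -s -XPUT "http://${ES_ADDR}/$1" -d "$(index_settings "$2" "$3")"
}
```

For example, create_index devtest 1 0 would create an index named devtest with one shard and no replicas, matching the local-development values in the comment above.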

# Path to directory containing configuration (this file and logging.yml):
# path.conf: /path/to/conf

# Path to directory where to store index data allocated for this node:
# path.data: /path/to/data

# Can optionally include more than one location, causing data to be striped across
# the locations (a la RAID 0) on a file level, favouring locations with most free
# space on creation. For example:
# path.data: /path/to/data1,/path/to/data2

# Path to temporary files:
# path.work: /path/to/work

# Path to log files:
# path.logs: /path/to/logs

# Path to where plugins are installed:
# path.plugins: /path/to/plugins

# If a plugin listed here is not installed for current node, the node will not start.
# plugin.mandatory: mapper-attachments,lang-groovy

# Elasticsearch performs poorly when the JVM starts swapping: you should ensure that
# it _never_ swaps.
#
# Set this property to true to lock the memory:
# bootstrap.mlockall: true

# Make sure that the ES_MIN_MEM and ES_MAX_MEM environment variables are set
# to the same value, and that the machine has enough memory to allocate
# for Elasticsearch, leaving enough memory for the operating system itself.
#
# You should also make sure that the Elasticsearch process is allowed to lock
# the memory, e.g. by using `ulimit -l unlimited`.
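Whether the memory lock actually took effect can be checked on a running node: in Elasticsearch 1.x the process section of the Node Info API includes an mlockall flag. A minimal sketch, assuming curl and a node on localhost:9200; the function names and ES_ADDR are illustrative.

```shell
#!/bin/sh
# Address of the node to query (illustrative variable, not an ES setting).
ES_ADDR="${ES_ADDR:-localhost:9200}"

# Succeeds if the JSON read from stdin contains "mlockall": true.
# With several nodes in the response, one node reporting true is enough to match.
mlockall_enabled() {
    grep -q '"mlockall" *: *true'
}

# Fetch the process info of all nodes and test the flag.
check_mlockall() {
    curl -s "http://${ES_ADDR}/_nodes/process?pretty" | mlockall_enabled
}
```

If check_mlockall fails even though bootstrap.mlockall is true, the usual cause is the locked-memory limit mentioned above (`ulimit -l`).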

# Elasticsearch, by default, binds itself to the 0.0.0.0 address, and listens
# on port [9200-9300] for HTTP traffic and on port [9300-9400] for node-to-node
# communication. (The range means that if the port is busy, it will automatically
# try the next port.)

# Set the bind address specifically (IPv4 or IPv6):
# network.bind_host: 192.168.0.1

# Set the address other nodes will use to communicate with this node. If not
# set, it is automatically derived. It must point to an actual IP address.
# network.publish_host: 192.168.0.1

# Set both 'bind_host' and 'publish_host':
# network.host: 192.168.0.1

# Set a custom port for the node to node communication (9300 by default):
# transport.tcp.port: 9300

# Enable compression for all communication between nodes (disabled by default):
# transport.tcp.compress: true

# Set a custom port to listen for HTTP traffic:
# http.port: 9200

# Set a custom allowed content length:
# http.max_content_length: 100mb

# Disable HTTP completely:
# http.enabled: false

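3. Operating system configuration

1. File descriptors

Edit /etc/security/limits.conf (vim /etc/security/limits.conf) and add:
* soft nofile 655350
* hard nofile 655350

Log out and log back in for the change to take effect, then verify it with ulimit -n.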

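2. Maximum number of memory map areas, and keeping use of the swap partition to a minimum

Edit /etc/sysctl.conf (vim /etc/sysctl.conf) and add:
vm.max_map_count=262144
vm.swappiness=1

After saving the changes, run sysctl -p.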
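The limits set in the two steps above can be checked in one pass. The helper below is a minimal sketch; check_min is a made-up name, and the thresholds in the usage lines are simply the values used in this article.

```shell
#!/bin/sh
# Compare a current value against a required minimum.
# $1 = label, $2 = current value, $3 = required minimum
check_min() {
    if [ "$2" = "unlimited" ] || [ "$2" -ge "$3" ]; then
        echo "OK: $1=$2"
    else
        echo "LOW: $1=$2 (expected at least $3)"
        return 1
    fi
}

# Usage on the Elasticsearch host (run as the user that starts Elasticsearch):
# check_min nofile "$(ulimit -n)" 655350
# check_min vm.max_map_count "$(cat /proc/sys/vm/max_map_count)" 262144
```

The function returns a non-zero status when a value is too low, so it can also be used as a guard in a startup script.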

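3. JVM parameters

The bin directory under ES_HOME contains a file named elasticsearch.in.sh. In it, change

ES_MIN_MEM=256m
ES_MAX_MEM=1g

to values suitable for your machine. As the configuration comments above point out, both should be set to the same value.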
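As an alternative to editing the file, the 1.x elasticsearch.in.sh also reads the ES_HEAP_SIZE environment variable and, when it is set, uses it for both ES_MIN_MEM and ES_MAX_MEM. The sketch below starts a node that way; the function names and the pid file location are illustrative, and the default ES_HOME is the install path that appears later in this article.

```shell
#!/bin/sh
# Installation directory; adjust to wherever the tarball was extracted.
ES_HOME="${ES_HOME:-/data/elasticsearch-1.7.0}"

# Start Elasticsearch as a daemon with the same minimum and maximum heap.
# $1 = heap size, for example 1g
start_es() {
    ES_HEAP_SIZE="$1" "$ES_HOME/bin/elasticsearch" -d -p "$ES_HOME/es.pid"
}

# Stop the daemon started by start_es, using the pid file it wrote.
stop_es() {
    kill "$(cat "$ES_HOME/es.pid")"
}
```

For example, start_es 1g starts the node with a 1 GB heap; a request to the HTTP port (9200 unless http.port is changed) then confirms that it is up.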

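4. Installing es plugins:

Marvel is a management and monitoring tool for Elasticsearch and is free for development use. It ships with an interactive console called Sense, which makes it easy to talk to Elasticsearch directly from a browser.

Marvel is a plugin. To download and install it, run the following from the bin directory under the Elasticsearch directory:

# ./plugin -i elasticsearch/marvel/latest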

elasticsearch-head is a cluster management tool for Elasticsearch. It is a standalone web application written entirely in HTML5, and it can be integrated into es as a plugin.

# ./plugin -install mobz/elasticsearch-head

Address: http://172.16.2.24:25556/_plugin/head/

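Installing the bigdesk plugin:

bigdesk is a cluster monitoring tool for Elasticsearch. It shows the state of the es cluster, for example CPU and memory usage, index data, search activity, and the number of HTTP connections.

In a terminal, go to the installation directory, enter the bin directory, and run:

# ./plugin -install lukas-vlcek/bigdesk

Open http://172.16.2.24:25556/_plugin/bigdesk in a browser to see the result.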

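Note on the ik analysis plugin: in this setup, where elasticsearch.yml references the ik analyzer, indices cannot be created at all unless the ik plugin is installed. The cluster view at http://172.16.2.24:25556/_plugin/head/ stays blank, and clicking the create-index button in the web page does nothing.

Note: the GitHub page https://github.com/medcl/elasticsearch-analysis-ik lists which ik version matches each es version. es 1.7.0 corresponds to ik 1.2.6. I first installed ik 1.8 here; index creation failed, and the server log also showed ik errors.

Download for ik 1.2.6: https://github.com/medcl/elasticsearch-analysis-ik/releases?after=v1.6.1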

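Installation steps:

Download the zip package and extract it into a directory:

# unzip elasticsearch-analysis-ik-master.zip

Set up a Maven environment: download the package from the Apache website and add it to PATH:

# export PATH=$PATH:/usr/local/maven/bin

Because this is source code, it has to be packaged with Maven. Enter the extracted directory and run:

# cd elasticsearch-analysis-ik-master
# mvn clean package

Create an ik directory under the es plugins directory and copy elasticsearch-analysis-ik-1.2.6.jar from the target directory into it: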

[root@localhost target]# cd /data/elasticsearch-1.7.0
[root@localhost elasticsearch-1.7.0]# ls
bin config data lib LICENSE.txt logs NOTICE.txt plugins README.textile
[root@localhost elasticsearch-1.7.0]# cd plugins/
[root@localhost plugins]# ls
bigdesk head ik marvel
[root@localhost plugins]# cd ik/
[root@localhost ik]# ls
elasticsearch-analysis-ik-1.2.6.jar

Note: in a cluster, copy the jar to each of the other machines as well.

Add the following lines to the es configuration file:

index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true
marvel.agent.enabled: false
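Once the nodes have been restarted with this configuration, the analyzers can be tried out through the Analyze API, which in Elasticsearch 1.x takes the analyzer name as a query parameter and the raw text as the request body. A minimal sketch, assuming curl and an index that already exists; ES_ADDR, the function names, and the index name in the usage line are hypothetical.

```shell
#!/bin/sh
# Address of the node to query (illustrative variable, not an ES setting).
ES_ADDR="${ES_ADDR:-localhost:9200}"

# Build the Analyze API URL. $1 = index name, $2 = analyzer name
analyze_url() {
    echo "http://${ES_ADDR}/$1/_analyze?analyzer=$2&pretty"
}

# Send text through an analyzer and print the resulting tokens as JSON.
# $1 = index name, $2 = analyzer name, $3 = text to analyze
ik_analyze() {
    curl -s -XPOST "$(analyze_url "$1" "$2")" -d "$3"
}
```

For example, ik_analyze test ik_max_word '中华人民共和国' prints the tokens that ik_max_word produces for an index named test; an error mentioning the analyzer usually means the plugin or the configuration above did not load.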

The complete es configuration file follows. All three machines use the same configuration except for the host IP and node.name.

# cat elasticsearch.yml
cluster.name: test-es-cluster
network.host: 172.16.2.24
node.name: "node24"
discovery.zen.ping.unicast.hosts: ["172.16.2.24:25555","172.16.2.21:25555","172.16.2.23:25555"]
index.number_of_shards: 5
discovery.zen.minimum_master_nodes: 2
script.groovy.sandbox.enabled: false
transport.tcp.port: 25555
http.port: 25556
script.inline: off
script.indexed: off
script.file: off
index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true
marvel.agent.enabled: false
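The value discovery.zen.minimum_master_nodes: 2 in this file matches the usual majority rule for three master-eligible nodes: half the nodes, rounded down, plus one. The helper below only illustrates that arithmetic; quorum is a made-up name.

```shell
#!/bin/sh
# Majority of master-eligible nodes: floor(n / 2) + 1.
# $1 = number of master-eligible nodes in the cluster
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Usage:
# quorum 3    # the three-node cluster in this article
```

If more master-eligible nodes are added later, the setting should be recalculated the same way so that two halves of a partitioned cluster cannot both elect a master.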