Elasticsearch is a full-text search engine. Java must be installed before Elasticsearch can be installed.
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.2.zip
unzip elasticsearch-1.4.2.zip
cd elasticsearch-1.4.2
./bin/elasticsearch
Installing Marvel
Marvel is a tool for managing and monitoring Elasticsearch. It includes Sense, an interactive console for talking to Elasticsearch from the browser.
bin/plugin -i elasticsearch/marvel/latest
If you do not want Marvel to monitor the local cluster, disable it as follows:
echo 'marvel.agent.enabled: false' >> ./config/elasticsearch.yml
Elasticsearch can be started in the foreground:
bin]$ sudo ./elasticsearch
Use the -d flag to run Elasticsearch in the background:
bin]$ sudo ./elasticsearch -d
Query Elasticsearch to view information about the running node:
$ curl "http://localhost:9200/?pretty"
{
"status" : 200,
"name" : "xxx",
"cluster_name" : "elasticsearch",
"version" : {
"number" : "1.4.2",
"build_hash" : "927caff6f05403e936c20bf4529f144f0c89fd8c",
"build_timestamp" : "2014-12-16T14:11:12Z",
"build_snapshot" : false,
"lucene_version" : "4.10.2"
},
"tagline" : "You Know, for Search"
}
Set cluster.name and node.name in config/elasticsearch.yml.
Elasticsearch can be shut down as follows:
curl -XPOST 'http://localhost:9200/_shutdown'
Talking to Elasticsearch
There are several ways to talk to Elasticsearch, depending on whether you use Java. For the Java API, see the documentation:
http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/index.html
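For other languages, use the RESTful API that Elasticsearch provides, or call it directly with the Linux curl command:
curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/?<QUERY_STRING>' -d '<BODY>'
VERB The HTTP method: GET, POST, PUT, HEAD, or DELETE
PROTOCOL Either http or https
HOST The hostname of any node in the Elasticsearch cluster, or localhost for a node on the local machine
PORT The port on which the Elasticsearch HTTP service runs; the default is 9200
QUERY_STRING Query-string parameters
BODY The request data, in JSON format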
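The parts of the curl template map directly onto an HTTP request object. As a minimal sketch, the following Python assembles (but does not send) such a request with the standard library's urllib. The helper name build_request and its path parameter are illustrative assumptions, not part of Elasticsearch.

```python
import json
import urllib.parse
import urllib.request


def build_request(verb, host="localhost", port=9200, path="/",
                  query=None, body=None, protocol="http"):
    """Assemble an HTTP request from the parts of the curl template.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    url = f"{protocol}://{host}:{port}{path}"
    if query:
        # QUERY_STRING: URL parameters such as pretty=true
        url += "?" + urllib.parse.urlencode(query)
    # BODY: a JSON-encoded request body, if the request needs one
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(url, data=data, method=verb)


req = build_request("GET", path="/_count", query={"pretty": "true"},
                    body={"query": {"match_all": {}}})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen would send it to a running node.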
$ curl -XGET 'http://localhost:9200/_count?pretty' -d '
{
"query": {
"match_all": {}
}
}
'
{
"count" : 22692,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
}
}
$ curl -i -XGET 'localhost:9200/'
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 334
{
"status" : 200,
"name" : "jidong",
"cluster_name" : "elasticsearch",
"version" : {
"number" : "1.4.2",
"build_hash" : "927caff6f05403e936c20bf4529f144f0c89fd8c",
"build_timestamp" : "2014-12-16T14:11:12Z",
"build_snapshot" : false,
"lucene_version" : "4.10.2"
},
"tagline" : "You Know, for Search"
}
Relational DB  ->  Databases  ->  Tables  ->  Rows       ->  Columns
Elasticsearch  ->  Indices    ->  Types   ->  Documents  ->  Fields
Accessing Elasticsearch through Marvel's Sense console
http://xxxx.com:9200/_plugin/marvel/sense/index.html
The examples below use the abbreviated GET and PUT form that Sense accepts. Click "Copy as cURL" in Sense to see the equivalent curl command.
PUT /megacorp/employee/1
{
"first_name":"John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests":["sports","music"]
}
/megacorp/employee/1
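This path carries three pieces of information:
megacorp The index name, comparable to a database name in a relational database
employee The type name, comparable to a table name in a relational database
1 The ID of this particular employee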
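To make the three components concrete, here is a small sketch that splits such a path into its parts. The names DocumentPath and parse_document_path are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class DocumentPath(NamedTuple):
    index: str     # comparable to a database name
    doc_type: str  # comparable to a table name
    doc_id: str    # identifies one document


def parse_document_path(path: str) -> DocumentPath:
    """Split '/index/type/id' into its three components."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"expected /index/type/id, got {path!r}")
    return DocumentPath(*parts)


print(parse_document_path("/megacorp/employee/1"))
```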
PUT /megacorp/employee/2
{
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": ["music"]
}
PUT /megacorp/employee/3
{
"first_name": "Douglas",
"last_name": "Fir",
"age": 35,
"about": "I like to build cabinets",
"interests": ["forestry"]
}
Enter the following in Sense:
GET /megacorp/employee/1
The response:
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_version":1,
"found": true,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": ["sports","music"]
}
}
GET /megacorp/employee/_search
{
"took": 6,
"timed_out":false,
"_shards":{...},
"hits":{
"total": 3,
"max_score": 1,
"hits":[
{
"_index": "megacorp",
"_type": "employee",
"_id": "3",
"_score": 1,
"_source":{
"first_name": "Douglas",
"last_name": "Fir",
"age": 35,
"about": "I like to build cabinets",
"interests":["forestry"]
}
},
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_score": 1,
"_source":{
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests":["sports","music"]
}
},
{
"_index": "megacorp",
"_type": "employee",
"_id": "2",
"_score": 1,
"_source":{
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests":["music"]
}
}
]
}
}
GET /megacorp/employee/_search?q=last_name:Smith
{
...
"hits":{
"total": 2,
"max_score": 0.30685282,
"hits":[
{
...
"_source":{
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests":["sports","music"]
}
},
{
...
"_source":{
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests":["music"]
}
}
]
}
}
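Outside Sense, the same query-string search is an ordinary URL with a q parameter. The sketch below builds that URL with Python's standard library; search_url is a hypothetical helper, and the host and port defaults are assumptions taken from the earlier curl examples.

```python
import urllib.parse


def search_url(index, doc_type, field, value, host="localhost", port=9200):
    """Build a query-string search URL of the form .../_search?q=field:value."""
    query = urllib.parse.urlencode({"q": f"{field}:{value}"})
    return f"http://{host}:{port}/{index}/{doc_type}/_search?{query}"


url = search_url("megacorp", "employee", "last_name", "Smith")
print(url)
```

Note that urlencode percent-encodes the colon as %3A, which a server decodes back to last_name:Smith.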
Elasticsearch provides a rich, flexible query language called the query DSL (domain-specific language), in which queries are written as JSON request bodies.
GET /megacorp/employee/_search
{
"query":{
"match":{
"last_name":"Smith"
}
}
}
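This request does not use a query-string parameter; the match query expresses the search condition instead. The output is the same as in the previous example.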
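When such requests are generated from a program, the body is simply a nested dictionary serialized to JSON. A minimal sketch, where match_query is a hypothetical helper name:

```python
import json


def match_query(field, text):
    """Return a query DSL body that runs a match query on one field."""
    return {"query": {"match": {field: text}}}


body = match_query("last_name", "Smith")
print(json.dumps(body, indent=2))
```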
A search for all employees whose last name is Smith and who are older than 30 returns a single hit:
{
...
"hits":{
"total": 1,
"max_score": 0.30685282,
"hits":[
{
...
"_source":{
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests":["music"]
}
}
]
}
}
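One way to express this search in Elasticsearch 1.x is a filtered query that combines a range filter on age with a match query on last_name. The sketch below assumes that form; it is not necessarily the exact request that produced the response above, and later Elasticsearch releases replaced the filtered query with the bool query. The second half applies the same two conditions to the three sample employees locally, purely to show which document qualifies.

```python
import json

# Assumed request body: the Elasticsearch 1.x "filtered" query.
body = {
    "query": {
        "filtered": {
            "filter": {"range": {"age": {"gt": 30}}},    # age > 30
            "query": {"match": {"last_name": "Smith"}},  # last name Smith
        }
    }
}
print(json.dumps(body, indent=2))

# Local illustration of the same two conditions on the sample data.
employees = [
    {"first_name": "John", "last_name": "Smith", "age": 25},
    {"first_name": "Jane", "last_name": "Smith", "age": 32},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35},
]
matches = [e["first_name"] for e in employees
           if e["last_name"].lower() == "smith" and e["age"] > 30]
print(matches)  # ['Jane']
```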
Full-text search
Search for all employees who like rock climbing.
Enter the following in Sense:
GET /megacorp/employee/_search
{
"query":{
"match":{
"about":"rock climbing"
}
}
}
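The query returns two hits. By default, Elasticsearch sorts results by relevance score, which expresses how well each document matches the query. John Smith, whose about field contains both "rock" and "climbing", ranks first; Jane Smith is also returned, even though her about field contains only "rock" (in "rock albums").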
To match the query text exactly, use a phrase search.
The match_phrase query does this:
GET /megacorp/employee/_search
{
"query":{
"match_phrase":{
"about":"rock climbing"
}
}
}
{
...
"hits":{
"total": 1,
"max_score": 0.23013961,
"hits":[
{
...
"_score": 0.23013961,
"_source":{
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests":["sports","music"]
}
}
]
}
}
Now only one result is returned.
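The difference between the two query types can be illustrated without a cluster. The following is a deliberately simplified sketch: it assumes lowercase word tokens stand in for Elasticsearch's text analysis and ignores scoring altogether. The functions match and match_phrase here are local stand-ins, not Elasticsearch APIs.

```python
import re

ABOUT = {
    "John": "I love to go rock climbing",
    "Jane": "I like to collect rock albums",
    "Douglas": "I like to build cabinets",
}


def tokens(text):
    """Lowercase and split into word tokens (a stand-in for real analysis)."""
    return re.findall(r"\w+", text.lower())


def match(text, query):
    """True if the text contains at least one of the query terms."""
    return bool(set(tokens(text)) & set(tokens(query)))


def match_phrase(text, query):
    """True if the query terms appear consecutively and in order."""
    words, phrase = tokens(text), tokens(query)
    return any(words[i:i + len(phrase)] == phrase
               for i in range(len(words) - len(phrase) + 1))


any_term = [n for n, t in ABOUT.items() if match(t, "rock climbing")]
exact = [n for n, t in ABOUT.items() if match_phrase(t, "rock climbing")]
print(any_term)  # ['John', 'Jane']
print(exact)     # ['John']
```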
Highlighting our searches
GET /megacorp/employee/_search
{
"query":{
"match_phrase":{
"about":"rock climbing"
}
},
"highlight":{
"fields":{
"about":{}
}
}
}
{
...
"hits":{
"total": 1,
"max_score": 0.23013961,
"hits":[
{
...
"_score": 0.23013961,
"_source":{
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests":["sports","music"]
},
"highlight":{
"about":[
"I love to go <em>rock</em> <em>climbing</em>"
]
}
}
]
}
}
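The highlight section of the response wraps each matched term in em tags. A rough local imitation of that effect is sketched below; it assumes simple whole-word, case-insensitive matching and a non-empty query, and is not how Elasticsearch implements highlighting. The function name highlight is chosen for this illustration only.

```python
import re


def highlight(text, query, tag="em"):
    """Wrap every whole-word occurrence of a query term in <tag>...</tag>."""
    terms = [re.escape(t) for t in query.lower().split()]
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(rf"<{tag}>\1</{tag}>", text)


result = highlight("I love to go rock climbing", "rock climbing")
print(result)  # I love to go <em>rock</em> <em>climbing</em>
```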
Elasticsearch aggregations make it possible to run sophisticated analyses over the data, much like the GROUP BY clause in SQL.
GET /megacorp/employee/_search
{
"aggs":{
"all_interests":{
"terms":{"field":"interests"}
}
}
}
{
...
"hits":{...},
"aggregations":{
"all_interests":{
"buckets":[
{
"key": "music",
"doc_count":2
},
{
"key": "forestry",
"doc_count":1
},
{
"key": "sports",
"doc_count":1
}
]
}
}
}
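What the terms aggregation computes here is a count of documents per distinct value of the interests field. The sketch below reproduces those buckets locally from the three sample employees. Ordering by descending count and then by key is an assumption chosen to match the output shown above, and terms_buckets is a hypothetical helper.

```python
from collections import Counter

# interests of the three sample employees, keyed by document ID
INTERESTS = {
    1: ["sports", "music"],
    2: ["music"],
    3: ["forestry"],
}


def terms_buckets(docs):
    """Count how many documents contain each term, like a terms aggregation."""
    # set() ensures a document is counted once per term
    counts = Counter(term for terms in docs.values() for term in set(terms))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


buckets = terms_buckets(INTERESTS)
for bucket in buckets:
    print(bucket)
```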
GET /megacorp/employee/_search
{
"query":{
"match":{
"last_name":"smith"
}
},
"aggs":{
"all_interests":{
"terms":{
"field":"interests"
}
}
}
}
...
"all_interests":{
"buckets":[
{
"key":"music",
"doc_count":2
},
{
"key":"sports",
"doc_count":1
}
]
}
Elasticsearch can scale horizontally to hundreds of servers and handle petabytes of data.