
Elasticsearch 7.x Basic Syntax

勤劳的小蜜蜂 · November 17, 2019

Creating an Index

Creating an index without a mapping

Create the index without defining its mapping; only the settings are specified.

#  Create the index without a mapping (unstructured)
#  Only the settings are defined
PUT /employee
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}

PUT /employee/_doc/1
{
  "name": "夹克",
  "age": 30
}

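If the document id in a PUT does not exist, the document is created; if it already exists, the document is updated.

PUT is a full replacement: you must send every field, both the ones you change and the ones you don't. Any field left out of the request is lost.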
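To make the full-replacement behaviour concrete, here is a toy in-memory model in plain Python. It only illustrates the semantics described above and is not how ES stores documents; the DocStore class and its put method are hypothetical names. The "created"/"updated" strings mirror the result field of ES responses.

```python
# Toy model of PUT /<index>/_doc/<id> semantics (illustration only, not ES internals).
from typing import Any, Dict


class DocStore:
    """Hypothetical in-memory stand-in for a single index."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, Any]] = {}

    def put(self, doc_id: str, source: Dict[str, Any]) -> str:
        # PUT replaces the whole _source: fields that are not sent are gone afterwards.
        result = "updated" if doc_id in self.docs else "created"
        self.docs[doc_id] = dict(source)
        return result


store = DocStore()
print(store.put("1", {"name": "夹克", "age": 30}))  # created
print(store.put("1", {"name": "夹克1"}))            # updated
print(store.docs["1"])                              # {'name': '夹克1'}, "age" was dropped
```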

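Since ES 7, mapping types are deprecated: the type is always _doc, which serves as a placeholder in the URL and appears as _type in responses.

ES infers the index's mapping (its "table structure") automatically from the inserted values; this is called dynamic mapping. You can inspect the inferred mapping with GET /employee/_mapping.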

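To force a create operation, use _create: if the document already exists, the request fails (HTTP 409, version_conflict_engine_exception) instead of updating it.

#  Force a create; fails if the document already exists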
POST /employee/_create/1
{
  "name": "123",
  "age": 30
}
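The create-only rule can be sketched the same way. This is a toy in plain Python, not ES code; ConflictError is a hypothetical stand-in for the 409 response ES returns.

```python
# Toy model of the _create endpoint: never overwrite an existing id (illustration only).
from typing import Any, Dict


class ConflictError(Exception):
    """Hypothetical stand-in for ES's HTTP 409 version conflict."""


def create(docs: Dict[str, Dict[str, Any]], doc_id: str, source: Dict[str, Any]) -> None:
    if doc_id in docs:
        raise ConflictError(f"[{doc_id}]: version conflict, document already exists")
    docs[doc_id] = dict(source)


docs = {"1": {"name": "夹克", "age": 30}}
try:
    create(docs, "1", {"name": "123", "age": 30})
except ConflictError as err:
    print("rejected:", err)
print(docs["1"])  # unchanged: {'name': '夹克', 'age': 30}
```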

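Creating an index with an explicit mapping

Define the mapping explicitly when creating the index. If employee still exists from the previous section, delete it first with DELETE /employee, because PUT on an existing index fails with resource_already_exists_exception.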

#  Create the index the structured way, with an explicit mapping
PUT /employee
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "age": { "type": "integer" }
    }
  }
}

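The employee mapping is now exactly as we designed it.

When a document is indexed, fields not yet in the mapping are added automatically (dynamic mapping), as long as the values do not conflict with the existing mapping. A value that conflicts with a mapped field's type, such as a non-numeric string for the integer field age, is rejected with a mapper_parsing_exception caused by a number_format_exception. Numeric strings such as "30" are still accepted, because coerce is enabled by default.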
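The rule above can be sketched as a toy in plain Python. It is a simplification under stated assumptions: only numeric conflicts are checked, and new strings are mapped to "text" only, whereas real ES also adds a keyword sub-field and detects dates. The "city" field in the example is hypothetical.

```python
# Toy model of dynamic mapping (illustration only). Real ES rules are richer:
# strings also get a "keyword" sub-field, dates are detected, and so on.
from typing import Any, Dict


def infer_type(value: Any) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "text"


def index_doc(mapping: Dict[str, str], source: Dict[str, Any]) -> None:
    # Pass 1: values for already-mapped numeric fields must be numeric.
    for field, value in source.items():
        if mapping.get(field) in ("integer", "long"):
            try:
                int(value)  # numeric strings such as "30" are coerced
            except (TypeError, ValueError):
                raise ValueError(f"failed to parse field [{field}] of type [{mapping[field]}]")
    # Pass 2: only a valid document may add new fields to the mapping.
    for field, value in source.items():
        mapping.setdefault(field, infer_type(value))


mapping = {"name": "text", "age": "integer"}
index_doc(mapping, {"name": "夹克", "age": 30, "city": "上海"})
print(mapping)  # {'name': 'text', 'age': 'integer', 'city': 'text'}
```

The two passes mirror the fact that a rejected document does not leave new fields behind in the mapping.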

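Updating Documents

Updating specific fields

POST /<index>/_update/<id> with a doc body merges the given fields into the existing document; fields that are not listed keep their current values.

#  Update only the listed fields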
POST /employee/_update/1
{
  "doc": {
    "name": "夹克1"
  }
}
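Unlike PUT, a partial update merges. A toy sketch in plain Python of that merge (not ES internals; partial_update is a hypothetical helper): object values are merged recursively, any other value is overwritten.

```python
# Toy model of POST /<index>/_update/<id> with a "doc" body (illustration only).
from typing import Any, Dict


def partial_update(source: Dict[str, Any], doc: Dict[str, Any]) -> Dict[str, Any]:
    merged = dict(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)  # objects are merged recursively
        else:
            merged[key] = value                              # other values are overwritten
    return merged


print(partial_update({"name": "夹克", "age": 30}, {"name": "夹克1"}))
# {'name': '夹克1', 'age': 30}
```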

Deleting Indexes and Documents

#  Delete the whole index
DELETE /employee

#  Delete a single document
DELETE /employee/_doc/1

Simple Queries

Getting a single document

#  Get a single document by id
GET /employee/_doc/1

Getting all documents

#  Search all documents
GET /employee/_search

#  Same as above, with an explicit match_all query
GET /employee/_search
{
  "query": {
    "match_all": {}
  }
}

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "肉丝",
          "age" : 30
        }
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "夹克",
          "age" : 30
        }
      }
    ]
  }
}

Pagination

#  Paginated query
GET /employee/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 1
}

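from is the offset of the first hit to return (0-based, default 0), not a page number: for page n with size hits per page, from = (n - 1) * size. size is the number of hits to return (default 10).

Without a sort, hits are ordered by _score descending. With match_all every hit scores 1.0, so the order shown here is simply the internal index order; the hits are not sorted by _id.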
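A small helper sketch in Python for turning an application-level page number into from/size. The names page, per_page and page_to_from_size are hypothetical; ES itself only knows from and size. It also enforces the default index.max_result_window of 10,000, which caps from + size.

```python
# Translate a 1-based page number into the from/size pair that _search expects.
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_to_from_size(page: int, per_page: int) -> dict:
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    start = (page - 1) * per_page
    if start + per_page > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window")
    return {"from": start, "size": per_page}


print(page_to_from_size(1, 1))   # {'from': 0, 'size': 1}, the request above
print(page_to_from_size(3, 10))  # {'from': 20, 'size': 10}
```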

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "肉丝",
          "age" : 30
        }
      }
    ]
  }
}

Complex Queries

Query with a keyword condition

#  Query with a keyword condition
GET /employee/_search
{
  "query": {
    "match": {
      "name": "肉丝"
    }
  }
}

match: the query text is analyzed (split into tokens) by default.

term: the query text is not analyzed.

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.3862944,
    "hits" : [
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.3862944,
        "_source" : {
          "name" : "肉丝",
          "age" : 30
        }
      }
    ]
  }
}

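ES's default (standard) analyzer turns every Chinese character into a separate token, and match combines tokens with or by default, so any document whose name contains either “肉” or “丝” matches the query for “肉丝”.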
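A toy approximation in Python, assuming Chinese-only text with no punctuation: one token per character, and a match with the default or operator hits as soon as one token is shared. Real ES uses Unicode text segmentation; the query “丝绸” below is a hypothetical example.

```python
# Toy approximation of the standard analyzer on Chinese-only text (illustration only).
from typing import List


def cjk_tokens(text: str) -> List[str]:
    return [ch for ch in text if not ch.isspace()]  # one token per character


def match_or(query: str, field_value: str) -> bool:
    # match uses operator "or" by default: one shared token is enough
    return bool(set(cjk_tokens(query)) & set(cjk_tokens(field_value)))


print(cjk_tokens("肉丝"))        # ['肉', '丝']
print(match_or("肉", "肉丝"))     # True
print(match_or("丝绸", "肉丝"))   # True: shares "丝"
print(match_or("夹克", "肉丝"))   # False
```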

With sorting

#  Query with sorting
GET /employee/_search
{
  "query": {
    "match": { "name": "夹" }
  },
  "sort": [
    {"age": { "order": "desc" }}
  ]
}

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "夹子",
          "age" : 31
        },
        "sort" : [
          31
        ]
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "夹克",
          "age" : 30
        },
        "sort" : [
          30
        ]
      }
    ]
  }
}

Note: _score (and max_score) becomes null because we used sort to define a custom order, so relevance scores are not computed.

With a filter

#  Query with a filter
GET /employee/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": {"age": 30}}
      ]
    }
  }
}

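filter: matching documents are not scored, which is why _score is 0.0 in the result.

term: the value is not analyzed; it is looked up in the index as-is.

match: the text is first analyzed with the field's analyzer, and the resulting tokens are looked up in the index.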
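A term query on a numeric field such as age behaves as expected, but on a text field the value must equal one indexed token. A toy contrast in Python, assuming indexing produces one token per CJK character plus lowercased alphanumeric runs (a rough stand-in for the standard analyzer, not ES code):

```python
# Toy contrast of term and match on a text field (illustration only).
import re
from typing import Set

TOKEN = re.compile("[\u4e00-\u9fff]|[A-Za-z0-9]+")


def analyze(text: str) -> Set[str]:
    return {tok.lower() for tok in TOKEN.findall(text)}


indexed = analyze("夹克")  # what the inverted index holds for name: {'夹', '克'}


def term_query(value: str) -> bool:
    return value in indexed               # term: the value is looked up unchanged


def match_query(text: str) -> bool:
    return bool(analyze(text) & indexed)  # match: analyze first, then look up ("or")


print(term_query("夹克"))   # False: no single token "夹克" was indexed
print(match_query("夹克"))  # True
```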

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.0,
        "_source" : {
          "name" : "夹克",
          "age" : 30
        }
      }
    ]
  }
}

With an aggregation

#  Query with an aggregation
GET /employee/_search
{
  "query": {
    "match": {
      "name": "夹"
    }
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "goroup_by_age": {
      "terms": {
        "field": "age"
      }
    }
  }
}

goroup_by_age is a user-chosen name for the aggregation; its results are returned under the same name.

Result:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "夹子",
          "age" : 31
        },
        "sort" : [
          31
        ]
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "夹克",
          "age" : 30
        },
        "sort" : [
          30
        ]
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "夹克2",
          "age" : 30
        },
        "sort" : [
          30
        ]
      }
    ]
  },
  "aggregations" : {
    "goroup_by_age" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 30,
          "doc_count" : 2
        },
        {
          "key" : 31,
          "doc_count" : 1
        }
      ]
    }
  }
}

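With an aggregation, the response gains an aggregations section: one bucket per distinct age value, with the number of matching documents in each. Buckets are computed over all documents that match the query, not just the hits returned.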
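A toy version of the terms aggregation in Python, to show how the buckets above are formed. It mirrors the default ordering (doc_count descending, ties broken by key ascending, at most 10 buckets) but is only a sketch, not ES's implementation; terms_agg is a hypothetical name.

```python
# Toy "terms" aggregation: one bucket per distinct value (illustration only).
from collections import Counter
from typing import Any, Dict, List


def terms_agg(hits: List[Dict[str, Any]], field: str, size: int = 10) -> List[Dict[str, Any]]:
    counts = Counter(h[field] for h in hits if field in h)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered[:size]]


matching = [{"name": "夹子", "age": 31}, {"name": "夹克", "age": 30}, {"name": "夹克2", "age": 30}]
print(terms_agg(matching, "age"))
# [{'key': 30, 'doc_count': 2}, {'key': 31, 'doc_count': 1}]
```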

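Advanced Query Syntax

Let's start with a question: why does a search for eat not match the document below?

#  Index a document (the movie index is created automatically)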
PUT /movie/_doc/1
{
  "name": "Eating an apple a day & keeps the doctor away"
}

GET /movie/_search
{
  "query": {
    "match": {
      "name": "eat"
    }
  }
}

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

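hits is empty: nothing matched.

To understand why, we need to know how ES works: it matches the query against the tokens produced when the field was analyzed (the inverted index), not against the raw text.

So the analyzed field must contain no eat token. We can check how the text is tokenized with the analyze API.

#  Use the analyze API to inspect the tokens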
GET /movie/_analyze
{
  "field": "name",
  "text": "Eating an apple a day & keeps the doctor away"
}

Result:

{
  "tokens" : [
    {
      "token" : "eating",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "an",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "apple",
      "start_offset" : 10,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "a",
      "start_offset" : 16,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "day",
      "start_offset" : 18,
      "end_offset" : 21,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "keeps",
      "start_offset" : 24,
      "end_offset" : 29,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "the",
      "start_offset" : 30,
      "end_offset" : 33,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "doctor",
      "start_offset" : 34,
      "end_offset" : 40,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "away",
      "start_offset" : 41,
      "end_offset" : 45,
      "type" : "<ALPHANUM>",
      "position" : 8
    }
  ]
}
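The tokens above can be approximated in Python for plain ASCII text: runs of letters and digits become tokens (so “&” yields none), lowercased, with character offsets and positions. This is a rough sketch, not the real standard tokenizer, which follows Unicode text segmentation.

```python
# Rough approximation of the standard analyzer on ASCII text (illustration only).
import re
from typing import Dict, List


def standard_like(text: str) -> List[Dict]:
    tokens = []
    for pos, m in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        tokens.append({"token": m.group().lower(), "start_offset": m.start(),
                       "end_offset": m.end(), "position": pos})
    return tokens


toks = standard_like("Eating an apple a day & keeps the doctor away")
print([t["token"] for t in toks])
# ['eating', 'an', 'apple', 'a', 'day', 'keeps', 'the', 'doctor', 'away']
print("eat" in {t["token"] for t in toks})  # False: only "eating" was indexed
```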

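Sure enough. So when ES's default analysis does not meet our needs, we have to configure the analysis ourselves.

For English text, ES ships with a built-in english analyzer; specify it on the field when creating the index. The analyzer of an existing field cannot be changed, so delete the old index first and then re-index the document.

#  Delete the old index; an existing field's analyzer cannot be changed
DELETE /movie

PUT /movie
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "name": { "type": "text", "analyzer": "english"}
    }
  }
}

#  Re-index the document so it is analyzed with the new analyzer
PUT /movie/_doc/1
{
  "name": "Eating an apple a day & keeps the doctor away"
}

#  Use the analyze API to inspect the tokens
GET /movie/_analyze
{
  "field": "name",
  "text": "Eating an apple a day & keeps the doctor away"
}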

Result:

{
  "tokens" : [
    {
      "token" : "eat",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "appl",
      "start_offset" : 10,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "dai",
      "start_offset" : 18,
      "end_offset" : 21,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "keep",
      "start_offset" : 24,
      "end_offset" : 29,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "doctor",
      "start_offset" : 34,
      "end_offset" : 40,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "awai",
      "start_offset" : 41,
      "end_offset" : 45,
      "type" : "<ALPHANUM>",
      "position" : 8
    }
  ]
}
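Part of this output can be reproduced with a short Python sketch: the english analyzer drops English stop words but keeps the original positions, hence the gaps 0, 2, 4, 5, 7, 8. The stemming step (a Porter stemmer, which yields eat, appl, dai, keep, awai) is deliberately not modelled, and STOP_WORDS below is only a small assumed subset of the real list.

```python
# Partial sketch of the english analyzer: tokenize, lowercase, drop stop words,
# keep original positions. Stemming is NOT modelled (illustration only).
import re
from typing import List, Tuple

STOP_WORDS = {"a", "an", "and", "the", "of", "to", "with"}  # assumed subset


def tokens_without_stop_words(text: str) -> List[Tuple[str, int]]:
    result = []
    for pos, m in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        tok = m.group().lower()
        if tok not in STOP_WORDS:
            result.append((tok, pos))
    return result


print(tokens_without_stop_words("Eating an apple a day & keeps the doctor away"))
# [('eating', 0), ('apple', 2), ('day', 4), ('keeps', 5), ('doctor', 7), ('away', 8)]
```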

Now a search for eat matches.

GET /movie/_search
{
  "query": {
    "match": {
      "name": "eat"
    }
  }
}

Result:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "movie",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "Eating an apple a day & keeps the doctor away"
        }
      }
    ]
  }
}

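Custom Analyzers

Analysis is the process of turning text into tokens.

Analysis runs in three steps:

  1. Character filters preprocess the raw text, for example stripping HTML or mapping characters. The standard analyzer has none.
  2. The tokenizer splits the text into tokens. The standard tokenizer splits on whitespace and most punctuation, which is why “&” in the example above produced no token.
  3. Token filters transform the tokens, for example lowercasing them, removing stop words or stemming.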
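The three stages can be sketched as plain Python functions chained in order. This is an illustration only: the amp_to_and character filter is a hypothetical example (the standard analyzer has no character filters), and the regex tokenizer is a rough stand-in for the standard tokenizer.

```python
# Sketch of the three analysis stages as plain functions (illustration only).
import re
from typing import Callable, List

CharFilter = Callable[[str], str]
TokenFilter = Callable[[List[str]], List[str]]


def amp_to_and(text: str) -> str:                # 1. character filter (hypothetical)
    return text.replace("&", " and ")


def standard_tokenizer(text: str) -> List[str]:  # 2. tokenizer (rough stand-in)
    return re.findall(r"[A-Za-z0-9]+", text)


def lowercase(tokens: List[str]) -> List[str]:   # 3. token filter
    return [t.lower() for t in tokens]


def analyze(text: str, char_filters: List[CharFilter], token_filters: List[TokenFilter]) -> List[str]:
    for cf in char_filters:
        text = cf(text)
    tokens = standard_tokenizer(text)
    for tf in token_filters:
        tokens = tf(tokens)
    return tokens


print(analyze("Day & Night", [], [lowercase]))            # ['day', 'night']
print(analyze("Day & Night", [amp_to_and], [lowercase]))  # ['day', 'and', 'night']
```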

To be continued...

Combining terms with and / or

#  match combines the analyzed terms with or by default
GET /movie/_search
{
  "query": {
    "match": {
      "title": "basketball with cartoon aliens"
    }
  }
}

#  With "operator": "and", every term must match (all terms are required; this is not a phrase match)
GET /movie/_search
{
  "query": {
    "match": {
      "title": {
        "query": "basketball with cartoon aliens",
        "operator": "and"
      }
    }
  }
}
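A toy model in Python of the operator parameter, working on already-analyzed term sets. The document terms are hypothetical, and real ES additionally scores the hits.

```python
# Toy model of the match query's "operator" parameter (illustration only).
from typing import Set


def match(query_terms: Set[str], doc_terms: Set[str], operator: str = "or") -> bool:
    hits = query_terms & doc_terms
    if operator == "and":
        return hits == query_terms   # every query term must be present
    return len(hits) > 0             # "or": at least one query term is enough


q = {"basketball", "with", "cartoon", "aliens"}
doc = {"basketball", "aliens", "space"}   # hypothetical document terms
print(match(q, doc))          # True
print(match(q, doc, "and"))   # False: "with" and "cartoon" are missing
```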

Minimum should match

"minimum_should_match": 2

means that at least two of the query terms must match.

#  Minimum should match
GET /movie/_search
{
  "query": {
    "match": {
      "title": {
        "query": "basketball with cartoon aliens",
        "operator": "or",
        "minimum_should_match": 2
      }
    }
  }
}
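minimum_should_match also accepts a percentage such as "75%", which is applied to the number of terms and rounded down. A toy sketch in Python covering positive integers and percentages only (negative values, which ES also accepts, are not modelled):

```python
# Toy model of minimum_should_match for a match query with operator "or".
import math
from typing import Set, Union


def required_matches(spec: Union[int, str], n_terms: int) -> int:
    if isinstance(spec, str) and spec.endswith("%"):
        # percentages are applied to the number of terms and rounded down
        return math.floor(n_terms * float(spec[:-1]) / 100)
    return int(spec)


def match_min(query_terms: Set[str], doc_terms: Set[str], minimum_should_match: Union[int, str]) -> bool:
    needed = required_matches(minimum_should_match, len(query_terms))
    return len(query_terms & doc_terms) >= needed


q = {"basketball", "with", "cartoon", "aliens"}
print(match_min(q, {"basketball", "aliens"}, 2))   # True: two terms matched
print(match_min(q, {"basketball"}, 2))             # False
print(required_matches("75%", len(q)))             # 3
```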

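Phrase query

match_phrase requires all terms to appear next to each other and in the same order.

#  Phrase query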
GET /movie/_search
{
  "query": {
    "match_phrase": {
      "title": "steve zissou"
    }
  }
}
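A toy check in Python of what match_phrase requires with its default slop of 0: the query tokens must occur consecutively and in order. The analyzed title below is hypothetical, and real ES works on token positions in the index rather than lists.

```python
# Toy check for match_phrase with slop 0 (illustration only).
from typing import List


def phrase_match(query_tokens: List[str], doc_tokens: List[str]) -> bool:
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens for i in range(len(doc_tokens) - n + 1))


doc = ["the", "life", "aquatic", "with", "steve", "zissou"]  # hypothetical analyzed title
print(phrase_match(["steve", "zissou"], doc))  # True
print(phrase_match(["zissou", "steve"], doc))  # False: wrong order
```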

Multi-field query

multi_match runs the same query against several fields.

#  Multi-field query
GET /movie/_search
{
  "query": {
    "multi_match": {
      "query": "basketball with cartoon aliens",
      "fields": ["title","overview"]
    }
  }
}
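With its default type, best_fields, multi_match scores a document by its best-matching field. A toy sketch in Python under a loud simplification: the "score" is just the number of matched terms, standing in for BM25, and the document is hypothetical.

```python
# Toy sketch of multi_match with the default "best_fields" type (illustration only).
from typing import Dict, List, Set


def field_score(query_terms: Set[str], field_terms: Set[str]) -> int:
    return len(query_terms & field_terms)  # stand-in for a real relevance score


def best_fields_score(query_terms: Set[str], doc: Dict[str, Set[str]], fields: List[str]) -> int:
    return max(field_score(query_terms, doc.get(f, set())) for f in fields)


q = {"basketball", "with", "cartoon", "aliens"}
doc = {  # hypothetical analyzed document
    "title": {"space", "jam"},
    "overview": {"basketball", "star", "cartoon", "aliens"},
}
print(best_fields_score(q, doc, ["title", "overview"]))  # 3, from overview
```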