Mr Shang

我是一个勤劳的码农,每天在技术的牧场辛勤劳作


导航栏
 » 首页
 » 分类浏览
 » 关于我
 » 我的Github
 » RSS订阅
Group 59

ES系列之六探索你的数据(一)

29 Dec 2016 » java, elasticsearch

原文地址 https://www.elastic.co/guide/en/elasticsearch/reference/current/_exploring_your_data.html

样例数据集

如果你是从开始看到现在,那么对Elasticsearch应该有了最基础的了解。如果没有,建议从头读一遍。接下来工作将在接近实际环境的数据集上展开。这里准备了一份用户银行账号信息的虚构JSON文档。每个文档都有如下结构:

{
    "account_number": 0,
    "balance": 16623,
    "firstname": "Bradshaw",
    "lastname": "Mckenzie",
    "age": 29,
    "gender": "F",
    "address": "244 Columbus Place",
    "employer": "Euron",
    "email": "bradshawmckenzie@euron.com",
    "city": "Hobucken",
    "state": "CO"
}

你也许会感到奇怪,这些值和语法看起来很怪异,实际上它们是在www.json-generator.com上生成的,这些全都是随机值。

载入样例数据集

你可以从这里下载样例数据集。把它当到当前目录,然后按照下面的命令把它加载到簇中:

curl -XPOST 'localhost:9200/bank/account/_bulk?pretty&refresh' --data-binary "@accounts.json"
curl 'localhost:9200/_cat/indices?v'

结果如下:

health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer aHuZLSakQNGcFgTMiyIXaA   5   1          2            0      7.4kb          7.4kb
yellow open   bank     4aK4OrQOSWexaj54oCE4ag   5   1       1000            0    640.2kb        640.2kb

这意味着我们刚刚将批量的为1000个文档放倒bank索引中,类型为account。

检索API

接下来我们开始一些简单的检索。有很多基本的方式去进行检索:一种是通过REST请求URI进行参数传递,另一种是通过REST请求消息体。消息体的方式更具有表现力更强,可以用可读性更好的JSON格式定义检索条件。URI传参只做一个演示,以后的讲解仅使用请求消息体的方式。

检索API可以通过_search路径进行访问。下面的例子会返回bank索引中所有的文档:

curl -XGET 'localhost:9200/bank/_search?q=*&sort=account_number:asc&pretty&pretty'

我们先来分析下检索所有。我们检索bank索引(_search路径),q=*参数告诉Elasticsearch去匹配索引中所有的文档。sort=account_number:asc参数控制检索结果使用account_number属性进行升序排列。pretty参数告诉Elasticsearch以格式化好的JSON形式返回结果。

响应如下:

{
  "took" : 88,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "0",
        "_score" : null,
        "_source" : {
          "account_number" : 0,
          "balance" : 16623,
          "firstname" : "Bradshaw",
          "lastname" : "Mckenzie",
          "age" : 29,
          "gender" : "F",
          "address" : "244 Columbus Place",
          "employer" : "Euron",
          "email" : "bradshawmckenzie@euron.com",
          "city" : "Hobucken",
          "state" : "CO"
        },
        "sort" : [
          0
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "account_number" : 1,
          "balance" : 39225,
          "firstname" : "Amber",
          "lastname" : "Duke",
          "age" : 32,
          "gender" : "M",
          "address" : "880 Holmes Lane",
          "employer" : "Pyrami",
          "email" : "amberduke@pyrami.com",
          "city" : "Brogan",
          "state" : "IL"
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "account_number" : 2,
          "balance" : 28838,
          "firstname" : "Roberta",
          "lastname" : "Bender",
          "age" : 22,
          "gender" : "F",
          "address" : "560 Kingsway Place",
          "employer" : "Chillium",
          "email" : "robertabender@chillium.com",
          "city" : "Bennett",
          "state" : "LA"
        },
        "sort" : [
          2
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "account_number" : 3,
          "balance" : 44947,
          "firstname" : "Levine",
          "lastname" : "Burks",
          "age" : 26,
          "gender" : "F",
          "address" : "328 Wilson Avenue",
          "employer" : "Amtap",
          "email" : "levineburks@amtap.com",
          "city" : "Cochranville",
          "state" : "HI"
        },
        "sort" : [
          3
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "account_number" : 4,
          "balance" : 27658,
          "firstname" : "Rodriquez",
          "lastname" : "Flores",
          "age" : 31,
          "gender" : "F",
          "address" : "986 Wyckoff Avenue",
          "employer" : "Tourmania",
          "email" : "rodriquezflores@tourmania.com",
          "city" : "Eastvale",
          "state" : "HI"
        },
        "sort" : [
          4
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "5",
        "_score" : null,
        "_source" : {
          "account_number" : 5,
          "balance" : 29342,
          "firstname" : "Leola",
          "lastname" : "Stewart",
          "age" : 30,
          "gender" : "F",
          "address" : "311 Elm Place",
          "employer" : "Diginetic",
          "email" : "leolastewart@diginetic.com",
          "city" : "Fairview",
          "state" : "NJ"
        },
        "sort" : [
          5
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "6",
        "_score" : null,
        "_source" : {
          "account_number" : 6,
          "balance" : 5686,
          "firstname" : "Hattie",
          "lastname" : "Bond",
          "age" : 36,
          "gender" : "M",
          "address" : "671 Bristol Street",
          "employer" : "Netagy",
          "email" : "hattiebond@netagy.com",
          "city" : "Dante",
          "state" : "TN"
        },
        "sort" : [
          6
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "7",
        "_score" : null,
        "_source" : {
          "account_number" : 7,
          "balance" : 39121,
          "firstname" : "Levy",
          "lastname" : "Richard",
          "age" : 22,
          "gender" : "M",
          "address" : "820 Logan Street",
          "employer" : "Teraprene",
          "email" : "levyrichard@teraprene.com",
          "city" : "Shrewsbury",
          "state" : "MO"
        },
        "sort" : [
          7
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "8",
        "_score" : null,
        "_source" : {
          "account_number" : 8,
          "balance" : 48868,
          "firstname" : "Jan",
          "lastname" : "Burns",
          "age" : 35,
          "gender" : "M",
          "address" : "699 Visitation Place",
          "employer" : "Glasstep",
          "email" : "janburns@glasstep.com",
          "city" : "Wakulla",
          "state" : "AZ"
        },
        "sort" : [
          8
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "9",
        "_score" : null,
        "_source" : {
          "account_number" : 9,
          "balance" : 24776,
          "firstname" : "Opal",
          "lastname" : "Meadows",
          "age" : 39,
          "gender" : "M",
          "address" : "963 Neptune Avenue",
          "employer" : "Cedward",
          "email" : "opalmeadows@cedward.com",
          "city" : "Olney",
          "state" : "OH"
        },
        "sort" : [
          9
        ]
      }
    ]
  }
}

结果主要有以下几个部分:

  • took - Elasticsearch执行检索花费的时间(毫秒)
  • timed_out - 表示检索是否超时
  • _shards - 表示检索了多少瓦片,同时包含成功/失败的次数
  • hits - 检索的结果
  • hits.total - 符合检索条件的文档总数
  • hits.hits - 实际检索结果的数组(默认显示前10条)
  • sort - 结果排序关键字(不写就会使用score)
  • _scoremax_score - 暂时先不深究

下面是使用请求体传参与上面检索条件完全相同的命令:

curl -XGET 'localhost:9200/bank/_search?pretty' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}'

主要的不同之处是不再URI中使用q=*,而是POST一个JSON形式的请求体。下一节会讨论JSON检索的使用方式。

一旦获取到检索结果,Elasticsearch就完成了请求的过程,不会在服务端维持任何资源或者在结果中有任何的开始标记。这与许多其他诸如SQL等平台有显著的不同,那些平台如果开始获取预先查询结果的一个子集,就必须继续向服务器发请求,通过某种服务端的状态标注来获取剩余的结果。

© Mr Shang - niceshang[#]outlook.com - Powered by Jekyll.