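Anyone who has written Elasticsearch aggregation DSL knows how verbose it gets: even a simple business requirement can turn into a surprisingly complex JSON document. Aggregations nest, so the output of one aggregation can be the input of another, and pipeline aggregations can additionally reference the results of parent or sibling aggregations, which makes the overall structure hard to read. This article builds an Elasticsearch aggregation query step by step from a practical example, to make ES aggregations easier to understand.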
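Suppose the index test-order stores users' orders, where name is the user's name and price is the order price. The documents can be listed with the following request:
GET test-order/_search?filter_path=hits.hits._source
The data is as follows: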
{
"hits": {
"hits": [
{
"_source": {
"name": "Jack",
"price": 80
}
},
{
"_source": {
"name": "Ross",
"price": 70
}
},
{
"_source": {
"name": "Susan",
"price": 50
}
},
{
"_source": {
"name": "Ross",
"price": 40
}
},
{
"_source": {
"name": "Tom",
"price": 65
}
},
{
"_source": {
"name": "Tom",
"price": 85
}
}
]
}
}
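Now consider the following requirement: a user whose total spending exceeds 100 is a VIP, and we want to count the VIPs in the system. In a traditional relational database this is simple: group by name and compute sum(price), filter the VIPs with a having clause, then count the rows of the resulting derived table. The SQL is as follows: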
select count(*) from (
select sum(price), name from "test-order"
group by name
having sum(price) > 100
) as VIP
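As a quick sanity check of the SQL above, here is a minimal sketch that runs it against an in-memory SQLite database loaded with the six sample orders. SQLite is only a stand-in chosen for convenience, and the function name count_vips is hypothetical. The table name is double-quoted because an identifier containing a hyphen is not valid unquoted.

```python
import sqlite3

# Sample orders copied from the test-order data above.
ORDERS = [
    ("Jack", 80), ("Ross", 70), ("Susan", 50),
    ("Ross", 40), ("Tom", 65), ("Tom", 85),
]


def count_vips(orders, threshold=100):
    """Run the article's SQL against an in-memory SQLite table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        # The hyphen in the table name requires a quoted identifier.
        conn.execute('create table "test-order" (name text, price integer)')
        conn.executemany('insert into "test-order" values (?, ?)', orders)
        row = conn.execute(
            'select count(*) from ('
            ' select sum(price), name from "test-order"'
            ' group by name'
            ' having sum(price) > ?'
            ') as VIP',
            (threshold,),
        ).fetchone()
        return row[0]
    finally:
        conn.close()


print(count_vips(ORDERS))  # prints 2 (Ross with 110 and Tom with 150)
```

This gives a reference answer of 2 that the Elasticsearch query should reproduce.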
In ES, we build the DSL in the same order.
group
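First group by name, using a Terms Aggregation to play the role of group by. This assumes name is mapped as a keyword field; in Elasticsearch 5.0 and later, a terms aggregation on an analyzed text field fails unless fielddata is enabled. Also note that terms returns only the top 10 buckets by default, so its size parameter has to be raised when there may be more than 10 users.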
GET test-order/_search
{
"size": 0,
"aggs": {
"userNames": {
"terms": {
"field": "name"
}
}
}
}
sum
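Next, take the buckets produced by grouping on name as input and sum the price field within each of them. As shown below, a sum aggregation is nested inside the terms aggregation.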
GET test-order/_search
{
"size": 0,
"aggs": {
"userNames": {
"terms": {
"field": "name"
},
"aggs": {
"paymentSum": {
"sum": {
"field": "price"
}
}
}
}
}
}
In the response, the paymentSum aggregation shows that sum(price) group by name has now been computed.
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 6,
"max_score": 0,
"hits": []
},
"aggregations": {
"userNames": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Ross",
"doc_count": 2,
"paymentSum": {
"value": 110
}
},
{
"key": "Tom",
"doc_count": 2,
"paymentSum": {
"value": 150
}
},
{
"key": "Jack",
"doc_count": 1,
"paymentSum": {
"value": 80
}
},
{
"key": "Susan",
"doc_count": 1,
"paymentSum": {
"value": 50
}
}
]
}
}
}
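To make the nesting concrete, the following is a plain-Python sketch of what the terms aggregation with a nested sum computes on the sample data. It is an illustration only, not how Elasticsearch implements aggregations; the function name terms_with_sum is hypothetical, and the tie-breaking on key is simply chosen so that the bucket order matches the response above.

```python
from collections import defaultdict

# Sample documents copied from the test-order data above.
ORDERS = [
    {"name": "Jack", "price": 80},
    {"name": "Ross", "price": 70},
    {"name": "Susan", "price": 50},
    {"name": "Ross", "price": 40},
    {"name": "Tom", "price": 65},
    {"name": "Tom", "price": 85},
]


def terms_with_sum(docs, field="name", metric_field="price"):
    """Mimic a terms aggregation with a nested sum (hypothetical helper)."""
    buckets = defaultdict(lambda: {"doc_count": 0, "paymentSum": {"value": 0}})
    for doc in docs:
        bucket = buckets[doc[field]]          # one bucket per distinct name
        bucket["doc_count"] += 1              # what terms reports per bucket
        bucket["paymentSum"]["value"] += doc[metric_field]  # the nested sum
    # Largest buckets first; ties broken by key (an assumption of this sketch).
    ordered = sorted(buckets.items(), key=lambda kv: (-kv[1]["doc_count"], kv[0]))
    return [{"key": key, **body} for key, body in ordered]


for bucket in terms_with_sum(ORDERS):
    print(bucket["key"], bucket["doc_count"], bucket["paymentSum"]["value"])
```

The inner sum is evaluated once per bucket of the outer aggregation, which is why paymentSum appears inside every bucket of userNames.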
having
Next, use bucket_selector to implement the having part of the SQL.
GET test-order/_search
{
"size": 0,
"aggs": {
"userNames": {
"terms": {
"field": "name"
},
"aggs": {
"paymentSum": {
"sum": {
"field": "price"
}
},
"sumFilter": {
"bucket_selector": {
"buckets_path": {
"userPaymentSum": "paymentSum"
},
"script": "params.userPaymentSum > 100"
}
}
}
}
}
}
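sumFilter above is a bucket_selector, a parent pipeline aggregation that filters the buckets of its enclosing multi-bucket aggregation (here userNames). It references paymentSum through buckets_path and keeps only the buckets whose sum is greater than 100. As the response shows, the VIPs have now been selected.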
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 6,
"max_score": 0,
"hits": []
},
"aggregations": {
"userNames": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Ross",
"doc_count": 2,
"paymentSum": {
"value": 110
}
},
{
"key": "Tom",
"doc_count": 2,
"paymentSum": {
"value": 150
}
}
]
}
}
}
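The filtering step can be pictured with a small Python sketch that applies the same rule to the four buckets from the earlier response. It only illustrates the semantics under simplifying assumptions: the function bucket_selector below is a hypothetical stand-in, and a Python lambda replaces the Painless script.

```python
# Buckets copied from the response of the terms + sum query above.
BUCKETS = [
    {"key": "Ross", "doc_count": 2, "paymentSum": {"value": 110}},
    {"key": "Tom", "doc_count": 2, "paymentSum": {"value": 150}},
    {"key": "Jack", "doc_count": 1, "paymentSum": {"value": 80}},
    {"key": "Susan", "doc_count": 1, "paymentSum": {"value": 50}},
]


def bucket_selector(buckets, buckets_path, predicate):
    """Keep the buckets for which predicate(params) is true (hypothetical helper).

    buckets_path maps a script variable name to the name of a metric
    that lives inside each bucket, as in the DSL above.
    """
    kept = []
    for bucket in buckets:
        # Resolve every variable to the metric value of this bucket.
        params = {var: bucket[path]["value"] for var, path in buckets_path.items()}
        if predicate(params):
            kept.append(bucket)
    return kept


vips = bucket_selector(
    BUCKETS,
    {"userPaymentSum": "paymentSum"},
    lambda params: params["userPaymentSum"] > 100,
)
print([bucket["key"] for bucket in vips])  # prints ['Ross', 'Tom']
```

The variable name userPaymentSum exists only inside the script; buckets_path is what ties it to the paymentSum metric of each bucket.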
Simulating the subquery
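With the VIPs selected, the remaining step is to count them, which is done here with stats_bucket. vip_count below is a stats_bucket, a sibling pipeline aggregation that computes statistics over a metric taken from the buckets of another aggregation; here the path userNames>paymentSum points at the paymentSum metric inside the buckets of userNames.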
GET test-order/_search
{
"size": 0,
"aggs": {
"userNames": {
"terms": {
"field": "name"
},
"aggs": {
"paymentSum": {
"sum": {
"field": "price"
}
},
"sumFilter": {
"bucket_selector": {
"buckets_path": {
"userPaymentSum": "paymentSum"
},
"script": "params.userPaymentSum > 100"
}
}
}
},
"vip_count": {
"stats_bucket": {
"buckets_path": "userNames>paymentSum"
}
}
}
}
In the final response, the count field of vip_count is the number of VIPs in the system.
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 6,
"max_score": 0,
"hits": []
},
"aggregations": {
"userNames": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Ross",
"doc_count": 2,
"paymentSum": {
"value": 110
}
},
{
"key": "Tom",
"doc_count": 2,
"paymentSum": {
"value": 150
}
}
]
},
"vip_count": {
"count": 2,
"min": 110,
"max": 150,
"avg": 130,
"sum": 260
}
}
}
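The numbers in vip_count can be reproduced by hand. Below is a minimal Python sketch of a stats_bucket-like computation over the two remaining buckets; the function stats_bucket is a hypothetical stand-in that only supports a two-part path of the form aggregation>metric, and the way it handles an empty bucket list is a choice of this sketch, not a statement about Elasticsearch.

```python
# The userNames aggregation as it looks after sumFilter removed the non-VIPs.
AGGREGATIONS = {
    "userNames": {
        "buckets": [
            {"key": "Ross", "doc_count": 2, "paymentSum": {"value": 110}},
            {"key": "Tom", "doc_count": 2, "paymentSum": {"value": 150}},
        ]
    }
}


def stats_bucket(aggregations, buckets_path):
    """Compute count/min/max/avg/sum over one metric of a sibling aggregation."""
    # "userNames>paymentSum" means: metric paymentSum in the buckets of userNames.
    agg_name, metric_name = buckets_path.split(">")
    values = [b[metric_name]["value"] for b in aggregations[agg_name]["buckets"]]
    if not values:
        # Assumption of this sketch: no buckets means no statistics.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


vip_count = stats_bucket(AGGREGATIONS, "userNames>paymentSum")
print(vip_count["count"])  # prints 2
```

Only count is needed for the original requirement; min, max, avg and sum come along as by-products of the statistics.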
Conclusion
From
select count(*) from (
select sum(price), name from "test-order"
group by name
having sum(price) > 100
) as VIP
to
GET test-order/_search
{
"size": 0,
"aggs": {
"userNames": {
"terms": {
"field": "name"
},
"aggs": {
"paymentSum": {
"sum": {
"field": "price"
}
},
"sumFilter": {
"bucket_selector": {
"buckets_path": {
"userPaymentSum": "paymentSum"
},
"script": "params.userPaymentSum > 100"
}
}
}
},
"vip_count": {
"stats_bucket": {
"buckets_path": "userNames>paymentSum"
}
}
}
}
The line count has grown roughly sixfold. Such is the power of the ES DSL...
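One way to keep this verbosity manageable is to assemble the request body from small named pieces, one per SQL clause. The sketch below is one possible approach in plain Python; the function build_vip_count_query and its parameters are hypothetical, and the aggregation names are the ones used in this article.

```python
import json


def build_vip_count_query(group_field="name", sum_field="price", threshold=100):
    """Assemble the final DSL of this article from one piece per SQL clause."""
    # sum(price)
    sum_over_group = {"sum": {"field": sum_field}}
    # having sum(price) > threshold
    having = {
        "bucket_selector": {
            "buckets_path": {"userPaymentSum": "paymentSum"},
            "script": f"params.userPaymentSum > {threshold}",
        }
    }
    # group by name, with the sum and the having filter nested inside
    group_by = {
        "terms": {"field": group_field},
        "aggs": {"paymentSum": sum_over_group, "sumFilter": having},
    }
    # select count(*) from (...): a sibling of the group, not a child
    count = {"stats_bucket": {"buckets_path": "userNames>paymentSum"}}
    return {"size": 0, "aggs": {"userNames": group_by, "vip_count": count}}


# The printed JSON can be pasted as the body of GET test-order/_search.
print(json.dumps(build_vip_count_query(), indent=2))
```

Naming each piece after its SQL counterpart makes it visible which parts nest (sum and having live inside the group) and which sit side by side (the final count).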