Check whether every element in an array matches a condition

MongoDB

噜噜哒 2019-11-19 10:36:50

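I have some documents:

date: Date
users: [
    { user: 1, group: 1 },
    { user: 5, group: 2 }
]

date: Date
users: [
    { user: 1, group: 1 },
    { user: 3, group: 2 }
]

I would like to query this collection to find all documents where every user id in my users array is in another array, [1, 5, 7]. In this example, only the first document matches.

The best solution I have been able to find is:

$where: function() {
    var ids = [1, 5, 7];
    return this.users.every(function(u) {
        return ids.indexOf(u.user) !== -1;
    });
}

Unfortunately, this seems to hurt performance. The $where documentation states: $where evaluates JavaScript and cannot take advantage of indexes.

How can I improve this query?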

3 Answers

阿波罗的战车

Contributed 1862 experience points, received more than 6 upvotes

The query you want is this:

db.collection.find({"users":{"$not":{"$elemMatch":{"user":{$nin:[1,5,7]}}}}})

This says: find all documents that have no element in users whose user value is outside the list 1, 5, 7.
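To see why the double negation works, the same logic can be traced in plain JavaScript on the two sample documents from the question. This is only an in-memory sketch: hasOutsider, matchesQuery and docs are made-up names, not MongoDB API, and the server evaluates the query differently.

```javascript
// In-memory illustration of {$not: {$elemMatch: {user: {$nin: allowed}}}}.
// Hypothetical helper names; this is not how MongoDB evaluates the query.
const allowed = [1, 5, 7];

const docs = [
  { users: [{ user: 1, group: 1 }, { user: 5, group: 2 }] },
  { users: [{ user: 1, group: 1 }, { user: 3, group: 2 }] }
];

// $elemMatch + $nin: true if at least one element has a user outside `allowed`.
function hasOutsider(doc) {
  return (doc.users || []).some(u => !allowed.includes(u.user));
}

// $not: keep the documents for which no such element exists.
function matchesQuery(doc) {
  return !hasOutsider(doc);
}

const matched = docs.filter(matchesQuery);
console.log(matched.length); // 1 (only the first document)
```

One consequence of the negation is worth keeping in mind: a document whose users array is empty or missing also passes, because it contains no offending element. If that is not wanted, an extra condition on the array would be needed.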

2019-11-19

森栏

Contributed 1810 experience points, received more than 5 upvotes

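I don't know about better, but there are a few different ways to approach this, depending on the version of MongoDB you have available.

I am not too sure whether this is your intention, but the query as shown will match the first document example because, as your logic is implemented, you are requiring that the elements of the document's array be contained in the sample array.

So if you actually wanted the document to contain all of those elements, then the $all operator would be the obvious choice: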

db.collection.find({ "users.user": { "$all": [ 1, 5, 7 ] } })
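The two readings are easy to mix up, so here is a small in-memory comparison. containsAll mimics what $all asks for (the array contains every listed value), while everyIn mimics what the question asks for (every array value is in the list). Both function names and the superset array are made up for this sketch.

```javascript
const wanted = [1, 5, 7];

// $all-style: every value in `list` appears in `values`.
function containsAll(values, list) {
  return list.every(w => values.includes(w));
}

// Question-style: every value in `values` appears in `list`.
function everyIn(values, list) {
  return values.every(v => list.includes(v));
}

const firstDoc = [1, 5];       // user ids of the first sample document
const superset = [1, 3, 5, 7]; // hypothetical extra document

console.log(containsAll(firstDoc, wanted)); // false: 7 is missing
console.log(everyIn(firstDoc, wanted));     // true
console.log(containsAll(superset, wanted)); // true
console.log(everyIn(superset, wanted));     // false: 3 is not in the list
```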

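But working on the presumption that your logic is actually what you intend, you can at least "filter" the results by combining with the $in operator, so that fewer documents are subject to your $where condition in evaluated JavaScript: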

db.collection.find({
    "users.user": { "$in": [ 1, 5, 7 ] },
    "$where": function() {
        var ids = [1, 5, 7];
        return this.users.every(function(u) {
            return ids.indexOf(u.user) !== -1;
        });
    }
})

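You do get an index this way, although the number actually scanned will be multiplied by the number of elements in the arrays of the matched documents. It is still better than running without the additional filter.

Or you might even consider the logical abstraction of the $and operator, used in combination with $or and possibly the $size operator, depending on your actual array conditions: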

db.collection.find({
    "$or": [
        { "users.user": { "$all": [ 1, 5, 7 ] } },
        { "users.user": { "$all": [ 1, 5 ] } },
        { "users.user": { "$all": [ 1, 7 ] } },
        { "users": { "$size": 1 }, "users.user": 1 },
        { "users": { "$size": 1 }, "users.user": 5 },
        { "users": { "$size": 1 }, "users.user": 7 }
    ]
})

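So this generates all of the possible permutations of your matching condition, but again performance will likely vary depending on your installed version.

NOTE: this is actually a complete fail in this case, as it does something completely different and in fact results in a logical $in.

The alternatives are with the aggregation framework. Your mileage may vary on which is most efficient, depending on the number of documents in your collection. One approach with MongoDB 2.6 and upwards: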

db.problem.aggregate([
    // Match documents that "could" meet the conditions
    { "$match": {
        "users.user": { "$in": [ 1, 5, 7 ] }
    }},
    // Keep your original document and a copy of the array
    { "$project": {
        "_id": {
            "_id": "$_id",
            "date": "$date",
            "users": "$users"
        },
        "users": 1
    }},
    // Unwind the array copy
    { "$unwind": "$users" },
    // Just keeping the "user" element value
    { "$group": {
        "_id": "$_id",
        "users": { "$push": "$users.user" }
    }},
    // Compare to see if all elements are a member of the desired match
    { "$project": {
        "match": { "$setEquals": [
            { "$setIntersection": [ "$users", [ 1, 5, 7 ] ] },
            "$users"
        ]}
    }},
    // Filter out any documents that did not match
    { "$match": { "match": true } },
    // Return the original document form
    { "$project": {
        "_id": "$_id._id",
        "date": "$_id.date",
        "users": "$_id.users"
    }}
])

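So that approach uses some newly introduced set operators to compare the contents, though of course you need to restructure the array in order to make the comparison.

As pointed out, there is a direct operator for this, $setIsSubset, which does the equivalent of the combined operators above in a single operator: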

db.collection.aggregate([
    { "$match": {
        "users.user": { "$in": [ 1, 5, 7 ] }
    }},
    { "$project": {
        "_id": {
            "_id": "$_id",
            "date": "$date",
            "users": "$users"
        },
        "users": 1
    }},
    { "$unwind": "$users" },
    { "$group": {
        "_id": "$_id",
        "users": { "$push": "$users.user" }
    }},
    { "$project": {
        "match": { "$setIsSubset": [ "$users", [ 1, 5, 7 ] ] }
    }},
    { "$match": { "match": true } },
    { "$project": {
        "_id": "$_id._id",
        "date": "$_id.date",
        "users": "$_id.users"
    }}
])
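The claim that $setIsSubset is equivalent to $setEquals over a $setIntersection can be checked on a few arrays in plain JavaScript. The three functions below are simplified stand-ins written for this sketch, not the server implementations, and the sample arrays are made up.

```javascript
// Plain-JavaScript stand-ins for the set expressions (hypothetical helpers).
// Like the aggregation set operators, they ignore duplicates and order.
function setIntersection(a, b) {
  return [...new Set(a)].filter(x => b.includes(x));
}

function setEquals(a, b) {
  const sa = new Set(a), sb = new Set(b);
  return sa.size === sb.size && [...sa].every(x => sb.has(x));
}

function setIsSubset(a, b) {
  return a.every(x => b.includes(x));
}

const allowed = [1, 5, 7];
const samples = [[1, 5], [1, 3], [7, 7, 1], [2]];

const results = samples.map(users => ({
  users,
  viaTwoOps: setEquals(setIntersection(users, allowed), users),
  viaSubset: setIsSubset(users, allowed)
}));

// Each entry shows the same answer in both columns.
console.log(results);
```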

Or with an alternative approach that still takes advantage of the $size operator from MongoDB 2.6:

db.collection.aggregate([
    // Match documents that "could" meet the conditions
    { "$match": {
        "users.user": { "$in": [ 1, 5, 7 ] }
    }},
    // Keep your original document and a copy of the array
    // and a note of its current size
    { "$project": {
        "_id": {
            "_id": "$_id",
            "date": "$date",
            "users": "$users"
        },
        "users": 1,
        "size": { "$size": "$users" }
    }},
    // Unwind the array copy
    { "$unwind": "$users" },
    // Filter array contents that do not match
    { "$match": {
        "users.user": { "$in": [ 1, 5, 7 ] }
    }},
    // Count the array elements that did match
    { "$group": {
        "_id": "$_id",
        "size": { "$first": "$size" },
        "count": { "$sum": 1 }
    }},
    // Compare the original size to the matched count
    { "$project": {
        "match": { "$eq": [ "$size", "$count" ] }
    }},
    // Filter out documents that were not the same
    { "$match": { "match": true } },
    // Return the original document form
    { "$project": {
        "_id": "$_id._id",
        "date": "$_id.date",
        "users": "$_id.users"
    }}
])

This can of course still be done in versions prior to 2.6, though it is a little more long-winded:

db.collection.aggregate([
    // Match documents that "could" meet the conditions
    { "$match": {
        "users.user": { "$in": [ 1, 5, 7 ] }
    }},
    // Keep your original document and a copy of the array
    { "$project": {
        "_id": {
            "_id": "$_id",
            "date": "$date",
            "users": "$users"
        },
        "users": 1
    }},
    // Unwind the array copy
    { "$unwind": "$users" },
    // Group it back to get its original size
    { "$group": {
        "_id": "$_id",
        "users": { "$push": "$users" },
        "size": { "$sum": 1 }
    }},
    // Unwind the array copy again
    { "$unwind": "$users" },
    // Filter array contents that do not match
    { "$match": {
        "users.user": { "$in": [ 1, 5, 7 ] }
    }},
    // Count the array elements that did match
    { "$group": {
        "_id": "$_id",
        "size": { "$first": "$size" },
        "count": { "$sum": 1 }
    }},
    // Compare the original size to the matched count
    { "$project": {
        "match": { "$eq": [ "$size", "$count" ] }
    }},
    // Filter out documents that were not the same
    { "$match": { "match": true } },
    // Return the original document form
    { "$project": {
        "_id": "$_id._id",
        "date": "$_id.date",
        "users": "$_id.users"
    }}
])
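The idea behind the size-versus-count pipelines can be sketched without a database: remember how long the array was, count the elements that survive the $in filter, and keep the document only when the two numbers agree. The function name below is invented for this sketch, and the sample documents are the two from the question.

```javascript
const allowed = [1, 5, 7];

const docs = [
  { users: [{ user: 1, group: 1 }, { user: 5, group: 2 }] },
  { users: [{ user: 1, group: 1 }, { user: 3, group: 2 }] }
];

// Hypothetical in-memory mirror of the pipeline stages.
function sizeEqualsMatchedCount(doc) {
  // First $match: at least one element must be in the allowed list.
  if (!doc.users.some(u => allowed.includes(u.user))) return false;
  // $size (or the first $group in the pre-2.6 form): original array length.
  const size = doc.users.length;
  // $unwind + $match + $group/$sum: number of elements that are allowed.
  const count = doc.users.filter(u => allowed.includes(u.user)).length;
  // $eq: every element matched only if nothing was filtered out.
  return size === count;
}

const kept = docs.filter(sizeEqualsMatchedCount);
console.log(kept.length); // 1 (only the first document)
```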

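That generally rounds out the different ways; try them out and see what works best for you. In all likelihood, the simple combination of $in with your existing form is probably going to be the best one. But in all cases, make sure you have an index that can be selected: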

db.collection.ensureIndex({ "users.user": 1 })

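This will give you the best performance as long as you are accessing it in some way, as all the examples here do.

Verdict

I was intrigued by this, so I ultimately contrived a test case to see what had the best performance. First, some test data generation: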

var batch = [];
for ( var n = 1; n <= 10000; n++ ) {
    // Math.random() takes no arguments; each value is a random integer 1..10
    var elements = Math.floor(Math.random()*10)+1;
    var obj = { date: new Date(), users: [] };
    for ( var x = 0; x < elements; x++ ) {
        var user = Math.floor(Math.random()*10)+1,
            group = Math.floor(Math.random()*10)+1;
        obj.users.push({ user: user, group: group });
    }
    batch.push( obj );
    // Insert in batches of 500 documents
    if ( n % 500 == 0 ) {
        db.problem.insert( batch );
        batch = [];
    }
}

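With 10000 documents in the collection, each holding a random array of length 1..10 with random values of 1..10, I came to a match count of 430 documents (reduced from 7749 for the $in match), with the following results (average):

JavaScript with $in clause: 420ms

Aggregation with $size: 395ms

Aggregation with group array count: 650ms

Aggregation with two set operators: 275ms

Aggregation with $setIsSubset: 250ms

Over the samples taken, all but the last two had a peak variance of approximately 100ms faster, and the last two both showed 220ms responses. The largest variations were in the JavaScript query, which also showed results 100ms slower.

But the point here is relative to hardware, which on my laptop under a VM is not particularly great, but it gives an idea.

So the aggregation approach, and specifically the MongoDB 2.6.1 version with set operators, clearly wins on performance, with an additional slight gain from $setIsSubset as a single operator.

This is particularly interesting given that (as the 2.4-compatible method indicates) the largest cost in this process is the $unwind statement (over 100ms on average). With the $in selection averaging around 32ms, the rest of the pipeline stages execute in less than 100ms on average. That gives a relative idea of aggregation versus JavaScript performance.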
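For a rough feel of the two counts quoted above (documents passing the $in pre-filter versus documents that fully match), the same kind of data can be generated and filtered in memory. This sketch does not reproduce the timings or the exact numbers from the test; makeDocs and randomInt1to10 are names invented here, and the counts vary from run to run.

```javascript
// Random integer in 1..10, as in the test data generator.
function randomInt1to10() {
  return Math.floor(Math.random() * 10) + 1;
}

// Hypothetical in-memory version of the generator (no database involved).
function makeDocs(n) {
  const docs = [];
  for (let i = 0; i < n; i++) {
    const users = [];
    const elements = randomInt1to10();
    for (let x = 0; x < elements; x++) {
      users.push({ user: randomInt1to10(), group: randomInt1to10() });
    }
    docs.push({ date: new Date(), users });
  }
  return docs;
}

const allowed = [1, 5, 7];
const docs = makeDocs(10000);

// Documents the $in pre-filter lets through (at least one allowed user).
const candidates = docs.filter(d => d.users.some(u => allowed.includes(u.user)));
// Documents where every user is allowed (the final result set).
const matched = candidates.filter(d => d.users.every(u => allowed.includes(u.user)));

console.log(candidates.length, matched.length); // varies from run to run
```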

2019-11-19

小唯快跑啊

Contributed 1863 experience points, received more than 2 upvotes

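I just spent a good part of my day trying to implement Asya's solution above (the $not / $elemMatch / $nin query) with object comparison rather than strict equality, so I figured I would share it here.

Let's say you expanded the question from user ids to full users. You want to find all documents where every item in the users array is present in another array of users: [{user: 1, group: 3}, {user: 2, group: 5},...]

This won't work: db.collection.find({"users":{"$not":{"$elemMatch":{"$nin":[{user: 1, group: 3},{user: 2, group: 5},...]}}}}) because $nin only works with strict equality. So we need a different way to express "not in array" for arrays of objects, and using $where would slow the query down too much.

Solution: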

db.collection.find({
    "users": {
        "$not": {
            "$elemMatch": {
                // if all of the OR-blocks are true, element is not in array
                "$and": [{
                    // each OR-block == true if element != that user
                    "$or": [
                        { "user": { "$ne": 1 } },
                        { "group": { "$ne": 3 } }
                    ]
                }, {
                    "$or": [
                        { "user": { "$ne": 2 } },
                        { "group": { "$ne": 5 } }
                    ]
                }
                // more users...
                ]
            }
        }
    }
})

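Rounding out the logic: $elemMatch matches every document that has a user not in the array, so $not matches every document in which all users are in the array.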
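Writing one $or block per allowed user by hand gets tedious, so the filter can be generated from the list. The helper below is a sketch with an invented name (buildEveryUserInFilter), and it assumes each element has exactly the two fields user and group used in this answer.

```javascript
// Hypothetical helper: builds the $not/$elemMatch filter for a list of
// allowed { user, group } pairs. Only those two fields are compared.
function buildEveryUserInFilter(allowedUsers) {
  if (allowedUsers.length === 0) {
    // An empty $and array is not a valid query, so refuse to build one.
    throw new Error("allowedUsers must not be empty");
  }
  // One OR-block per allowed user: true when the element differs from it.
  const notThisUser = allowedUsers.map(a => ({
    $or: [{ user: { $ne: a.user } }, { group: { $ne: a.group } }]
  }));
  // An element matching all OR-blocks is in none of the allowed pairs.
  return { users: { $not: { $elemMatch: { $and: notThisUser } } } };
}

const filter = buildEveryUserInFilter([
  { user: 1, group: 3 },
  { user: 2, group: 5 }
]);

console.log(JSON.stringify(filter, null, 2));
```

The resulting object is meant to be passed to find in place of the hand-written literal, for example db.collection.find(filter) in the shell.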

2019-11-19
