db.collection('yourColl').aggregate([
{ $match: { network: { $in: ["LLM", "YFW"] } } },
{ $group: { _id: "$romProjectName", kinds: { $addToSet: "$network" } } },
{ $match: { kinds: { $all: ["LLM", "YFW"] } } },
{ $project: { _id: 0, romProjectName: "$_id" } }
])
{ romProjectName: 1, network: 1 } 以加速匹配与分组。下面我将详细解释上面的聚合查询步骤:
{ $match: { network: { $in: ["LLM", "YFW"] } } }这一步仅选择 network 字段值为 "LLM" 或 "YFW" 的文档。这是优化查询的第一步,减少后续处理的数据量。{ $group: { _id: "$romProjectName", kinds: { $addToSet: "$network" } } }这一步按 romProjectName 字段对文档进行分组,同时使用 $addToSet 操作符收集每个项目中出现的所有独特 network 值到 kinds 数组中。$addToSet 确保每种 network 类型只会在数组中出现一次。{ $match: { kinds: { $all: ["LLM", "YFW"] } } }这一步筛选出 kinds 数组中同时包含 "LLM" 和 "YFW" 的分组结果。$all 操作符确保两个网络类型都必须存在。{ $project: { _id: 0, romProjectName: "$_id" } }这一步重塑输出格式,移除 _id 字段并将原来的 _id (即分组用的 romProjectName) 值重命名为 romProjectName 字段返回。假设有以下文档:
[
{ romProjectName: "Project1", network: "LLM", ... },
{ romProjectName: "Project1", network: "YFW", ... },
{ romProjectName: "Project2", network: "LLM", ... },
{ romProjectName: "Project3", network: "YFW", ... },
{ romProjectName: "Project4", network: "LLM", ... },
{ romProjectName: "Project4", network: "YFW", ... }
]
查询执行过程:
第一个 $match: 保留所有文档(因为所有 network 都是 "LLM" 或 "YFW")
$group: 结果变为
[
{ _id: "Project1", kinds: ["LLM", "YFW"] },
{ _id: "Project2", kinds: ["LLM"] },
{ _id: "Project3", kinds: ["YFW"] },
{ _id: "Project4", kinds: ["LLM", "YFW"] }
]
第二个 $match: 仅保留 kinds 同时包含 "LLM" 和 "YFW" 的记录,结果变为
[
{ _id: "Project1", kinds: ["LLM", "YFW"] },
{ _id: "Project4", kinds: ["LLM", "YFW"] }
]
$project: 最终输出为
[
{ romProjectName: "Project1" },
{ romProjectName: "Project4" }
]
{ network: 1, romProjectName: 1 } 的复合索引{ $limit: N } 阶段来限制结果数量