MongoEngine: A Concise Tutorial
MongoEngine - Aggregation
The term 'aggregation' refers to an operation that processes data and returns a computed result. Computing a sum, count, or average over one or more fields of the documents in a collection is an example of an aggregation function.
MongoEngine provides the aggregate() function, which wraps PyMongo's aggregation framework. An aggregation operation takes a collection as input and returns one or more documents as the result.
MongoDB uses the concept of data processing pipelines. A pipeline can have multiple stages. Basic stages provide filters that operate like queries. Others provide tools for grouping and/or sorting by one or more fields, concatenating strings, aggregating arrays, and so on.
The following stages are defined for MongoDB pipelines:
Name | Description
$project | Reshapes each document in the stream by adding new fields or removing existing fields.
$match | Filters the document stream to allow only matching documents to pass unmodified into the next stage. $match uses standard MongoDB queries.
$redact | Reshapes each document by restricting its content based on information stored in the document itself.
$limit | Passes only the first n documents unmodified to the next stage, where n is the specified limit.
$skip | Skips the first n documents and passes the remaining documents unmodified to the next stage.
$group | Groups input documents by a given identifier expression and applies the accumulator expressions to each group. The output documents contain only the identifier field and the accumulated fields.
$sort | Reorders the document stream by a specified sort key.
$out | Writes the resulting documents of the aggregation pipeline to a collection.
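As a rough illustration of how these stages combine, the sketch below builds a pipeline as an ordinary Python list of stage dictionaries. The field names (status, amount, custID) are placeholders assumed for the illustration. The block only constructs the list and does not contact a database.

```python
# Each stage is a dict with a single key that names the stage operator.
# Field names (status, amount, custID) are assumed for illustration.
pipeline = [
    {"$match": {"status": "A"}},   # keep only matching documents
    {"$sort": {"amount": -1}},     # order by amount, descending
    {"$skip": 1},                  # drop the first document
    {"$limit": 2},                 # pass at most two documents on
    {"$project": {"_id": 0, "custID": 1, "amount": 1}},  # keep two fields
]

# Stages run in list order, so the order of the entries matters.
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)
```

Because each stage consumes the output of the previous one, swapping $skip and $limit here would change which documents survive.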
Aggregation expressions use field paths to access fields in the input documents. To specify a field path, prefix the field name with a dollar sign ($), as in "$amount". Expressions can use one or more Boolean operators ($and, $or, $not) and comparison operators ($eq, $gt, $lt, $gte, $lte and $ne).
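The sketch below shows one way a field path and a comparison operator might appear inside a $project stage, together with a plain-Python stand-in for what the expression evaluates to. The field names and the helper resolve_field_path are hypothetical and exist only for this illustration. The helper is a simplification and does not reproduce MongoDB's handling of missing fields or arrays.

```python
# A $project stage that adds a computed Boolean field.
# "$amount" is a field path; the field names are assumed for illustration.
project_stage = {
    "$project": {
        "custID": 1,
        "isLarge": {"$gte": ["$amount", 300]},
    }
}

def resolve_field_path(document, path):
    """Simplified stand-in: read the value that a "$a.b" field path names."""
    value = document
    for key in path[1:].split("."):  # drop the leading "$", walk nested keys
        value = value[key]
    return value

doc = {"custID": "A123", "amount": 500}
# Plain-Python equivalent of {"$gte": ["$amount", 300]} for this document.
is_large = resolve_field_path(doc, "$amount") >= 300
print(is_large)
```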
The following arithmetic expressions are also used in aggregation:
$add | Adds numbers to return the sum. Accepts any number of argument expressions.
$subtract | Returns the result of subtracting the second value from the first.
$multiply | Multiplies numbers to return the product. Accepts any number of argument expressions.
$divide | Returns the result of dividing the first number by the second. Accepts two argument expressions.
$mod | Returns the remainder of the first number divided by the second. Accepts two argument expressions.
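A minimal sketch of how these operators could be written in a $project stage follows, next to the equivalent plain-Python arithmetic for one sample value. The field name amount and the output field names are assumptions made for the example. Only positive operands are used, so Python's % gives the same remainder as $mod.

```python
# Arithmetic expressions take their arguments as a list.
# The field name "amount" and the output names are assumed for illustration.
project_stage = {
    "$project": {
        "withFees": {"$add": ["$amount", 25, 5]},
        "afterDiscount": {"$subtract": ["$amount", 50]},
        "doubled": {"$multiply": ["$amount", 2]},
        "quarter": {"$divide": ["$amount", 4]},
        "remainder": {"$mod": ["$amount", 3]},
    }
}

# Plain-Python equivalent for a single document with amount = 500.
amount = 500
expected = {
    "withFees": amount + 25 + 5,
    "afterDiscount": amount - 50,
    "doubled": amount * 2,
    "quarter": amount / 4,
    "remainder": amount % 3,
}
print(expected)
```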
The following string expressions can also be used in aggregation:
$concat | Concatenates any number of strings.
$substr | Returns a substring of a string, starting at a specified index position and up to a specified length.
$toLower | Converts a string to lowercase. Accepts a single argument expression.
$toUpper | Converts a string to uppercase. Accepts a single argument expression.
$strcasecmp | Performs a case-insensitive string comparison and returns 0 if the two strings are equivalent, 1 if the first is greater than the second, and -1 if the first is less than the second.
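The string operators can be sketched the same way. The stage below uses assumed field names (custID, status) and assumed output names. The strcasecmp helper is a hypothetical plain-Python approximation of the 0 / 1 / -1 result convention, comparing the strings case-insensitively, and is not MongoDB's implementation.

```python
# String expressions inside a $project stage.
# Field and output names are assumed for illustration.
project_stage = {
    "$project": {
        "label": {"$concat": ["$custID", "-", "$status"]},
        "prefix": {"$substr": ["$custID", 0, 1]},  # start index 0, length 1
        "lowerID": {"$toLower": "$custID"},
        "sameStatus": {"$strcasecmp": ["$status", "a"]},
    }
}

def strcasecmp(first, second):
    """Approximate $strcasecmp: case-insensitive compare giving 0, 1 or -1."""
    a, b = first.upper(), second.upper()
    return (a > b) - (a < b)

# Plain-Python equivalent for one sample document.
doc = {"custID": "A123", "status": "A"}
expected = {
    "label": doc["custID"] + "-" + doc["status"],
    "prefix": doc["custID"][0:1],
    "lowerID": doc["custID"].lower(),
    "sameStatus": strcasecmp(doc["status"], "a"),
}
print(expected)
```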
To demonstrate how the aggregate() function works in MongoEngine, let us first define a Document class called orders.
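from mongoengine import *

# Connect to the 'mydata' database on the default local MongoDB server.
con = connect('mydata')

# Each orders object maps to one document in the 'orders' collection.
class orders(Document):
    custID = StringField()
    amount = IntField()
    status = StringField()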
We then add the following documents to the orders collection:
_id | custID | amount | status
ObjectId("5eba52d975fa1e26d4ec01d0") | A123 | 500 | A
ObjectId("5eba536775fa1e26d4ec01d1") | A123 | 250 | A
ObjectId("5eba53b575fa1e26d4ec01d2") | B212 | 200 | D
ObjectId("5eba540e75fa1e26d4ec01d3") | B212 | 400 | A
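One possible way to add these rows is sketched below. The helper load_orders and the constant SAMPLE_ORDERS are hypothetical names introduced for this example. The helper relies only on the document constructor accepting keyword arguments and on save(), and it takes the document class as a parameter so the block itself does not need a database connection. MongoDB generates the _id values on insert, so they will differ from those listed.

```python
# Sample rows matching the documents listed above (without _id).
SAMPLE_ORDERS = [
    {"custID": "A123", "amount": 500, "status": "A"},
    {"custID": "A123", "amount": 250, "status": "A"},
    {"custID": "B212", "amount": 200, "status": "D"},
    {"custID": "B212", "amount": 400, "status": "A"},
]

def load_orders(doc_cls, rows=SAMPLE_ORDERS):
    """Create one document per row, save it, and return the saved objects."""
    saved = []
    for row in rows:
        doc = doc_cls(**row)  # e.g. orders(custID="A123", amount=500, status="A")
        doc.save()            # with MongoEngine, save() writes to the collection
        saved.append(doc)
    return saved
```

With the orders class defined earlier and an open connection, calling load_orders(orders) would insert the four documents.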
The aggregate() function will be used to find the sum of the amount field for each custID, counting only documents whose status equals 'A'. The pipeline is constructed accordingly.
The first stage in the pipeline uses $match to filter documents with status 'A'. The second stage uses $group to group the documents by custID and compute the sum of amount for each group.
pipeline = [
    {"$match": {"status": "A"}},
    {"$group": {"_id": "$custID", "total": {"$sum": "$amount"}}},
]
This pipeline is now passed as the argument to the aggregate() function.
docs = orders.objects().aggregate(pipeline)
We can iterate over the resulting cursor with a for loop. The complete code is given below:
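from mongoengine import *

con = connect('mydata')

class orders(Document):
    custID = StringField()
    amount = IntField()
    status = StringField()

pipeline = [
    # Stage 1: keep only documents whose status is 'A'.
    {"$match": {"status": "A"}},
    # Stage 2: group by custID and sum the amount of each group.
    {"$group": {"_id": "$custID", "total": {"$sum": "$amount"}}},
]

# aggregate() returns a cursor over the result documents (plain dicts).
docs = orders.objects().aggregate(pipeline)
for doc in docs:
    print(doc)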
For the given data, the following output is generated (the order of the groups is not guaranteed unless a $sort stage is added):
{'_id': 'B212', 'total': 400}
{'_id': 'A123', 'total': 750}
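To check these totals without a database, the same computation can be reproduced in plain Python. The function match_and_group below is a hypothetical helper that mimics the $match and $group stages for this particular pipeline only, and it is not how MongoDB evaluates pipelines. It sorts the groups by _id so that its output is deterministic.

```python
from collections import defaultdict

# The four sample documents, without their _id values.
orders_data = [
    {"custID": "A123", "amount": 500, "status": "A"},
    {"custID": "A123", "amount": 250, "status": "A"},
    {"custID": "B212", "amount": 200, "status": "D"},
    {"custID": "B212", "amount": 400, "status": "A"},
]

def match_and_group(documents):
    """Mimic $match on status 'A' followed by $group on custID with $sum."""
    totals = defaultdict(int)
    for doc in documents:
        if doc["status"] == "A":                     # $match stage
            totals[doc["custID"]] += doc["amount"]   # $group with $sum
    # Sort by group key so the result order is stable.
    return [{"_id": key, "total": totals[key]} for key in sorted(totals)]

results = match_and_group(orders_data)
for result in results:
    print(result)
```

The B212 document with status 'D' is excluded by the filter, which is why the total for B212 is 400 rather than 600. In the real pipeline, appending a stage such as {"$sort": {"_id": 1}} would similarly fix the output order.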