📅  Last modified: 2023-12-03 15:03:01.177000             🧑  Author: Mango
MongoDB is a NoSQL document-oriented database that stores data as documents in collections rather than as rows in tables. One of its most powerful features is the aggregation framework, which can group data by specific criteria. In this article, we will explore how to perform a group-by operation in MongoDB using Python.
Before we can start any operations on MongoDB, we need to establish a connection to the database. We can use the pymongo module in Python to achieve this.
import pymongo
# Establishing a connection to MongoDB
client = pymongo.MongoClient("mongodb://localhost:27017/")
For demonstration purposes, we will create a sample dataset of orders containing the following fields:
- order_id (int): unique identifier for each order
- customer_id (int): unique identifier for each customer
- amount (float): order cost in USD
- status (str): order status (e.g., "pending", "shipped", "delivered")
We can insert the sample data into a MongoDB collection named orders using the following code:
# Creating a sample dataset
orders = [
    {"order_id": 1, "customer_id": 101, "amount": 10.5, "status": "pending"},
    {"order_id": 2, "customer_id": 102, "amount": 15.0, "status": "shipped"},
    {"order_id": 3, "customer_id": 101, "amount": 7.25, "status": "shipped"},
    {"order_id": 4, "customer_id": 103, "amount": 20.0, "status": "delivered"},
    {"order_id": 5, "customer_id": 102, "amount": 12.75, "status": "delivered"},
    {"order_id": 6, "customer_id": 101, "amount": 8.0, "status": "shipped"},
    {"order_id": 7, "customer_id": 103, "amount": 11.5, "status": "pending"}
]
# Inserting orders into MongoDB
db = client["mydatabase"]
orders_collection = db["orders"]
orders_collection.insert_many(orders)
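One practical caveat: running the insert code a second time adds the same seven orders again, which would double every total computed later. The following is a minimal sketch of a re-runnable seeding step, not part of the original article. The helper name seed_orders is hypothetical; it relies only on the pymongo collection methods delete_many and insert_many, and it assumes that wiping the whole collection is acceptable for a demo database.

```python
def seed_orders(collection, documents):
    """Empty the collection, then insert fresh copies of the documents.

    `collection` is assumed to be a pymongo Collection (or any object
    exposing delete_many and insert_many with the same behaviour).
    Returns the number of inserted documents.
    """
    # An empty filter matches every document, so this clears the collection.
    collection.delete_many({})
    # insert_many adds an "_id" key to each dict it is given, so pass copies
    # to keep the caller's list unchanged between runs.
    result = collection.insert_many([dict(doc) for doc in documents])
    return len(result.inserted_ids)
```

With the objects defined above, the call would be seed_orders(orders_collection, orders) in place of the plain insert_many call.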
To perform a group-by operation in MongoDB, we use the $group stage of the aggregation pipeline. The $group stage groups documents by one or more fields, given as its _id expression, and computes aggregate values for each group with accumulator operators such as $sum.
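To make "one or more fields" concrete: the _id of a $group stage can itself be a document, which gives a compound grouping key. The sketch below is an illustration rather than code from the original article. It groups the sample orders by customer and status; the output field names customer, order_count, and average_amount are hypothetical choices, and the trailing $sort stage is added only to make the output order predictable.

```python
# Pipeline with a compound grouping key (customer_id + status).
customer_status_pipeline = [
    {
        "$group": {
            # A document as _id groups by every listed field at once.
            "_id": {"customer": "$customer_id", "status": "$status"},
            # {"$sum": 1} adds 1 per document, i.e. counts the group.
            "order_count": {"$sum": 1},
            # $avg computes the mean amount within each group.
            "average_amount": {"$avg": "$amount"},
        }
    },
    # Sort on the parts of the compound key for a stable output order.
    {"$sort": {"_id.customer": 1, "_id.status": 1}},
]
```

Passing this list to orders_collection.aggregate should produce one result document per distinct (customer, status) pair; for the seven sample orders, that is six pairs.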
In our sample dataset, let's say we want to group the orders by their status and calculate the total sum of amount for each status. We can achieve this using the following code:
# Performing group by on MongoDB
pipeline = [
    {
        "$group": {
            "_id": "$status",
            "total_amount": {"$sum": "$amount"}
        }
    }
]
result = orders_collection.aggregate(pipeline)
# Printing the results
for row in result:
    print(row)
The above code creates a pipeline with a single $group stage that groups the orders by their status field and calculates the total sum of amount for each status. The aggregate method returns a cursor, which we iterate over to print each result document. Each document holds the grouping value in _id and the computed sum in total_amount.
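As a sanity check on what the pipeline should print, the same grouping can be modelled in plain Python without a database. This is only a sketch: the helper names sum_amount_by_status and rows_to_dict are hypothetical, and the orders list repeats just the two fields the grouping uses. Because $group does not guarantee the order of its output documents, the comparison goes through a dict rather than a list.

```python
from collections import defaultdict

# The article's sample orders, reduced to the fields the grouping needs.
orders = [
    {"status": "pending", "amount": 10.5},
    {"status": "shipped", "amount": 15.0},
    {"status": "shipped", "amount": 7.25},
    {"status": "delivered", "amount": 20.0},
    {"status": "delivered", "amount": 12.75},
    {"status": "shipped", "amount": 8.0},
    {"status": "pending", "amount": 11.5},
]


def sum_amount_by_status(documents):
    """Plain-Python model of grouping by status and summing amount."""
    totals = defaultdict(float)
    for doc in documents:
        totals[doc["status"]] += doc["amount"]
    return dict(totals)


def rows_to_dict(rows):
    """Convert aggregate() result rows into {status: total_amount}."""
    return {row["_id"]: row["total_amount"] for row in rows}


expected = sum_amount_by_status(orders)
print(expected)
# → {'pending': 22.0, 'shipped': 30.25, 'delivered': 32.75}
```

If the collection contains exactly the seven sample orders, rows_to_dict(orders_collection.aggregate(pipeline)) should equal expected. Larger totals usually mean the sample data was inserted more than once.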
In this article, we explored how to perform a group-by operation in MongoDB using Python. We learned how to establish a connection to MongoDB, create a sample dataset, and group documents using the $group stage of the aggregation pipeline.