A MongoDB Aggregation Example
In this post, we’re going to take a look at how to build a MongoDB aggregation query with Studio 3T’s Aggregation Editor.
Prerequisites
- Download Studio 3T if you haven’t already. It’s available for Windows, Mac, and Linux. The Aggregation Editor feature is available on all three editions – Core, Pro, and Enterprise.
- Download the zip file attachment at the end of the article (under Article Attachments) – which contains the publicly-available housing data from the City of Chicago Data Portal – and import the CSV file to your MongoDB database.
Prefer to watch? Check out our video tutorial.
Open the Aggregation Editor
Once we’ve opened up Studio 3T and connected to the database, we can select the collection we wish to query:
…and open Aggregation Editor either by right-click, button, or hotkey.
An empty MongoDB aggregate query will pop up – time to get cracking!
Identify the question to answer
The question we want to ask of our data is simple:
Which zip codes have the greatest number of senior housing units available?
To think how we’ll answer this and how we’ll form our query, let’s take a look at the data.
Click Execute full pipeline, which looks like a play button, to view the data. Note that executing an empty pipeline simply shows the contents of the collection.
If you prefer a JSON view of the data, it’s included below:
{
"_id" : ObjectId("5b61c4db61217f28a011df1e"),
"community_area" : {
"name" : "Humboldt Park",
"number" : NumberInt(23)
},
"property" : {
"type" : "Supportive Housing",
"name" : "Humboldt Park Residences"
},
"address" : "1152 N. Christiana Ave.",
"zip_code" : NumberInt(60647),
"phone_number" : "773-276-5338 ext. 225",
"management_company" : "LUCHA",
"units" : NumberInt(69),
"location" : {
"x_coordinate" : 1153731.4411072168,
"y_coordinate" : 1907707.148117049,
"latitude" : 41.9025857989,
"longitude" : -87.7107605283
}
}
Studio 3T supports dynamically switching between Table, Tree and JSON views of your result data.
OK, so we can see we have the fields we need.
We can check "property.type" to see that it’s senior housing, and "zip_code" and "units" give us the zip code and number of available units there are, respectively.
To answer our question, we need to combine these into the right aggregation query.
But first, let’s get familiar with MongoDB aggregation pipeline operators.
Add a new stage
Add a new stage and you’ll see a new stage in the Pipeline tab.
Double click the new stage to edit it (or simply select the Stage 1 tab):
The screenshot above jumps ahead a little bit as the stage specification has already been filled, but let’s break down each piece in turn.
Stage 1: Match criteria with the $match operator
First, notice the $match in the combo box. It’s here where we select the stage’s ‘operator’.
A stage operator defines what the stage actually does.
The $match operator takes the input set of documents and outputs only those that match the given criteria. It is essentially a filter.
For convenience, the specification of the Stage 1 $match operator is repeated below:
{
"property.type": "Senior"
}
In the stage’s specification, we can see that we are matching against the "Senior" property type, meaning only documents with a value of "Senior" for the field "property.type" will be passed onto the (yet to be created) next stage of the MongoDB aggregation pipeline for further processing.
We can check the output of this and any other stage at any time by clicking Show output from the selected stage.
Similarly, we can see the input of any stage at any time by clicking Show input to the selected stage.
This is a really nice and convenient feature, as it makes keeping track of the precise form of the data we are working at each stage in the aggregation pipeline really easy.
We can see in the Stage 1 output tab that we have the results we need from this stage, and so let’s go on and create the next.
Stage 2: Group results with the $group operator
We now need a way to group together the results from Stage 1 on zip code and then add up each of the available units figures. The ‘$group‘ operator is exactly what we need for this.
The Stage 2 $group operator specification is repeated below:
{
_id: "$zip_code",
totalUnits: { $sum: "$units" }
}
The specification of Stage 2 states that the output of this stage will be documents that have an “_id" with a distinct zip code as a value and so will group together documents input to this stage that have the same zip code, and a “totalUnits" field whose value is the sum of all the "units"field values from each of the documents in the group.
We can see the input to and output from tabs for this stage in the screenshot and can confirm that a reduction has taken place – of the 70 documents input to this stage, there were 36 distinct zip codes, and so the corresponding 36 documents are output from this stage.
Stage 3: Sort results with the $sort operator
As we want to know the zip codes that have the greatest number of senior housing units available, it would be convenient to sort the results from the greatest to the least total units available.
To do this, we’ll create a third stage using the $sort operator with the following specification, giving us exactly what we want:
{
totalUnits: -1
}
The full pipeline
Going back to the Pipeline tab we can see the result of the execution of the full query, as well as the full query itself, all in one one place.
We can see we have the expected number of results from the full aggregation pipeline, and we can now answer our question: we have a list of the zip codes that have the greatest number of senior housing units available.
Wasn’t that easy?
Tips and tricks
Export query results without running the query
We’ve heard from some users that it gets quite time-consuming to need to run a query first before you can export its results – a pain point that’s especially felt if the queries are long and you frequently need to export.
While we endeavor to introduce a solution in a future release, a workaround is to use an '$out'stage in the pipeline to write the results of the query to a collection.
This workaround requires an intermediate collection for the results, but it means that the query is run only once, and you can always directly export that collection.
Generate code from aggregation queries automatically
We’ve added Query Code – the Studio 3T feature that lets you automatically translate MongoDB and SQL queries to JavaScript, Python, Java, C#, PHP and the mongo shell language – to Aggregation Editor, which means you can now do the same with MongoDB aggregate queries.
Generate driver code in five languages simply by clicking on the Query Tab. Here it is in action:
Share aggregation queries with your team
You can save the aggregation queries you’ve created to a file. Not only can you reload them in future sessions, but also share them with other colleagues and users.
You can also easily copy and paste aggregate queries to and from the clipboard.
Handy references
It can take a bit of time to master all the different operators available in the MongoDB aggregation pipeline, so links to the MongoDB Aggregation Pipeline Quick Reference and the Aggregation Section of the MongoDB Manual are always available within a click’s reach directly in the app itself via the Operator Quick Reference and Aggregation Tutorial links, respectively.