Aggregation in MongoDB

Introduction

Aggregation operations are very important in any type of database whether it is SQL or NoSQL. To perform aggregations operations, MongoDB group values from multiple documents together and then perform a variety of operations on grouped data to return a single result. SQL uses aggregate functions to return a single value calculated from values in columns.

MongoDB has three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and the single purpose aggregation methods.

In this article, we will focus on aggregation pipeline. I’ll try to cover each major section of it using simple examples. We will be writing mongo shell commands to perform aggregation.

Aggregation Pipeline

MongoDB’s aggregation framework is based on the concept of data processing pipelines. Aggregation pipeline is similar to the UNIX world pipelines. At the very first is the collection, the collection is sent through document by document, documents are piped through processing pipeline and they go through series of stages and then we eventually get a result set.

In the figure, you see that collection is processed through different stages i.e. $project, $match, $group, $sort these stages can appear multiple times.

Various stages in pipeline are:

$project – select, reshape data
$match – filter data
$group – aggregate data
$sort – sorts data
$skip – skips data
$limit – limit data
$unwind – normalizes data

Let’s try to visualize the aggregation with an example. Don’t worry about the syntax. I will be explaining it soon.

 db.mycollection.aggregate([
	{$match:{'phone_type':'smart'}},
	{$group:{'_id':'$brand_name',total:{$sum:'$price'}}}
])

As you can see, in the diagram we have a collection, the $match stages filters out the documents then in next stage of pipeline documents gets grouped and we get the final result set.

Preparing Dummy Data

To run mongo shell commands, we need a database and some dummy records, so let’s create our database and a collection.

dept = ['IT', 'Sales', 'HR', 'Admin'];
for (i = 0; i < 10; i++) {
    
    db.mycollection.insert({ //mycollection is collection name
        '_id': i,
        'emp_code': 'emp_' + i,
        'dept_name': dept[Math.round(Math.random() * 3)],
        'experience': Math.round(Math.random() * 10),

    });

The above command will insert some dummy documents in a collection named mycollection in mydbdatabase.

Syntax

db.mycollection.aggregate([
   {$match:{'phone_type':'smart'}},
   {$group:{'_id':'$brand_name',total:{$sum:'$price'}}}
])

Syntax is pretty much easier, aggregate function takes an array as argument, in array we can pass various phases/stages of pipeline.

In the above example, we have passed two phases of pipeline that are $match which will filter out record and $group phase which will group the records and produce final record set.

Stages of Pipeline

1. $project

In the $project phase, we can add a key, remove a key, reshape a key. There are also some simple functions that we can use on the key : $toUpper, $toLower, $add, $multiply, etc.

Let’s use $project to reshape the documents that we have created.

db.mycollection.aggregate([
    {
        $project:{
            _id:0,
            'department':{$toUpper:'$dept_name'},
            'new_experience':{$add:['$experience',1]}
        }
    }
])

In this aggregate query, we are projecting the documents, _id:0 means _id which is compulsory we are hiding this field, a new key named department is created using previous dept_name field in upper case. The point to be noticed here is that field ‘dept_name’ is prefixed with ‘$’ sign to tell mongo shell that this field is the original field name of the document. Another new field named new_experience is created by adding 1 using $addfunction to the previous experience field. We will get the output like this:

2. $match

It works exactly like ‘where clause’ in SQL to filter out the records. The reason we might want to match is because we’d like to filter the results and only aggregate a portion of the documents or search for particular parts of the results set after we do the grouping. Let’s say in our collection we want to aggregate documents having department equals to sales, the query will be:

db.mycollection.aggregate([
    {
        $match:{
            dept_name:'Sales'
        }
    }
])

3. $group

As the name suggests, $group groups documents based on some key. Let’s say we want to group employeeson their department name and we want to find the number of employees in each department.

Hide Copy Code

db.mycollection.aggregate([
    {
        $group:{
            _id:'$dept_name',
            no_of_employees:{$sum:1}
        }
    }
])

Here, _id is the key for grouping and I have created new key named no_of_employees and used $sum to find the total record in each group.

Let’s improve this query to present output in a more sensible way.

db.mycollection.aggregate([
    {
        $group:{
            _id:{'department':'$dept_name'},
            no_of_employees:{$sum:1}
        }
    }
])

Let’s say we want to group documents on more than on key, all we need to do is specify the name of the keys in _id field.

db.mycollection.aggregate([
    {
        $group:{
            _id:{'department':'$dept_name',
                'year_of_experience':'$experience'
            },
            no_of_employees:{$sum:1}
        }
    }
])

4. $sort

Sort helps you to sort data after aggregation in ascending or descending as per your need. Let’s say we want to group department name in ascending order and find out the number of employees.

db.mycollection.aggregate([
    {
        $group:{
            _id:'$dept_name',
            no_of_employees:{$sum:1}
        }
    },
    {
        $sort:{
            _id:1
        }
    }

])

For descending use -1. Here in $sort, I have used _id field because in the first phase of aggregation, I used $dept_name as _id for aggregation.

5. $skip and $limit

$skip and $limit exactly same way skip and limit work when we do a simple find. It doesn’t make any sense to skip and limit unless we first sort, otherwise, the result is undefined.

We first skip records and then we limit.

Let’s see an example for the same.

db.mycollection.aggregate([
    {
        $group:{
            _id:'$dept_name',
            no_of_employees:{$sum:1}
        }
    },
    {
        $sort:{
            _id:1
        }
    },
    {
        $skip:2
    },
    {
        $limit:1
    }

])

Documents are grouped, then sorted, after that, we skipped two documents and limit the document to only one.

6. $first and $last

As we know how sort works in the aggregation pipeline, we can learn about $first and $last. They allow us to get the first and last value in each group as aggregation pipeline processes the document.

db.mycollection.aggregate([
    {
        $group:{
            _id:'$dept_name',
            no_of_employees:{$sum:1},
            first_record:{ $first:'$emp_code'}
        }
    }
])

7. $unwind

As we know in MongoDB, documents can have arrays. It is not easy to group on something within an array. $unwind first unjoin array data and then basically rejoin it in a way that lets us do grouping calculations on it.

Let’s say we have a document like this:

{
    a:somedata,
    b:someotherdata,
    c:[arr1,arr2,arr3]
}

After $unwind on ‘c’, we will get three documents:

{
    a:somedata,
    b:someotherdata,
    c:arr1
}
{
    a:somedata,
    b:someotherdata,
    c:arr2
}
{
    a:somedata,
    b:someotherdata,
    c:arr3
}

8. Aggregation Expressions

Let’s see some expressions that are very common in SQL and in MongoDB. We have an alternate for that.

$Sum: We have already seen its example.
$avg: Average works just like sum except it calculates the average for each group.
$min: Finds out the minimum value from each grouped document.
$max: Finds out the maximum value from each grouped document.

Conclusion

I have not explained all the topics in aggregation, but this article will help you kick-start using aggregation in MongoDB in your project and for your learning. I have attached mongo shell commands for your reference.