Saturday, 6 May 2017

Finding distinct field data and its count in the MongoDB



MongoDB  is very interesting NoSQL DB  supporting schema less structure  and allow all type of data structures to be saved in BSON (Binary  JSON) format  which is becoming boon for the programming community .  It serves  mind boggling speed in terms of searching  and help to product to scale at a very high level. Its Master Slave replication and load sharding  features makes it an awesome NoSql database  , very useful in the web development industry.

Mongo shell has many collections methods  and support following methods :

-----------------------------------
db.collection.aggregate()
db.collection.bulkWrite()
db.collection.count()
db.collection.copyTo()
db.collection.createIndex()
db.collection.dataSize()
db.collection.deleteOne()
db.collection.deleteMany()
db.collection.distinct()
db.collection.drop()
db.collection.dropIndex()
db.collection.dropIndexes()
db.collection.ensureIndex()
db.collection.explain()
db.collection.find()
db.collection.findAndModify()
db.collection.findOne()
db.collection.findOneAndDelete()
db.collection.findOneAndReplace()
db.collection.findOneAndUpdate()
db.collection.getIndexes()
db.collection.getShardDistribution()
db.collection.getShardVersion()
db.collection.group()
db.collection.insert()
db.collection.insertOne()
db.collection.insertMany()
db.collection.isCapped()
db.collection.latencyStats()
db.collection.mapReduce()
db.collection.reIndex()
db.collection.replaceOne()
db.collection.remove()
db.collection.renameCollection()
db.collection.save()
db.collection.stats()
db.collection.storageSize()
db.collection.totalSize()
db.collection.totalIndexSize()
db.collection.update()
db.collection.updateOne()
db.collection.updateMany()
db.collection.validate()
-----------------------------------------------


One of the very interesting and frequently used collection method  which is widely useful  is

db.collection.distinct() method. How its work lets see  that :

Suppose you have a product collection having 4 documents as follow :

{ 
    "_id" : NumberInt(1), 
    "account" : "abc", 
    "vendor" : "amazon", 
    "category" : "mobile"
}
{ 
    "_id" : NumberInt(2), 
    "account" : "abc", 
    "vendor" : "overstock", 
    "category" : "mobile"
}
{ 
    "_id" : NumberInt(3), 
    "account" : "xyz", 
    "vendor" : "fashionStreet", 
    "category" : "fashion"
}
{ 
    "_id" : NumberInt(4), 
    "account" : "pqr", 
    "vendor" : "foodTruck", 
    "category" : "food"

}

On executing    db.product.distinct("category") gives output an array of the category object as 

[ "mobile", "fashion", "food" ]

If you want to find the total of distinct  elements count  , you need to execute the following command : 

db.product.distinct("category").length

output : 3 

> db.product.distinct("vendor");
output : [ "amazon", "overstock", "fashionStreet", "foodTruck" ]

> db.product.distinct("vendor").length;

output : 4


There is another way of finding the distinct elements using Aggregation framework which is yet another brilliant feature of MongoDB  , see the following examples to have a glimpse of it .



> db.product.aggregate([{$group: {_id : "$category" } }])
{ "_id" : "food" }
{ "_id" : "fashion" }
{ "_id" : "mobile" }

Using group operator  , collection is grouped in distinct elements , if you notice distinct() & $group both are giving same output. This is a beauty of aggregation .


If you want each distinct category along with the its count use : 

> db.product.aggregate([{$group: {_id : "$category" , count : {$sum : 1}} }])
{ "_id" : "food", "count" : 1 }
{ "_id" : "fashion", "count" : 1 }
{ "_id" : "mobile", "count" : 2 }

If you want categories to  be in descending order  along with their counts  use as : 

> db.product.aggregate([{$group: {_id : "$category" , count : {$sum : 1}} } , {$sort : {count : -1}}])
{ "_id" : "mobile", "count" : 2 }
{ "_id" : "food", "count" : 1 }
{ "_id" : "fashion", "count" : 1 }

So how did it work :  

1. Using $group operator  category column is grouped in distinct category elements 

2. $sum operator count the total number of elements belonging to each categories

3. using $sort data pipelines  output of above 1 & 2 is sorted on the descending order  on the count field.


Hopefully above details are helpful in understanding the collection methods of mongo shell and also help in reaching one step ahead in the aggregation framework.


Enjoy Mongo  ...its awesome !!!!!!!




    No comments:

    Post a Comment