Given a collection name product having following documents in the collection.
{ "_id" : NumberInt("1"),"account" : "abc","vendor" : "amazon","category" : "mobile"},
{"_id" : NumberInt("2"), "account" : "abc","vendor" : "overstock","category" : "mobile"},
{"_id" : NumberInt("3"),"account" : "xyz","vendor" : "fashionStreet","category" : "fashion"},
{"_id" : NumberInt("4"),"account" : "pqr","vendor" : "foodTruck","category" : "food"},
{"_id" : NumberInt("5"),"account" : "abc","vendor" : "amazon", "category" : "mobile"}
If one notice there are duplicate elements exist in the collection with the composite key(account+ vendor+ category) marked in blue on colour , so how do we identify the duplicates in a collection and how do we remove the duplicate keys in the MongoDB?
This problem is easily solved using Aggregation pipeline framework which provides $group operator to group an compound or composite key along with the $match operator to match a criteria.
db.product.aggregate([
{ $group: {
_id: { account: "$account", vendor: "$vendor", category: "$category" },
count: { $sum: 1 }
}},
{
$match: {
count: { $gt: 1 }
}
},
{
$sort: {
count : -1 }
}
]);
{ "_id" : NumberInt("1"),"account" : "abc","vendor" : "amazon","category" : "mobile"},
{"_id" : NumberInt("2"), "account" : "abc","vendor" : "overstock","category" : "mobile"},
{"_id" : NumberInt("3"),"account" : "xyz","vendor" : "fashionStreet","category" : "fashion"},
{"_id" : NumberInt("4"),"account" : "pqr","vendor" : "foodTruck","category" : "food"},
{"_id" : NumberInt("5"),"account" : "abc","vendor" : "amazon", "category" : "mobile"}
If one notice there are duplicate elements exist in the collection with the composite key(account+ vendor+ category) marked in blue on colour , so how do we identify the duplicates in a collection and how do we remove the duplicate keys in the MongoDB?
This problem is easily solved using Aggregation pipeline framework which provides $group operator to group an compound or composite key along with the $match operator to match a criteria.
db.product.aggregate([
{ $group: {
_id: { account: "$account", vendor: "$vendor", category: "$category" },
count: { $sum: 1 }
}},
{
$match: {
count: { $gt: 1 }
}
},
{
$sort: {
count : -1 }
}
]);
Above query find all the duplicates elements whose account, vendor , category values are same and having duplicate values and output of the above query is as follow with the descending order of count :
Output :
{
"_id" : {
"account" : "abc",
"vendor" : "amazon",
"category" : "mobile"
},
"count" : 2
}
Note that _id is made composite of 3 columns account , vendor , category so in general
_id : { key1 : "$key1", key2 : "$key2", key3 : "$key3", key4 : "$key4", key5 : "$key5" }
and count plays an important role to find out the number of duplicates in the collection.
With the above example , we have identified the duplicate elements in the collection , so the next question arises , how to remove duplicate keys in a collection in MongoDB?
var duplicates = [];
db.product.aggregate([
{ $group: {
_id: { account: "$account" , vendor : "$vendor", category : "$category"}, // can be grouped on multiple properties/keys/columns
dups: { "$addToSet": "$_id" },
count: { "$sum": 1 }
}},
{ $match: {
count: { "$gt": 1 } // Duplicates considered as count greater than one
}}
])
.forEach(function(doc) {
doc.dups.shift(); // First element skipped for deleting and important line here
doc.dups.forEach( function(dupId){
duplicates.push(dupId); // Getting all duplicate ids
}
)
});
// If you want to Check all "_id" which you are deleting
printjson(duplicates);
Output :
[ NumberInt("1") ]
Now we have identified the array of duplicate "_id" as shown in the output of the above query.
To remove all the duplicates from the collection , need to execute the following
// Remove all duplicates in one go
db.product.remove({_id:{$in:duplicates}});
db.product.find({});
Output :
{"_id" : NumberInt("2"), "account" : "abc","vendor" : "overstock","category" : "mobile"},
{"_id" : NumberInt("3"),"account" : "xyz","vendor" : "fashionStreet","category" : "fashion"},
{"_id" : NumberInt("4"),"account" : "pqr","vendor" : "foodTruck","category" : "food"},
{"_id" : NumberInt("5"),"account" : "abc","vendor" : "amazon", "category" : "mobile"}
So you will get an amazing result with the all the records are unique in the collection.
Be alert while applying an index on the composite columns with {unique: true, dropDups: true}).
Suppose in the above scenario we have applied these condition then it will throw an error as
{"_id" : NumberInt("3"),"account" : "xyz","vendor" : "fashionStreet","category" : "fashion"},
{"_id" : NumberInt("4"),"account" : "pqr","vendor" : "foodTruck","category" : "food"},
{"_id" : NumberInt("5"),"account" : "abc","vendor" : "amazon", "category" : "mobile"}
So you will get an amazing result with the all the records are unique in the collection.
Be alert while applying an index on the composite columns with {unique: true, dropDups: true}).
Suppose in the above scenario we have applied these condition then it will throw an error as
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"errmsg" : "exception: E11000 duplicate key error index
"code" : 11000,
"ok" : 0
}
Note that dropDups has been removed in the Mongo 3.x release and has a severe disadvantage of removing the any random duplicate element.
Even if you apply ensureIndex() or createIndex() at the time of collection having duplicate keys then it will prompt exception "exception: E11000 duplicate key error index" , so its important to remove the duplicate elements with the above mentioned aggregation example and then applying the {unique: true} on the composite index created with the fields say here {account, vendor , category}.
Enjoy the MongoDB and the brilliant Aggregation Framework !!!!
Enjoy the coding with tek9g!!!
The most effective method to Solve MongoDB Error Message 11000 through MongoDB Technical Support
ReplyDeleteAt whatever point you confront this Error 11000 which basically signifies "Copy Key Error Index" it implies the file has two passages for a similar ID. To get very best arrangement with respect to this issue, contact to MongoDB Online Support or MongoDB Customer Support USA. We at Cognegic empower proactive checking and observing of your whole database and help to keep the issue before they happen. We just consider what is essential for the best practice and diminish single purpose of disappointment.
For More Info: https://cognegicsystems.com/
Contact Number: 1-800-450-8670
Email Address- info@cognegicsystems.com
Company’s Address- 507 Copper Square Drive Bethel Connecticut (USA) 06801