2016-11-30

Updating Inside a Nested Array with the MongoDB Positional Operator in C#

Sometimes you have a document in MongoDB with a property that is an array of nested objects. You’d like to update one of those objects. What’s the best way to go about that in C#?

Let’s say you have a simple document representing a class with students:

Class.json
{
    "_id" : ObjectId("583ee54339eddb19c03f2bf5"),
    "Name" : "Introduction to Databases",
    "Code" : "CS321",
    "Students" : [
        {
            "Name" : "Alice",
            "Grade" : 92
        },
        {
            "Name" : "Bob",
            "Grade" : 87
        },
        {
            "Name" : "Charlie",
            "Grade" : 76
        }
    ]
}

I’d like to update Bob’s grade to a 93 since he passed his last exam.

One simple way to do it is to load the whole Class document into memory, update the grade in C#, and then put the document back into the database.

That would look something like this:

UpdateWholeDocument.cs
var client = new MongoClient("mongodb://localhost:27017")
var db = client.GetDatabase("school");
var classes = db.GetCollection<Class>("classes");

// Bring the whole Class document into memory
var classToUpdate = classes.Find(c => c.Code == "CS321").First();

// Find bob in the Students array
var bob = classToUpdate.Students.First(s => s.Name == "Bob");

// Update Bob's grade
bob.Grade = 93;

// Save the entire document back to the database
classes.ReplaceOne(c => c.Id == classToUpdate.Id, classToUpdate);

There is a problem with this approach though.

For one thing, its pretty inefficient, especially for a large document. It’s probably not much of a problem for this small sample. but imagine this was a seminar class with 200 students. And that the document also included individual paper grades for each student. This is a lot of data to bring across the wire, instantiate C# objects for, send back across the wire, then eventually garbage collect the whole document in .NET land. It can add up pretty quick, especially in a high volume environment.

A more insidious problem is the race condition involved in updating the whole document. There’s no such thing as a lock in MongoDB, so its possible that two threads could pull down the whole document, make changes and race to get their version saved last.

For example, imagine initially Alice’s score is 95 and Bob’s is 85. Two threads pull down that document.

Thread 1 updates Alice’s score to 92, and Thread 2 update’s Bob’s score to 89. Then they both write their version of the document back to the database. Will Alice have her 92? Will Bob have his 89?

Who knows! Since each thread is replacing the entire document, who ever saves last wins!

Undefined behavior is not something you want in your database layer.

Luckily, MongoDB offers a findAndUpdate operation that can atomically change a subset of the fields in a document. Since the entire document is the unit of atomicity in MongoDB, the two threads can each issue a findAndUpdate at the same time and it will work out OK, provided they are each editing separate students.

If both threads are attempting to modify the same student, than just like before, all bets are off.

Anyway, the magic incantation for findAndModify looks like this:

findAndUpdate.txt
db.classes.findAndModify({
    query: { Code: "CS321", Students: { $elemMatch: { Name: "Bob" } } },
    update: { $set: { "Students.$.Grade": NumberInt(89) } }
})

Couple things going on here.

In the query description, we’re using an $elemMatch projection which tells MongoDB to find a document with an array called Students with an element with the Name “Bob”.

The update description includes that curious string, "Students.$.Grade". That dollar sign is called the Positional Operator. It refers to the first matching element in the array.

We also use NumberInt(89) so that Mongo writes an integer instead of a floating point number, which is the default.

OK, this is all great, but how do we coerce the C# LINQ provider into executing this query? C# doesn’t have a positional operator! And whatabout the $elemMatch projection? There’s no LINQ method called ElementMatch or anything…

It turns out the Mongo Driver LINQ provider translates the LINQ Any method into $elemMatch. So the expression cls => cls.Students.Any(s => s.Name == "Bob") will give us that query.

That makes a bit of sense: it lines up with how you might query an in memory list of Classes using LINQ to Objects.

But what about the Positional Operator? Unfortunately, since there’s no C# language support for a custom operator like this, the authors of the C# driver have special-cased an index of -1 to mean “use the positional operator”.

So the whole thing winds up looking like this:

FindOneAndUpdate.cs
var client = new MongoClient("mongodb://localhost:27017")
var db = client.GetDatabase("school");
var classes = db.GetCollection<Class>("classes");

classes.FindOneAndUpdate(
    c => c.Code == "CS321" && c.Students.Any(s => s.Name == "Bob"), // find this match
    Builders<Class>.Update.Set(c => c.Students[-1].Grade, 72));     // -1 means update first matching array element

I think the hard-coded -1 index¹ is confusing. You have to know that the LINQ provider handles that in a different way than it would work in LINQ to objects. You could use a constant, like PositionalOperator to bring clarity, but I’d rather see an extension method provided by Mongo called something like FirstMatching(). Then it might look like:

ProposedSyntax.cs
// NOT SUPPORTED SYNTAX
classes.FindOneAndUpdate(
    c => c.Code == "CS321" && c.Students.Any(s => s.Name == "Bob"),
    Builders<Class>.Update.Set(c => c.Students.FirstMatching().Grade, 72));

Anyway, good luck.

1.Note that if you've modeled the Students property as IEnumerable<Student> you won't be able to apply the indexer ([]) in your LINQ expression. In that case you can use c.Students.ElementAt(-1) for the same effect. ↩