Federico Gambarino

MongoDB Bulk operations with the C# driver

July 20, 2020
MongoDB, C#

A few days ago I worked on a technical task related to a user story: I had to find a way to update a fairly large MongoDB collection from a .NET Core application. It was not a simple UpdateMany situation, where you update all the documents with the same value; here each document had to be updated following a specific logic.
For a very simple logic I would suggest an update that uses the Aggregation Pipeline, so that MongoDB can apply its own optimizations. For this task, though, we considered a different approach. The solution suggested to me was to use the BulkWriteAsync method that the MongoDB driver offers.
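
To give an idea of the Aggregation Pipeline alternative, here is a minimal sketch. It assumes MongoDB 4.2 or later (where updates accept a pipeline) and a driver version that exposes Update.Pipeline. The field names price, quantity and total are made up for illustration and have nothing to do with my task.

```csharp
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class PipelineUpdateSketch
{
    // Builds a single $set stage: total = price * quantity (hypothetical fields).
    public static BsonDocument BuildSetStage() =>
        new BsonDocument("$set",
            new BsonDocument("total",
                new BsonDocument("$multiply", new BsonArray { "$price", "$quantity" })));

    // Applies the stage to every document; the computation runs on the server.
    public static Task<UpdateResult> RunAsync(IMongoCollection<BsonDocument> collection)
    {
        var pipeline = PipelineDefinition<BsonDocument, BsonDocument>.Create(new[] { BuildSetStage() });
        var update = Builders<BsonDocument>.Update.Pipeline(pipeline);
        return collection.UpdateManyAsync(Builders<BsonDocument>.Filter.Empty, update);
    }
}
```

With this approach no document travels to the application, which is why it is attractive when the logic can be expressed with aggregation operators.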

The BulkWriteAsync method

The BulkWriteAsync method takes an IEnumerable of WriteModel (an abstract class) as input, plus an optional BulkWriteOptions as second parameter, and it wraps the db.collection.bulkWrite Mongo Shell method. You can perform several operations with it, each one a derived class of WriteModel, and they are equivalent to the Mongo Shell ones:

| Mongo Shell | MongoDB C# Driver | Action |
| --- | --- | --- |
| insertOne | InsertOneModel | Inserts the document provided |
| updateOne | UpdateOneModel | Updates one document that matches the filter definition |
| updateMany | UpdateManyModel | Updates all the documents that match the filter definition |
| replaceOne | ReplaceOneModel | Replaces one document that matches the filter definition with the document provided |
| deleteOne | DeleteOneModel | Deletes one document that matches the filter definition |
| deleteMany | DeleteManyModel | Deletes all the documents that match the filter definition |

Pretty straightforward, right?
If you provide a list of operations to BulkWriteAsync/bulkWrite, they will be executed in bulk, either in an ordered or in an unordered way; that is something you can specify in the BulkWriteOptions parameter.
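
As a rough sketch of what such a list can look like, the snippet below builds one request per operation type and sends them unordered. The class name, the field names and the _id values are invented for illustration; the requests are deliberately independent of each other, because with IsOrdered = false the server is free to run them in any order (ordered execution is the default).

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class BulkWriteSketch
{
    // Builds one request of each kind; nothing is sent to the server here.
    public static List<WriteModel<BsonDocument>> BuildRequests()
    {
        var filter = Builders<BsonDocument>.Filter;
        var update = Builders<BsonDocument>.Update;

        return new List<WriteModel<BsonDocument>>
        {
            new InsertOneModel<BsonDocument>(new BsonDocument { { "_id", 1 }, { "status", "new" } }),
            new UpdateOneModel<BsonDocument>(filter.Eq("_id", 2), update.Set("status", "active")),
            new UpdateManyModel<BsonDocument>(filter.Eq("status", "old"), update.Set("status", "archived")),
            new ReplaceOneModel<BsonDocument>(filter.Eq("_id", 3), new BsonDocument { { "_id", 3 }, { "status", "replaced" } }),
            new DeleteOneModel<BsonDocument>(filter.Eq("_id", 4)),
            new DeleteManyModel<BsonDocument>(filter.Eq("status", "expired"))
        };
    }

    // Sends all the requests in one bulk call, without any ordering guarantee.
    public static Task<BulkWriteResult<BsonDocument>> RunAsync(IMongoCollection<BsonDocument> collection)
    {
        var options = new BulkWriteOptions { IsOrdered = false };
        return collection.BulkWriteAsync(BuildRequests(), options);
    }
}
```

Keep the default ordered mode when a later operation depends on an earlier one, since an ordered bulk write stops at the first error.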

BulkWriteAsync in action

I will now show you a small example, based on what I did to complete the task assigned to me.
Let’s pretend that the documents in the collection to update have this kind of structure:

using MongoDB.Bson.Serialization.Attributes;

public class ExampleModel
{
	[BsonId]
	public string Id { get; set; }
	public string Name { get; set; }
	public string Surname { get; set; }
	public int BirthYear { get; set; }
	// new property, still missing from the stored documents
	public string NickName { get; set; }
}

Every property already exists in the documents of our MongoDB collection, except for NickName, which is a recent requirement. Let’s pretend that the management decided that every user now needs a unique nickname that is easy to remember. Something like:

  • first 3 letters of the Name
  • first 4 letters of the Surname
  • birth year

all in lowercase, so that John Johnson, born in 1976, will have johjohn1976 as nickname.

Note: for the sake of simplicity let’s suppose that names always have three or more letters, surnames always have four or more letters, and that the generated nicknames never conflict.
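
To make the rule concrete, here is a tiny standalone sketch of it, with no MongoDB involved. The NickNames class and its Prefix helper are names I made up for this illustration; unlike the simplifying assumption above, the helper also tolerates shorter names by taking whatever letters are available.

```csharp
using System;

public static class NickNames
{
    // First 3 letters of the name + first 4 of the surname + birth year, lower-cased.
    public static string Generate(string name, string surname, int birthYear)
    {
        return (Prefix(name, 3) + Prefix(surname, 4) + birthYear).ToLowerInvariant();
    }

    // Returns at most `length` leading characters, so short values do not throw.
    private static string Prefix(string value, int length) =>
        value.Length <= length ? value : value.Substring(0, length);
}
```

For example, NickNames.Generate("John", "Johnson", 1976) returns johjohn1976.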

A possible solution is the following:

using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
using MongoDB.Driver;

public async Task RunBulkUpdates(IMongoDatabase database)
{
	var collectionToModify = database.GetCollection<ExampleModel>("exampleUsers");
	var listWrites = new List<WriteModel<ExampleModel>>();
	var filterToGetAllDocuments = Builders<ExampleModel>.Filter.Empty;
	// ask the server for batches of 1000 documents at a time
	var options = new FindOptions<ExampleModel, ExampleModel>
	{
		BatchSize = 1000
	};

	using (var cursor = await collectionToModify.FindAsync(filterToGetAllDocuments, options))
	{
		while (await cursor.MoveNextAsync())
		{
			var batch = cursor.Current;

			foreach (var doc in batch)
			{
				// one UpdateOneModel per document, matched by its Id
				var filterForUpdate = Builders<ExampleModel>.Filter.Eq(x => x.Id, doc.Id);
				var nickName = GenerateNickName(doc);
				var updateDefinition = Builders<ExampleModel>.Update.Set(x => x.NickName, nickName);
				listWrites.Add(new UpdateOneModel<ExampleModel>(filterForUpdate, updateDefinition));
			}

			// BulkWriteAsync does not accept an empty list of requests, so skip empty batches
			if (listWrites.Count > 0)
			{
				await collectionToModify.BulkWriteAsync(listWrites);
				listWrites.Clear();
			}
		}
	}

	string GenerateNickName(ExampleModel model)
	{
		// Substring is safe here since we assumed that name and surname have at least 3 and 4 letters
		var sb = new StringBuilder();
		sb.Append(model.Name.Substring(0, 3))
		  .Append(model.Surname.Substring(0, 4))
		  .Append(model.BirthYear);

		// lower-case the result, so that John Johnson (1976) becomes johjohn1976
		return sb.ToString().ToLowerInvariant();
	}
}

To keep it short and simple I left out logging and exception handling, but they are very important. This is just an example!
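
Just to hint at what that handling could look like, here is a hedged sketch of a wrapper around BulkWriteAsync. The SafeBulkWriter class and the WriteBatchAsync method are names of my own, and writing to the console stands in for a real logger. It relies on the driver throwing MongoBulkWriteException when individual writes fail, and it assumes an acknowledged write concern and a server that reports modified counts.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using MongoDB.Driver;

public static class SafeBulkWriter
{
    // Runs one bulk write and returns how many documents were modified.
    public static async Task<long> WriteBatchAsync<T>(
        IMongoCollection<T> collection,
        IReadOnlyCollection<WriteModel<T>> requests)
    {
        // The driver does not accept an empty request list, so there is nothing to do.
        if (requests.Count == 0) return 0;

        try
        {
            var result = await collection.BulkWriteAsync(requests);
            Console.WriteLine($"Matched {result.MatchedCount}, modified {result.ModifiedCount}");
            return result.ModifiedCount;
        }
        catch (MongoBulkWriteException<T> ex)
        {
            // Each write error carries the index of the failing request in the list.
            foreach (var error in ex.WriteErrors)
            {
                Console.Error.WriteLine($"Request #{error.Index} failed: {error.Message}");
            }
            throw; // let the caller decide whether to retry or abort
        }
    }
}
```

In the example above, the call to BulkWriteAsync inside the loop could be swapped for a call to a wrapper like this one.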

The RunBulkUpdates method reads all the documents of the exampleUsers collection, since we have to update them all.
Then, using a cursor and batches of 1000 documents, it defines how to update the nickname of each document. It uses the UpdateOneModel class, since each document will have a unique nickname (and we are even filtering by its specific BSON Id).
The GenerateNickName local function is responsible for generating the nickname according to the rules defined above.
Once the listWrites list holds all the WriteModel operations of the current batch, the BulkWriteAsync method is called; the list is then cleared, ready to store the operations of the next batch.

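BulkWriteAsync is a very efficient way to insert, update or delete large sets of data, because it groups many operations into far fewer round trips to the server than sending them one by one. You should definitely consider it!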

...and that's it for today! Thank you for your time 😃

Federico

