Population
MongoDB has the join-like $lookup aggregation operator in versions >= 3.2. Mongoose has a more powerful alternative called populate()
, which lets you reference documents in other collections.
Population is the process of automatically replacing the specified paths in the document with document(s) from other collection(s). We may populate a single document, multiple documents, plain object, multiple plain objects, or all objects returned from a query. Let's look at some examples.
var mongoose = require('mongoose');
var Schema = mongoose.Schema;
var personSchema = Schema({
_id: Schema.Types.ObjectId,
name: String,
age: Number,
stories: [{ type: Schema.Types.ObjectId, ref: 'Story' }]
});
var storySchema = Schema({
author: { type: Schema.Types.ObjectId, ref: 'Person' },
title: String,
fans: [{ type: Schema.Types.ObjectId, ref: 'Person' }]
});
var Story = mongoose.model('Story', storySchema);
var Person = mongoose.model('Person', personSchema);
So far we've created two Models. Our Person
model has
its stories
field set to an array of ObjectId
s. The ref
option is
what tells Mongoose which model to use during population, in our case
the Story
model. All _id
s we store here must be document _id
s from
the Story
model.
Note: ObjectId
, Number
, String
, and Buffer
are valid for use
as refs. However, you should use ObjectId
unless you are an advanced
user and have a good reason for doing so.
Saving refs
Saving refs to other documents works the same way you normally save
properties, just assign the _id
value:
var author = new Person({
_id: new mongoose.Types.ObjectId(),
name: 'Ian Fleming',
age: 50
});
author.save(function (err) {
if (err) return handleError(err);
var story1 = new Story({
title: 'Casino Royale',
author: author._id // assign the _id from the person
});
story1.save(function (err) {
if (err) return handleError(err);
// thats it!
});
});
Population
So far we haven't done anything much different. We've merely created a Person
and a Story
. Now let's take a look at populating our story's author
using the query builder:
Story.
findOne({ title: 'Casino Royale' }).
populate('author').
exec(function (err, story) {
if (err) return handleError(err);
console.log('The author is %s', story.author.name);
// prints "The author is Ian Fleming"
});
Populated paths are no longer set to their original _id
, their value
is replaced with the mongoose document returned from the database by
performing a separate query before returning the results.
Arrays of refs work the same way. Just call the
populate method on the query and an
array of documents will be returned in place of the original _id
s.
Note: mongoose >= 3.6 exposes the original _ids
used during
population through the
document#populated() method.
Setting Populated Fields
In Mongoose >= 4.0, you can manually populate a field as well.
Story.findOne({ title: 'Casino Royale' }, function(error, story) {
if (error) {
return handleError(error);
}
story.author = author;
console.log(story.author.name); // prints "Ian Fleming"
});
Note that this only works for single refs. You currently can't manually populate an array of refs.
Field selection
What if we only want a few specific fields returned for the populated documents? This can be accomplished by passing the usual field name syntax as the second argument to the populate method:
Story.
findOne({ title: /casino royale/i }).
populate('author', 'name'). // only return the Persons name
exec(function (err, story) {
if (err) return handleError(err);
console.log('The author is %s', story.author.name);
// prints "The author is Ian Fleming"
console.log('The authors age is %s', story.author.age);
// prints "The authors age is null'
})
Populating multiple paths
What if we wanted to populate multiple paths at the same time?
Story.
find(...).
populate('fans').
populate('author').
exec()
If you call populate()
multiple times with the same path, only the last
one will take effect.
// The 2nd `populate()` call below overwrites the first because they
// both populate 'fans'.
Story.
find().
populate({ path: 'fans', select: 'name' }).
populate({ path: 'fans', select: 'email' });
// The above is equivalent to:
Story.find().populate({ path: 'fans', select: 'email' });
Query conditions and other options
What if we wanted to populate our fans array based on their age, select just their names, and return at most, any 5 of them?
Story.
find(...).
populate({
path: 'fans',
match: { age: { $gte: 21 }},
// Explicitly exclude `_id`, see http://bit.ly/2aEfTdB
select: 'name -_id',
options: { limit: 5 }
}).
exec()
Refs to children
We may find however, if we use the author
object, we are unable to get a
list of the stories. This is because no story
objects were ever 'pushed'
onto author.stories
.
There are two perspectives here. First, you may want the author
know
which stories are theirs. Usually, your schema should resolve
one-to-many relationships by having a parent pointer in the 'many' side.
But, if you have a good reason to want an array of child pointers, you
can push()
documents onto the array as shown below.
author.stories.push(story1);
author.save(callback);
This allows us to perform a find
and populate
combo:
Person.
findOne({ name: 'Ian Fleming' }).
populate('stories'). // only works if we pushed refs to children
exec(function (err, person) {
if (err) return handleError(err);
console.log(person);
});
It is debatable that we really want two sets of pointers as they may get
out of sync. Instead we could skip populating and directly find()
the
stories we are interested in.
Story.
find({ author: author._id }).
exec(function (err, stories) {
if (err) return handleError(err);
console.log('The stories are an array: ', stories);
});
The documents returned from
query population become fully
functional, remove
able, save
able documents unless the
lean option is specified. Do not confuse
them with sub docs. Take caution when calling its
remove method because you'll be removing it from the database, not just
the array.
Populating an existing document
If we have an existing mongoose document and want to populate some of its paths, mongoose >= 3.6 supports the document#populate() method.
Populating multiple existing documents
If we have one or many mongoose documents or even plain objects
(like mapReduce output), we may
populate them using the Model.populate()
method available in mongoose >= 3.6. This is what document#populate()
and query#populate()
use to populate documents.
Populating across multiple levels
Say you have a user schema which keeps track of the user's friends.
var userSchema = new Schema({
name: String,
friends: [{ type: ObjectId, ref: 'User' }]
});
Populate lets you get a list of a user's friends, but what if you also
wanted a user's friends of friends? Specify the populate
option to tell
mongoose to populate the friends
array of all the user's friends:
User.
findOne({ name: 'Val' }).
populate({
path: 'friends',
// Get friends of friends - populate the 'friends' array for every friend
populate: { path: 'friends' }
});
Populating across Databases
Let's say you have a schema representing events, and a schema representing conversations. Each event has a corresponding conversation thread.
var eventSchema = new Schema({
name: String,
// The id of the corresponding conversation
// Notice there's no ref here!
conversation: ObjectId
});
var conversationSchema = new Schema({
numMessages: Number
});
Also, suppose that events and conversations are stored in separate MongoDB instances.
var db1 = mongoose.createConnection('localhost:27000/db1');
var db2 = mongoose.createConnection('localhost:27001/db2');
var Event = db1.model('Event', eventSchema);
var Conversation = db2.model('Conversation', conversationSchema);
In this situation, you will not be able to populate()
normally.
The conversation
field will always be null, because populate()
doesn't
know which model to use. However,
you can specify the model explicitly.
Event.
find().
populate({ path: 'conversation', model: Conversation }).
exec(function(error, docs) { /* ... */ });
This is known as a "cross-database populate," because it enables you to populate across MongoDB databases and even across MongoDB instances.
Dynamic References
Mongoose can also populate from multiple collections at the same time. Let's say you have a user schema that has an array of "connections" - a user can be connected to either other users or an organization.
var userSchema = new Schema({
name: String,
connections: [{
kind: String,
item: { type: ObjectId, refPath: 'connections.kind' }
}]
});
var organizationSchema = new Schema({ name: String, kind: String });
var User = mongoose.model('User', userSchema);
var Organization = mongoose.model('Organization', organizationSchema);
The refPath
property above means that mongoose will look at the
connections.kind
path to determine which model to use for populate()
.
In other words, the refPath
property enables you to make the ref
property dynamic.
// Say we have one organization:
// `{ _id: ObjectId('000000000000000000000001'), name: "Guns N' Roses", kind: 'Band' }`
// And two users:
// {
// _id: ObjectId('000000000000000000000002')
// name: 'Axl Rose',
// connections: [
// { kind: 'User', item: ObjectId('000000000000000000000003') },
// { kind: 'Organization', item: ObjectId('000000000000000000000001') }
// ]
// },
// {
// _id: ObjectId('000000000000000000000003')
// name: 'Slash',
// connections: []
// }
User.
findOne({ name: 'Axl Rose' }).
populate('connections.item').
exec(function(error, doc) {
// doc.connections[0].item is a User doc
// doc.connections[1].item is an Organization doc
});
Populate Virtuals
New in 4.5.0
So far you've only populated based on the _id
field. However, that's
sometimes not the right choice.
In particular, arrays that grow without bound are a MongoDB anti-pattern.
Using mongoose virtuals, you can define more sophisticated relationships
between documents.
var PersonSchema = new Schema({
name: String,
band: String
});
var BandSchema = new Schema({
name: String
});
BandSchema.virtual('members', {
ref: 'Person', // The model to use
localField: 'name', // Find people where `localField`
foreignField: 'band', // is equal to `foreignField`
// If `justOne` is true, 'members' will be a single doc as opposed to
// an array. `justOne` is false by default.
justOne: false
});
var Person = mongoose.model('Person', PersonSchema);
var Band = mongoose.model('Band', BandSchema);
/**
* Suppose you have 2 bands: "Guns N' Roses" and "Motley Crue"
* And 4 people: "Axl Rose" and "Slash" with "Guns N' Roses", and
* "Vince Neil" and "Nikki Sixx" with "Motley Crue"
*/
Band.find({}).populate('members').exec(function(error, bands) {
/* `bands.members` is now an array of instances of `Person` */
});
Keep in mind that virtuals are not included in toJSON()
output by
default. If you want populate virtuals to show up when using functions
that rely on JSON.stringify()
, like Express'
res.json()
function,
set the virtuals: true
option on your schema's toJSON
options.
// Set `virtuals: true` so `res.json()` works
var BandSchema = new Schema({
name: String
}, { toJSON: { virtuals: true } });
If you're using populate projections, make sure foreignField
is included
in the projection.
Band.
find({}).
populate({ path: 'members', select: 'name' }).
exec(function(error, bands) {
// Won't work, foreign field `band` is not selected in the projection
});
Band.
find({}).
populate({ path: 'members', select: 'name band' }).
exec(function(error, bands) {
// Works, foreign field `band` is selected
});
Next Up
Now that we've covered query population, let's take a look at connections.