Connor Forsyth - Why I quickly moved from MongoDB to RethinkDB

I recently made a difficult decision to move a relatively young feathers.js project away from MongoDB towards RethinkDB.

To most, the idea sounds pretty insane. I moved away from a well-backed, well-funded NoSQL database with clear best practices and a mature ecosystem of ORMs and open source tools.

RethinkDB development, on the other hand, had recently ground to a halt, the ecosystem lacked any actively maintained ORMs and, just a few months earlier, the company behind it had gone bust.

It helps to have some context around what type of app I’m building before I try to justify my reasoning.

My Project and Requirements

I am building a highly reactive, real-time application with Vue.js and feathers.js. It integrates with a number of third parties and will handle a large volume of messages from various possible sources.

I needed to group conversations and messages together, but be able to query messages separately. I needed a NoSQL structure because there is a lot of custom metadata that can be attached to a conversation.

At first I tried to use MongoDB and model this with messages as embedded documents in a conversation.

Unfortunately there are serious limitations with this approach.

You can’t easily query each message on its own (you end up pulling out the conversation that contains it in its entirety). When you issue an update to a message, you are writing to the conversation document as a whole. This can lead to serious write-locks in a real-time app where people are collaborating on each conversation. As I was pushing updates from external APIs at varying times (metadata on location, open rates, etc.), I needed the ability to make cheap and quick updates.

The alternative model in MongoDB would be to have messages in their own collection. While this solves write-locks and offers full querying, it still leads to the major inefficiency of having to make two lookups (one for the conversation, and one for the messages). It basically crams a relational model into a document-based storage, and you get a lot of negatives from both storage models with few of the benefits of each.
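
As a rough illustration of the two shapes described above, here is a sketch using plain in-memory JavaScript objects rather than real MongoDB calls. All field names (metadata, body, opened) and the helper functions are made up for the example.

```javascript
// Shape 1: messages embedded in the conversation document.
// Any change to a message is a change to this one document.
const embeddedConversation = {
  id: 'c1',
  metadata: { channel: 'email' },
  messages: [
    { id: 'm1', body: 'Hello', opened: false },
    { id: 'm2', body: 'Hi there', opened: false },
  ],
};

// Shape 2: messages stored separately, each pointing back at its conversation.
const conversations = [{ id: 'c1', metadata: { channel: 'email' } }];
const messages = [
  { id: 'm1', conversationId: 'c1', body: 'Hello', opened: false },
  { id: 'm2', conversationId: 'c1', body: 'Hi there', opened: false },
];

// Reading a full conversation from shape 2 takes two lookups.
function loadConversation(conversationId) {
  const conversation = conversations.find(c => c.id === conversationId); // lookup 1
  if (!conversation) return null;
  const related = messages.filter(m => m.conversationId === conversationId); // lookup 2
  return { ...conversation, messages: related };
}

// Updating one message in shape 2 touches only that message.
function markOpened(messageId) {
  const message = messages.find(m => m.id === messageId);
  if (message) message.opened = true;
  return message;
}
```

Shape 2 gives cheap per-message updates, but every full read pays for the second lookup unless the database can join the two for you.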

What I was really looking for was full query flexibility on a separate message collection, joined with a flexible ‘schema-less’ conversation. RethinkDB checked these boxes straight away.

RethinkDB is Truly Real-Time

RethinkDB has one serious advantage in its arsenal: it pushes real-time updates to the client. This allows you to ‘subscribe’ to a query and simply receive new data whenever something changes. You can do something similar with MongoDB oplog, but the only notable framework using that method is Meteor.js.

With RethinkDB, you get real-time out of the box.
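
To give a feel for what ‘subscribing’ to a query looks like at the database level, here is a minimal sketch using a changefeed. The function name watchMessages and the onChange callback are hypothetical; r is assumed to be the rethinkdb driver module and conn an already open connection, both passed in by the caller.

```javascript
// Subscribe to every change affecting the messages of one conversation.
// r: the rethinkdb driver module, conn: an open connection (both supplied by the caller).
function watchMessages(r, conn, conversationId, onChange) {
  r.table('messages')
    .filter({ conversationId })
    .changes() // turns the query into a changefeed
    .run(conn, (runError, cursor) => {
      if (runError) throw runError;
      // The cursor stays open and yields one change object per write.
      cursor.each((cursorError, change) => {
        if (cursorError) throw cursorError;
        // new_val is null for deletes, old_val is null for inserts.
        onChange(change.new_val, change.old_val);
      });
    });
}
```

A caller would use it along the lines of watchMessages(r, conn, 'c1', (newVal, oldVal) => console.log(newVal)), after connecting with the driver.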

When you combine the real-time nature of RethinkDB with feathers.js, a framework designed for real-time applications, you get ridiculously simple reactive data in your backend and front-end. For example, if you use the official feathers-reactive package freely available on npm, you can do something along the lines of:

this.subscription = app.service('messages').find({ query: { conversationId: this.conversationId } })
    .subscribe(messages => { this.messages = messages; });

It’s that easy to reactively update your messages.

Given RethinkDB is real-time at the back, you don’t have to handle everything through feathers.js either. If you have a background worker processing message metadata, you can update the message directly with the database and that information will still be propagated to the front-end of any subscribed client in real-time.

The Linux Foundation

One of my main reservations about using RethinkDB was, understandably, that the company behind it no longer exists. In recent weeks that concern has largely been put to rest: The Linux Foundation (which hosts Node.js, among other projects) has purchased and relicensed the database technology. This makes me far more confident that it will continue to receive updates and that development will start again shortly.

My Data Model

As with any change of database, I needed to adapt the data model slightly.

Carrying on with my earlier example, instead of having messages embedded in a conversation, they’re now in their own table. I set a reference field (conversationId) on each of the messages. This allows me to join them when I want and to populate a ‘messages’ property on my conversation object before it exits the data layer of my backend application.

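// Attach each conversation's messages as an array on a 'messages' property.
// Assumes a secondary index named 'conversationId' on the messages table.
db.table('conversations')
    .merge(conversation => ({
        messages: db.table('messages')
            .getAll(conversation('id'), { index: 'conversationId' })
            .coerceTo('array')
    }));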
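
Because each message carries a conversationId, messages can also be queried on their own. The sketch below assumes a secondary index named conversationId on the messages table and a hypothetical createdAt field for ordering; the function names are made up, and r is the rethinkdb driver module passed in by the caller. Both functions only build a query, so the caller still has to call .run(conn) on the result.

```javascript
// One-off setup: build a secondary index on the reference field.
// The index name 'conversationId' is an assumption of this sketch.
function createConversationIndex(r) {
  return r.table('messages').indexCreate('conversationId');
}

// Query the messages of one conversation without loading the conversation itself.
// 'createdAt' is a hypothetical timestamp field used only for ordering.
function messagesForConversation(r, conversationId) {
  return r.table('messages')
    .getAll(conversationId, { index: 'conversationId' })
    .orderBy('createdAt');
}
```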

I’m used to working with an ORM, so I have to admit it makes me feel slightly nervous no longer having one.

RethinkDB does have one notable ORM called thinky, but it is no longer actively maintained. I considered using it anyway and maintaining a fork with any changes I need as my application grows, but this would have a real learning cost and my needs are actually relatively simple at the moment - I don’t need anything more than simple joins that can be easily abstracted. Plus, I don’t really have a ‘domain model’ in my Node.js app - my objects are typically data-only with no behaviour, and are exchanged like messages. In this type of situation I would gain very little by using an ORM.

I am going to do a full writeup of the layer of hooks (middleware) I plan to use in order to tame the raw database queries in my application, so stay tuned for that in the coming days!