I originally posted this on the Couchbase blog.
Most of us could churn out a first stab at a relational database model while sleeping.
Once you’ve chosen to work with a document database, though, you’ll need to think a little differently.
It’s no more difficult, it’s just that you’re optimising for different things.
So, what are the basics of getting it right?
Well, there are three principles that can help guide your thinking:
- Answer the questions you know you’ll ask.
- Embed data for convenience, refer for integrity.
- Name your keys predictably and semantically.
Today, let’s look at the first of those in detail.
Questions, not answers
When we split our data into the tables, columns and rows of the relational model, we’re optimising for queryability. We’re building a source of almost unbounded answers and deferring our decisions about what questions to ask.
Let’s take a simple example: a stock management system that lets us track Couchbase-branded swag.
In this system, we have t-shirts, USB sticks, pens and that sort of thing. From time to time we get orders to send them out to meet-up groups, conferences and individuals.
Most likely, that’d give us the following relational tables:
At first glance, Order details might seem non-obvious. However, it allows us to store references to all the items in each order without breaking the first normal form. Otherwise, we’d have to serialise the line items from each order into a string and store that in a TEXT column.
Unless we make a mistake, each product, customer and order will appear only once in our database. That gives us a guarantee that updates to records are universal and it makes it supremely easy to query the data in whatever way we choose.
Trade-offs, there are always trade-offs
That way, we get to store the data for our swag management system in its purest form and then query it in whatever way suits us later on.
So, what’s the problem?
Well, there are a couple of trade-offs:
- SQL queries are expensive: for each join there’s a disk seek, there’s CPU overhead, there’s a user waiting for their page to render.
- Relational databases are hard to scale out across a cluster.
Let’s focus on that first trade-off for now: if you’re being unkind, you could say that relational data modelling is an exercise in stealing CPU cycles from your future self.
We know most of our query patterns early on and yet we so often spend time carefully stripping our data of that context. We split the data once and then spend the rest of our applications’ lifetime asking the database server to piece it all back together.
Of course, that has its place but it’s almost the very opposite of the most efficient path with a document database.
Don’t make people wait for answers
The first thing we do when modelling for a document database is ask, “what questions do I want to ask of my data?”
Then when our system state changes, we compute the answers to those questions and store them pre-canned in the database.
Rather than re-piecing the answers each time we make a query, we pull the answer fully formed from the database in a single look-up.
What does that mean in practice?
Let’s go back to our swag management example. One of our questions would be:
What do I need to do in order to fulfill a particular order?
With a relational database, we’d write a SQL query that would find the order, use a join to find the items in the order, then another join to find the detail of what each item is and a further join to find the customer details.
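To make that chain of joins concrete, here’s a minimal sketch using Python’s built-in sqlite3 module. The table and column names are my assumptions for illustration, not the exact schema described above.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical version of the
# products / customers / orders / order details tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id)
);
CREATE TABLE order_details (
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id)
);
INSERT INTO customers VALUES (1, 'Matthew Revell', '11-21 Paul Street');
INSERT INTO products VALUES (1, 'Red Couchbase t-shirt');
INSERT INTO products VALUES (2, 'Black 8GB USB stick with red Couchbase logo');
INSERT INTO orders VALUES (1, 1);
INSERT INTO order_details VALUES (1, 1);
INSERT INTO order_details VALUES (1, 2);
""")


def fulfilment_rows(conn, order_id):
    """Find the order, then join to its line items, products and customer."""
    return conn.execute(
        """
        SELECT c.name, c.address, p.name
        FROM orders o
        JOIN order_details od ON od.order_id = o.id
        JOIN products p ON p.id = od.product_id
        JOIN customers c ON c.id = o.customer_id
        WHERE o.id = ?
        ORDER BY p.id
        """,
        (order_id,),
    ).fetchall()


rows = fulfilment_rows(conn, 1)
```

Every call to `fulfilment_rows` asks the database to piece the order back together from four tables, even though the answer never changes once the order has been placed.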
With Couchbase, it would look something more like this:
- User selects their items and makes their order.
- Our system writes the order into the database as a single document.
- When we need to pull out order details, it’s one read to grab the lot.
The resultant order document might look like this:
{
  "customer": {
    "name": "Matthew Revell",
    "address": "11-21 Paul Street"
  },
  "items": [
    { "itemName": "Red Couchbase t-shirt" },
    { "itemName": "Black 8GB USB stick with red Couchbase logo" }
  ]
}
This example is pretty heavily denormalised. Our swag management system would have details of each customer and product in separate documents already but we’re repeating what we know about them by embedding their details here.
We’ll look at the trade-offs involved with embedding data versus referring to data in a future post.
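As a rough sketch of what embedding at write time could look like, here’s some Python in which a plain dict stands in for the document database. The key names (`customer::1`, `order::1` and so on), the `customer` and `items` fields and both helper functions are assumptions for illustration; this isn’t Couchbase SDK code.

```python
import copy

# A plain dict stands in for the document database (key -> document).
store = {
    "customer::1": {"name": "Matthew Revell", "address": "11-21 Paul Street"},
    "product::1": {"itemName": "Red Couchbase t-shirt"},
    "product::2": {"itemName": "Black 8GB USB stick with red Couchbase logo"},
}


def write_order(store, order_id, customer_key, product_keys):
    """Build the denormalised order document once, when the order is placed."""
    order = {
        # Copies, not references: the order keeps its own snapshot.
        "customer": copy.deepcopy(store[customer_key]),
        "items": [copy.deepcopy(store[key]) for key in product_keys],
    }
    store[f"order::{order_id}"] = order
    return order


def read_order(store, order_id):
    """Everything needed to fulfil the order comes back from one look-up."""
    return store[f"order::{order_id}"]


write_order(store, 1, "customer::1", ["product::1", "product::2"])
order = read_order(store, 1)
```

Because the customer and product details are copied in, a later change to `customer::1` doesn’t touch the order document. That’s the convenience side of the embed-versus-refer trade-off.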
Production is more complex
In a production system, we’d most likely generate several of those ready-made answers. Without really thinking about it, we can probably come up with a few of them for our swag management system:
- Customer order history: customers want to look back at everything they’ve ordered, so we’d record this order against that person.
- Live order status: similarly, our customers will want to see the status of their order, so we can generate that information now and then update whenever anything changes.
- Dispatch instructions: we’ll need to tell the people in our warehouse what they need to send where.
The important thing is that we don’t have to generate all of these while a human being is waiting for a response.
While we might want to update the customer’s live order status immediately, so that they can verify their order was recorded, it’s fine to process the dispatch instruction asynchronously.
This way, when a human views something in our system, the data is already there waiting.
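Here’s a minimal sketch of that split between immediate and deferred answers. A dict stands in for the database and a `queue.Queue` stands in for whatever background job system you’d really use; the key names and the choice to defer the order history are my assumptions.

```python
import queue

store = {}                        # stand-in for the document database
background_jobs = queue.Queue()   # stand-in for a real job queue


def place_order(order_id, customer_id, items, address):
    # Written while the customer waits: the order and its live status.
    store[f"order::{order_id}"] = {
        "customer": customer_id,
        "items": items,
        "address": address,
    }
    store[f"status::{order_id}"] = {"state": "received"}
    # Answers nobody needs this instant are queued for later.
    background_jobs.put(("history", order_id))
    background_jobs.put(("dispatch", order_id))


def run_background_jobs():
    """Drain the queue, pre-computing the remaining answers."""
    while not background_jobs.empty():
        kind, order_id = background_jobs.get()
        order = store[f"order::{order_id}"]
        if kind == "history":
            key = f"history::{order['customer']}"
            history = store.setdefault(key, {"orders": []})
            history["orders"].append(order_id)
        elif kind == "dispatch":
            store[f"dispatch::{order_id}"] = {
                "send": order["items"],
                "to": order["address"],
            }


place_order(1, "matthew", ["Red Couchbase t-shirt"], "11-21 Paul Street")
dispatch_ready_immediately = "dispatch::1" in store
run_background_jobs()
```

The customer sees their order status straight away, while the dispatch instruction only appears once the background work has run.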
In summary: pre-compute your answers
The first step to efficient document database modelling is to think in terms of pre-computing your answers.
We know up-front what questions we want to ask and so we can cater to those questions in the way that we write our data.
While there’s a lot of intellectual appeal in the idea of a pure, normalised, mathematically sound representation of our data, in practice it can make it harder to serve multiple concurrent users and to scale our operation as demand changes.
By computing our various answers at write time, we:
- remove lag from our user experience
- can more easily distribute the data across a cluster
- get the added bonus of less mismatch between object state and what’s in the database.
Next time I’ll look at when to embed data and when to refer data.
Last week we held London’s first developer conference dedicated to scalable systems: Span.
Over 100 people attended, 11 people gave talks and six companies sponsored. Here’s my take on what went well and what didn’t.
Why a scaling conference in London?
Working as an evangelist/advocate, I’ve been to a lot of developer conferences.
Most are pretty good but there are some things that I, along with friends, had noticed made for a poorer conference experience. Namely:
- Sponsored talks: attendee happiness and engagement appear to be inversely proportional to the number of sponsored talks. Some conferences appear to reject good talks in the hope of encouraging the purchase of a speaking slot.
- Vendor presentations in disguise: even when the talks aren’t sponsored, direct sales pitches creep into the programme and are usually dull.
- Drinks: coffee is often horrible, cold drinks are usually unhealthy and, most important of all, they all run out too quickly.
- Multiple tracks: no doubt, there are good reasons for multi-track conferences but one track of great talks is better than two tracks of reasonably interesting talks.
- Poorly defined topic/veering from the topic: there has been a creeping tendency for developer conferences to mix things up with non-developer talks and it’s very hard to make these work well.
- Talks without take-aways: sometimes it’s good to have a talk that deals with the abstract but most people are willing to take a day out for a conference that will give them stuff they can put into action.
So, I had opinions on what made some conferences less appealing to me.
It also seemed that there was a gap, in London at least, for a conference that dealt with building scalable systems.
Scalability is where so much interesting stuff is happening right now. Whether we’re building small web apps in our spare time or working on multi-million dollar applications, dealing with scale is something we either aspire to or face each day.
Each of the many London developer conferences was either narrowly focused on an aspect of something related to scaling – for example, NoSQL databases or microservices – or much broader in scope, giving only one or two slots to scalability topics.
A recipe for a scaling conference that’d make me happy
Towards the beginning of this year I began toying with ideas and came up with a list of what I’d want to see from a scaling conference:
- No sales pitches.
- Every talk had to stick to the topic.
- Shorter talks that got to the point.
- Practical takeaways, ideally focusing on war stories and lessons learned.
- A single track on a single day.
- Good food, good coffee, good opportunities to chat to other attendees.
Pretty quickly I pulled Phil Fehre into it and we started making plans.
Almost the first thing we did was to form a programme committee of people we knew who had professional experience working in distributed systems. Although we had experience with the topic, we wanted the programme to be validated by a larger group. There was also the issue of making the event look larger than two blokes who fancied running a conference.
So, we had the topic, we had a formula and we had a programme committee. Next came the scary part.
It’s almost entirely guesswork
Conferences are expensive to run.
Even if you aim to keep costs down, they’re beyond the level at which a normal person could comfortably cover the costs if things went wrong.
So, you’re taking a personal risk of several thousand pounds based on an instinct that you have a good topic, that interesting people will want to speak and that you have the wherewithal to make it all happen.
That leaves a terrifying period where you build something that looks like a conference but without any income. At the very least you need a venue and a date, which means making the following commitment: if someone else enquires about the date you’ve reserved, you have to be ready to pay a sizeable deposit (often 100%) within 24 hours.
So, you reserve a date and hope that you can convince at least one sponsor to support the event before you have to pay for the venue. At that stage you’re selling an idea and yourself: yes, this is a good idea and yes I have both the skills and the contacts to really make this happen.
We were lucky. Within a few minutes of tweeting, we had secured our first sponsor. That only made things more terrifying: now a real company was betting real money on our ability to execute.
We opened the call for papers and spread word of it wherever we could. Our programme committee were great at helping there. Talk submissions trickled in and, in time, we had a good selection. People have interesting things to say and, luckily, most people paid attention to our requirements. Of course, some thinly disguised vendor pitches came our way and we rejected them politely.
This was the first point at which I noticed how much grunt work the conference required. Getting talk submissions required writing a lot of emails hoping to persuade people that Span would be an interesting place for them to speak. If you can’t doggedly persist at repeated tasks, while remaining human and engaging, then you might not want to run a conference.
We ran a voting system where each member of the programme committee would give a +1, neutral or -1 for a talk, along with comments that we hoped would be helpful to rejected submitters.
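For the curious, a tally for that sort of vote can be tiny. This is a sketch rather than the tool we used: summing the scores and ranking by total is an assumption, and the talk names and comment are made up.

```python
from collections import defaultdict


def tally_votes(votes):
    """Sum +1 / 0 / -1 votes per talk and collect any reviewer comments.

    votes: iterable of (talk, score, comment) tuples.
    Returns (talks ranked by total, totals per talk, comments per talk).
    """
    totals = defaultdict(int)
    comments = defaultdict(list)
    for talk, score, comment in votes:
        if score not in (-1, 0, 1):
            raise ValueError(f"score must be -1, 0 or +1, got {score!r}")
        totals[talk] += score
        if comment:
            comments[talk].append(comment)
    # sorted() is stable, so tied talks keep their submission order.
    ranked = sorted(totals, key=lambda talk: totals[talk], reverse=True)
    return ranked, dict(totals), dict(comments)


votes = [
    ("Talk A", 1, ""),
    ("Talk B", -1, "Reads like a vendor pitch"),
    ("Talk A", 0, ""),
    ("Talk B", 1, ""),
]
ranked, totals, comments = tally_votes(votes)
```

Keeping the comments alongside the totals means there’s something useful to send back to people whose talks don’t make the cut.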
After a while we had a lot of good talk submissions.
We wanted Span to be a conference at which everyone felt comfortable, so we adopted the Ada Initiative’s code of conduct.
It was also important to us that our speakers were a diverse group. After quite a few weeks, we had only one submission from a non-male speaker and that was entirely off-topic for the conference.
Spoiler alert: we didn’t do well with this. I put a lot of effort, worry and time into trying to resolve this. I contacted several groups who were helpful in spreading the word and I directly contacted speakers who had on-topic experience and who would increase the diversity of our programme. For whatever reason, I wasn’t successful in persuading non-male speakers to submit talks.
That’s a matter of disappointment for me and it’s a priority to fix that should Span happen again.
Choosing a ticket price
Honestly, we had no idea how much to charge for tickets.
There were basic costs we couldn’t ignore: catering per head came to around £40. Ticketing and payment provider costs took at least £5 from each ticket.
Sponsorships helped but we had to hit a sizeable number of ticket sales in order to cover fixed costs.
After looking at similarly sized conferences, we came up with:
- £39 for students (thereby making a loss on each one sold)
- £149 for the early bird
- £175 for the standard ticket.
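To see what each ticket type contributed towards fixed costs, here’s the arithmetic as a small Python sketch. It uses the rough per-head figures above (catering at around £40, fees of at least £5) and these initial prices, so treat the results as upper bounds rather than accounts.

```python
CATERING_PER_HEAD = 40   # "around £40" per person
FEES_PER_TICKET = 5      # ticketing and payment fees, "at least £5"

PRICES = {"student": 39, "early bird": 149, "standard": 175}


def contribution(price):
    """What one ticket leaves over for fixed costs after per-head costs."""
    return price - CATERING_PER_HEAD - FEES_PER_TICKET


margins = {name: contribution(price) for name, price in PRICES.items()}
# student: 39 - 45 = -6, early bird: 149 - 45 = 104, standard: 175 - 45 = 130
```

So each student ticket cost us at least £6, while every early bird ticket put roughly £100 towards the venue and everything else.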
Before the early bird sold out, I panicked a little and changed the standard ticket to £155 as I felt £175 was too high for an unknown conference. I also wanted to make sure as many people could afford to come as possible.
Not everyone paid these prices. We offered discounts of 20% or more to groups and also to certain meet-up groups who helped us with promotion.
We’ve had just one complaint that discounts were made available to certain groups. I think it’s fairly well understood that discounts are made available to groups that help out with promotion and so on.
Now comes the biggest secret of small conference organisers: we gave away some of the tickets for free. Having spoken to other conference organisers, this seems to be common practice for new events. It seems unfair on people who paid, or asked their employer to pay, for their tickets but it’s an inescapable reality of trying out a new conference. Those who paid clearly valued the event at that amount; those who didn’t pay were also able to enjoy it but without going over their budget. Importantly to us, those people helped to make the event a success by giving the speakers and sponsors a larger audience and by adding to the break-time conversations etc.
When it came to choosing a ticketing service, naturally I looked first at Eventbrite but two things put me off:
- they hold all the ticket money until after the event
- their total charge for a £149 ticket would be £9.59.
We couldn’t run the conference without access to at least some of the ticket money before the event. Letting one company hold all of it also means putting a lot of faith in that company’s ability to stay in business. Of course, I didn’t doubt for one minute that Eventbrite would stay around but their holding the money felt wrong and was inconvenient.
Out of a few alternatives, Ti.to seemed like the best choice. Importantly, they had ready-made Stripe integration, meaning we’d get the ticket money one week after each purchase, and their fees were a little more reasonable. The total we paid to Ti.to and Stripe for a £149 ticket was £8.26.
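For a sense of scale, here’s the comparison worked through in Python, using only the two fee figures quoted for a £149 ticket. The variable names are mine.

```python
from decimal import Decimal

TICKET_PRICE = Decimal("149")
EVENTBRITE_FEE = Decimal("9.59")    # quoted total charge per ticket
TITO_STRIPE_FEE = Decimal("8.26")   # what we paid to Ti.to and Stripe


def fee_share(fee):
    """Fee as a percentage of the ticket price, to one decimal place."""
    return (fee / TICKET_PRICE * 100).quantize(Decimal("0.1"))


saving_per_ticket = EVENTBRITE_FEE - TITO_STRIPE_FEE   # 1.33
eventbrite_share = fee_share(EVENTBRITE_FEE)           # 6.4
tito_stripe_share = fee_share(TITO_STRIPE_FEE)         # 5.5
```

A saving of £1.33 per ticket isn’t huge, which is why getting the money a week after each purchase mattered more to us than the fee itself.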
I’m glad we used Ti.to but it is a much more stripped-back service than others. There’s no QR code generation, no mobile app to check people in and so on.
If I look at the functionality offered by ticketing services and compare it with other SaaS things I pay for, I’m not convinced that ticketing services have yet found a pricing model that offers good value for money.
The day itself
After a last-minute pub meet-up for attendees the night before, the day of Span was long, hard work and fun.
114 people showed up out of 135 registrations. We ran the whole thing with me, Phil and my mate Neil as Span crew, plus four people from the venue and Tony Whitmore taking stills and Max Arnold filming.
The venue, Shoreditch Village Hall, was great. Just the right size and they’re well practised in running tech events. The food and coffee were particularly excellent. If you’re considering running a tech conference in London, speak to them.
I was particularly concerned that the speakers should have a good experience of the conference and I believe they did. Each of them has said that they enjoyed it.
The programme was too long, however. In an effort to provide great value for the ticket price, and because we struggled to say no to good talk submissions, we squeezed eleven talks into a ten-hour day.
What I’d do differently next time
I’ve not yet thought in detail about what I’d do differently next time but I have some ideas. I’m sure Phil has some thoughts, too.
Off the top of my head:
- Next time, we must have a more diverse set of speakers.
- I’d make the early bird a few quid cheaper to make it more compelling to buy early and to make the event more accessible to all.
- Fewer talks, longer breaks.
- An official after-party, although the impromptu one we had this time worked out well.
Overall, Span was six months of late nights, lots of emailing, some sleepless nights worrying about money, a sense of growing excitement as things started working out and an amazing day that almost all feedback says was a great first attempt!
Thanks to everyone who sponsored, to Couchbase for being cool with it, to all of the speakers, to everyone who helped out and, of course, to everyone who came. I’m overwhelmed by how positive the feedback has been and excited that all that work seems to have paid off.
There’s more I could’ve covered in this post but it’s long enough already. Maybe Span’ll happen again next year but, for now, I’m off to do something else with those few moments of spare time.