Accurate Atomic Counters with Amazon DynamoDB
Amazon DynamoDB is a fully managed key-value database at AWS. It’s great in a lot of ways: it’s serverless, it scales out incredibly wide, and it’s incredibly cheap for what it is. Good use cases for Dynamo are player preferences, game status, and low-cost global leaderboards.
But if you’re using Dynamo for an atomic counter under heavy load, that absolutey has to be accurate, it can be tricky to get right. In this post we’ll give an easy solution for incrementing a counter without losing updates.
To show this, we’ll imagine a common use case: adding one to a “likes”
counter on an image, say. In this example, we have a DynamoDB table
images
which has a single primary key of id
, an integer. Each item in
this table has, among other things, a likes
field which is supposed to store
the number of likes that images has.
How can we implement an atomic increment to that likes
column for any image
id
?
The Easy But (Sometimes) Wrong Way
Google DynamoDB atomic counters and you’ll immediately get to AWS’s documentation
for atomic counters.
The advice is to use ADD
where you simply add one to the likes
field of
an images
item.
In Python it looks like this:
import boto3
dynamodb = boto3.client('dynamodb')
def add_like(id):
"""Add one like to image `id`"""
dynamodb.update_item(
TableName='images',
Key={'id': {'N': id}},
UpdateExpression='ADD likes :num',
ExpressionAttributeValues={':num': {'N': '1'}}
)
So all that means is, “Add one to likes
where id
= id”.
Simple! And we like simple over here at Kickbox Frownyface.
However, there’s a problem here: this update isn’t actually atomic, in that if
you have two Lambda functions (say) calling this function at exactly the same
time, one of the ADD
functions is effectively dropped.
For example let’s say that the likes
for this adorable kitten picture
is at 5 and both Bob and Alice click “like” at pretty much the same time.
If you are unlucky, you may end up with likes
just set to 6 instead of 7,
because both Alice and Bob were adding one to 5 at precisely the same time.
(If you don’t believe me, try it out – fan out 1000 or so Lambdas and you’re
pretty likely to end up with likes
less than the expected 1000.)
Is that bad? Well, that really depends. If you’re OK with the occasional dropped update, then leave well enough alone and move on – you have bigger fish to fry!
But if your counters just must be correct, let’s make a Conditional Write.
Adding a Conditional Expression
What we want to do is change the function from “just add one” to “just add one to what I think it is right now”. In our example, when Bob clicks, the function will run “Add one to likes so long as the current number of likes is five” and when Alice clicks, the function will also run “Add one to likes so long as the current number of likes is five.” If both calls happen simultaneously, one will go through, and the other will be rejected because the conditional is no longer satisfied.
The code is pretty similar, we just do a fetch then a conditional update:
import boto3
dynamodb = boto3.client('dynamodb')
def add_like(id):
"""Add one like to image `id`"""
existing = dynamodb.get_item(
TableName='images',
Key={'id': {'N': id}}
)
dynamodb.update_item(
TableName=self.table_name,
Key={'id': {'N': id}},
UpdateExpression='ADD likes :num',
ConditionExpression='likes = :current',
ExpressionAttributeValues={':num': {'N': '1'},
':current': {'N': existing['Item']['likes']['N']
}
)
This code is fetching the current record for image id
and storing that
in existing
. Then it does an update on that image, adding 1 to likes
so long as the current likes value is equal to the value we just fetched.
Now, if Alice and Bob both add their likes
at precisely the same time, one of them will go through and the other
will have a ConditionalCheckFailedException
thrown.
Are we done? Not quite! We still have the number of likes
for that
image set to six! Either Alice’s or Bob’s like wasn’t recorded.
But now at least we know it was dropped, so let’s retry the failed one
with code like this:
def add_like_safely(id, tries=10):
attempts, success = 0, False
while attempts <= tries and not success:
try:
attempts = attempts + 1
results = add_like(id)
success = True
except ClientError as e:
if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
time.sleep(0.01)
else:
raise
return success
This way, if both Alice and Bob call add_like_safely
at precisely the
same time, one of them will get through first… but the exception will
be caught and the other will try again after a short delay to give the
data time to propagate out.