DynamoDB discoveries
A lot of preparation needs to go into a DynamoDB design.
- normalised fields: store lowercased copies of strings so lookups are case-insensitive (see the first sketch after this list)
- composite keys can be quite "loaded": several attributes concatenated into a single key value
- for the tag use case, BatchWriteItem might come into play (see the BatchWriteItem sketch after this list)
- versioning
- getting design support from AWS
- SDK pain, like getting to know the dynamodbattribute and expression builder packages in the Go SDK, and v1/v2 confusion
- Are projections premature optimisation?
- a code struct doesn't document the table design, so people end up describing their setup in a README
- Are key names capitalised? Rich likes them generic (PK/SK style) for AppSync
- Scan versus Query... is it ever OK to do a Scan?
- How to store time? In UTC, in an ISO 8601 style format rather than an epoch number, so timestamps are readable and range-queryable as strings (see the first sketch after this list)
- designs are rarely captured anywhere: not in code, a README, docs, or an example
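
As a rough illustration of the composite-key, normalisation, and time-format points above, here is a minimal sketch using the Ruby SDK; the table name, key names, and values are all hypothetical:

require 'aws-sdk-dynamodb'

client = Aws::DynamoDB::Client.new

# A hypothetical item: generic key names (PK/SK), with a composite ("loaded")
# SK that embeds an ISO 8601 UTC timestamp so items sort chronologically.
client.put_item(
  table_name: 'jobs_example',                # assumed table name
  item: {
    'PK'    => 'CUSTOMER#42',
    'SK'    => 'JOB#2021-06-01T09:30:00Z',   # type prefix + ISO 8601 time
    'email' => 'Jo@Example.com'.downcase     # normalised (lowercased) copy
  }
)

# A range query over a time window works because ISO 8601 strings compare
# in chronological order.
client.query(
  table_name: 'jobs_example',
  key_condition_expression: 'PK = :pk AND SK BETWEEN :from AND :to',
  expression_attribute_values: {
    ':pk'   => 'CUSTOMER#42',
    ':from' => 'JOB#2021-06-01T00:00:00Z',
    ':to'   => 'JOB#2021-06-30T23:59:59Z'
  }
)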
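
And for the tag use case, a hedged sketch of BatchWriteItem writing one item per tag; again, the table and key names are assumptions:

require 'aws-sdk-dynamodb'

client = Aws::DynamoDB::Client.new

# One item per (post, tag) pair; BatchWriteItem takes up to 25 writes per call.
tags = %w[aws dynamodb nosql]

requests = tags.map do |tag|
  { put_request: { item: { 'PK' => 'POST#123', 'SK' => "TAG##{tag}" } } }
end

resp = client.batch_write_item(request_items: { 'posts_example' => requests })

# A production caller should retry anything left in resp.unprocessed_items.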
Operational considerations:
- Use on-demand scaling (see the sketch below), but that's roughly 6x more expensive per request than provisioned capacity
- Predicting cost isn't easy until you know roughly how many requests per second you'll serve. And some people think you should build limits in while developing.
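
The capacity mode is a table-level setting. A sketch, assuming the Ruby SDK and a hypothetical table:

require 'aws-sdk-dynamodb'

client = Aws::DynamoDB::Client.new

# On-demand ("PAY_PER_REQUEST") needs no capacity planning; provisioned mode
# is cheaper per request but requires throughput estimates up front.
client.create_table(
  table_name: 'jobs_example',
  attribute_definitions: [
    { attribute_name: 'PK', attribute_type: 'S' },
    { attribute_name: 'SK', attribute_type: 'S' }
  ],
  key_schema: [
    { attribute_name: 'PK', key_type: 'HASH' },
    { attribute_name: 'SK', key_type: 'RANGE' }
  ],
  billing_mode: 'PAY_PER_REQUEST'
)

You can switch an existing table between modes later via update_table with a new billing_mode.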
Good resources
Who to follow on Twitter when it comes to sharing DynamoDB knowledge
Case study
Via #databases on https://awsdevelopers.slack.com/, Martin asks:
Hello, everyone. I'm pretty new to DynamoDB and I need some advice on what the best partition key and sort key would be for my use case. We are migrating this table, jobs, which is not complex but gets thousands of queries per hour by date; basically it is always #date between :start_time and :end_time. Below is the specification:
- table name: jobs
- Id: unique string
- created_at: date time, ISO 8601 format: 20150311T122706Z
- Data: string
This is an example of how we need to access the data:
require 'aws-sdk-dynamodb'

client = Aws::DynamoDB::Client.new

# Desired access pattern: every job created between two timestamps.
params = {
  table_name: 'jobs',
  key_condition_expression: '#date BETWEEN :start_time AND :end_time',
  expression_attribute_names: { '#date' => 'created_at' },
  expression_attribute_values: {
    ':start_time' => '20150311T122706Z',
    ':end_time'   => '20150313T122700Z'
  }
}

client.query(params)
Michael says:
Depending on how many days you query at a time, you could even make the day (or month) the hash key, and the rest of the timestamp (plus a job ID?) the range key. The advantage is that, assuming an even distribution of job creation times, no hash key should get too "hot" (i.e. each hash only holds as many jobs as you create in a day, or month). The disadvantages are that querying multiple days (or months) means multiple queries (you can run them in parallel, but if you have hundreds, bleh), and if you're creating millions a day (or month) this wouldn't be suitable either.
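
A minimal sketch of Michael's suggestion, assuming hypothetical attribute names job_day (hash key) and ts_id (range key, timestamp plus job ID):

require 'aws-sdk-dynamodb'

client = Aws::DynamoDB::Client.new

# All jobs created on 2015-03-11; for a longer window, repeat this query
# once per day (possibly in parallel), as Michael notes.
client.query(
  table_name: 'jobs',
  key_condition_expression: 'job_day = :day AND ts_id BETWEEN :from AND :to',
  expression_attribute_values: {
    ':day'  => '20150311',
    ':from' => '20150311T000000Z',
    ':to'   => '20150311T235959Z'
  }
)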
Rich says:
I'd be inclined to use the job ID as the PK for the table, because changing a key value requires writing a new record and removing the old one, and a job ID never changes. I'd then create a secondary index whose PK is based on what you're likely to query. For example: date (PK) + date/time (SK); date + status (PK) + time (SK); date + hour + status (PK) + time (SK). Now if you want all the pending jobs in the last hour, you query the key for that hour, e.g. "2019090906pending" (date 2019-09-09, hour 06, status pending).
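
And a sketch of Rich's approach; the attribute names (id, date_hour_status) and the index name (by_date_hour_status) are assumptions for illustration:

require 'aws-sdk-dynamodb'

client = Aws::DynamoDB::Client.new

# Writing a job: the packed index key is just another attribute on the item.
client.put_item(
  table_name: 'jobs',
  item: {
    'id'               => 'job-123',           # table PK: a value that never changes
    'created_at'       => '20190909T061500Z',
    'status'           => 'pending',
    'date_hour_status' => '2019090906pending'  # GSI PK: date + hour + status
  }
)

# Reading: all pending jobs in the 06:00 hour of 2019-09-09.
client.query(
  table_name: 'jobs',
  index_name: 'by_date_hour_status',           # assumed GSI name
  key_condition_expression: 'date_hour_status = :k',
  expression_attribute_values: { ':k' => '2019090906pending' }
)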