DynamoDB discoveries

A lot of preparation needs to go into a DynamoDB design.

Operational considerations:

Good resources

Who to follow on Twitter, when it comes to sharing DynamoDB knowledge

Case study

Via #databases on https://awsdevelopers.slack.com/, Martin asks:

Hello, everyone. I’m pretty new with DynamoDB and I need some advice on which will be the best Partition Key and Sort Key for my use case. We are migrating this table, jobs, which is not complex but gets thousands of queries per hour by date; the condition is basically always #date between :start_time and :end_time. Below is the specification:

table name: jobs
Id:         unique string
created_at: date time, format ISO 8601: 20150311T122706Z
Data:       string

This is an example of how we need to access the data:

tableName = 'jobs'
params = {
	table_name: tableName,
	key_condition_expression: "#date between :start_time and :end_time",
	expression_attribute_names: {
		"#date" => "created_at"
	},
	expression_attribute_values: {
		":start_time" => "20150311T122706Z",
		":end_time" => "20150313T122700Z"
	}
}
DynamodbClient.client.query(params)

Michael says:

Depending on how many days you query at a time, you could even make the day (or month) the hash key, and the rest of the timestamp (plus a job id?) the range key. Advantages are that, assuming you have an even distribution of job creation times, each hash key shouldn't get too "hot" (i.e., you'll only have as many in each hash as you create in a day, or month). Disadvantages are that querying multiple days (or months) will mean multiple queries (can do in parallel, but if you have hundreds, bleh). And if you're creating millions a day (or month) this wouldn't be suitable either.
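
A minimal sketch of Michael's suggestion, in the same Ruby style as Martin's snippet. The attribute names `day` and `time_id`, the `#` separator, and both helper functions are assumptions made for illustration; none of them come from the original table. The sketch only builds the key attributes and the per-day Query parameters and does not call DynamoDB.

```ruby
require "date"

# Split a timestamp such as "20150311T122706Z" into the hypothetical key
# attributes: "day" (hash key) and "time_id" (range key: time plus job id).
def job_keys(created_at, id)
  day, time = created_at.split("T", 2)
  { "day" => day, "time_id" => "#{time}##{id}" }
end

# Build one set of Query parameters per calendar day covered by the range.
def daily_queries(table_name, start_time, end_time)
  start_day, start_clock = start_time.split("T", 2)
  end_day, end_clock = end_time.split("T", 2)
  first = Date.strptime(start_day, "%Y%m%d")
  last = Date.strptime(end_day, "%Y%m%d")

  (first..last).map do |date|
    day = date.strftime("%Y%m%d")
    from = day == start_day ? start_clock : "000000Z"
    # "~" sorts after ASCII letters, digits and "-", so the upper bound still
    # covers range keys carrying a "#<id>" suffix (assumes such ASCII ids).
    to = "#{day == end_day ? end_clock : '235959Z'}#~"
    {
      table_name: table_name,
      key_condition_expression: "#day = :day AND #time_id BETWEEN :from AND :to",
      expression_attribute_names: { "#day" => "day", "#time_id" => "time_id" },
      expression_attribute_values: { ":day" => day, ":from" => from, ":to" => to }
    }
  end
end
```

For Martin's example range, `daily_queries("jobs", "20150311T122706Z", "20150313T122700Z")` yields three parameter hashes, one per day. Each could be passed to `DynamodbClient.client.query` as in his snippet, which illustrates the "multiple queries" cost Michael mentions.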

Rich says:

I'd be inclined to use the job ID as the PK for the table because changing that value requires writing a new record and removing the old one. I'd then create a secondary index with the PK based on what you're likely to query. For example: date (pk) + date/time (sk); "date + status" (pk) + time (sk); "date + hour + status" (pk) + time (sk). Now if you want all the pending jobs in the last hour you query the key "2019090906pending".
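
Rich's "date + hour + status" key can be sketched along the same lines. The attribute names `gsi_pk` and `gsi_sk`, the `index_name` argument, and both helpers are hypothetical. The `status` attribute is not in Martin's table specification either; it follows Rich's example. The sketch only builds the index attributes and the Query parameters.

```ruby
# Build the hypothetical secondary-index attributes for one job:
# partition key = date + hour + status, sort key = time of day.
def index_keys(created_at, status)
  day, time = created_at.split("T", 2) # e.g. "20190909", "061530Z"
  hour = time[0, 2]
  { "gsi_pk" => "#{day}#{hour}#{status}", "gsi_sk" => time }
end

# Query parameters for every job with the given status in one clock hour.
def status_in_hour_query(table_name, index_name, day, hour, status)
  {
    table_name: table_name,
    index_name: index_name,
    key_condition_expression: "#pk = :pk",
    expression_attribute_names: { "#pk" => "gsi_pk" },
    expression_attribute_values: { ":pk" => "#{day}#{hour}#{status}" }
  }
end
```

With these helpers, a job created at `20190909T061530Z` with status `pending` gets the index key `2019090906pending`, matching Rich's example. Note that one such key covers a fixed clock hour; a sliding "last 60 minutes" window would presumably need the current and the previous hour's keys.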
