Paginate with Step Functions
The AWS SDK integration in AWS Step Functions lets you call a wide range of AWS services directly from your workflow.
For API calls that can return a large list of items, the API returns only the first set of results by default. For example, an S3 ListObjectsV2 response contains at most 1,000 objects by default. The remaining results must be requested by passing a pagination token in the next request.
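To make the token handshake concrete, here is a minimal Python sketch of the loop a client has to run. It does not call AWS: `fake_list_objects` is a hypothetical stand-in that only mimics the shape of a ListObjectsV2 response (`Contents`, `IsTruncated`, `NextContinuationToken`), and the key names, the page size, and the way the token encodes an offset are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical in-memory "bucket"; stands in for real S3 object keys.
ALL_KEYS = [f"data/file-{i:03d}.csv" for i in range(7)]


def fake_list_objects(max_keys: int, continuation_token: Optional[str] = None) -> dict:
    """Mimic the shape of a ListObjectsV2 response without calling AWS."""
    # Assumption: the token is simply the start offset encoded as a string.
    start = int(continuation_token) if continuation_token else 0
    end = start + max_keys
    response = {
        "Contents": [{"Key": key} for key in ALL_KEYS[start:end]],
        "IsTruncated": end < len(ALL_KEYS),
    }
    # A token is only handed out when more results remain.
    if response["IsTruncated"]:
        response["NextContinuationToken"] = str(end)
    return response


def list_all_keys(max_keys: int) -> list:
    """Follow continuation tokens until the response is no longer truncated."""
    keys = []
    token = None
    while True:
        page = fake_list_objects(max_keys, token)
        keys.extend(obj["Key"] for obj in page["Contents"])
        if not page["IsTruncated"]:
            return keys
        token = page["NextContinuationToken"]
```

Calling `list_all_keys(3)` walks three pages here (3, 3, and 1 keys) and returns all seven keys in order.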
For data processing, pagination is a very useful and often mandatory pattern. Dividing a result set into fixed-size pages makes it easier to build, for example, meaningful retry logic for error handling. Processing partial result sets in parallel can also reduce a workflow's running time significantly.
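As a rough illustration of why fixed-size pages help, the Python sketch below retries one failing page without touching the others and processes the pages in parallel. Everything in it is hypothetical: the pages, the `process_page` function, and the simulated transient error exist only to show the pattern.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pages, as if a listing had already been split into batches of 2.
PAGES = [["a", "b"], ["c", "d"], ["e"]]

# Attempts per page index, so the sketch can simulate one transient failure.
attempts = {}


def process_page(index: int, keys: list) -> int:
    """Pretend to process a page; page 1 fails on its first attempt."""
    attempts[index] = attempts.get(index, 0) + 1
    if index == 1 and attempts[index] == 1:
        raise RuntimeError("simulated transient error")
    return len(keys)


def process_with_retry(index: int, keys: list, max_attempts: int = 3) -> int:
    """Retry a single page; the other pages are unaffected by its failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return process_page(index, keys)
        except RuntimeError:
            # Give up only after the last allowed attempt.
            if attempt == max_attempts:
                raise


def process_all(pages: list) -> list:
    """Process pages in parallel, with one retry loop per page."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [
            pool.submit(process_with_retry, i, page)
            for i, page in enumerate(pages)
        ]
        # Collect results in page order.
        return [future.result() for future in futures]
```

In Step Functions itself, the same ideas correspond roughly to the `Retry` field on a Task state and to a Map state for parallel iteration.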
In this example I show how listing objects (arn:aws:states:::aws-sdk:s3:listObjectsV2) in an S3 bucket and then triggering a processing step with a batch of S3 objects can be implemented in Step Functions ASL (Amazon States Language).
Note: some AWS APIs use NextToken to paginate the results. The workflow for those APIs is the same as the one I cover next with ContinuationToken.
How to implement pagination with ASL
This example shows a very simple flow: list the objects in an S3 bucket, then trigger the processing step for each batch.
Below is the ASL definition. The BatchSize parameter controls how many S3 objects are included in each processing batch. We keep requesting new batches as long as the response includes IsTruncated: true. The last batch contains at most BatchSize objects and comes with IsTruncated: false, so we can finish processing.
{
  "Comment": "List S3 objects.",
  "StartAt": "list_s3",
  "States": {
    "list_s3": {
      "Comment": "Get first batch of objects.",
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:s3:listObjectsV2",
      "ResultPath": "$.s3_objects",
      "Parameters": {
        "Bucket": "${BucketName}",
        "MaxKeys": ${BatchSize}
      },
      "Next": "process_s3_objects"
    },
    "process_s3_objects": {
      "Comment": "Processing logic. Now we just wait.",
      "Type": "Wait",
      "Seconds": 2,
      "Next": "check_if_all_listed"
    },
    "check_if_all_listed": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.s3_objects.IsTruncated",
          "BooleanEquals": false,
          "Next": "success_state"
        }
      ],
      "Default": "list_s3_with_continuation_token"
    },
    "list_s3_with_continuation_token": {
      "Comment": "Get next batch of objects. Provide ContinuationToken in the request.",
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:s3:listObjectsV2",
      "ResultPath": "$.s3_objects",
      "Parameters": {
        "Bucket": "${BucketName}",
        "MaxKeys": ${BatchSize},
        "ContinuationToken.$": "$.s3_objects.NextContinuationToken"
      },
      "Next": "process_s3_objects"
    },
    "success_state": {
      "Type": "Succeed"
    }
  }
}
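To see the order in which the states run, the definition above can be traced with a small Python simulation. This is only a sketch and not how Step Functions executes a workflow: `list_objects` is a hypothetical stand-in for the S3 call, the five object names and the batch size of 2 are made up, and the token is assumed to be a plain offset.

```python
# Hypothetical stand-in for the bucket; the real workflow calls S3
# through the AWS SDK integration.
KEYS = [f"object-{i}" for i in range(5)]
BATCH_SIZE = 2  # plays the role of the BatchSize template parameter


def list_objects(token=None):
    """Return a response shaped like ListObjectsV2 (assumed token = offset)."""
    start = int(token) if token else 0
    end = start + BATCH_SIZE
    result = {"Contents": KEYS[start:end], "IsTruncated": end < len(KEYS)}
    if result["IsTruncated"]:
        result["NextContinuationToken"] = str(end)
    return result


def run_workflow():
    """Walk the states in the order the ASL definition wires them together."""
    trace = []    # names of the states visited
    batches = []  # object batches seen by the processing step
    data = {}     # mimics the state input/output document
    state = "list_s3"
    while True:
        trace.append(state)
        if state == "list_s3":
            # ResultPath "$.s3_objects": store the response under s3_objects.
            data["s3_objects"] = list_objects()
            state = "process_s3_objects"
        elif state == "list_s3_with_continuation_token":
            token = data["s3_objects"]["NextContinuationToken"]
            data["s3_objects"] = list_objects(token)
            state = "process_s3_objects"
        elif state == "process_s3_objects":
            batches.append(data["s3_objects"]["Contents"])
            state = "check_if_all_listed"
        elif state == "check_if_all_listed":
            if data["s3_objects"]["IsTruncated"] is False:
                state = "success_state"
            else:
                state = "list_s3_with_continuation_token"
        elif state == "success_state":
            return trace, batches
```

With these made-up inputs, `list_s3` runs once, `list_s3_with_continuation_token` runs twice, and the processing step sees batches of 2, 2, and 1 objects before the workflow reaches `success_state`.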
Wrapping up
AWS Step Functions is a perfect fit for coordinating workflows and orchestrating AWS services. I strongly recommend building a library of good templates so that you get a running start when adapting it to your own use cases.