S3: Not all file paths in a directory are loading (Ruby)

Wajeeh Ahsan
Apr 4, 2022


The project I'm working on required fetching files from S3 with the `aws-sdk` gem.
At some point we ran into a problem: the `list_objects_v2` method returned only some of the file paths (keys), and the rest were missing. After some searching, I learned that it returns at most 1,000 keys per request. If the directory holds more files than that, the response's `is_truncated` field is set to `true`. That flag tells us some keys were left behind. The question is how to fetch them.
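To make the 1,000-key cap concrete without touching AWS, here is a minimal sketch built on a hypothetical in-memory stand-in. `fake_list_objects_v2`, `Entry`, and `Page` are illustrative names, not part of the aws-sdk gem; they only mimic the response fields this article relies on (`contents`, `key`, `is_truncated`, `next_continuation_token`). Real continuation tokens are opaque strings, whereas this fake simply encodes an offset.

```ruby
# Hypothetical stand-ins for the SDK's response shape (illustration only).
Entry = Struct.new(:key)
Page  = Struct.new(:contents, :is_truncated, :next_continuation_token)

# Hypothetical fake of list_objects_v2: returns at most max_keys keys
# (S3's default page size is 1,000) and flags whether more remain.
def fake_list_objects_v2(all_keys, prefix:, continuation_token: nil, max_keys: 1000)
  matching = all_keys.select { |k| k.start_with?(prefix) }
  start = continuation_token.to_i # nil -> 0; this fake's token is just an offset
  slice = matching[start, max_keys] || []
  finish = start + slice.size
  truncated = finish < matching.size
  Page.new(slice.map { |k| Entry.new(k) }, truncated, truncated ? finish.to_s : nil)
end

# 2,500 keys under the same prefix: more than one page's worth.
keys = (1..2500).map { |i| format('directory/file%04d.csv', i) }
first = fake_list_objects_v2(keys, prefix: 'directory')

puts first.contents.size           # => 1000
puts first.is_truncated            # => true
puts first.next_continuation_token # => 1000 (an offset in this fake)
```

A single call stops at 1,000 keys even though 2,500 match, which is exactly the symptom described above; a second call with `continuation_token: first.next_continuation_token` would resume at the 1,001st key.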
Luckily, S3 also returns a `next_continuation_token` in the response. It is an opaque token that marks where the next page of results begins.
To fetch the remaining keys, the gem expects you to make another request with the `continuation_token` parameter set to the `next_continuation_token` from the previous response. Put this in a `while` loop that runs until `is_truncated` is `false`, which means no keys are left unread.

Before this change, the code that fetched the file paths looked something like this:


```ruby
# Returns only the first page (at most 1,000 keys).
files_array = s3.list_objects_v2(bucket: 'scrapping-xquic', prefix: 'directory').contents.map(&:key)
```

To fix the issue, we changed it to this:

```ruby
require 'aws-sdk-s3'

# Region and credentials are read from the environment.
s3 = Aws::S3::Client.new

response = s3.list_objects_v2(bucket: 'scrapping-xquic', prefix: 'directory')
files_array = response.contents.map(&:key)

# Keep requesting pages while the previous response was truncated.
while response.is_truncated
  response = s3.list_objects_v2(
    bucket: 'scrapping-xquic',
    prefix: 'directory',
    continuation_token: response.next_continuation_token
  )
  files_array.concat(response.contents.map(&:key))
end
```

Once this block runs, `files_array` holds the keys of every file under the `directory` prefix used in the snippet.
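If you need the same loop for several prefixes, it can be wrapped in a method. The sketch below assumes a hypothetical helper named `list_all_keys` and runs it against a tiny hypothetical fake client (`TinyFakeClient`, two keys per page) so it works offline; with the real gem you would pass an `Aws::S3::Client` instead. It only sends `continuation_token` once a token exists.

```ruby
# Hypothetical helper: collects every key under a prefix by following
# continuation tokens. `client` can be an Aws::S3::Client or any object whose
# list_objects_v2 returns contents / is_truncated / next_continuation_token.
def list_all_keys(client, bucket:, prefix:)
  keys = []
  token = nil
  loop do
    params = { bucket: bucket, prefix: prefix }
    params[:continuation_token] = token if token
    response = client.list_objects_v2(**params)
    keys.concat(response.contents.map(&:key))
    break unless response.is_truncated
    token = response.next_continuation_token
  end
  keys
end

# Hypothetical offline fake that serves two keys per page.
FakeObject = Struct.new(:key)
FakePage   = Struct.new(:contents, :is_truncated, :next_continuation_token)

class TinyFakeClient
  PAGE_SIZE = 2

  def initialize(keys)
    @keys = keys
  end

  def list_objects_v2(bucket:, prefix:, continuation_token: nil)
    matching = @keys.select { |k| k.start_with?(prefix) }
    start = continuation_token.to_i # real S3 tokens are opaque; this is an offset
    page = matching[start, PAGE_SIZE] || []
    finish = start + page.size
    more = finish < matching.size
    FakePage.new(page.map { |k| FakeObject.new(k) }, more, more ? finish.to_s : nil)
  end
end

client = TinyFakeClient.new(%w[directory/a directory/b directory/c other/d])
p list_all_keys(client, bucket: 'scrapping-xquic', prefix: 'directory')
# prints ["directory/a", "directory/b", "directory/c"]
```

Here the three matching keys span two pages, so the helper makes two requests and stops when `is_truncated` comes back `false`. The aws-sdk list responses are also pageable (for example with `each_page`), which may make a hand-written loop unnecessary; check the SDK documentation for your version before relying on that.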
