S3 Best Practices
Multiple concurrent PUTs/GETs
S3 scales to accommodate very high request rates. If the request rate grows steadily, S3 automatically partitions the bucket as needed to support the higher rate.
S3 can reach at least 3,500 PUT/COPY/POST/DELETE requests per second per prefix within a bucket.
If the typical workload involves only occasional bursts of 100 requests per second and stays below 800 requests per second, AWS scales to handle it.
If you expect the request rate for a bucket to exceed 300 PUT/LIST/DELETE requests per second or 800 GET requests per second, you should open a support case. This helps you prepare for the workload and avoid any temporary limits on your request rate.
The S3 best practice guidelines apply only if you routinely process 100 or more requests per second.
Workloads that include a mix of request types
If the request workload is a mix of GET, PUT, DELETE, and GET Bucket (list objects) requests, choosing appropriate key names for the objects ensures better performance, because it provides low-latency access to the S3 index.
This behavior is due to the way S3 stores key names. S3 maintains an index of object key names in each AWS region.
Object keys are stored lexicographically (in UTF-8 binary order) across multiple partitions of the index, and the key name determines which partition a key is stored in.
A sequential prefix, such as a timestamp or an alphabetical sequence, increases the chance that S3 will target a single partition for a large number of keys, overwhelming the I/O capacity of that partition.
Introducing some randomness into the key name prefixes distributes the key names, and therefore the I/O load, across multiple index partitions.
It also lets the workload scale regardless of the number of requests sent per second.
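As a minimal sketch of the randomized-prefix idea above, the snippet below prepends a few hex characters derived from a hash of the key. The function name `hashed_key`, the prefix length, the `-` separator, and the choice of SHA-256 are illustrative assumptions, not anything S3 prescribes.

```python
import hashlib


def hashed_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hex prefix derived from a hash of the key (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{key}"


# Sequential, timestamp-style keys all share a long common prefix.
keys = [f"2024-01-15-12-00-{i:02d}/photo.jpg" for i in range(4)]

# The hashed versions begin with characters that depend on the whole key,
# so lexicographically adjacent originals no longer sort next to each other.
hashed = [hashed_key(k) for k in keys]
for original, spread in zip(keys, hashed):
    print(original, "->", spread)
```

Because the prefix is computed from the key itself, a reader can recompute it and still locate the object. The trade-off is that listing keys in their original sequential order becomes harder.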
S3 Transfer Acceleration allows for fast, simple, and secure file transfers over long distances between the client’s computer and an S3 bucket.
Transfer Acceleration takes advantage of CloudFront’s globally distributed edge locations.
GET-intensive Workloads
CloudFront can be used to optimize performance and can help by:
Distributing content with low latency and a high data transfer rate
Caching the content, thereby reducing the number of direct requests to S3
Providing multiple endpoints (edge locations) for data availability
Available in two flavors: Web distribution or RTMP Distribution
To speed up data transport over long distances between a client and an S3 bucket, use S3 Transfer Acceleration, which relies on CloudFront’s globally distributed edge locations.
PUTs/GETs for Large Objects
AWS allows parallelizing PUT and GET requests to improve upload and download performance, as well as the ability to recover when a request fails.
For PUTs, multipart upload helps in several ways:
Multiple parts can be uploaded simultaneously, maximizing network bandwidth utilization
Quick recovery from failures, since only the failed part must be re-uploaded
Ability to pause and resume uploads
Ability to begin an upload before the object size is known
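The part-based recovery described above can be sketched as a small planner that splits an object into numbered parts and picks out only the failed ones for retry. The helpers `plan_parts` and `parts_to_retry` and the 8 MiB default part size are hypothetical. The 5 MiB minimum part size (for all parts but the last) and the 10,000-part ceiling are S3 multipart limits. This sketch assumes the object size is known up front and does not call S3.

```python
from typing import Iterable, List, Tuple

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts per upload

Part = Tuple[int, int, int]      # (part_number, offset, length)


def plan_parts(object_size: int, part_size: int = 8 * 1024 * 1024) -> List[Part]:
    """Split an object of known size into consecutive, 1-based numbered parts."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts: List[Part] = []
    offset = 0
    number = 1
    while offset < object_size:
        length = min(part_size, object_size - offset)  # last part may be short
        parts.append((number, offset, length))
        offset += length
        number += 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return parts


def parts_to_retry(parts: List[Part], failed_numbers: Iterable[int]) -> List[Part]:
    """Return only the parts whose upload failed, leaving the rest untouched."""
    failed = set(failed_numbers)
    return [p for p in parts if p[0] in failed]


plan = plan_parts(20 * 1024 * 1024 + 1)   # 20 MiB plus one byte
retry = parts_to_retry(plan, {2})         # pretend part 2 failed
print(len(plan), "parts planned;", len(retry), "to re-upload")
```

Each tuple could be handed to a separate worker, which is where the parallelism comes from. A real uploader would also track the identifier returned for each completed part.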
For GETs, the Range HTTP header can improve downloads by allowing the object to be retrieved in parts rather than as a whole.
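To illustrate ranged GETs, the sketch below builds the `Range` header values needed to fetch an object in fixed-size chunks. The function name `range_headers` and the chunk size are illustrative assumptions. The `bytes=start-end` form, with an inclusive end offset, is the standard HTTP byte-range syntax.

```python
from typing import List


def range_headers(object_size: int, chunk_size: int) -> List[str]:
    """Build Range header values that together cover the whole object."""
    if object_size <= 0 or chunk_size <= 0:
        raise ValueError("object_size and chunk_size must be positive")
    headers = []
    for start in range(0, object_size, chunk_size):
        # HTTP byte ranges are inclusive, so the end offset is one less
        # than the start of the next chunk (or the object size).
        end = min(start + chunk_size, object_size) - 1
        headers.append(f"bytes={start}-{end}")
    return headers


print(range_headers(2500, 1000))
# prints ['bytes=0-999', 'bytes=1000-1999', 'bytes=2000-2499']
```

Each value can be sent on its own GET request, so chunks can be downloaded in parallel and a failed chunk can be retried without re-fetching the rest.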
List Operations
Object key names are stored lexicographically in the S3 index, which makes it hard to sort and manipulate the contents of a LIST response in any other order.
S3 maintains a single lexicographically sorted index of key names.
To work around this, build and maintain a secondary index outside S3. For example, DynamoDB or RDS can store, index, and query object metadata instead of