AWS re:Invent 2017
Amazon Simple Storage Service (S3) provides developers and IT teams with secure, durable, highly scalable cloud storage.
S3 is easy-to-use object storage with a simple web service interface for storing and retrieving any amount of data from anywhere on the web.
- Object Storage:
- Data (files/videos/pictures) and associated metadata stored as objects
- Can't install an operating system on it (unlike block storage formatted with a filesystem)
- Objects can be up to 5TB in size
- Highly Durable:
- Objects are 99.999999999% durable (11 9's)
- On average, you can expect to lose 1 object in every 100 billion per year
- Data is replicated across multiple devices in multiple facilities
- Highly Available:
- Offers 99.99% availability
- Highly Scalable:
- For users, it offers a practically unlimited amount of storage
- Web Based:
- Upload & download data using web based protocols over the internet
- Secure:
- Features can be applied to improve confidentiality, integrity, availability and accountability of data
- Utility Based Pricing:
- Pay only for what you use
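The durability figure above can be made concrete with a little arithmetic. The sketch below assumes the 11 nines are read as an average annual per-object figure; the function names are illustrative only and not part of any AWS API.

```python
# Toy arithmetic for the "11 nines" durability figure quoted above.
# Assumption: durability is an average annual, per-object figure.
from fractions import Fraction

DURABILITY = Fraction(99_999_999_999, 100_000_000_000)  # 99.999999999%
ANNUAL_LOSS_RATE = 1 - DURABILITY                        # 1 in 100 billion


def expected_annual_losses(object_count: int) -> Fraction:
    """Average number of objects lost per year for a given object count."""
    return object_count * ANNUAL_LOSS_RATE


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


# Storing 10 million objects: one loss every 10,000 years on average.
print(years_per_lost_object(10_000_000))  # 10000
```

Exact fractions are used instead of floats so the tiny loss rate is not distorted by rounding.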
What is S3 Used for?
- Backup and Archiving
- Content Storage and Distribution
- Big Data and Analytics
- Static Website Hosting
- Disaster Recovery
S3 Consists of Buckets
- A bucket is a basic container within S3 used for storage of objects
- ARN: Both buckets and objects are classed as resources (any entity in AWS that you can work with)
- Referred to with an Amazon Resource Name (ARN)
- Upload as many objects as you like into a bucket; you can create up to 100 buckets per account by default (a soft limit that
can be raised by requesting a service limit increase)
- Buckets must be created in a region
- Objects stored in a region stay in that region unless you explicitly transfer them out
- Buckets have subresources that basically define how the bucket is configured
- A subresource is a resource that belongs to another resource and cannot exist on its own
- Amazon S3 has a set of dual-stack endpoints, which support requests to S3 buckets over both Internet Protocol version 6 (IPv6) and IPv4
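As a small illustration of how buckets and objects are referred to by ARN, the sketch below builds the two ARN forms used for S3 resources. The helper names are hypothetical; S3 ARNs leave the region and account fields empty, which is why three colons follow `s3`.

```python
# Minimal sketch: building S3 ARNs for a bucket and for an object.
# The helper names are illustrative, not part of any AWS SDK.

def bucket_arn(bucket: str) -> str:
    """ARN of a bucket: region and account ID fields are left empty."""
    return f"arn:aws:s3:::{bucket}"


def object_arn(bucket: str, key: str) -> str:
    """ARN of an object: the bucket ARN followed by the object key."""
    return f"{bucket_arn(bucket)}/{key}"


print(bucket_arn("siaweb"))
print(object_arn("siaweb", "1/index.html"))
```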
S3 Namespace
- S3 has a universal namespace
- Bucket names must be unique globally regardless of the region they are created in
- S3 has a flat structure
- Unlike a file system, it has no directories...
- However directories can be imitated by the use of prefixes
- http://siaweb.s3.amazonaws.com/SiaBday20120705/156.jpg
Note: SiaBday20120705/ is the prefix of the object key SiaBday20120705/156.jpg
- Object key names can use UTF-8 encoding but must not be longer than 1024 bytes
- When naming objects, it's recommended to use DNS safe naming and characters:
[0-9A-Za-z] and ! - _ . * ' ( )
- Can be accessed via a virtual-hosted-style or path-style URL
- Virtual:
http://bucket.s3.amazonaws.com
http://bucket.s3-aws-region.amazonaws.com
- Path:
http://s3-aws-region.amazonaws.com/bucket
- Example:
http://siaweb.s3.amazonaws.com/1/index.html
http://s3.amazonaws.com/siaweb/1/index.html
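The naming rules above can be sketched in a few lines of standard-library Python: checking the 1024-byte key limit, building the two URL styles, and imitating directories over a flat key list with a prefix and delimiter. All function names are hypothetical, and the host patterns simply follow the forms shown in these notes.

```python
# Sketch of the namespace rules above; helper names are illustrative only.
from typing import Iterable, List, Optional, Tuple
from urllib.parse import quote

MAX_KEY_BYTES = 1024  # key length limit, measured in UTF-8 bytes


def validate_key(key: str) -> None:
    """Raise ValueError if the key is empty or longer than 1024 UTF-8 bytes."""
    if not key or len(key.encode("utf-8")) > MAX_KEY_BYTES:
        raise ValueError("object key must be 1 to 1024 bytes of UTF-8")


def virtual_hosted_url(bucket: str, key: str, region: Optional[str] = None) -> str:
    """Bucket name appears in the host name."""
    host = f"{bucket}.s3-{region}.amazonaws.com" if region else f"{bucket}.s3.amazonaws.com"
    return f"http://{host}/{quote(key)}"


def path_style_url(bucket: str, key: str, region: Optional[str] = None) -> str:
    """Bucket name appears as the first path segment."""
    host = f"s3-{region}.amazonaws.com" if region else "s3.amazonaws.com"
    return f"http://{host}/{bucket}/{quote(key)}"


def list_level(keys: Iterable[str], prefix: str = "",
               delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Imitate a directory listing: return (objects, common prefixes) under prefix."""
    objects, common = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        head, sep, _ = key[len(prefix):].partition(delimiter)
        if sep:  # a delimiter remains, so the key sits in a deeper "folder"
            common.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(common)


keys = ["SiaBday20120705/156.jpg", "1/index.html", "readme.txt"]
print(virtual_hosted_url("siaweb", "1/index.html"))
print(path_style_url("siaweb", "1/index.html"))
print(list_level(keys))
```

The bucket stays flat throughout: `list_level` only groups key strings, which is all a "folder" amounts to in this model.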
S3 Objects
- S3 is a key-value store designed to store an unlimited number of objects
- Objects consist of
- Key = Name of object
- Value = The data being stored (0-5TB)
- Version ID = A string of data assigned to an object when versioning is enabled
- Bucket + Key + Version ID = Uniquely identify an object in S3
- Metadata = Name-value pairs which are used to store information about the object
- Subresources = Additional resources specifically assigned to an object
- Access control information = Policies for controlling access to the resource
- Object Tagging
- Object tagging allows you to categorise objects using a key/value pair
- PROD=website
- Classification=confidential
- Object tags enable
- Fine grained access control
- Fine grained lifecycle management
- Filtering for CloudWatch metrics and CloudTrail Logs
- Object Tagging Features
- Keys can be up to 128 Unicode characters in length
- Values can be up to 256 Unicode characters in length
- Keys and Values are case sensitive
- Up to 10 tags per object
- Each Tag must have a unique key
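The tagging limits listed above lend themselves to a quick client-side check. The following is a minimal sketch with a hypothetical `validate_tags` helper; it only encodes the limits from these notes and does not call S3.

```python
# Sketch of the object tagging limits above; validate_tags is a
# hypothetical helper, not an AWS SDK function.
from typing import Dict, Iterable, Tuple

MAX_TAGS = 10
MAX_KEY_CHARS = 128
MAX_VALUE_CHARS = 256


def validate_tags(tags: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Return the tags as a dict, or raise ValueError if a limit is broken."""
    result: Dict[str, str] = {}
    for key, value in tags:
        # len() counts Unicode code points, matching the character limits.
        if not key or len(key) > MAX_KEY_CHARS:
            raise ValueError(f"tag key must be 1 to {MAX_KEY_CHARS} characters")
        if len(value) > MAX_VALUE_CHARS:
            raise ValueError(f"tag value must be at most {MAX_VALUE_CHARS} characters")
        if key in result:  # comparison is case sensitive: "PROD" != "prod"
            raise ValueError(f"duplicate tag key: {key!r}")
        result[key] = value
    if len(result) > MAX_TAGS:
        raise ValueError(f"at most {MAX_TAGS} tags per object")
    return result


print(validate_tags([("PROD", "website"), ("Classification", "confidential")]))
```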
S3 Consistency Model
- S3 provides read-after-write consistency for puts of new objects
- Data can only be read after it has been successfully written to all facilities
and success has been returned
- S3 provides eventual consistency for overwrite puts (updates) and deletes
- For updates old data may be returned
- For deletes old data may be returned or a deleted key may still show in a list
- Eventual consistency provides low latency and high throughput
- Also note that S3 does not provide object locking for concurrent writers
- If two requests are made at roughly the same time, the one with the latest timestamp wins
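The "latest timestamp wins" rule can be illustrated with a toy in-memory model. This is only a sketch of last-writer-wins resolution under the assumption that each write carries a timestamp; the class and its methods are hypothetical and say nothing about how S3 is implemented internally.

```python
# Toy last-writer-wins store; an illustration only, not S3's implementation.
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Write:
    timestamp: float
    data: bytes


class LastWriterWinsStore:
    """Keeps, per key, only the write with the latest timestamp."""

    def __init__(self) -> None:
        self._objects: Dict[str, Write] = {}

    def put(self, key: str, data: bytes, timestamp: float) -> None:
        current = self._objects.get(key)
        # No locking: an older write arriving late is simply discarded.
        if current is None or timestamp >= current.timestamp:
            self._objects[key] = Write(timestamp, data)

    def get(self, key: str) -> Optional[bytes]:
        write = self._objects.get(key)
        return write.data if write else None


store = LastWriterWinsStore()
store.put("report.txt", b"from client B", timestamp=2.0)
store.put("report.txt", b"from client A", timestamp=1.0)  # arrives late, loses
print(store.get("report.txt"))
```

Neither writer is told about the other; the write with the earlier timestamp is silently superseded.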
Storage Classes/Tiers
- S3 provides different tiers of storage based on need:
- S3 Standard: 99.99% availability, 99.999999999% durability, stored redundantly across
multiple devices in multiple facilities and designed to sustain the loss of two facilities
concurrently
- S3 Standard - Infrequent Access: Used for data that is accessed less frequently, but requires rapid
access when needed. Lower storage fee than S3 Standard, but you are charged a retrieval fee.
- Reduced Redundancy Storage: Stores fewer replicas than S3 Standard and therefore provides only 99.99% durability
(with 99.99% availability), at a lower cost
- Glacier: Extremely cheap, but only suitable for archival or infrequently accessed data. Data is not available in
real time and must be restored first. The retrieval tier you select determines the restore time.
- S3 provides Lifecycle Policies that allow objects to transition between the storage classes
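To make lifecycle transitions concrete, the sketch below builds a lifecycle configuration as a plain Python dict in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The helper name, rule ID, `logs/` prefix, and day counts are assumptions chosen for illustration, and nothing is sent to AWS.

```python
# Sketch: a lifecycle rule moving objects Standard -> Standard-IA -> Glacier.
# build_lifecycle_configuration, the rule ID, and the prefix are hypothetical.
import json


def build_lifecycle_configuration(prefix: str,
                                  ia_after_days: int = 30,
                                  glacier_after_days: int = 90) -> dict:
    """Return a lifecycle configuration dict for objects under prefix."""
    if not 0 < ia_after_days < glacier_after_days:
        raise ValueError("expected 0 < ia_after_days < glacier_after_days")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = build_lifecycle_configuration("logs/")
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, the dict could be applied
# roughly like this (not executed here; the bucket name is an assumption):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="siaweb", LifecycleConfiguration=config)
```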