Simple Storage Service (S3)

Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance.

Customers of all sizes and industries can store and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps.

With cost-effective storage classes and easy-to-use management features, you can optimize costs, organize data, and configure fine-tuned access controls to meet specific business, organizational, and compliance requirements.

In other words, S3 is the storage layer where we keep data generated both on and outside the Platform. Strictly speaking, it is an object store rather than a database.

Two important entities to understand when we talk about S3 are:

Objects: An object is a file and any metadata that describes the file.
Buckets: A bucket is a container for objects.

We use S3 to store various kinds of data: logs, metrics, Platform-generated data, and client uploads. Our goal in this section is to understand how the client data upload process works and how we then process those files.

The buckets that we will widely use are:

prime-client: client buckets where raw data is uploaded.
prime-data-lake: contains Prime processed data.

The details about how we read, process and use the client data are explained in the Data Lifecycle section.

In practice, we rarely need to create buckets or objects ourselves. The buckets we use day to day already exist, and objects are created by the Platform: you do not have to create a path manually, because the Platform creates one automatically when you export to S3.

URI

You have probably noticed that we already used S3 once, in our Student Grades Assignment.

If you are curious, you can find the file in the S3 console at the following location: https://s3.console.aws.amazon.com/s3/object/prime-data-lake?region=eu-central-1&prefix=production/prime/vdh/a_data_primer_platform/input/student_grades.csv

Please note that the URL above is a bit different from the path that we used in the CSV importer:

s3://prime-data-lake/production/prime/vdh/a_data_primer_platform/input/student_grades.csv

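This is because the first one is an HTTPS URL that opens the object in the AWS console in a browser, while the second is an S3 URI that names only the bucket and the object's key. The CSV importer expects the S3 URI. Feel free to research the differences between a URI and a URL 😄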
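To make the relationship between the two forms concrete, here is a minimal Python sketch that splits an S3 URI into its bucket and key and rebuilds a console link from them. The function names split_s3_uri and console_url are hypothetical helpers, not part of the Platform or of any AWS library, and the console URL pattern and default region are simply copied from the link shown above, so treat them as assumptions.

```python
from urllib.parse import quote


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the object key.
    bucket, _, key = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, key


def console_url(uri: str, region: str = "eu-central-1") -> str:
    """Build a console link following the pattern of the URL shown above."""
    bucket, key = split_s3_uri(uri)
    # quote() leaves '/' untouched and percent-encodes unsafe characters.
    return (
        f"https://s3.console.aws.amazon.com/s3/object/{bucket}"
        f"?region={region}&prefix={quote(key)}"
    )


uri = "s3://prime-data-lake/production/prime/vdh/a_data_primer_platform/input/student_grades.csv"
bucket, key = split_s3_uri(uri)
print(bucket)            # bucket name
print(key)               # object key (the "path" inside the bucket)
print(console_url(uri))  # browser link to the same object
```

The point of the sketch is that both forms carry the same two pieces of information, the bucket and the key; the console URL only adds a region and a web address around them.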

Navigating through S3

Navigating through a bucket is really easy and intuitive.

Below is a screenshot from inside an S3 folder. (S3 folders don't really exist: a "folder" is just a prefix shared by several object keys, but it is helpful to think of them that way.)

Here we have the option to click on input/ or results/ to see the objects they contain, or to click on vdh/ in the top bar to go back up a level.
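Since "folders" such as input/ and results/ are only key prefixes, the console's folder view can be imitated with plain string handling. The sketch below is a pure-Python illustration of that grouping, not a call to AWS; it is similar in spirit to listing objects with a prefix and a "/" delimiter. The function name list_level is hypothetical, and the results/ key in the sample data is made up for illustration.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Return (folders, objects) found directly under `prefix`.

    A "folder" is any distinct key segment that ends with the delimiter;
    an "object" is a key with no further delimiter after the prefix.
    """
    folders = set()
    objects = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # More path segments follow, so show only the next "folder".
            folders.add(prefix + head + delimiter)
        elif rest:
            objects.append(key)
    return sorted(folders), sorted(objects)


keys = [
    "production/prime/vdh/a_data_primer_platform/input/student_grades.csv",
    # Hypothetical key, added only so that results/ shows up in the listing.
    "production/prime/vdh/a_data_primer_platform/results/example_output.csv",
]

base = "production/prime/vdh/a_data_primer_platform/"
print(list_level(keys, base))             # the two "folders", no objects
print(list_level(keys, base + "input/"))  # no "folders", one object
```

Going "up a level" in the console corresponds to dropping the last segment of the prefix; nothing is moved or opened in the bucket itself.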

After clicking on input/, we go inside the "folder" and land one level down.

Here we have ticked the box on the left, which gives us some more options for our CSV.

If we click Copy S3 URI, the S3 path is copied to our clipboard. This is the path that we used in our Student Grades Assignment.
