Overview

The goal of this way of working is to keep things as isolated as possible: create a new entity only if one does not already exist. Below, we wrap up what was discussed above and illustrate it with real examples.
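The rule "create a new entity only if one does not already exist" is the classic get-or-create pattern. The sketch below illustrates it with a hypothetical in-memory registry keyed by entity kind and name. `Entity`, `Registry`, and `get_or_create` are illustrative names, not part of the platform's API.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical platform entity (project, pipeline, dashboard, ...)."""
    name: str
    kind: str


@dataclass
class Registry:
    """Hypothetical in-memory registry of entities, keyed by (kind, name)."""
    entities: dict = field(default_factory=dict)

    def get_or_create(self, name: str, kind: str) -> tuple[Entity, bool]:
        """Return the existing entity if there is one; otherwise create it.

        The boolean is True only when a new entity was actually created.
        """
        key = (kind, name)
        if key in self.entities:
            return self.entities[key], False
        entity = Entity(name=name, kind=kind)
        self.entities[key] = entity
        return entity, True


registry = Registry()
first, created_first = registry.get_or_create("Store Mapper", "pipeline")
second, created_second = registry.get_or_create("Store Mapper", "pipeline")
print(created_first, created_second, first is second)  # True False True
```

The second call reuses the entity from the first call instead of creating a duplicate, which is the isolation the rule is after.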

Platform and S3 Linkage

To have even more control over our work, we link the platform to S3. The aim is to navigate projects, created files, and dashboards easily, without having to search through the whole platform. Ideally, each project on the platform corresponds to one folder in S3. For our standardized output, the S3 side starts from s3://prime-data-lake/production/client/vdh/, and the S3 paths in the table below are relative to that folder.

| Platform | S3 |
| --- | --- |
| Standardized Output | /standardized_output/ |
| Standardized Output > Raw | /standardized_output/raw/ |
| Standardized Output > Raw > Store Mapper (pipe) | /standardized_output/raw/store_mapper |
| Standardized Output > Retail Template | /standardized_output/retail_template/ |
| Standardized Output > Product 360 | /standardized_output/product_360/ |
| ML Solutions > Promotion Effectiveness | /solutions/promotion_effectiveness/ |
| ML Solutions > Promotion Effectiveness > Input Prep | /solutions/promotion_effectiveness/input_prep |
| Data Quality Assurance > Retail Template > Point of Sale | /standardized_output/retail_template/DQA/point_of_sale |

This table shows the ideal mapping. In practice, it is normal for some projects and folders to fall outside these conventions.
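Because each platform project maps to exactly one S3 folder, the table can be treated as a lookup. The sketch below copies the table into a dictionary keyed by the platform breadcrumb and resolves a full S3 URI. It assumes the table paths are appended to the client root `s3://prime-data-lake/production/client/vdh/`. `PLATFORM_TO_S3`, `CLIENT_ROOT`, and `s3_uri` are hypothetical names used for illustration.

```python
# Platform breadcrumb -> S3 folder, copied from the table above.
PLATFORM_TO_S3 = {
    ("Standardized Output",): "/standardized_output/",
    ("Standardized Output", "Raw"): "/standardized_output/raw/",
    ("Standardized Output", "Raw", "Store Mapper (pipe)"): "/standardized_output/raw/store_mapper",
    ("Standardized Output", "Retail Template"): "/standardized_output/retail_template/",
    ("Standardized Output", "Product 360"): "/standardized_output/product_360/",
    ("ML Solutions", "Promotion Effectiveness"): "/solutions/promotion_effectiveness/",
    ("ML Solutions", "Promotion Effectiveness", "Input Prep"): "/solutions/promotion_effectiveness/input_prep",
    ("Data Quality Assurance", "Retail Template", "Point of Sale"): "/standardized_output/retail_template/DQA/point_of_sale",
}

# Assumption: the table paths are relative to this client root.
CLIENT_ROOT = "s3://prime-data-lake/production/client/vdh/"


def s3_uri(breadcrumb: str) -> str:
    """Resolve a breadcrumb like 'A > B > C' to a full S3 URI.

    Raises KeyError when the project has no entry in the mapping.
    """
    key = tuple(part.strip() for part in breadcrumb.split(">"))
    relative = PLATFORM_TO_S3[key]
    # Join without doubling or dropping the slash between root and path.
    return CLIENT_ROOT.rstrip("/") + "/" + relative.lstrip("/")


print(s3_uri("Standardized Output > Raw"))
# s3://prime-data-lake/production/client/vdh/standardized_output/raw/
```

A `KeyError` for an unmapped breadcrumb signals a project that sits outside the ideal mapping. As noted above, this is acceptable, but it is worth spotting.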

Standardized Output Graph

A newly created project should have at least these components. In the graph:

  1. A square represents a platform project (or subproject).

  2. The name inside the square is the project name.

  3. The italic text under the name is the corresponding S3 path.

  4. A soft rectangle represents a project component (a pipeline, dashboard, etc., but not a subproject).
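The graph legend describes a tree: projects (squares) carry a name and an S3 path, contain components (soft rectangles), and nest subprojects. The sketch below models that structure with a hypothetical `Project` dataclass and prints it as an indented outline. The example tree is built from entries in the platform/S3 table and treats Store Mapper (pipe) as a component of Raw. It is illustrative only and is not the official template.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """A platform project or subproject (a square in the graph)."""
    name: str                                              # name inside the square
    s3_path: str                                           # italic text under the name
    components: list[str] = field(default_factory=list)    # soft rectangles: pipelines, dashboards, ...
    subprojects: list["Project"] = field(default_factory=list)


def render(project: Project, depth: int = 0) -> list[str]:
    """Return an indented outline: [project] path, then (components), then subprojects."""
    indent = "  " * depth
    lines = [f"{indent}[{project.name}] {project.s3_path}"]
    for component in project.components:
        lines.append(f"{indent}  ({component})")
    for sub in project.subprojects:
        lines.extend(render(sub, depth + 1))
    return lines


# Hypothetical tree built from the table's Standardized Output entries.
raw = Project("Raw", "/standardized_output/raw/", components=["Store Mapper (pipe)"])
standardized = Project(
    "Standardized Output",
    "/standardized_output/",
    subprojects=[
        raw,
        Project("Retail Template", "/standardized_output/retail_template/"),
        Project("Product 360", "/standardized_output/product_360/"),
    ],
)

lines = render(standardized)
print("\n".join(lines))
```

Keeping subprojects and components in separate lists mirrors the legend's distinction between squares and soft rectangles.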

Data lifecycle Platform - S3 overview

This overview can also serve as a starting-point template for a newly created project.
