Platform

✅ Do check the enabled/disabled stages before running a Job.

✅ Always query/check the files in Athena or Python. The Data Overview in VDH shows only a sample of the file, not all of it. (Don’t trust the sampler 🤫)

✅ Before joining two files, always make sure the matching columns you select are both trimmed and in the same case (uppercase or lowercase); otherwise the join produces avoidable unmatched rows.

✅ Whenever you change a mapper or any parquet output, check whether the change altered the schema (a new column and/or a different data type for an existing column). If it did, you need to refresh that file in every other stage/pipeline that uses it as an input.

✅ Before the weekly run, check the activity log for pushes/commits on the pipelines, and review the pipelines that changed during the last week (since the last run).

✅ To make sure you know about every pushed version, subscribe with your email so that you get a notification whenever someone else pushes changes to those pipelines.
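
The tip above about trimming and matching case before a join can be illustrated with a small Python sketch. It is not how the platform itself joins files; the column name `customer_id`, the sample rows, and the helpers `normalize_key` and `left_join` are all hypothetical and exist only for illustration.

```python
def normalize_key(value):
    """Trim surrounding whitespace and upper-case a join key."""
    return str(value).strip().upper()


def left_join(left_rows, right_rows, key):
    """Left-join two lists of dicts on `key`, comparing normalized values."""
    lookup = {normalize_key(row[key]): row for row in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(normalize_key(row[key]))
        merged = dict(row)
        merged["matched"] = match is not None
        if match is not None:
            # Copy the right-hand columns, keeping the left-hand key as-is.
            merged.update({k: v for k, v in match.items() if k != key})
        joined.append(merged)
    return joined


# Hypothetical sample data: same IDs, different padding and case.
sales = [
    {"customer_id": " ab12 ", "revenue": 100},
    {"customer_id": "CD34", "revenue": 250},
]
customers = [
    {"customer_id": "AB12", "name": "Acme"},
    {"customer_id": "cd34 ", "name": "Globex"},
]

# Comparing the raw values finds no matches at all.
raw_matches = sum(
    1 for s in sales
    if any(s["customer_id"] == c["customer_id"] for c in customers)
)

# Comparing the normalized values matches every row.
joined = left_join(sales, customers, "customer_id")
clean_matches = sum(1 for row in joined if row["matched"])

print(raw_matches, clean_matches)  # 0 2
```

If you work in pandas instead, the same normalization can be applied to the key column with `.str.strip()` and `.str.upper()` before merging.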

❌ Don’t push pipelines with hard-coded input versions.

❌ Don’t select Append mode in parquet outputs when saving a client’s raw file; choose Overwrite instead. Append mode has already caused revenues to be doubled in the dashboard.
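
To see why Append doubles the numbers, here is a rough Python sketch that saves the same raw rows twice, once in append mode and once in overwrite mode. It uses plain CSV files in a temporary folder as a stand-in for the parquet output; the rows, the file names, and the helpers `save` and `total_revenue` are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical raw client rows: (date, revenue).
ROWS = [("2024-01-01", 100.0), ("2024-01-02", 250.0)]


def save(path, rows, mode):
    """Write rows to `path`; mode "a" appends, mode "w" overwrites."""
    with open(path, mode, newline="") as handle:
        csv.writer(handle).writerows(rows)


def total_revenue(path):
    """Sum the revenue column of the saved file."""
    with open(path, newline="") as handle:
        return sum(float(amount) for _date, amount in csv.reader(handle))


with tempfile.TemporaryDirectory() as folder:
    appended = os.path.join(folder, "appended.csv")
    overwritten = os.path.join(folder, "overwritten.csv")

    # The same job runs twice on the same raw file.
    for _run in range(2):
        save(appended, ROWS, "a")
        save(overwritten, ROWS, "w")

    append_total = total_revenue(appended)
    overwrite_total = total_revenue(overwritten)

print(append_total, overwrite_total)  # 700.0 350.0
```

Each rerun in append mode adds another full copy of the raw rows, so any total built on top of that output grows with every run, while overwrite mode keeps exactly one copy.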
