Platform
✅ Do check the enabled/disabled stages before running a Job.
✅ Always query/check the files in Athena/Python. The Data Overview in VDH shows only a sample of the file, not all of it. (Don’t trust the sampler 🤫)
✅ Before joining two files, always make sure the columns you match on are trimmed and in the same case (uppercase/lowercase) on both sides, to avoid unnecessary unmatched results.
✅ Whenever you change a mapper or any parquet output, check whether the change has altered the schema (a new column and/or a different data type in an existing column). If it has, you need to refresh that file wherever it is used as an input in other stages/pipelines.
✅ Before the weekly run, check the activity log for pushes/commits on pipelines, and review the ones that changed during the last week (since the previous run).
✅ To make sure you know about every pushed version, subscribe with your email so you get a notification whenever someone else pushes changes to those pipelines.
❌ Don’t push pipelines with hard-coded input versions.
❌ Don’t select Append mode in parquet outputs when saving a client’s raw file; choose Overwrite instead. Using Append here has previously caused revenues to be doubled in the dashboard.
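Why the sampler is not enough can be sketched in plain Python. This is a minimal illustration under assumptions: the CSV content, the column names and the `profile` helper are hypothetical. It shows that a profile over the first rows (roughly what a preview sampler shows) can miss problems that only a pass over the whole file reveals. In Athena the equivalent is an aggregate query such as `SELECT COUNT(*), COUNT(amount) ...` over the full table.

```python
import csv
import io

# Hypothetical CSV: the last 5 rows have blank amounts, which a preview of the
# first rows would never show.
lines = ["id,amount"] + [f"{i},{'' if i >= 95 else 10}" for i in range(100)]
data = "\n".join(lines)


def profile(rows, columns):
    """Count rows and empty values per column over every row given."""
    empty = {col: 0 for col in columns}
    count = 0
    for row in rows:
        count += 1
        for col in columns:
            if not row[col].strip():
                empty[col] += 1
    return count, empty


all_rows = list(csv.DictReader(io.StringIO(data)))
print(profile(all_rows[:20], ["id", "amount"]))  # sample only: (20, {'id': 0, 'amount': 0})
print(profile(all_rows, ["id", "amount"]))       # whole file: (100, {'id': 0, 'amount': 5})
```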
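The tip about trimming and case-normalizing join columns can be illustrated with a minimal sketch in plain Python. The tables, column names and the `inner_join` / `normalize_key` helpers are hypothetical; in Athena the same idea is comparing `UPPER(TRIM(col))` on both sides of the join condition.

```python
# Hypothetical records: the same customer IDs differ only in whitespace and case.
orders = [
    {"customer_id": " abc01 ", "revenue": 120.0},
    {"customer_id": "Xyz02", "revenue": 80.0},
]
customers = [
    {"customer_id": "ABC01", "region": "North"},
    {"customer_id": "XYZ02 ", "region": "South"},
]


def normalize_key(value: str) -> str:
    """Trim surrounding whitespace and upper-case a join key."""
    return value.strip().upper()


def inner_join(left, right, key, normalize=None):
    """Inner-join two lists of dicts on `key` (assumes keys in `right` are unique)."""
    norm = normalize or (lambda v: v)
    index = {norm(row[key]): row for row in right}
    joined = []
    for row in left:
        match = index.get(norm(row[key]))
        if match is not None:
            # Keep the left row and add the right row's non-key columns.
            joined.append({**row, **{k: v for k, v in match.items() if k != key}})
    return joined


raw = inner_join(orders, customers, "customer_id")
clean = inner_join(orders, customers, "customer_id", normalize=normalize_key)
print(len(raw), len(clean))  # 0 2
```

Without normalization nothing matches; after trimming and upper-casing both sides, both orders find their customer.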
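For the schema tip, a simple before/after comparison makes a changed schema explicit. This is a sketch under assumptions: the schemas are hypothetical `{column: type}` dicts written by hand; with real parquet files one option is to build them from `pyarrow.parquet.read_schema`. It reports added, removed and retyped columns, any of which means the file should be refreshed wherever it is used as an input.

```python
def diff_schema(old: dict, new: dict) -> dict:
    """Compare two {column: type} schemas and report what changed."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "type_changed": sorted(
            col for col in set(old) & set(new) if old[col] != new[col]
        ),
    }


# Hypothetical schemas of one parquet output before and after a mapper change.
before = {"customer_id": "string", "revenue": "double", "date": "string"}
after = {
    "customer_id": "string",
    "revenue": "decimal(18,2)",
    "date": "date",
    "channel": "string",
}

changes = diff_schema(before, after)
print(changes)
# {'added': ['channel'], 'removed': [], 'type_changed': ['date', 'revenue']}

if any(changes.values()):
    print("Schema changed: refresh this file wherever it is used as an input.")
```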
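The Append vs Overwrite warning can be reproduced with a small simulation. This is only a sketch: `write_output` is a hypothetical stand-in for the platform's output modes, and the invoice rows are made up. Saving the same raw file twice in append mode keeps both copies, so any revenue total built on top of it doubles; overwrite keeps only the latest copy.

```python
# Hypothetical client raw file: two revenue rows.
raw_file = [
    {"invoice": "INV-1", "revenue": 100.0},
    {"invoice": "INV-2", "revenue": 250.0},
]


def write_output(existing, new_rows, mode):
    """Simulate saving rows to an output in 'append' or 'overwrite' mode."""
    if mode == "append":
        return existing + new_rows
    if mode == "overwrite":
        return list(new_rows)
    raise ValueError(f"unknown mode: {mode}")


def total(rows):
    """Sum the revenue column, as a dashboard would."""
    return sum(r["revenue"] for r in rows)


appended, overwritten = [], []
for _run in range(2):  # the same raw file saved by two runs
    appended = write_output(appended, raw_file, "append")
    overwritten = write_output(overwritten, raw_file, "overwrite")

print(total(appended), total(overwritten))  # 700.0 350.0
```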