Platform

This section covers some of the ways you can run data checks on our platform.

Schema Validator

Schema Validator is a special VDH node created by PRIME. Use it when you need assurance that your data has the expected schema.

The configuration pane of the node looks like this:

As you can see, we specify the columns that we expect to receive and the expected data type of each column. If the schema received while the pipeline executes is not exactly the same, the node causes the pipeline to fail. We can also configure the node to send an email to a user after it has caused a failure.

This node is especially useful before running algorithms that require large amounts of processing power, since we make sure beforehand that our pipeline will not fail due to the wrong schema.
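To make the idea concrete, here is a minimal Python sketch of an exact schema check that runs before any expensive step. It is not the Schema Validator node itself or its configuration format: the column names, the type names, and the helpers `schema_problems` and `validate_schema` are all hypothetical and exist only for illustration.

```python
# Illustrative only: the expected columns and type names below are made up.
EXPECTED_SCHEMA = {"customer_id": "int", "order_date": "str", "amount": "float"}


class SchemaMismatchError(Exception):
    """Raised when the received schema is not exactly the expected one."""


def schema_problems(rows, expected):
    """Return a list of readable differences; an empty list means a match."""
    if not rows:
        return ["no rows received, schema cannot be inferred"]
    # Infer the schema from the first row: column name -> Python type name.
    actual = {column: type(value).__name__ for column, value in rows[0].items()}
    problems = []
    for column in sorted(set(expected) - set(actual)):
        problems.append(f"missing column: {column}")
    for column in sorted(set(actual) - set(expected)):
        problems.append(f"unexpected column: {column}")
    for column in sorted(set(expected) & set(actual)):
        if expected[column] != actual[column]:
            problems.append(
                f"wrong type for {column}: "
                f"expected {expected[column]}, got {actual[column]}"
            )
    return problems


def validate_schema(rows, expected):
    """Fail fast, before any expensive step runs, if the schema differs."""
    problems = schema_problems(rows, expected)
    if problems:
        raise SchemaMismatchError("; ".join(problems))
```

Calling `validate_schema(rows, EXPECTED_SCHEMA)` as the first step of a job mirrors the placement recommended above: a mismatch stops the run before the costly algorithm starts.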

Guard Expressions

Guard Expressions is another special VDH node created by PRIME. It enables us to validate our data using SQL expressions. The node suggests some common checks, but you can also write the SQL yourself and check whatever you feel is necessary.

The configuration pane is similar to the one we saw on the Schema Validator:

As you can see, here we can also access the values of the data, not only its schema. We can choose whether to stop the execution of the pipeline, or to continue the execution and receive an email notifying us of possible mistakes in our data.
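The following Python sketch shows the general shape of SQL-based guard checks, using the standard-library `sqlite3` module as a stand-in database. It is only an assumption about how such checks can work, not the Guard Expressions node's real syntax: the table `orders`, the predicates in `GUARDS`, and the helpers `failed_guards` and `run_guards` are hypothetical. The `stop_on_failure` flag imitates the choice between stopping the pipeline and continuing with a notification.

```python
import sqlite3

# Hypothetical guard expressions: each SQL predicate should hold for every row.
GUARDS = {
    "amount is not negative": "amount >= 0",
    "customer_id is present": "customer_id IS NOT NULL",
}


def failed_guards(connection, table, guards):
    """Return {guard name: violating row count} for every guard that fails."""
    failures = {}
    for name, predicate in guards.items():
        # Rows where the predicate evaluates to NULL are not counted here.
        query = f"SELECT COUNT(*) FROM {table} WHERE NOT ({predicate})"
        violations = connection.execute(query).fetchone()[0]
        if violations:
            failures[name] = violations
    return failures


def run_guards(connection, table, guards, stop_on_failure=True):
    """Either stop the run or report the failures and let it continue."""
    failures = failed_guards(connection, table, guards)
    if failures and stop_on_failure:
        raise RuntimeError(f"guard expressions failed: {failures}")
    # In "continue" mode a real setup would notify someone; here we only return.
    return failures


# Small in-memory example with one NULL customer_id and one negative amount.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
connection.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (None, 5.0), (2, -3.0)],
)
report = run_guards(connection, "orders", GUARDS, stop_on_failure=False)
```

Writing each check as a predicate that must hold for every row keeps the guards short, and counting the violating rows gives the notification something specific to report.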

DQA Dashboard

Data Quality Assurance (DQA) Dashboards are a great way to perform checks along the way as you continue with data processing. You can create dashboards to check whether the new data aligns with the trends from past periods. Any sharp increase or decrease should prompt you to dig further into the data and pipelines to make sure that the processing was done correctly. You can also create graphs that count null values, the number of records, and so on.
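The checks behind such a dashboard can be sketched in a few lines of Python. This is a simplified illustration, not how the dashboards are built: the 25% threshold, the use of the mean of past periods as the baseline, and the function names are assumptions chosen for the example, and the baseline is assumed to be non-zero.

```python
from statistics import mean


def relative_change(new_value, past_values):
    """Change of the new period relative to the mean of past periods."""
    baseline = mean(past_values)  # assumed to be non-zero
    return (new_value - baseline) / baseline


def needs_review(new_value, past_values, threshold=0.25):
    """Flag the new period when it moves more than `threshold` from the trend."""
    return abs(relative_change(new_value, past_values)) > threshold


def null_counts(rows):
    """Count missing (None) values per column across a list of row dicts."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (value is None)
    return counts
```

For example, a record count of 150 against past periods of 100, 100, and 100 is a 50% increase and would be flagged, while 110 would not.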

Dashboard Version

You can also use the Production Dashboards (the ones that are also shown to our clients) to look at previous versions of a dashboard and see whether the values have changed. Some of our pipelines also process the data from previous periods; in this case, we can check whether we still have exactly the same values for those periods.

You can find the versions in the top right corner of the dashboard:

If you click View, you will see the pipeline exactly as it appeared on that specific date.
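If the values from two dashboard versions are copied out by hand, the comparison for previous periods can be expressed as a small Python check. This is only a sketch under that assumption: the dictionaries keyed by period, the `tolerance` parameter, and the function `changed_periods` are hypothetical and are not part of the dashboard.

```python
def changed_periods(previous_version, current_version, tolerance=0.0):
    """Return the periods present in both versions whose values differ."""
    shared = sorted(set(previous_version) & set(current_version))
    return [
        period
        for period in shared
        if abs(previous_version[period] - current_version[period]) > tolerance
    ]


# Example values keyed by period; the newest period exists only in the current version.
previous = {"2024-01": 100.0, "2024-02": 120.0}
current = {"2024-01": 100.0, "2024-02": 125.0, "2024-03": 130.0}
changed = changed_periods(previous, current)
```

Only periods that appear in both versions are compared, so a newly added period is not reported as a change.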
