Other
Copy S3 paths
Sometimes we need to navigate to a specific S3 path. We can do that by opening the folders manually, but a faster way is to copy the path and go to it directly. To do so, copy the VDH path and paste it into the browser's URL address bar once we are in the S3 console.
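For illustration only (the bucket name and prefix below are made up, not a real path), an S3 console URL typically carries the folder as a prefix parameter, so pasting a URL of this shape opens that folder directly:
https://s3.console.aws.amazon.com/s3/buckets/my-bucket?prefix=scraper/albert_daily/2024-01-01/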
SQL script for counting nulls
We often need to check whether all data columns are populated. For instance, taking the whole daily folder of a scraped competitor, we want to see the number of null rows per column. In the example below, the product_promotion_price column has 7,307,309 null rows, while the other columns have 0 null rows, meaning they are fully populated.
-- For each column, count how many rows have a NULL value;
-- total_counts gives the overall row count for comparison.
SELECT COUNT(*) AS total_counts,
       SUM(CASE WHEN product_code IS NULL THEN 1 ELSE 0 END) AS product_code,
       SUM(CASE WHEN product_price IS NULL THEN 1 ELSE 0 END) AS product_price,
       SUM(CASE WHEN product_promotion_price IS NULL THEN 1 ELSE 0 END) AS product_promotion_price,
       SUM(CASE WHEN retailer_name IS NULL THEN 1 ELSE 0 END) AS retailer_name
FROM "scraper"."albert_daily";
SQL for checking all columns and their data types
-- List metadata for every column of the table, including its data type.
SELECT *
FROM information_schema.columns
WHERE table_name = 'albert_daily';
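If we only care about the column names and their types, we can narrow the query down. The sketch below assumes the table lives in the "scraper" schema used in the earlier query; adjust the schema name if it differs.
-- Only column names and data types, in table order.
SELECT column_name,
       data_type
FROM information_schema.columns
WHERE table_schema = 'scraper'
  AND table_name = 'albert_daily'
ORDER BY ordinal_position;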