Overall score
64.3%
-5.2% vs last week
Six-dimension scores
completeness
82.3%
-4.2%
3 recently added columns (warehouse_zone, weight_g, hs_code) are 99%+ null — added by migration but not backfilled.
uniqueness
61.2%
-8.8%
13,720 duplicate SKUs in products table. ETL upsert logic inserts instead of updating on conflict.
validity
74.1%
-3.1%
4.2% of price_cop values are zero (possible deleted products retained). 1.1% of sku values contain spaces.
consistency
69.8%
-4.5%
inventory.quantity_available negative for 218 products. Price mismatch between products and inventory 8.3%.
integrity
88.1%
-1.2%
inventory.product_id FK to products: 98.8% match. 1,176 inventory rows have no matching product.
accuracy
62.4%
-5.8%
Supplier country_code 12% null. Product descriptions truncated at 255 chars — 34% appear cut off.
legacy_mysql.products
98,000 rows · 14 cols · scanned 1d ago| Column | Type | Null% | Distinct | Mean | p95 | Dimensions | Notes |
|---|---|---|---|---|---|---|---|
product_id | int | 0% | 98,000 | — | — | compl 100.0% valid 100.0% uniq 100.0% | |
sku | varchar(64) | 1.0% | 84,280 | — | — | compl 99.0% valid 98.9% uniq 86.0% | 13,720 duplicate SKUs. ETL inserts on conflict instead of upsert. |
product_name | varchar(255) | 0.2% | 93,100 | — | — | compl 99.8% valid 98.0% uniq 95.0% | 34% of descriptions appear truncated at 255 chars. |
price_cop | decimal(12,2) | 0.5% | 39,200 | 85,000 | 380,000 | compl 99.5% valid 95.8% uniq 40.0% | 4.2% zero-price products. |
category_id | int | 3.0% | 42 | — | — | compl 97.0% valid 100.0% uniq 0.0% | |
warehouse_zone | varchar(8) | 99.2% | 4 | — | — | compl 0.8% valid 100.0% uniq 0.0% | Recently added column — 99.2% null, not yet backfilled. |
weight_g | float | 99.4% | 588 | — | — | compl 0.6% valid 100.0% uniq 40.0% | Recently added column — 99.4% null. |
hs_code | varchar(12) | 99.1% | 882 | — | — | compl 0.9% valid 100.0% uniq 50.0% | Harmonized system code — 99.1% null. |
legacy_mysql.inventory
98,000 rows · 8 cols · scanned 1d ago| Column | Type | Null% | Distinct | Mean | p95 | Dimensions | Notes |
|---|---|---|---|---|---|---|---|
inventory_id | int | 0% | 98,000 | — | — | compl 100.0% valid 100.0% uniq 100.0% | |
product_id | int | 0% | 96,824 | — | — | compl 100.0% valid 100.0% uniq 98.8% | 1,176 orphan inventory rows have no matching product. |
quantity_available | int | 0.2% | 29,400 | 84 | — | compl 99.8% valid 99.8% uniq 30.0% | 218 rows with negative quantity_available. |
supplier_id | int | 0.5% | 294 | — | — | compl 99.5% valid 100.0% uniq 0.3% |
legacy_mysql.suppliers
340 rows · 7 cols · scanned 1d ago| Column | Type | Null% | Distinct | Mean | p95 | Dimensions | Notes |
|---|---|---|---|---|---|---|---|
supplier_id | int | 0% | 340 | — | — | compl 100.0% valid 100.0% uniq 100.0% | |
supplier_name | varchar(255) | 0% | 339 | — | — | compl 100.0% valid 99.7% uniq 99.7% | |
country_code | char(2) | 12.0% | 18 | — | — | compl 88.0% valid 98.0% uniq 0.0% | 12% null country_code. |
contact_emailPII | varchar(255) | 24.0% | 330 | — | — | compl 76.0% valid 94.0% uniq 97.0% |
Auto-prioritized fixes (3)
Resolve 13,720 duplicate SKUs — ETL upsert logic broken
The ETL pipeline is inserting rows on conflict instead of updating them. This produces multiple product rows per SKU, corrupting inventory joins and order fulfillment downstream. The legacy_mysql.products → analytics_orders.dim_products lineage path is unreliable until fixed.
-- Step 1: Identify duplicates SELECT sku, COUNT(*) as cnt FROM products GROUP BY sku HAVING cnt > 1; -- Step 2: Keep most recent by created_at, remove older duplicates DELETE p1 FROM products p1 INNER JOIN products p2 ON p1.sku = p2.sku AND p1.created_at < p2.created_at; -- Step 3: Add unique constraint ALTER TABLE products ADD UNIQUE INDEX uq_sku (sku);
13,720 rows affected
Fix 218 negative inventory quantities
218 product inventory rows show negative quantity_available. This is mathematically invalid and suggests either (a) a race condition in the inventory update logic, or (b) returns processed before the original transaction. Negative inventory values corrupt stock-level dashboards.
-- Audit negative rows SELECT product_id, quantity_available FROM inventory WHERE quantity_available < 0 ORDER BY quantity_available ASC; -- Set floor to 0 after audit UPDATE inventory SET quantity_available = 0 WHERE quantity_available < 0; -- Fix root cause: add CHECK constraint ALTER TABLE inventory ADD CONSTRAINT chk_qty_non_negative CHECK (quantity_available >= 0);
218 rows affected
Backfill 3 recently-added null-heavy columns
Three columns added in the last migration (warehouse_zone, weight_g, hs_code) are 99%+ null because the migration script did not include a backfill step. These columns appear on the active product schema but contain no usable data.
-- warehouse_zone: derive from supplier.country_code + product.category UPDATE products p JOIN inventory i ON p.product_id = i.product_id JOIN suppliers s ON i.supplier_id = s.supplier_id SET p.warehouse_zone = IF(s.country_code = 'CO', 'BOG-1', 'INT-1') WHERE p.warehouse_zone IS NULL; -- weight_g and hs_code: import from product_master.csv in GCS.
97,200 rows affected
Column similarity matches
| Column A | Column B | Jaccard |
|---|---|---|
| legacy_mysql.products.sku | analytics_orders.dim_products.product_code | 0.94 |
| legacy_mysql.products.price_cop | analytics_orders.fct_orders.revenue_local | 0.71 |
| payments_pg.merchants.nit | legacy_mysql.suppliers.supplier_name | 0.61 |