DQ
DQCatalogLegacy_mysql
Catalog/

legacy_mysql

Type: mysqlHost: mysql-legacy-01.k20x.internalDB: product_catalogLast scan: 1d agoDuration: 92s

Overall score

64.3%

-5.2% vs last week

Six-dimension scores

completeness

82.3%

-4.2%

3 recently added columns (warehouse_zone, weight_g, hs_code) are 99%+ null — added by migration but not backfilled.

uniqueness

61.2%

-8.8%

13,720 duplicate SKUs in products table. ETL upsert logic inserts instead of updating on conflict.

validity

74.1%

-3.1%

4.2% of price_cop values are zero (possible deleted products retained). 1.1% of sku values contain spaces.

consistency

69.8%

-4.5%

inventory.quantity_available negative for 218 products. Price mismatch between products and inventory 8.3%.

integrity

88.1%

-1.2%

inventory.product_id FK to products: 98.8% match. 1,176 inventory rows have no matching product.

accuracy

62.4%

-5.8%

Supplier country_code 12% null. Product descriptions truncated at 255 chars — 34% appear cut off.

legacy_mysql.products

98,000 rows · 14 cols · scanned 1d ago
ColumnTypeNull%DistinctMeanp95DimensionsNotes
product_id
int0%98,000
compl
100.0%
valid
100.0%
uniq
100.0%
sku
varchar(64)1.0%84,280
compl
99.0%
valid
98.9%
uniq
86.0%
13,720 duplicate SKUs. ETL inserts on conflict instead of upsert.
product_name
varchar(255)0.2%93,100
compl
99.8%
valid
98.0%
uniq
95.0%
34% of descriptions appear truncated at 255 chars.
price_cop
decimal(12,2)0.5%39,20085,000380,000
compl
99.5%
valid
95.8%
uniq
40.0%
4.2% zero-price products.
category_id
int3.0%42
compl
97.0%
valid
100.0%
uniq
0.0%
warehouse_zone
varchar(8)99.2%4
compl
0.8%
valid
100.0%
uniq
0.0%
Recently added column — 99.2% null, not yet backfilled.
weight_g
float99.4%588
compl
0.6%
valid
100.0%
uniq
40.0%
Recently added column — 99.4% null.
hs_code
varchar(12)99.1%882
compl
0.9%
valid
100.0%
uniq
50.0%
Harmonized system code — 99.1% null.

legacy_mysql.inventory

98,000 rows · 8 cols · scanned 1d ago
ColumnTypeNull%DistinctMeanp95DimensionsNotes
inventory_id
int0%98,000
compl
100.0%
valid
100.0%
uniq
100.0%
product_id
int0%96,824
compl
100.0%
valid
100.0%
uniq
98.8%
1,176 orphan inventory rows have no matching product.
quantity_available
int0.2%29,40084
compl
99.8%
valid
99.8%
uniq
30.0%
218 rows with negative quantity_available.
supplier_id
int0.5%294
compl
99.5%
valid
100.0%
uniq
0.3%

legacy_mysql.suppliers

340 rows · 7 cols · scanned 1d ago
ColumnTypeNull%DistinctMeanp95DimensionsNotes
supplier_id
int0%340
compl
100.0%
valid
100.0%
uniq
100.0%
supplier_name
varchar(255)0%339
compl
100.0%
valid
99.7%
uniq
99.7%
country_code
char(2)12.0%18
compl
88.0%
valid
98.0%
uniq
0.0%
12% null country_code.
contact_emailPII

email

varchar(255)24.0%330
compl
76.0%
valid
94.0%
uniq
97.0%

Auto-prioritized fixes (3)

criticalproducts.skuuniqueness

Resolve 13,720 duplicate SKUs — ETL upsert logic broken

The ETL pipeline is inserting rows on conflict instead of updating them. This produces multiple product rows per SKU, corrupting inventory joins and order fulfillment downstream. The legacy_mysql.products → analytics_orders.dim_products lineage path is unreliable until fixed.

-- Step 1: Identify duplicates
SELECT sku, COUNT(*) as cnt FROM products GROUP BY sku HAVING cnt > 1;
-- Step 2: Keep most recent by created_at, remove older duplicates
DELETE p1 FROM products p1
INNER JOIN products p2 ON p1.sku = p2.sku AND p1.created_at < p2.created_at;
-- Step 3: Add unique constraint
ALTER TABLE products ADD UNIQUE INDEX uq_sku (sku);

13,720 rows affected

warninginventory.quantity_availableconsistency

Fix 218 negative inventory quantities

218 product inventory rows show negative quantity_available. This is mathematically invalid and suggests either (a) a race condition in the inventory update logic, or (b) returns processed before the original transaction. Negative inventory values corrupt stock-level dashboards.

-- Audit negative rows
SELECT product_id, quantity_available FROM inventory WHERE quantity_available < 0 ORDER BY quantity_available ASC;
-- Set floor to 0 after audit
UPDATE inventory SET quantity_available = 0 WHERE quantity_available < 0;
-- Fix root cause: add CHECK constraint
ALTER TABLE inventory ADD CONSTRAINT chk_qty_non_negative CHECK (quantity_available >= 0);

218 rows affected

warningproducts.warehouse_zonecompleteness

Backfill 3 recently-added null-heavy columns

Three columns added in the last migration (warehouse_zone, weight_g, hs_code) are 99%+ null because the migration script did not include a backfill step. These columns appear on the active product schema but contain no usable data.

-- warehouse_zone: derive from supplier.country_code + product.category
UPDATE products p
JOIN inventory i ON p.product_id = i.product_id
JOIN suppliers s ON i.supplier_id = s.supplier_id
SET p.warehouse_zone = IF(s.country_code = 'CO', 'BOG-1', 'INT-1')
WHERE p.warehouse_zone IS NULL;
-- weight_g and hs_code: import from product_master.csv in GCS.

97,200 rows affected

Column similarity matches

Column AColumn BJaccard
legacy_mysql.products.skuanalytics_orders.dim_products.product_code0.94
legacy_mysql.products.price_copanalytics_orders.fct_orders.revenue_local0.71
payments_pg.merchants.nitlegacy_mysql.suppliers.supplier_name0.61