K/20X Labs · Production
Plan: pro · 5 datasets · 33 active rules · 12 PII findings
Overall quality score
83.1%
7-day trend
-0.6%
Critical issues
5
PII columns
12
Datasets
Six-dimension scorecard — payments_pg
completeness
99.7%
+0.1%
uniqueness
99.3%
-0.1%
validity
92.1%
+0.4%
consistency
90.7%
-0.8%
integrity
89.1%
-0.2%
accuracy
90.4%
+0.2%
completeness
99.7%
All required fields populated; 0.3% nullable metadata fields left intentionally null.
uniqueness
99.3%
transaction_id 100% unique; 487 orphan merchant_id references skew composite uniqueness.
validity
92.1%
Currency enum valid (COP/USD/EUR). 0.6% amount_cop values below floor — possible refunds.
consistency
90.7%
created_at 7-day KS drift: D=0.12, p=0.031. Upper-tail amount_cop trending higher.
integrity
89.1%
487 rows reference merchant_id not in merchants table (deleted merchants). FK integrity 99.97%.
accuracy
90.4%
Benford first-digit χ²=0.038 on amount_cop — slightly off but within tolerance for COP denominations.
Top fixes — auto-prioritized across workspace
Backfill 529 null consent_timestamps — Habeas Data critical
529 rows (6.3%) in consent_log have a null consent_timestamp. Colombian Habeas Data law (Ley 1581 de 2012) requires a documented timestamp for every consent record. These rows will fail the next SFC audit.
UPDATE consent_log SET consent_timestamp = created_at WHERE consent_timestamp IS NULL AND created_at IS NOT NULL; -- Remaining nulls: escalate to operations team for manual backfill from paper forms.
529 rows affected
Normalize 19 city variants → 6 canonical Colombian cities
19 unique strings cluster into 6 canonical cities. rapidfuzz centroid similarity threshold 0.91. BOG / bogota / Bogt´a / bogot / Bogotá D.C. all map to Bogotá. Affects downstream city-level analytics and Habeas Data subject-location reporting.
UPDATE customers SET city = CASE
WHEN lower(city) IN ('bog','bogota','bogt´a','bogot','bogotá d.c.','bogotá dc','santafe de bogota') THEN 'Bogotá'
WHEN lower(city) IN ('mde','medellin','medellín') THEN 'Medellín'
WHEN lower(city) IN ('cli','cali') THEN 'Cali'
WHEN lower(city) IN ('baq','barranquilla','barranqilla') THEN 'Barranquilla'
WHEN lower(city) IN ('ctg','cartagena','cartagena de indias') THEN 'Cartagena'
WHEN lower(city) IN ('bga','bucaramanga') THEN 'Bucaramanga'
ELSE city
END
WHERE city NOT IN ('Bogotá','Medellín','Cali','Barranquilla','Cartagena','Bucaramanga');12,200 rows affected
Resolve 13,720 duplicate SKUs — ETL upsert logic broken
The ETL pipeline is inserting rows on conflict instead of updating them. This produces multiple product rows per SKU, corrupting inventory joins and order fulfillment downstream. The legacy_mysql.products → analytics_orders.dim_products lineage path is unreliable until fixed.
-- Step 1: Identify duplicates SELECT sku, COUNT(*) as cnt FROM products GROUP BY sku HAVING cnt > 1; -- Step 2: Keep most recent by created_at, remove older duplicates DELETE p1 FROM products p1 INNER JOIN products p2 ON p1.sku = p2.sku AND p1.created_at < p2.created_at; -- Step 3: Add unique constraint ALTER TABLE products ADD UNIQUE INDEX uq_sku (sku);
13,720 rows affected
Handle 487 orphan transactions referencing deleted merchants
487 transaction rows reference a merchant_id that no longer exists in the merchants table (merchants deleted without cascade). These transactions are invisible in merchant-level reporting and may cause FK violations in downstream pipelines.
-- Option A: Archive deleted merchants INSERT INTO merchants_archive SELECT * FROM merchants_deleted WHERE merchant_id IN ( SELECT DISTINCT merchant_id FROM transactions t LEFT JOIN merchants m ON t.merchant_id = m.merchant_id WHERE m.merchant_id IS NULL ); -- Option B: Null out orphan references (loses traceability) UPDATE transactions SET merchant_id = NULL WHERE merchant_id NOT IN (SELECT merchant_id FROM merchants);
487 rows affected
Compliance coverage
Distribution analysis
Histogram — payments_pg.transactions.amount_cop
1.8M rows · mean 412k COP · log-normal with right tail (p95 = 2.1M)
Null co-occurrence heatmap — crm_postgres.customers
Jaccard similarity of null masks across 12 columns. Warm = highly co-null (address cluster).
Lineage — source-to-analytics path
18 edges tracked across 5 datasets. Quality issues in source tables propagate to analytics downstream.
| Source | Destination | Transform |
|---|---|---|
| payments_pg.transactions | analytics_orders.fct_orders | aggregate |
| payments_pg.transactions.amount_cop | analytics_orders.fct_orders.revenue_local | identity |
| payments_pg.transactions.amount_cop | analytics_orders.fct_orders.revenue_usd | fx_conversion |
| payments_pg.merchants.merchant_id | analytics_orders.fct_orders.merchant_id | identity |
| crm_postgres.customers | analytics_orders.dim_customers | filter |
| crm_postgres.customers.customer_id | analytics_orders.dim_customers.customer_id | identity |
| crm_postgres.customers.city | analytics_orders.dim_customers.city | identity |
| crm_postgres.consents | analytics_orders.dim_customers | join |
| legacy_mysql.products | analytics_orders.dim_products | rename |
| legacy_mysql.products.sku | analytics_orders.dim_products.product_code | rename |
| legacy_mysql.inventory | analytics_orders.dim_products | join |
| payments_pg.chargebacks | analytics_orders.fct_orders | join |
Column similarity — MinHash + Jaccard (top 8 pairs)
| Column A | Column B | Jaccard |
|---|---|---|
| crm_postgres.customers.email | payments_pg.merchants.contact_email | 0.89 |
| crm_postgres.customers.document_id | compliance_consent.consent_log.subject_id | 0.97 |
| legacy_mysql.products.sku | analytics_orders.dim_products.product_code | 0.94 |
| crm_postgres.customers.customer_id | analytics_orders.dim_customers.customer_id | 0.98 |
| payments_pg.transactions.customer_id | crm_postgres.customers.customer_id | 0.83 |
| legacy_mysql.products.price_cop | analytics_orders.fct_orders.revenue_local | 0.71 |
| crm_postgres.customers.city | crm_postgres.addresses.city | 0.86 |
| payments_pg.merchants.nit | legacy_mysql.suppliers.supplier_name | 0.61 |
Recent activity
Drift detected: payments_pg.transactions.amount_cop (KS D=0.12, p=0.031)
payments_pg · consistency
487 orphan transactions reference deleted merchants in payments_pg
payments_pg · integrity
Scan: payments_pg — 91.4%
1,832,200 rows · 38s
Scan: crm_postgres — 76.8%
582,000 rows · 61s
Mahalanobis outliers in analytics_orders.fct_orders.revenue_usd (47 rows)
analytics_orders · accuracy
Scan: analytics_orders — 88.2%
5,652,000 rows · 124s
CRITICAL: 529 null consent_timestamps in compliance_consent — Habeas Data violation risk
compliance_consent · completeness
Scan: compliance_consent — 82.1%
8,412 rows · 14s
Scan: payments_pg — 91.1%
1,831,000 rows · 41s
crm_postgres overall score dropped 2.1% in 7 days — investigate city/email issues
crm_postgres · validity
Scan: crm_postgres — 77.4%
580,000 rows · 58s
Scan: analytics_orders — 88.6%
5,640,000 rows · 119s
Run this on your data in 90 seconds
Connect your Postgres, Snowflake, or MySQL warehouse. DQ profiles every column, flags every PII field, and maps every lineage edge — automatically. Free tier: 1 connection, 100k rows/month. No card required.