DQ.
Scanning live

K/20X Labs · Production

Plan: pro · 5 datasets · 33 active rules · 12 PII findings

Overall quality score

83.1%

7-day trend

-0.6%

Critical issues

5

PII columns

12

Datasets

Type: postgresqlStatus: liveLast scan: 11h agoDuration: 38sTables: 3pci_dsshabeas_data_co

Six-dimension scorecard — payments_pg

completeness

99.7%

+0.1%

uniqueness

99.3%

-0.1%

validity

92.1%

+0.4%

consistency

90.7%

-0.8%

integrity

89.1%

-0.2%

accuracy

90.4%

+0.2%

completeness

99.7%

All required fields populated; 0.3% nullable metadata fields left intentionally null.

uniqueness

99.3%

transaction_id 100% unique; 487 orphan merchant_id references skew composite uniqueness.

validity

92.1%

Currency enum valid (COP/USD/EUR). 0.6% amount_cop values below floor — possible refunds.

consistency

90.7%

created_at 7-day KS drift: D=0.12, p=0.031. Upper-tail amount_cop trending higher.

integrity

89.1%

487 rows reference merchant_id not in merchants table (deleted merchants). FK integrity 99.97%.

accuracy

90.4%

Benford first-digit χ²=0.038 on amount_cop — slightly off but within tolerance for COP denominations.

Top fixes — auto-prioritized across workspace

criticalcompliance_consent.consent_log.consent_timestamphabeas_data_co

Backfill 529 null consent_timestamps — Habeas Data critical

529 rows (6.3%) in consent_log have a null consent_timestamp. Colombian Habeas Data law (Ley 1581 de 2012) requires a documented timestamp for every consent record. These rows will fail the next SFC audit.

UPDATE consent_log SET consent_timestamp = created_at WHERE consent_timestamp IS NULL AND created_at IS NOT NULL;
-- Remaining nulls: escalate to operations team for manual backfill from paper forms.

529 rows affected

criticalcrm_postgres.customers.cityhabeas_data_cogdpr

Normalize 19 city variants → 6 canonical Colombian cities

19 unique strings cluster into 6 canonical cities. rapidfuzz centroid similarity threshold 0.91. BOG / bogota / Bogt´a / bogot / Bogotá D.C. all map to Bogotá. Affects downstream city-level analytics and Habeas Data subject-location reporting.

UPDATE customers SET city = CASE
  WHEN lower(city) IN ('bog','bogota','bogt´a','bogot','bogotá d.c.','bogotá dc','santafe de bogota') THEN 'Bogotá'
  WHEN lower(city) IN ('mde','medellin','medellín') THEN 'Medellín'
  WHEN lower(city) IN ('cli','cali') THEN 'Cali'
  WHEN lower(city) IN ('baq','barranquilla','barranqilla') THEN 'Barranquilla'
  WHEN lower(city) IN ('ctg','cartagena','cartagena de indias') THEN 'Cartagena'
  WHEN lower(city) IN ('bga','bucaramanga') THEN 'Bucaramanga'
  ELSE city
END
WHERE city NOT IN ('Bogotá','Medellín','Cali','Barranquilla','Cartagena','Bucaramanga');

12,200 rows affected

criticallegacy_mysql.products.sku

Resolve 13,720 duplicate SKUs — ETL upsert logic broken

The ETL pipeline is inserting rows on conflict instead of updating them. This produces multiple product rows per SKU, corrupting inventory joins and order fulfillment downstream. The legacy_mysql.products → analytics_orders.dim_products lineage path is unreliable until fixed.

-- Step 1: Identify duplicates
SELECT sku, COUNT(*) as cnt FROM products GROUP BY sku HAVING cnt > 1;
-- Step 2: Keep most recent by created_at, remove older duplicates
DELETE p1 FROM products p1
INNER JOIN products p2 ON p1.sku = p2.sku AND p1.created_at < p2.created_at;
-- Step 3: Add unique constraint
ALTER TABLE products ADD UNIQUE INDEX uq_sku (sku);

13,720 rows affected

criticalpayments_pg.transactions.merchant_idpci_dss

Handle 487 orphan transactions referencing deleted merchants

487 transaction rows reference a merchant_id that no longer exists in the merchants table (merchants deleted without cascade). These transactions are invisible in merchant-level reporting and may cause FK violations in downstream pipelines.

-- Option A: Archive deleted merchants
INSERT INTO merchants_archive SELECT * FROM merchants_deleted WHERE merchant_id IN (
  SELECT DISTINCT merchant_id FROM transactions t
  LEFT JOIN merchants m ON t.merchant_id = m.merchant_id
  WHERE m.merchant_id IS NULL
);
-- Option B: Null out orphan references (loses traceability)
UPDATE transactions SET merchant_id = NULL WHERE merchant_id NOT IN (SELECT merchant_id FROM merchants);

487 rows affected

Compliance coverage

Distribution analysis

Histogram — payments_pg.transactions.amount_cop

1.8M rows · mean 412k COP · log-normal with right tail (p95 = 2.1M)

Null co-occurrence heatmap — crm_postgres.customers

Jaccard similarity of null masks across 12 columns. Warm = highly co-null (address cluster).

Lineage — source-to-analytics path

18 edges tracked across 5 datasets. Quality issues in source tables propagate to analytics downstream.

SourceDestinationTransform
payments_pg.transactionsanalytics_orders.fct_ordersaggregate
payments_pg.transactions.amount_copanalytics_orders.fct_orders.revenue_localidentity
payments_pg.transactions.amount_copanalytics_orders.fct_orders.revenue_usdfx_conversion
payments_pg.merchants.merchant_idanalytics_orders.fct_orders.merchant_ididentity
crm_postgres.customersanalytics_orders.dim_customersfilter
crm_postgres.customers.customer_idanalytics_orders.dim_customers.customer_ididentity
crm_postgres.customers.cityanalytics_orders.dim_customers.cityidentity
crm_postgres.consentsanalytics_orders.dim_customersjoin
legacy_mysql.productsanalytics_orders.dim_productsrename
legacy_mysql.products.skuanalytics_orders.dim_products.product_coderename
legacy_mysql.inventoryanalytics_orders.dim_productsjoin
payments_pg.chargebacksanalytics_orders.fct_ordersjoin

Column similarity — MinHash + Jaccard (top 8 pairs)

Column AColumn BJaccard
crm_postgres.customers.emailpayments_pg.merchants.contact_email0.89
crm_postgres.customers.document_idcompliance_consent.consent_log.subject_id0.97
legacy_mysql.products.skuanalytics_orders.dim_products.product_code0.94
crm_postgres.customers.customer_idanalytics_orders.dim_customers.customer_id0.98
payments_pg.transactions.customer_idcrm_postgres.customers.customer_id0.83
legacy_mysql.products.price_copanalytics_orders.fct_orders.revenue_local0.71
crm_postgres.customers.citycrm_postgres.addresses.city0.86
payments_pg.merchants.nitlegacy_mysql.suppliers.supplier_name0.61

Recent activity

11h ago

Drift detected: payments_pg.transactions.amount_cop (KS D=0.12, p=0.031)

payments_pg · consistency

11h ago

487 orphan transactions reference deleted merchants in payments_pg

payments_pg · integrity

11h ago

Scan: payments_pg — 91.4%

1,832,200 rows · 38s

11h ago

Scan: crm_postgres — 76.8%

582,000 rows · 61s

12h ago

Mahalanobis outliers in analytics_orders.fct_orders.revenue_usd (47 rows)

analytics_orders · accuracy

12h ago

Scan: analytics_orders — 88.2%

5,652,000 rows · 124s

13h ago

CRITICAL: 529 null consent_timestamps in compliance_consent — Habeas Data violation risk

compliance_consent · completeness

13h ago

Scan: compliance_consent — 82.1%

8,412 rows · 14s

17h ago

Scan: payments_pg — 91.1%

1,831,000 rows · 41s

17h ago

crm_postgres overall score dropped 2.1% in 7 days — investigate city/email issues

crm_postgres · validity

18h ago

Scan: crm_postgres — 77.4%

580,000 rows · 58s

20h ago

Scan: analytics_orders — 88.6%

5,640,000 rows · 119s

Run this on your data in 90 seconds

Connect your Postgres, Snowflake, or MySQL warehouse. DQ profiles every column, flags every PII field, and maps every lineage edge — automatically. Free tier: 1 connection, 100k rows/month. No card required.