Overall score
76.8%
-2.1% vs last week
Six-dimension scores
completeness
98.1%
-0.2%
1 row missing required city; 0.5% phone numbers null; consents table 100% complete.
uniqueness
93.4%
-1.5%
Email column has 2 exact duplicates (shared household). document_id has 6 true duplicates — likely ETL bug.
validity
71.2%
-3.1%
4 typo-domain emails (gmial/hotmial/yahooo). 19 city variants map to 6 canonical names. 3 malformed phone numbers.
consistency
78.8%
-1.8%
City name inconsistency across customers and addresses tables. Mixed locale date formats in legacy imported rows.
integrity
82.4%
-1.2%
consents table FK to customers: 99.8% match. 3 address rows have no corresponding customer.
accuracy
80.3%
-1.4%
12 PII fields detected; 2 email domains consistent with known typo-pattern list. Age distribution outlier (max=143).
crm_postgres.customers
185,000 rows · 16 cols · scanned 14h ago| Column | Type | Null% | Distinct | Mean | p95 | Dimensions | Notes |
|---|---|---|---|---|---|---|---|
customer_id | uuid | 0% | 185,000 | — | — | compl 100.0% valid 100.0% uniq 100.0% | |
emailPII | text | 0.8% | 183,525 | — | — | compl 99.2% valid 71.2% uniq 99.2% | 4 typo-domain emails: gmial.com (3 rows), hotmial.com (2 rows), yahooo.com (2 rows), gamil.com (1 row). |
document_idPII co_cedula | text | 2.0% | 184,630 | — | — | compl 98.0% valid 99.3% uniq 100.0% | Colombian cédula de ciudadanía. 6 duplicate document_ids suspected ETL bug. |
phonePII phone_co | text | 0.5% | 181,300 | — | — | compl 99.5% valid 98.3% uniq 98.0% | 3 numbers with invalid country code prefix. |
full_namePII full_name | text | 0.1% | 157,250 | — | — | compl 99.9% valid 99.0% uniq 85.0% | |
city | text | 0.0% | 19 | — | — | compl 100.0% valid 62.1% uniq 0.0% | 19 unique strings clustering to 6 canonical cities. BOG/bogota/Bogt´a/bogot/Bogotá D.C. all mean Bogotá. |
date_of_birthPII date_of_birth | date | 4.0% | 92,500 | — | — | compl 96.0% valid 99.8% uniq 50.0% | 1 outlier age=143. |
created_at | timestamptz | 0% | 148,000 | — | — | compl 100.0% valid 100.0% uniq 80.0% | |
consent_given | boolean | 0.2% | 2 | — | — | compl 99.8% valid 100.0% uniq 0.0% |
crm_postgres.addresses
212,000 rows · 10 cols · scanned 14h ago| Column | Type | Null% | Distinct | Mean | p95 | Dimensions | Notes |
|---|---|---|---|---|---|---|---|
address_id | uuid | 0% | 212,000 | — | — | compl 100.0% valid 100.0% uniq 100.0% | |
customer_id | uuid | 0% | 184,630 | — | — | compl 100.0% valid 100.0% uniq 87.0% | 3 orphan address rows have no matching customer_id. |
street_line1PII street_address | text | 0.3% | 201,400 | — | — | compl 99.7% valid 98.0% uniq 95.0% | |
city | text | 0% | 23 | — | — | compl 100.0% valid 73.0% uniq 0.0% | Same city normalization problem as customers.city. |
postal_code | text | 8.0% | 2,120 | — | — | compl 92.0% valid 97.0% uniq 1.0% |
crm_postgres.consents
185,000 rows · 7 cols · scanned 14h ago| Column | Type | Null% | Distinct | Mean | p95 | Dimensions | Notes |
|---|---|---|---|---|---|---|---|
consent_id | uuid | 0% | 185,000 | — | — | compl 100.0% valid 100.0% uniq 100.0% | |
customer_id | uuid | 0% | 184,944 | — | — | compl 100.0% valid 100.0% uniq 100.0% | |
policy_version | text | 0% | 4 | — | — | compl 100.0% valid 99.0% uniq 0.0% | |
consent_timestamp | timestamptz | 0.2% | 181,300 | — | — | compl 99.8% valid 100.0% uniq 98.0% |
Auto-prioritized fixes (4)
Normalize 19 city variants → 6 canonical Colombian cities
19 unique strings cluster into 6 canonical cities. rapidfuzz centroid similarity threshold 0.91. BOG / bogota / Bogt´a / bogot / Bogotá D.C. all map to Bogotá. Affects downstream city-level analytics and Habeas Data subject-location reporting.
UPDATE customers SET city = CASE
WHEN lower(city) IN ('bog','bogota','bogt´a','bogot','bogotá d.c.','bogotá dc','santafe de bogota') THEN 'Bogotá'
WHEN lower(city) IN ('mde','medellin','medellín') THEN 'Medellín'
WHEN lower(city) IN ('cli','cali') THEN 'Cali'
WHEN lower(city) IN ('baq','barranquilla','barranqilla') THEN 'Barranquilla'
WHEN lower(city) IN ('ctg','cartagena','cartagena de indias') THEN 'Cartagena'
WHEN lower(city) IN ('bga','bucaramanga') THEN 'Bucaramanga'
ELSE city
END
WHERE city NOT IN ('Bogotá','Medellín','Cali','Barranquilla','Cartagena','Bucaramanga');12,200 rows affected
Correct 8 typo-domain email addresses
Reference dictionary match against known typo-domain list. 3 rows have gmial.com, 2 hotmial.com, 2 yahooo.com, 1 gamil.com. These emails will hard-bounce in any campaign — and since they are PII, incorrect values create GDPR subject-access-request risk.
UPDATE customers SET email = REPLACE(email, '@gmial.com', '@gmail.com') WHERE email LIKE '%@gmial.com'; UPDATE customers SET email = REPLACE(email, '@hotmial.com', '@hotmail.com') WHERE email LIKE '%@hotmial.com'; UPDATE customers SET email = REPLACE(email, '@yahooo.com', '@yahoo.com') WHERE email LIKE '%@yahooo.com'; UPDATE customers SET email = REPLACE(email, '@gamil.com', '@gmail.com') WHERE email LIKE '%@gamil.com';
8 rows affected
Investigate 6 duplicate cédula document_ids in customers
6 customers share a document_id with another customer. In Colombia the cédula de ciudadanía is nationally unique. Likely causes: (1) manual data entry error, (2) shared family accounts, (3) ETL key collision. Must be resolved before any Habeas Data subject request response.
SELECT document_id, COUNT(*) cnt, array_agg(customer_id) ids FROM customers GROUP BY document_id HAVING COUNT(*) > 1 ORDER BY cnt DESC; -- Escalate each group to ops team for identity verification.
12 rows affected
Remove or correct 1 age=143 outlier in customers
One customer row has a date_of_birth yielding age=143 — clearly a data entry error (likely 1979 entered as 1879). Low urgency but the outlier inflates max-age statistics and may trigger false positives in age-gating logic.
SELECT customer_id, full_name, date_of_birth, TIMESTAMPDIFF(YEAR, date_of_birth, NOW()) as age FROM customers WHERE TIMESTAMPDIFF(YEAR, date_of_birth, NOW()) > 120; -- After manual verification, correct date_of_birth or NULL it.
1 rows affected
Column similarity matches
| Column A | Column B | Jaccard |
|---|---|---|
| crm_postgres.customers.email | payments_pg.merchants.contact_email | 0.89 |
| crm_postgres.customers.document_id | compliance_consent.consent_log.subject_id | 0.97 |
| crm_postgres.customers.customer_id | analytics_orders.dim_customers.customer_id | 0.98 |
| payments_pg.transactions.customer_id | crm_postgres.customers.customer_id | 0.83 |
| crm_postgres.customers.city | crm_postgres.addresses.city | 0.86 |