DQ
DQCatalogCrm_postgres
Catalog/

crm_postgres

Type: postgresqlHost: pg-crm-01.us-east-1.k20x.internalDB: crm_prodLast scan: 14h agoDuration: 61s
gdprhabeas_data_co

Overall score

76.8%

-2.1% vs last week

Six-dimension scores

completeness

98.1%

-0.2%

1 row missing required city; 0.5% phone numbers null; consents table 100% complete.

uniqueness

93.4%

-1.5%

Email column has 2 exact duplicates (shared household). document_id has 6 true duplicates — likely ETL bug.

validity

71.2%

-3.1%

4 typo-domain emails (gmial/hotmial/yahooo). 19 city variants map to 6 canonical names. 3 malformed phone numbers.

consistency

78.8%

-1.8%

City name inconsistency across customers and addresses tables. Mixed locale date formats in legacy imported rows.

integrity

82.4%

-1.2%

consents table FK to customers: 99.8% match. 3 address rows have no corresponding customer.

accuracy

80.3%

-1.4%

12 PII fields detected; 2 email domains consistent with known typo-pattern list. Age distribution outlier (max=143).

crm_postgres.customers

185,000 rows · 16 cols · scanned 14h ago
ColumnTypeNull%DistinctMeanp95DimensionsNotes
customer_id
uuid0%185,000
compl
100.0%
valid
100.0%
uniq
100.0%
emailPII

email

text0.8%183,525
compl
99.2%
valid
71.2%
uniq
99.2%
4 typo-domain emails: gmial.com (3 rows), hotmial.com (2 rows), yahooo.com (2 rows), gamil.com (1 row).
document_idPII

co_cedula

text2.0%184,630
compl
98.0%
valid
99.3%
uniq
100.0%
Colombian cédula de ciudadanía. 6 duplicate document_ids suspected ETL bug.
phonePII

phone_co

text0.5%181,300
compl
99.5%
valid
98.3%
uniq
98.0%
3 numbers with invalid country code prefix.
full_namePII

full_name

text0.1%157,250
compl
99.9%
valid
99.0%
uniq
85.0%
city
text0.0%19
compl
100.0%
valid
62.1%
uniq
0.0%
19 unique strings clustering to 6 canonical cities. BOG/bogota/Bogt´a/bogot/Bogotá D.C. all mean Bogotá.
date_of_birthPII

date_of_birth

date4.0%92,500
compl
96.0%
valid
99.8%
uniq
50.0%
1 outlier age=143.
created_at
timestamptz0%148,000
compl
100.0%
valid
100.0%
uniq
80.0%
consent_given
boolean0.2%2
compl
99.8%
valid
100.0%
uniq
0.0%

crm_postgres.addresses

212,000 rows · 10 cols · scanned 14h ago
ColumnTypeNull%DistinctMeanp95DimensionsNotes
address_id
uuid0%212,000
compl
100.0%
valid
100.0%
uniq
100.0%
customer_id
uuid0%184,630
compl
100.0%
valid
100.0%
uniq
87.0%
3 orphan address rows have no matching customer_id.
street_line1PII

street_address

text0.3%201,400
compl
99.7%
valid
98.0%
uniq
95.0%
city
text0%23
compl
100.0%
valid
73.0%
uniq
0.0%
Same city normalization problem as customers.city.
postal_code
text8.0%2,120
compl
92.0%
valid
97.0%
uniq
1.0%

crm_postgres.consents

185,000 rows · 7 cols · scanned 14h ago
ColumnTypeNull%DistinctMeanp95DimensionsNotes
consent_id
uuid0%185,000
compl
100.0%
valid
100.0%
uniq
100.0%
customer_id
uuid0%184,944
compl
100.0%
valid
100.0%
uniq
100.0%
policy_version
text0%4
compl
100.0%
valid
99.0%
uniq
0.0%
consent_timestamp
timestamptz0.2%181,300
compl
99.8%
valid
100.0%
uniq
98.0%

Auto-prioritized fixes (4)

criticalcustomers.cityvalidity

Normalize 19 city variants → 6 canonical Colombian cities

19 unique strings cluster into 6 canonical cities. rapidfuzz centroid similarity threshold 0.91. BOG / bogota / Bogt´a / bogot / Bogotá D.C. all map to Bogotá. Affects downstream city-level analytics and Habeas Data subject-location reporting.

UPDATE customers SET city = CASE
  WHEN lower(city) IN ('bog','bogota','bogt´a','bogot','bogotá d.c.','bogotá dc','santafe de bogota') THEN 'Bogotá'
  WHEN lower(city) IN ('mde','medellin','medellín') THEN 'Medellín'
  WHEN lower(city) IN ('cli','cali') THEN 'Cali'
  WHEN lower(city) IN ('baq','barranquilla','barranqilla') THEN 'Barranquilla'
  WHEN lower(city) IN ('ctg','cartagena','cartagena de indias') THEN 'Cartagena'
  WHEN lower(city) IN ('bga','bucaramanga') THEN 'Bucaramanga'
  ELSE city
END
WHERE city NOT IN ('Bogotá','Medellín','Cali','Barranquilla','Cartagena','Bucaramanga');

12,200 rows affected

criticalcustomers.emailvalidity

Correct 8 typo-domain email addresses

Reference dictionary match against known typo-domain list. 3 rows have gmial.com, 2 hotmial.com, 2 yahooo.com, 1 gamil.com. These emails will hard-bounce in any campaign — and since they are PII, incorrect values create GDPR subject-access-request risk.

UPDATE customers SET email = REPLACE(email, '@gmial.com', '@gmail.com') WHERE email LIKE '%@gmial.com';
UPDATE customers SET email = REPLACE(email, '@hotmial.com', '@hotmail.com') WHERE email LIKE '%@hotmial.com';
UPDATE customers SET email = REPLACE(email, '@yahooo.com', '@yahoo.com') WHERE email LIKE '%@yahooo.com';
UPDATE customers SET email = REPLACE(email, '@gamil.com', '@gmail.com') WHERE email LIKE '%@gamil.com';

8 rows affected

warningcustomers.document_iduniqueness

Investigate 6 duplicate cédula document_ids in customers

6 customers share a document_id with another customer. In Colombia the cédula de ciudadanía is nationally unique. Likely causes: (1) manual data entry error, (2) shared family accounts, (3) ETL key collision. Must be resolved before any Habeas Data subject request response.

SELECT document_id, COUNT(*) cnt, array_agg(customer_id) ids
FROM customers
GROUP BY document_id
HAVING COUNT(*) > 1
ORDER BY cnt DESC;
-- Escalate each group to ops team for identity verification.

12 rows affected

infocustomers.date_of_birthaccuracy

Remove or correct 1 age=143 outlier in customers

One customer row has a date_of_birth yielding age=143 — clearly a data entry error (likely 1979 entered as 1879). Low urgency but the outlier inflates max-age statistics and may trigger false positives in age-gating logic.

SELECT customer_id, full_name, date_of_birth, TIMESTAMPDIFF(YEAR, date_of_birth, NOW()) as age
FROM customers WHERE TIMESTAMPDIFF(YEAR, date_of_birth, NOW()) > 120;
-- After manual verification, correct date_of_birth or NULL it.

1 rows affected

Column similarity matches

Column AColumn BJaccard
crm_postgres.customers.emailpayments_pg.merchants.contact_email0.89
crm_postgres.customers.document_idcompliance_consent.consent_log.subject_id0.97
crm_postgres.customers.customer_idanalytics_orders.dim_customers.customer_id0.98
payments_pg.transactions.customer_idcrm_postgres.customers.customer_id0.83
crm_postgres.customers.citycrm_postgres.addresses.city0.86