DQ.

Validity — Data Quality Dimension

Validity

A column can be fully complete and uniquely populated yet entirely invalid. "not-an-email", "9999-99-99", and "USD_DOLLAR" are present, non-null, and unique — and all three are wrong. Validity catches the class of problems that completeness and uniqueness cannot.

Definition

Validity measures the proportion of values that conform to an expected format, type, allowable value set, or business rule.

validity = valid_count / total_rows

A score of 1.0 means every value passes all validity rules for that column. Validity rules include: regex patterns, type constraints, range checks, enum membership, date format checks, and cross-field logical constraints.

How DQ Measures It

During profiling, DQ infers validity rules automatically for each column:

Inferred rules can be reviewed and promoted to persistent checks with one click. Custom rules (regex, SQL expressions) can also be authored directly.

Example validity rule:
  column: country_code
  rule: matches_regex
  pattern: "^[A-Z]{2}$"
  # ISO 3166-1 alpha-2: two uppercase letters only

Common Failure Modes

Free-text entry into structured fields. A city field accepts free text in a web form. Users enter "NYC", "New York City", "new york", "N.Y.". All are present and unique. None match the canonical "New York". See /blog/repairing-dirty-cities-and-countries for the remediation approach.

Date format inconsistency. One upstream system sends YYYY-MM-DD; another sends MM/DD/YYYY. After union, half the date column fails ISO format checks. Date arithmetic on the mixed column produces incorrect or null results.

Numeric overflow stored as string. A BigDecimal from a Java service is serialized as scientific notation: "1.234e7". The downstream column expects "12340000". The value is technically correct but fails the expected format.

Enum drift. A status column is validated against ["active", "inactive", "pending"]. A new engineer adds "suspended" to the application code without updating the reference set. DQ flags the new value as invalid until the rule is updated.

For a full overview of all six dimensions, see /blog/six-dimensions-of-data-quality.


About DQ. DQ is the data quality engine that profiles, validates, and remediates your tables in 90 seconds. Built by K/20X Labs.