Completeness — Data Quality Dimension
Completeness
Completeness is the most measured and most misunderstood data quality dimension. A null rate of 2 % sounds acceptable until you realize it means 20,000 missing customer records in a table of one million rows.
Definition
Completeness measures the proportion of expected values that are present and non-null in a dataset.
completeness = 1 − (null_count / total_rows)
A score of 1.0 means no nulls. A score of 0.0 means every value is null. For columns that are semantically required (primary keys, foreign keys, mandatory fields), the target should be 1.0.
Completeness also applies at the row level (cross-column): a row is complete if all required columns are populated. Row-level completeness is stricter and more useful for downstream processing.
How DQ Measures It
On every profiling run, DQ computes completeness for every column:
- Counts null values using the database engine's native null handling.
- Counts empty strings separately (treated as incomplete by default; configurable).
- Computes the ratio and stores a timestamped score.
- Compares against the previous run and flags regressions above the configured threshold (default: 1 % degradation triggers a warning; 5 % triggers an alert).
For large tables, DQ samples up to 1 million rows for the initial profile and uses the full table for rule-based monitoring runs.
Rules can be set as: "completeness on email must be ≥ 0.98". Violations are surfaced in the dashboard and routed to configured alert channels.
Common Failure Modes
ETL null propagation. A JOIN that should be INNER JOIN is written as LEFT JOIN. Missing matches produce null values in all columns from the right table. Completeness drops silently on every new load.
Optional vs. required field confusion. A form field is marked optional in the UI but the downstream pipeline assumes it is always present. The schema allows nulls; the logic does not.
Batch load failures. A partial load completes without error but writes nulls for all values in a column that failed to parse. Row counts look correct; completeness does not.
Schema evolution. A new NOT NULL column is added to a table without backfilling historical rows. Completeness on that column is 0.0 for all rows before the migration date.
For a full overview of all six dimensions, see /blog/six-dimensions-of-data-quality.
About DQ. DQ is the data quality engine that profiles, validates, and remediates your tables in 90 seconds. Built by K/20X Labs.