DQ
DQCatalogAnalytics_orders
Catalog/

analytics_orders

Type: snowflakeHost: xy12345.us-east-1.snowflakecomputing.comDB: ANALYTICSLast scan: 15h agoDuration: 124s
soc2

Overall score

88.2%

-0.4% vs last week

Six-dimension scores

completeness

96.1%

-0.2%

revenue_usd null 3.9% (unknown-currency orders). All PK fields 100% complete.

uniqueness

99.9%

+0.0%

fct_orders.order_id 100% unique. dim_products.product_code 99.9% unique.

validity

92.4%

-0.5%

47 orders have negative line_items count. 12 orders with ship_distance_km > 6000 (overseas).

consistency

86.3%

-1.1%

KS drift D=0.18, p=0.04 on revenue_usd upper tail vs 30-day baseline. Seasonal or actual shift?

integrity

97.8%

+0.1%

dim_customers FK match 99.8%. 480 fct_orders reference dim_products with no match (new products not yet in dim).

accuracy

88.7%

-0.3%

Mahalanobis distance > 3.0 on (revenue, line_items, ship_distance): 47 outlier orders flagged.

analytics_orders.fct_orders

5,400,000 rows · 18 cols · scanned 15h ago
ColumnTypeNull%DistinctMeanp95DimensionsNotes
order_id
varchar0%5,400,000
compl
100.0%
valid
100.0%
uniq
100.0%
revenue_usd
float3.9%4,320,00087380
compl
96.1%
valid
99.4%
uniq
80.0%
KS drift on upper tail. 47 Mahalanobis outliers (distance >3.0).
revenue_local
float3.9%4,050,000340,000
compl
96.1%
valid
99.4%
uniq
75.0%
line_items
integer0%4838
compl
100.0%
valid
99.1%
uniq
0.0%
47 rows with negative line_items.
ship_distance_km
float6.0%2,700,0004201,400
compl
94.0%
valid
99.8%
uniq
50.0%
customer_id
varchar0.1%216,000
compl
99.9%
valid
100.0%
uniq
4.0%
product_id
varchar0%10,800
compl
100.0%
valid
100.0%
uniq
0.2%

analytics_orders.dim_customers

240,000 rows · 12 cols · scanned 15h ago
ColumnTypeNull%DistinctMeanp95DimensionsNotes
customer_id
varchar0%240,000
compl
100.0%
valid
100.0%
uniq
100.0%
emailPII

email

varchar1.0%237,600
compl
99.0%
valid
97.0%
uniq
99.0%
country_code
char(2)2.0%12
compl
98.0%
valid
99.8%
uniq
0.0%

analytics_orders.dim_products

12,000 rows · 10 cols · scanned 15h ago
ColumnTypeNull%DistinctMeanp95DimensionsNotes
product_id
varchar0%12,000
compl
100.0%
valid
100.0%
uniq
100.0%
product_code
varchar0%11,988
compl
100.0%
valid
99.9%
uniq
99.9%
12 near-duplicate product_codes differing only in suffix.
category
varchar1.0%24
compl
99.0%
valid
99.0%
uniq
0.0%
price_usd
float2.0%7,20054220
compl
98.0%
valid
99.8%
uniq
60.0%

Auto-prioritized fixes (2)

warningfct_orders.revenue_usdaccuracy

Investigate 47 Mahalanobis outliers in fct_orders

47 orders have Mahalanobis distance > 3.0 on the joint distribution of (revenue_usd, line_items, ship_distance_km). These could be legitimate bulk orders, test orders, or data entry errors. Concurrently, a KS drift (D=0.18, p=0.04) was detected on the revenue_usd upper tail vs the 30-day baseline.

-- Identify the 47 outliers
SELECT order_id, revenue_usd, line_items, ship_distance_km,
  mahalanobis_distance(ARRAY[revenue_usd, line_items, ship_distance_km]) as m_dist
FROM fct_orders WHERE m_dist > 3.0 ORDER BY m_dist DESC;
-- Then either: (a) tag as 'bulk_order', (b) cap at p99, or (c) exclude from aggregate metrics.

47 rows affected

infofct_orders.line_itemsvalidity

Correct 47 fct_orders rows with negative line_items count

47 orders have a negative line_items count. This appears to be a sign convention error in the dbt model int_transactions_normalized — return orders are being represented with negative counts rather than a separate 'return' flag.

-- In dbt model int_transactions_normalized:
-- Change: line_items
-- To: ABS(line_items) — and add a separate is_return boolean
SELECT order_id, line_items FROM fct_orders WHERE line_items < 0;

47 rows affected

Column similarity matches

Column AColumn BJaccard
legacy_mysql.products.skuanalytics_orders.dim_products.product_code0.94
crm_postgres.customers.customer_idanalytics_orders.dim_customers.customer_id0.98
legacy_mysql.products.price_copanalytics_orders.fct_orders.revenue_local0.71