Find Date Outliers

Overview

The Find Date Outliers calculator identifies date and timestamp values in your event log that fall outside normal expected ranges, helping you detect data quality issues before they distort your process analysis. This data quality calculator automatically scans every date and timestamp attribute in your event log for values that are clearly invalid, such as dates in the distant past, dates in the far future, or zero/null values.

Unlike manual data inspection, this calculator systematically examines every date field in your process data to highlight potential problems that could distort your process mining analysis, such as incorrect activity timestamps, malformed data imports, or default placeholder values that were never updated.

Common Uses

  • Validate data quality after importing event logs from legacy systems or new data sources
  • Detect placeholder dates or default values that indicate incomplete data entry
  • Identify system clock errors or timezone conversion problems that create impossible timestamps
  • Find dates from test data that accidentally made it into production event logs
  • Verify that timestamp data falls within expected business operation periods
  • Quickly assess overall date field quality across all attributes before detailed analysis

Settings

This calculator requires no configuration settings. It automatically examines all date and timestamp attributes in your event log and identifies outlier values based on predefined rules for what constitutes unrealistic or problematic dates.

Standard Fields:

  • Title: Optional custom title for the calculator output
  • Description: Optional description for documentation purposes

Detection Rules:

The calculator identifies date outliers by checking for:

  • Dates before 1990: Values earlier than January 1, 1990 are flagged as likely data errors or placeholders
  • Dates after 2040: Values beyond January 1, 2040 are considered unrealistic for current business processes
  • Zero or null dates: Missing, null, or zero timestamp values that indicate incomplete data
  • Invalid date formats: Malformed date values that cannot be properly parsed
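The detection rules above can be sketched in a few lines of Python. The thresholds and category labels mirror this page, but the implementation inside mindzie Studio is not published, so treat this as an illustration only:

```python
from datetime import datetime

# Fixed thresholds described in the detection rules (assumed, not configurable)
MIN_DATE = datetime(1990, 1, 1)
MAX_DATE = datetime(2040, 1, 1)

def classify_date(value):
    """Return the issue type for a value, or None if it passes all rules."""
    if value is None:
        return "Zero/Null"                 # missing / null timestamps
    if isinstance(value, str):
        try:
            value = datetime.fromisoformat(value)
        except ValueError:
            return "Invalid Format"        # unparseable date strings
    if value < MIN_DATE:
        return "Before 1990"               # placeholders like 1900-01-01 or 1970-01-01
    if value >= MAX_DATE:
        return "After 2040"                # clock errors / test data
    return None
```

Note that a zero Unix timestamp (1970-01-01) is caught by the "Before 1990" rule even when it is not literally null.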

Examples

Example 1: Validating Legacy System Migration

Scenario: Your organization recently migrated invoice processing data from a 20-year-old legacy ERP system to a modern platform. Before performing process mining analysis, you want to verify that all date fields were correctly converted and no placeholder or default dates remain in the dataset.

Settings:

  • Title: "Invoice Data Migration Validation"
  • Description: "Check for date conversion issues from legacy system"

Output:

The calculator produces a table showing problematic date values grouped by attribute. Each row represents a specific attribute where outliers were found:

| Attribute Name      | Outlier Count | Example Outlier Value | Issue Type  |
|---------------------|---------------|-----------------------|-------------|
| Invoice_Date        | 847           | 1900-01-01            | Before 1990 |
| Payment_Due_Date    | 847           | 1900-01-01            | Before 1990 |
| Last_Modified_Date  | 23            | 2099-12-31            | After 2040  |
| Approval_Timestamp  | 156           | NULL                  | Zero/Null   |

Insights:

The output reveals significant data quality issues from the migration. The 847 invoices with dates of January 1, 1900 are clearly placeholder values from the legacy system that weren't properly converted - this date was commonly used as a default "empty" value in older systems. The 23 records with a 2099 date on Last_Modified_Date suggest these were test records that accidentally migrated to production. The 156 null Approval_Timestamp values indicate incomplete records that are missing critical process timing information.

Before performing any process analysis, you should:

  1. Work with the data team to correct or remove the 847 records with placeholder dates
  2. Filter out the 23 test records with 2099 dates
  3. Investigate why 156 invoices lack approval timestamps

This validation saved you from drawing incorrect conclusions about invoice processing times and approval patterns based on corrupted date data.
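The cleanup steps above can be sketched as a simple filter, assuming the event log is available as a list of record dicts with the field names from the example table (the cleanup itself would normally happen in your data pipeline, outside the calculator):

```python
from datetime import datetime

PLACEHOLDER = datetime(1900, 1, 1)   # legacy "empty" default date
TEST_CUTOFF = datetime(2040, 1, 1)   # far-future dates treated as test data

def clean_invoices(records):
    """Drop records with placeholder or far-future dates; flag missing approvals."""
    kept, to_investigate = [], []
    for rec in records:
        if rec.get("Invoice_Date") == PLACEHOLDER:
            continue                          # step 1: remove placeholder dates
        last_modified = rec.get("Last_Modified_Date")
        if last_modified is not None and last_modified >= TEST_CUTOFF:
            continue                          # step 2: remove test records
        if rec.get("Approval_Timestamp") is None:
            to_investigate.append(rec)        # step 3: collect for investigation
        kept.append(rec)
    return kept, to_investigate
```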

Example 2: Detecting System Clock Issues

Scenario: Users have reported that some timestamps in your order fulfillment process "don't make sense," with activities appearing to happen in the wrong order. You suspect there may be server clock synchronization issues or timezone conversion problems affecting event timestamps.

Settings:

  • Title: "Order Fulfillment Timestamp Validation"
  • Description: "Identify clock synchronization or timezone issues"

Output:

The calculator shows outliers in the activity timestamp fields:

| Attribute Name     | Outlier Count | Example Outlier Value | Issue Type |
|--------------------|---------------|-----------------------|------------|
| Activity_Timestamp | 1,247         | 2043-08-15 14:23:00   | After 2040 |
| Event_Start_Time   | 1,247         | 2043-08-15 14:23:00   | After 2040 |

Insights:

The 1,247 events all have timestamps in August 2043 - exactly 20 years in the future. This is a classic sign of a system clock error on one of your application servers or a time zone conversion bug that added decades instead of hours. The fact that both Activity_Timestamp and Event_Start_Time show identical outlier counts and values confirms these are the same events being captured by multiple fields.

Investigation reveals that a warehouse management system server had its clock incorrectly set after a maintenance window, and all events processed through that server for a 6-hour period received timestamps 20 years in the future. These 1,247 events represent critical order processing activities (picking, packing, shipping) that need to be corrected to restore proper process flow analysis.

Without this calculator, these timestamp errors would have caused your process maps to show activities completely out of sequence, making it impossible to accurately analyze order fulfillment performance for the affected time period.
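Once the affected events are isolated (for example by server and time window), a constant year offset like the one above can be repaired with a sketch like the following; the exact offset must of course be confirmed against the server's logs before any correction is applied:

```python
from datetime import datetime

FUTURE_CUTOFF = datetime(2040, 1, 1)
YEAR_OFFSET = 20  # assumed: the server clock was set exactly 20 years ahead

def fix_clock_skew(timestamps):
    """Shift far-future timestamps back by the known year offset."""
    fixed = []
    for ts in timestamps:
        if ts >= FUTURE_CUTOFF:
            # Note: replace() raises ValueError for Feb 29 when the target
            # year is not a leap year; handle that case if your data has it.
            ts = ts.replace(year=ts.year - YEAR_OFFSET)
        fixed.append(ts)
    return fixed
```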

Example 3: Pre-Analysis Data Quality Check

Scenario: You're about to perform a comprehensive process mining analysis of your purchase-to-pay process spanning three years of data. As a best practice, you run the Find Date Outliers calculator first to ensure your dataset is clean before investing time in detailed analysis.

Settings:

  • Title: "Purchase-to-Pay Data Quality Scan"
  • Description: "Pre-analysis validation check"

Output:

The calculator returns a table showing all attributes have valid date ranges with no outliers detected.

Result: No outliers found in any date attributes.

Insights:

This is the best possible outcome: a clean bill of health for your date data. The calculator examined every timestamp and date field across your three-year purchase-to-pay event log and found no values before 1990, no values after 2040, and no null or zero values. This gives you confidence to proceed with your process mining analysis knowing that:

  • No timestamps fall outside realistic business date ranges
  • No placeholder dates will distort your time-based metrics
  • No obvious test data contaminated your production dataset
  • No gross clock errors pushed events decades out of range during data collection

You can now proceed with much greater confidence in the temporal ordering of activities in process maps, the accuracy of duration calculations, and the reliability of time-based insights. Keep in mind that a clean scan verifies date ranges, not correctness: an in-range timestamp can still be wrong. Even so, this upfront validation saves countless hours of troubleshooting confusing results caused by grossly corrupt date data.

Example 4: Identifying Incomplete Data Entry

Scenario: Your customer service ticketing system allows support agents to manually enter certain dates, and you suspect that many tickets have missing or incomplete timestamp information that could affect your case resolution time analysis.

Settings:

  • Title: "Support Ticket Date Completeness Check"
  • Description: "Identify tickets with missing date information"

Output:

| Attribute Name      | Outlier Count | Example Outlier Value | Issue Type  |
|---------------------|---------------|-----------------------|-------------|
| First_Response_Date | 3,456         | NULL                  | Zero/Null   |
| Resolution_Date     | 892           | NULL                  | Zero/Null   |
| Escalation_Date     | 12,034        | NULL                  | Zero/Null   |
| Follow_Up_Date      | 8,721         | 1970-01-01            | Before 1990 |

Insights:

The analysis reveals significant data entry gaps. The high number of null values indicates that agents are not consistently recording critical dates:

  • 3,456 tickets with no First_Response_Date: These cases cannot be included in response time SLA analysis
  • 892 tickets with no Resolution_Date: It's impossible to calculate resolution time for these cases
  • 12,034 tickets with no Escalation_Date: This is actually acceptable - most tickets shouldn't be escalated, so null is expected here
  • 8,721 tickets with 1970-01-01 as Follow_Up_Date: This Unix epoch date (January 1, 1970) is a classic default value indicating the field was never properly set

The most concerning issue is the 3,456 tickets missing first response dates, as this represents 15% of your ticket volume and directly impacts your ability to measure customer service responsiveness. You should:

  1. Update your ticketing system to make First_Response_Date a required field
  2. Provide agent training on the importance of complete date entry
  3. Consider automated timestamp capture rather than manual entry where possible
  4. Filter out the 892 unresolved tickets from completed case analysis

This validation helped you understand that your case resolution metrics have been understated because they excluded tickets with missing data, giving management a falsely optimistic view of support team performance.
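The null and epoch-default patterns above are easy to screen for yourself. A minimal sketch, using the hypothetical field names from the example table:

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)  # classic Unix default indicating "never set"

def completeness_report(tickets, date_fields):
    """Count null and epoch-default values per date field."""
    report = {f: {"null": 0, "epoch_default": 0} for f in date_fields}
    for ticket in tickets:
        for field in date_fields:
            value = ticket.get(field)
            if value is None:
                report[field]["null"] += 1
            elif value == EPOCH:
                report[field]["epoch_default"] += 1
    return report
```

Whether a null is a problem depends on the field: a missing First_Response_Date breaks SLA analysis, while a missing Escalation_Date is expected for most tickets.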

Output

The calculator produces a data table that lists all date and timestamp attributes containing outlier values. The table is designed to help you quickly identify and prioritize data quality issues:

Attribute Name (Text): The name of the case or event attribute field that contains date outliers. This allows you to identify exactly which fields have problems.

Outlier Count (Number): The number of cases or events that have problematic date values in this attribute. Higher counts indicate more severe data quality issues requiring urgent attention.

Example Outlier Value (DateTime): A sample of one of the problematic date values found in the attribute, helping you understand the nature of the issue (e.g., "1900-01-01" suggests placeholder dates, while "2050-01-15" suggests clock errors).

Issue Type (Category): The type of outlier detected - "Before 1990", "After 2040", or "Zero/Null" - helping you understand whether the problem is placeholder dates, future dates, or missing values.
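A table with these four columns could be assembled from per-attribute value lists roughly as follows; this is a sketch of the output shape only, since the calculator's internals are not published:

```python
from datetime import datetime

MIN_DATE, MAX_DATE = datetime(1990, 1, 1), datetime(2040, 1, 1)

def issue_type(value):
    """Classify one value against the fixed thresholds (None = valid)."""
    if value is None:
        return "Zero/Null"
    if value < MIN_DATE:
        return "Before 1990"
    if value >= MAX_DATE:
        return "After 2040"
    return None

def outlier_table(attributes):
    """attributes: dict mapping attribute name -> list of datetimes (or None)."""
    rows = []
    for name, values in attributes.items():
        flagged = [v for v in values if issue_type(v) is not None]
        if flagged:  # attributes without outliers are omitted from the table
            rows.append({
                "Attribute Name": name,
                "Outlier Count": len(flagged),
                "Example Outlier Value": flagged[0],
                "Issue Type": issue_type(flagged[0]),
            })
    # Sorting by count mirrors the "prioritize by Outlier Count" guidance below
    rows.sort(key=lambda r: r["Outlier Count"], reverse=True)
    return rows
```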

Interactive Analysis:

The output table is fully interactive - you can:

  • Click on any row to drill down into the specific cases containing those outlier values
  • Sort by Outlier Count to prioritize which attributes need correction first
  • Filter the results to focus on specific types of issues
  • Export the outlier list to share with data quality teams

Best Practices:

  • Run this calculator as the first step in any new process mining project
  • Re-run after any data imports or system migrations
  • Address outliers before creating process maps or calculating performance metrics
  • Use the calculator regularly on ongoing data feeds to catch quality degradation early

Note: The calculator only examines attributes with date or timestamp data types; text fields containing dates are not analyzed. If no outliers are found, the calculator displays "No date outliers detected", meaning all date values fall within the expected ranges (it does not guarantee that in-range values are correct).


This documentation is part of the mindzie Studio process mining platform.
