Overview
The Dataset Information calculator provides a comprehensive statistical summary of your entire process dataset. It displays key metrics including time ranges, case and event counts, activity statistics, and data structure information in a single, easy-to-read overview.
This calculator requires no configuration and is ideal for quickly understanding the scope and characteristics of your process data.
Common Uses
- Understand the scope of a new dataset before beginning analysis
- Validate that data extraction captured the expected volume and time range
- Compare datasets by reviewing their statistical profiles side by side
- Monitor process volume trends by tracking case and event counts over time
- Verify data quality by checking case duration ranges and event distributions
- Generate dataset metadata for reports and presentations
Settings
This calculator has no settings beyond the standard title and description fields. It automatically analyzes the entire dataset (or, when filters are applied, the filtered data, as in Example 2) and displays all available metrics.
Examples
Example 1: Initial Process Discovery
Scenario: You have just imported a new purchase-to-pay dataset and want to understand its characteristics before starting your analysis.
Settings:
- Title: "Purchase-to-Pay Dataset Overview"
- Description: "Q4 2024 procurement data"
Output:
The calculator displays a comprehensive table with the following metrics:
- Start Dataset Time: 2024-10-01 00:00:00
- End Dataset Time: 2024-12-31 23:59:59
- Dataset Timespan: 92 days
- Min Case Time: 2 hours
- Max Case Time: 45 days
- Average Case Time: 8.5 days
- Median Case Time: 6.2 days
- Total Case Count: 1,847
- Total Activity Count: 14,776
- Average activities per case: 8.0
- Activities: 23 unique activities
- Case Columns: 15 attributes
- Activity Columns: 12 attributes
Insights: This dataset covers a full quarter and contains 1,847 purchase orders. The average case duration of 8.5 days is reasonable for a procurement process, but some cases take up to 45 days, which points to potential delays worth investigating. With an average of 8 activities per case drawn from 23 unique activities, the process shows moderate complexity and likely some variation in execution paths.
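The two derived figures in this example can be reproduced from the other values in the table. The following Python sketch assumes that Dataset Timespan is the difference between the end and start timestamps, rounded to whole days. The calculator's exact rounding rule is not documented here.

```python
from datetime import datetime

# Values reported in Example 1 (Purchase-to-Pay, Q4 2024)
start = datetime(2024, 10, 1, 0, 0, 0)
end = datetime(2024, 12, 31, 23, 59, 59)
total_cases = 1_847
total_events = 14_776

# Dataset Timespan: end minus start, rounded to whole days (assumed rounding rule)
timespan_days = round((end - start).total_seconds() / 86_400)

# Average activities per case: total events divided by the number of cases
avg_events_per_case = total_events / total_cases

print(timespan_days)                 # 92
print(f"{avg_events_per_case:.1f}")  # 8.0
```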
Example 2: Comparing Filtered vs. Unfiltered Data
Scenario: You want to understand how applying a time filter affects your dataset characteristics.
Settings:
- Create two Dataset Information calculators:
  - "Full Dataset Overview" (no filters)
  - "Last 30 Days Overview" (with a time period filter)
Output:
Full Dataset:
- Total Case Count: 1,847
- Dataset Timespan: 92 days
- Average Case Time: 8.5 days
Last 30 Days:
- Total Case Count: 623
- Dataset Timespan: 30 days
- Average Case Time: 9.2 days
Insights: The filtered view shows that about one-third of cases (623 of 1,847) fall within the most recent month. The average case duration also increased from 8.5 to 9.2 days in that period. This suggests that process performance may be declining and warrants further investigation.
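The comparison in this example comes down to two simple calculations:

- the share of all cases that fall in the filtered period
- the change in average case time

Here is a minimal Python sketch using the figures above. Because the calculator's averages are already rounded, the relative change is approximate.

```python
# Figures from the two calculators in Example 2
full = {"cases": 1_847, "avg_case_days": 8.5}
last_30 = {"cases": 623, "avg_case_days": 9.2}

# Share of all cases that fall within the last 30 days
share = last_30["cases"] / full["cases"]

# Absolute and relative change in average case time
change_days = last_30["avg_case_days"] - full["avg_case_days"]
relative_change = change_days / full["avg_case_days"]

print(f"{share:.1%}")                                       # 33.7%
print(f"{change_days:+.1f} days ({relative_change:+.1%})")  # +0.7 days (+8.2%)
```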
Example 3: Data Quality Validation
Scenario: After completing a data extraction, you need to verify that all expected data was captured correctly.
Settings:
- Title: "Data Quality Check"
- Description: "Validation of January 2025 extraction"
Output:
- Start Dataset Time: 2025-01-01 00:00:00
- End Dataset Time: 2025-01-31 23:59:59
- Total Case Count: 412
- Total Activity Count: 3,296
- Activities: 18 unique activities
Insights: The dataset spans the entire month of January 2025, as expected. The case count of 412 matches the expected monthly volume. All 18 standard activities are present in the data, confirming that the extraction captured every activity type. The average of 8.0 activities per case (3,296 events across 412 cases) is consistent with historical patterns.
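The checks in this example can be automated. The following Python sketch compares the calculator's output against expectations for an extraction. Everything specific here is hypothetical and should be replaced with your own values:

- the function name `validate_extraction`
- the expected case range
- the historical events-per-case average and its tolerance

```python
from datetime import date, datetime


def validate_extraction(start: datetime, end: datetime, case_count: int,
                        event_count: int, activity_count: int, *,
                        month_start: date, month_end: date,
                        expected_cases: range, expected_activities: int,
                        expected_events_per_case: float,
                        tolerance: float = 0.5) -> list[str]:
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper: all thresholds come from the caller, not from mindzie Studio.
    """
    problems = []
    # The first and last events should fall on the first and last day of the period
    if start.date() != month_start:
        problems.append(f"first event on {start.date()}, expected {month_start}")
    if end.date() != month_end:
        problems.append(f"last event on {end.date()}, expected {month_end}")
    # Case volume should be within the expected monthly range
    if case_count not in expected_cases:
        problems.append(f"{case_count} cases, expected "
                        f"{expected_cases.start}-{expected_cases.stop - 1}")
    # Every standard activity type should be present
    if activity_count != expected_activities:
        problems.append(f"{activity_count} activities, expected {expected_activities}")
    # Events per case should be close to the historical average (skip if no cases)
    if case_count:
        avg = event_count / case_count
        if abs(avg - expected_events_per_case) > tolerance:
            problems.append(f"{avg:.1f} events per case, "
                            f"expected about {expected_events_per_case}")
    return problems


# Values from Example 3; the expected ranges are hypothetical
issues = validate_extraction(
    datetime(2025, 1, 1, 0, 0, 0), datetime(2025, 1, 31, 23, 59, 59),
    case_count=412, event_count=3_296, activity_count=18,
    month_start=date(2025, 1, 1), month_end=date(2025, 1, 31),
    expected_cases=range(350, 451), expected_activities=18,
    expected_events_per_case=8.0,
)
print(issues or "all checks passed")  # all checks passed
```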
Example 4: Performance Baseline Documentation
Scenario: You need to document baseline metrics for your process before implementing improvement initiatives.
Settings:
- Title: "Pre-Improvement Baseline Metrics"
- Description: "Invoice processing baseline - January 2025"
Output:
- Total Case Count: 2,156
- Average Case Time: 12.3 days
- Median Case Time: 9.5 days
- Min Case Time: 4 hours
- Max Case Time: 67 days
- Average activities per case: 11.2
Insights: Current invoice processing averages 12.3 days with significant variation (4 hours to 67 days). The gap between average (12.3 days) and median (9.5 days) suggests that a subset of invoices with very long processing times is pulling up the average. These metrics establish a clear baseline for measuring improvement after implementing process changes.
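Python's standard `statistics` module can illustrate how a few very long cases raise the mean while barely moving the median. The durations below are hypothetical and are not taken from the invoice dataset.

```python
from statistics import mean, median

# Hypothetical case durations in days: most cases are short, one is very long
durations = [2, 3, 4, 5, 6, 7, 8, 9, 10, 60]

print(f"mean={mean(durations):.1f}, median={median(durations):.1f}")
# mean=11.4, median=6.5

# Removing the single outlier changes the mean far more than the median
without_outlier = [d for d in durations if d < 60]
print(f"mean={mean(without_outlier):.1f}, median={median(without_outlier):.1f}")
# mean=6.0, median=6.0
```

Comparing the Average Case Time with the Median Case Time, as this example does with 12.3 and 9.5 days, is a quick way to spot a long tail of slow cases.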
Output
The Dataset Information calculator displays a single table with two columns:
Name: The name of each metric
Value: The corresponding value for that metric
Metrics Included
Time Metrics:
- Start Dataset Time: The timestamp of the earliest event in the dataset
- End Dataset Time: The timestamp of the latest event in the dataset
- Dataset Timespan: The total time period covered by the dataset
Case Duration Metrics:
- Min Case Time: The shortest case duration in the dataset
- Max Case Time: The longest case duration in the dataset
- Average Case Time: The mean duration across all cases
- Median Case Time: The median (middle value) case duration
Volume Metrics:
- Total Case Count: The number of unique cases in the dataset
- Total Activity Count: The total number of events across all cases
- Average activities per case: The mean number of events per case
Structure Metrics:
- Activities: The number of unique activity types in the process
- Case Columns: The number of attributes at the case level
- Activity Columns: The number of attributes at the event level
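To make the definitions above concrete, the following Python sketch computes the same metrics from a small in-memory event log. It is not mindzie Studio's implementation and relies on these assumptions:

- The `Event` class and `dataset_information` function are hypothetical.
- Case duration is the time between a case's first and last events.
- Case Columns and Activity Columns count distinct attribute names.

The result is a name-to-value mapping that mirrors the calculator's two-column Name/Value table.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from statistics import mean, median


@dataclass
class Event:
    """One row of a hypothetical event log."""
    case_id: str
    activity: str
    timestamp: datetime
    attributes: dict = field(default_factory=dict)  # event-level columns


def dataset_information(events: list[Event],
                        case_attributes: dict[str, dict]) -> dict[str, object]:
    """Compute Dataset Information-style metrics (hypothetical sketch)."""
    # Group event timestamps by case
    timestamps_by_case: dict[str, list[datetime]] = defaultdict(list)
    for e in events:
        timestamps_by_case[e.case_id].append(e.timestamp)

    # Assumed case duration: last event minus first event of each case
    durations = [max(ts) - min(ts) for ts in timestamps_by_case.values()]
    seconds = [d.total_seconds() for d in durations]

    start = min(e.timestamp for e in events)
    end = max(e.timestamp for e in events)
    case_count = len(timestamps_by_case)

    return {
        # Time metrics
        "Start Dataset Time": start,
        "End Dataset Time": end,
        "Dataset Timespan": end - start,
        # Case duration metrics
        "Min Case Time": min(durations),
        "Max Case Time": max(durations),
        "Average Case Time": timedelta(seconds=mean(seconds)),
        "Median Case Time": timedelta(seconds=median(seconds)),
        # Volume metrics
        "Total Case Count": case_count,
        "Total Activity Count": len(events),
        "Average activities per case": len(events) / case_count,
        # Structure metrics (assumed: distinct attribute names)
        "Activities": len({e.activity for e in events}),
        "Case Columns": len({k for attrs in case_attributes.values() for k in attrs}),
        "Activity Columns": len({k for e in events for k in e.attributes}),
    }


# A tiny hypothetical log: two purchase orders
t0 = datetime(2025, 1, 1, 8, 0)
log = [
    Event("PO-1", "Create PO", t0, {"user": "ana"}),
    Event("PO-1", "Approve PO", t0 + timedelta(hours=2), {"user": "ben"}),
    Event("PO-2", "Create PO", t0 + timedelta(days=1), {"user": "ana", "amount": 120}),
    Event("PO-2", "Approve PO", t0 + timedelta(days=2)),
    Event("PO-2", "Pay Invoice", t0 + timedelta(days=4)),
]
cases = {"PO-1": {"vendor": "Acme"}, "PO-2": {"vendor": "Globex", "region": "EU"}}

for name, value in dataset_information(log, cases).items():
    print(f"{name}: {value}")
```

For this log the sketch reports:

- 2 cases and 5 events, or 2.5 activities per case
- a 4-day timespan
- an average case time of 1 day 13 hours, the mean of 2 hours and 3 days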
All time values are displayed in a readable format (e.g., "8.5 days" or "2 hours 30 minutes"). The output can be added to dashboards for ongoing monitoring or exported for documentation purposes.
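The exact formatting rules are not documented here. As an illustration only, here is a hypothetical Python formatter that reproduces the styles used on this page, such as "8.5 days", "92 days", and "2 hours 30 minutes". It assumes the following:

- Durations of a day or more are shown in days, rounded to one decimal place.
- Shorter durations are shown in whole hours and minutes.

```python
from datetime import timedelta


def format_duration(d: timedelta) -> str:
    """Hypothetical formatter; the calculator's real rules may differ."""
    # Assumption: a day or more is shown in days, rounded to one decimal place
    if d >= timedelta(days=1):
        days = round(d.total_seconds() / 86_400, 1)
        text = str(int(days)) if days.is_integer() else f"{days:.1f}"
        return f"{text} day{'' if days == 1 else 's'}"
    # Assumption: shorter durations are shown in whole hours and minutes
    hours, minutes = divmod(int(d.total_seconds() // 60), 60)
    parts = []
    if hours:
        parts.append(f"{hours} hour{'' if hours == 1 else 's'}")
    if minutes or not parts:
        parts.append(f"{minutes} minute{'' if minutes == 1 else 's'}")
    return " ".join(parts)


print(format_duration(timedelta(days=8, hours=12)))     # 8.5 days
print(format_duration(timedelta(days=92)))              # 92 days
print(format_duration(timedelta(hours=2, minutes=30)))  # 2 hours 30 minutes
```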
This documentation is part of the mindzie Studio process mining platform.