Overview
The Variant Frequency filter selects cases based on how frequently their process variant occurs within the dataset. It allows you to focus on either common patterns (high-frequency variants) or rare patterns (low-frequency variants) by setting minimum and maximum frequency thresholds as fractions of total cases. This case-level filter groups all cases by their process variant, counts how many cases follow each variant, and keeps only the cases of variants whose share of total cases falls within your specified range.
Process variants represent unique sequences of activities that cases follow through your process. By filtering based on variant frequency, you can isolate standard workflows, identify exceptional paths, or exclude statistically insignificant patterns from your analysis.
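As an illustration of what a variant is, the following minimal Python sketch derives variants from a small event log and counts the cases per variant. The event layout (case ID, timestamp, activity) and the function name variants_by_case are assumptions made for this example; they are not part of mindzieStudio.

```python
from collections import Counter, defaultdict

def variants_by_case(events):
    """Map each case ID to its variant: the ordered tuple of its activities.

    `events` is an iterable of (case_id, timestamp, activity) tuples.
    This layout is an assumption made for the example.
    """
    per_case = defaultdict(list)
    for case_id, timestamp, activity in events:
        per_case[case_id].append((timestamp, activity))
    # Sort each case's events by timestamp (ties broken by activity name),
    # then keep only the activity names.
    return {
        case_id: tuple(activity for _, activity in sorted(rows))
        for case_id, rows in per_case.items()
    }

# Hypothetical event log with three cases.
events = [
    ("c1", 1, "Create"), ("c1", 2, "Approve"), ("c1", 3, "Pay"),
    ("c2", 1, "Create"), ("c2", 2, "Approve"), ("c2", 3, "Pay"),
    ("c3", 1, "Create"), ("c3", 2, "Reject"),
]

case_variants = variants_by_case(events)
variant_counts = Counter(case_variants.values())
# Two cases follow Create > Approve > Pay; one case follows Create > Reject.
```

Here the first variant has a frequency of 2 out of 3 cases and the second 1 out of 3; these shares are what the filter compares against its thresholds.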
Common Uses
- Focus on Standard Processes: Filter to show only the most common variants to understand your typical process flows and standard operating procedures.
- Identify Exceptional Cases: Isolate rare variants that occur infrequently to detect unusual process paths, exceptions, or potential problems.
- Exclude Noise: Remove very rare variants, such as statistically insignificant outliers or one-off cases, from your analysis.
- Process Standardization: Analyze how many cases follow standardized variants versus non-standard paths.
- Compliance Analysis: Find cases that follow uncommon process paths which might indicate non-compliant behavior.
- Performance Optimization: Focus analysis on the most frequent variants where process improvements will have the greatest impact.
Settings
Minimum Percent: The minimum frequency threshold, expressed as a decimal fraction of total cases (0.0 to 1.0). Variants whose share of total cases is below this value are filtered out. Must be greater than or equal to 0 and less than or equal to Maximum Percent.
Maximum Percent: The maximum frequency threshold, expressed as a decimal fraction of total cases (0.0 to 1.0). Variants whose share of total cases is above this value are filtered out. Must be less than or equal to 1.0 and greater than or equal to Minimum Percent.
Note: Percentages are expressed as decimals. For example, use 0.1 for 10%, 0.05 for 5%, and 1.0 for 100%.
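The constraints on the two settings can be sketched as a small check. This is an illustrative Python sketch only; the function name validate_thresholds and the error messages are hypothetical, and the product's own validation may report problems differently.

```python
def validate_thresholds(min_percent, max_percent):
    """Raise ValueError unless 0.0 <= min_percent <= max_percent <= 1.0."""
    if not 0.0 <= min_percent <= 1.0:
        raise ValueError("Minimum Percent must be between 0.0 and 1.0")
    if not 0.0 <= max_percent <= 1.0:
        raise ValueError("Maximum Percent must be between 0.0 and 1.0")
    if min_percent > max_percent:
        raise ValueError("Minimum Percent must not exceed Maximum Percent")

validate_thresholds(0.05, 0.25)  # accepted: 5% to 25%

try:
    validate_thresholds(5, 25)   # rejected: whole-number percentages
except ValueError as error:
    print(error)
```

The second call shows the most likely mistake: entering 5 instead of 0.05 for 5%.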
Examples
Example 1: Analyzing Common Process Variants
Scenario: You want to focus your analysis on the most frequently occurring process paths to understand standard operations. You only want to see variants that occur in at least 10% of cases.
Settings:
- Minimum Percent: 0.1
- Maximum Percent: 1.0
Result: The filter keeps only cases whose variants occur in at least 10% of the total cases. If you have 1,000 cases total, only variants with at least 100 cases will be included.
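The arithmetic behind this result can be checked with a short Python sketch. The variant names and case counts below are hypothetical, chosen so that they total 1,000 cases.

```python
# Hypothetical case counts per variant; the five variants total 1,000 cases.
variant_counts = {"A": 600, "B": 250, "C": 100, "D": 40, "E": 10}
total_cases = sum(variant_counts.values())

min_percent = 0.1

# A variant qualifies when its share of all cases is at least min_percent.
kept = {
    variant: count
    for variant, count in variant_counts.items()
    if count / total_cases >= min_percent
}
kept_cases = sum(kept.values())

print(kept)        # {'A': 600, 'B': 250, 'C': 100}
print(kept_cases)  # 950
```

Variant C sits exactly on the 100-case threshold and is kept because the minimum is inclusive; variants D and E, with 40 and 10 cases, are removed.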
Insights: This helps you concentrate on the main process flows while filtering out less common variations. By focusing on high-frequency variants, you can identify your standard operating procedures and ensure process improvements target the workflows that affect the most cases.
Example 2: Finding Rare and Exceptional Variants
Scenario: You want to identify unusual process paths that only occur in a small percentage of cases to detect exceptions, errors, or non-standard workflows.
Settings:
- Minimum Percent: 0.0
- Maximum Percent: 0.05
Result: The filter keeps only cases whose variants occur in at most 5% of the total cases. If you have 1,000 cases, only variants with 50 or fewer cases will be included.
Insights: Rare variants often represent exceptions, errors, workarounds, or special handling procedures. Analyzing these cases can reveal:
- Process deviations that need investigation
- Compliance issues or unauthorized procedures
- System errors or data quality problems
- Opportunities to standardize exceptional handling
Example 3: Excluding Extreme Outliers
Scenario: You want to analyze mid-range variants while excluding both the most common patterns and the rarest one-off cases to focus on moderate variation in your process.
Settings:
- Minimum Percent: 0.05
- Maximum Percent: 0.25
Result: The filter keeps only cases whose variants occur in between 5% and 25% of the total cases, inclusive. If you have 1,000 cases, only variants with 50 to 250 cases will be included. This excludes both the dominant variants and the rarest variants.
Insights: This range helps you understand process variation in your middle tier of cases. These variants are significant enough to matter statistically but aren't part of your standard process, revealing:
- Secondary standard processes
- Seasonal or conditional workflows
- Process alternatives that occur regularly but not dominantly
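A short Python sketch of this range check, using hypothetical variant counts that total 1,000 cases, shows that both ends of the range are inclusive:

```python
# Hypothetical case counts per variant; the five variants total 1,000 cases.
variant_counts = {"A": 600, "B": 250, "C": 90, "D": 50, "E": 10}
total_cases = sum(variant_counts.values())

min_percent, max_percent = 0.05, 0.25

def in_range(count):
    """True when the variant's share of all cases lies within the range."""
    share = count / total_cases
    return min_percent <= share <= max_percent

kept = [variant for variant, count in variant_counts.items() if in_range(count)]
print(kept)  # ['B', 'C', 'D']
```

Variant B (exactly 25%) and variant D (exactly 5%) are kept because they sit on the boundaries, while the dominant variant A and the rare variant E fall outside the range.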
Example 4: Isolating High-Frequency Variants for Process Optimization
Scenario: You want to optimize your process by focusing on variants that each account for at least 20% of all cases, ensuring your improvements affect a significant portion of your workload.
Settings:
- Minimum Percent: 0.2
- Maximum Percent: 1.0
Result: The filter keeps only cases whose variants occur in at least 20% of all cases. With 1,000 total cases, only variants with 200 or more cases are included.
Insights: By focusing on high-frequency variants, you ensure that:
- Process improvements will have maximum impact
- Analysis rests on a large number of cases rather than a handful of outliers
- Resources are directed toward the most common workflows
- Standardization efforts target the right processes
Output
The filter returns a new dataset containing only the cases whose process variants have frequencies within the specified range. Each case preserves all its original events and attributes. The variant structure and sequence of activities remain unchanged; only cases from qualifying variants are retained.
If no variants fall within the specified frequency range, the filter returns an empty result set.
Technical Notes
- Filter Type: Case-level filter (removes entire cases, not individual events)
- Grouping Logic: Groups cases by variant, counts occurrences, then applies frequency thresholds
- Frequency Calculation: Converts percentage thresholds to absolute counts based on total case count
- Range Inclusivity: Both minimum and maximum thresholds are inclusive
- Validation: Ensures Minimum Percent is not greater than Maximum Percent and both are within valid ranges (0.0 to 1.0)
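Taken together, these notes can be sketched end to end in Python. This is a minimal illustration, not the mindzieStudio implementation: the function name filter_by_variant_frequency and the input shape (a mapping from case ID to variant) are assumptions. The notes do not say how fractional counts are rounded when thresholds are converted to absolute counts, so this sketch compares each variant's share of total cases against the thresholds directly.

```python
from collections import Counter

def filter_by_variant_frequency(case_variants, min_percent, max_percent):
    """Return the case IDs whose variant's share of all cases lies in
    [min_percent, max_percent], both ends inclusive.

    `case_variants` maps case ID -> variant (any hashable value, such as
    a tuple of activity names). This input shape is an assumption.
    """
    if not 0.0 <= min_percent <= max_percent <= 1.0:
        raise ValueError("expected 0.0 <= min_percent <= max_percent <= 1.0")
    total = len(case_variants)
    if total == 0:
        return []
    # Count how many cases follow each variant.
    counts = Counter(case_variants.values())
    # Case-level filter: keep or drop whole cases based on their variant.
    return [
        case_id
        for case_id, variant in case_variants.items()
        if min_percent <= counts[variant] / total <= max_percent
    ]

# Hypothetical data: 10 cases spread over three variants (6, 3, and 1 cases).
case_variants = {f"c{i}": ("A", "B") for i in range(6)}
case_variants.update({f"c{i}": ("A", "C", "B") for i in range(6, 9)})
case_variants["c9"] = ("A", "D")

common = filter_by_variant_frequency(case_variants, 0.5, 1.0)  # the 6 ("A", "B") cases
rare = filter_by_variant_frequency(case_variants, 0.0, 0.1)    # ['c9']
empty = filter_by_variant_frequency(case_variants, 0.7, 1.0)   # []
```

The last call returns an empty list because no variant reaches 70% of cases, which corresponds to the empty result set described under Output.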
This documentation is part of the mindzieStudio process mining platform.