Value Frequency

Overview

The Value Frequency filter selects cases based on how frequently their attribute values appear across the entire dataset. This case-level filter groups cases by their values in a specified attribute, counts how often each value occurs, and includes or excludes cases based on whether the frequency meets your specified threshold. You can set thresholds using either absolute counts (e.g., "at least 5 occurrences") or percentages (e.g., "in at least 20% of cases").

This filter is particularly useful for identifying common patterns, detecting rare outliers, focusing on high-volume categories, or filtering out infrequent edge cases that may skew analysis results.

Common Uses

Focus on Major Categories: Keep only cases whose attribute values appear often enough to support meaningful analysis, filtering out rarely occurring values.
Outlier Detection: Identify unusual or rare cases by filtering for attribute values that appear infrequently in the dataset.
Data Quality Analysis: Find potentially problematic data by identifying values that appear exactly once, which may indicate data entry errors such as misspelled or mistyped values.
High-Volume Analysis: Concentrate analysis on the most common regions, products, or customer segments by filtering for frequently occurring values.
Noise Reduction: Remove edge cases and low-frequency variants that add complexity without adding meaningful insights.
Pattern Recognition: Discover systematic issues by identifying values that appear with specific frequencies (e.g., exactly twice, suggesting systematic duplication).

Settings

Column Name: Select the attribute to analyze for value frequency. The filter supports integer and text attributes. Hidden columns and case ID columns are not available.

Compare Method: Choose how to compare the frequency against your threshold:

Equal: Keep cases where values appear exactly the specified number of times
Greater Than: Keep cases where values appear more times than the threshold
Greater Than or Equal: Keep cases where values appear at least the specified number of times
Less Than: Keep cases where values appear fewer times than the threshold
Less Than or Equal: Keep cases where values appear no more times than the threshold
Not Equal: Keep cases where values do not appear exactly the specified number of times

Threshold Type: Specify whether the threshold represents:

Count: An absolute number of occurrences
Percent: A fraction of the total number of cases, entered as a decimal (0.0 to 1.0)

Compare Threshold: Enter the numeric threshold value. For Count mode, this is the number of occurrences. For Percent mode, enter a decimal (e.g., 0.4 for 40%).
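The way these three settings combine can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the product's implementation: the function name frequency_passes, the COMPARE_METHODS table, and the use of plain strings for the setting labels are all hypothetical. The Percent conversion (threshold multiplied by the total number of cases) follows the description in Technical Notes.

```python
import operator

# Hypothetical mapping from the Compare Method labels to Python comparisons.
COMPARE_METHODS = {
    "Equal": operator.eq,
    "Greater Than": operator.gt,
    "Greater Than or Equal": operator.ge,
    "Less Than": operator.lt,
    "Less Than or Equal": operator.le,
    "Not Equal": operator.ne,
}


def frequency_passes(count, total_cases, method, threshold_type, threshold):
    """Return True if a value that occurs `count` times meets the threshold.

    Hypothetical helper: the names and signature are assumptions made for
    illustration only.
    """
    if threshold_type == "Percent":
        # Percent thresholds are decimals between 0.0 and 1.0 and are
        # converted to an absolute count before the comparison.
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("Percent threshold must be between 0.0 and 1.0")
        limit = threshold * total_cases
    elif threshold_type == "Count":
        limit = threshold
    else:
        raise ValueError(f"Unknown threshold type: {threshold_type}")
    return COMPARE_METHODS[method](count, limit)


# A value seen in 120 of 1,000 cases meets "at least 10%".
print(frequency_passes(120, 1000, "Greater Than or Equal", "Percent", 0.1))  # True
# A value seen once meets "exactly 1 occurrence".
print(frequency_passes(1, 1000, "Equal", "Count", 1.0))  # True
```

Because the comparison is made against a converted count, a Percent threshold that does not land on a whole number (for example 0.1 on 15 cases) behaves like a fractional count limit in this sketch.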

Examples

Example 1: Focus on Major Regions

Scenario: Your process data includes cases from 15 different regions, but you want to focus analysis only on regions that represent significant volume. You decide to keep only regions that appear in at least 10% of all cases.

Settings:

Column Name: Region
Compare Method: Greater Than or Equal
Threshold Type: Percent
Compare Threshold: 0.1

Result: The filter keeps only cases from regions that appear in 10% or more of the dataset. If you have 1,000 cases, this means regions with at least 100 cases are included, while smaller regions are filtered out.

Insights: This focuses your analysis on the major regions while eliminating noise from small regional offices with minimal activity, making patterns and trends easier to identify.

Example 2: Identify Unique Cases

Scenario: You suspect some cases have unique attribute values that may indicate data quality issues or special handling. You want to find all cases where the value appears exactly once in the entire dataset.

Settings:

Column Name: Customer ID
Compare Method: Equal
Threshold Type: Count
Compare Threshold: 1.0

Result: The filter returns only cases where the Customer ID appears exactly once across all cases.

Insights: These unique customers may represent:

One-time customers who never returned
Potential data entry errors, such as mistyped customer IDs
Test cases that should be removed
VIP customers requiring special attention

Example 3: Find High-Frequency Products

Scenario: You want to analyze only your best-selling products that appear in at least 50 cases to understand successful product patterns.

Settings:

Column Name: Product Name
Compare Method: Greater Than or Equal
Threshold Type: Count
Compare Threshold: 50.0

Result: The filter keeps cases whose Product Name value appears in at least 50 cases in the dataset.

Insights: By focusing on high-volume products, you can identify patterns in successful product processing, common bottlenecks, and optimization opportunities that will have the greatest business impact.

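Example 4: Isolate Rare Process Variants

Scenario: Your process has many rare variants that make the process map cluttered. Before deciding how to handle them, you want to isolate the cases where the starting activity is uncommon (appears in less than 5% of cases). To remove these cases instead, use Greater Than or Equal with the same threshold.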

Settings:

Column Name: _calcStartActivity
Compare Method: Less Than
Threshold Type: Percent
Compare Threshold: 0.05

Result: The filter keeps only cases where the starting activity appears in less than 5% of all cases, effectively selecting the rare variants.
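The relationship between selecting and excluding the rare variants can be illustrated with a small Python sketch. The data and variable names below are made up for illustration. The point is that Less Than and Greater Than or Equal, applied with the same threshold, split the dataset into two non-overlapping parts.

```python
from collections import Counter

# Hypothetical start activities for 100 cases (illustrative data only).
start_activities = ["Create Order"] * 97 + ["Manual Entry"] * 2 + ["Import"] * 1

counts = Counter(start_activities)
limit = 0.05 * len(start_activities)  # 5% of 100 cases

# "Less Than" keeps the rare start activities ...
rare = [a for a in start_activities if counts[a] < limit]
# ... and "Greater Than or Equal" with the same threshold keeps the rest.
common = [a for a in start_activities if counts[a] >= limit]

print(len(rare), len(common))  # 3 97
```

Every case lands in exactly one of the two lists, so switching the Compare Method is enough to move between investigating the rare variants and removing them from the process map.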

Insights: This helps identify unusual process entry points that may indicate exceptions, errors, or non-standard workflows requiring investigation.

Example 5: Detect Potential Duplicates

Scenario: You want to identify potentially duplicated cases by finding attribute values that appear exactly twice, which might indicate systematic duplication issues.

Settings:

Column Name: Order Number
Compare Method: Equal
Threshold Type: Count
Compare Threshold: 2.0

Result: The filter returns cases where the Order Number appears exactly twice in the dataset.

Insights: These pairs of cases may represent:

System errors causing duplicate order creation
Split shipments for the same order
Order amendments or revisions
Data integration issues from multiple systems

Example 6: Exclude Low-Frequency Outliers

Scenario: You want to clean your dataset by removing cases from categories that represent less than 2% of the total volume, as these are likely edge cases.

Settings:

Column Name: Department
Compare Method: Greater Than or Equal
Threshold Type: Percent
Compare Threshold: 0.02

Result: The filter keeps only cases from departments that handle at least 2% of all cases.

Insights: This creates a cleaner dataset focused on the core business operations while filtering out small departments or test departments that may not represent typical process behavior.

Output

The filter returns a new dataset containing only cases that meet the specified frequency criteria for the selected attribute. All cases with the same attribute value are treated as a group - either the entire group is included, or the entire group is excluded, based on how many cases share that value.

For example, if "Region A" appears in 100 cases and meets your threshold, all 100 cases with "Region A" are included. The filter preserves all events and attributes for the included cases.
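The group-wise behavior described above can be sketched in Python as follows. This is a minimal illustration, assuming a hypothetical in-memory representation in which each case is a dictionary with "attributes" and "events" keys; the function name filter_by_value_frequency and the keep_group callback are likewise assumptions, not part of the product.

```python
from collections import Counter


def filter_by_value_frequency(cases, column, keep_group):
    """Keep every case whose value in `column` belongs to a group that passes.

    Hypothetical sketch: `keep_group` receives the number of cases sharing
    a value and returns True if that whole group should be kept.
    """
    counts = Counter(case["attributes"].get(column) for case in cases)
    # Each case is returned unchanged, so its events and attributes survive.
    return [
        case for case in cases
        if keep_group(counts[case["attributes"].get(column)])
    ]


cases = [
    {"id": "c1", "attributes": {"Region": "A"}, "events": ["Create", "Ship"]},
    {"id": "c2", "attributes": {"Region": "A"}, "events": ["Create", "Cancel"]},
    {"id": "c3", "attributes": {"Region": "B"}, "events": ["Create"]},
]

# Keep groups whose value appears in at least 2 cases.
kept = filter_by_value_frequency(cases, "Region", lambda n: n >= 2)
print([case["id"] for case in kept])  # ['c1', 'c2']
```

Both "Region A" cases are kept together and the single "Region B" case is dropped, since the decision is made once per value rather than once per case.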

Technical Notes

Filter Type: Case-level filter (removes entire cases based on attribute value frequency)
Grouping Logic: All cases are grouped by their values in the specified attribute, and each group's frequency is compared against the threshold
Null Handling: Null values are treated as a distinct group and counted like any other value
Supported Data Types: Integer (Int32, Int64) and text (String) attributes
Threshold Conversion: When using Percent mode, the percentage is automatically converted to an absolute count by multiplying by the total number of cases
Validation: The filter suggests similar column names if you misspell the attribute name
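The null handling rule can be illustrated with a short Python sketch, where None stands in for a null value. The data is invented for illustration, and this is an assumption about how the rule plays out rather than a description of the product's internals.

```python
from collections import Counter

# Hypothetical attribute values for six cases; None stands in for null.
customer_ids = ["C-1", None, "C-2", None, None, "C-1"]

# None is counted as its own group, like any other value.
counts = Counter(customer_ids)
print(counts[None])  # 3

# With Compare Method "Equal", Threshold Type "Count", and threshold 3,
# only the cases with a null value would be kept in this sketch.
kept = [value for value in customer_ids if counts[value] == 3]
print(kept)  # [None, None, None]
```

One practical consequence is that a column with many missing values can have null as its most frequent "value", so a high-frequency filter may keep mostly cases where the attribute is empty.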

This documentation is part of the mindzieStudio process mining platform.