Overview
The Filter Process Log enrichment is a powerful data cleanup operator that permanently removes unwanted cases and events from your process dataset based on specified filter criteria. Unlike temporary filtering that only hides data during analysis, this enrichment physically removes the filtered data from the log, creating a smaller, more focused dataset. This permanent filtering is essential for data quality management, privacy compliance, and performance optimization in process mining projects.
This enrichment operates at the most fundamental level of process mining by modifying the actual event log structure. When you apply filters through this enrichment, it evaluates each case against your defined criteria and removes all cases (and their associated events) that don't meet the requirements. The result is a streamlined dataset that contains only the relevant process instances, making all subsequent analyses faster and more accurate. This is particularly valuable when working with large datasets where irrelevant data can obscure important patterns or when you need to create specialized views of your process for different stakeholder groups.
The Filter Process Log enrichment is unique in its permanent nature - once executed, the filtered data is removed from the working dataset. This makes it ideal for creating production-ready datasets, removing test data, eliminating outliers, or focusing on specific time periods or business segments. The enrichment leverages the same powerful filtering engine used throughout mindzieStudio, allowing you to combine multiple filter conditions with complex logic to precisely define which data to retain.
Common Uses
- Remove test cases and dummy data before production analysis
- Extract specific time periods for period-over-period comparisons
- Eliminate incomplete cases that would skew process metrics
- Create department or region-specific datasets from enterprise-wide logs
- Remove outliers and anomalies that distort standard process patterns
- Ensure data privacy by filtering out sensitive case categories
- Optimize performance by reducing dataset size for complex analyses
Settings
Filter List: The core configuration component that defines which cases to keep or remove from the process log. Access the filter configuration through the three-dot menu, where you can add multiple filter conditions. Each filter can target different aspects of your data - case attributes, event attributes, timestamps, or activity names. Filters can be combined using AND/OR logic to create sophisticated selection criteria. The filter interface provides a visual builder that helps you construct complex filter logic without writing code. Common filter types include:
- Attribute filters: Based on case or event attribute values
- Time filters: Select specific date ranges or time periods
- Activity filters: Include or exclude cases containing certain activities
- Performance filters: Based on duration, throughput, or other metrics
- Conformance filters: Cases matching or violating process rules
The filter list supports saving and loading filter configurations, allowing you to reuse common filtering patterns across different datasets or projects.
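Below is a minimal Python/pandas sketch of how such keep-conditions combine. It illustrates the logic only and is not the mindzieStudio filter builder or its API; the cases table and its column values are hypothetical.

```python
import pandas as pd

# Hypothetical case-level table: one row per case with its attributes.
cases = pd.DataFrame({
    "Case_ID":      ["C1", "C2", "C3", "C4"],
    "Region":       ["Europe", "APAC", "Europe", "Europe"],
    "Order_Value":  [12000, 80000, 55000, 9000],
    "Created_Date": pd.to_datetime(["2024-02-01", "2023-11-15",
                                    "2024-06-30", "2024-03-12"]),
})

# Each filter condition becomes a boolean mask over the case table.
attribute_filter = cases["Region"] == "Europe"            # attribute filter
value_filter     = cases["Order_Value"] > 50000            # performance-style filter
time_filter      = cases["Created_Date"] >= "2024-01-01"   # time filter

# AND/OR logic combines the masks; parentheses control precedence.
keep = (attribute_filter | value_filter) & time_filter

retained_cases = cases[keep]     # cases that survive the filter
removed_cases  = cases[~keep]    # cases the enrichment would drop
print(len(retained_cases), "kept,", len(removed_cases), "removed")
```

A case survives only when the combined expression evaluates to true; everything else is removed, along with all of its events.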
Examples
Example 1: Remove Test Data from Production Dataset
Scenario: An SAP implementation contains test transactions marked with specific prefixes that need to be removed before analyzing real business processes. The test data was created during system validation and would distort KPIs if included in the analysis.
Settings:
- Filter List Configuration:
- Filter 1: Order_Number NOT STARTS WITH "TEST"
- Filter 2: Customer_Name NOT EQUALS "Dummy Customer"
- Filter 3: Created_Date AFTER "2024-01-01"
- Logic: Filter 1 AND Filter 2 AND Filter 3
Output: Because the keep-logic combines all three filters with AND, the enrichment removes every case that matches any of the following:
- The order number begins with "TEST" (e.g., "TEST_001", "TEST_PO_2024")
- The customer name is exactly "Dummy Customer"
- The case was created before January 1, 2024
Original dataset: 150,000 cases with 2.3 million events
Filtered dataset: 142,000 cases with 2.18 million events
Removed: 8,000 test cases and their associated 120,000 events
Insights: The cleaned dataset now accurately represents actual business operations, improving the reliability of process metrics and conformance analysis. Performance calculations, cycle times, and bottleneck analyses now reflect real operational challenges rather than artificial test scenarios.
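To make the keep/remove semantics of this configuration concrete, here is a small pandas sketch of the same three conditions. It is illustrative only: the sample rows are invented, and the real enrichment is configured visually rather than in code.

```python
import pandas as pd

# Hypothetical case table with the attributes used in this example.
cases = pd.DataFrame({
    "Case_ID":       ["P1", "P2", "P3", "P4"],
    "Order_Number":  ["4500012345", "TEST_001", "4500012346", "TEST_PO_2024"],
    "Customer_Name": ["ACME GmbH", "Dummy Customer", "Contoso Ltd", "ACME GmbH"],
    "Created_Date":  pd.to_datetime(["2024-02-03", "2024-01-10",
                                     "2023-12-28", "2024-03-01"]),
})

# Keep-conditions from the filter list (all three must hold).
keep = (
    ~cases["Order_Number"].str.startswith("TEST")     # NOT STARTS WITH "TEST"
    & (cases["Customer_Name"] != "Dummy Customer")    # NOT EQUALS "Dummy Customer"
    & (cases["Created_Date"] > "2024-01-01")          # AFTER 2024-01-01
)

production_cases = cases[keep]
# Because the keep-logic is an AND, a case is removed as soon as any single
# condition fails (test prefix, dummy customer, or pre-2024 creation date).
```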
Example 2: Extract High-Value Purchase Orders
Scenario: In a procurement process spanning multiple categories, management wants to focus exclusively on high-value purchase orders above $50,000 to optimize approval workflows and identify cost-saving opportunities.
Settings:
- Filter List Configuration:
- Filter 1: Total_Order_Value GREATER THAN 50000
- Filter 2: Order_Status NOT EQUALS "Cancelled"
- Filter 3: Order_Type IN ["Standard PO", "Contract PO", "Planned PO"]
- Logic: Filter 1 AND Filter 2 AND Filter 3
Output: Creates a focused dataset containing only:
- Purchase orders with total value exceeding $50,000
- Active orders (excluding cancelled ones)
- Standard business order types (excluding emergency or spot purchases)
Before filtering: 45,000 total purchase orders
After filtering: 3,200 high-value orders representing 72% of total spend
Events reduced from 890,000 to 95,000
Insights: The filtered dataset reveals that high-value orders follow different approval patterns, have longer cycle times, and involve more stakeholders. This focused view enables targeted process optimization for the orders with the greatest financial impact.
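The same selection could be sketched in pandas as follows. The data is invented, and the spend-share check at the end simply mirrors the kind of figure quoted above; it is not something the enrichment computes for you.

```python
import pandas as pd

# Hypothetical purchase-order case table.
orders = pd.DataFrame({
    "Case_ID":           ["PO1", "PO2", "PO3", "PO4"],
    "Total_Order_Value": [120000, 8000, 75000, 60000],
    "Order_Status":      ["Open", "Open", "Cancelled", "Closed"],
    "Order_Type":        ["Standard PO", "Spot Buy", "Contract PO", "Planned PO"],
})

keep = (
    (orders["Total_Order_Value"] > 50000)              # GREATER THAN 50000
    & (orders["Order_Status"] != "Cancelled")          # NOT EQUALS "Cancelled"
    & orders["Order_Type"].isin(                       # IN [...]
        ["Standard PO", "Contract PO", "Planned PO"])
)

high_value = orders[keep]

# Sanity check: how much of total spend the retained orders represent.
spend_share = (high_value["Total_Order_Value"].sum()
               / orders["Total_Order_Value"].sum())
print(f"{len(high_value)} orders kept, {spend_share:.0%} of total spend retained")
```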
Example 3: Create Region-Specific Dataset
Scenario: A multinational corporation needs to create separate process analyses for European operations due to GDPR compliance requirements and regional process variations.
Settings:
- Filter List Configuration:
- Filter 1: Region EQUALS "Europe"
- Filter 2: Country IN ["Germany", "France", "Italy", "Spain", "Netherlands", "Belgium"]
- Filter 3: Process_Start_Date BETWEEN "2024-01-01" AND "2024-12-31"
- Logic: (Filter 1 OR Filter 2) AND Filter 3
Output: Extracts all European cases for the 2024 calendar year:
- Original global dataset: 500,000 cases across 35 countries
- Filtered European dataset: 185,000 cases from 6 countries
- Events reduced from 8.5 million to 3.1 million
- All non-European data permanently removed from the working dataset
Insights: The region-specific dataset enables compliance with local data regulations, reveals European-specific process patterns, and provides a manageable dataset size for detailed regional analysis and optimization initiatives.
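A pandas sketch of this composition shows why the parentheses matter (sample rows are invented):

```python
import pandas as pd

# Hypothetical global case table.
cases = pd.DataFrame({
    "Case_ID":            ["A", "B", "C", "D"],
    "Region":             ["Europe", "Americas", "Americas", "Europe"],
    "Country":            ["Poland", "Germany", "Brazil", "France"],
    "Process_Start_Date": pd.to_datetime(["2024-05-01", "2024-07-14",
                                          "2024-02-02", "2023-10-09"]),
})

region_filter  = cases["Region"] == "Europe"
country_filter = cases["Country"].isin(
    ["Germany", "France", "Italy", "Spain", "Netherlands", "Belgium"])
date_filter    = cases["Process_Start_Date"].between("2024-01-01", "2024-12-31")

# Parentheses mirror the configured logic: (Filter 1 OR Filter 2) AND Filter 3.
keep = (region_filter | country_filter) & date_filter
european_2024 = cases[keep]
```

Note that the OR clause also retains case B, which carries a European country even though its Region attribute reads "Americas". Whether that is intended is exactly the kind of question the preview step in the Important Considerations section should answer before the enrichment is executed.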
Example 4: Focus on Completed Healthcare Episodes
Scenario: A hospital wants to analyze only fully completed patient treatment episodes, excluding ongoing treatments and administrative-only visits, to accurately measure treatment effectiveness and resource utilization.
Settings:
- Filter List Configuration:
- Filter 1: Episode_Status EQUALS "Completed"
- Filter 2: Treatment_Type NOT EQUALS "Administrative"
- Filter 3: Has_Clinical_Outcome EQUALS "Yes"
- Filter 4: Duration_Days BETWEEN 1 AND 365
- Logic: Filter 1 AND Filter 2 AND Filter 3 AND Filter 4
Output: Filtered dataset includes only:
- Completed treatment episodes with documented outcomes
- Clinical treatments (excluding administrative visits)
- Realistic duration range (1-365 days)
Original dataset: 120,000 patient episodes
Filtered dataset: 78,000 completed clinical episodes
Removed: 42,000 incomplete, administrative, or outlier cases
Insights: The cleaned dataset provides accurate metrics for treatment duration, resource usage, and clinical pathways without the noise of incomplete data, enabling reliable quality metrics and process improvement initiatives.
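If Duration_Days is not already stored as a case attribute, it could be derived from the event timestamps before filtering. The sketch below shows one way to do that in pandas; the event rows are hypothetical and this derivation is not part of the Filter Process Log enrichment itself.

```python
import pandas as pd

# Hypothetical event table: one row per event, linked to a case by Case_ID.
events = pd.DataFrame({
    "Case_ID":   ["E1", "E1", "E2", "E2", "E2", "E3"],
    "Activity":  ["Admission", "Discharge", "Admission",
                  "Surgery", "Discharge", "Registration"],
    "Timestamp": pd.to_datetime(["2024-03-01", "2024-03-04", "2024-01-10",
                                 "2024-01-12", "2024-02-20", "2024-04-05"]),
})

# Derive the episode duration per case from first/last event timestamps.
span = events.groupby("Case_ID")["Timestamp"].agg(["min", "max"])
duration_days = (span["max"] - span["min"]).dt.days

# Keep only realistic durations (1-365 days), as in Filter 4 above.
# E3 has a single event (0 days) and would be dropped.
realistic = duration_days.between(1, 365)
kept_case_ids = duration_days[realistic].index
clinical_events = events[events["Case_ID"].isin(kept_case_ids)]
```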
Example 5: Eliminate Outliers for Standard Process Analysis
Scenario: A manufacturing company wants to analyze their standard production process by removing extreme outliers that represent equipment failures or exceptional circumstances, focusing on the typical 95% of cases.
Settings:
- Filter List Configuration:
- Filter 1: Cycle_Time_Hours BETWEEN 2 AND 48
- Filter 2: Number_of_Rework_Loops LESS THAN 3
- Filter 3: Production_Status NOT IN ["Emergency", "Experimental", "Failed"]
- Filter 4: Defect_Rate LESS THAN 0.05
- Logic: Filter 1 AND Filter 2 AND Filter 3 AND Filter 4
Output: Removes outlier cases:
- Cases with extreme cycle times (< 2 hours or > 48 hours)
- Excessive rework (3+ loops)
- Non-standard production runs
- High defect rates (> 5%)
Before: 25,000 production runs with high variance
After: 23,750 standard production runs
Removed: 1,250 outlier cases (5% of total)
Insights: The filtered dataset represents normal operating conditions, enabling accurate baseline metrics, realistic improvement targets, and identification of standard process variations versus exceptional events.
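For completeness, the numeric range and NOT IN conditions of this example could be sketched in pandas as follows (invented data; the removal-rate print is just a sanity check, analogous to reviewing the dataset statistics after running the enrichment):

```python
import pandas as pd

# Hypothetical production-run case table.
runs = pd.DataFrame({
    "Case_ID":                ["R1", "R2", "R3", "R4"],
    "Cycle_Time_Hours":       [12.5, 96.0, 30.0, 6.0],
    "Number_of_Rework_Loops": [0, 1, 4, 1],
    "Production_Status":      ["Standard", "Standard", "Standard", "Experimental"],
    "Defect_Rate":            [0.01, 0.02, 0.03, 0.00],
})

keep = (
    runs["Cycle_Time_Hours"].between(2, 48)           # BETWEEN 2 AND 48
    & (runs["Number_of_Rework_Loops"] < 3)            # LESS THAN 3
    & ~runs["Production_Status"].isin(                # NOT IN [...]
        ["Emergency", "Experimental", "Failed"])
    & (runs["Defect_Rate"] < 0.05)                    # LESS THAN 0.05
)

standard_runs = runs[keep]
removal_rate = 1 - len(standard_runs) / len(runs)
print(f"Removed {removal_rate:.0%} of runs as outliers")
```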
Output
The Filter Process Log enrichment produces a permanently modified dataset with the following characteristics:
Modified Process Log: The enrichment returns a new SuperLog object containing only the cases that meet your filter criteria. All filtered cases and their associated events are permanently removed from the working dataset. This is an irreversible operation within the current analysis session.
Case Count Reduction: The number of cases in your dataset will decrease based on the filter criteria. You can monitor this reduction in the dataset statistics to ensure the filtering achieved the expected results.
Event Count Impact: When cases are removed, all events belonging to those cases are also removed. This can significantly reduce the total event count, especially for cases with many events.
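To make the cascade concrete, the following sketch shows how dropping cases also drops every event that belongs to them. It uses small hypothetical pandas tables, not the SuperLog API.

```python
import pandas as pd

cases  = pd.DataFrame({"Case_ID": ["C1", "C2", "C3"],
                       "Keep":    [True, False, True]})  # result of the filter logic
events = pd.DataFrame({"Case_ID":  ["C1", "C1", "C2", "C2", "C2", "C3"],
                       "Activity": ["A", "B", "A", "B", "C", "A"]})

kept_ids    = cases.loc[cases["Keep"], "Case_ID"]
kept_events = events[events["Case_ID"].isin(kept_ids)]

# The event reduction follows directly from the removed cases.
print(f"Cases:  {len(cases)} -> {cases['Keep'].sum()}")
print(f"Events: {len(events)} -> {len(kept_events)}")
```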
Preserved Data Structure: All existing attributes, both at the case and event level, remain intact for the retained cases. The enrichment only removes entire cases; it doesn't modify the structure or content of surviving cases.
Performance Benefits: The reduced dataset size leads to faster execution of all subsequent enrichments, filters, and calculations. This is particularly noticeable with complex process mining operations.
Downstream Impact: All analyses, visualizations, and exports will reflect the filtered dataset. Ensure you save a copy of the original dataset if you need to reference the complete data later.
Important Considerations
Permanent Operation: Unlike visualization filters that temporarily hide data, this enrichment permanently removes data from your working dataset. Always maintain a backup of your original data before applying this enrichment.
Order of Operations: Apply this enrichment early in your analysis workflow if you know certain data is irrelevant. This improves performance for all subsequent operations.
Filter Validation: Test your filters using the preview functionality before executing the enrichment to ensure you're retaining the intended data.
Cascading Effects: Removing cases might impact calculations that depend on the full dataset, such as percentile calculations or relative performance metrics.
This documentation is part of the mindzie Studio process mining platform.