Remove Data From Underactive Periods

Overview

The Remove Data from Underactive Periods filter automatically trims the beginning and end of your process log by identifying and removing low-activity periods. This intelligent case-level filter calculates daily event frequencies, determines average activity levels, and removes cases that fall in "warm-up" or "wind-down" periods where event volumes are below your specified thresholds. It's particularly useful for eliminating startup and shutdown periods when analyzing steady-state process operations.

Common Uses

  • Remove system warm-up periods from the start of process logs
  • Eliminate wind-down periods at the end of data collection timeframes
  • Focus analysis on steady-state operations excluding ramp-up phases
  • Clean logs from pilot programs before full rollout
  • Remove low-activity periods during system migrations or transitions
  • Trim data collection periods that don't represent normal operations

Settings

Start Factor: A multiplier applied to the mean daily event count to set the threshold for the first day to include. The retained period begins on the first day whose activity exceeds Start Factor times the mean.

End Factor: A multiplier applied to the mean daily event count to set the threshold for the last day to include. The retained period ends on the last day whose activity exceeds End Factor times the mean.

| Setting | Purpose | Typical Values | Effect |
|---|---|---|---|
| Start Factor | Controls how aggressively to trim the beginning | 0.1 - 0.5 | Lower = more lenient, Higher = more aggressive trimming |
| End Factor | Controls how aggressively to trim the ending | 0.1 - 0.5 | Lower = more lenient, Higher = more aggressive trimming |

How it works:

  1. Calculates the number of events per day across the entire log
  2. Computes the mean (average) daily event count
  3. Finds the first day where activity exceeds (Start Factor x Mean)
  4. Finds the last day where activity exceeds (End Factor x Mean)
  5. Removes all cases that fall outside this date range
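Steps 1 to 4 above can be sketched in Python as follows. This is a minimal illustration, not mindzie's implementation. The function name `active_date_range` is hypothetical, the mean is assumed to be taken over days that have at least one event, and the strict `>` comparison follows the word "exceeds" in the steps (the product may treat the boundary differently).

```python
from collections import Counter
from datetime import date, datetime


def active_date_range(timestamps, start_factor, end_factor):
    """Return (first_day, last_day) of the active period, or None if it cannot be computed."""
    # Step 1: count events per calendar day.
    daily_counts = Counter(ts.date() for ts in timestamps)
    if not daily_counts:
        return None  # empty log: nothing to measure

    # Step 2: mean daily event count (assumption: averaged over days that have events).
    mean = sum(daily_counts.values()) / len(daily_counts)

    days = sorted(daily_counts)
    # Step 3: first day whose count exceeds start_factor * mean.
    first_day = next((d for d in days if daily_counts[d] > start_factor * mean), None)
    # Step 4: last day whose count exceeds end_factor * mean.
    last_day = next((d for d in reversed(days) if daily_counts[d] > end_factor * mean), None)

    if first_day is None or last_day is None or first_day > last_day:
        return None
    return first_day, last_day


# Hypothetical log: two quiet days, three busy days, one quiet day.
events_per_day = {1: 1, 2: 1, 3: 10, 4: 10, 5: 10, 6: 1}
log = [datetime(2024, 1, day, 9, 0) for day, n in events_per_day.items() for _ in range(n)]

print(active_date_range(log, start_factor=0.3, end_factor=0.1))
```

In this made-up log the mean is 5.5 events per day. A Start Factor of 0.3 skips the two single-event days at the start, while an End Factor of 0.1 is lenient enough to keep the single-event day at the end.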

Examples

Example 1: Removing System Launch Period

Scenario: Your new order management system was launched on January 1st, but only a few pilot users were active in the first two weeks while the system was being validated. You want to remove this low-activity launch period and focus analysis on normal operations that began in mid-January.

Settings:

  • Start Factor: 0.3
  • End Factor: 0.1

Result:

The filter calculates that your mean daily event count is 500 events/day. With Start Factor = 0.3, it looks for the first day with at least 150 events (30% of mean). Days in early January with only 20-80 events are excluded. The analysis begins on January 14th when activity reached 150+ events. End trimming is minimal with End Factor = 0.1, removing only the very last days if activity dropped below 50 events/day.
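The threshold arithmetic in this example can be checked with a few lines of Python. The helper name `trim_thresholds` and the sample early-January counts are hypothetical. Only the mean (500) and the two factors come from the scenario.

```python
def trim_thresholds(mean_daily_events, start_factor, end_factor):
    """Return the (start, end) event-count thresholds for the given factors."""
    return start_factor * mean_daily_events, end_factor * mean_daily_events


start_threshold, end_threshold = trim_thresholds(500, start_factor=0.3, end_factor=0.1)
print(f"start threshold: {start_threshold:.0f} events/day")  # 150
print(f"end threshold:   {end_threshold:.0f} events/day")    # 50

# Hypothetical early-January counts from the scenario's 20-80 range all fall short.
early_january = [20, 45, 80]
print(all(count < start_threshold for count in early_january))  # True
```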

Insights: This removes the pilot phase from your analysis, ensuring metrics reflect actual operational performance rather than early testing. Your cycle times, variant frequencies, and bottleneck analysis now represent real steady-state operations after the system was fully adopted.

Example 2: Cleaning Year-End Data Collection

Scenario: Your data collection ended on December 31st, but activity naturally decreased in late December as staff took holiday time off. You also had a slow start in early January as operations ramped up. You want to analyze only the core operational period with normal staffing.

Settings:

  • Start Factor: 0.2
  • End Factor: 0.2

Result:

With balanced start and end factors, the filter trims low-activity periods from both ends of the log. If your mean daily event count is 800, leading and trailing days with fewer than 160 events are excluded. The holiday slowdown in late December (for example, 50-100 events/day) is removed, as is the slow January ramp-up, leaving only fully staffed operational periods for analysis.

Insights: Your analysis now reflects normal operational capacity without seasonal anomalies. Metrics like average case duration and resource utilization represent typical performance rather than being skewed by holiday periods with skeleton staff coverage.

Example 3: Analyzing Mature System Operations

Scenario: You're analyzing a system that has been in production for years, but you want to exclude the most recent few days which might have incomplete data or ongoing cases. You want aggressive trimming at the start but gentle trimming at the end.

Settings:

  • Start Factor: 0.5
  • End Factor: 0.1

Result:

With Start Factor = 0.5, the retained period begins on the first day that reaches at least 50% of mean activity, aggressively cutting any slow period at the start of the log. With End Factor = 0.1, almost all recent days are kept, because trimming from the end stops at the last day with at least 10% of mean activity. This gives you a mature operational period without cutting too much recent data.

Insights: The aggressive start trimming ensures you're analyzing a fully mature system, while the gentle end trimming preserves recent data for trend analysis. This balance is ideal when you have years of historical data and want to focus on recent stable operations.

Example 4: Conservative Trimming for Complete Analysis

Scenario: You want to include as much data as possible while removing only the most extreme low-activity periods at the beginning and end of your log. You're analyzing a process with naturally variable activity levels and don't want to lose valid operational data.

Settings:

  • Start Factor: 0.1
  • End Factor: 0.1

Result:

With both factors at 0.1, only leading and trailing days with less than 10% of the mean daily event count are excluded. If the mean is 1,000 events per day, only days at the start and end of the log with fewer than 100 events are trimmed. This conservative approach removes only the most obvious warm-up and wind-down periods while preserving all normal operational periods, even those with lower activity.

Insights: This minimal trimming ensures you don't lose valuable data from naturally quieter periods like weekends or holidays that are still legitimate operational time. Use this when your process has high variability or when you need comprehensive historical coverage.
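The reason quiet days inside the log survive is that the filter only locates a first and a last qualifying day. The sketch below illustrates this with made-up daily counts. The function name `retained_span` is hypothetical, it assumes a non-empty list with one entry per calendar day and factors below 1, and it uses a strict `>` comparison as an assumption about boundary handling.

```python
def retained_span(daily_counts, start_factor, end_factor):
    """Return (first_index, last_index) of the retained days in an ordered list of daily counts."""
    mean = sum(daily_counts) / len(daily_counts)
    # Only the first and last qualifying days matter; days in between are never inspected.
    first = next(i for i, c in enumerate(daily_counts) if c > start_factor * mean)
    last = max(i for i, c in enumerate(daily_counts) if c > end_factor * mean)
    return first, last


# Hypothetical counts: a slow first day, a busy week with a quiet weekend, a slow last day.
counts = [50, 1200, 1300, 1250, 1400, 1350, 60, 40, 1300, 1250, 30]
first, last = retained_span(counts, start_factor=0.1, end_factor=0.1)
print(first, last)
print(counts[first:last + 1])
```

Here the slow first and last days are trimmed, but the two weekend days with 60 and 40 events stay in the retained span because they sit between busy days.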

Output

This filter operates at the case level and uses date-based filtering:

  • Automatically calculates optimal start and end dates based on activity thresholds
  • Removes entire cases that fall outside the calculated date range
  • Preserves all cases within the active period unchanged
  • Does not modify event data, only filters cases by date
  • Returns original data if activity calculation is not possible
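The case-level behavior listed above might look like the following Python sketch. It is illustrative only: `filter_cases` is a hypothetical name, and it assumes a case "falls outside" the range when any of its events lies outside it, which this page does not specify.

```python
from datetime import date, datetime


def filter_cases(cases, date_range):
    """Keep whole cases inside date_range; return cases unchanged if date_range is None."""
    if date_range is None:
        return cases  # activity could not be calculated: return the original data
    first_day, last_day = date_range
    return {
        case_id: events  # events are passed through unmodified
        for case_id, events in cases.items()
        # Assumption: one event outside the range removes the whole case.
        if all(first_day <= ts.date() <= last_day for ts in events)
    }


cases = {
    "A": [datetime(2024, 1, 2, 9), datetime(2024, 1, 15, 10)],   # starts before the range
    "B": [datetime(2024, 1, 14, 8), datetime(2024, 1, 20, 16)],  # fully inside the range
}
kept = filter_cases(cases, (date(2024, 1, 14), date(2024, 12, 20)))
print(sorted(kept))
```

Case A is dropped as a whole because its first event predates the range, while case B is returned with its events untouched.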

The resulting dataset focuses on steady-state operations, excluding low-activity startup and shutdown periods that could skew your process mining analysis.


This documentation is part of the mindzie Studio process mining platform.
