Small Ends Filter

Overview

The Small Ends filter automatically trims low-activity periods from the beginning and end of your event log. This intelligent case-level filter detects "warm-up" and "wind-down" periods where event volumes are significantly below average, then removes cases that fall entirely within these periods. It helps ensure your analysis focuses on periods of normal business activity rather than data collection artifacts or seasonal low points.

Common Uses

  • Remove data from system go-live periods before processes stabilized
  • Exclude end-of-extraction periods where data may be incomplete
  • Filter out holiday periods with reduced activity
  • Eliminate data quality issues from log boundaries
  • Focus analysis on periods with representative process behavior
  • Clean event logs for accurate throughput and performance metrics

Settings

Start Factor: A multiplier (between 0 and 1, exclusive) applied to the mean daily event count. Days at the beginning of the log with event counts below (Start Factor x Mean Events Per Day) are trimmed. A lower value is more permissive (keeps more early data); a higher value is more aggressive (removes more early data).

End Factor: A multiplier (between 0 and 1, exclusive) applied to the mean daily event count. Days at the end of the log with event counts below (End Factor x Mean Events Per Day) are trimmed. Works the same as Start Factor but for the end of the log.

Default Values: Both factors default to 0.1 (10%), meaning days with less than 10% of the average daily activity are considered "small" and trimmed.
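The threshold arithmetic behind both factors can be sketched in a few lines of Python. This is only an illustration of the formula above; the function name small_day_threshold is hypothetical and is not part of mindzie Studio.

```python
def small_day_threshold(mean_events_per_day: float, factor: float) -> float:
    """Return the event count below which a boundary day counts as "small"."""
    return factor * mean_events_per_day


# With the default factor of 0.1 and a mean of 500 events/day,
# boundary days with fewer than 50 events are candidates for trimming.
default_threshold = small_day_threshold(500, 0.1)
print(default_threshold)
```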

Examples

Example 1: Standard Cleanup

Scenario: Your event log starts with a system implementation period where few transactions occurred, and ends with incomplete data from the extraction date. You want to automatically trim these low-activity periods.

Settings:

  • Start Factor: 0.1
  • End Factor: 0.1

Result:

The filter calculates the average events per day across your entire log (e.g., 500 events/day). Days with fewer than 50 events are considered low-activity. If the first 5 days have 10, 25, 30, 45, and 80 events respectively, days 1 through 4 all fall below the threshold, so the filter starts from day 5 onward. Similarly, low-activity days at the end are trimmed.

Insights: This automatically handles data boundary issues without manual date selection, ensuring analysis covers only periods with representative activity levels.
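The numbers in this example can be checked with a short Python sketch. The mean of 500 events/day is taken as given, and a day that lands exactly on the threshold is assumed to be kept; the documentation does not state how ties are handled.

```python
mean_events_per_day = 500                       # given in the example
start_factor = 0.1
threshold = start_factor * mean_events_per_day  # 50 events

first_days = [10, 25, 30, 45, 80]               # events on days 1 to 5

# First day (1-based) whose event count is not below the threshold.
# Assumption: a count exactly equal to the threshold is kept.
first_kept_day = next(
    day for day, count in enumerate(first_days, start=1) if count >= threshold
)
print(first_kept_day)  # 5
```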

Example 2: Aggressive Start Trimming

Scenario: Your process data includes a lengthy pilot period before full rollout. You want to aggressively trim early data while preserving end-of-log data.

Settings:

  • Start Factor: 0.3
  • End Factor: 0.1

Result:

Days at the start with fewer than 30% of mean daily activity are trimmed. This removes more of the pilot/ramp-up period. The end uses the standard 10% threshold, preserving more recent data.

Insights: Asymmetric factors let you handle situations where the start and end of your log have different characteristics. Pilot periods often have longer ramp-up than wind-down.
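To see how asymmetric factors move the two boundaries independently, here is a minimal Python sketch on a made-up series of daily event counts. The function trim_bounds and the sample data are hypothetical; the mean is taken over every day in the series, which is an assumption about how the filter averages.

```python
def trim_bounds(daily_counts, start_factor, end_factor):
    """Return (first_kept, last_kept) as 0-based indexes into daily_counts."""
    mean = sum(daily_counts) / len(daily_counts)
    # Scan forward for the first day that reaches the start threshold.
    first_kept = next(
        i for i, c in enumerate(daily_counts) if c >= start_factor * mean
    )
    # The last day that reaches the end threshold closes the kept range.
    last_kept = max(
        i for i, c in enumerate(daily_counts) if c >= end_factor * mean
    )
    return first_kept, last_kept


# Hypothetical log: slow pilot ramp-up, steady operation, short tail.
daily_counts = [2, 5, 10, 16, 90, 100, 110, 100, 95, 100, 30, 4]

print(trim_bounds(daily_counts, 0.1, 0.1))  # (2, 10)
print(trim_bounds(daily_counts, 0.3, 0.1))  # (4, 10)
```

Raising only the Start Factor moves the start boundary from index 2 to index 4 and leaves the end boundary where it was.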

Example 3: Minimal Trimming

Scenario: You want to keep as much data as possible but still remove obvious data quality issues at log boundaries.

Settings:

  • Start Factor: 0.05
  • End Factor: 0.05

Result:

Only days with fewer than 5% of mean daily activity are trimmed. This catches only the most extreme low-activity periods while preserving the vast majority of data, including moderate seasonal variations.

Insights: Use low factors when your business has natural activity variation and you don't want to accidentally remove legitimate low-activity periods, such as weekends or seasonal dips, that happen to fall at the start or end of the log.

Example 4: Removing Seasonal Boundaries

Scenario: Your log spans a full year but includes December (holiday season) at both the beginning and end due to the extraction timing. You want to focus on non-holiday periods.

Settings:

  • Start Factor: 0.4
  • End Factor: 0.4

Result:

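Leading and trailing days with fewer than 40% of average activity are trimmed from both ends of the log. This effectively removes the holiday periods at the boundaries, where activity dropped significantly below normal levels. Low-activity days in the middle of the log are not affected.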

Insights: Higher factors help exclude seasonal variations that might skew analysis. However, be cautious not to remove too much valid data.

Example 5: New System Implementation

Scenario: Data was extracted from a new system that went live 3 months ago. The first month had very low activity as users were being trained and migrated.

Settings:

  • Start Factor: 0.5
  • End Factor: 0.1

Result:

The first portion of the log (the implementation/training period, where daily activity stays below 50% of the mean) is removed, while recent data is preserved with only minimal end trimming. This focuses analysis on the period after the system stabilized.

Insights: Implementation periods often show patterns that don't represent normal operations. Trimming them ensures your process metrics reflect actual operational performance.

How It Works

  1. Calculate Daily Frequencies: The filter counts events for each day in the log
  2. Compute Mean Activity: Calculates the average events per day across the entire period
  3. Find Start Boundary: Scans from the beginning to find the first day exceeding (Start Factor x Mean)
  4. Find End Boundary: Scans from the end to find the last day exceeding (End Factor x Mean)
  5. Apply Date Range: Filters to keep only cases within the calculated date boundaries
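
The five steps above can be sketched end to end in Python using only the standard library. This is an illustrative approximation, not mindzie's implementation: the function name small_ends_filter, the (case_id, timestamp) event representation, averaging over every calendar day between the first and last event, keeping days that land exactly on the threshold, and keeping only cases whose events all fall inside the boundaries are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta


def small_ends_filter(events, start_factor=0.1, end_factor=0.1):
    """events: iterable of (case_id, datetime). Returns the set of kept case ids."""
    events = list(events)
    if not events:
        return set()

    # 1. Calculate daily frequencies.
    per_day = Counter(ts.date() for _, ts in events)
    first, last = min(per_day), max(per_day)
    days = [first + timedelta(days=i) for i in range((last - first).days + 1)]

    # 2. Compute mean activity (assumption: zero-event days count too).
    mean = len(events) / len(days)

    # 3. Find the start boundary by scanning forward.
    start = next(d for d in days if per_day[d] >= start_factor * mean)
    # 4. Find the end boundary by scanning backward.
    end = next(d for d in reversed(days) if per_day[d] >= end_factor * mean)

    # 5. Keep cases whose first and last events lie within the boundaries.
    first_seen, last_seen = {}, {}
    for case_id, ts in events:
        day = ts.date()
        first_seen[case_id] = min(first_seen.get(case_id, day), day)
        last_seen[case_id] = max(last_seen.get(case_id, day), day)
    return {
        c for c in first_seen if start <= first_seen[c] and last_seen[c] <= end
    }


# Hypothetical log: one event on each boundary day, ten per day in between.
events = [("A", datetime(2024, 1, 1, 9))]
for case_id, day in (("B", 2), ("C", 3), ("D", 4)):
    events += [(case_id, datetime(2024, 1, day, 8 + i)) for i in range(10)]
events.append(("E", datetime(2024, 1, 5, 9)))

print(sorted(small_ends_filter(events, 0.5, 0.5)))  # ['B', 'C', 'D']
```

Here the mean is 32 events over 5 days (6.4 per day), so a factor of 0.5 gives a threshold of 3.2 and the single-event days at both ends are trimmed along with cases A and E.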

Output

This filter operates at the case level based on temporal boundaries:

  • Automatically calculates activity thresholds based on mean daily events
  • Identifies the first day of "normal" activity at the log start
  • Identifies the last day of "normal" activity at the log end
  • Returns cases contained within the calculated normal activity period
  • Preserves all case and event attributes for included cases
  • Requires both factors to be between 0 and 1 (exclusive)
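
If you compute factors in a script before entering them, the "between 0 and 1 (exclusive)" constraint can be checked up front. The helper below is a hypothetical sketch; how mindzie Studio itself reports an out-of-range factor is not described here.

```python
def validate_factor(name: str, value: float) -> float:
    """Return value unchanged if it lies strictly between 0 and 1."""
    if not 0.0 < value < 1.0:
        raise ValueError(
            f"{name} must be strictly between 0 and 1, got {value!r}"
        )
    return value


start_factor = validate_factor("Start Factor", 0.3)
end_factor = validate_factor("End Factor", 0.1)
```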

Use the Small Ends filter to automatically clean event log boundaries, ensuring your analysis reflects normal business operations rather than implementation phases, data extraction artifacts, or seasonal anomalies.


This documentation is part of the mindzie Studio process mining platform.