Duplicate Cases in Log

Overview

The Duplicate Cases in Log enrichment creates physical copies of existing cases within your event log. This is a specialized administrator-only tool designed for testing purposes, allowing you to quickly expand your dataset by duplicating cases with modified case IDs. Each duplicated case retains all original events and attributes but receives a new unique identifier to distinguish it from the source case.

This enrichment is particularly useful when you need to test how your process mining analysis, filters, or dashboards perform with larger datasets, or when you want to create synthetic data for training and demonstration purposes.

Note: This enrichment is available only to administrators due to its significant impact on data volume and its intended use for testing and development scenarios rather than production analysis.

Common Uses

Expand small test datasets to simulate production-scale data volumes
Create stress-test scenarios for performance evaluation of dashboards and calculators
Generate duplicate data for testing filter behavior with larger case counts
Prepare demonstration datasets with sufficient volume for training purposes
Test system performance and response times with increased data loads
Validate that enrichments and calculations handle large datasets correctly

Settings

Number of Copies: Specify how many copies of each case to create. For example, setting this to 5 will result in each original case being duplicated 5 times, effectively multiplying your total case count by 6 (original plus 5 copies). The default value is 1, which doubles your dataset.
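The arithmetic behind this setting can be sketched in a few lines of Python. This is an illustration only: the helper names `total_cases` and `copies_needed` are hypothetical and are not part of mindzie Studio.

```python
def total_cases(original_cases: int, number_of_copies: int) -> int:
    """Case count after the enrichment runs: the originals plus their copies."""
    return original_cases * (number_of_copies + 1)


def copies_needed(original_cases: int, target_cases: int) -> int:
    """Smallest Number of Copies value that reaches at least target_cases."""
    if original_cases <= 0:
        raise ValueError("original_cases must be positive")
    # Ceiling division gives the required multiplier; subtract 1 for the originals.
    multiplier = -(-target_cases // original_cases)
    return max(0, multiplier - 1)


print(total_cases(100, 5))       # originals plus 5 copies each
print(copies_needed(100, 1000))  # setting needed to reach 1,000 cases
```

Working backwards from a target volume in this way helps avoid generating more data than the test requires.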

Example

Dataset Expansion for Performance Testing

Scenario: You have a process log with 100 cases and need to test how your dashboard performs with 1,000 cases before deploying to production.

Settings:

Number of Copies: 9

Before:

| Case ID | Activity | Timestamp |
|---------|----------|-----------|
| PO-001 | Create Order | 2024-01-15 09:00 |
| PO-001 | Approve Order | 2024-01-15 10:00 |
| PO-002 | Create Order | 2024-01-15 11:00 |
| PO-002 | Approve Order | 2024-01-15 12:00 |

After (showing copies for PO-001):

| Case ID | Activity | Timestamp |
|---------|----------|-----------|
| PO-001 | Create Order | 2024-01-15 09:00 |
| PO-001 | Approve Order | 2024-01-15 10:00 |
| PO-001_2 | Create Order | 2024-01-15 09:00 |
| PO-001_2 | Approve Order | 2024-01-15 10:00 |
| PO-001_3 | Create Order | 2024-01-15 09:00 |
| ... | ... | ... |
| PO-001_10 | Create Order | 2024-01-15 09:00 |
| PO-001_10 | Approve Order | 2024-01-15 10:00 |

Result: Your 100-case dataset now contains 1,000 cases, allowing you to test performance characteristics at scale.

Insights: After duplicating cases, you can identify performance bottlenecks in calculators and determine which visualizations need optimization before deploying with production data volumes.

How It Works

Case Iteration: The enrichment iterates through all existing cases in your event log
Case Duplication: For each original case, it creates the specified number of copies
ID Generation: Each copy receives a unique case ID by appending "_n" to the original ID (where n is the copy number starting from 2)
Event Copying: All events from the original case are duplicated to the new case, preserving timestamps and all event attributes
Attribute Preservation: All case-level attributes (except calculated columns) are copied to the new cases
Log Finalization: The event log is finalized with the expanded case and event tables
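The steps above can be approximated with a short Python sketch. It is not the mindzie Studio implementation: the function name `duplicate_cases`, the list-of-dictionaries event representation, and the `case_key` parameter are assumptions made for illustration. The sketch covers only the iteration, ID generation, and event copying steps. It does not model case-level attributes, calculated columns, hidden columns, or log finalization.

```python
from typing import Dict, List

Event = Dict[str, str]


def duplicate_cases(
    events: List[Event],
    number_of_copies: int,
    case_key: str = "Case ID",  # hypothetical column name for the case identifier
) -> List[Event]:
    """Return the original events followed by copies with suffixed case IDs."""
    if number_of_copies < 0:
        raise ValueError("number_of_copies must not be negative")

    # Case iteration: group events by case ID, keeping their original order.
    cases: Dict[str, List[Event]] = {}
    for event in events:
        cases.setdefault(event[case_key], []).append(event)

    result = list(events)  # the original cases remain unchanged
    for case_id, case_events in cases.items():
        # ID generation: copy numbers start at 2, so N copies yield _2 .. _(N+1).
        for n in range(2, number_of_copies + 2):
            new_id = f"{case_id}_{n}"
            # Event copying: timestamps and other attributes are carried over as-is.
            for event in case_events:
                copied = dict(event)
                copied[case_key] = new_id
                result.append(copied)
    return result


log = [
    {"Case ID": "PO-001", "Activity": "Create Order", "Timestamp": "2024-01-15 09:00"},
    {"Case ID": "PO-001", "Activity": "Approve Order", "Timestamp": "2024-01-15 10:00"},
]
expanded = duplicate_cases(log, number_of_copies=9)
print(sorted({e["Case ID"] for e in expanded}))
```

With `number_of_copies=9`, the single case PO-001 expands to ten cases, PO-001 and PO-001_2 through PO-001_10, matching the example earlier in this page.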

Output

The enrichment modifies the event log by:

New Cases: Creates (NumberOfCopies * original case count) additional cases
Case IDs: New cases have IDs in the format "OriginalCaseId_n" where n is the copy index (2, 3, 4, etc.)
Events: Each new case contains exact copies of all events from the original case
Attributes: Case and event attributes are preserved on duplicated cases and events, except for calculated columns and hidden columns

Important Notes:

This enrichment does not create any new attributes
The original cases remain unchanged
Calculated columns are not copied (they will be recalculated based on the data)
Hidden columns are not copied to new events

Best Practices

Use this enrichment in development or test environments only
Be mindful of data volume - duplicating large datasets can significantly increase processing time
After testing, either remove the enrichment or save your notebook as a separate copy
Consider the impact on calculated metrics that may be affected by duplicate data patterns

This documentation is part of the mindzie Studio process mining platform.