Duplicate Cases in Log
Overview
The Duplicate Cases in Log enrichment creates physical copies of existing cases within your event log. This is a specialized administrator-only tool designed for testing purposes, allowing you to quickly expand your dataset by duplicating cases with modified case IDs. Each duplicated case retains all original events and attributes but receives a new unique identifier to distinguish it from the source case.
This enrichment is particularly useful when you need to test how your process mining analysis, filters, or dashboards perform with larger datasets, or when you want to create synthetic data for training and demonstration purposes.
Note: This enrichment is available only to administrators due to its significant impact on data volume and its intended use for testing and development scenarios rather than production analysis.
Common Uses
- Expand small test datasets to simulate production-scale data volumes
- Create stress-test scenarios for performance evaluation of dashboards and calculators
- Generate duplicate data for testing filter behavior with larger case counts
- Prepare demonstration datasets with sufficient volume for training purposes
- Test system performance and response times with increased data loads
- Validate that enrichments and calculations handle large datasets correctly
Settings
Number of Copies: Specify how many copies of each case to create. For example, setting this to 5 will result in each original case being duplicated 5 times, effectively multiplying your total case count by 6 (original plus 5 copies). The default value is 1, which doubles your dataset.
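The arithmetic behind this setting can be sketched in a few lines of Python. The helper names `expanded_case_count` and `copies_needed` are hypothetical and are not part of mindzie Studio; they only illustrate the "original plus N copies" relationship described above, and may help when picking a value that reaches a target volume.

```python
def expanded_case_count(original_cases: int, number_of_copies: int) -> int:
    """Total cases after duplication: each original plus its copies."""
    if original_cases < 0 or number_of_copies < 0:
        raise ValueError("counts must be non-negative")
    return original_cases * (1 + number_of_copies)


def copies_needed(original_cases: int, target_cases: int) -> int:
    """Smallest Number of Copies that yields at least target_cases in total."""
    if original_cases <= 0:
        raise ValueError("original_cases must be positive")
    # Ceiling division gives the total multiplier; subtract 1 for the original.
    total_multiplier = -(-target_cases // original_cases)
    return max(total_multiplier - 1, 0)


print(expanded_case_count(100, 5))  # 600
print(expanded_case_count(100, 1))  # 200 (the default doubles the dataset)
print(copies_needed(100, 1000))     # 9
```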
Example
Dataset Expansion for Performance Testing
Scenario: You have a process log with 100 cases and need to test how your dashboard performs with 1,000 cases before deploying to production.
Settings:
- Number of Copies: 9
Before:

| Case ID | Activity | Timestamp |
|---------|----------|-----------|
| PO-001 | Create Order | 2024-01-15 09:00 |
| PO-001 | Approve Order | 2024-01-15 10:00 |
| PO-002 | Create Order | 2024-01-15 11:00 |
| PO-002 | Approve Order | 2024-01-15 12:00 |

After (showing copies for PO-001):

| Case ID | Activity | Timestamp |
|---------|----------|-----------|
| PO-001 | Create Order | 2024-01-15 09:00 |
| PO-001 | Approve Order | 2024-01-15 10:00 |
| PO-001_2 | Create Order | 2024-01-15 09:00 |
| PO-001_2 | Approve Order | 2024-01-15 10:00 |
| PO-001_3 | Create Order | 2024-01-15 09:00 |
| ... | ... | ... |
| PO-001_10 | Create Order | 2024-01-15 09:00 |
| PO-001_10 | Approve Order | 2024-01-15 10:00 |
Result: Your 100-case dataset now contains 1,000 cases, allowing you to test performance characteristics at scale.
Insights: After duplicating cases, you can identify performance bottlenecks in calculators and determine which visualizations need optimization before deploying with production data volumes.
How It Works
- Case Iteration: The enrichment iterates through all existing cases in your event log
- Case Duplication: For each original case, it creates the specified number of copies
- ID Generation: Each copy receives a unique case ID by appending "_n" to the original ID (where n is the copy number starting from 2)
- Event Copying: All events from the original case are duplicated to the new case, preserving timestamps and all event attributes
- Attribute Preservation: All case-level attributes (except calculated columns) are copied to the new cases
- Log Finalization: The event log is finalized with the expanded case and event tables
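The steps above can be approximated with a minimal Python sketch. This is not the enrichment's actual implementation: it assumes the event log is a plain list of dictionaries, the function name `duplicate_cases` and the `case_id` key are hypothetical, and it ignores calculated and hidden columns as well as log finalization. It only shows the iteration, the "_n" ID suffix starting at 2, and the event copying.

```python
from collections import defaultdict


def duplicate_cases(events, number_of_copies, case_key="case_id"):
    """Return the original events followed by copies under suffixed case IDs."""
    # Case iteration: group events by case, keeping first-seen case order.
    by_case = defaultdict(list)
    for event in events:
        by_case[event[case_key]].append(event)

    expanded = list(events)  # the original cases are left unchanged
    for case_id, case_events in by_case.items():
        # ID generation: copy numbers start at 2, as described above.
        for n in range(2, number_of_copies + 2):
            new_id = f"{case_id}_{n}"
            # Event copying: same attributes and timestamps, new case ID.
            for event in case_events:
                copied = dict(event)
                copied[case_key] = new_id
                expanded.append(copied)
    return expanded


log = [
    {"case_id": "PO-001", "activity": "Create Order", "timestamp": "2024-01-15 09:00"},
    {"case_id": "PO-001", "activity": "Approve Order", "timestamp": "2024-01-15 10:00"},
    {"case_id": "PO-002", "activity": "Create Order", "timestamp": "2024-01-15 11:00"},
    {"case_id": "PO-002", "activity": "Approve Order", "timestamp": "2024-01-15 12:00"},
]

expanded = duplicate_cases(log, number_of_copies=9)
case_ids = {event["case_id"] for event in expanded}
print(len(case_ids))  # 20 cases: 2 originals, each with 9 copies
print(len(expanded))  # 40 events
```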
Output
The enrichment modifies the event log by:
- New Cases: Creates (NumberOfCopies * original case count) additional cases
- Case IDs: New cases have IDs in the format "OriginalCaseId_n" where n is the copy index (2, 3, 4, etc.)
- Events: Each new case contains exact copies of all events from the original case
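- Attributes: Case and event attributes are preserved on duplicated cases and events, except calculated columns (which are recalculated) and hidden columns (which are not copied to new events)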
Important Notes:
- This enrichment does not create any new attributes
- The original cases remain unchanged
- Calculated columns are not copied (they will be recalculated based on the data)
- Hidden columns are not copied to new events
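If you export the case IDs before and after running the enrichment, a quick check against the documented output can be sketched as follows. The function `check_expansion` is a hypothetical helper, not a mindzie Studio API, and it assumes the case IDs are available as plain Python strings.

```python
def check_expansion(original_ids, expanded_ids, number_of_copies):
    """Compare observed case IDs with the documented "OriginalCaseId_n" pattern."""
    original = set(original_ids)
    observed = set(expanded_ids)

    # Expected IDs: every original plus its copies, numbered from 2.
    expected = set(original)
    for case_id in original:
        for n in range(2, number_of_copies + 2):
            expected.add(f"{case_id}_{n}")

    return {
        "expected_count": len(expected),
        "observed_count": len(observed),
        "missing": sorted(expected - observed),
        "unexpected": sorted(observed - expected),
    }


report = check_expansion(
    original_ids=["PO-001", "PO-002"],
    expanded_ids=["PO-001", "PO-001_2", "PO-002"],
    number_of_copies=1,
)
print(report["expected_count"])  # 4
print(report["missing"])         # ['PO-002_2']
```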
Best Practices
- Use this enrichment in development or test environments only
- Be mindful of data volume - duplicating large datasets can significantly increase processing time
- After testing, remove the enrichment or save a separate copy of your notebook
- Consider the impact on calculated metrics that may be affected by duplicate data patterns
This documentation is part of the mindzie Studio process mining platform.