Duplicate Cases Filter
Overview
The Duplicate Cases filter identifies and selects cases that share identical values in every one of several specified columns. This advanced case-level filter helps detect potential duplicate transactions, repeated submissions, or data quality issues where the same business entity appears multiple times in your process data.
Common Uses
- Identify duplicate invoice submissions in accounts payable
- Find repeated customer orders with identical details
- Detect potential fraud through duplicate transaction patterns
- Discover data migration issues with replicated records
- Identify cases that should have been consolidated
- Analyze patterns in recurring submissions or requests
Settings
Column Names: Select 2 to 5 columns to use for duplicate detection. Cases with identical values across ALL selected columns are considered duplicates. Only columns with comparable data types are available (see Supported Column Types below).
How It Works:
- Groups cases by the values in all selected columns
- Identifies groups containing 2 or more cases
- Returns all cases that belong to any duplicate group
- Results are ordered by group size (largest duplicate groups first)
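The grouping steps above can be sketched in Python. This is a minimal illustration, not mindzie Studio's implementation: it assumes each case is a dictionary of case attributes, and `find_duplicate_cases` is a hypothetical helper name. How ties between equal-sized groups are ordered is this sketch's own choice.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence

Case = Dict[str, Any]  # hypothetical: one case as column name -> attribute value


def find_duplicate_cases(cases: Sequence[Case], columns: Sequence[str]) -> List[List[Case]]:
    """Group cases by their values in all selected columns and return only
    the groups with 2 or more cases, largest group first."""
    groups: Dict[tuple, List[Case]] = defaultdict(list)
    for case in cases:
        # The grouping key is the tuple of values in every selected column,
        # so two cases match only if ALL of those values are equal.
        key = tuple(case[col] for col in columns)
        groups[key].append(case)
    duplicates = [group for group in groups.values() if len(group) >= 2]
    # Largest groups first; Python's sort is stable, so equal-sized groups
    # keep the order in which their first case appeared.
    return sorted(duplicates, key=len, reverse=True)


cases = [
    {"Case ID": 1, "Vendor": "Beta", "Amount": 50},
    {"Case ID": 2, "Vendor": "Acme", "Amount": 100},
    {"Case ID": 3, "Vendor": "Beta", "Amount": 50},
    {"Case ID": 4, "Vendor": "Acme", "Amount": 100},
    {"Case ID": 5, "Vendor": "Acme", "Amount": 100},
    {"Case ID": 6, "Vendor": "Gamma", "Amount": 75},
]
groups = find_duplicate_cases(cases, ["Vendor", "Amount"])
# Every case that belongs to any duplicate group, in group order.
matched_ids = [case["Case ID"] for group in groups for case in group]
print(matched_ids)  # [2, 4, 5, 1, 3]
```

Case 6 is dropped because its Vendor/Amount combination is unique, and the Acme group of three is listed before the Beta group of two even though a Beta case appears first in the data.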
Supported Column Types: String, Int32, Int64, Double, Single, DateTime, TimeSpan
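A column selection can be checked against these rules before filtering. The sketch below rests on assumptions: the mapping of the supported types onto Python types, the hypothetical `column_selection_errors` helper, and the "Case ID" column name are illustrative, not part of the product. It also applies the Output rule that hidden columns and case ID columns are excluded, and treats a Boolean column as unsupported because Boolean is not in the list above.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Sequence

# Assumed Python stand-ins for the supported column types:
# String -> str, Int32/Int64 -> int, Double/Single -> float,
# DateTime -> datetime, TimeSpan -> timedelta.
SUPPORTED_TYPES = (str, int, float, datetime, timedelta)


def column_selection_errors(
    columns: Sequence[str],
    column_types: Dict[str, type],
    hidden_columns: Sequence[str] = (),
    case_id_column: str = "Case ID",
) -> List[str]:
    """Return the reasons a column selection would be rejected (empty if valid)."""
    errors = []
    if not 2 <= len(columns) <= 5:
        errors.append(f"select 2 to 5 columns, got {len(columns)}")
    for col in columns:
        if col == case_id_column or col in hidden_columns:
            errors.append(f"{col!r} is hidden or is the case ID column")
            continue
        col_type = column_types[col]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if col_type is bool or not issubclass(col_type, SUPPORTED_TYPES):
            errors.append(f"{col!r} has unsupported type {col_type.__name__}")
    return errors


# Hypothetical column types for an accounts payable log.
types = {"Case ID": str, "Vendor": str, "Invoice Amount": float,
         "Invoice Date": datetime, "Approved": bool}
print(column_selection_errors(["Vendor", "Invoice Amount", "Invoice Date"], types))  # []
print(column_selection_errors(["Vendor", "Approved"], types))
```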
Examples
Example 1: Duplicate Invoice Detection
Scenario: You want to find potentially duplicate invoices in your accounts payable process by matching on vendor, amount, and invoice date.
Settings:
- Column Names: ["Vendor", "Invoice Amount", "Invoice Date"]
Result:
Cases where all three values match are grouped together, for example:
- Group 1: 5 invoices from "Acme Corp" for $10,000 dated 2024-01-15
- Group 2: 3 invoices from "Beta LLC" for $5,500 dated 2024-02-01
Invoices with a unique combination of values are excluded.
Insights: Multiple invoices with identical vendor, amount, and date often indicate duplicate submissions that may result in duplicate payments. Such invoices should be investigated and, where a duplicate is confirmed, blocked from payment.
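A worked version of this example follows, using hypothetical sample data shaped like the two groups described above plus one unique invoice. The exposure figure assumes one invoice per group is legitimate and every other copy would be paid; it is an illustrative metric computed by this sketch, not an output of the filter.

```python
from collections import Counter
from datetime import date

# Hypothetical invoice cases shaped like the groups described above.
invoices = (
    [{"Case ID": f"A{i}", "Vendor": "Acme Corp", "Invoice Amount": 10000.00,
      "Invoice Date": date(2024, 1, 15)} for i in range(5)]
    + [{"Case ID": f"B{i}", "Vendor": "Beta LLC", "Invoice Amount": 5500.00,
        "Invoice Date": date(2024, 2, 1)} for i in range(3)]
    + [{"Case ID": "C0", "Vendor": "Gamma Inc", "Invoice Amount": 720.00,
        "Invoice Date": date(2024, 2, 3)}]
)
columns = ["Vendor", "Invoice Amount", "Invoice Date"]

# Count how many invoices share each (vendor, amount, date) combination.
counts = Counter(tuple(inv[col] for col in columns) for inv in invoices)
# most_common() lists the largest groups first; keep only real duplicates.
duplicate_groups = [(key, n) for key, n in counts.most_common() if n >= 2]

for (vendor, amount, day), n in duplicate_groups:
    print(f"{vendor}: {n} invoices of ${amount:,.2f} dated {day}")

# Illustrative exposure: every copy beyond the first would be an overpayment.
exposure = sum(amount * (n - 1) for (_, amount, _), n in duplicate_groups)
print(f"${exposure:,.2f}")  # $51,000.00
```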
Example 2: Customer Order Duplicates
Scenario: Your order management process may contain duplicate orders when customers submit the same order more than once. You want to find orders with matching customer, product, and quantity.
Settings:
- Column Names: ["Customer ID", "Product Code", "Order Quantity"]
Result:
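Orders with identical customer, product, and quantity are flagged. This catches cases where a customer accidentally submitted the same order more than once. Because no date column is selected, the match ignores timing, so legitimate repeat orders of the same product and quantity are flagged as well.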
Insights: Duplicate orders increase fulfillment costs, tie up inventory, and frustrate customers who receive unwanted extra shipments.
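Because the filter compares exact values, one way to separate accidental resubmissions from genuine repeat purchases is to add a date column to the selection. The sketch below uses hypothetical orders and a hypothetical Order Date column to show the effect; `duplicate_case_ids` is an illustrative helper, not a product API.

```python
from collections import Counter
from datetime import date
from typing import Any, Dict, List, Sequence


def duplicate_case_ids(cases: Sequence[Dict[str, Any]], columns: Sequence[str]) -> List[str]:
    """IDs of cases whose values in all `columns` match at least one other case
    (returned in input order, not ordered by group size)."""
    def key(case: Dict[str, Any]) -> tuple:
        return tuple(case[col] for col in columns)

    counts = Counter(key(case) for case in cases)
    return [case["Case ID"] for case in cases if counts[key(case)] >= 2]


# Hypothetical orders: O1 and O2 are a same-day resubmission, while O3 is a
# legitimate repeat purchase weeks later. "Order Date" is an assumed column.
base = {"Customer ID": "C42", "Product Code": "P-7", "Order Quantity": 3}
orders = [
    {"Case ID": "O1", **base, "Order Date": date(2024, 3, 4)},
    {"Case ID": "O2", **base, "Order Date": date(2024, 3, 4)},
    {"Case ID": "O3", **base, "Order Date": date(2024, 4, 29)},
    {"Case ID": "O4", "Customer ID": "C77", "Product Code": "P-7",
     "Order Quantity": 1, "Order Date": date(2024, 3, 4)},
]
three_cols = ["Customer ID", "Product Code", "Order Quantity"]
print(duplicate_case_ids(orders, three_cols))                   # ['O1', 'O2', 'O3']
print(duplicate_case_ids(orders, three_cols + ["Order Date"]))  # ['O1', 'O2']
```

The date column keeps the match to same-day resubmissions, at the cost of missing a duplicate that a customer re-sent just after midnight.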
Example 3: Transaction Pattern Analysis
Scenario: You're investigating potential fraud by looking for transactions with matching source accounts, amounts, and transaction hours.
Settings:
- Column Names: ["Source Account", "Amount", "Transaction Hour"]
Result:
Transactions from the same account, with the same amount, during the same hour are grouped. This pattern might indicate automated fraud or system errors creating duplicate transactions.
Insights: Legitimate transactions rarely share the same account, amount, and hour. High duplicate rates warrant deeper investigation of specific accounts or time periods.
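Matching on raw timestamps would almost never find duplicates, since the filter compares exact values. This sketch assumes the Transaction Hour column is prepared beforehand by truncating a hypothetical Timestamp column to the start of its hour; the sample transactions are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transaction cases with a full timestamp per case.
transactions = [
    {"Case ID": "T1", "Source Account": "ACC-1", "Amount": 250.0,
     "Timestamp": datetime(2024, 5, 1, 14, 3, 12)},
    {"Case ID": "T2", "Source Account": "ACC-1", "Amount": 250.0,
     "Timestamp": datetime(2024, 5, 1, 14, 47, 55)},
    {"Case ID": "T3", "Source Account": "ACC-1", "Amount": 250.0,
     "Timestamp": datetime(2024, 5, 1, 15, 2, 0)},
    {"Case ID": "T4", "Source Account": "ACC-2", "Amount": 250.0,
     "Timestamp": datetime(2024, 5, 1, 14, 10, 0)},
]

for txn in transactions:
    # Assumed preparation step: truncate the timestamp to the start of its
    # clock hour, so 14:03 and 14:47 produce the same Transaction Hour value.
    txn["Transaction Hour"] = txn["Timestamp"].replace(minute=0, second=0, microsecond=0)

# Group on the three selected columns and keep groups of 2 or more.
groups = defaultdict(list)
for txn in transactions:
    groups[(txn["Source Account"], txn["Amount"], txn["Transaction Hour"])].append(txn["Case ID"])
flagged = [ids for ids in groups.values() if len(ids) >= 2]
print(flagged)  # [['T1', 'T2']]
```

Fixed clock-hour buckets have a trade-off: T2 (14:47) and T3 (15:02) are only 15 minutes apart but fall into different hours, so they are not matched.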
Example 4: Data Migration Verification
Scenario: After migrating data from a legacy system, you want to verify that records weren't duplicated during the migration process.
Settings:
- Column Names: ["Legacy ID", "Creation Date"]
Result:
Records with the same legacy identifier and creation date are flagged as potential migration duplicates. A clean migration should return no results.
Insights: Migration duplicates can cause reporting inaccuracies, compliance issues, and operational confusion. Identifying them allows for data cleanup before they cause downstream problems.
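The clean-migration check amounts to "no duplicate groups". In this sketch with hypothetical migrated records, `surplus` counts how many records would be removed if one copy per group were kept; that cleanup figure is an illustration, not a filter output.

```python
from collections import Counter
from datetime import date

# Hypothetical migrated records; N-3 is an accidental copy of N-1.
migrated = [
    {"Case ID": "N-1", "Legacy ID": "L-100", "Creation Date": date(2019, 6, 3)},
    {"Case ID": "N-2", "Legacy ID": "L-101", "Creation Date": date(2019, 6, 4)},
    {"Case ID": "N-3", "Legacy ID": "L-100", "Creation Date": date(2019, 6, 3)},
    {"Case ID": "N-4", "Legacy ID": "L-102", "Creation Date": date(2019, 6, 4)},
]

# Count records per (Legacy ID, Creation Date) and keep combinations seen twice or more.
counts = Counter((r["Legacy ID"], r["Creation Date"]) for r in migrated)
duplicate_groups = {key: n for key, n in counts.items() if n >= 2}

# Records to delete if exactly one copy per duplicate group is kept.
surplus = sum(n - 1 for n in duplicate_groups.values())
migration_clean = not duplicate_groups
print(migration_clean, surplus)  # False 1
```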
Example 5: Multiple Column Matching
Scenario: You want to find purchase orders that might be duplicates based on comprehensive matching: same vendor, same amount, same department, and same requested date.
Settings:
- Column Names: ["Vendor Name", "PO Amount", "Department", "Requested Date"]
Result:
Purchase orders matching on all four dimensions are identified. This strict matching reduces false positives while still catching true duplicates that slipped through procurement controls.
Insights: Adding columns makes matching stricter: fewer cases are flagged and false positives drop, but true duplicates that differ slightly in one column can be missed. Start with fewer columns when exploring, then add more to reduce false positives.
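The effect of adding columns can be measured directly. This sketch, using hypothetical purchase orders and an illustrative `matched_count` helper, counts how many cases land in a duplicate group as the four columns from this example are added one at a time.

```python
from collections import Counter
from datetime import date
from typing import Any, Dict, Sequence


def matched_count(cases: Sequence[Dict[str, Any]], columns: Sequence[str]) -> int:
    """Number of cases that fall in a duplicate group (size >= 2) on `columns`."""
    counts = Counter(tuple(case[col] for col in columns) for case in cases)
    return sum(n for n in counts.values() if n >= 2)


columns = ["Vendor Name", "PO Amount", "Department", "Requested Date"]
# Hypothetical purchase orders, one tuple of values per case.
rows = [
    ("Acme", 1200, "IT", date(2024, 6, 1)),
    ("Acme", 1200, "IT", date(2024, 6, 1)),
    ("Acme", 1200, "HR", date(2024, 6, 1)),
    ("Acme", 1200, "IT", date(2024, 7, 15)),
    ("Beta", 300, "IT", date(2024, 6, 1)),
]
purchase_orders = [dict(zip(columns, row)) for row in rows]

# Match on the first 2, 3, then all 4 columns.
print([matched_count(purchase_orders, columns[:k]) for k in (2, 3, 4)])  # [4, 3, 2]
```

The count can never rise as columns are added: cases that match on more columns also match on any subset of them, so each group can only split into smaller groups.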
Output
This filter operates at the case level using multi-column grouping:
- Groups cases by values across all specified columns
- Returns only cases that appear in groups of 2 or more
- Results ordered by duplicate group size (largest first)
- Requires 2-5 columns for duplicate detection
- Columns must contain comparable data types
- Hidden columns and case ID columns cannot be selected
- Preserves all case and event attributes for matched cases
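Because the filter works at the case level, the selection applies to whole cases: every event of a matched case is kept with its attributes, and unmatched cases are dropped entirely. A minimal sketch, assuming a hypothetical case table and event table linked by a "Case ID" column:

```python
from collections import Counter
from typing import Any, Dict

# Hypothetical case table (case attributes) and event table (one row per event).
cases: Dict[str, Dict[str, Any]] = {
    "1": {"Vendor": "Acme", "Amount": 100},
    "2": {"Vendor": "Acme", "Amount": 100},
    "3": {"Vendor": "Beta", "Amount": 40},
}
events = [
    {"Case ID": "1", "Activity": "Create Invoice", "User": "ana"},
    {"Case ID": "1", "Activity": "Pay Invoice", "User": "raj"},
    {"Case ID": "2", "Activity": "Create Invoice", "User": "ana"},
    {"Case ID": "3", "Activity": "Create Invoice", "User": "lee"},
]
columns = ["Vendor", "Amount"]


def key(attrs: Dict[str, Any]) -> tuple:
    """Values of the selected columns for one case."""
    return tuple(attrs[col] for col in columns)


counts = Counter(key(attrs) for attrs in cases.values())
selected = {case_id for case_id, attrs in cases.items() if counts[key(attrs)] >= 2}
# Case-level filtering: keep every event of a selected case unchanged,
# with all of its attributes, and drop whole cases that were not selected.
filtered_events = [event for event in events if event["Case ID"] in selected]
print([e["Activity"] for e in filtered_events])
```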
Use the Duplicate Cases filter to identify potential data quality issues, detect duplicate submissions, or find cases that may represent the same business transaction entered multiple times.
This documentation is part of the mindzie Studio process mining platform.