Duplicate Cases Filter

Overview

The Duplicate Cases filter identifies and selects cases that share identical values across multiple specified columns. This advanced case-level filter helps detect potential duplicate transactions, repeated submissions, or data quality issues where the same business entity appears multiple times in your process data.

Common Uses

Identify duplicate invoice submissions in accounts payable
Find repeated customer orders with identical details
Detect potential fraud through duplicate transaction patterns
Discover data migration issues with replicated records
Identify cases that should have been consolidated
Analyze patterns in recurring submissions or requests

Settings

Column Names: Select 2 to 5 columns to use for duplicate detection. Cases with identical values across ALL selected columns are considered duplicates. Only columns with a comparable data type can be selected; the full list appears under Supported Column Types.

How It Works:

Groups cases by the values in all selected columns
Identifies groups containing 2 or more cases
Returns all cases that belong to any duplicate group
Results are ordered by group size (largest duplicate groups first)
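
The steps above can be sketched in a few lines of Python. This is only an illustration of the grouping logic, not the product's implementation: the function name find_duplicate_cases, the list-of-dicts case representation, and the case_id field are all assumptions made for the example.

```python
from collections import defaultdict


def find_duplicate_cases(cases, column_names):
    """Return the cases that share identical values across all column_names.

    `cases` is a list of dicts, one per case (a hypothetical representation).
    """
    # Step 1: group cases by the tuple of values in the selected columns.
    groups = defaultdict(list)
    for case in cases:
        key = tuple(case[name] for name in column_names)
        groups[key].append(case)

    # Step 2: keep only groups that contain 2 or more cases.
    duplicate_groups = [group for group in groups.values() if len(group) >= 2]

    # Step 4: largest groups first (the sort is stable, so ties keep
    # their first-seen order).
    duplicate_groups.sort(key=len, reverse=True)

    # Step 3: return every case that belongs to a duplicate group.
    return [case for group in duplicate_groups for case in group]


cases = [
    {"case_id": "A", "Vendor": "Acme Corp", "Invoice Amount": 10000},
    {"case_id": "B", "Vendor": "Beta LLC", "Invoice Amount": 5500},
    {"case_id": "C", "Vendor": "Acme Corp", "Invoice Amount": 10000},
    {"case_id": "D", "Vendor": "Gamma Inc", "Invoice Amount": 750},
    {"case_id": "E", "Vendor": "Beta LLC", "Invoice Amount": 5500},
    {"case_id": "F", "Vendor": "Acme Corp", "Invoice Amount": 10000},
]

result = find_duplicate_cases(cases, ["Vendor", "Invoice Amount"])
duplicate_ids = [case["case_id"] for case in result]
print(duplicate_ids)
```

In this sample data, case D has a unique combination and is left out, while the three Acme Corp cases come before the two Beta LLC cases because their group is larger.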

Supported Column Types: String, Int32, Int64, Double, Single, DateTime, TimeSpan
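
The column count and type rules can be expressed as a simple pre-check. The sketch below is hypothetical: validate_selection and the name-to-type dictionary are invented for illustration, and only the 2 to 5 limit and the type names come from this page.

```python
# Type names as listed under Supported Column Types.
SUPPORTED_TYPES = {
    "String", "Int32", "Int64", "Double", "Single", "DateTime", "TimeSpan",
}
MIN_COLUMNS, MAX_COLUMNS = 2, 5


def validate_selection(column_names, column_types):
    """Return a list of problems with a column selection (empty if valid).

    `column_types` maps column name -> type name (hypothetical structure).
    """
    errors = []
    if not MIN_COLUMNS <= len(column_names) <= MAX_COLUMNS:
        errors.append(
            f"Select {MIN_COLUMNS} to {MAX_COLUMNS} columns, "
            f"got {len(column_names)}."
        )
    for name in column_names:
        type_name = column_types.get(name)
        if type_name not in SUPPORTED_TYPES:
            errors.append(f"Column '{name}' has unsupported type {type_name}.")
    return errors


column_types = {
    "Vendor": "String",
    "Invoice Amount": "Double",
    "Invoice Date": "DateTime",
    "Is Approved": "Boolean",
}

ok = validate_selection(["Vendor", "Invoice Amount", "Invoice Date"], column_types)
too_few = validate_selection(["Vendor"], column_types)
bad_type = validate_selection(["Vendor", "Is Approved"], column_types)
```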

Examples

Example 1: Duplicate Invoice Detection

Scenario: You want to find potentially duplicate invoices in your accounts payable process by matching on vendor, amount, and invoice date.

Settings:

Column Names: ["Vendor", "Invoice Amount", "Invoice Date"]

Result:

Cases where all three values match are grouped together:

Group 1: 5 invoices from "Acme Corp" for $10,000 dated 2024-01-15
Group 2: 3 invoices from "Beta LLC" for $5,500 dated 2024-02-01

Single invoices with unique combinations are excluded.

Insights: Multiple invoices with identical vendor, amount, and date often indicate duplicate submissions that may result in duplicate payments. Such invoices should be investigated and, if confirmed as duplicates, blocked from payment.

Example 2: Customer Order Duplicates

Scenario: Your order management process may contain duplicate orders when customers submit the same order more than once. You want to find orders with matching customer, product, and quantity.

Settings:

Column Names: ["Customer ID", "Product Code", "Order Quantity"]

Result:

Orders with identical customer, product, and quantity are flagged. This catches scenarios where a customer accidentally submitted the same order multiple times within a short period.

Insights: Duplicate orders increase fulfillment costs, create inventory issues, and lead to customer dissatisfaction when they receive unwanted duplicates.

Example 3: Transaction Pattern Analysis

Scenario: You're investigating potential fraud by looking for transactions with matching source accounts, amounts, and transaction hours.

Settings:

Column Names: ["Source Account", "Amount", "Transaction Hour"]

Result:

Transactions from the same account, with the same amount, during the same hour are grouped. This pattern might indicate automated fraud or system errors creating duplicate transactions.

Insights: Legitimate transactions rarely have identical characteristics across multiple fields. High duplicate rates warrant deeper investigation of specific accounts or time periods.
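
Because the filter matches on identical values, a raw timestamp column would only group transactions recorded at exactly the same instant. The "Transaction Hour" column in this example is therefore assumed to be a derived attribute. The sketch below shows one way such a value could be computed before the data is loaded; the helper name to_transaction_hour is hypothetical and nothing here describes how the platform itself derives columns.

```python
from datetime import datetime


def to_transaction_hour(timestamp):
    """Truncate a timestamp to the start of its hour."""
    return timestamp.replace(minute=0, second=0, microsecond=0)


first = datetime(2024, 3, 5, 14, 7, 31)
second = datetime(2024, 3, 5, 14, 52, 2)
third = datetime(2024, 3, 5, 15, 1, 0)

# Transactions within the same clock hour map to the same value...
same_hour = to_transaction_hour(first) == to_transaction_hour(second)
# ...while a transaction in the next hour does not.
next_hour = to_transaction_hour(first) == to_transaction_hour(third)
```

Note that fixed hour buckets separate two transactions made a minute apart on either side of the hour boundary, so a coarser or overlapping bucket may suit some investigations better.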

Example 4: Data Migration Verification

Scenario: After migrating data from a legacy system, you want to verify that records weren't duplicated during the migration process.

Settings:

Column Names: ["Legacy ID", "Creation Date"]

Result:

Records with the same legacy identifier and creation date are flagged as potential migration duplicates. Ideally, this should return no results if the migration was clean.

Insights: Migration duplicates can cause reporting inaccuracies, compliance issues, and operational confusion. Identifying them allows for data cleanup before they cause downstream problems.

Example 5: Multiple Column Matching

Scenario: You want to find purchase orders that might be duplicates based on comprehensive matching: same vendor, same amount, same department, and same requested date.

Settings:

Column Names: ["Vendor Name", "PO Amount", "Department", "Requested Date"]

Result:

Purchase orders matching on all four dimensions are identified. This strict matching reduces false positives while still catching true duplicates that slipped through procurement controls.

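Insights: Each additional column makes matching stricter: fewer cases are flagged and false positives drop, but duplicates that differ in any one selected column are missed. Start with fewer columns if you're exploring, then add more to reduce false positives.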
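
The effect of adding columns can be seen on a small, made-up data set. The helper count_duplicate_cases and the sample purchase orders below are illustrative assumptions, not product output.

```python
from collections import Counter


def count_duplicate_cases(cases, column_names):
    """Count cases whose values in column_names are shared by another case."""
    keys = [tuple(case[name] for name in column_names) for case in cases]
    group_sizes = Counter(keys)
    return sum(1 for key in keys if group_sizes[key] >= 2)


purchase_orders = [
    {"Vendor Name": "Acme Corp", "PO Amount": 1200.0,
     "Department": "IT", "Requested Date": "2024-03-01"},
    {"Vendor Name": "Acme Corp", "PO Amount": 1200.0,
     "Department": "IT", "Requested Date": "2024-03-01"},
    {"Vendor Name": "Acme Corp", "PO Amount": 1200.0,
     "Department": "Finance", "Requested Date": "2024-03-01"},
    {"Vendor Name": "Beta LLC", "PO Amount": 300.0,
     "Department": "IT", "Requested Date": "2024-03-04"},
]

# Two columns: all three Acme Corp orders look like duplicates.
loose = count_duplicate_cases(purchase_orders, ["Vendor Name", "PO Amount"])

# Four columns: the Finance order no longer matches the two IT orders.
strict = count_duplicate_cases(
    purchase_orders,
    ["Vendor Name", "PO Amount", "Department", "Requested Date"],
)
print(loose, strict)
```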

Output

This filter operates at the case level using multi-column grouping:

Groups cases by values across all specified columns
Returns only cases that appear in groups of 2 or more
Results ordered by duplicate group size (largest first)
Requires 2-5 columns for duplicate detection
Columns must contain comparable data types
Hidden columns and case ID columns are excluded
Preserves all case and event attributes for matched cases

Use the Duplicate Cases filter to identify potential data quality issues, detect duplicate submissions, or find cases that may represent the same business transaction entered multiple times.

This documentation is part of the mindzie Studio process mining platform.