Duplicate Cases

Overview

The Duplicate Cases calculator identifies cases that share identical values across a set of selected attributes. Use it as a data quality check to find duplicate entries, records created by system errors, and other data integrity issues in your process data.

Common Uses

  • Find cases that have been entered more than once
  • Identify cases duplicated due to system errors
  • Detect potential double-payment scenarios
  • Find duplicate orders or invoices
  • Validate data migration integrity

Settings

Column Names: Select the attributes used to identify duplicate cases. Cases with identical values for all of the selected attributes are grouped together and flagged as potential duplicates.

Max Rows: Specify the maximum number of rows to display in the output.
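The way these two settings combine can be sketched in Python. This is a minimal illustration, not mindzie Studio's implementation. It assumes each case is a dictionary of attribute values. The function name `duplicate_counts` and the choice to list the most frequent combinations first are hypothetical.

```python
from collections import Counter


def duplicate_counts(cases, column_names, max_rows):
    """Hypothetical sketch: count cases per unique combination of column_names."""
    # One key per case: the tuple of its values for the selected attributes.
    counts = Counter(tuple(case.get(col) for col in column_names) for case in cases)
    # Assumption: most frequent combinations come first, so potential duplicates
    # are not cut off by max_rows. Ties keep the order in which they were first seen.
    top = counts.most_common(max_rows)
    # One output row per combination; the last column holds the case count.
    return [dict(zip(column_names, key), Count=n) for key, n in top]


# Made-up sample cases, one dict per case.
cases = [
    {"Case ID": "C1", "Vendor Name": "Acme", "Invoice Amount": 500.0, "Invoice Date": "2024-03-01"},
    {"Case ID": "C2", "Vendor Name": "Acme", "Invoice Amount": 500.0, "Invoice Date": "2024-03-01"},
    {"Case ID": "C3", "Vendor Name": "Globex", "Invoice Amount": 120.0, "Invoice Date": "2024-03-02"},
]
columns = ["Vendor Name", "Invoice Amount", "Invoice Date"]

for row in duplicate_counts(cases, columns, max_rows=100):
    print(row)
```

Rows with a Count greater than 1 are the combinations worth investigating. Lowering `max_rows` only shortens the list and does not change the counts.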

Example

Identifying Potentially Duplicate Invoices

Scenario: You want to identify invoices that may have been entered multiple times with the same vendor, amount, and date.

Settings:

  • Column Names: Vendor Name, Invoice Amount, Invoice Date
  • Max Rows: 100

Output:

The calculator displays two view options:

  1. Duplicate Cases View (default):

    • Shows one row per unique combination of the selected attributes
    • The last column displays the count of cases matching that combination
    • Entries with a count greater than 1 are potential duplicates
  2. Expanded View (select from top-right dropdown):

    • Shows all individual cases grouped by matching attribute values
    • Displays additional attributes not specified in settings
    • Reveals that cases in the same group may differ in other attributes (e.g., different Invoice IDs despite matching amounts)
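The Expanded View can be approximated in Python by grouping complete case records instead of counting them. This is a sketch under assumptions, not the product's code. Cases are assumed to be dictionaries, and the function name `expanded_view` and the sample data are hypothetical. Because every attribute of each case is kept, columns outside the selected attributes (such as an Invoice ID) remain visible inside each group.

```python
from collections import defaultdict


def expanded_view(cases, column_names):
    """Hypothetical sketch: group full case records by the selected attributes."""
    groups = defaultdict(list)
    for case in cases:
        key = tuple(case.get(col) for col in column_names)
        groups[key].append(case)  # the whole record is kept, not just the key
    return dict(groups)


# Made-up sample cases: C1 and C2 match on the selected attributes
# but carry different Invoice IDs.
cases = [
    {"Case ID": "C1", "Invoice ID": "INV-1001", "Vendor Name": "Acme", "Invoice Amount": 500.0, "Invoice Date": "2024-03-01"},
    {"Case ID": "C2", "Invoice ID": "INV-1002", "Vendor Name": "Acme", "Invoice Amount": 500.0, "Invoice Date": "2024-03-01"},
    {"Case ID": "C3", "Invoice ID": "INV-1003", "Vendor Name": "Globex", "Invoice Amount": 120.0, "Invoice Date": "2024-03-02"},
]
columns = ["Vendor Name", "Invoice Amount", "Invoice Date"]

for key, members in expanded_view(cases, columns).items():
    if len(members) > 1:  # show only groups that contain potential duplicates
        print(key)
        for case in members:
            print("   ", case["Case ID"], case["Invoice ID"])
```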

Insights:

The expanded view is particularly useful because it shows that cases grouped as "duplicates" based on your selected attributes may actually be legitimate, separate cases with different values in other columns. For example:

  • Same vendor, amount, and date might be two different invoices (check Invoice ID)
  • Repeat payments may be legitimate, or may be the result of data entry errors
  • Records duplicated by the system may not represent actual business duplicates

This helps you distinguish between true duplicates requiring correction and similar cases that are legitimately separate.
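One way to follow up on the expanded view is to check a further attribute within each group. The Python sketch below uses a hypothetical heuristic that is not part of the calculator: if every case in a group also shares the same value in a check column (for example, Invoice ID), the group is labeled a likely true duplicate; if the values differ, the cases may be legitimately separate. The function name `classify_groups`, the labels, and the sample data are all assumptions for illustration.

```python
from collections import defaultdict


def classify_groups(cases, column_names, check_column):
    """Hypothetical heuristic: label each group of matching cases by check_column."""
    groups = defaultdict(list)
    for case in cases:
        groups[tuple(case.get(col) for col in column_names)].append(case)

    labels = {}
    for key, members in groups.items():
        if len(members) < 2:
            continue  # a single case cannot be a duplicate
        distinct = {case.get(check_column) for case in members}
        # Same Invoice ID everywhere suggests the same invoice entered twice;
        # differing IDs suggest separate invoices that happen to match.
        labels[key] = "likely true duplicate" if len(distinct) == 1 else "possibly separate cases"
    return labels


# Made-up sample cases.
cases = [
    {"Invoice ID": "INV-1001", "Vendor Name": "Acme", "Invoice Amount": 500.0, "Invoice Date": "2024-03-01"},
    {"Invoice ID": "INV-1002", "Vendor Name": "Acme", "Invoice Amount": 500.0, "Invoice Date": "2024-03-01"},
    {"Invoice ID": "INV-2001", "Vendor Name": "Globex", "Invoice Amount": 120.0, "Invoice Date": "2024-03-02"},
    {"Invoice ID": "INV-2001", "Vendor Name": "Globex", "Invoice Amount": 120.0, "Invoice Date": "2024-03-02"},
    {"Invoice ID": "INV-3001", "Vendor Name": "Initech", "Invoice Amount": 75.0, "Invoice Date": "2024-03-03"},
]
columns = ["Vendor Name", "Invoice Amount", "Invoice Date"]

for key, label in classify_groups(cases, columns, "Invoice ID").items():
    print(key, label)
```

A label from a heuristic like this is only a starting point; the final decision still depends on reviewing the cases themselves.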


This documentation is part of the mindzie Studio process mining platform.
