Find Duplicate Invoices

Overview

The Find Duplicate Invoices enrichment detects potential duplicate invoices in your accounts payable process by comparing invoice attributes such as vendor, invoice number, amount, and date. It identifies exact matches and near-matches that may indicate duplicate payments, data entry errors, or intentional fraud.

The enrichment matches invoices on exact vendor and invoice number and, optionally, uses Levenshtein distance to detect similar invoice numbers that might indicate typos or intentional variations. Cases are grouped by their match patterns, and each case is annotated with a match type together with its group's count and total value for analysis.

Note: This enrichment is available only to administrators due to its specialized use in accounts payable audit and fraud detection scenarios.

Common Uses

  • Detect duplicate invoice submissions from vendors
  • Identify potential double payment risks
  • Find data entry errors where invoice numbers were mistyped
  • Discover fraudulent duplicate invoices with slight variations
  • Audit accounts payable processes for payment integrity
  • Prepare for financial audits by identifying potential duplicates
  • Clean up invoice data quality issues

Settings

Invoice Number Column Name: Select the column containing the invoice number. This is the primary identifier used for matching, combined with the vendor name.

Vendor Column Name: Select the column containing the vendor or supplier name. Invoices are grouped by vendor before comparing invoice numbers to find duplicates.

Invoice Amount Column Name: Select the column containing the invoice amount. It is used to distinguish match types (Exact Match vs. Amount Changed) and to calculate the total group value.

Invoice Date Column Name: Select the column containing the invoice date. Used to identify match types where the date may have changed between duplicate submissions.

Due Date Column Name (Optional): Select the column containing the payment due date, if available. Helps identify patterns where due dates were changed on resubmitted invoices.

Use Similar Invoice Numbers: When enabled, the enrichment uses Levenshtein distance calculation to find invoices with similar (but not identical) invoice numbers. This catches typos and intentional variations where invoice numbers differ by 1-2 characters. Invoices with the same vendor and amount but similar invoice numbers are flagged as potential duplicates.
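To illustrate what a Levenshtein comparison of invoice numbers involves, here is a minimal Python sketch. It is not the enrichment's actual implementation: the function names and the max_distance default of 2 (taken from the "1-2 characters" wording above) are assumptions, and the sketch does not normalize case or whitespace, which the enrichment may or may not do.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def is_similar_invoice_number(a: str, b: str, max_distance: int = 2) -> bool:
    """Hypothetical helper: True when the numbers are different
    but within max_distance edits of each other."""
    return a != b and levenshtein(a, b) <= max_distance


print(levenshtein("INV-2024-001", "INV-2024-0O1"))  # 1
print(levenshtein("INV-2024-001", "INV-2024-OO1"))  # 2
```

Identical invoice numbers are deliberately excluded by the helper, since those are handled by the exact vendor and invoice number match.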

Filter List: Optionally apply filters to limit which cases are analyzed. This allows you to focus duplicate detection on specific vendor categories, time periods, or other criteria.

Example

Detecting Exact Duplicate Invoices

Scenario: An accounts payable department wants to identify invoices submitted multiple times with identical vendor and invoice number.

Settings:

  • Invoice Number Column Name: "Invoice Number"
  • Vendor Column Name: "Vendor Name"
  • Invoice Amount Column Name: "Invoice Amount"
  • Invoice Date Column Name: "Invoice Date"
  • Use Similar Invoice Numbers: No

Before (Case Attributes):

| Case ID | Vendor Name | Invoice Number | Invoice Amount | Invoice Date |
|---------|-------------|----------------|----------------|--------------|
| INV-001 | Acme Corp | A12345 | $5,000 | 2024-01-10 |
| INV-002 | Acme Corp | A12345 | $5,000 | 2024-01-10 |
| INV-003 | Acme Corp | A12345 | $5,200 | 2024-01-15 |
| INV-004 | Beta Inc | B9876 | $3,000 | 2024-01-12 |

After (New Attributes Added):

| Case ID | Duplicate Group | Group Count | Group Value | Match Type |
|---------|-----------------|-------------|-------------|------------|
| INV-001 | Acme Corp_A12345 | 3 | $15,200 | Exact Match |
| INV-002 | Acme Corp_A12345 | 3 | $15,200 | Exact Match |
| INV-003 | Acme Corp_A12345 | 3 | $15,200 | Amount Changed |
| INV-004 | (none) | (none) | (none) | (none) |

Insights: Three invoices from Acme Corp share the same invoice number A12345. Two are exact duplicates (same amount and date), while the third has a different amount and a later invoice date, indicating either a correction or a fraudulent duplicate. Beta Inc's invoice has no match, so it receives no duplicate attributes.
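The grouping in this example can be reproduced with a short Python sketch. It only illustrates the idea of keying cases by vendor and invoice number. The record layout, field names, and the group_duplicates function are hypothetical, and the enrichment's internal logic may differ.

```python
from collections import defaultdict

# Sample cases from the example above (amounts in whole dollars).
cases = [
    {"case_id": "INV-001", "vendor": "Acme Corp", "invoice_number": "A12345", "amount": 5000},
    {"case_id": "INV-002", "vendor": "Acme Corp", "invoice_number": "A12345", "amount": 5000},
    {"case_id": "INV-003", "vendor": "Acme Corp", "invoice_number": "A12345", "amount": 5200},
    {"case_id": "INV-004", "vendor": "Beta Inc", "invoice_number": "B9876", "amount": 3000},
]


def group_duplicates(cases):
    """Return {case_id: group summary} for cases that share a
    (vendor, invoice number) key with at least one other case."""
    groups = defaultdict(list)
    for case in cases:
        groups[(case["vendor"], case["invoice_number"])].append(case)

    result = {}
    for (vendor, number), members in groups.items():
        if len(members) < 2:
            continue  # a single case is not a duplicate group
        summary = {
            "duplicate_group": f"{vendor}_{number}",
            "group_count": len(members),
            "group_value": sum(m["amount"] for m in members),
        }
        for member in members:
            result[member["case_id"]] = summary
    return result


duplicates = group_duplicates(cases)
print(duplicates["INV-001"])
# {'duplicate_group': 'Acme Corp_A12345', 'group_count': 3, 'group_value': 15200}
print("INV-004" in duplicates)  # False
```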

Finding Invoices with Similar Numbers

Scenario: You suspect some duplicate invoices may have been submitted with slight typos in the invoice number to evade detection.

Settings:

  • Invoice Number Column Name: "Invoice Number"
  • Vendor Column Name: "Vendor Name"
  • Invoice Amount Column Name: "Invoice Amount"
  • Invoice Date Column Name: "Invoice Date"
  • Use Similar Invoice Numbers: Yes

Example Detection:

| Case ID | Vendor Name | Invoice Number | Invoice Amount | Match Type |
|---------|-------------|----------------|----------------|------------|
| INV-010 | Acme Corp | INV-2024-001 | $10,000 | Similar Invoice Number |
| INV-011 | Acme Corp | INV-2024-0O1 | $10,000 | Similar Invoice Number |
| INV-012 | Acme Corp | INV-2024-OO1 | $10,000 | Similar Invoice Number |

Insights: The enrichment detects that these three invoices have the same vendor, the same amount, and near-identical invoice numbers that differ only in the letter "O" substituted for the digit "0". This pattern is highly suspicious and warrants investigation.

Match Types

The enrichment categorizes duplicate groups by match type:

  • Exact Match: Vendor, invoice number, amount, and invoice date are all identical
  • Amount Changed: Same vendor and invoice number, but different invoice amount
  • Invoice Date Changed: Same vendor and invoice number, but different invoice date
  • Due Date Changed: Same vendor and invoice number, but different due date
  • Similar Invoice Number: Same vendor and amount, invoice numbers differ by 1-2 characters
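As a rough illustration of how the first four match types relate to each other, the Python sketch below classifies a pair of invoices that share a vendor and invoice number. The classify_pair function and the field names are hypothetical. The order of the checks (amount first, then invoice date, then due date) is an assumption inferred from the Acme Corp example, where a case with a changed amount and date is reported as Amount Changed. The Similar Invoice Number type is left out because it requires an edit-distance comparison.

```python
from typing import Optional


def classify_pair(a: dict, b: dict) -> Optional[str]:
    """Classify two invoice records with the same vendor and invoice number.

    Returns None when the pair does not share both keys.
    The precedence of the checks is an assumption, not documented behavior.
    """
    if a["vendor"] != b["vendor"] or a["invoice_number"] != b["invoice_number"]:
        return None
    if a["amount"] != b["amount"]:
        return "Amount Changed"
    if a["invoice_date"] != b["invoice_date"]:
        return "Invoice Date Changed"
    # Due date is optional, so compare it only when both records have one.
    if a.get("due_date") and b.get("due_date") and a["due_date"] != b["due_date"]:
        return "Due Date Changed"
    return "Exact Match"


inv_001 = {"vendor": "Acme Corp", "invoice_number": "A12345",
           "amount": 5000, "invoice_date": "2024-01-10"}
inv_002 = dict(inv_001)
inv_003 = {"vendor": "Acme Corp", "invoice_number": "A12345",
           "amount": 5200, "invoice_date": "2024-01-15"}

print(classify_pair(inv_001, inv_002))  # Exact Match
print(classify_pair(inv_001, inv_003))  # Amount Changed
```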

Output

The enrichment creates multiple case-level attributes:

Duplicate Group: A string combining vendor name and invoice number that identifies which duplicate group a case belongs to. Format: "VendorName_InvoiceNumber"

Group Count: The number of cases in the duplicate group. Cases with a Group Count greater than 1 are potential duplicates.

Group Value: The total invoice value across all cases in the duplicate group. Helps prioritize high-value duplicate groups for investigation.

Match Type: Indicates how the duplicates match (Exact Match, Amount Changed, Invoice Date Changed, Due Date Changed, or Similar Invoice Number).

Investigation Attributes: Additional columns for tracking investigation status:

  • Is Not A Duplicate (Boolean) - Mark false positives
  • Is Resolved (Boolean) - Mark investigated cases
  • Resolved By (String) - Who resolved the case
  • Resolved Time (DateTime) - When it was resolved
  • Outcome (String) - Investigation outcome
  • Comments (String) - Investigation notes

Best Practices

  • Run on a regular schedule (weekly or monthly) for ongoing monitoring
  • Focus initial investigation on high-value duplicate groups (sort by Group Value)
  • Review "Amount Changed" matches carefully, as they may indicate either corrected invoices or fraud
  • Investigate "Similar Invoice Number" matches more thoroughly, as they are more likely to be false positives
  • Use the Filter List to exclude vendor categories known to have legitimate duplicate patterns
  • Document investigation outcomes in the provided tracking columns

This documentation is part of the mindzie Studio process mining platform.