Replace Text

Overview

The Replace Text enrichment is a powerful data transformation operator that performs find-and-replace operations on text attributes throughout your dataset. This enrichment enables systematic text substitution across case and event attributes, allowing you to standardize terminology, correct systematic errors, or transform data formats consistently. Whether you need to replace outdated product codes, standardize department names, or correct recurring typos in your process data, this enrichment provides a reliable and efficient solution for bulk text modifications.

Unlike manual find-and-replace operations, which risk missing occurrences or introducing inconsistencies, this enrichment processes every instance of the specified text in the selected attribute. It supports both case-sensitive and case-insensitive matching, giving you precise control over how text is matched. This flexibility is essential when data comes from multiple sources with differing capitalization conventions, such as different ERP systems or regional offices.

The Replace Text enrichment operates directly on your dataset's string attributes, modifying values in-place to maintain data relationships and integrity. This approach ensures that all downstream analyses, filters, and calculations automatically benefit from the standardized text values without requiring additional configuration or data mapping steps.

Common Uses

  • Standardize varying department or location names across different systems (e.g., replace "NY Office", "New York", "NYC" with a standard "New York Office")
  • Update obsolete product codes or SKUs after system migrations or rebranding initiatives
  • Correct systematic spelling errors or abbreviations in activity names for clearer process visualization
  • Replace sensitive information with anonymized values for compliance with data privacy regulations
  • Standardize date or time formats in text fields by replacing separators or formatting characters
  • Transform status codes or abbreviations into readable business terms for better reporting
  • Harmonize vendor or customer names that have multiple variations in the source data

Settings

Attribute Name: Select the text attribute where you want to perform the replacement operation. The dropdown displays all available string attributes from both case-level and event-level data. Only text (string) type attributes that are not hidden or calculated fields are available for selection. Choose the specific attribute containing the text values you need to modify.

Original Text: Enter the exact text string you want to find and replace within the selected attribute. This is the search pattern that will be matched in your data. The text must match exactly (considering the Ignore Case setting) for replacement to occur. Leave this field empty if you want to replace empty strings with a specific value. Common examples include outdated codes, misspellings, or inconsistent terminology.

New Text: Specify the replacement text that will substitute all occurrences of the Original Text. This can be any text value, including an empty string if you want to remove the original text entirely. The new text will replace every matched occurrence within the attribute values. Consider the impact on downstream processes and ensure the new text maintains data integrity and meaning.

Ignore Case: Enable this option to perform case-insensitive matching when searching for the Original Text. When checked, the enrichment will match text regardless of uppercase or lowercase differences (e.g., "approved", "Approved", and "APPROVED" would all be matched). When unchecked, only exact case matches will be replaced. This setting is particularly useful when dealing with inconsistent capitalization from manual data entry or different source systems.
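
The way the four settings interact can be modeled in a few lines of Python. This is an illustrative sketch only, not mindzie Studio's implementation: the function name `replace_text` and its parameters are hypothetical, and the handling of an empty Original Text (only values that are themselves empty are replaced) is an assumption based on the description above.

```python
import re
from typing import Optional


def replace_text(value: Optional[str], original: str, new: str,
                 ignore_case: bool = False) -> Optional[str]:
    """Illustrative model of the Replace Text settings (hypothetical helper)."""
    if value is None:
        # Assumption: null values are left untouched.
        return None
    if original == "":
        # Assumption: an empty Original Text targets empty values only.
        return new if value == "" else value
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape treats Original Text as literal text, not as a regex.
    # A function replacement avoids interpreting backslashes in New Text.
    return re.sub(re.escape(original), lambda _m: new, value, flags=flags)


# Ignore Case checked: the match is replaced, surrounding text keeps its casing.
print(replace_text("APPROVED by manager", "approved", "Accepted", ignore_case=True))
# Ignore Case unchecked: no exact-case match, so the value is unchanged.
print(replace_text("APPROVED by manager", "approved", "Accepted"))
```

In this model the first call yields "Accepted by manager" and the second leaves the value as "APPROVED by manager".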

Examples

Example 1: Standardizing Department Names in Purchase Orders

Scenario: A multinational corporation needs to standardize department names in their purchase order system where "Information Technology", "IT Dept", "I.T.", and "InfoTech" all refer to the same department, causing fragmented spend analysis and approval routing issues.

Settings:

  • Attribute Name: Department
  • Original Text: IT Dept
  • New Text: Information Technology
  • Ignore Case: Checked

Output: The enrichment replaces all occurrences of "IT Dept" (and variations like "it dept", "It Dept") with "Information Technology" in the Department attribute. After running multiple passes with different original text values ("I.T.", "InfoTech", etc.), all department references are standardized.

Before:

| Case ID | Department | Amount |
|---------|------------|--------|
| PO-001 | IT Dept | $5,000 |
| PO-002 | Information Technology | $3,000 |
| PO-003 | it dept | $2,500 |
| PO-004 | I.T. | $4,000 |

After (all passes):

| Case ID | Department | Amount |
|---------|------------|--------|
| PO-001 | Information Technology | $5,000 |
| PO-002 | Information Technology | $3,000 |
| PO-003 | Information Technology | $2,500 |
| PO-004 | Information Technology | $4,000 |

Insights: After standardization, the company discovered that Information Technology actually accounted for $14,500 in purchase orders rather than appearing as four separate departments with unclear spending patterns. This enabled proper budget tracking and revealed opportunities for volume discounts with vendors.
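
The multi-pass approach in this example can be sketched in Python using the rows from the Before table. This is an illustration under assumptions rather than the enrichment's own code: the `run_pass` helper is hypothetical, and each tuple in `passes` stands for one configured enrichment with Ignore Case checked.

```python
import re

# Department values and amounts from the Before table above.
orders = [
    ("PO-001", "IT Dept", 5000),
    ("PO-002", "Information Technology", 3000),
    ("PO-003", "it dept", 2500),
    ("PO-004", "I.T.", 4000),
]

# One (Original Text, New Text) pair per enrichment pass, Ignore Case checked.
passes = [
    ("IT Dept", "Information Technology"),
    ("I.T.", "Information Technology"),
    ("InfoTech", "Information Technology"),
]


def run_pass(value, original, new):
    # Literal, case-insensitive replacement of every occurrence.
    return re.sub(re.escape(original), lambda _m: new, value, flags=re.IGNORECASE)


standardized = []
for case_id, department, amount in orders:
    for original, new in passes:
        department = run_pass(department, original, new)
    standardized.append((case_id, department, amount))

# Sum the amounts per department after standardization.
totals = {}
for _case_id, department, amount in standardized:
    totals[department] = totals.get(department, 0) + amount

print(totals)  # {'Information Technology': 14500}
```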

Example 2: Updating Product Codes After System Migration

Scenario: A retail company migrated to a new inventory system with updated product coding standards, requiring all old format codes (e.g., "PROD-") to be replaced with new format codes (e.g., "SKU-") across historical order data for accurate inventory reconciliation.

Settings:

  • Attribute Name: Product_Code
  • Original Text: PROD-
  • New Text: SKU-
  • Ignore Case: Unchecked

Output: All product codes beginning with "PROD-" are updated to begin with "SKU-", maintaining the numeric portions while updating the prefix to match the new system format.

Before:

| Case ID | Product_Code | Quantity | Order_Date |
|---------|--------------|----------|------------|
| ORD-501 | PROD-12345 | 10 | 2024-01-15 |
| ORD-502 | PROD-67890 | 5 | 2024-01-16 |
| ORD-503 | prod-12345 | 3 | 2024-01-16 |
| ORD-504 | PROD-54321 | 8 | 2024-01-17 |

After:

| Case ID | Product_Code | Quantity | Order_Date |
|---------|--------------|----------|------------|
| ORD-501 | SKU-12345 | 10 | 2024-01-15 |
| ORD-502 | SKU-67890 | 5 | 2024-01-16 |
| ORD-503 | prod-12345 | 3 | 2024-01-16 |
| ORD-504 | SKU-54321 | 8 | 2024-01-17 |

Insights: Note that "prod-12345" was not replaced because the search was case-sensitive. This helped identify 47 orders with incorrect lowercase product codes that required separate data quality investigation, revealing a specific data entry issue with one warehouse location.
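
A short Python sketch shows why the lowercase code survives a case-sensitive pass and how such leftovers could be listed afterwards. It uses the product codes from the Before table; the follow-up check is an illustrative assumption, not a feature of the enrichment.

```python
# Product codes from the Before table above.
codes = ["PROD-12345", "PROD-67890", "prod-12345", "PROD-54321"]

# Case-sensitive pass (Ignore Case unchecked): str.replace matches exact case only.
updated = [code.replace("PROD-", "SKU-") for code in codes]

# Follow-up check: values that still contain the old prefix in any casing.
leftovers = [code for code in updated if "prod-" in code.lower()]

print(updated)    # ['SKU-12345', 'SKU-67890', 'prod-12345', 'SKU-54321']
print(leftovers)  # ['prod-12345']
```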

Example 3: Anonymizing Patient Names for Compliance

Scenario: A healthcare provider needs to anonymize patient names in their appointment scheduling process data for research purposes while maintaining the ability to distinguish between different patients.

Settings:

  • Attribute Name: Patient_Name
  • Original Text: Smith, John
  • New Text: Patient_001
  • Ignore Case: Unchecked

Output: Specific patient names are replaced with anonymized identifiers, allowing process analysis while supporting patient privacy requirements such as HIPAA.

Before:

| Case ID | Patient_Name | Appointment_Type | Department |
|---------|--------------|------------------|------------|
| APT-101 | Smith, John | Initial Consultation | Cardiology |
| APT-102 | Jones, Mary | Follow-up | Orthopedics |
| APT-103 | Smith, John | Test Results | Cardiology |
| APT-104 | Brown, David | Emergency | Emergency |

After (first replacement):

| Case ID | Patient_Name | Appointment_Type | Department |
|---------|--------------|------------------|------------|
| APT-101 | Patient_001 | Initial Consultation | Cardiology |
| APT-102 | Jones, Mary | Follow-up | Orthopedics |
| APT-103 | Patient_001 | Test Results | Cardiology |
| APT-104 | Brown, David | Emergency | Emergency |

Insights: The anonymization process preserved the relationship between appointments for the same patient while removing personally identifiable information. Process mining revealed that patients with initial cardiology consultations had a 73% rate of follow-up appointments within 30 days.
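
Because each pass replaces one name, anonymizing every patient requires one Original Text / New Text pair per distinct name. The Python sketch below shows one way such pairs might be prepared outside the tool. Only the Patient_001 pair comes from the settings above; the other identifiers and the first-seen numbering scheme are assumptions.

```python
# Patient names in the order they appear in the Before table above.
names = ["Smith, John", "Jones, Mary", "Smith, John", "Brown, David"]

# Assign one stable identifier per distinct name, in first-seen order.
pairs = {}
for name in names:
    if name not in pairs:
        pairs[name] = f"Patient_{len(pairs) + 1:03d}"

# Each (Original Text, New Text) pair corresponds to one enrichment pass.
for original, new in pairs.items():
    print(f"Original Text: {original!r} -> New Text: {new!r}")

# Result of applying every pair to the column.
anonymized = [pairs[name] for name in names]
print(anonymized)  # ['Patient_001', 'Patient_002', 'Patient_001', 'Patient_003']
```

Since replacement applies to every matched occurrence within a value, a name contained in a longer name (a hypothetical "Smith, Johnny") would also be partly rewritten, so longer names may need to be handled first.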

Example 4: Correcting Activity Name Typos in Manufacturing

Scenario: A manufacturing plant's MES has inconsistent activity naming where operators sometimes type "Quaility Check" instead of "Quality Check", causing process conformance checking to incorrectly flag deviations.

Settings:

  • Attribute Name: Activity
  • Original Text: Quaility Check
  • New Text: Quality Check
  • Ignore Case: Checked

Output: All misspelled instances of quality check activities are corrected, regardless of capitalization variations, ensuring accurate process discovery and conformance analysis.

Event Data Before:

| Case ID | Activity | Timestamp | Resource |
|---------|----------|-----------|----------|
| WO-801 | Material Receipt | 2024-02-01 08:00 | Warehouse |
| WO-801 | Quaility Check | 2024-02-01 09:15 | QC Team |
| WO-801 | Assembly Start | 2024-02-01 10:00 | Line 1 |
| WO-802 | Material Receipt | 2024-02-01 08:30 | Warehouse |
| WO-802 | QUAILITY CHECK | 2024-02-01 09:45 | QC Team |

Event Data After:

| Case ID | Activity | Timestamp | Resource |
|---------|----------|-----------|----------|
| WO-801 | Material Receipt | 2024-02-01 08:00 | Warehouse |
| WO-801 | Quality Check | 2024-02-01 09:15 | QC Team |
| WO-801 | Assembly Start | 2024-02-01 10:00 | Line 1 |
| WO-802 | Material Receipt | 2024-02-01 08:30 | Warehouse |
| WO-802 | Quality Check | 2024-02-01 09:45 | QC Team |

Insights: After correction, conformance checking showed that 98% of work orders properly followed the standard process with quality checks, rather than the 67% shown before the correction. This revealed that the perceived process compliance issue was actually a data quality problem.

Example 5: Standardizing Status Codes Across Systems

Scenario: A logistics company integrates shipment data from three different carrier systems, each using different codes for delivery status ("DLVRD", "Delivered", "COMPLETE"), requiring standardization for unified tracking dashboards.

Settings:

  • Attribute Name: Delivery_Status
  • Original Text: DLVRD
  • New Text: Delivered
  • Ignore Case: Unchecked

Output: Carrier-specific status codes are replaced with standardized business terms, enabling consistent status reporting across all shipment sources.

Before:

| Case ID | Carrier | Delivery_Status | Delivery_Date |
|---------|---------|-----------------|---------------|
| SHP-901 | CarrierA | DLVRD | 2024-03-01 |
| SHP-902 | CarrierB | Delivered | 2024-03-01 |
| SHP-903 | CarrierC | COMPLETE | 2024-03-01 |
| SHP-904 | CarrierA | DLVRD | 2024-03-02 |

After (first replacement):

| Case ID | Carrier | Delivery_Status | Delivery_Date |
|---------|---------|-----------------|---------------|
| SHP-901 | CarrierA | Delivered | 2024-03-01 |
| SHP-902 | CarrierB | Delivered | 2024-03-01 |
| SHP-903 | CarrierC | COMPLETE | 2024-03-01 |
| SHP-904 | CarrierA | Delivered | 2024-03-02 |

Insights: After running additional replacements for "COMPLETE" and other variations, the logistics team could accurately report that 94% of shipments were delivered on time, compared to fragmented reporting by carrier system that obscured overall performance metrics.

Output

The Replace Text enrichment modifies the selected attribute values directly within your dataset, performing in-place replacement of the specified text patterns. The enrichment maintains the original attribute structure and data type while updating only the text content that matches your search criteria.

For case attributes, the replacement occurs once per case, affecting the attribute value associated with each case. For event attributes, the replacement processes every event in your dataset, potentially updating multiple occurrences within the same case. The enrichment preserves null values and only processes non-null string values within the selected attribute.
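
The event-level behavior described above (every event is processed, nulls are preserved, values are updated in place) can be modeled with a small Python sketch. The rows and the `replace_in_events` helper are hypothetical and only illustrate the described semantics.

```python
# Hypothetical event rows; None models a null attribute value.
events = [
    {"case": "WO-801", "activity": "Quaility Check"},
    {"case": "WO-801", "activity": None},
    {"case": "WO-801", "activity": "Quaility Check"},
    {"case": "WO-802", "activity": "Assembly Start"},
]


def replace_in_events(rows, attribute, original, new):
    """Apply a case-sensitive replacement to every event, skipping nulls."""
    changed = 0
    for row in rows:
        value = row[attribute]
        if value is None:
            continue  # null values are preserved as-is
        replaced = value.replace(original, new)
        if replaced != value:
            row[attribute] = replaced  # in-place update, no new attribute
            changed += 1
    return changed


count = replace_in_events(events, "activity", "Quaility Check", "Quality Check")
print(count)  # 2
```

Both matching events in case WO-801 are updated, while the null value and the non-matching activity are left unchanged.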

After execution, the modified attribute retains its original name and position in your dataset but contains the updated text values. These changes immediately affect all dependent calculations, filters, and visualizations that reference the modified attribute. The enrichment does not create new attributes or backup columns - it directly transforms the existing data based on your specifications.

The replacement operation is case-sensitive by default but can be configured for case-insensitive matching using the Ignore Case setting. When performing case-insensitive replacements, the original casing of non-matched portions of the text is preserved, while the matched portion is replaced entirely with the New Text value as specified.

See Also

  • Trim Text - Remove leading and trailing whitespace from text attributes
  • Text Start - Extract a specified number of characters from the beginning of text values
  • Text End - Extract a specified number of characters from the end of text values
  • Group Attribute Values - Combine multiple attribute values into standardized categories
  • Categorize Attribute Values - Create categories based on attribute value ranges or patterns
  • Concatenate Text Attributes - Combine multiple text attributes into a single field

This documentation is part of the mindzie Studio process mining platform.
