Anonymize

Overview

The Anonymize enrichment replaces sensitive text attribute values with anonymized placeholders while preserving the analytical value of your process data. It supports compliance with privacy regulations such as GDPR and HIPAA by replacing personally identifiable information (PII), confidential business data, and other sensitive text values with consistent anonymous identifiers. The data relationships and patterns needed for process analysis are kept while the sensitive content itself is removed, which makes datasets safer to share with external parties, use in demonstrations, or store in less secure environments.

The Anonymize enrichment works by grouping identical attribute values together and replacing each unique value with a standardized anonymous identifier in the format "AttributeName 0001", "AttributeName 0002", etc. This approach ensures that all instances of the same original value receive the same anonymous identifier, preserving data consistency and enabling meaningful process analysis without exposing sensitive information. The enrichment can operate on all text attributes automatically or target specific attributes based on your privacy requirements, providing flexible control over what data gets anonymized while leaving non-sensitive attributes intact for reference.
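The grouping-and-replacement idea described above can be sketched in a few lines of Python. This is an illustration only, not mindzieStudio's implementation: the function name `anonymize_values` is hypothetical, and the sketch assumes that numbers are assigned in order of first appearance, which the description does not specify.

```python
def anonymize_values(attribute_name, values):
    """Replace each distinct value with '<attribute_name> NNNN'.

    Assumption: numbers are handed out in order of first appearance.
    The documentation only says they are sequential, starting at 0001.
    """
    mapping = {}   # original value -> anonymous identifier
    result = []
    for value in values:
        if value not in mapping:
            mapping[value] = f"{attribute_name} {len(mapping) + 1:04d}"
        result.append(mapping[value])
    return result, mapping


names = ["John Smith", "Jane Doe", "John Smith"]
anonymized, mapping = anonymize_values("Customer_Name", names)
print(anonymized)
# ['Customer_Name 0001', 'Customer_Name 0002', 'Customer_Name 0001']
```

Because the mapping is keyed by the original value, repeated values always receive the same identifier, which is the property that keeps the data analyzable.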

Common Uses

  • Protect personally identifiable information (PII) such as customer names, employee IDs, email addresses, and social security numbers
  • Anonymize financial data including account numbers, credit card information, and transaction references before sharing with third parties
  • Prepare datasets for external consultants or vendors while maintaining data confidentiality
  • Create demonstration datasets from production data without exposing sensitive business information
  • Support GDPR compliance by anonymizing personal data in process mining projects
  • Protect patient information in healthcare process analysis while maintaining case relationships
  • Anonymize supplier and vendor names in procurement process analysis for competitive confidentiality

Settings

Attribute Names (Optional): Select specific text attributes to anonymize. When left empty, the enrichment automatically anonymizes all text attributes in both case and event tables, excluding system attributes like Case ID and Activity names. This selective approach allows you to anonymize only sensitive attributes while preserving non-sensitive reference data. The dropdown shows all available text attributes from your dataset. You can select multiple attributes by clicking on each one you want to anonymize. Only string/text type attributes are available for selection, as numeric and date attributes typically don't contain personally identifiable information and are essential for process analysis.
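To make the effect of an explicit selection concrete, the following sketch applies the replacement only to the named attributes in a small table of rows. The function `anonymize_rows`, the row layout, and the sample columns `Region` and `Amount` are assumptions made for illustration. Unlike the enrichment, which modifies values in place, this sketch returns new rows so the before and after can be compared.

```python
def anonymize_rows(rows, attribute_names):
    """Anonymize only the named attributes; leave every other column as-is."""
    mappings = {name: {} for name in attribute_names}  # one mapping per attribute
    output = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        for name in attribute_names:
            value = row.get(name)
            if not isinstance(value, str):
                continue  # only text values are replaced
            mapping = mappings[name]
            if value not in mapping:
                mapping[value] = f"{name} {len(mapping) + 1:04d}"
            new_row[name] = mapping[value]
        output.append(new_row)
    return output


cases = [
    {"Case ID": "C-1", "Customer_Name": "John Smith", "Region": "North", "Amount": 120.0},
    {"Case ID": "C-2", "Customer_Name": "Jane Doe", "Region": "North", "Amount": 75.5},
    {"Case ID": "C-3", "Customer_Name": "John Smith", "Region": "South", "Amount": 42.0},
]
shared = anonymize_rows(cases, ["Customer_Name"])
for row in shared:
    print(row)
```

Only `Customer_Name` changes here. `Region`, `Amount`, and `Case ID` pass through unchanged because they were not selected.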

Examples

Example 1: GDPR-Compliant Customer Service Process

Scenario: A telecommunications company needs to share their customer service process data with an external consulting firm for process optimization analysis, but must protect customer personal information to comply with GDPR regulations.

Settings:

  • Attribute Names: Customer_Name, Phone_Number, Email_Address, Account_Number, Address, Credit_Card_Last4

Output: The enrichment replaces sensitive customer data with anonymous identifiers:

  • Customer_Name: "John Smith" becomes "Customer_Name 0001"
  • Customer_Name: "Jane Doe" becomes "Customer_Name 0002"
  • Phone_Number: "+1-555-0123" becomes "Phone_Number 0001"
  • Email_Address: "john.smith@example.com" becomes "Email_Address 0001"
  • Account_Number: "ACC-789456123" becomes "Account_Number 0001"

All instances of "John Smith" across different cases are consistently replaced with "Customer_Name 0001", maintaining data relationships for analysis.

Insights: The consulting firm can analyze customer service patterns, identify bottlenecks, and recommend improvements without seeing the original customer details in the anonymized attributes, which supports GDPR compliance while still enabling meaningful process insights.

Example 2: Healthcare Patient Journey Analysis

Scenario: A hospital needs to analyze patient treatment pathways across departments but must protect patient health information (PHI) to comply with HIPAA regulations before the data can be used for research purposes.

Settings:

  • Attribute Names: Patient_Name, Medical_Record_Number, SSN, Insurance_ID, Physician_Name, Diagnosis_Description, Medication_Names

Output: Sensitive medical information is systematically anonymized:

  • Patient_Name: "Robert Johnson" becomes "Patient_Name 0001"
  • Medical_Record_Number: "MRN-2024-45678" becomes "Medical_Record_Number 0001"
  • SSN: "123-45-6789" becomes "SSN 0001"
  • Physician_Name: "Dr. Sarah Williams" becomes "Physician_Name 0001"
  • Diagnosis_Description: "Type 2 Diabetes" becomes "Diagnosis_Description 0001"

The same diagnosis appearing in multiple cases maintains the same anonymous identifier, allowing pattern analysis.

Insights: Researchers can study treatment patterns, analyze patient flow between departments, and identify care optimization opportunities without seeing the original patient identifiers, which supports HIPAA compliance.

Example 3: Financial Audit Process Anonymization

Scenario: An accounting firm needs to demonstrate their audit process methodology to potential clients using real audit data, but must protect sensitive financial account information and company names.

Settings:

  • Attribute Names: Company_Name, Account_Number, Bank_Name, Auditor_Name, Contact_Person, Tax_ID

Output: Financial and business identifiers are replaced with anonymous codes:

  • Company_Name: "Acme Corporation" becomes "Company_Name 0001"
  • Account_Number: "4532-1234-5678-9012" becomes "Account_Number 0001"
  • Bank_Name: "First National Bank" becomes "Bank_Name 0001"
  • Auditor_Name: "Michael Chen" becomes "Auditor_Name 0001"

All references to "Acme Corporation" across different audit steps receive the same identifier "Company_Name 0001".

Insights: The firm can showcase their audit process efficiency, demonstrate compliance checking procedures, and highlight their methodology without revealing any client confidential information.

Example 4: Supply Chain Data Sharing

Scenario: A manufacturing company wants to share supply chain process data with a logistics optimization vendor but needs to protect supplier identities and relationships from potential competitors.

Settings:

  • Attribute Names: Supplier_Name, Supplier_Contact, PO_Number, Part_Number, Supplier_Location

Output: Supplier and component information is anonymized while preserving relationships:

  • Supplier_Name: "TechParts Asia Ltd" becomes "Supplier_Name 0001"
  • Supplier_Contact: "Lisa Wang" becomes "Supplier_Contact 0001"
  • PO_Number: "PO-2024-789456" becomes "PO_Number 0001"
  • Part_Number: "CPU-X7-2024-ADV" becomes "Part_Number 0001"

The same supplier appearing in multiple purchase orders maintains consistent anonymization.

Insights: The logistics vendor can analyze supply chain patterns, identify delivery bottlenecks, and optimize routing without seeing supplier names, contacts, or part numbers. Numeric attributes such as prices are not changed by this enrichment.

Example 5: Employee Performance Review Process

Scenario: An HR consulting firm is helping optimize a performance review process and needs access to process data without seeing actual employee names, IDs, or other identifying text.

Settings:

  • Attribute Names: (Leave empty to anonymize all text attributes automatically)

Output: All text attributes are automatically anonymized:

  • Employee_Name: "Jennifer Brown" becomes "Employee_Name 0001"
  • Manager_Name: "David Lee" becomes "Manager_Name 0001"
  • Department: "Sales West" becomes "Department 0001"
  • Job_Title: "Senior Account Manager" becomes "Job_Title 0001"
  • Review_Comments: "Exceeds expectations" becomes "Review_Comments 0001"
  • Employee_ID: "EMP-45678" becomes "Employee_ID 0001"

Numeric attributes like Review_Score and Years_of_Service remain unchanged for analysis.

Insights: The consulting firm can analyze review cycle times, identify process inefficiencies, and recommend improvements while maintaining complete employee confidentiality and privacy.

Output

The Anonymize enrichment modifies existing text attribute values in place, replacing sensitive content with anonymous identifiers while preserving the attribute structure and data types. The anonymization follows a consistent pattern that maintains the data relationships needed for process mining analysis.

Anonymization Format: Each unique value within an attribute is replaced with the pattern "[AttributeName] [4-digit-number]", where the number is assigned sequentially starting from 0001. For example, the first unique value in the "Customer_Name" attribute becomes "Customer_Name 0001", the second unique value becomes "Customer_Name 0002", and so on.
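Because the format is regular, a simple pattern check can confirm whether a value already looks anonymized, for example when reviewing an export before sharing it. The helper `looks_anonymized` below is a hypothetical sketch, and it assumes exactly four digits; how the numbering behaves beyond 9999 unique values is not described here.

```python
import re


def looks_anonymized(attribute_name, value):
    """Return True if value matches '<attribute_name> NNNN' (four digits assumed)."""
    pattern = rf"{re.escape(attribute_name)} \d{{4}}"
    return re.fullmatch(pattern, value) is not None


print(looks_anonymized("Customer_Name", "Customer_Name 0001"))  # True
print(looks_anonymized("Customer_Name", "John Smith"))          # False
```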

Consistency Guarantee: The enrichment ensures that all instances of the same original value receive the same anonymous identifier across all cases and events. This consistency preservation is critical for maintaining data relationships and enabling meaningful process analysis. If "John Smith" appears in 100 different cases, all 100 instances will be replaced with the same identifier "Customer_Name 0001".

Scope of Anonymization: When no specific attributes are selected, the enrichment automatically anonymizes all text (string) attributes in both the case table and event table, with the following exceptions:

  • Case ID attributes are preserved to maintain case identity
  • Activity names are preserved to maintain process flow visibility
  • Calculated attributes are skipped because they do not contain sensitive source data
  • Hidden attributes are skipped
  • Non-text attributes (numbers, dates, booleans) remain unchanged
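The default scope rules listed above amount to a filter over attribute metadata. The sketch below models that filter; the `Attribute` class, its field names, and the sample attributes `Tenure_Band` and `Internal_Note` are hypothetical and do not reflect how mindzieStudio stores attribute information.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str
    data_type: str            # "text", "number", "date", or "boolean"
    is_case_id: bool = False
    is_activity: bool = False
    is_calculated: bool = False
    is_hidden: bool = False


def default_scope(attributes):
    """Names of attributes anonymized when no explicit selection is made."""
    return [
        a.name
        for a in attributes
        if a.data_type == "text"
        and not (a.is_case_id or a.is_activity or a.is_calculated or a.is_hidden)
    ]


attributes = [
    Attribute("Case ID", "text", is_case_id=True),
    Attribute("Activity", "text", is_activity=True),
    Attribute("Employee_Name", "text"),
    Attribute("Review_Score", "number"),
    Attribute("Tenure_Band", "text", is_calculated=True),
    Attribute("Internal_Note", "text", is_hidden=True),
    Attribute("Department", "text"),
]
print(default_scope(attributes))
# ['Employee_Name', 'Department']
```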

Irreversibility: The anonymization process is irreversible within mindzieStudio. Once applied, the original values cannot be recovered from the anonymized dataset. Always maintain a backup of your original data before applying anonymization if you need to preserve the original values for other purposes.

Performance Considerations: The enrichment groups all unique values for each attribute before applying anonymization, ensuring efficient processing even for large datasets. The sequential numbering approach maintains a predictable and readable format while ensuring uniqueness.

Integration with Other Features: Anonymized attributes retain their original data type and can be used in all mindzieStudio features including filters, process maps, and other enrichments. The anonymous identifiers can be used in group-by operations, conformance checking, and performance analysis just like the original values. The consistent replacement ensures that process patterns, frequencies, and relationships remain analyzable after anonymization.
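A quick way to see why frequencies and group-by results survive anonymization is to compare value counts before and after a consistent replacement. The following is a minimal sketch under the same assumptions as the format description (sequential four-digit numbers, assigned here in order of first appearance); the company names other than "Acme Corporation" are made up for the example.

```python
from collections import Counter

original = ["Acme Corporation", "Globex", "Acme Corporation", "Initech", "Acme Corporation"]

# Consistent one-to-one replacement: each distinct value gets one identifier.
mapping = {}
anonymized = []
for value in original:
    if value not in mapping:
        mapping[value] = f"Company_Name {len(mapping) + 1:04d}"
    anonymized.append(mapping[value])

# The labels change, but the distribution of counts is identical.
counts_before = sorted(Counter(original).values(), reverse=True)
counts_after = sorted(Counter(anonymized).values(), reverse=True)
print(counts_before)  # [3, 1, 1]
print(counts_after)   # [3, 1, 1]
```

Since the replacement is one-to-one within an attribute, any analysis that depends only on which rows share a value, rather than on the value's text, gives the same result on the anonymized data.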

See Also

  • Hide Attribute - Completely hide sensitive attributes from view without modifying data
  • Hide Blank Attributes - Remove attributes with no values from the dataset
  • Group Attribute Values - Combine similar attribute values into categories
  • Categorize Attribute Values - Create meaningful categories from attribute ranges
  • Trim Text - Clean up text attributes by removing leading/trailing spaces
  • Text Start - Extract the beginning portion of text attributes
  • Text End - Extract the ending portion of text attributes

This documentation is part of the mindzie Studio process mining platform.
