Text End

Overview

The Text End enrichment extracts a specified number of characters from the end of a text attribute's values and stores the extracted suffix in a new attribute. Use it to isolate the ending portion of text fields, which often carries identifying information, classification codes, or standardized suffixes that business systems append to their identifiers.

In process mining, the Text End enrichment is particularly valuable for structured codes and identifiers whose ending carries specific meaning. Many business systems use suffixes to denote categories, regions, product types, or status indicators: invoice numbers might end with country codes, product SKUs with category suffixes, and case IDs with department identifiers. Extracting these endings makes them available for analysis, filtering, and process variant detection. The enrichment works with both case attributes and event attributes.

Common Uses

  • Extract file extensions from document names to analyze document types in approval processes
  • Isolate country or region codes from the end of customer or supplier identifiers
  • Extract department or team suffixes from case IDs for organizational analysis
  • Retrieve product category codes from the end of SKU numbers for inventory analysis
  • Identify version numbers or revision codes from the end of document references
  • Extract status indicators or flags appended to transaction codes
  • Isolate year or period indicators from financial reference numbers

Settings

New Attribute Name: Specify the name of the new attribute that will store the extracted text. Choose a name that clearly indicates what is being extracted, for example "File_Extension" for file types, "Country_Code" for location identifiers, or "Category_Suffix" for classification codes. The name must be unique and must not match any existing attribute in your dataset.

Column Name: Select the text attribute from which to extract the ending characters. The dropdown lists all visible text (string) attributes at both the case and event level; hidden attributes and attributes of other data types are not shown. The enrichment automatically detects whether the selected attribute is a case or event attribute and creates the new attribute at the same level.

Length: Specify how many characters to extract from the end of the text value. This must be a positive integer (minimum value of 1). If the specified length exceeds the actual length of a text value, the entire value is returned: with a length of 3, a value that is only 2 characters long is extracted in full. Choose a length that matches the suffix you want; a larger length pulls in extra characters, such as the separator in front of a code (a length of 3 applied to "SUP-2024-0145-US" returns "-US").
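The Length rule can be modeled as a small Python function. This is an illustrative sketch of the documented behavior, not mindzie Studio's implementation; the function name `text_end` is hypothetical, and it returns None for null or empty input, as the Output section describes.

```python
from typing import Optional


def text_end(value: Optional[str], length: int) -> Optional[str]:
    """Return the last `length` characters of `value` (hypothetical helper).

    Mirrors the documented rules: length must be at least 1, a value
    shorter than `length` is returned whole, and null or empty input
    yields null (None).
    """
    if length < 1:
        # value[-0:] would return the whole string, so reject 0 and negatives explicitly.
        raise ValueError("length must be a positive integer")
    if not value:
        return None
    # A negative slice never raises; it simply stops at the start of the string.
    return value[-length:]


print(text_end("SUP-2024-0145-US", 2))  # US
print(text_end("AB", 3))                # AB
print(text_end("", 3))                  # None
```

Python's negative slicing already returns the whole string when the requested length is longer than the value, which matches the "no error for short values" rule without any extra branching.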

Examples

Example 1: Extracting File Extensions from Document Names

Scenario: In a document approval process, you need to analyze which document types are most commonly submitted and their processing times. Document names are stored with their file extensions, and you want to extract these extensions for categorization.

Settings:

  • New Attribute Name: Document_Type
  • Column Name: Document_Name
  • Length: 4

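Output: Creates a new attribute "Document_Type" containing the last 4 characters of each document name. Because the length is fixed, three-letter extensions keep their leading dot while four-letter extensions do not. For cases with document names: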

  • "Q3_Report_2024.pdf" → ".pdf"
  • "Contract_Amendment.docx" → "docx"
  • "Invoice_10245.xlsx" → "xlsx"
  • "Presentation.ppt" → ".ppt"
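The listed output can be reproduced with a plain fixed-length slice, which makes the mix of dotted (".pdf") and undotted ("docx") values easy to see. This is a Python sketch only; `text_end` is a hypothetical helper, not a mindzie Studio API.

```python
def text_end(value: str, length: int) -> str:
    # Hypothetical helper: last `length` characters, or the whole value if shorter.
    return value[-length:]


document_names = [
    "Q3_Report_2024.pdf",
    "Contract_Amendment.docx",
    "Invoice_10245.xlsx",
    "Presentation.ppt",
]

document_types = {name: text_end(name, 4) for name in document_names}
for name, doc_type in document_types.items():
    print(f"{name} -> {doc_type}")
# Three-letter extensions keep their dot (".pdf"), four-letter ones do not ("docx").
```

When the groups need to line up, one option outside the enrichment is to strip a leading dot from the extracted values (in Python, `doc_type.lstrip(".")`), so that ".pdf" becomes "pdf" and compares consistently with "docx".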

Insights: By extracting file extensions, you can analyze which document types require longer approval times, identify departments that work with specific file formats, and detect potential compliance issues with unauthorized file types.

Example 2: Isolating Country Codes from Supplier IDs

Scenario: In a global procurement process, supplier IDs end with two-letter country codes. You need to extract these codes to analyze procurement patterns by country and ensure compliance with regional sourcing policies.

Settings:

  • New Attribute Name: Supplier_Country
  • Column Name: Supplier_ID
  • Length: 2

Output: Creates a new case attribute "Supplier_Country" with the country code. For suppliers:

  • "SUP-2024-0145-US" → "US"
  • "SUP-2024-0892-DE" → "DE"
  • "SUP-2024-0234-CN" → "CN"
  • "SUP-2024-0567-BR" → "BR"
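Because the enrichment always takes a fixed number of characters, an ID that does not follow the expected pattern still yields two characters. A quick pattern check on the extracted values can flag such IDs before the country attribute is used for analysis. The Python sketch below uses a hypothetical `text_end` helper; the first four IDs come from the example, and the fifth is a hypothetical ID without a country suffix, added only for illustration.

```python
import re


def text_end(value: str, length: int) -> str:
    # Hypothetical helper: last `length` characters, or the whole value if shorter.
    return value[-length:]


supplier_ids = [
    "SUP-2024-0145-US",
    "SUP-2024-0892-DE",
    "SUP-2024-0234-CN",
    "SUP-2024-0567-BR",
    "SUP-2024-0999",  # hypothetical ID with no country suffix
]

COUNTRY_CODE = re.compile(r"[A-Z]{2}")  # exactly two uppercase letters

# Map each ID to (extracted code, whether it looks like a country code).
results = {}
for supplier_id in supplier_ids:
    code = text_end(supplier_id, 2)
    results[supplier_id] = (code, COUNTRY_CODE.fullmatch(code) is not None)

for supplier_id, (code, valid) in results.items():
    print(f"{supplier_id} -> {code} ({'ok' if valid else 'check source ID'})")
```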

Insights: This extraction enables geographic analysis of supplier distribution, calculation of regional procurement metrics, and verification of compliance with local sourcing requirements.

Example 3: Extracting Department Codes from Case IDs

Scenario: In a healthcare patient registration system, case IDs include a three-character department code at the end. You need to extract these codes to analyze patient flow across different departments and identify bottlenecks.

Settings:

  • New Attribute Name: Department_Code
  • Column Name: Case_ID
  • Length: 3

Output: Creates a new attribute "Department_Code" containing department identifiers. For case IDs:

  • "PAT-2024-10523-EMR" → "EMR" (Emergency)
  • "PAT-2024-10524-RAD" → "RAD" (Radiology)
  • "PAT-2024-10525-LAB" → "LAB" (Laboratory)
  • "PAT-2024-10526-SUR" → "SUR" (Surgery)
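The department names in parentheses are explanatory; the enrichment itself returns only the three-letter code. If readable labels are wanted, a lookup table can translate the codes afterwards. A minimal Python sketch, assuming a hypothetical `text_end` helper and a mapping that mirrors the labels in the example:

```python
def text_end(value: str, length: int) -> str:
    # Hypothetical helper: last `length` characters, or the whole value if shorter.
    return value[-length:]


# Labels taken from the example; the enrichment itself yields only the code.
DEPARTMENTS = {
    "EMR": "Emergency",
    "RAD": "Radiology",
    "LAB": "Laboratory",
    "SUR": "Surgery",
}

case_ids = [
    "PAT-2024-10523-EMR",
    "PAT-2024-10524-RAD",
    "PAT-2024-10525-LAB",
    "PAT-2024-10526-SUR",
]

department_codes = {case_id: text_end(case_id, 3) for case_id in case_ids}
# Codes missing from the table fall back to "Unknown" instead of raising KeyError.
department_names = {
    case_id: DEPARTMENTS.get(code, "Unknown")
    for case_id, code in department_codes.items()
}

for case_id in case_ids:
    print(case_id, department_codes[case_id], department_names[case_id])
```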

Insights: Extracting department codes enables analysis of patient routing patterns, identification of department-specific delays, and comparison of processing times across different medical units.

Example 4: Retrieving Product Categories from SKU Numbers

Scenario: In a retail inventory management process, product SKUs end with a two-character category code. You want to extract these codes to analyze inventory turnover by product category and optimize stock levels.

Settings:

  • New Attribute Name: Product_Category
  • Column Name: SKU_Number
  • Length: 2

Output: Creates a new attribute "Product_Category" with category codes. For SKUs:

  • "PROD-854621-EL" → "EL" (Electronics)
  • "PROD-854622-CL" → "CL" (Clothing)
  • "PROD-854623-FD" → "FD" (Food)
  • "PROD-854624-TY" → "TY" (Toys)

Insights: Category extraction allows for analysis of category-specific inventory patterns, identification of slow-moving product types, and optimization of reorder points by product category.

Example 5: Extracting Year Indicators from Financial References

Scenario: In an accounts payable process, invoice numbers end with a four-digit year. You need to extract the year to analyze payment patterns over time and identify aging invoices.

Settings:

  • New Attribute Name: Invoice_Year
  • Column Name: Invoice_Number
  • Length: 4

Output: Creates a new attribute "Invoice_Year" containing the year. For invoice numbers:

  • "INV-US-054321-2024" → "2024"
  • "INV-EU-098765-2023" → "2023"
  • "INV-AP-012345-2024" → "2024"
  • "INV-LA-067890-2022" → "2022"
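Because the new attribute is always of string data type (see Output), an extracted year such as "2024" is text, not a number. Four-digit years still sort correctly as text, but arithmetic such as an invoice's age in years needs an explicit conversion. A Python sketch with a hypothetical `text_end` helper and an assumed reference year of 2024:

```python
def text_end(value: str, length: int) -> str:
    # Hypothetical helper: last `length` characters, or the whole value if shorter.
    return value[-length:]


invoice_numbers = [
    "INV-US-054321-2024",
    "INV-EU-098765-2023",
    "INV-AP-012345-2024",
    "INV-LA-067890-2022",
]

REFERENCE_YEAR = 2024  # assumed "current" year for the age calculation

# Extracted years are strings, exactly as the enrichment would store them.
invoice_years = {inv: text_end(inv, 4) for inv in invoice_numbers}
# Convert to int only where arithmetic is needed.
invoice_ages = {inv: REFERENCE_YEAR - int(year) for inv, year in invoice_years.items()}

# Text sorting works for fixed-width four-digit years.
oldest_first = sorted(invoice_numbers, key=lambda inv: invoice_years[inv])
print(oldest_first[0], invoice_ages[oldest_first[0]])  # INV-LA-067890-2022 2
```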

Insights: Year extraction enables trend analysis of invoice processing times, identification of old unpaid invoices, and year-over-year comparison of payment performance metrics.

Output

The Text End enrichment creates a new attribute (either case or event level, matching the source attribute) containing the extracted text from the end of the original values. The new attribute is always of string data type, regardless of what the extracted content represents. The attribute is automatically added to the appropriate table (case or event) and becomes immediately available for use in filters, calculators, and other enrichments.

For case attributes, the extraction is performed once per case, with the result stored at the case level. For event attributes, the extraction is performed for each event, allowing you to analyze how suffixes might vary across different activities in your process. If the source value is null or empty, the new attribute will also be null for that case or event.
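The difference between case-level and event-level extraction can be modeled with two small tables. This sketch assumes a simple list-of-dicts layout rather than mindzie Studio's actual storage, and all attribute values are hypothetical: the case attribute is processed once per case row, the event attribute once per event row, and null or empty values map to None as described above.

```python
from typing import Optional


def text_end(value: Optional[str], length: int) -> Optional[str]:
    # Hypothetical helper: null or empty input yields None, as documented.
    if not value:
        return None
    return value[-length:]


# Hypothetical case table: one row per case.
cases = [
    {"case_id": "C1", "Supplier_ID": "SUP-2024-0145-US"},
    {"case_id": "C2", "Supplier_ID": None},
]

# Hypothetical event table: one row per event, so one case can hold several values.
events = [
    {"case_id": "C1", "activity": "Create PO", "Document_Name": "PO_001.pdf"},
    {"case_id": "C1", "activity": "Approve PO", "Document_Name": "PO_001_signed.docx"},
    {"case_id": "C2", "activity": "Create PO", "Document_Name": ""},
]

# Case-level enrichment: computed once per case row.
for case in cases:
    case["Supplier_Country"] = text_end(case["Supplier_ID"], 2)

# Event-level enrichment: computed for every event row.
for event in events:
    event["Document_Type"] = text_end(event["Document_Name"], 4)

print([c["Supplier_Country"] for c in cases])  # ['US', None]
print([e["Document_Type"] for e in events])    # ['.pdf', 'docx', None]
```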

The extracted text preserves the exact characters from the end of the source string, including any special characters, digits, or punctuation marks. A file extension's dot, for example, is kept whenever the specified length reaches it, so composite codes and suffixes are captured as they appear in the source. The enrichment handles source texts of varying length without errors: if a source value is shorter than the specified extraction length, the entire value is returned.


This documentation is part of the mindzie Studio process mining platform.
