Text Start

Overview

The Text Start enrichment extracts a specified number of characters from the beginning of a text attribute value and stores the result in a new attribute. Use it to isolate the leading portion of text data, such as product codes, department identifiers, location prefixes, or any other meaningful pattern that appears at the start of attribute values.

In process mining, Text Start is useful for standardizing and categorizing data based on text prefixes. For example, you might extract the first three characters of invoice numbers to identify regional offices, pull department codes from employee IDs, or extract product line identifiers from SKUs. With these prefixes stored as separate attributes, you can group and filter at a finer level and surface patterns that are hard to see in the full strings. The enrichment works with both case-level and event-level attributes.

Common Uses

Extract department codes from employee IDs (e.g., "FIN-12345" to "FIN")
Identify regional identifiers from invoice numbers or order codes
Pull product category prefixes from SKU codes for inventory analysis
Extract area codes from phone numbers for geographical analysis (when the numbers are stored in a consistent format that starts with the area code)
Identify document types from document IDs that follow naming conventions
Create groupings based on standardized prefixes in reference numbers
Extract year or month identifiers from date-based text codes

Settings

New Attribute Name: The name of the new attribute that will be created to store the extracted text prefix. This should be a descriptive name that clearly indicates what information the attribute contains. For example, if extracting department codes from employee IDs, you might name it "DepartmentCode" or "EmployeeDept". The new attribute will be created at the same level (case or event) as the source attribute.

Column Name: The source text attribute from which you want to extract the beginning characters. This dropdown lists all available text attributes in your dataset that are not hidden. The enrichment will process each value in this column, extracting the specified number of characters from the start. If a value is shorter than the specified length, the entire value will be used.

Length: The number of characters to extract from the beginning of the text value. This must be a positive integer (1 or greater). For example, setting this to 3 will extract the first three characters, while setting it to 5 will extract the first five characters. If the source text is shorter than the specified length, the enrichment will use the entire available text without padding or error.
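
The Length rule described above can be sketched in a few lines of Python. This is only an illustrative model of the documented behavior, not mindzie Studio's implementation; the function name text_start is hypothetical.

```python
def text_start(value: str, length: int) -> str:
    """Return the first `length` characters of `value` (hypothetical helper)."""
    if not isinstance(length, int) or length < 1:
        raise ValueError("Length must be a positive integer (1 or greater)")
    # Slicing never pads and never raises when the value is shorter than `length`.
    return value[:length]


print(text_start("FIN-12345", 3))  # FIN
print(text_start("FIN-12345", 5))  # FIN-1
print(text_start("AB", 5))         # AB
```

The last call shows the short-value case: the whole value is returned unchanged.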

Examples

Example 1: Department Code Extraction from Employee IDs

Scenario: A healthcare organization uses employee IDs that begin with department codes (e.g., "NUR-45678" for nursing, "ADM-12345" for administration, "LAB-98765" for laboratory). They want to analyze process performance by department.

Settings:

New Attribute Name: DepartmentCode
Column Name: EmployeeID
Length: 3

Output: The enrichment creates a new case attribute "DepartmentCode" with values:

Employee "NUR-45678" → DepartmentCode: "NUR"
Employee "ADM-12345" → DepartmentCode: "ADM"
Employee "LAB-98765" → DepartmentCode: "LAB"
Employee "IT-5432" → DepartmentCode: "IT-" (the code has only two letters, so the hyphen falls within the first 3 characters)

Insights: With the extracted department codes, the organization can now filter processes by department, compare cycle times across departments, and identify department-specific bottlenecks or compliance issues.
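
As a rough sketch of this example outside the product, the same fixed-length extraction can be reproduced in Python on the four sample IDs. The variable names are illustrative assumptions; the point is that a two-letter code such as "IT" keeps its hyphen and therefore forms its own group, "IT-".

```python
from collections import Counter

# Sample values from the example above.
employee_ids = ["NUR-45678", "ADM-12345", "LAB-98765", "IT-5432"]
LENGTH = 3  # same value as the Length setting

# Fixed-length prefix: always the first 3 characters, whatever they are.
department_codes = [emp_id[:LENGTH] for emp_id in employee_ids]
print(department_codes)  # ['NUR', 'ADM', 'LAB', 'IT-']

# Grouping by the extracted prefix; "IT-" is counted under its own key.
cases_per_department = Counter(department_codes)
print(cases_per_department["IT-"])  # 1
```

If prefixes in your data vary in length, check the extracted values before grouping on them.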

Example 2: Regional Office Identification from Invoice Numbers

Scenario: A multinational corporation uses invoice numbers where the first two characters represent the regional office (e.g., "US-INV-2024-0001" for United States, "EU-INV-2024-0002" for Europe, "AP-INV-2024-0003" for Asia Pacific).

Settings:

New Attribute Name: RegionalOffice
Column Name: InvoiceNumber
Length: 2

Output: The enrichment creates a new case attribute "RegionalOffice" with values:

Invoice "US-INV-2024-0001" → RegionalOffice: "US"
Invoice "EU-INV-2024-0002" → RegionalOffice: "EU"
Invoice "AP-INV-2024-0003" → RegionalOffice: "AP"
Invoice "UK-INV-2024-0004" → RegionalOffice: "UK"

Insights: The company can now analyze invoice processing times by region, identify regional variations in approval workflows, and benchmark performance across different offices to standardize best practices.

Example 3: Product Line Extraction from SKU Codes

Scenario: A manufacturing company uses SKU codes where the first four characters identify the product line (e.g., "ELEC-TV-55-BLK" for electronics, "FURN-CHR-WD-01" for furniture, "TOYS-DOL-12-PNK" for toys).

Settings:

New Attribute Name: ProductLine
Column Name: SKUCode
Length: 4

Output: The enrichment creates a new event attribute "ProductLine" with values:

SKU "ELEC-TV-55-BLK" → ProductLine: "ELEC"
SKU "FURN-CHR-WD-01" → ProductLine: "FURN"
SKU "TOYS-DOL-12-PNK" → ProductLine: "TOYS"
SKU "APP-SHT-L-BLU" → ProductLine: "APP-" (the product line code has only three letters, so the hyphen is included in the first 4 characters)

Insights: The manufacturer can analyze order fulfillment processes by product line, identify which product lines have longer lead times, and optimize warehouse operations based on product line characteristics.

Example 4: Document Type Classification in Procurement

Scenario: A procurement system uses document IDs that start with three-letter codes indicating document type (e.g., "POR-2024-0001" for purchase orders, "RFQ-2024-0002" for requests for quotation, "CON-2024-0003" for contracts).

Settings:

New Attribute Name: DocumentType
Column Name: DocumentID
Length: 3

Output: The enrichment creates a new case attribute "DocumentType" with values:

Document "POR-2024-0001" → DocumentType: "POR"
Document "RFQ-2024-0002" → DocumentType: "RFQ"
Document "CON-2024-0003" → DocumentType: "CON"
Document "INV-2024-0004" → DocumentType: "INV"

Insights: The procurement team can track processing times by document type, ensure appropriate approval workflows are followed for different document types, and identify which document types experience the most delays or rework.

Example 5: Year Extraction from Date-Based Reference Numbers

Scenario: A financial services company uses reference numbers that begin with the year (e.g., "2024-FIN-00123", "2023-FIN-98765"). They want to analyze trends and volumes by year.

Settings:

New Attribute Name: ReferenceYear
Column Name: ReferenceNumber
Length: 4

Output: The enrichment creates a new case attribute "ReferenceYear" with values:

Reference "2024-FIN-00123" → ReferenceYear: "2024"
Reference "2023-FIN-98765" → ReferenceYear: "2023"
Reference "2022-FIN-45678" → ReferenceYear: "2022"
Reference "2021-FIN-12345" → ReferenceYear: "2021"

Insights: The company can track transaction volumes by year, analyze year-over-year process improvements, and measure the impact of process changes implemented in specific years. Seasonal patterns within a year cannot be derived from the year prefix alone.
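
One detail worth illustrating: the extracted year is text, not a number, because the new attribute is always of type String. The Python sketch below mimics that on the sample values. Converting to an integer is shown as a separate step and is an assumption about how you might post-process the values, not a feature of this enrichment.

```python
reference_numbers = [
    "2024-FIN-00123",
    "2023-FIN-98765",
    "2022-FIN-45678",
    "2021-FIN-12345",
]

# The first 4 characters, kept as strings.
reference_years = [ref[:4] for ref in reference_numbers]
print(reference_years)  # ['2024', '2023', '2022', '2021']

# Arithmetic needs an explicit conversion (a separate, hypothetical step).
numeric_years = [int(year) for year in reference_years]
print(max(numeric_years) - min(numeric_years))  # 3
```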

Output

The Text Start enrichment creates a new attribute (either case-level or event-level, matching the source attribute's level) containing the extracted text prefix. The new attribute is always of type String and will contain the first N characters from each value in the source column, where N is the specified length.

The enrichment handles the following cases:

If the source text is longer than the specified length, exactly the specified number of characters is extracted
If the source text is shorter than or equal to the specified length, the entire text value is used
If the source value is null or empty, the new attribute will also be null for that row
Special characters, spaces, and punctuation are treated as regular characters and included in the extraction if they fall within the specified length
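
The cases listed above can be modeled with a small Python sketch. It is an approximation of the documented rules, not the product's code. The name text_start_or_null is hypothetical, and None stands in for a null attribute value.

```python
from typing import Optional


def text_start_or_null(value: Optional[str], length: int) -> Optional[str]:
    """Model of the documented rules (hypothetical helper)."""
    # Null or empty source values produce a null result.
    if value is None or value == "":
        return None
    # Longer values are cut to `length`; shorter or equal values pass through whole.
    return value[:length]


rows = ["US-INV-2024-0001", "EU", "", None, "A B-C"]
results = [text_start_or_null(value, 3) for value in rows]
print(results)  # ['US-', 'EU', None, None, 'A B']
```

The last row shows that a space counts as a regular character: it takes up one of the three extracted positions.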

The new attribute can be used immediately in subsequent enrichments, filters, and calculators. Common follow-up analyses include using the extracted prefixes in Group Attribute Values enrichment to create categories, applying filters to focus on specific prefixes, or using the prefixes in conformance checking to ensure proper coding standards are followed.

This documentation is part of the mindzie Studio process mining platform.