Overview
The AI Case Prediction enrichment uses machine learning to predict case outcomes, behaviors, or characteristics based on historical patterns in your process data. It trains predictive models on your existing case attributes and then applies those models to predict unknown values for current or future cases.
Unlike traditional rule-based enrichments, AI Case Prediction uses statistical learning algorithms to discover complex patterns and relationships in your data that may not be immediately apparent. The enrichment supports classification tasks (predicting categories or outcomes) and handles both model training and prediction deployment within your process mining workflow.
This enrichment is particularly valuable for process optimization, risk management, and proactive decision-making. By predicting case outcomes early in the process lifecycle, you can take preventive actions, allocate resources more effectively, and identify potential issues before they occur.
Common Uses
- Outcome Prediction: Predict whether a case will be approved or rejected, completed on time or delayed, successful or failed based on early case attributes
- Risk Assessment: Identify high-risk cases that are likely to encounter problems, require rework, or result in customer complaints
- Duration Forecasting: Predict how long a case will take to complete based on its initial characteristics and current progress
- Resource Allocation: Predict which cases will require specialized handling or additional resources based on complexity indicators
- Customer Churn Prevention: Predict which customer cases are at risk of cancellation or abandonment based on behavior patterns
- Quality Prediction: Forecast whether a case will meet quality standards or require additional inspection based on process execution patterns
- Cost Estimation: Predict the final cost of a case based on initial parameters and early activity patterns
Settings
Prediction Type
Prediction Type: Specifies the type of machine learning task to perform. Currently, the enrichment supports Classification, which predicts categorical outcomes or class labels.
- Classification: Use for predicting discrete categories or outcomes such as "Approved/Rejected", "High Risk/Low Risk", "On Time/Delayed", or any categorical attribute. The model learns to classify cases into predefined groups based on patterns in the feature columns.
- Regression: (Future) Will predict continuous numeric values such as durations, costs, or quantities
- Clustering: (Future) Will group similar cases together without predefined categories
- Time Series: (Future) Will predict temporal patterns and sequences
- Anomaly Detection: (Future) Will identify unusual or outlier cases
- Recommendation: (Future) Will suggest optimal next actions or activities
For most business use cases, Classification is the appropriate choice when you want to predict a specific outcome that falls into distinct categories.
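As a hedged illustration of what a classification model learns, the sketch below trains a scikit-learn classifier on a handful of hypothetical cases. The column names, data, and choice of algorithm are illustrative assumptions, not the enrichment's actual internals:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical historical cases: feature columns plus a known outcome.
train = pd.DataFrame({
    "order_amount": [12000, 800, 45000, 2300, 67000, 1500],
    "requester_level": [2, 1, 3, 1, 2, 1],          # numeric code for seniority
    "outcome": ["Approved", "Approved", "Rejected",
                "Approved", "Rejected", "Approved"],
})

features = ["order_amount", "requester_level"]      # the feature columns
model = RandomForestClassifier(random_state=0)
model.fit(train[features], train["outcome"])        # learn patterns from history

# Predict for a new, in-progress case whose outcome is unknown.
new_case = pd.DataFrame({"order_amount": [50000], "requester_level": [1]})
print(model.predict(new_case)[0])                   # a predicted category label
```

The model only ever returns labels it saw during training, which is why the Predict Value Column must contain sufficient examples of each category.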
Feature Columns
Feature Columns: Select the case attributes that will be used as input features for training and prediction. These are the independent variables that the AI model will analyze to make predictions. Choose attributes that you believe influence or correlate with the outcome you're trying to predict.
Best practices for selecting feature columns:
- Include attributes that are known early in the case lifecycle if you want to make early predictions
- Select attributes with good data quality (minimal missing values)
- Include both categorical and numeric attributes for richer patterns
- Avoid selecting the target column (the one you're predicting) as a feature
- Consider domain knowledge about which factors influence outcomes
- Start with 3-10 relevant features; too many, especially irrelevant ones, can cause overfitting and reduce accuracy
Examples of useful feature columns:
- Customer type, region, or segment
- Order amount, priority, or category
- Initial request characteristics
- Resource assignments or department
- Time-based attributes (day of week, month, season)
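Machine learning algorithms generally require numeric inputs, so mixed categorical and numeric features must be combined into a single feature matrix. One common technique is one-hot encoding, sketched here with pandas on hypothetical attribute data (the enrichment may handle this preparation differently internally):

```python
import pandas as pd

# Hypothetical case attributes: two categorical features, one numeric.
cases = pd.DataFrame({
    "customer_type": ["Enterprise", "SMB", "SMB", "Enterprise"],
    "region":        ["EMEA", "APAC", "EMEA", "AMER"],
    "order_amount":  [42000, 3100, 7800, 125000],
})

# One-hot encode the categorical features so they can sit alongside
# numeric ones in a single feature matrix.
encoded = pd.get_dummies(cases, columns=["customer_type", "region"])
print(sorted(encoded.columns))
```

Each category value becomes its own indicator column (e.g. `customer_type_SMB`), which is one reason attributes with very many distinct values make poor features.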
Predict Value Column
Predict Value Column: Select the case attribute that contains the known outcomes you want the model to learn from during training. This is the dependent variable or target that the model will predict for new cases. This column must have known values in your training data but may be empty for cases where you want to make predictions.
For Classification prediction type, valid columns are:
- String attributes (text categories like "Approved", "Rejected", "Pending")
- Boolean attributes (true/false outcomes)
- Integer attributes (numeric codes representing categories)
The Predict Value Column should:
- Contain the actual outcome you want to predict
- Have sufficient examples of each category in the training data
- Be the key business outcome you want to forecast
- Not be available or known at the time you want to make the prediction
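To verify that a candidate target column has sufficient examples of each category, a quick class-balance check helps before training. The sketch below uses hypothetical data; the 10% threshold is an illustrative rule of thumb, not a product requirement:

```python
import pandas as pd

# Hypothetical training data with a known target (predict value) column.
cases = pd.DataFrame({"outcome": ["Approved"] * 95 + ["Rejected"] * 5})

# Count examples per category before training.
counts = cases["outcome"].value_counts()
print(counts.to_dict())                     # {'Approved': 95, 'Rejected': 5}

# Illustrative rule of thumb: flag heavily imbalanced targets.
share = counts / counts.sum()
if share.min() < 0.10:
    print("Warning: minority class is under 10% of training data")
```

A heavily imbalanced target often leads to a model that simply predicts the majority class, which is why balancing the training set (see Training Filters below) can be worthwhile.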
Training Filters
Training Filters: Define filter criteria to select which cases will be used to train the AI model. This allows you to use only high-quality, complete cases for model training while excluding cases that may not be representative or have incomplete data.
Common training filter scenarios:
- Include only completed cases (exclude in-progress cases)
- Include only cases where the predict value is known (not empty)
- Exclude cases with data quality issues or missing feature values
- Include only recent cases to train on current process patterns
- Filter by specific time periods, departments, or regions
- Balance the training set by including similar numbers of cases from each outcome category
Example: "Case End Time is not empty AND Outcome is not empty AND Case Start Time is after 2024-01-01"
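Filter criteria are configured in the enrichment settings rather than in code, but as an illustration, the example filter above corresponds roughly to this pandas selection over hypothetical case data:

```python
import pandas as pd

cases = pd.DataFrame({
    "case_end_time":   [pd.Timestamp("2024-06-01"), pd.NaT,
                        pd.Timestamp("2024-03-15")],
    "outcome":         ["Approved", None, "Rejected"],
    "case_start_time": [pd.Timestamp("2024-05-01"),
                        pd.Timestamp("2024-07-01"),
                        pd.Timestamp("2023-12-01")],
})

# "Case End Time is not empty AND Outcome is not empty
#  AND Case Start Time is after 2024-01-01"
training = cases[
    cases["case_end_time"].notna()
    & cases["outcome"].notna()
    & (cases["case_start_time"] > "2024-01-01")
]
print(len(training))  # only the first case satisfies all three conditions
```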
Prediction Filters
Prediction Filters: Define filter criteria to select which cases will receive predictions when the enrichment runs. This allows you to predict selectively for cases where predictions are most valuable or where the outcome is not yet known.
Common prediction filter scenarios:
- Include only in-progress cases (where outcome is not yet known)
- Include only cases where the predict value is empty
- Filter to specific time periods or current active cases
- Include only cases that meet certain risk criteria
- Predict only for high-value or high-priority cases
Example: "Outcome is empty AND Case Status equals 'In Progress' AND Case Start Time is after 2025-01-01"
New Prediction Column
New Prediction Column: Define the name, data type, and display format for the new case attribute that will store the AI predictions. This column will be added to your case table and populated with the predicted values when the enrichment executes.
Configuration options:
- Column Name: Internal name for the new attribute (no spaces, use underscores)
- Display Name: User-friendly name shown in analysis dashboards
- Data Type: Must match the data type of your Predict Value Column (String for text categories, Boolean for true/false, Integer for numeric codes)
- Format: How the values should be displayed in visualizations (Text, Number, Percentage, etc.)
Example configurations:
- Column Name: "predicted_outcome", Display Name: "Predicted Outcome", Type: String
- Column Name: "risk_prediction", Display Name: "Risk Level Prediction", Type: String
- Column Name: "will_delay", Display Name: "Predicted to Delay", Type: Boolean
Model Id
Model Id: (Optional) Specify the unique identifier (GUID) of a previously trained model to use for predictions. When you train a model and save it, mindzieStudio assigns it a unique Model Id. By providing this Id, you can reuse the trained model without retraining, ensuring consistent predictions across different datasets or time periods.
Leave this field empty if you want the enrichment to train a new model each time it runs. Provide a Model Id when:
- You've already trained and validated a model that performs well
- You want to ensure consistency by using the same model over time
- You're applying predictions to a new dataset using an existing model
- You want to avoid the computational cost of retraining
The Model Id can be found in the enrichment execution logs or model management interface after successful model training.
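mindzieStudio manages model storage internally, but reusing a model by Id is conceptually similar to the save-and-reload workflow common in Python machine learning, sketched here with joblib. The file name, library, and data are illustrative assumptions, not the product's actual persistence mechanism:

```python
import os
import tempfile

import joblib
from sklearn.tree import DecisionTreeClassifier

# Train once on hypothetical numeric features and categorical outcomes...
model = DecisionTreeClassifier(random_state=0)
model.fit([[1, 10], [2, 20], [3, 30]], ["A", "A", "B"])

# ...persist the fitted model under an identifier...
path = os.path.join(tempfile.mkdtemp(), "model_a1b2c3d4.joblib")
joblib.dump(model, path)

# ...then reload it later for consistent predictions without retraining.
reloaded = joblib.load(path)
print(reloaded.predict([[3, 30]])[0])
```

Reloading the same fitted model guarantees identical predictions for identical inputs, which is the consistency benefit of supplying a Model Id.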
Python Image
Python Image: Specifies the Python execution environment to use for running the AI model training and prediction scripts. mindzieStudio supports multiple Python execution modes to accommodate different deployment scenarios.
Options:
- LOCAL: Uses the local Python installation on the mindzieStudio server. This is the fastest option when Python 3.x is installed locally with required machine learning libraries (pandas, scikit-learn, etc.)
- Docker Image Name: Specifies a Docker container image that contains Python and required libraries. Example: "python:3.9-slim" or custom images with pre-installed ML libraries
- Python not configured: Indicates that neither local Python nor Docker is available. You'll need to configure Python execution before using this enrichment.
The default behavior:
- If local Python is available, it automatically selects "LOCAL"
- If Docker is configured but not local Python, it uses the default Docker Python image
- If neither is available, it prompts you to configure Python execution
For production use, Docker images are recommended for consistency and isolation, while LOCAL is convenient for development and testing when you have full control over the server environment.
Examples
Example 1: Predicting Purchase Order Approval Outcomes
Scenario: A procurement organization wants to predict whether purchase orders will be approved or rejected based on order characteristics, so they can flag potential rejections early and work proactively with requesters to improve approval rates.
Settings:
- Prediction Type: Classification
- Feature Columns: Order_Amount, Department, Vendor_Category, Requester_Level, Budget_Available, Previous_Orders_Count, Urgency_Flag
- Predict Value Column: Approval_Outcome (contains "Approved" or "Rejected" for completed orders)
- Training Filters: "Approval_Outcome is not empty AND Case_End_Time is not empty" (use only completed orders with known outcomes)
- Prediction Filters: "Approval_Outcome is empty AND Case_Status equals 'Under Review'" (predict for orders currently being reviewed)
- New Prediction Column:
- Column Name: predicted_approval
- Display Name: Predicted Approval Outcome
- Data Type: String
- Model Id: (empty - train new model)
- Python Image: LOCAL
Output: The enrichment creates a new case attribute called "Predicted Approval Outcome" with values of either "Approved" or "Rejected" for each order under review. The prediction is based on patterns learned from historical orders, such as:
- Orders over $50,000 from new vendors are more likely to be rejected
- Orders with budget available and requester level "Manager" or higher are more likely to be approved
- Urgent orders with previous successful orders from the same vendor have higher approval rates
Insights: By analyzing the predictions, the procurement team discovers that 23% of current orders under review are predicted to be rejected. They proactively reach out to requesters of predicted rejections to gather additional justification, suggest alternative vendors, or split large orders into smaller approvals. This intervention improves the overall approval rate from 78% to 89% and reduces process cycle time by avoiding lengthy rejection-resubmission cycles.
Example 2: Healthcare Patient Readmission Risk Prediction
Scenario: A hospital wants to predict which discharged patients are at high risk of readmission within 30 days, enabling care coordinators to provide targeted follow-up support and reduce readmission rates.
Settings:
- Prediction Type: Classification
- Feature Columns: Patient_Age, Diagnosis_Category, Length_of_Stay, Comorbidity_Count, Prior_Admissions, Discharge_Destination, Medication_Complexity, Social_Support_Score
- Predict Value Column: Readmitted_30_Days (contains "Yes" or "No" for past discharge cases)
- Training Filters: "Discharge_Date is not empty AND Days_Since_Discharge >= 30" (use only cases where 30-day outcome is known)
- Prediction Filters: "Discharge_Date is not empty AND Days_Since_Discharge < 30" (predict for recent discharges)
- New Prediction Column:
- Column Name: readmission_risk_prediction
- Display Name: Predicted Readmission Risk
- Data Type: String
- Model Id: (empty)
- Python Image: LOCAL
Output: The enrichment adds a "Predicted Readmission Risk" attribute showing "Yes" or "No" for each recently discharged patient. Sample predictions show:
- Patient ID 45321: Age 72, Heart Failure, 8-day stay, 3 comorbidities, discharged to home alone = Predicted Risk "Yes"
- Patient ID 45322: Age 55, Minor Surgery, 2-day stay, no comorbidities, discharged to home with family = Predicted Risk "No"
- Patient ID 45323: Age 68, Pneumonia, 5-day stay, 2 comorbidities, prior admission 3 months ago = Predicted Risk "Yes"
Insights: The model identifies 78 patients in the last 30 days predicted to be at high risk of readmission. The care coordination team prioritizes these patients for home health visits, medication reviews, and follow-up appointments. After 90 days of using the predictions to guide interventions, the actual readmission rate for high-risk patients drops from 22% to 14%, demonstrating the value of proactive, data-driven patient management.
Example 3: Manufacturing Quality Defect Prediction
Scenario: A manufacturing company wants to predict which production orders will result in quality defects based on initial order parameters and early production metrics, allowing them to implement additional quality controls before defects occur.
Settings:
- Prediction Type: Classification
- Feature Columns: Product_Type, Batch_Size, Material_Supplier, Production_Line, Operator_Experience_Level, Temperature_Variance, First_Pass_Yield, Cycle_Time_Deviation
- Predict Value Column: Quality_Defect_Found (contains "Defect" or "Pass" for completed orders)
- Training Filters: "Production_Status equals 'Completed' AND Quality_Inspection_Complete equals true" (use only fully inspected completed orders)
- Prediction Filters: "Production_Status equals 'In Progress' AND Percent_Complete >= 25 AND Percent_Complete < 100" (predict for orders in production)
- New Prediction Column:
- Column Name: defect_prediction
- Display Name: Predicted Quality Outcome
- Data Type: String
- Model Id: (empty)
- Python Image: LOCAL
Output: The enrichment generates quality predictions for 156 orders currently in production. Example predictions:
- Order #10045: Large batch, new material supplier, high temperature variance = Predicted "Defect" (quality alert triggered)
- Order #10046: Standard product, experienced operator, normal metrics = Predicted "Pass"
- Order #10047: Complex product, Production Line B, cycle time 15% over normal = Predicted "Defect" (quality alert triggered)
The system creates a real-time quality dashboard showing predicted defects alongside actual production status, enabling quality engineers to intervene before orders complete.
Insights: Using the predictions, the quality team implements enhanced inspections and process adjustments for orders predicted to have defects. Over 3 months, they prevent 34 defective orders from reaching final inspection by catching issues early. The defect rate drops from 8.2% to 4.1%, and rework costs decrease by $127,000. The model reveals that orders with new material suppliers combined with high temperature variance have a 67% defect rate, leading to updated supplier qualification procedures and tighter temperature controls.
Example 4: Financial Loan Default Risk Prediction
Scenario: A financial institution wants to predict which approved loan applications are likely to default within the first 12 months, enabling risk managers to adjust loan terms, require additional collateral, or implement more frequent monitoring for high-risk loans.
Settings:
- Prediction Type: Classification
- Feature Columns: Loan_Amount, Credit_Score, Debt_to_Income_Ratio, Employment_Duration, Loan_Purpose, Property_Value, Down_Payment_Percent, Previous_Loans
- Predict Value Column: Defaulted_12_Months (contains "Default" or "Performing" for loans with 12+ months history)
- Training Filters: "Loan_Origination_Date < '2024-01-01' AND Months_Since_Origination >= 12" (use only loans with known 12-month outcomes)
- Prediction Filters: "Loan_Status equals 'Active' AND Months_Since_Origination < 12" (predict for recent loans)
- New Prediction Column:
- Column Name: default_risk_prediction
- Display Name: Predicted Default Risk
- Data Type: String
- Model Id: a1b2c3d4-e5f6-7890-a1b2-c3d4e5f67890 (using a previously trained and validated model)
- Python Image: LOCAL
Output: The enrichment applies the trained model to 892 active loans originated in the past 12 months, generating default risk predictions:
- 724 loans predicted as "Performing" (low risk)
- 168 loans predicted as "Default" (high risk)
Sample high-risk predictions:
- Loan #50012: $320K, credit score 640, DTI 42%, employment 8 months = "Default"
- Loan #50034: $180K, credit score 680, DTI 38%, previous late payments = "Default"
- Loan #50078: $425K, credit score 655, DTI 45%, high loan-to-value ratio = "Default"
Insights: The risk management team segments the portfolio into predicted risk levels and implements differentiated monitoring strategies. High-risk loans receive monthly check-ins versus quarterly for low-risk loans. They also adjust pricing models to account for predicted risk, increasing interest rates by 0.5-1.0% for high-risk profiles. After 12 months, the model's predictions prove 82% accurate, and the proactive monitoring reduces actual default rates in the high-risk segment from 15% to 9%, saving an estimated $2.3 million in losses.
Example 5: Customer Service Case Resolution Prediction
Scenario: A customer service organization wants to predict whether support tickets will be resolved within the target SLA timeframe based on initial ticket characteristics, allowing them to escalate at-risk cases early and improve SLA compliance rates.
Settings:
- Prediction Type: Classification
- Feature Columns: Issue_Category, Customer_Tier, Complexity_Score, Assigned_Team, Initial_Response_Time, Customer_Sentiment, Product_Version, Similar_Cases_Count
- Predict Value Column: Resolved_Within_SLA (contains "Yes" or "No" for closed tickets)
- Training Filters: "Ticket_Status equals 'Closed' AND Close_Date is not empty" (use only resolved tickets)
- Prediction Filters: "Ticket_Status equals 'Open' AND Hours_Since_Creation >= 2 AND Hours_Since_Creation < 24" (predict for recently opened tickets)
- New Prediction Column:
- Column Name: sla_compliance_prediction
- Display Name: Predicted SLA Compliance
- Data Type: String
- Model Id: (empty)
- Python Image: LOCAL
Output: The enrichment predicts SLA compliance for 234 currently open support tickets. Example predictions:
- Ticket #7845: Billing issue, Premium customer, Complexity 2, Team A, 15-min response = Predicted "Yes"
- Ticket #7846: Technical bug, Standard customer, Complexity 8, Team B, 45-min response = Predicted "No" (escalation triggered)
- Ticket #7847: Password reset, Basic customer, Complexity 1, Team C, 5-min response = Predicted "Yes"
The predictions are displayed in the support team dashboard with color-coding: green for predicted SLA compliance, red for predicted SLA breach.
Insights: Support managers use the predictions to proactively escalate at-risk tickets to senior engineers or allocate additional resources. Over 6 months, the SLA compliance rate improves from 83% to 91%. The model reveals that tickets with high complexity scores assigned to Team B during peak hours have only a 58% chance of meeting SLA, leading to workload rebalancing and additional training for Team B. The organization also discovers that initial response time is the strongest predictor of overall resolution time, prompting new policies to ensure first responses within 15 minutes.
Output
When the AI Case Prediction enrichment executes successfully, it creates a new case attribute in your dataset with the name you specified in the "New Prediction Column" configuration. This attribute is added as a derived column to the case table and appears alongside your other case attributes in all analysis dashboards, filters, and visualizations.
Prediction Values
The values stored in the new prediction column depend on your Predict Value Column data type:
For String (Text) Predictions:
- The column contains text values matching the categories from your training data
- Example: "Approved", "Rejected", "High Risk", "Low Risk", "Delayed", "On Time"
- These values can be used in filters, grouping, and color-coding in dashboards
For Boolean Predictions:
- The column contains True or False values
- Example: True = "Will Default", False = "Will Not Default"
- Ideal for binary outcome predictions and simple yes/no classifications
For Integer Predictions:
- The column contains numeric codes representing categories
- Example: 0 = "Low Risk", 1 = "Medium Risk", 2 = "High Risk"
- Useful when categories have a natural numeric ordering
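For example, integer prediction codes can be mapped back to readable labels for downstream analysis. The codes and labels below are hypothetical:

```python
import pandas as pd

# Hypothetical integer predictions and a code-to-label mapping.
predictions = pd.Series([0, 2, 1, 0, 2], name="risk_prediction")
labels = {0: "Low Risk", 1: "Medium Risk", 2: "High Risk"}

readable = predictions.map(labels)
print(readable.tolist())
# ['Low Risk', 'High Risk', 'Medium Risk', 'Low Risk', 'High Risk']
```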
Using Prediction Results
Once the prediction column is created, you can leverage it throughout mindzieStudio:
In Filters:
- Filter cases to show only high-risk predictions: "Predicted Risk equals 'High Risk'"
- Exclude low-risk cases from detailed analysis: "Predicted Outcome not equals 'Low Risk'"
- Combine predictions with other criteria: "Predicted Delay equals 'Yes' AND Order Amount > $10,000"
In Dashboards:
- Create performance charts grouped by predicted outcome
- Use predictions as color-coding in process maps to visualize risk across process paths
- Build KPI metrics showing prediction accuracy by comparing predicted vs actual outcomes
- Create heat maps showing predicted risk by department, product, or time period
In Further Enrichments:
- Use predictions as input to calculators (Example: "High Risk Score" calculator that considers predicted risk)
- Combine with other enrichments to create composite risk scores
- Use as filter criteria for targeted enrichments (Example: "Add compliance check only for predicted non-compliant cases")
For Process Improvement:
- Identify process patterns that lead to negative predicted outcomes
- Prioritize process redesign efforts on activities that most influence negative predictions
- Monitor prediction trends over time to measure process improvement effectiveness
- Compare predicted vs actual outcomes to validate and refine your model
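Comparing predicted against actual outcomes for historical cases reduces to a simple accuracy calculation. A sketch with hypothetical column names and data:

```python
import pandas as pd

# Hypothetical cases where both the prediction and the actual outcome are known.
cases = pd.DataFrame({
    "predicted_outcome": ["Approved", "Rejected", "Approved", "Approved", "Rejected"],
    "actual_outcome":    ["Approved", "Rejected", "Rejected", "Approved", "Rejected"],
})

# Share of cases where the prediction matched reality.
accuracy = (cases["predicted_outcome"] == cases["actual_outcome"]).mean()
print(f"Validation accuracy: {accuracy:.0%}")  # Validation accuracy: 80%
```

Tracking this figure over time shows whether the model still reflects current process behavior or needs retraining.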
Model Training Output
When training a new model (when Model Id is not provided), the enrichment generates additional artifacts:
Training Files:
- Training.csv: The filtered case data used for model training
- Training.schema: Data type definitions for training columns
- Prediction.csv: The filtered case data requiring predictions
- Prediction.schema: Data type definitions for prediction columns
Model Files:
- script.py: The Python script that trains and applies the model
- model_trainer.py: The model training logic
- mindzie_helper.py: Utility functions for data loading and processing
Console Output: The enrichment execution logs show:
- "Loading training data..." with row counts
- "Fitting model to training data..." with progress indicators
- "Model training completed successfully!"
- "Loading prediction data..." with row counts
- "Generating predictions..." with completion status
- "Successfully saved predictions to: out/Prediction.csv"
This detailed output helps you verify that training completed successfully and understand the scope of predictions generated.
Prediction Quality Indicators
For production use, consider monitoring these quality indicators:
- Prediction Coverage: the percentage of cases that received predictions versus those skipped due to missing feature values
- Prediction Distribution: whether predictions are balanced or heavily skewed toward one outcome
- Validation Accuracy: the accuracy rate when comparing predicted versus actual outcomes for historical cases
- Missing Value Handling: which cases failed to receive predictions due to incomplete feature data
By analyzing these indicators, you can iteratively improve your feature selection, training filters, and data quality to enhance prediction accuracy and business value.
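Coverage and distribution can be computed directly from the prediction column in any exported case data. A sketch with hypothetical values, assuming cases without a prediction appear as nulls:

```python
import pandas as pd

# Hypothetical prediction column: None means the case received no prediction.
preds = pd.Series(["Approved", "Approved", None, "Rejected", "Approved", None])

coverage = preds.notna().mean()                    # share of cases predicted
distribution = preds.value_counts(normalize=True)  # skew toward one outcome?

print(f"Coverage: {coverage:.0%}")                 # Coverage: 67%
print(distribution.to_dict())                      # {'Approved': 0.75, 'Rejected': 0.25}
```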
See Also
Related AI and Advanced Enrichments:
- Python - Execute custom Python code for advanced data transformations and analysis
- Representative Case Attribute - Analyze and select representative values from case attributes
- Group Attribute Values - Group and categorize attribute values for analysis
Related Predictive Topics:
- Case Duration Analysis - Analyze historical case durations to inform duration predictions
- Process Simulation - Simulate future process performance using predictive models
- Risk Management Dashboards - Visualize and monitor predicted risks across your process
Machine Learning Best Practices:
- Model Training Guidelines - Best practices for training accurate prediction models
- Feature Engineering - Techniques for selecting and creating effective feature columns
- Model Validation - Methods for testing and validating model accuracy before deployment
- Production Deployment - Strategies for deploying AI predictions in production environments
This documentation is part of the mindzieStudio process mining platform.