Overview
The AI Case Prediction enrichment uses machine learning to predict case outcomes, behaviors, or characteristics based on historical patterns in your process data. It trains predictive models on your existing case attributes and then applies those models to predict unknown values for current or future cases.
Unlike traditional rule-based enrichments, AI Case Prediction uses statistical learning algorithms to discover complex patterns and relationships in your data that may not be immediately apparent. The enrichment supports classification tasks (predicting categories or outcomes) and handles both model training and prediction deployment within your process mining workflow.
This enrichment is particularly valuable for process optimization, risk management, and proactive decision-making. By predicting case outcomes early in the process lifecycle, you can take preventive actions, allocate resources more effectively, and identify potential issues before they occur.
Common Uses
- Outcome Prediction: Predict whether a case will be approved or rejected, completed on time or delayed, successful or failed based on early case attributes
- Risk Assessment: Identify high-risk cases that are likely to encounter problems, require rework, or result in customer complaints
- Duration Forecasting: Predict how long a case will take to complete based on its initial characteristics and current progress
- Resource Allocation: Predict which cases will require specialized handling or additional resources based on complexity indicators
- Customer Churn Prevention: Predict which customer cases are at risk of cancellation or abandonment based on behavior patterns
- Quality Prediction: Forecast whether a case will meet quality standards or require additional inspection based on process execution patterns
- Cost Estimation: Predict the final cost of a case based on initial parameters and early activity patterns
Settings
Prediction Type
Prediction Type: Specifies the type of machine learning task to perform. Currently, the enrichment supports Classification, which predicts categorical outcomes or class labels.
- Classification: Use for predicting discrete categories or outcomes such as "Approved/Rejected", "High Risk/Low Risk", "On Time/Delayed", or any categorical attribute. The model learns to classify cases into predefined groups based on patterns in the feature columns.
- Regression: (Future) Will predict continuous numeric values such as durations, costs, or quantities
- Clustering: (Future) Will group similar cases together without predefined categories
- Time Series: (Future) Will predict temporal patterns and sequences
- Anomaly Detection: (Future) Will identify unusual or outlier cases
- Recommendation: (Future) Will suggest optimal next actions or activities
For most business use cases, Classification is the appropriate choice when you want to predict a specific outcome that falls into distinct categories.
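As a hedged illustration of what a classification model learns, the sketch below trains a scikit-learn classifier on a handful of hypothetical cases. The column names, data, and choice of algorithm are illustrative assumptions, not the enrichment's actual internals:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical historical cases: feature columns plus a known outcome.
train = pd.DataFrame({
    "order_amount": [12000, 800, 45000, 2300, 67000, 1500],
    "requester_level": [2, 1, 3, 1, 2, 1],          # numeric code for seniority
    "outcome": ["Approved", "Approved", "Rejected",
                "Approved", "Rejected", "Approved"],
})

features = ["order_amount", "requester_level"]      # the feature columns
model = RandomForestClassifier(random_state=0)
model.fit(train[features], train["outcome"])        # learn patterns from history

# Predict for a new, in-progress case whose outcome is unknown.
new_case = pd.DataFrame({"order_amount": [50000], "requester_level": [1]})
print(model.predict(new_case)[0])                   # a predicted category label
```

The model only ever returns labels it saw during training, which is why the Predict Value Column must contain sufficient examples of each category.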
Feature Columns
Feature Columns: Select the case attributes that will be used as input features for training and prediction. These are the independent variables that the AI model will analyze to make predictions. Choose attributes that you believe influence or correlate with the outcome you're trying to predict.
Best practices for selecting feature columns:
- Include attributes that are known early in the case lifecycle if you want to make early predictions
- Select attributes with good data quality (minimal missing values)
- Include both categorical and numeric attributes for richer patterns
- Avoid selecting the target column (the one you're predicting) as a feature
- Consider domain knowledge about which factors influence outcomes
- Start with 3-10 relevant features; too many, especially irrelevant ones, can cause overfitting and reduce accuracy
Examples of useful feature columns:
- Customer type, region, or segment
- Order amount, priority, or category
- Initial request characteristics
- Resource assignments or department
- Time-based attributes (day of week, month, season)
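Machine learning algorithms generally require numeric inputs, so mixed categorical and numeric features must be combined into a single feature matrix. One common technique is one-hot encoding, sketched here with pandas on hypothetical attribute data (the enrichment may handle this preparation differently internally):

```python
import pandas as pd

# Hypothetical case attributes: two categorical features, one numeric.
cases = pd.DataFrame({
    "customer_type": ["Enterprise", "SMB", "SMB", "Enterprise"],
    "region":        ["EMEA", "APAC", "EMEA", "AMER"],
    "order_amount":  [42000, 3100, 7800, 125000],
})

# One-hot encode the categorical features so they can sit alongside
# numeric ones in a single feature matrix.
encoded = pd.get_dummies(cases, columns=["customer_type", "region"])
print(sorted(encoded.columns))
```

Each category value becomes its own indicator column (e.g. `customer_type_SMB`), which is one reason attributes with very many distinct values make poor features.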
Predict Value Column
Predict Value Column: Select the case attribute that contains the known outcomes you want the model to learn from during training. This is the dependent variable or target that the model will predict for new cases. This column must have known values in your training data but may be empty for cases where you want to make predictions.
For Classification prediction type, valid columns are:
- String attributes (text categories like "Approved", "Rejected", "Pending")
- Boolean attributes (true/false outcomes)
- Integer attributes (numeric codes representing categories)
The Predict Value Column should:
- Contain the actual outcome you want to predict
- Have sufficient examples of each category in the training data
- Be the key business outcome you want to forecast
- Not be available or known at the time you want to make the prediction
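To verify that a candidate target column has sufficient examples of each category, a quick class-balance check helps before training. The sketch below uses hypothetical data; the 10% threshold is an illustrative rule of thumb, not a product requirement:

```python
import pandas as pd

# Hypothetical training data with a known target (predict value) column.
cases = pd.DataFrame({"outcome": ["Approved"] * 95 + ["Rejected"] * 5})

# Count examples per category before training.
counts = cases["outcome"].value_counts()
print(counts.to_dict())                     # {'Approved': 95, 'Rejected': 5}

# Illustrative rule of thumb: flag heavily imbalanced targets.
share = counts / counts.sum()
if share.min() < 0.10:
    print("Warning: minority class is under 10% of training data")
```

A heavily imbalanced target often leads to a model that simply predicts the majority class, which is why balancing the training set (see Training Filters below) can be worthwhile.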
Training Filters
Training Filters: Define filter criteria to select which cases will be used to train the AI model. This allows you to use only high-quality, complete cases for model training while excluding cases that may not be representative or have incomplete data.
Common training filter scenarios:
- Include only completed cases (exclude in-progress cases)
- Include only cases where the predict value is known (not empty)
- Exclude cases with data quality issues or missing feature values
- Include only recent cases to train on current process patterns
- Filter by specific time periods, departments, or regions
- Balance the training set by including similar numbers of cases from each outcome category
Example: "Case End Time is not empty AND Outcome is not empty AND Case Start Time is after 2024-01-01"
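Filter criteria are configured in the enrichment settings rather than in code, but as an illustration, the example filter above corresponds roughly to this pandas selection over hypothetical case data:

```python
import pandas as pd

cases = pd.DataFrame({
    "case_end_time":   [pd.Timestamp("2024-06-01"), pd.NaT,
                        pd.Timestamp("2024-03-15")],
    "outcome":         ["Approved", None, "Rejected"],
    "case_start_time": [pd.Timestamp("2024-05-01"),
                        pd.Timestamp("2024-07-01"),
                        pd.Timestamp("2023-12-01")],
})

# "Case End Time is not empty AND Outcome is not empty
#  AND Case Start Time is after 2024-01-01"
training = cases[
    cases["case_end_time"].notna()
    & cases["outcome"].notna()
    & (cases["case_start_time"] > "2024-01-01")
]
print(len(training))  # only the first case satisfies all three conditions
```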
Prediction Filters
Prediction Filters: Define filter criteria to select which cases will receive predictions when the enrichment runs. This allows you to predict selectively for cases where predictions are most valuable or where the outcome is not yet known.
Common prediction filter scenarios:
- Include only in-progress cases (where outcome is not yet known)
- Include only cases where the predict value is empty
- Filter to specific time periods or current active cases
- Include only cases that meet certain risk criteria
- Predict only for high-value or high-priority cases
Example: "Outcome is empty AND Case Status equals 'In Progress' AND Case Start Time is after 2025-01-01"
New Prediction Column
New Prediction Column: Define the name, data type, and display format for the new case attribute that will store the AI predictions. This column will be added to your case table and populated with the predicted values when the enrichment executes.
Configuration options:
- Column Name: Internal name for the new attribute (no spaces, use underscores)
- Display Name: User-friendly name shown in analysis dashboards
- Data Type: Must match the data type of your Predict Value Column (String for text categories, Boolean for true/false, Integer for numeric codes)
- Format: How the values should be displayed in visualizations (Text, Number, Percentage, etc.)
Example configurations:
- Column Name: "predicted_outcome", Display Name: "Predicted Outcome", Type: String
- Column Name: "risk_prediction", Display Name: "Risk Level Prediction", Type: String
- Column Name: "will_delay", Display Name: "Predicted to Delay", Type: Boolean
Model Id
Model Id: (Optional) Specify the unique identifier (GUID) of a previously trained model to use for predictions. When you train a model and save it, mindzieStudio assigns it a unique Model Id. By providing this Id, you can reuse the trained model without retraining, ensuring consistent predictions across different datasets or time periods.
Leave this field empty if you want the enrichment to train a new model each time it runs. Provide a Model Id when:
- You've already trained and validated a model that performs well
- You want to ensure consistency by using the same model over time
- You're applying predictions to a new dataset using an existing model
- You want to avoid the computational cost of retraining
The Model Id can be found in the enrichment execution logs or model management interface after successful model training.
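mindzieStudio manages model storage internally, but reusing a model by Id is conceptually similar to the save-and-reload workflow common in Python machine learning, sketched here with joblib. The file name, library, and data are illustrative assumptions, not the product's actual persistence mechanism:

```python
import os
import tempfile

import joblib
from sklearn.tree import DecisionTreeClassifier

# Train once on hypothetical numeric features and categorical outcomes...
model = DecisionTreeClassifier(random_state=0)
model.fit([[1, 10], [2, 20], [3, 30]], ["A", "A", "B"])

# ...persist the fitted model under an identifier...
path = os.path.join(tempfile.mkdtemp(), "model_a1b2c3d4.joblib")
joblib.dump(model, path)

# ...then reload it later for consistent predictions without retraining.
reloaded = joblib.load(path)
print(reloaded.predict([[3, 30]])[0])
```

Reloading the same fitted model guarantees identical predictions for identical inputs, which is the consistency benefit of supplying a Model Id.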
Python Image
Python Image: Specifies the Python execution environment to use for running the AI model training and prediction scripts. mindzieStudio supports multiple Python execution modes to accommodate different deployment scenarios.
Options:
- LOCAL: Uses the local Python installation on the mindzieStudio server. This is the fastest option when Python 3.x is installed locally with required machine learning libraries (pandas, scikit-learn, etc.)
- Docker Image Name: Specifies a Docker container image that contains Python and required libraries. Example: "python:3.9-slim" or custom images with pre-installed ML libraries
- Python not configured: Indicates that neither local Python nor Docker is available. You'll need to configure Python execution before using this enrichment.
The default behavior:
- If local Python is available, it automatically selects "LOCAL"
- If Docker is configured but not local Python, it uses the default Docker Python image
- If neither is available, it prompts you to configure Python execution
For production use, Docker images are recommended for consistency and isolation, while LOCAL is convenient for development and testing when you have full control over the server environment.
Examples
Example 1: Predicting Purchase Order Approval Outcomes
Scenario: A procurement organization wants to predict whether purchase orders will be approved or rejected based on order characteristics, so they can flag potential rejections early and work proactively with requesters to improve approval rates.
Settings:
- Prediction Type: Classification
- Feature Columns: Order_Amount, Department, Vendor_Category, Requester_Level, Budget_Available, Previous_Orders_Count, Urgency_Flag
- Predict Value Column: Approval_Outcome (contains "Approved" or "Rejected" for completed orders)
- Training Filters: "Approval_Outcome is not empty AND Case_End_Time is not empty" (use only completed orders with known outcomes)
- Prediction Filters: "Approval_Outcome is empty AND Case_Status equals 'Under Review'" (predict for orders currently being reviewed)
- New Prediction Column:
- Column Name: predicted_approval
- Display Name: Predicted Approval Outcome
- Data Type: String
- Model Id: (empty - train new model)
- Python Image: LOCAL
Output: The enrichment creates a new case attribute called "Predicted Approval Outcome" with values of either "Approved" or "Rejected" for each order under review. The prediction is based on patterns learned from historical orders, such as:
- Orders over $50,000 from new vendors are more likely to be rejected
- Orders with budget available and requester level "Manager" or higher are more likely to be approved
- Urgent orders with previous successful orders from the same vendor have higher approval rates
Insights: By analyzing the predictions, the procurement team discovers that 23% of current orders under review are predicted to be rejected. They proactively reach out to requesters of predicted rejections to gather additional justification, suggest alternative vendors, or split large orders into smaller approvals. This intervention improves the overall approval rate from 78% to 89% and reduces process cycle time by avoiding lengthy rejection-resubmission cycles.
Example 2: Healthcare Patient Readmission Risk Prediction
Scenario: A hospital wants to predict which discharged patients are at high risk of readmission within 30 days, enabling care coordinators to provide targeted follow-up support and reduce readmission rates.
Settings:
- Prediction Type: Classification
- Feature Columns: Patient_Age, Diagnosis_Category, Length_of_Stay, Comorbidity_Count, Prior_Admissions, Discharge_Destination, Medication_Complexity, Social_Support_Score
- Predict Value Column: Readmitted_30_Days (contains "Yes" or "No" for past discharge cases)
- Training Filters: "Discharge_Date is not empty AND Days_Since_Discharge >= 30" (use only cases where 30-day outcome is known)
- Prediction Filters: "Discharge_Date is not empty AND Days_Since_Discharge < 30" (predict for recent discharges)
- New Prediction Column:
- Column Name: readmission_risk_prediction
- Display Name: Predicted Readmission Risk
- Data Type: String
- Model Id: (empty)
- Python Image: LOCAL
Output: The enrichment adds a "Predicted Readmission Risk" attribute showing "Yes" or "No" for each recently discharged patient. Sample predictions show:
- Patient ID 45321: Age 72, Heart Failure, 8-day stay, 3 comorbidities, discharged to home alone = Predicted Risk "Yes"
- Patient ID 45322: Age 55, Minor Surgery, 2-day stay, no comorbidities, discharged to home with family = Predicted Risk "No"
- Patient ID 45323: Age 68, Pneumonia, 5-day stay, 2 comorbidities, prior admission 3 months ago = Predicted Risk "Yes"
Insights: The model identifies 78 patients in the last 30 days predicted to be at high risk of readmission. The care coordination team prioritizes these patients for home health visits, medication reviews, and follow-up appointments. After 90 days of using the predictions to guide interventions, the actual readmission rate for high-risk patients drops from 22% to 14%, demonstrating the value of proactive, data-driven patient management.
Example 3: Manufacturing Quality Defect Prediction
Scenario: A manufacturing company wants to predict which production orders will result in quality defects based on initial order parameters and early production metrics, allowing them to implement additional quality controls before defects occur.
Settings:
- Prediction Type: Classification
- Feature Columns: Product_Type, Batch_Size, Material_Supplier, Production_Line, Operator_Experience_Level, Temperature_Variance, First_Pass_Yield, Cycle_Time_Deviation
- Predict Value Column: Quality_Defect_Found (contains "Defect" or "Pass" for completed orders)
- Training Filters: "Production_Status equals 'Completed' AND Quality_Inspection_Complete equals true" (use only fully inspected completed orders)
- Prediction Filters: "Production_Status equals 'In Progress' AND Percent_Complete >= 25 AND Percent_Complete < 100" (predict for orders in production)
- New Prediction Column:
- Column Name: defect_prediction
- Display Name: Predicted Quality Outcome
- Data Type: String
- Model Id: (empty)
- Python Image: LOCAL
Output: The enrichment generates quality predictions for 156 orders currently in production. Example predictions:
- Order #10045: Large batch, new material supplier, high temperature variance = Predicted "Defect" (quality alert triggered)
- Order #10046: Standard product, experienced operator, normal metrics = Predicted "Pass"
- Order #10047: Complex product, Production Line B, cycle time 15% over normal = Predicted "Defect" (quality alert triggered)
The system creates a real-time quality dashboard showing predicted defects alongside actual production status, enabling quality engineers to intervene before orders complete.
Insights: Using the predictions, the quality team implements enhanced inspections and process adjustments for orders predicted to have defects. Over 3 months, they prevent 34 defective orders from reaching final inspection by catching issues early. The defect rate drops from 8.2% to 4.1%, and rework costs decrease by $127,000. The model reveals that orders with new material suppliers combined with high temperature variance have a 67% defect rate, leading to updated supplier qualification procedures and tighter temperature controls.
Example 4: Financial Loan Default Risk Prediction
Scenario: A financial institution wants to predict which approved loan applications are likely to default within the first 12 months, enabling risk managers to adjust loan terms, require additional collateral, or implement more frequent monitoring for high-risk loans.
Settings:
- Prediction Type: Classification
- Feature Columns: Loan_Amount, Credit_Score, Debt_to_Income_Ratio, Employment_Duration, Loan_Purpose, Property_Value, Down_Payment_Percent, Previous_Loans
- Predict Value Column: Defaulted_12_Months (contains "Default" or "Performing" for loans with 12+ months history)
- Training Filters: "Loan_Origination_Date < '2024-01-01' AND Months_Since_Origination >= 12" (use only loans with known 12-month outcomes)
- Prediction Filters: "Loan_Status equals 'Active' AND Months_Since_Origination < 12" (predict for recent loans)
- New Prediction Column:
- Column Name: default_risk_prediction
- Display Name: Predicted Default Risk
- Data Type: String
- Model Id: a1b2c3d4-e5f6-7890-a1b2-c3d4e5f67890 (using a previously trained and validated model)
- Python Image: LOCAL
Output: The enrichment applies the trained model to 892 active loans originated in the past 12 months, generating default risk predictions:
- 724 loans predicted as "Performing" (low risk)
- 168 loans predicted as "Default" (high risk)
Sample high-risk predictions:
- Loan #50012: $320K, credit score 640, DTI 42%, employment 8 months = "Default"
- Loan #50034: $180K, credit score 680, DTI 38%, previous late payments = "Default"
- Loan #50078: $425K, credit score 655, DTI 45%, high loan-to-value ratio = "Default"
Insights: The risk management team segments the portfolio into predicted risk levels and implements differentiated monitoring strategies. High-risk loans receive monthly check-ins versus quarterly for low-risk loans. They also adjust pricing models to account for predicted risk, increasing interest rates by 0.5-1.0% for high-risk profiles. After 12 months, the model's predictions prove 82% accurate, and the proactive monitoring reduces actual default rates in the high-risk segment from 15% to 9%, saving an estimated $2.3 million in losses.
Example 5: Customer Service Case Resolution Prediction
Scenario: A customer service organization wants to predict whether support tickets will be resolved within the target SLA timeframe based on initial ticket characteristics, allowing them to escalate at-risk cases early and improve SLA compliance rates.
Settings:
- Prediction Type: Classification
- Feature Columns: Issue_Category, Customer_Tier, Complexity_Score, Assigned_Team, Initial_Response_Time, Customer_Sentiment, Product_Version, Similar_Cases_Count
- Predict Value Column: Resolved_Within_SLA (contains "Yes" or "No" for closed tickets)
- Training Filters: "Ticket_Status equals 'Closed' AND Close_Date is not empty" (use only resolved tickets)
- Prediction Filters: "Ticket_Status equals 'Open' AND Hours_Since_Creation >= 2 AND Hours_Since_Creation < 24" (predict for recently opened tickets)
- New Prediction Column:
- Column Name: sla_compliance_prediction
- Display Name: Predicted SLA Compliance
- Data Type: String
- Model Id: (empty)
- Python Image: LOCAL
Output: The enrichment predicts SLA compliance for 234 currently open support tickets. Example predictions:
- Ticket #7845: Billing issue, Premium customer, Complexity 2, Team A, 15-min response = Predicted "Yes"
- Ticket #7846: Technical bug, Standard customer, Complexity 8, Team B, 45-min response = Predicted "No" (escalation triggered)
- Ticket #7847: Password reset, Basic customer, Complexity 1, Team C, 5-min response = Predicted "Yes"
The predictions are displayed in the support team dashboard with color-coding: green for predicted SLA compliance, red for predicted SLA breach.
Insights: Support managers use the predictions to proactively escalate at-risk tickets to senior engineers or allocate additional resources. Over 6 months, the SLA compliance rate improves from 83% to 91%. The model reveals that tickets with high complexity scores assigned to Team B during peak hours have only a 58% chance of meeting SLA, leading to workload rebalancing and additional training for Team B. The organization also discovers that initial response time is the strongest predictor of overall resolution time, prompting new policies to ensure first responses within 15 minutes.
Output
When the AI Case Prediction enrichment executes successfully, it creates a new case attribute in your dataset with the name you specified in the "New Prediction Column" configuration. This attribute is added as a derived column to the case table and appears alongside your other case attributes in all analysis dashboards, filters, and visualizations.
Prediction Values
The values stored in the new prediction column depend on your Predict Value Column data type:
For String (Text) Predictions:
- The column contains text values matching the categories from your training data
- Example: "Approved", "Rejected", "High Risk", "Low Risk", "Delayed", "On Time"
- These values can be used in filters, grouping, and color-coding in dashboards
For Boolean Predictions:
- The column contains True or False values
- Example: True = "Will Default", False = "Will Not Default"
- Ideal for binary outcome predictions and simple yes/no classifications
For Integer Predictions:
- The column contains numeric codes representing categories
- Example: 0 = "Low Risk", 1 = "Medium Risk", 2 = "High Risk"
- Useful when categories have a natural numeric ordering
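For example, integer prediction codes can be mapped back to readable labels for downstream analysis. The codes and labels below are hypothetical:

```python
import pandas as pd

# Hypothetical integer predictions and a code-to-label mapping.
predictions = pd.Series([0, 2, 1, 0, 2], name="risk_prediction")
labels = {0: "Low Risk", 1: "Medium Risk", 2: "High Risk"}

readable = predictions.map(labels)
print(readable.tolist())
# ['Low Risk', 'High Risk', 'Medium Risk', 'Low Risk', 'High Risk']
```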
Using Prediction Results
Once the prediction column is created, you can leverage it throughout mindzieStudio:
In Filters:
- Filter cases to show only high-risk predictions: "Predicted Risk equals 'High Risk'"
- Exclude low-risk cases from detailed analysis: "Predicted Outcome not equals 'Low Risk'"
- Combine predictions with other criteria: "Predicted Delay equals 'Yes' AND Order Amount > $10,000"
In Dashboards:
- Create performance charts grouped by predicted outcome
- Use predictions as color-coding in process maps to visualize risk across process paths
- Build KPI metrics showing prediction accuracy by comparing predicted vs actual outcomes
- Create heat maps showing predicted risk by department, product, or time period
In Further Enrichments:
- Use predictions as input to calculators (Example: "High Risk Score" calculator that considers predicted risk)
- Combine with other enrichments to create composite risk scores
- Use as filter criteria for targeted enrichments (Example: "Add compliance check only for predicted non-compliant cases")
For Process Improvement:
- Identify process patterns that lead to negative predicted outcomes
- Prioritize process redesign efforts on activities that most influence negative predictions
- Monitor prediction trends over time to measure process improvement effectiveness
- Compare predicted vs actual outcomes to validate and refine your model
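Comparing predicted against actual outcomes for historical cases reduces to a simple accuracy calculation. A sketch with hypothetical column names and data:

```python
import pandas as pd

# Hypothetical cases where both the prediction and the actual outcome are known.
cases = pd.DataFrame({
    "predicted_outcome": ["Approved", "Rejected", "Approved", "Approved", "Rejected"],
    "actual_outcome":    ["Approved", "Rejected", "Rejected", "Approved", "Rejected"],
})

# Share of cases where the prediction matched reality.
accuracy = (cases["predicted_outcome"] == cases["actual_outcome"]).mean()
print(f"Validation accuracy: {accuracy:.0%}")  # Validation accuracy: 80%
```

Tracking this figure over time shows whether the model still reflects current process behavior or needs retraining.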
Model Training Output
When training a new model (when Model Id is not provided), the enrichment generates additional artifacts:
Training Files:
- Training.csv: The filtered case data used for model training
- Training.schema: Data type definitions for training columns
- Prediction.csv: The filtered case data requiring predictions
- Prediction.schema: Data type definitions for prediction columns
Model Files:
- script.py: The Python script that trains and applies the model
- model_trainer.py: The model training logic
- mindzie_helper.py: Utility functions for data loading and processing
Console Output: The enrichment execution logs show:
- "Loading training data..." with row counts
- "Fitting model to training data..." with progress indicators
- "Model training completed successfully!"
- "Loading prediction data..." with row counts
- "Generating predictions..." with completion status
- "Successfully saved predictions to: out/Prediction.csv"
This detailed output helps you verify that training completed successfully and understand the scope of predictions generated.
Prediction Quality Indicators
For production use, consider monitoring these quality indicators:
- Prediction Coverage: the percentage of cases that received predictions versus those skipped due to missing feature values
- Prediction Distribution: whether predictions are balanced or heavily skewed toward one outcome
- Validation Accuracy: the accuracy rate when comparing predicted versus actual outcomes for historical cases
- Missing Value Handling: which cases failed to receive predictions due to incomplete feature data
By analyzing these indicators, you can iteratively improve your feature selection, training filters, and data quality to enhance prediction accuracy and business value.
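Coverage and distribution can be computed directly from the prediction column in any exported case data. A sketch with hypothetical values, assuming cases without a prediction appear as nulls:

```python
import pandas as pd

# Hypothetical prediction column: None means the case received no prediction.
preds = pd.Series(["Approved", "Approved", None, "Rejected", "Approved", None])

coverage = preds.notna().mean()                    # share of cases predicted
distribution = preds.value_counts(normalize=True)  # skew toward one outcome?

print(f"Coverage: {coverage:.0%}")                 # Coverage: 67%
print(distribution.to_dict())                      # {'Approved': 0.75, 'Rejected': 0.25}
```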
See Also
Related AI and Advanced Enrichments:
- Python - Execute custom Python code for advanced data transformations and analysis
- Representative Case Attribute - Analyze and select representative values from case attributes
- Group Attribute Values - Group and categorize attribute values for analysis
Related Predictive Topics:
- Case Duration Analysis - Analyze historical case durations to inform duration predictions
- Process Simulation - Simulate future process performance using predictive models
- Risk Management Dashboards - Visualize and monitor predicted risks across your process
Machine Learning Best Practices:
- Model Training Guidelines - Best practices for training accurate prediction models
- Feature Engineering - Techniques for selecting and creating effective feature columns
- Model Validation - Methods for testing and validating model accuracy before deployment
- Production Deployment - Strategies for deploying AI predictions in production environments
This documentation is part of the mindzieStudio process mining platform.