Overview
The Predict Value enrichment predicts numeric attribute values from historical patterns in your process data. It analyzes completed cases with known outcomes, groups them by the input attributes you select, and aggregates their values (average, median, maximum, or minimum) to estimate the same attribute for ongoing or new cases. By relating the input attributes to the dependent variable in this way, the enrichment turns observed patterns into data-driven predictions of future values.
This enrichment is particularly valuable for forecasting and planning in process mining scenarios. It enables organizations to predict process outcomes before completion, estimate financial impacts based on early indicators, and make proactive decisions based on likely future values. The enrichment uses configurable aggregation functions and historical case analysis to provide not just predictions but also confidence scores, helping users understand the reliability of each prediction. The operator can handle complex scenarios including minimum value constraints and fallback calculations, ensuring robust predictions even when historical data is limited.
Common Uses
- Predict delivery times based on order characteristics and customer location
- Estimate final invoice amounts from initial purchase order details
- Forecast production output based on input materials and process parameters
- Predict customer satisfaction scores from early interaction indicators
- Estimate project completion dates based on initial milestones
- Forecast resource consumption based on process attributes
- Predict quality scores from production line parameters
Settings
New Attribute Name: Specify the name for the new attribute that will store the predicted value. This attribute will contain the numeric prediction for each case. Choose a descriptive name that clearly indicates what value is being predicted, such as "Predicted_Delivery_Days" or "Estimated_Final_Cost".
Algorithm Name (Optional): Provide a custom name for the prediction algorithm. This name will be stored in a companion attribute (alongside the prediction and confidence score) to help track which method was used for each prediction. Useful when testing different prediction configurations or when multiple prediction enrichments are applied.
Input Attribute Names: Optionally select one or more string attributes that will be used to group cases for prediction. Cases with matching values in all of these attributes are considered similar and are used together for prediction. For example, selecting "Customer_Region" and "Product_Category" means predictions will be based on historical cases from the same region and product category. If no attributes are selected, all cases that have a value for the dependent attribute are used as predictors.
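The matching rule for input attributes can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: the dictionary-based case representation and the function name `matching_cases` are assumptions made for the example.

```python
from typing import Any, Mapping, Sequence


def matching_cases(
    history: Sequence[Mapping[str, Any]],
    target: Mapping[str, Any],
    input_attributes: Sequence[str],
    dependent_attribute: str,
) -> list[Mapping[str, Any]]:
    """Return the historical cases that can act as predictors for `target`."""
    return [
        case
        for case in history
        # Only cases with a known outcome can be used as predictors.
        if case.get(dependent_attribute) is not None
        # With no input attributes, all() over an empty sequence is True,
        # so every case with a known outcome is kept.
        and all(case.get(a) == target.get(a) for a in input_attributes)
    ]


# Hypothetical sample data for illustration only.
history = [
    {"Customer_Region": "West", "Product_Category": "Toys", "Actual_Delivery_Days": 3.0},
    {"Customer_Region": "West", "Product_Category": "Books", "Actual_Delivery_Days": 4.0},
    {"Customer_Region": "East", "Product_Category": "Toys", "Actual_Delivery_Days": 6.0},
    {"Customer_Region": "West", "Product_Category": "Toys", "Actual_Delivery_Days": None},
]
new_case = {"Customer_Region": "West", "Product_Category": "Toys"}

similar = matching_cases(
    history, new_case, ["Customer_Region", "Product_Category"], "Actual_Delivery_Days"
)
print(len(similar))  # 1
```

Only the first historical case matches on both attributes and has a known outcome; the last one matches but is skipped because its dependent value is missing.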
Dependent Attribute Name: Select the numeric attribute you want to predict. This must be a numeric field (integer or decimal) that exists in some completed cases but may be missing in ongoing cases. The enrichment will analyze historical values of this attribute to make predictions for cases where it's not yet available.
Min Value Attribute Name (Optional): Select a numeric attribute that provides a minimum threshold for predictions. When specified, predictions will never be lower than this value. This is useful for business rules like "predicted delivery time cannot be less than current elapsed time" or "estimated cost cannot be below material cost". The attribute must be different from the dependent attribute.
Filter (Optional): Apply filters to limit which historical cases are used for building the prediction model. This allows you to exclude outliers, focus on recent data, or use only high-quality cases for prediction. For example, you might filter to use only cases from the last 6 months or exclude cases with data quality issues.
Aggregate Function: Choose the statistical function used to combine historical values into a prediction:
- Average: Uses the mean of historical values (default, balances all observations)
- Median: Uses the middle value (robust against outliers)
- Max: Uses the highest historical value (conservative for upper bounds)
- Min: Uses the lowest historical value (conservative for lower bounds)
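To see how the choice of aggregate function changes the prediction, the following Python sketch applies all four options to the same values. The `AGGREGATES` mapping and the sample values are assumptions for illustration; they only mirror the option names listed above.

```python
import statistics
from typing import Callable, Sequence

# Hypothetical mapping from the setting's option names to Python functions.
AGGREGATES: dict[str, Callable[[Sequence[float]], float]] = {
    "Average": statistics.mean,
    "Median": statistics.median,
    "Max": max,
    "Min": min,
}

# Three typical values and one outlier.
values = [3.0, 4.0, 5.0, 20.0]

for name, aggregate in AGGREGATES.items():
    print(name, aggregate(values))
```

With these values the average is pulled up to 8.0 by the single outlier, while the median stays at 4.5, which is why Median is the more robust choice when outliers are expected.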
Min Cases: Set the minimum number of historical cases required to make a prediction. Default is 2. If fewer matching cases are available, no prediction will be made unless a minimum value constraint provides a fallback. Higher values increase prediction reliability but may result in fewer predictions.
Max Cases: Set the maximum number of recent cases to use for prediction. Default is 10. The enrichment uses the most recent cases up to this limit, ensuring predictions reflect current patterns rather than outdated historical data. Lower values make predictions more responsive to recent changes.
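The interaction between Min Cases and Max Cases can be sketched as below. This is an illustration under stated assumptions: the function name `select_recent` is hypothetical, and the input is assumed to be already ordered from most recent to oldest, since this page does not specify how recency is determined.

```python
from typing import Optional, Sequence


def select_recent(
    values_newest_first: Sequence[float],
    min_cases: int = 2,
    max_cases: int = 10,
) -> Optional[list[float]]:
    """Pick the values to aggregate, or None if there are too few cases.

    ASSUMPTION: `values_newest_first` is already sorted from the most
    recent matching case to the oldest.
    """
    used = list(values_newest_first[:max_cases])  # keep at most max_cases
    if len(used) < min_cases:
        return None  # not enough history for a standard prediction
    return used


print(select_recent([4.0]))                                      # None
print(len(select_recent([float(i) for i in range(25)], 5, 20)))  # 20
```

A return value of `None` corresponds to the situation where no standard prediction is made and only a minimum value fallback, if configured, can supply a value.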
Min Value Constant: When using minimum value constraints, this constant is added to the scaled minimum value to create a fallback prediction. Default is 0. The fallback is calculated as (minimum value × Min Value Factor) + Min Value Constant. For example, with a minimum value of 100, the default factor of 1.0, and a constant of 10, the fallback would be 110. This ensures predictions meet business requirements even when historical data is insufficient.
Min Value Factor: When using minimum value constraints, this factor multiplies the minimum value in the fallback calculation, before the Min Value Constant is added. Default is 1.0. For example, with a minimum value of 100, a factor of 1.2, and the default constant of 0, the fallback would be 120. This allows proportional adjustments based on the minimum threshold.
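The fallback calculation can be sketched in Python as follows. The formula, minimum value × factor + constant, follows the worked fallback in Example 5. The function names are hypothetical, and the way a standard prediction is raised to the minimum value (a simple `max`) is an assumption: this page only states that predictions are never lower than the minimum value.

```python
from typing import Optional


def fallback_value(min_value: float, constant: float = 0.0, factor: float = 1.0) -> float:
    """Fallback prediction used when too few historical cases are available."""
    return min_value * factor + constant


def final_prediction(
    aggregated: Optional[float],
    min_value: float,
    algorithm_name: str,
    constant: float = 0.0,
    factor: float = 1.0,
) -> tuple[float, str]:
    """Return (prediction, algorithm label)."""
    if aggregated is None:
        # Too little history: use the fallback and label it "Fixed".
        return fallback_value(min_value, constant, factor), "Fixed"
    # ASSUMPTION: a standard prediction below the minimum is raised to it.
    return max(aggregated, min_value), algorithm_name


print(fallback_value(100, constant=10))            # 110.0
print(final_prediction(None, 2.0, "My_Model", constant=1.0))  # (3.0, 'Fixed')
```

The second call reproduces the situation in Example 4: a patient on day 2 with no usable history receives the fallback of 2 × 1.0 + 1 = 3 days.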
Examples
Example 1: Predicting Delivery Times in E-commerce
Scenario: An online retailer wants to predict delivery times for new orders based on historical delivery patterns, considering customer location and shipping method to set accurate customer expectations.
Settings:
- New Attribute Name: Predicted_Delivery_Days
- Algorithm Name: Regional_Shipping_Model
- Input Attribute Names: Customer_Region, Shipping_Method
- Dependent Attribute Name: Actual_Delivery_Days
- Min Value Attribute Name: Current_Days_In_Transit
- Filter: Order_Date > 30 days ago
- Aggregate Function: Average
- Min Cases: 5
- Max Cases: 20
- Min Value Constant: 1
- Min Value Factor: 1.1
Output: Creates three new case attributes:
- Predicted_Delivery_Days: The estimated number of days for delivery (e.g., 5.3 days)
- Predicted_Delivery_Days - Confidence: Confidence score between 0 and 1 (e.g., 0.75)
- Predicted_Delivery_Days - Algorithm: Algorithm used ("Regional_Shipping_Model" or "Fixed" for fallback)
For a new order from Region_West using Express_Shipping, the enrichment finds 15 similar historical orders averaging 3.2 days, resulting in a prediction of 3.2 days with a confidence of 15 / (20 + 1) ≈ 0.71.
Insights: The prediction helps set realistic delivery expectations, identify orders likely to be delayed, and optimize shipping method selection based on predicted versus promised delivery times.
Example 2: Forecasting Invoice Amounts in Procurement
Scenario: A procurement department needs to predict final invoice amounts based on initial purchase requisition details to improve budget planning and identify potential cost overruns early.
Settings:
- New Attribute Name: Predicted_Invoice_Amount
- Input Attribute Names: Vendor_Name, Material_Category
- Dependent Attribute Name: Final_Invoice_Amount
- Min Value Attribute Name: Initial_PO_Amount
- Aggregate Function: Median
- Min Cases: 3
- Max Cases: 15
- Min Value Constant: 0
- Min Value Factor: 1.05
Output: Creates prediction attributes showing estimated final invoice amount. For a new purchase order of $10,000 from Vendor_A for Raw_Materials:
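- Predicted_Invoice_Amount: $10,750 (the median Final_Invoice_Amount of the matching historical cases, 7.5% above the PO amount)
- Confidence: 0.56 (using 9 historical cases: 9 / (15 + 1))
- Algorithm: The configured Algorithm Name for a standard prediction (none is set in this example), or "Fixed" if the fallback is used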
Insights: Enables proactive budget management, early identification of vendors with consistent overages, and improved accuracy in financial planning.
Example 3: Estimating Manufacturing Quality Scores
Scenario: A manufacturing plant wants to predict quality scores for products currently in production based on early process parameters, enabling early intervention for potential quality issues.
Settings:
- New Attribute Name: Predicted_Quality_Score
- Input Attribute Names: Production_Line, Product_Type, Shift
- Dependent Attribute Name: Final_Quality_Score
- Filter: Production_Date > 60 days ago AND Final_Quality_Score IS NOT NULL
- Aggregate Function: Average
- Min Cases: 10
- Max Cases: 30
Output: For products currently in production on Line_A making Product_Type_X during Day_Shift:
- Predicted_Quality_Score: 92.5 (scale 0-100)
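- Confidence: 0.84 (based on 26 similar historical cases: 26 / (30 + 1))
- Algorithm: The configured Algorithm Name for a standard prediction (none is set in this example); no fallback is possible here because no Min Value Attribute is configured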
Insights: Allows quality teams to focus inspection efforts on products with low predicted scores, adjust process parameters proactively, and reduce quality-related rework costs.
Example 4: Predicting Patient Length of Stay in Healthcare
Scenario: A hospital wants to predict patient length of stay based on admission diagnosis and initial assessment data to optimize bed management and resource allocation.
Settings:
- New Attribute Name: Predicted_LOS_Days
- Input Attribute Names: Admission_Diagnosis, Patient_Age_Group, Admission_Type
- Dependent Attribute Name: Actual_LOS_Days
- Min Value Attribute Name: Current_LOS_Days
- Aggregate Function: Median
- Min Cases: 8
- Max Cases: 25
- Min Value Constant: 1
- Min Value Factor: 1.0
Output: For a newly admitted elderly patient with pneumonia through emergency admission currently on day 2:
- Predicted_LOS_Days: 7 days (median of similar cases)
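- Confidence: 0.73 (for example, if 19 similar cases are used: 19 / (25 + 1))
- Algorithm: The configured Algorithm Name for a standard prediction (none is set in this example). If fewer than 8 matching historical cases were available, it would show "Fixed" and the prediction would fall back to current LOS × 1.0 + 1 = 3 days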
Insights: Enables better bed capacity planning, helps identify patients likely to have extended stays requiring additional support, and improves discharge planning processes.
Example 5: Forecasting Project Costs in Construction
Scenario: A construction company needs to predict final project costs based on initial project characteristics to improve bidding accuracy and identify projects at risk of cost overruns.
Settings:
- New Attribute Name: Predicted_Total_Cost
- Input Attribute Names: Project_Type, Client_Industry, Project_Region
- Dependent Attribute Name: Final_Project_Cost
- Min Value Attribute Name: Current_Spent_Amount
- Filter: Project_Start_Date > 365 days ago
- Aggregate Function: Average
- Min Cases: 4
- Max Cases: 12
- Min Value Constant: 50000
- Min Value Factor: 1.15
Output: For a new commercial building project in Region_North for a retail client with $2M already spent:
- Predicted_Total_Cost: $3,500,000 (based on 8 similar historical projects)
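- Confidence: 0.62 (8 / (12 + 1))
- Algorithm: The configured Algorithm Name for a standard prediction (none is set in this example), or "Fixed" if the fallback is used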
If historical data is insufficient, uses fallback: $2,000,000 × 1.15 + $50,000 = $2,350,000
Insights: Improves project profitability through accurate cost prediction, enables early intervention for projects trending over budget, and supports more competitive and realistic bidding strategies.
Output
The Predict Value enrichment creates three related case attributes that work together to provide comprehensive prediction information:
Primary Prediction Attribute: Named according to your "New Attribute Name" setting, this attribute contains the predicted numeric value. The data type is always Double (decimal number) to accommodate precise predictions. Values are calculated based on historical patterns or minimum value constraints when applicable.
Confidence Score Attribute: Automatically created with the name format "[New Attribute Name] - Confidence", this attribute contains a confidence score between 0 and 1 indicating prediction reliability. Higher values indicate more historical cases were available for prediction. The confidence is calculated as: (number of cases used) / (maximum cases + 1). Because at most Max Cases cases are used, the score is always below 1; with the default Max Cases of 10, the highest possible score is 10 / 11 ≈ 0.91.
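The confidence formula is simple enough to check by hand. The short Python sketch below applies it to the case counts used in the examples on this page; the function name `confidence` is hypothetical.

```python
def confidence(cases_used: int, max_cases: int) -> float:
    """Confidence score: (number of cases used) / (maximum cases + 1)."""
    return cases_used / (max_cases + 1)


# (cases used, Max Cases) pairs taken from Examples 1, 2, 3 and 5.
for used, max_cases in [(15, 20), (9, 15), (26, 30), (8, 12)]:
    print(used, max_cases, f"{confidence(used, max_cases):.2f}")
```

Because the denominator is Max Cases + 1, raising Max Cases without a matching increase in available history lowers the confidence score for the same number of cases used.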
Algorithm Tracking Attribute: Automatically created with the name format "[New Attribute Name] - Algorithm", this string attribute records which method was used for each prediction. It will contain either your custom algorithm name (if specified) for standard predictions, or "Fixed" when fallback calculations based on minimum values were used.
These attributes integrate seamlessly with other mindzieStudio features - use them in filters to identify high-confidence predictions, in calculators to compare predicted versus actual values, or in visualizations to analyze prediction accuracy patterns.
See Also
- Categorize Attribute Values - Group continuous predicted values into categories
- Representative Case Attribute - Alternative approach using most common values
- Python - Create custom prediction models with machine learning
- Add - Combine multiple predictions or add adjustments
- Multiply - Apply scaling factors to predictions
This documentation is part of the mindzie Studio process mining platform.