Set Group Value

Overview

The Set Group Value enrichment calculates a summary statistic for each group of cases and writes that value back to every case in the group. Cases that share the same value in a chosen attribute form a group; an aggregate such as a sum, average, or count is computed over each group; and a new attribute is populated with the group's result on every member case. Each case therefore carries information about its peer group's collective characteristics, which enables group-based analysis at the case level.

This enrichment is essential for comparative analysis and benchmarking in process mining. It allows you to enrich individual cases with contextual information about their group's overall performance, enabling insights like "this order's value compared to the average for its product category" or "this patient's treatment duration relative to others with the same diagnosis." By bringing group-level metrics to the case level, you can identify outliers, establish baselines, and understand how individual process instances relate to their peer groups. The enrichment supports various aggregation functions and can work with filtered subsets of data, providing flexibility in defining what constitutes a meaningful group for analysis.

Common Uses

Calculate average processing time per department and assign it to all cases in each department
Determine total order value by customer and populate each order with the customer's total spend
Count the number of cases per vendor and add this count to each case for vendor volume analysis
Find the maximum or minimum values within product categories for pricing analysis
Compute median treatment duration by diagnosis group for healthcare benchmarking
Calculate sum of quantities per warehouse location for inventory distribution insights
Determine average approval time by region for geographic performance comparison

Settings

Filter (Optional): Apply filters to limit which cases are included in the group calculations. Only cases matching the filter criteria will be considered when computing aggregate values. This allows you to calculate group statistics on specific subsets, such as completed cases only, high-priority items, or transactions within a certain time period. Cases excluded by the filter will not receive the new attribute value.

New Attribute Name: Specify the name for the new case attribute that will store the calculated group value. Choose a descriptive name that indicates both the grouping logic and the aggregate function applied, for example "Avg_Duration_By_Department" or "Total_Orders_Per_Customer". The name must be unique and cannot conflict with existing attributes in your dataset.

Group by column name: Select the attribute used to define groups. Cases with the same value in this attribute will be grouped together for the aggregate calculation. This can be any categorical attribute like department, vendor, product category, customer ID, or region. The grouping attribute determines how your data is segmented for the aggregation. Each unique value in this column creates a separate group.

Value column name: Choose the attribute whose values will be aggregated within each group. This is the source data for your calculation. For example, if you are calculating average duration by department, this would be your duration attribute. The available aggregation functions adjust based on the data type of this column: numeric and duration columns support mathematical operations, while text and date columns have a more limited set of options.

Aggregate Function: Select the statistical function to apply to the values within each group. The available functions depend on the data type of your value column:

Sum: Total all values in the group (numeric and duration attributes only)
Average: Calculate the arithmetic mean of group values (numeric and duration attributes)
Median: Find the middle value when group values are sorted (numeric and duration attributes)
Min: Identify the smallest value in the group (works with numbers, dates, and durations)
Max: Identify the largest value in the group (works with numbers, dates, and durations)
Count: Count non-null values in the group (all data types)
Distinct Count: Count unique values in the group (all data types)
Null Count: Count missing/null values in the group (all data types)
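To make the differences between these functions concrete, the following is a minimal Python sketch, not the product's implementation. The `AGGREGATES` registry and the `_present` helper are hypothetical names introduced for illustration. The sketch assumes that Min, Max, and Distinct Count ignore nulls in the same way the Output section describes for Sum, Average, and Median; that assumption is not confirmed by this page.

```python
from statistics import mean, median

def _present(values):
    """Return the non-null values of a group (hypothetical helper)."""
    return [v for v in values if v is not None]

# Hypothetical registry: function names from the list above mapped to
# plain-Python equivalents. Each callable receives all values of one group.
AGGREGATES = {
    "Sum": lambda vs: sum(_present(vs)),
    "Average": lambda vs: mean(_present(vs)),
    "Median": lambda vs: median(_present(vs)),
    "Min": lambda vs: min(_present(vs)),            # assumption: nulls ignored
    "Max": lambda vs: max(_present(vs)),            # assumption: nulls ignored
    "Count": lambda vs: len(_present(vs)),
    "Distinct Count": lambda vs: len(set(_present(vs))),  # assumption: nulls ignored
    "Null Count": lambda vs: sum(1 for v in vs if v is None),
}

# One group's values, including a missing value.
group_values = [10, 20, 20, None, 50]
results = {name: fn(group_values) for name, fn in AGGREGATES.items()}

for name, value in results.items():
    print(f"{name}: {value}")
```

The sketch does not handle a group whose values are all null; how the enrichment behaves in that case is not stated here.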

Examples

Example 1: Average Processing Time by Department

Scenario: In a loan approval process, management wants to understand the average processing time for each department to identify performance variations and set realistic SLA targets.

Settings:

Filter: Status = "Completed"
New Attribute Name: Avg_Processing_Hours_By_Dept
Group by column name: Department
Value column name: Total_Processing_Hours
Aggregate Function: Average

Output: Each completed loan application receives "Avg_Processing_Hours_By_Dept", containing the average processing time for all completed loans in that department:

Commercial Banking department average: 72.5 hours (assigned to all 150 cases)
Retail Banking department average: 24.3 hours (assigned to all 890 cases)
Private Banking department average: 48.7 hours (assigned to all 75 cases)

Now each case shows both its individual processing time and its department's average, enabling immediate comparison.

Insights: Loan officers can quickly identify if a particular application is taking longer than the department average, and management can see that Commercial Banking has the longest average processing time, suggesting a need for process optimization or additional resources.
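The settings in this example can be approximated outside the tool with a short Python sketch. The rows below are invented for illustration and do not reproduce the figures above, and the code is only an assumption about the logic, not the enrichment's actual implementation. Cases that fail the filter are left without the new attribute, as described under Settings.

```python
from collections import defaultdict
from statistics import mean

# Invented sample rows using the attribute names from this example.
cases = [
    {"Case_ID": "L-1", "Department": "Retail Banking", "Status": "Completed", "Total_Processing_Hours": 20.0},
    {"Case_ID": "L-2", "Department": "Retail Banking", "Status": "Completed", "Total_Processing_Hours": 30.0},
    {"Case_ID": "L-3", "Department": "Retail Banking", "Status": "In Progress", "Total_Processing_Hours": 90.0},
    {"Case_ID": "L-4", "Department": "Commercial Banking", "Status": "Completed", "Total_Processing_Hours": 72.0},
]

# Filter: Status = "Completed"
included = [c for c in cases if c["Status"] == "Completed"]

# Group by Department and collect the value column for each group.
hours_by_dept = defaultdict(list)
for c in included:
    hours_by_dept[c["Department"]].append(c["Total_Processing_Hours"])

# Aggregate Function: Average
avg_by_dept = {dept: mean(hours) for dept, hours in hours_by_dept.items()}

# Write the group average back to every case that passed the filter.
for c in included:
    c["Avg_Processing_Hours_By_Dept"] = avg_by_dept[c["Department"]]

for c in cases:
    print(c["Case_ID"], c.get("Avg_Processing_Hours_By_Dept"))
```

Note that the in-progress case L-3 neither contributes to the Retail Banking average nor receives a value.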

Example 2: Total Customer Order Value

Scenario: An e-commerce company needs to identify high-value customers by calculating each customer's total order value across all their purchases and adding this information to each order.

Settings:

Filter: Order_Status NOT IN ("Cancelled", "Returned")
New Attribute Name: Customer_Total_Spend
Group by column name: Customer_ID
Value column name: Order_Amount
Aggregate Function: Sum

Output: Each order that passes the filter now includes the customer's total spend across all orders that were not cancelled or returned:

Customer_ID "C10234": Total spend $15,750 (assigned to all 23 orders)
Customer_ID "C10891": Total spend $3,200 (assigned to all 8 orders)
Customer_ID "C11567": Total spend $45,900 (assigned to all 67 orders)

Insights: Sales teams can immediately see when processing an order from a high-value customer, enabling prioritized service. Marketing can identify VIP customers for special promotions based on total spend thresholds.

Example 3: Case Count by Vendor for Workload Analysis

Scenario: A procurement department wants to understand vendor workload distribution by counting how many purchase orders each vendor handles, adding this count to every PO for context.

Settings:

Filter: PO_Date >= "2024-01-01"
New Attribute Name: Vendor_PO_Count
Group by column name: Vendor_Name
Value column name: Case_ID
Aggregate Function: Count

Output: Every purchase order dated on or after 2024-01-01 shows how many POs that vendor has received in the same period:

Vendor "TechSupplies Inc": 145 POs (count added to each of their POs)
Vendor "Office Essentials": 892 POs (count added to each of their POs)
Vendor "Industrial Parts Co": 43 POs (count added to each of their POs)

Insights: Procurement can identify over-reliance on specific vendors (Office Essentials handling 892 POs suggests high dependency) and underutilized vendors who might handle more volume.

Example 4: Maximum Treatment Cost by Diagnosis

Scenario: A hospital wants to identify the highest treatment cost within each diagnosis group to understand cost variations and identify expensive outlier cases.

Settings:

Filter: Treatment_Complete = "Yes" AND Billing_Finalized = "Yes"
New Attribute Name: Max_Cost_In_Diagnosis_Group
Group by column name: Primary_Diagnosis_Code
Value column name: Total_Treatment_Cost
Aggregate Function: Max

Output: Each patient case includes the maximum cost observed for their diagnosis:

Diagnosis "J18.9" (Pneumonia): Max cost $45,000 (all 234 cases show this max)
Diagnosis "I21.9" (Heart Attack): Max cost $125,000 (all 89 cases show this max)
Diagnosis "K35.8" (Appendicitis): Max cost $32,000 (all 156 cases show this max)

Because every case carries its group's maximum, analysts can immediately see how close each case's treatment cost is to the highest cost observed for its diagnosis group.

Insights: Healthcare administrators can identify cases whose costs are at or near the group maximum, potentially indicating complications or inefficiencies requiring investigation.

Example 5: Median Resolution Time by Priority Level

Scenario: An IT service desk wants to establish baseline resolution times by calculating the median time to resolve tickets at each priority level.

Settings:

Filter: Ticket_Status = "Resolved" AND Created_Date >= DateAdd(Today(), -90, "days")
New Attribute Name: Median_Resolution_Hours_By_Priority
Group by column name: Priority_Level
Value column name: Resolution_Duration_Hours
Aggregate Function: Median

Output: Each ticket shows the median resolution time for its priority level:

Priority 1 (Critical): Median 2.5 hours (assigned to 145 tickets)
Priority 2 (High): Median 8.0 hours (assigned to 512 tickets)
Priority 3 (Medium): Median 24.0 hours (assigned to 1,234 tickets)
Priority 4 (Low): Median 72.0 hours (assigned to 2,891 tickets)

Insights: Service desk managers can immediately identify tickets that exceed the median resolution time for their priority level, indicating potential SLA violations or process issues requiring attention.
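The comparison described in the insight can be sketched in Python as follows. The ticket rows are invented, and `Above_Median` is a hypothetical follow-up attribute that the enrichment itself does not create; in practice it would come from a separate filter or calculator. The filter step is omitted to keep the sketch short.

```python
from collections import defaultdict
from statistics import median

# Invented sample tickets using the attribute names from this example.
tickets = [
    {"Ticket": "T-1", "Priority_Level": 1, "Resolution_Duration_Hours": 2.0},
    {"Ticket": "T-2", "Priority_Level": 1, "Resolution_Duration_Hours": 3.0},
    {"Ticket": "T-3", "Priority_Level": 1, "Resolution_Duration_Hours": 9.0},
    {"Ticket": "T-4", "Priority_Level": 2, "Resolution_Duration_Hours": 6.0},
    {"Ticket": "T-5", "Priority_Level": 2, "Resolution_Duration_Hours": 10.0},
]

# Group by Priority_Level and compute the median duration per group.
durations = defaultdict(list)
for t in tickets:
    durations[t["Priority_Level"]].append(t["Resolution_Duration_Hours"])
medians = {level: median(values) for level, values in durations.items()}

for t in tickets:
    # Value the enrichment would assign to every ticket in the group.
    t["Median_Resolution_Hours_By_Priority"] = medians[t["Priority_Level"]]
    # Hypothetical downstream flag: did this ticket take longer than its group's median?
    t["Above_Median"] = (
        t["Resolution_Duration_Hours"] > t["Median_Resolution_Hours_By_Priority"]
    )

for t in tickets:
    print(t["Ticket"], t["Median_Resolution_Hours_By_Priority"], t["Above_Median"])
```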

Output

The Set Group Value enrichment creates a new case attribute containing the calculated aggregate value for each case's group. Every case within the same group receives the identical calculated value, enabling group-level comparisons and analysis at the individual case level.

Data Type Determination: The output attribute's data type depends on both the selected aggregate function and the source column type:

Count functions (Count, Distinct Count, Null Count) always produce integer values
Sum, Average, and Median preserve the source column type (numeric values remain numeric, durations remain durations)
Min and Max maintain the exact data type of the source column
When working with TimeSpan columns, Sum, Average, and Median operations return TimeSpan values
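As a rough analogy for these type rules, Python's `timedelta` behaves much like a duration column: summing, averaging, and taking the median all yield another `timedelta`, while a count is a plain integer. This is only an illustration in Python; it makes no claim about how the product represents TimeSpan values internally.

```python
from datetime import timedelta
from statistics import median

# Three durations belonging to one group.
durations = [timedelta(hours=2), timedelta(hours=4), timedelta(hours=9)]

total = sum(durations, timedelta())   # Sum of durations is a duration
average = total / len(durations)      # Average of durations is a duration
middle = median(durations)            # Median of durations is a duration
count = len(durations)                # Count is always an integer

print(total, average, middle, count)
```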

Group Calculation Process: The enrichment first identifies all unique values in the grouping column, then calculates the aggregate function separately for each group using only the cases belonging to that group (and matching any applied filters). Finally, it assigns the calculated value back to every case in the corresponding group.
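The three steps described above (apply the filter, aggregate per group, assign back) can be sketched as a single Python function. `set_group_value` and its parameters are hypothetical names chosen to mirror the settings on this page; this is an assumption about the logic, not the enrichment's actual code. Cases with a missing grouping value are collected under the key `None`, so they form their own group in this sketch.

```python
from collections import defaultdict

def set_group_value(cases, group_by, value_column, aggregate, new_attribute, case_filter=None):
    """Hypothetical helper: compute one value per group and write it to each member case."""
    # Step 1: keep only the cases that match the optional filter.
    included = [c for c in cases if case_filter is None or case_filter(c)]

    # Step 2: collect the value column per group and aggregate each group.
    groups = defaultdict(list)
    for c in included:
        groups[c.get(group_by)].append(c.get(value_column))
    group_values = {key: aggregate(values) for key, values in groups.items()}

    # Step 3: assign the group's value back to every included case.
    for c in included:
        c[new_attribute] = group_values[c.get(group_by)]
    return cases

# Invented sample orders, loosely following Example 2.
orders = [
    {"Customer_ID": "C1", "Order_Amount": 100, "Order_Status": "Shipped"},
    {"Customer_ID": "C1", "Order_Amount": 50, "Order_Status": "Cancelled"},
    {"Customer_ID": "C1", "Order_Amount": 25, "Order_Status": "Shipped"},
    {"Customer_ID": "C2", "Order_Amount": 40, "Order_Status": "Shipped"},
]

set_group_value(
    orders,
    group_by="Customer_ID",
    value_column="Order_Amount",
    aggregate=sum,
    new_attribute="Customer_Total_Spend",
    case_filter=lambda o: o["Order_Status"] not in ("Cancelled", "Returned"),
)

for o in orders:
    print(o["Customer_ID"], o["Order_Amount"], o.get("Customer_Total_Spend"))
```

Passing the aggregate as a callable keeps the grouping logic independent of the chosen function, which matches how the Aggregate Function setting is selected separately from the grouping and value columns.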

Null Value Handling: If the grouping column contains null values, cases with a null grouping value form their own group. For the value column, null handling depends on the aggregate function: Count excludes nulls, Null Count counts only nulls, and Sum, Average, and Median skip nulls in their calculations. Cases excluded by the filter do not receive the new attribute, and cases with a null grouping value may not receive it either.

Integration Capabilities: The new group value attribute integrates seamlessly with other mindzieStudio features. Use it in filters to identify cases above or below group averages, in calculators to derive additional metrics like "percentage of group total," in process maps to color-code based on group statistics, or in further enrichments to create multi-level aggregations. The attribute is immediately available in all analysis tools and can be exported with your enriched dataset.
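A derived metric such as "percentage of group total" is a simple ratio between a case's own value and the group value written by this enrichment. The Python sketch below shows the arithmetic only; `pct_of_group_total` is a hypothetical helper, the rows are invented, and the actual calculator syntax in mindzieStudio is not shown here.

```python
def pct_of_group_total(amount, group_total):
    """Hypothetical helper: share of the group total, in percent, or None if undefined."""
    if amount is None or group_total is None or group_total == 0:
        return None
    return round(100.0 * amount / group_total, 1)

# Invented orders that already carry the group value from Set Group Value.
orders = [
    {"Order": "O-1", "Order_Amount": 200.0, "Customer_Total_Spend": 800.0},
    {"Order": "O-2", "Order_Amount": 600.0, "Customer_Total_Spend": 800.0},
    {"Order": "O-3", "Order_Amount": 50.0, "Customer_Total_Spend": None},  # e.g. excluded by the filter
]

for o in orders:
    o["Pct_Of_Customer_Spend"] = pct_of_group_total(
        o["Order_Amount"], o["Customer_Total_Spend"]
    )
    print(o["Order"], o["Pct_Of_Customer_Spend"])
```

Guarding against a missing or zero group total matters because cases excluded by the filter do not carry the group attribute.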
