Decision Tree
Overview
The Decision Tree calculator uses AI-driven statistical analysis to identify likely root causes of a specific process behavior. You define the target outcome with filters; the calculator then compares the cases that show this outcome against all cases to find the attribute values that correlate most strongly with it. For each attribute value it calculates the outcome likelihood, the risk ratio, and the fraction explained, and it ranks the candidate root causes by their explanatory power.
Common Uses
- Identify factors contributing to late payments
- Understand root causes of rework and repeated activities
- Discover what leads to case escalations
- Analyze patterns that cause compliance violations
- Find out what drives extended case durations
- Investigate quality issues and their contributing factors
- Understand customer complaint patterns
Settings
Outcome Definition
Number of Filters: The number of pre-existing filters that define your target outcome. When set to 0, use the Filter List below to define the outcome.
Filter List: When Number of Filters is 0, define filters that select cases exhibiting the behavior you want to analyze. For example, create a filter for "Cases with Rework" or "Cases taking longer than 30 days".
Input Configuration
Input Column Names: Manually specify which case attributes to analyze as potential root causes.
Auto Input: When enabled, automatically selects appropriate columns for analysis based on data types and cardinality.
Analysis Thresholds
Minimum Percent: The minimum fraction of cases that must have an attribute value for it to be considered (default: 0.1% of cases).
Minimum Case Count: The minimum number of cases required for an attribute value to be considered (default: 3 cases).
Likelihood Increase Threshold: The minimum risk ratio required for a root cause to be reported (default: 1.01, meaning 1% increased likelihood).
Percent Explained Threshold: The minimum fraction of outcome cases that must have the attribute value (default: 1%).
Maximum Root Causes: The maximum number of root causes to return (default: 20).
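As an illustration of how the thresholds above might interact, here is a minimal Python sketch. It is not the calculator's implementation: the `Thresholds` class, its field names, the `select` helper, the inclusive comparisons, and the choice to combine all thresholds with a logical AND are assumptions made for this example. Only the default values come from the settings listed above.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Hypothetical field names; the defaults mirror the documented settings.
    min_percent: float = 0.001          # 0.1% of all cases
    min_case_count: int = 3             # at least 3 cases with the value
    likelihood_increase: float = 1.01   # minimum risk ratio
    percent_explained: float = 0.01     # 1% of outcome cases
    max_root_causes: int = 20


def passes(candidate, total_cases, t):
    """Return True if a candidate clears every threshold (assumed AND logic)."""
    return (
        candidate["cases_with_value"] >= t.min_case_count
        and candidate["cases_with_value"] / total_cases >= t.min_percent
        and candidate["risk_ratio"] >= t.likelihood_increase
        and candidate["fraction_explained"] >= t.percent_explained
    )


def select(candidates, total_cases, t=None):
    """Filter candidates, then keep at most max_root_causes of them."""
    t = t or Thresholds()
    kept = [c for c in candidates if passes(c, total_cases, t)]
    # Assumption: order by risk ratio only, to keep the sketch short.
    kept.sort(key=lambda c: c["risk_ratio"], reverse=True)
    return kept[: t.max_root_causes]


# Hypothetical candidates out of 1,000 cases.
candidates = [
    {"value": "A", "cases_with_value": 2, "risk_ratio": 5.0, "fraction_explained": 0.5},
    {"value": "B", "cases_with_value": 50, "risk_ratio": 1.0, "fraction_explained": 0.2},
    {"value": "C", "cases_with_value": 50, "risk_ratio": 2.0, "fraction_explained": 0.2},
    {"value": "D", "cases_with_value": 50, "risk_ratio": 3.0, "fraction_explained": 0.005},
    {"value": "E", "cases_with_value": 40, "risk_ratio": 2.5, "fraction_explained": 0.1},
]
selected = select(candidates, total_cases=1000)
print([c["value"] for c in selected])  # prints ['E', 'C']
```

In this sketch, A is dropped for having too few cases, B for a risk ratio below 1.01, and D for explaining less than 1% of outcome cases.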
Example
Finding Causes of Payment Delays
Scenario: You want to understand why some invoices are paid late while others are paid on time.
Setup:
- Create a filter defining "late payments" (e.g., Payment Date > Due Date)
- Set the Decision Tree calculator to use this filter as the outcome
- Select attributes to analyze: Vendor Category, Department, Invoice Amount, Payment Terms
- Run the analysis
Output:
The calculator generates results showing:
| Attribute | Value | Cases with Value | Outcome Likelihood | Risk Ratio | Fraction Explained |
|---|---|---|---|---|---|
| Vendor Category | International | 15% of all cases | 45% are late | 3.2x | 35% of late payments |
| Invoice Amount | > $50,000 | 8% of all cases | 38% are late | 2.7x | 18% of late payments |
| Department | Procurement B | 12% of all cases | 32% are late | 2.3x | 22% of late payments |
Interpretation:
- Cases with international vendors are 3.2x more likely to be paid late than cases with other vendor categories
- 35% of all late payments involve international vendors
- High-value invoices and a specific department also show elevated risk
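Insights: The analysis suggests that invoices from international vendors and high-value invoices may need a different payment process. Because each attribute value is evaluated on its own, confirm any combined effect (for example, high-value invoices from international vendors) by drilling into the underlying cases. The decision tree helps prioritize which process improvements are likely to have the biggest impact.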
Understanding the Metrics
Risk Ratio (Likelihood Increase)
The Risk Ratio compares the probability of the outcome when an attribute value is present versus absent:
Risk Ratio = P(Outcome | Value Present) / P(Outcome | Value Absent)
- Risk Ratio = 1.0: The outcome is equally likely with or without the attribute value (no association)
- Risk Ratio = 2.0: Cases with this value are 2x more likely to have the outcome
- Risk Ratio = 0.5: Cases with this value are 50% less likely to have the outcome
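The formula above can be worked through with a few lines of Python. This is a sketch using hypothetical counts; the function name `risk_ratio`, its parameters, and the handling of the zero-denominator case are assumptions for illustration and do not describe how the calculator treats that case.

```python
def risk_ratio(outcome_with, total_with, outcome_without, total_without):
    """Risk Ratio = P(Outcome | Value Present) / P(Outcome | Value Absent)."""
    if total_with <= 0 or total_without <= 0:
        raise ValueError("both groups need at least one case")
    p_present = outcome_with / total_with
    p_absent = outcome_without / total_without
    if p_absent == 0:
        # Assumption: report an infinite ratio when the outcome never
        # occurs without the attribute value.
        return float("inf")
    return p_present / p_absent


# Hypothetical counts: 1,000 cases, 150 of which have the attribute value.
rr = risk_ratio(
    outcome_with=60, total_with=150,        # 40% outcome rate with the value
    outcome_without=85, total_without=850,  # 10% outcome rate without it
)
print(round(rr, 2))  # prints 4.0
```

Here the outcome occurs in 40% of cases with the value and 10% of cases without it, so cases with the value are 4x more likely to show the outcome.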
Fraction Explained
The Fraction Explained shows what percentage of outcome cases have the attribute value:
Fraction Explained = (Cases with Outcome AND Value) / (Total Cases with Outcome)
This helps you prioritize: a root cause with a high risk ratio but a low fraction explained affects only a small portion of your problem cases.
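A short Python sketch of the Fraction Explained formula, using hypothetical counts, shows why a rare attribute value can matter less than a common one. The function name `fraction_explained` and the numbers are assumptions for illustration.

```python
def fraction_explained(outcome_and_value, total_outcome):
    """Fraction Explained = (Cases with Outcome AND Value) / (Total Cases with Outcome)."""
    if total_outcome <= 0:
        raise ValueError("at least one outcome case is required")
    return outcome_and_value / total_outcome


# Hypothetical data: 200 outcome cases in total.
rare_value = fraction_explained(4, 200)     # value seen in only 4 outcome cases
common_value = fraction_explained(90, 200)  # value seen in 90 outcome cases

print(f"{rare_value:.0%}")    # prints 2%
print(f"{common_value:.0%}")  # prints 45%
```

Even if the rare value had a much higher risk ratio, addressing it would touch only 2% of the problem cases, while the common value is present in 45% of them.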
Priority Ranking
Root causes are ranked as High, Medium, or Low priority based on a combination of:
- Likelihood increase (risk ratio)
- Fraction of outcome explained
- Case volume (as an indicator of statistical reliability)
Display Modes
Sentence View
Displays human-readable explanations of each root cause:
"Cases where Vendor Category = International are 3.2 times more likely to be late payments. This attribute explains 35% of all late payments."
Statistics Grid
Shows all calculated metrics in a sortable table for detailed analysis.
Outcome Likelihood View
Focuses on attribute values with the highest risk ratios, that is, the values associated with the largest increase in the chance of the outcome.
Outcome Value View
Focuses on attribute values that affect the most cases - where improvements would have the broadest impact.
How It Works
- Frequency Calculation: Counts occurrences of each attribute value across all cases and outcome cases
- Likelihood Comparison: For each value, calculates the outcome probability when present vs absent
- Risk Ratio: Computes the ratio of these probabilities
- Fraction Explained: Calculates what portion of outcome cases have each value
- Threshold Filtering: Removes results below configured thresholds
- Ranking: Sorts by explanatory power (combination of risk ratio and fraction explained)
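The steps above can be sketched in plain Python. This is a simplified illustration, not the calculator's actual code: the function name `find_root_causes`, its parameters, the representation of cases as dictionaries, and the ranking score (risk ratio multiplied by fraction explained) are all assumptions. The documentation states only that ranking combines the two metrics, not how.

```python
from collections import Counter


def find_root_causes(cases, is_outcome, attributes, min_case_count=3,
                     min_risk_ratio=1.01, min_fraction_explained=0.01,
                     max_results=20):
    total = len(cases)
    outcome_cases = [c for c in cases if is_outcome(c)]
    total_outcome = len(outcome_cases)
    if total_outcome == 0:
        return []

    # 1. Frequency calculation: count each (attribute, value) pair
    #    across all cases and across outcome cases.
    all_counts = Counter((a, c[a]) for c in cases for a in attributes)
    outcome_counts = Counter((a, c[a]) for c in outcome_cases for a in attributes)

    results = []
    for (attr, value), n_with in all_counts.items():
        n_without = total - n_with
        if n_with < min_case_count or n_without == 0:
            continue  # too few cases, or every case has this value
        o_with = outcome_counts[(attr, value)]
        o_without = total_outcome - o_with

        # 2. Likelihood comparison: outcome probability when present vs absent.
        p_present = o_with / n_with
        p_absent = o_without / n_without
        # 3. Risk ratio (assumption: infinite when p_absent is zero).
        rr = p_present / p_absent if p_absent > 0 else float("inf")
        # 4. Fraction explained.
        fe = o_with / total_outcome
        # 5. Threshold filtering.
        if rr < min_risk_ratio or fe < min_fraction_explained:
            continue
        results.append({"attribute": attr, "value": value,
                        "risk_ratio": rr, "fraction_explained": fe})

    # 6. Ranking (hypothetical score: risk ratio x fraction explained).
    results.sort(key=lambda r: r["risk_ratio"] * r["fraction_explained"],
                 reverse=True)
    return results[:max_results]


# Hypothetical data: 10 international cases (6 late), 30 domestic cases (3 late).
cases = ([{"vendor": "International", "late": i < 6} for i in range(10)]
         + [{"vendor": "Domestic", "late": i < 3} for i in range(30)])
causes = find_root_causes(cases, lambda c: c["late"], ["vendor"])
for r in causes:
    print(r["attribute"], r["value"],
          f"{r['risk_ratio']:.1f}x", f"{r['fraction_explained']:.0%}")
# prints: vendor International 6.0x 67%
```

In this toy data set, "Domestic" is filtered out because its risk ratio is below the threshold, leaving "International" as the only reported root cause.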
Best Practices
Defining Good Outcomes
- Be specific: "Payments more than 7 days late" is better than "Late payments"
- Ensure sufficient case volume: you need enough outcome cases for the results to be statistically meaningful
- Test different definitions to see if root causes change
Selecting Input Columns
- Include categorical attributes (vendor, department, status)
- Include discretized numeric attributes (amount ranges, duration categories)
- Exclude columns with too many unique values (IDs, free text)
- Start with auto-selection, then refine based on results
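If you prepare input columns yourself, the guidelines above could be approximated as follows. This Python sketch is not the Auto Input algorithm: the function names `select_columns` and `discretize`, the cardinality limits, and the amount ranges are hypothetical choices for illustration.

```python
import bisect


def select_columns(cases, max_unique=50, max_unique_ratio=0.5):
    """Keep columns with more than one but not too many distinct values."""
    if not cases:
        return []
    kept = []
    for col in cases[0]:
        distinct = {c[col] for c in cases}
        if len(distinct) <= 1:
            continue  # constant column: cannot explain anything
        if len(distinct) > max_unique:
            continue  # too many unique values (IDs, free text)
        if len(distinct) / len(cases) > max_unique_ratio:
            continue  # nearly unique per case
        kept.append(col)
    return kept


# Hypothetical amount ranges for turning a numeric column into categories.
EDGES = [1_000, 10_000, 50_000]
LABELS = ["< $1,000", "$1,000 to < $10,000", "$10,000 to < $50,000", ">= $50,000"]


def discretize(amount, edges=EDGES, labels=LABELS):
    """Map a numeric amount to the label of the range that contains it."""
    return labels[bisect.bisect_right(edges, amount)]


# Hypothetical cases: the ID is unique per case and the currency never changes.
cases = [
    {"invoice_id": f"INV-{i}", "vendor": "International" if i % 2 else "Domestic",
     "currency": "USD"}
    for i in range(6)
]
print(select_columns(cases))  # prints ['vendor']
print(discretize(75_000))     # prints >= $50,000
```

The ID column is excluded because it is unique per case, and the currency column is excluded because it has a single value.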
Interpreting Results
- Look for attribute values that combine a high risk ratio with a high fraction explained
- Consider business context: is the identified root cause actionable?
- Validate findings with process experts before acting
- Use drill-down to examine underlying cases
Output
The calculator provides multiple output formats:
- Decision Tree Table: All root causes ranked by explanatory power
- Outcome Likelihood Table: Highest risk ratio values with case drill-down
- Outcome Value Table: Most impactful values by case count with drill-down
- Chat Text: Human-readable summary for presentations
Interactive features:
- Click on rows to see underlying cases
- Sort by different metrics
- Export findings for further analysis
- Create recommendations from identified root causes
This documentation is part of the mindzie Studio process mining platform.