Decision Tree

Overview

The Decision Tree calculator uses AI-driven statistical analysis to identify root causes of specific process behaviors. This powerful calculator compares cases with a target outcome against all cases to discover which attribute values most strongly correlate with that outcome. It automatically calculates risk ratios, likelihood scores, and fraction explained metrics to rank potential root causes by their explanatory power.

This is an AI-powered calculator that requires defining an outcome through filters, then automatically analyzes your data to discover what drives that behavior.

Common Uses

  • Identify factors contributing to late payments
  • Understand root causes of rework and repeated activities
  • Discover what leads to case escalations
  • Analyze patterns that cause compliance violations
  • Find out what drives extended case durations
  • Investigate quality issues and their contributing factors
  • Understand customer complaint patterns

Settings

Outcome Definition

Number of Filters: The number of pre-existing filters that define your target outcome. When set to 0, use the Filter List below to define the outcome.

Filter List: When Number of Filters is 0, define filters that select cases exhibiting the behavior you want to analyze. For example, create a filter for "Cases with Rework" or "Cases taking longer than 30 days".
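An outcome filter selects a subset of cases; that subset is the "outcome" the calculator explains. The sketch below shows the idea for "Cases taking longer than 30 days" using plain Python. The case records and the `took_longer_than` helper are hypothetical illustrations, not mindzie Studio's filter API.

```python
from datetime import date

# Hypothetical case records: case id -> (start date, end date).
cases = {
    "C1": (date(2024, 1, 1), date(2024, 1, 20)),
    "C2": (date(2024, 1, 1), date(2024, 2, 15)),
    "C3": (date(2024, 3, 1), date(2024, 4, 5)),
}

def took_longer_than(days):
    """Build a filter predicate that is True when a case lasted more than `days` days."""
    def predicate(start, end):
        return (end - start).days > days
    return predicate

# The outcome is the set of case ids that pass the filter.
long_cases = took_longer_than(30)
outcome_ids = {cid for cid, (start, end) in cases.items() if long_cases(start, end)}
print(sorted(outcome_ids))
```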

Input Configuration

Input Column Names: Manually specify which case attributes to analyze as potential root causes.

Auto Input: When enabled, automatically selects appropriate columns for analysis based on data types and cardinality.

Analysis Thresholds

Minimum Percent: The minimum fraction of cases that must have an attribute value for it to be considered (default: 0.1% of cases).

Minimum Case Count: The minimum number of cases required for an attribute value to be considered (default: 3 cases).

Likelihood Increase Threshold: The minimum risk ratio required for a root cause to be reported (default: 1.01, meaning 1% increased likelihood).

Percent Explained Threshold: The minimum fraction of outcome cases that must have the attribute value (default: 1%).

Maximum Root Causes: The maximum number of root causes to return (default: 20).
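The thresholds above can be read as a set of per-value checks plus a cap on the final list. This is a minimal sketch, assuming each threshold is an inclusive lower bound; the field and function names are hypothetical, and only the default values come from this page.

```python
from dataclasses import dataclass

@dataclass
class Thresholds:
    """Documented defaults; the field names are hypothetical."""
    minimum_percent: float = 0.001      # 0.1% of all cases must have the value
    minimum_case_count: int = 3         # at least 3 cases must have the value
    likelihood_increase: float = 1.01   # risk ratio must reach 1.01
    percent_explained: float = 0.01     # value must appear in 1% of outcome cases
    maximum_root_causes: int = 20       # cap applied after ranking

def passes(t, value_cases, total_cases, risk_ratio, fraction_explained):
    """Return True if an attribute value clears every per-value threshold (inclusive, assumed)."""
    return (
        value_cases / total_cases >= t.minimum_percent
        and value_cases >= t.minimum_case_count
        and risk_ratio >= t.likelihood_increase
        and fraction_explained >= t.percent_explained
    )

t = Thresholds()
print(passes(t, 50, 1000, 1.5, 0.10))   # enough cases, ratio and coverage
print(passes(t, 2, 1000, 3.0, 0.50))    # fails: only 2 cases have the value
```

Maximum Root Causes is not a per-value check; in this sketch it would simply truncate the ranked list.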

Example

Finding Causes of Payment Delays

Scenario: You want to understand why some invoices are paid late while others are paid on time.

Setup:

  1. Create a filter defining "late payments" (e.g., Payment Date > Due Date)
  2. Set the Decision Tree calculator to use this filter as the outcome
  3. Select attributes to analyze: Vendor, Department, Invoice Amount, Payment Terms
  4. Run the analysis

Output:

The calculator generates results showing:

Attribute        Value          Cases with Value  Outcome Likelihood  Risk Ratio  Fraction Explained
---------------  -------------  ----------------  ------------------  ----------  --------------------
Vendor Category  International  15% of all cases  45% are late        3.2x        35% of late payments
Invoice Amount   > $50,000      8% of all cases   38% are late        2.7x        18% of late payments
Department       Procurement B  12% of all cases  32% are late        2.3x        22% of late payments

Interpretation:

  • International vendors are 3.2x more likely to have late payments than vendors in other categories
  • 35% of all late payments involve international vendors
  • High-value invoices and a specific department also show elevated risk

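Insights: The analysis suggests that international vendors and high-value invoices may need different payment processes. Each attribute value is scored separately, so the table does not show whether the two factors compound; check that by filtering on both together. The decision tree helps prioritize which process improvements will have the biggest impact.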

Understanding the Metrics

Risk Ratio (Likelihood Increase)

The Risk Ratio compares the probability of the outcome when an attribute value is present versus absent:

Risk Ratio = P(Outcome | Value Present) / P(Outcome | Value Absent)

  • Risk Ratio = 1.0: The attribute value has no effect
  • Risk Ratio = 2.0: Cases with this value are 2x more likely to have the outcome
  • Risk Ratio = 0.5: Cases with this value are 50% less likely to have the outcome
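The formula can be checked with a short worked example. This is a minimal sketch; the function name and the case counts are hypothetical, chosen so that the outcome rate is 45% with the value and 15% without it.

```python
def risk_ratio(outcome_with, total_with, outcome_without, total_without):
    """P(Outcome | Value Present) / P(Outcome | Value Absent)."""
    p_present = outcome_with / total_with        # outcome rate among cases with the value
    p_absent = outcome_without / total_without   # outcome rate among all other cases
    if p_absent == 0:
        # No outcome cases lack the value, so the ratio is unbounded.
        return float("inf") if p_present > 0 else float("nan")
    return p_present / p_absent

# Hypothetical data: 1,000 cases in total.
# 200 cases have the value and 90 of them have the outcome (45%).
# 800 cases lack the value and 120 of them have the outcome (15%).
print(round(risk_ratio(90, 200, 120, 800), 2))  # 3.0
```

A result of 3.0 reads as "cases with this value are 3x as likely to have the outcome".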

Fraction Explained

The Fraction Explained shows what percentage of outcome cases have the attribute value:

Fraction Explained = (Cases with Outcome AND Value) / (Total Cases with Outcome)

This helps prioritize: a root cause with a high risk ratio but a low fraction explained affects only a small portion of your problem cases.
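To illustrate that trade-off, the sketch below compares two hypothetical attribute values in the same data set of 1,000 cases, 210 of which have the outcome. Value B has the higher risk ratio but explains far fewer outcome cases than value A. All counts are invented for illustration.

```python
def fraction_explained(outcome_with_value, total_outcome):
    """(Cases with Outcome AND Value) / (Total Cases with Outcome)."""
    return outcome_with_value / total_outcome

def risk_ratio(outcome_with, total_with, outcome_without, total_without):
    """P(Outcome | Value Present) / P(Outcome | Value Absent)."""
    return (outcome_with / total_with) / (outcome_without / total_without)

# Value A: 200 cases, 90 with the outcome (120 outcome cases remain among the other 800).
rr_a, fe_a = risk_ratio(90, 200, 120, 800), fraction_explained(90, 210)
# Value B: 10 cases, 8 with the outcome (202 outcome cases remain among the other 990).
rr_b, fe_b = risk_ratio(8, 10, 202, 990), fraction_explained(8, 210)

print(f"A: risk ratio {rr_a:.2f}, explains {fe_a:.1%}")  # A: risk ratio 3.00, explains 42.9%
print(f"B: risk ratio {rr_b:.2f}, explains {fe_b:.1%}")  # B: risk ratio 3.92, explains 3.8%
```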

Priority Ranking

Root causes are ranked as High, Medium, or Low priority based on a combination of:

  • Likelihood increase (risk ratio)
  • Fraction of outcome explained
  • Statistical significance (case volume)

Display Modes

Sentence View

Displays human-readable explanations of each root cause:

"Cases where Vendor Category = International are 3.2 times more likely to be late payments. This attribute explains 35% of all late payments."
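A sentence like this can be produced from the two core metrics with a simple template. This is a sketch under the assumption that the wording follows the example above; the `describe` function is hypothetical and not part of mindzie Studio.

```python
def describe(attribute, value, outcome_label, risk_ratio, fraction_explained):
    """Render one root cause as a readable sentence (hypothetical template)."""
    return (
        f"Cases where {attribute} = {value} are {risk_ratio:.1f} times more likely "
        f"to be {outcome_label}. This attribute explains "
        f"{fraction_explained:.0%} of all {outcome_label}."
    )

print(describe("Vendor Category", "International", "late payments", 3.2, 0.35))
```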

Statistics Grid

Shows all calculated metrics in a sortable table for detailed analysis.

Outcome Likelihood View

Focuses on attribute values with the highest risk ratios: the values that most dramatically increase the chance of the outcome.

Outcome Value View

Focuses on attribute values that affect the most cases, showing where improvements would have the broadest impact.
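The two views can rank the same results differently. Using the three root causes from the payment-delay example, the sketch below sorts once by risk ratio and once by fraction explained. Treating fraction explained as the sort key for the Outcome Value View is an assumption; the product may use raw case counts instead.

```python
# Root causes from the payment-delay example: (attribute value, risk ratio, fraction explained).
results = [
    ("Vendor Category = International", 3.2, 0.35),
    ("Invoice Amount > $50,000", 2.7, 0.18),
    ("Department = Procurement B", 2.3, 0.22),
]

# Outcome Likelihood View: strongest effect first.
by_likelihood = sorted(results, key=lambda r: r[1], reverse=True)
# Outcome Value View (assumed key): widest coverage of outcome cases first.
by_value = sorted(results, key=lambda r: r[2], reverse=True)

print([name for name, _, _ in by_likelihood])
print([name for name, _, _ in by_value])
```

Procurement B moves ahead of high-value invoices in the second ordering because it covers more late payments, even though its risk ratio is lower.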

How It Works

  1. Frequency Calculation: Counts occurrences of each attribute value across all cases and outcome cases
  2. Likelihood Comparison: For each value, calculates the outcome probability when present vs absent
  3. Risk Ratio: Computes the ratio of these probabilities
  4. Fraction Explained: Calculates what portion of outcome cases have each value
  5. Threshold Filtering: Removes results below configured thresholds
  6. Ranking: Sorts by explanatory power (combination of risk ratio and fraction explained)
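The six steps can be sketched end to end in plain Python. This is an illustration, not mindzie Studio's implementation: the input shape (a dict of case attributes plus a set of outcome case ids) is assumed, the thresholds use the documented defaults, and the ranking score (risk ratio multiplied by fraction explained) is a hypothetical stand-in for the product's actual combination.

```python
from collections import Counter

def find_root_causes(cases, outcome_ids, columns, min_percent=0.001, min_cases=3,
                     min_risk_ratio=1.01, min_explained=0.01, max_results=20):
    """Return (column, value, risk ratio, fraction explained) tuples, best first.

    `cases` maps case id -> {column: value}; `outcome_ids` holds the case ids
    selected by the outcome filter.
    """
    total, n_outcome = len(cases), len(outcome_ids)
    if total == 0 or n_outcome == 0:
        return []
    results = []
    for col in columns:
        # 1. Frequency calculation over all cases and over outcome cases.
        all_counts = Counter(attrs[col] for attrs in cases.values())
        outcome_counts = Counter(cases[cid][col] for cid in outcome_ids)
        for value, with_value in all_counts.items():
            without_value = total - with_value
            if without_value == 0:
                continue  # value present in every case: nothing to compare against
            out_with = outcome_counts[value]
            out_without = n_outcome - out_with
            # 2. Likelihood comparison: outcome rate with vs. without the value.
            p_present = out_with / with_value
            p_absent = out_without / without_value
            # 3. Risk ratio (unbounded when no outcome cases lack the value).
            rr = p_present / p_absent if p_absent else float("inf")
            # 4. Fraction of outcome cases that have the value.
            explained = out_with / n_outcome
            # 5. Threshold filtering.
            if (with_value / total < min_percent or with_value < min_cases
                    or rr < min_risk_ratio or explained < min_explained):
                continue
            results.append((col, value, rr, explained))
    # 6. Ranking with a hypothetical score: risk ratio x fraction explained.
    results.sort(key=lambda r: r[2] * r[3], reverse=True)
    return results[:max_results]

# Hypothetical data: 10 cases, 4 of which have the outcome.
vendors = ["Intl"] * 4 + ["Domestic"] * 6
depts = ["A", "A", "B", "B", "B", "A", "A", "A", "B", "B"]
cases = {f"C{i + 1}": {"vendor": v, "dept": d}
         for i, (v, d) in enumerate(zip(vendors, depts))}
outcome_ids = {"C1", "C2", "C3", "C5"}
result = find_root_causes(cases, outcome_ids, ["vendor", "dept"])
print(result)
```

In this toy data only the international vendor survives: department values have the same outcome rate inside and outside each group (risk ratio 1.0), and the domestic vendor lowers the risk.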

Best Practices

Defining Good Outcomes

  • Be specific: "Payments more than 7 days late" is better than "Late payments"
  • Ensure sufficient case volume: you need enough outcome cases for the results to be statistically meaningful
  • Test different definitions to see if root causes change

Selecting Input Columns

  • Include categorical attributes (vendor, department, status)
  • Include discretized numeric attributes (amount ranges, duration categories)
  • Exclude columns with too many unique values (IDs, free text)
  • Start with auto-selection, then refine based on results
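Numeric attributes such as invoice amounts usually need to be turned into ranges before they can act as attribute values. The sketch below bins amounts with the standard `bisect` module. The band edges and labels are hypothetical; pick edges that match your own business rules.

```python
from bisect import bisect_left

# Hypothetical band edges for invoice amounts.
EDGES = (1_000, 10_000, 50_000)
LABELS = ("<= $1,000", "$1,000-$10,000", "$10,000-$50,000", "> $50,000")

def amount_band(amount):
    """Map a numeric amount to a categorical range label."""
    # bisect_left places a value equal to an edge into the lower band.
    return LABELS[bisect_left(EDGES, amount)]

print([amount_band(a) for a in (250, 1_000, 12_500, 75_000)])
```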

Interpreting Results

  • Look for high risk ratio AND high fraction explained together
  • Consider business context: is the identified root cause actionable?
  • Validate findings with process experts before acting
  • Use drill-down to examine underlying cases

Output

The calculator provides multiple output formats:

  • Decision Tree Table: All root causes ranked by explanatory power
  • Outcome Likelihood Table: Highest risk ratio values with case drill-down
  • Outcome Value Table: Most impactful values by case count with drill-down
  • Chat Text: Human-readable summary for presentations

Interactive features:

  • Click on rows to see underlying cases
  • Sort by different metrics
  • Export findings for further analysis
  • Create recommendations from identified root causes

This documentation is part of the mindzie Studio process mining platform.