Data Selector

Overview

The Data Selector calculator is a data post-processing tool that selects specific columns from another calculator's output and optionally sorts and limits the results. This calculator is essential for creating focused data views by choosing relevant columns, ordering the data, and displaying only the top N rows.

Unlike most calculators, which analyze process data directly, Data Selector works with the output tables of other calculators. This makes it well suited to refining analysis results for dashboards, reports, and exports.

Common Uses

  • Prepare specific data subsets for email delivery or export to stakeholders
  • Create simplified dashboard views showing only key metrics from complex analysis
  • Select and sort top N results from large analysis outputs (e.g., top 10 slowest cases)
  • Focus reports on relevant columns by removing unnecessary detail
  • Transform comprehensive analysis results into executive-friendly summaries
  • Create data pipelines by chaining multiple calculators and selecting specific outputs at each stage

Settings

Source Calculator: Select the calculator block whose output you want to work with. This calculator must have already been executed in the current notebook.

Source Table: Choose which table to use if the source calculator produces multiple result tables. Most calculators produce a single table (index 0), but some calculators return multiple tables with different types of information.

Columns to Include: Select which columns from the source table should appear in the output. You can select multiple columns, and they will appear in the order you specify. Column names must match exactly as they appear in the source calculator output.

Sort Column: Optionally choose a column to sort the results by. If you don't specify a sort column, the data will maintain the same order as the source calculator output.

Sort Direction: When sorting is enabled, choose whether to sort in:

  • Ascending order: Lowest to highest (A-Z, 0-9, oldest to newest)
  • Descending order: Highest to lowest (Z-A, 9-0, newest to oldest)

Maximum Rows: Specify the maximum number of rows to include in the output. Set to 0 or leave blank for no limit. When combined with sorting, this allows you to select "top N" results (e.g., top 20 slowest cases when sorted by duration descending).
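
The settings above combine into a short pipeline. You pick a source table, sort it, apply the row limit, and then keep the chosen columns in the listed order. The Python sketch below models that behavior on plain lists of dictionaries. It is not mindzie's implementation. The function name `select_data` and its parameters are hypothetical stand-ins for the settings. The sketch also makes two assumptions of its own. It sorts before dropping columns, so the sort column does not have to be one of the included columns. It raises a `KeyError` for a misspelled column name, which echoes the exact-match requirement.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def select_data(
    tables: List[List[Row]],
    table_index: int,
    columns: List[str],
    sort_column: Optional[str] = None,
    descending: bool = False,
    max_rows: Optional[int] = None,
) -> List[Row]:
    """Hypothetical model of the Data Selector settings, not the mindzie implementation."""
    # Source Table: zero-based index into the source calculator's result tables
    rows = tables[table_index]

    # Sort Column / Sort Direction: without a sort column, source order is kept
    if sort_column is not None:
        rows = sorted(rows, key=lambda row: row[sort_column], reverse=descending)

    # Maximum Rows: 0 or blank (None) means no limit; the limit is applied after sorting
    if max_rows is not None and max_rows > 0:
        rows = rows[:max_rows]

    # Columns to Include: exact, case-sensitive names, output in the listed order
    return [{name: row[name] for name in columns} for row in rows]


# Usage with a small hypothetical source table
source = [[
    {"Case ID": "C-1", "Duration": 4.0, "Region": "North"},
    {"Case ID": "C-2", "Duration": 9.5, "Region": "South"},
    {"Case ID": "C-3", "Duration": 2.5, "Region": "East"},
]]
print(select_data(source, 0, ["Duration", "Case ID"], "Duration", descending=True, max_rows=2))
# [{'Duration': 9.5, 'Case ID': 'C-2'}, {'Duration': 4.0, 'Case ID': 'C-1'}]
```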

Examples

Example 1: Top 10 Slowest Purchase Orders for Executive Report

Scenario: Your Case Duration calculator has analyzed 2,500 purchase orders, but you want to create an executive dashboard showing only the 10 slowest cases for immediate attention.

Settings:

  • Source Calculator: "Purchase Order Duration Analysis"
  • Source Table: 0 (primary results table)
  • Columns to Include: ["Case ID", "Supplier Name", "Duration", "Total Value"]
  • Sort Column: Duration
  • Sort Direction: Descending
  • Maximum Rows: 10

Output:

The calculator displays a focused table with exactly 4 columns and 10 rows:

Case ID      | Supplier Name       | Duration  | Total Value
PO-2024-8821 | Acme Manufacturing  | 47.3 days | $125,400
PO-2024-9156 | Global Supplies Inc | 42.8 days | $89,200
PO-2024-7633 | TechParts Ltd       | 38.5 days | $156,800
...          | ...                 | ...       | ...

Insights: By selecting only the essential columns and limiting the output to 10 rows, you've created an actionable dashboard that highlights problematic cases without overwhelming executives with 2,500 rows of data. Sorting by duration ensures the most urgent cases appear first, and including Total Value shows the financial impact of these delays.
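
A top-N selection like Example 1 only works if Duration is sorted as a number. The Python sketch below uses hypothetical rows and a hypothetical `top_n` helper. It shows a descending top-N selection and then what would go wrong if the same durations were stored as text such as "47.3 days". This page does not say how the platform stores durations. Treating them as numbers is an assumption.

```python
from typing import Any, Dict, List


def top_n(rows: List[Dict[str, Any]], column: str, n: int) -> List[Dict[str, Any]]:
    """Sort descending by `column`, then keep the first n rows (a "top N" selection)."""
    return sorted(rows, key=lambda row: row[column], reverse=True)[:n]


# Hypothetical durations in days (not the Example 1 data)
rows = [
    {"Case ID": "PO-1", "Duration": 9.1},
    {"Case ID": "PO-2", "Duration": 47.3},
    {"Case ID": "PO-3", "Duration": 12.0},
]

slowest = top_n(rows, "Duration", 2)
print([row["Case ID"] for row in slowest])  # ['PO-2', 'PO-3']

# The same values as formatted text sort character by character, so "9.1 days" ranks first
as_text = [{"Case ID": r["Case ID"], "Duration": f'{r["Duration"]} days'} for r in rows]
print([row["Case ID"] for row in top_n(as_text, "Duration", 2)])  # ['PO-1', 'PO-2']
```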

Example 2: Weekly Activity Summary for Email Distribution

Scenario: You run a weekly activity frequency analysis that generates detailed statistics for 45 different activities. You want to email the process owner just the top 15 most frequent activities with simplified metrics.

Settings:

  • Source Calculator: "Weekly Activity Frequency Report"
  • Source Table: 0
  • Columns to Include: ["Activity Name", "Event Count", "Percentage of Total Events"]
  • Sort Column: Event Count
  • Sort Direction: Descending
  • Maximum Rows: 15

Output:

A clean, focused table perfect for email:

Activity Name               | Event Count | Percentage of Total Events
Create Purchase Requisition | 1,847       | 18.2%
Manager Approval            | 1,823       | 17.9%
Vendor Selection            | 1,792       | 17.6%
...                         | ...         | ...

Insights: This simplified view removes columns like "First Occurrence" and "Last Occurrence" that clutter the email, while keeping the essential metrics that show which activities dominate the process. The recipient immediately sees that the top 3 activities account for over half of all process events.
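
The "over half" observation follows directly from the three percentages shown in the Example 2 output. This snippet simply adds them up:

```python
# Percentage of Total Events for the three rows shown in the Example 2 output
top_three = {
    "Create Purchase Requisition": 18.2,
    "Manager Approval": 17.9,
    "Vendor Selection": 17.6,
}

combined = round(sum(top_three.values()), 1)  # round away floating-point noise
print(combined)       # 53.7
print(combined > 50)  # True
```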

Example 3: Customer Analysis Dashboard Simplification

Scenario: Your Breakdown by Categories calculator analyzed customers across 12 different metrics, but your dashboard widget only has space to show 5 columns for the top 20 customers.

Settings:

  • Source Calculator: "Customer Performance Analysis"
  • Source Table: 0
  • Columns to Include: ["Customer Name", "Case Count", "Average Duration", "Total Revenue", "On-Time Percentage"]
  • Sort Column: Total Revenue
  • Sort Direction: Descending
  • Maximum Rows: 20

Output:

Dashboard-ready table with focused metrics:

Customer Name       | Case Count | Average Duration | Total Revenue | On-Time Percentage
MegaCorp Industries | 487        | 8.2 days         | $4,850,000    | 92%
TechStart Solutions | 356        | 7.5 days         | $3,240,000    | 95%
Global Systems Inc  | 298        | 9.1 days         | $2,870,000    | 88%
...                 | ...        | ...              | ...           | ...

Insights: You've transformed a comprehensive 12-column analysis into a dashboard-friendly 5-column view showing exactly what stakeholders need to know: which customers generate the most revenue, how many orders they place, how long processing takes, and their delivery performance. Sorting by revenue ensures the most important customers are visible at a glance.

Example 4: Variant Analysis - Top Variants by Frequency

Scenario: Your variant analysis identified 284 unique process variants. You want to focus your improvement efforts on the top 25 most common variants, which you expect to cover roughly 80% of your case volume.

Settings:

  • Source Calculator: "Process Variant Analysis"
  • Source Table: 0
  • Columns to Include: ["Variant ID", "Frequency", "Cumulative Percentage", "Average Duration", "Contains Rework"]
  • Sort Column: Frequency
  • Sort Direction: Descending
  • Maximum Rows: 25

Output:

Variant ID | Frequency | Cumulative Percentage | Average Duration | Contains Rework
VAR-001    | 1,245     | 24.8%                 | 6.2 days         | No
VAR-002    | 876       | 42.2%                 | 8.5 days         | Yes
VAR-003    | 623       | 54.6%                 | 5.8 days         | No
...        | ...       | ...                   | ...              | ...

Insights: The top 25 variants represent the core of your process, and the cumulative percentage column shows that focusing on these variants covers the majority of cases. The "Contains Rework" column immediately flags which common variants include inefficient rework steps, helping prioritize improvement opportunities.
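
How many variants you need to reach a coverage target, such as the 80% mentioned in this scenario, depends on how the frequencies are distributed. It is worth checking that number rather than assuming 25. The Python sketch below assumes that Cumulative Percentage is the running frequency divided by the total, for variants sorted by frequency in descending order. That reading fits the sample rows, but it is an assumption. The helper names are hypothetical and so are the frequencies.

```python
from itertools import accumulate
from typing import List


def cumulative_percentages(frequencies: List[int]) -> List[float]:
    """Running share of all cases, in percent, for frequencies already sorted descending."""
    total = sum(frequencies)
    return [100.0 * running / total for running in accumulate(frequencies)]


def variants_needed(frequencies: List[int], threshold: float) -> int:
    """Smallest number of top variants whose cumulative share reaches `threshold` percent."""
    ordered = sorted(frequencies, reverse=True)
    for count, pct in enumerate(cumulative_percentages(ordered), start=1):
        if pct >= threshold:
            return count
    return len(ordered)


# Hypothetical variant frequencies (not the Example 4 data)
freqs = [50, 20, 10, 10, 5, 5]
print([round(p, 1) for p in cumulative_percentages(freqs)])  # [50.0, 70.0, 80.0, 90.0, 95.0, 100.0]
print(variants_needed(freqs, 80.0))  # 3
```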

Example 5: Complete Daily Trend in Chronological Order

Scenario: Your rate-over-time calculator generated daily statistics for 90 days. You want to keep only the key metrics, in chronological order and with no row limit, so the complete trend is available for analysis.

Settings:

  • Source Calculator: "90-Day Completion Rate Analysis"
  • Source Table: 0
  • Columns to Include: ["Date", "Cases Completed", "Completion Rate"]
  • Sort Column: Date
  • Sort Direction: Ascending
  • Maximum Rows: 0 (no limit)

Output:

All 90 rows displayed in chronological order:

Date       | Cases Completed | Completion Rate
2024-10-01 | 23              | 87.4%
2024-10-02 | 28              | 91.2%
2024-10-03 | 31              | 89.7%
...        | ...             | ...

Insights: By sorting by date ascending and not limiting rows, you maintain the complete time series for charting or export. You've simplified the output by removing statistical columns (like "Standard Deviation" and "Min/Max") that aren't needed for basic trend visualization, making the data cleaner for graphing tools.
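
If the Date column holds real date values, an ascending sort is chronological by definition. If it holds text, the format matters. This page does not say which case applies, so the Python sketch below covers the text case as an assumption. It shows that ISO-style dates (YYYY-MM-DD) sort chronologically as text and month/day/year text does not. It also shows that a limit of 0 keeps every row. The helper `sort_ascending` is hypothetical.

```python
from datetime import date
from typing import Any, Dict, List


def sort_ascending(rows: List[Dict[str, Any]], column: str, max_rows: int = 0) -> List[Dict[str, Any]]:
    """Ascending sort; a max_rows of 0 keeps every row."""
    ordered = sorted(rows, key=lambda row: row[column])
    return ordered[:max_rows] if max_rows > 0 else ordered


# The three rows shown in the 90-day completion output, deliberately shuffled
rows = [
    {"Date": "2024-10-03", "Cases Completed": 31},
    {"Date": "2024-10-01", "Cases Completed": 23},
    {"Date": "2024-10-02", "Cases Completed": 28},
]
dates = [row["Date"] for row in sort_ascending(rows, "Date")]
print(dates)  # ['2024-10-01', '2024-10-02', '2024-10-03']

# ISO 8601 text sorts in the same order as real calendar dates
print(dates == sorted(dates, key=date.fromisoformat))  # True

# Month/day/year text does not: October 10 sorts before October 2
print(sorted(["10/2/2024", "10/10/2024"]))  # ['10/10/2024', '10/2/2024']
```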

Example 6: Multi-Table Source Selection

Scenario: Your conformance checker returns two tables: table 0 contains summary statistics, and table 1 contains detailed violation listings. You want to create a report from the detailed violations table.

Settings:

  • Source Calculator: "Standard Process Conformance Check"
  • Source Table: 1 (detail table, not summary)
  • Columns to Include: ["Case ID", "Violation Type", "Activity Name", "Timestamp"]
  • Sort Column: Violation Type
  • Sort Direction: Ascending
  • Maximum Rows: 100

Output:

Case ID | Violation Type        | Activity Name     | Timestamp
CS-1234 | Missing Required Step | Invoice Approval  | 2024-11-15 14:22
CS-5678 | Missing Required Step | Purchase Approval | 2024-11-16 09:15
CS-9012 | Out of Sequence       | Goods Receipt     | 2024-11-16 11:45
...     | ...                   | ...               | ...

Insights: By selecting table 1 instead of the default table 0, you access the detailed violation data rather than just summary counts. Sorting by violation type groups similar problems together, making it easier to identify patterns. The 100-row limit keeps the report manageable. However, because the sort is alphabetical, the limit keeps the first 100 rows in violation-type order, not the most important violations. Violation types late in the alphabet may be cut off entirely.
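
A row limit applied after an alphabetical sort keeps whichever values come first in the alphabet, not the most severe ones. The Python sketch below shows this with hypothetical violation rows and a limit of 4 instead of 100. The third violation type, "Wrong Resource", is invented for illustration, and the helper `sort_then_limit` is illustrative only.

```python
from typing import Dict, List


def sort_then_limit(rows: List[Dict[str, str]], column: str, max_rows: int) -> List[Dict[str, str]]:
    """Ascending sort on `column`, then keep only the first max_rows rows."""
    return sorted(rows, key=lambda row: row[column])[:max_rows]


# Hypothetical violations: 3 + 2 + 1 rows across three types
violations = (
    [{"Violation Type": "Missing Required Step"} for _ in range(3)]
    + [{"Violation Type": "Out of Sequence"} for _ in range(2)]
    + [{"Violation Type": "Wrong Resource"}]
)

kept = sort_then_limit(violations, "Violation Type", 4)
print(sorted({row["Violation Type"] for row in kept}))
# ['Missing Required Step', 'Out of Sequence']  ("Wrong Resource" is cut off)
```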

Output

The Data Selector calculator displays a table with the exact columns you specified, in the order you selected them. The table structure is dynamic and depends on your column selections.

Output Characteristics

Column Structure: Only the columns you selected from "Columns to Include" appear in the output. Column names, data types, and formatting are preserved from the source calculator.

Row Count: Determined by the Maximum Rows setting:

  • If Maximum Rows = 0 or blank: All rows from the source table
  • If Maximum Rows > 0: Up to that many rows (may be fewer if source has fewer rows)

Row Order: Determined by sorting settings:

  • If no sort column specified: Maintains the same order as the source calculator
  • If sort column specified: Rows are ordered according to the sort column and direction
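
The Row Count and Row Order rules above can be written as two small functions. In this hypothetical sketch, negative limits are treated as "no limit", which is an assumption because the page only describes 0 and blank. Python's sort is stable, so tied rows keep their source order. This page does not say whether Data Selector breaks ties the same way.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def output_row_count(source_rows: int, max_rows: Optional[int]) -> int:
    """Row Count: all rows when Maximum Rows is 0 or blank, otherwise at most max_rows."""
    if max_rows is None or max_rows <= 0:  # treating negatives as "no limit" is an assumption
        return source_rows
    return min(source_rows, max_rows)


def output_order(rows: List[Row], sort_column: Optional[str] = None, descending: bool = False) -> List[Row]:
    """Row Order: source order without a sort column, otherwise sorted by that column."""
    if sort_column is None:
        return list(rows)
    # Python's sort is stable (also with reverse=True), so ties keep their source order here
    return sorted(rows, key=lambda row: row[sort_column], reverse=descending)


print(output_row_count(2500, 10))  # 10
print(output_row_count(7, 10))     # 7
print(output_row_count(90, None))  # 90
```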

Interactive Features

Click on rows: In many cases, clicking on a row will drill down to show the underlying cases or details, just as you could in the source calculator.

Export capabilities: The refined output can be exported to Excel or CSV files, making it ideal for sharing with stakeholders who don't have access to the mindzie platform.

Email integration: This calculator's output is commonly used with automated email delivery to send focused data subsets to process owners and executives on a scheduled basis.

Dashboard widgets: The simplified, focused output is perfect for embedding in dashboard widgets where space is limited.

Usage Tips

  • Always ensure the source calculator has executed successfully before running Data Selector
  • Use the preview feature in the calculator configuration to see available columns from your source
  • Column names are case-sensitive and must match exactly as they appear in the source
  • When combining sorting with row limits, sorting is applied first, then the row limit (enabling "top N" selections)
  • If the source calculator has no results or an error, Data Selector will produce an empty table
  • Multiple Data Selector calculators can be used in sequence to progressively refine data
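
The order of sorting and limiting matters. The plain Python below uses hypothetical values to show the difference, and it also shows that an empty source stays empty:

```python
# Hypothetical source column values, in source order
values = [4, 9, 1, 7, 3]

# Sort first, then limit: a true "top 2"
print(sorted(values, reverse=True)[:2])  # [9, 7]

# Limit first, then sort: only the first two source rows are ever considered
print(sorted(values[:2], reverse=True))  # [9, 4]

# An empty source table stays empty whatever the settings
print(sorted([], reverse=True)[:2])  # []
```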

Common Patterns

Dashboard Pattern: Complex calculator -> Data Selector (select key columns, top N rows) -> Dashboard widget

Email Pattern: Analysis calculator -> Data Selector (focus on actionable data) -> Automated email delivery

Export Pattern: Comprehensive analysis -> Data Selector (simplify for external stakeholders) -> Excel export

Pipeline Pattern: Calculator A -> Data Selector 1 (refine) -> Calculator B (further analysis) -> Data Selector 2 (final output)
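
The Pipeline Pattern can be modeled as a list of steps applied in order, each consuming the previous step's output table. In the Python sketch below, `selector` builds a hypothetical Data Selector step and `flag_slow` is an invented stand-in for "Calculator B". Neither is a mindzie API, and the rows are hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Step = Callable[[List[Row]], List[Row]]


def selector(columns: List[str], sort_column: str, descending: bool = False, max_rows: int = 0) -> Step:
    """Build a hypothetical Data Selector step: sort, apply the row limit, keep `columns`."""
    def step(rows: List[Row]) -> List[Row]:
        ordered = sorted(rows, key=lambda row: row[sort_column], reverse=descending)
        if max_rows > 0:
            ordered = ordered[:max_rows]
        return [{name: row[name] for name in columns} for row in ordered]
    return step


def flag_slow(rows: List[Row]) -> List[Row]:
    """Stand-in for "Calculator B": adds a Slow flag for durations over 10 days."""
    return [{**row, "Slow": row["Duration"] > 10} for row in rows]


def run_pipeline(rows: List[Row], steps: List[Step]) -> List[Row]:
    for step in steps:  # each step consumes the previous step's output table
        rows = step(rows)
    return rows


calculator_a_output = [
    {"Case": "A", "Duration": 5, "Cost": 100},
    {"Case": "B", "Duration": 12, "Cost": 40},
    {"Case": "C", "Duration": 8, "Cost": 70},
]
result = run_pipeline(calculator_a_output, [
    selector(["Case", "Duration"], "Duration", descending=True, max_rows=2),  # Data Selector 1
    flag_slow,                                                                # Calculator B
    selector(["Case", "Slow"], "Case"),                                       # Data Selector 2
])
print(result)  # [{'Case': 'B', 'Slow': True}, {'Case': 'C', 'Slow': False}]
```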

The Data Selector is particularly valuable when you need to present analysis results to stakeholders who need focused, actionable information rather than comprehensive analytical detail. It bridges the gap between detailed process mining analysis and clear, decision-ready reporting.


This documentation is part of the mindzie Studio process mining platform.
