Data Selector

Overview

The Data Selector calculator is a data post-processing tool that selects specific columns from another calculator's output and optionally sorts and limits the results. Use it to create focused data views by choosing the relevant columns, ordering the rows, and keeping only the top N rows.

Unlike most calculators that analyze process data directly, Data Selector works with the output tables from other calculators, making it ideal for refining analysis results for dashboards, reports, and exports.

Common Uses

Prepare specific data subsets for email delivery or export to stakeholders
Create simplified dashboard views showing only key metrics from complex analysis
Select and sort top N results from large analysis outputs (e.g., top 10 slowest cases)
Focus reports on relevant columns by removing unnecessary detail
Transform comprehensive analysis results into executive-friendly summaries
Create data pipelines by chaining multiple calculators and selecting specific outputs at each stage

Settings

Source Calculator: Select the calculator block whose output you want to work with. This calculator must have already been executed in the current notebook.

Source Table: Choose which table to use if the source calculator produces multiple result tables. Most calculators produce a single table (index 0), but some calculators return multiple tables with different types of information.

Columns to Include: Select which columns from the source table should appear in the output. You can select multiple columns, and they will appear in the order you specify. Column names must match exactly as they appear in the source calculator output.

Sort Column: Optionally choose a column to sort the results by. If you don't specify a sort column, the data will maintain the same order as the source calculator output.

Sort Direction: When sorting is enabled, choose whether to sort in:

Ascending order: Lowest to highest (A-Z, 0-9, oldest to newest)
Descending order: Highest to lowest (Z-A, 9-0, newest to oldest)

Maximum Rows: Specify the maximum number of rows to include in the output. Set to 0 or leave blank for no limit. When combined with sorting, this allows you to select "top N" results (e.g., top 20 slowest cases when sorted by duration descending).
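
The way these settings combine can be sketched in plain Python. This is a minimal illustration, not the platform's implementation: the function `select_data`, its parameter names, and the sample rows are all hypothetical, and it assumes the source table is a list of dictionaries keyed by column name.

```python
from typing import Any, Optional


def select_data(
    rows: list[dict[str, Any]],
    columns: list[str],
    sort_column: Optional[str] = None,
    descending: bool = False,
    max_rows: Optional[int] = None,
) -> list[dict[str, Any]]:
    """Hypothetical stand-in for the Data Selector settings."""
    result = list(rows)  # no sort column: keep the source order
    if sort_column is not None:
        # Sort before limiting so the row limit yields a "top N" selection.
        result = sorted(result, key=lambda r: r[sort_column], reverse=descending)
    if max_rows:  # 0 or None (blank) means "no limit"
        result = result[:max_rows]
    # Keep only the requested columns, in the requested order.
    return [{name: row[name] for name in columns} for row in result]


# Hypothetical source table with one column we do not want to show.
source = [
    {"Case ID": "PO-1", "Duration": 12.0, "Owner": "A"},
    {"Case ID": "PO-2", "Duration": 47.3, "Owner": "B"},
    {"Case ID": "PO-3", "Duration": 38.5, "Owner": "C"},
]

top2 = select_data(source, ["Case ID", "Duration"], "Duration",
                   descending=True, max_rows=2)
print(top2)
```

The sketch sorts numeric durations; whether a given source column sorts numerically or as text depends on its data type in the source calculator.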

Examples

Example 1: Top 10 Slowest Purchase Orders for Executive Report

Scenario: Your Case Duration calculator has analyzed 2,500 purchase orders, but you want to create an executive dashboard showing only the 10 slowest cases for immediate attention.

Settings:

Source Calculator: "Purchase Order Duration Analysis"
Source Table: 0 (primary results table)
Columns to Include: ["Case ID", "Supplier Name", "Duration", "Total Value"]
Sort Column: Duration
Sort Direction: Descending
Maximum Rows: 10

Output:

The calculator displays a focused table with exactly 4 columns and 10 rows:

Case ID	Supplier Name	Duration	Total Value
PO-2024-8821	Acme Manufacturing	47.3 days	$125,400
PO-2024-9156	Global Supplies Inc	42.8 days	$89,200
PO-2024-7633	TechParts Ltd	38.5 days	$156,800
...	...	...	...

Insights: By selecting only the essential columns and limiting to 10 rows, you've created an actionable dashboard that highlights problematic cases without overwhelming executives with 2,500 rows of data. The sorting by duration ensures the most urgent cases appear first. The inclusion of Total Value shows the financial impact of these delays.

Example 2: Weekly Activity Summary for Email Distribution

Scenario: You run a weekly activity frequency analysis that generates detailed statistics for 45 different activities. You want to email the process owner just the top 15 most frequent activities with simplified metrics.

Settings:

Source Calculator: "Weekly Activity Frequency Report"
Source Table: 0
Columns to Include: ["Activity Name", "Event Count", "Percentage of Total Events"]
Sort Column: Event Count
Sort Direction: Descending
Maximum Rows: 15

Output:

A clean, focused table perfect for email:

Activity Name	Event Count	Percentage of Total Events
Create Purchase Requisition	1,847	18.2%
Manager Approval	1,823	17.9%
Vendor Selection	1,792	17.6%
...	...	...

Insights: This simplified view removes columns like "First Occurrence" and "Last Occurrence" that clutter the email, while keeping the essential metrics that show which activities dominate the process. The recipient immediately sees that the top 3 activities account for over half of all process events.

Example 3: Customer Analysis Dashboard Simplification

Scenario: Your Breakdown by Categories calculator analyzed customers across 12 different metrics, but your dashboard widget only has space to show 5 columns for the top 20 customers.

Settings:

Source Calculator: "Customer Performance Analysis"
Source Table: 0
Columns to Include: ["Customer Name", "Case Count", "Average Duration", "Total Revenue", "On-Time Percentage"]
Sort Column: Total Revenue
Sort Direction: Descending
Maximum Rows: 20

Output:

Dashboard-ready table with focused metrics:

Customer Name	Case Count	Average Duration	Total Revenue	On-Time Percentage
MegaCorp Industries	487	8.2 days	$4,850,000	92%
TechStart Solutions	356	7.5 days	$3,240,000	95%
Global Systems Inc	298	9.1 days	$2,870,000	88%
...	...	...	...	...

Insights: You've transformed a comprehensive 12-column analysis into a dashboard-friendly 5-column view showing exactly what stakeholders need to know: which customers generate the most revenue, how many orders they place, how long processing takes, and their delivery performance. Sorting by revenue ensures the most important customers are visible at a glance.

Example 4: Variant Analysis - Top Variants by Frequency

Scenario: Your variant analysis identified 284 unique process variants. You want to focus your improvement efforts on the top 25 most common variants, which typically represent 80% of your case volume.

Settings:

Source Calculator: "Process Variant Analysis"
Source Table: 0
Columns to Include: ["Variant ID", "Frequency", "Cumulative Percentage", "Average Duration", "Contains Rework"]
Sort Column: Frequency
Sort Direction: Descending
Maximum Rows: 25

Output:

Variant ID	Frequency	Cumulative Percentage	Average Duration	Contains Rework
VAR-001	1,245	24.8%	6.2 days	No
VAR-002	876	42.2%	8.5 days	Yes
VAR-003	623	54.6%	5.8 days	No
...	...	...	...	...

Insights: The top 25 variants represent the core of your process, and the cumulative percentage column shows that focusing on these variants covers the majority of cases. The "Contains Rework" column immediately flags which common variants include inefficient rework steps, helping prioritize improvement opportunities.

Example 5: Date Range Analysis for Trending

Scenario: Your rate-over-time calculator generated daily statistics for 90 days, but you want to display just the key metrics in chronological order without any row limits for a complete trend analysis.

Settings:

Source Calculator: "90-Day Completion Rate Analysis"
Source Table: 0
Columns to Include: ["Date", "Cases Completed", "Completion Rate"]
Sort Column: Date
Sort Direction: Ascending
Maximum Rows: 0 (no limit)

Output:

All 90 rows displayed in chronological order:

Date	Cases Completed	Completion Rate
2024-10-01	23	87.4%
2024-10-02	28	91.2%
2024-10-03	31	89.7%
...	...	...

Insights: By sorting by date ascending and not limiting rows, you maintain the complete time series for charting or export. You've simplified the output by removing statistical columns (like "Standard Deviation" and "Min/Max") that aren't needed for basic trend visualization, making the data cleaner for graphing tools.
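
As a rough sketch of this configuration, the snippet below orders three of the rows shown above by date and applies no row limit. It assumes the dates are ISO-formatted strings (as in the sample output) and parses them with `date.fromisoformat`; the variable names are illustrative only.

```python
from datetime import date

# Rows from the sample output, deliberately out of order.
rows = [
    {"Date": "2024-10-03", "Cases Completed": 31, "Completion Rate": 89.7},
    {"Date": "2024-10-01", "Cases Completed": 23, "Completion Rate": 87.4},
    {"Date": "2024-10-02", "Cases Completed": 28, "Completion Rate": 91.2},
]

max_rows = 0  # 0 means "no limit", so every row is kept

# Ascending sort on the parsed date gives chronological order.
ordered = sorted(rows, key=lambda r: date.fromisoformat(r["Date"]))
if max_rows > 0:
    ordered = ordered[:max_rows]

dates = [r["Date"] for r in ordered]
print(dates)
```

Parsing the value into a real date matters for formats such as `10/03/2024`, where sorting the raw text would not be chronological.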

Example 6: Multi-Table Source Selection

Scenario: Your conformance checker returns two tables: table 0 contains summary statistics, and table 1 contains detailed violation listings. You want to create a report from the detailed violations table.

Settings:

Source Calculator: "Standard Process Conformance Check"
Source Table: 1 (detail table, not summary)
Columns to Include: ["Case ID", "Violation Type", "Activity Name", "Timestamp"]
Sort Column: Violation Type
Sort Direction: Ascending
Maximum Rows: 100

Output:

Case ID	Violation Type	Activity Name	Timestamp
CS-1234	Missing Required Step	Invoice Approval	2024-11-15 14:22
CS-5678	Missing Required Step	Purchase Approval	2024-11-16 09:15
CS-9012	Out of Sequence	Goods Receipt	2024-11-16 11:45
...	...	...	...

Insights: By selecting table 1 instead of the default table 0, you access the detailed violation data rather than just summary counts. Sorting by violation type groups similar problems together, making it easier to identify patterns. The 100-row limit keeps the report manageable, but because the limit is applied after sorting, it keeps the first 100 rows in violation-type order rather than the most important violations. To prioritize by importance, sort by a column that reflects it.
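
The table-index selection can be pictured as indexing into a list of tables. The sketch below is an assumption about shape only: the `tables` list and the summary table's single `Violation Count` column are hypothetical, while the detail rows are taken from the sample output above.

```python
# Detail rows from the sample output, deliberately out of order.
detail = [
    {"Case ID": "CS-9012", "Violation Type": "Out of Sequence",
     "Activity Name": "Goods Receipt", "Timestamp": "2024-11-16 11:45"},
    {"Case ID": "CS-1234", "Violation Type": "Missing Required Step",
     "Activity Name": "Invoice Approval", "Timestamp": "2024-11-15 14:22"},
    {"Case ID": "CS-5678", "Violation Type": "Missing Required Step",
     "Activity Name": "Purchase Approval", "Timestamp": "2024-11-16 09:15"},
]
summary = [{"Violation Count": len(detail)}]  # hypothetical summary shape

tables = [summary, detail]  # table 0 = summary, table 1 = detail

source_table = 1
max_rows = 100

# Ascending sort groups identical violation types together;
# Python's sort is stable, so ties keep their source order.
selected = sorted(tables[source_table], key=lambda r: r["Violation Type"])
selected = selected[:max_rows]

types = [r["Violation Type"] for r in selected]
print(types)
```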

Output

The Data Selector calculator displays a table with the exact columns you specified, in the order you selected them. The table structure is dynamic and depends on your column selections.

Output Characteristics

Column Structure: Only the columns you selected from "Columns to Include" appear in the output. Column names, data types, and formatting are preserved from the source calculator.

Row Count: Determined by the Maximum Rows setting:

If Maximum Rows = 0 or blank: All rows from the source table
If Maximum Rows > 0: Up to that many rows (may be fewer if source has fewer rows)

Row Order: Determined by sorting settings:

If no sort column specified: Maintains the same order as the source calculator
If sort column specified: Rows are ordered according to the sort column and direction

Interactive Features

Click on rows: In many cases, clicking on a row will drill down to show the underlying cases or details, just as you could in the source calculator.

Export capabilities: The refined output can be exported to Excel or CSV files, making it ideal for sharing with stakeholders who don't have access to the mindzie platform.

Email integration: This calculator's output is commonly used with automated email delivery to send focused data subsets to process owners and executives on a scheduled basis.

Dashboard widgets: The simplified, focused output is perfect for embedding in dashboard widgets where space is limited.

Usage Tips

Always ensure the source calculator has executed successfully before running Data Selector
Use the preview feature in the calculator configuration to see available columns from your source
Column names are case-sensitive - they must match exactly as they appear in the source
When combining sorting with row limits, sorting is applied first, then the row limit (enabling "top N" selections)
If the source calculator has no results or an error, Data Selector will produce an empty table
Multiple Data Selector calculators can be used in sequence to progressively refine data
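
The case-sensitivity tip can be illustrated with a small check. This is a sketch of exact string matching in general, not a description of how the platform reports a mismatch; `check_columns` is a hypothetical helper.

```python
def check_columns(available: list[str], requested: list[str]) -> list[str]:
    """Return the requested names that have no exact, case-sensitive match."""
    return [name for name in requested if name not in available]


# Column names as they appear in Example 1's source output.
available = ["Case ID", "Supplier Name", "Duration", "Total Value"]

# "duration" differs from "Duration" only by case, so it does not match.
missing = check_columns(available, ["Case ID", "duration"])
print(missing)
```

Running a check like this against the column names shown in the configuration preview is a quick way to catch typos before the calculator runs.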

Common Patterns

Dashboard Pattern: Complex calculator -> Data Selector (select key columns, top N rows) -> Dashboard widget

Email Pattern: Analysis calculator -> Data Selector (focus on actionable data) -> Automated email delivery

Export Pattern: Comprehensive analysis -> Data Selector (simplify for external stakeholders) -> Excel export

Pipeline Pattern: Calculator A -> Data Selector 1 (refine) -> Calculator B (further analysis) -> Data Selector 2 (final output)

The Data Selector is particularly valuable when you need to present analysis results to stakeholders who need focused, actionable information rather than comprehensive analytical detail. It bridges the gap between detailed process mining analysis and clear, decision-ready reporting.

This documentation is part of the mindzie Studio process mining platform.
