Overview
The Data Selector calculator is a data post-processing tool that selects specific columns from another calculator's output and optionally sorts and limits the results. Use it to create focused data views: choose the relevant columns, order the rows, and keep only the top N.
Unlike most calculators that analyze process data directly, Data Selector works with the output tables from other calculators, making it ideal for refining analysis results for dashboards, reports, and exports.
Common Uses
- Prepare specific data subsets for email delivery or export to stakeholders
- Create simplified dashboard views showing only key metrics from complex analysis
- Select and sort top N results from large analysis outputs (e.g., top 10 slowest cases)
- Focus reports on relevant columns by removing unnecessary detail
- Transform comprehensive analysis results into executive-friendly summaries
- Create data pipelines by chaining multiple calculators and selecting specific outputs at each stage
Settings
Source Calculator: Select the calculator block whose output you want to work with. This calculator must have already been executed in the current notebook.
Source Table: Choose which table to use if the source calculator produces multiple result tables. Tables are numbered from 0: most calculators produce a single table (index 0), but some return several tables with different types of information, such as a summary table and a detail table.
Columns to Include: Select which columns from the source table should appear in the output. You can select multiple columns, and they will appear in the order you specify. Column names must match the source calculator output exactly, including capitalization.
Sort Column: Optionally choose a column to sort the results by. If you don't specify a sort column, the data will maintain the same order as the source calculator output.
Sort Direction: When sorting is enabled, choose whether to sort in:
- Ascending order: Lowest to highest (A-Z, 0-9, oldest to newest)
- Descending order: Highest to lowest (Z-A, 9-0, newest to oldest)
Maximum Rows: Specify the maximum number of rows to include in the output. Set to 0 or leave blank for no limit. When combined with sorting, this allows you to select "top N" results (e.g., top 20 slowest cases when sorted by duration descending).
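The settings above amount to a sort, limit, and column-selection sequence. The following is a minimal Python sketch of that behavior, not mindzie's implementation: the function name `select_data`, its parameters, and the extra "Status" column are hypothetical, and the source table is assumed to be a list of dictionaries keyed by column name. Whether the real tool requires the sort column to be among the selected columns is not stated in this documentation; the sketch sorts before selecting columns.

```python
from typing import Any, Optional


def select_data(
    rows: list[dict[str, Any]],
    columns: list[str],
    sort_column: Optional[str] = None,
    descending: bool = False,
    max_rows: Optional[int] = None,
) -> list[dict[str, Any]]:
    """Hypothetical model of Data Selector: sort, limit, then keep columns."""
    result = list(rows)  # copy so the source table is left untouched
    if sort_column is not None:
        # sorted() is stable, so rows with equal values keep their source order
        result = sorted(result, key=lambda r: r[sort_column], reverse=descending)
    if max_rows:  # 0 or None means "no limit"
        result = result[:max_rows]
    # Emit only the requested columns, in the requested order
    return [{name: row[name] for name in columns} for row in result]


# Sample rows modeled on Example 1 (durations stored as plain numbers of days;
# the "Status" column is made up to show a column being dropped)
source = [
    {"Case ID": "PO-2024-9156", "Supplier Name": "Global Supplies Inc", "Duration": 42.8, "Status": "Open"},
    {"Case ID": "PO-2024-7633", "Supplier Name": "TechParts Ltd", "Duration": 38.5, "Status": "Closed"},
    {"Case ID": "PO-2024-8821", "Supplier Name": "Acme Manufacturing", "Duration": 47.3, "Status": "Open"},
]

top_two = select_data(
    source, ["Case ID", "Duration"], sort_column="Duration", descending=True, max_rows=2
)
for row in top_two:
    print(row)
```

Calling the function with `max_rows=0` and no `sort_column` returns every row in source order, which mirrors the "no limit, no sort" defaults described above.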
Examples
Example 1: Top 10 Slowest Purchase Orders for Executive Report
Scenario: Your Case Duration calculator has analyzed 2,500 purchase orders, but you want to create an executive dashboard showing only the 10 slowest cases for immediate attention.
Settings:
- Source Calculator: "Purchase Order Duration Analysis"
- Source Table: 0 (primary results table)
- Columns to Include: ["Case ID", "Supplier Name", "Duration", "Total Value"]
- Sort Column: Duration
- Sort Direction: Descending
- Maximum Rows: 10
Output:
The calculator displays a focused table with exactly 4 columns and 10 rows:
| Case ID | Supplier Name | Duration | Total Value |
|---|---|---|---|
| PO-2024-8821 | Acme Manufacturing | 47.3 days | $125,400 |
| PO-2024-9156 | Global Supplies Inc | 42.8 days | $89,200 |
| PO-2024-7633 | TechParts Ltd | 38.5 days | $156,800 |
| ... | ... | ... | ... |
Insights: By selecting only the essential columns and limiting the output to 10 rows, you've created an actionable dashboard that highlights problematic cases without overwhelming executives with 2,500 rows of data. Sorting by Duration in descending order puts the slowest cases first, and including Total Value shows the financial exposure tied to each delay.
Example 2: Weekly Activity Summary for Email Distribution
Scenario: You run a weekly activity frequency analysis that generates detailed statistics for 45 different activities. You want to email the process owner just the top 15 most frequent activities with simplified metrics.
Settings:
- Source Calculator: "Weekly Activity Frequency Report"
- Source Table: 0
- Columns to Include: ["Activity Name", "Event Count", "Percentage of Total Events"]
- Sort Column: Event Count
- Sort Direction: Descending
- Maximum Rows: 15
Output:
A clean, focused table perfect for email:
| Activity Name | Event Count | Percentage of Total Events |
|---|---|---|
| Create Purchase Requisition | 1,847 | 18.2% |
| Manager Approval | 1,823 | 17.9% |
| Vendor Selection | 1,792 | 17.6% |
| ... | ... | ... |
Insights: This simplified view removes columns like "First Occurrence" and "Last Occurrence" that clutter the email, while keeping the essential metrics that show which activities dominate the process. The recipient immediately sees that the top 3 activities account for over half of all process events (18.2% + 17.9% + 17.6% = 53.7%).
Example 3: Customer Analysis Dashboard Simplification
Scenario: Your Breakdown by Categories calculator analyzed customers across 12 different metrics, but your dashboard widget only has space to show 5 columns for the top 20 customers.
Settings:
- Source Calculator: "Customer Performance Analysis"
- Source Table: 0
- Columns to Include: ["Customer Name", "Case Count", "Average Duration", "Total Revenue", "On-Time Percentage"]
- Sort Column: Total Revenue
- Sort Direction: Descending
- Maximum Rows: 20
Output:
Dashboard-ready table with focused metrics:
| Customer Name | Case Count | Average Duration | Total Revenue | On-Time Percentage |
|---|---|---|---|---|
| MegaCorp Industries | 487 | 8.2 days | $4,850,000 | 92% |
| TechStart Solutions | 356 | 7.5 days | $3,240,000 | 95% |
| Global Systems Inc | 298 | 9.1 days | $2,870,000 | 88% |
| ... | ... | ... | ... | ... |
Insights: You've transformed a comprehensive 12-column analysis into a dashboard-friendly 5-column view showing exactly what stakeholders need to know: which customers generate the most revenue, how many cases they account for, how long processing takes, and how often their cases finish on time. Sorting by Total Revenue in descending order keeps the highest-revenue customers visible at a glance.
Example 4: Variant Analysis - Top Variants by Frequency
Scenario: Your variant analysis identified 284 unique process variants. You want to focus your improvement efforts on the 25 most common variants, which in many processes account for the bulk of case volume (often around 80%).
Settings:
- Source Calculator: "Process Variant Analysis"
- Source Table: 0
- Columns to Include: ["Variant ID", "Frequency", "Cumulative Percentage", "Average Duration", "Contains Rework"]
- Sort Column: Frequency
- Sort Direction: Descending
- Maximum Rows: 25
Output:
| Variant ID | Frequency | Cumulative Percentage | Average Duration | Contains Rework |
|---|---|---|---|---|
| VAR-001 | 1,245 | 24.8% | 6.2 days | No |
| VAR-002 | 876 | 42.2% | 8.5 days | Yes |
| VAR-003 | 623 | 54.6% | 5.8 days | No |
| ... | ... | ... | ... | ... |
Insights: The top 25 variants represent the core of your process, and the Cumulative Percentage column shows how much of the case volume they cover: in the sample rows, the top 3 variants alone account for 54.6% of cases. The "Contains Rework" column immediately flags which common variants include inefficient rework steps, helping prioritize improvement opportunities.
Example 5: Date Range Analysis for Trending
Scenario: Your rate-over-time calculator generated daily statistics for 90 days, but you want to display just the key metrics in chronological order without any row limits for a complete trend analysis.
Settings:
- Source Calculator: "90-Day Completion Rate Analysis"
- Source Table: 0
- Columns to Include: ["Date", "Cases Completed", "Completion Rate"]
- Sort Column: Date
- Sort Direction: Ascending
- Maximum Rows: 0 (no limit)
Output:
All 90 rows displayed in chronological order:
| Date | Cases Completed | Completion Rate |
|---|---|---|
| 2024-10-01 | 23 | 87.4% |
| 2024-10-02 | 28 | 91.2% |
| 2024-10-03 | 31 | 89.7% |
| ... | ... | ... |
Insights: By sorting by date ascending and not limiting rows, you maintain the complete time series for charting or export. You've simplified the output by removing statistical columns (like "Standard Deviation" and "Min/Max") that aren't needed for basic trend visualization, making the data cleaner for graphing tools.
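Chronological sorting depends on how the dates are represented. The short Python sketch below is an illustration only, not a description of how mindzie sorts internally. It shows that ISO 8601 date strings like those in the table above sort chronologically even when compared as plain text, while other formats (a hypothetical day/month/year format is assumed here) need to be parsed first.

```python
from datetime import datetime

iso_dates = ["2024-10-03", "2024-10-01", "2024-10-02"]
# ISO 8601 strings (YYYY-MM-DD) order correctly under plain text comparison
print(sorted(iso_dates))

# Hypothetical day/month/year strings do not: text order compares the day first
dmy_dates = ["03/10/2024", "01/11/2024", "02/10/2024"]
text_order = sorted(dmy_dates)
date_order = sorted(dmy_dates, key=lambda s: datetime.strptime(s, "%d/%m/%Y"))
print(text_order)  # 1 November wrongly lands before 2 October
print(date_order)  # parsed dates give true chronological order
```

If a trend chart built from sorted output looks out of order, checking whether the Date column holds real dates or formatted text is a reasonable first step.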
Example 6: Multi-Table Source Selection
Scenario: Your conformance checker returns two tables: table 0 contains summary statistics, and table 1 contains detailed violation listings. You want to create a report from the detailed violations table.
Settings:
- Source Calculator: "Standard Process Conformance Check"
- Source Table: 1 (detail table, not summary)
- Columns to Include: ["Case ID", "Violation Type", "Activity Name", "Timestamp"]
- Sort Column: Violation Type
- Sort Direction: Ascending
- Maximum Rows: 100
Output:
| Case ID | Violation Type | Activity Name | Timestamp |
|---|---|---|---|
| CS-1234 | Missing Required Step | Invoice Approval | 2024-11-15 14:22 |
| CS-5678 | Missing Required Step | Purchase Approval | 2024-11-16 09:15 |
| CS-9012 | Out of Sequence | Goods Receipt | 2024-11-16 11:45 |
| ... | ... | ... | ... |
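Insights: By selecting table 1 instead of the default table 0, you access the detailed violation data rather than just summary counts. Sorting by violation type groups similar problems together, making it easier to identify patterns. The 100-row limit keeps the report manageable, but because the rows are sorted alphabetically by violation type, the limit keeps the first 100 rows in that order rather than the most severe violations. If more than 100 violations exist, later violation types may be cut off.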
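The table index in this example can be pictured as a zero-based position in a list of tables. The sketch below assumes that representation; `pick_table`, the `calculator_output` structure, and the summary "Count" values are hypothetical and only mirror the sample rows shown above.

```python
from typing import Any

Table = list[dict[str, Any]]

# Hypothetical output of a conformance checker: a list of tables
calculator_output: list[Table] = [
    [  # table 0: summary statistics
        {"Violation Type": "Missing Required Step", "Count": 2},
        {"Violation Type": "Out of Sequence", "Count": 1},
    ],
    [  # table 1: detailed violation listing
        {"Case ID": "CS-9012", "Violation Type": "Out of Sequence"},
        {"Case ID": "CS-1234", "Violation Type": "Missing Required Step"},
        {"Case ID": "CS-5678", "Violation Type": "Missing Required Step"},
    ],
]


def pick_table(tables: list[Table], index: int = 0) -> Table:
    """Return the table at a zero-based index, failing clearly if it is absent."""
    if not 0 <= index < len(tables):
        raise IndexError(f"source has {len(tables)} table(s); index {index} is out of range")
    return tables[index]


detail = pick_table(calculator_output, 1)
# Ascending sort groups equal violation types together; then cap at 100 rows
grouped = sorted(detail, key=lambda row: row["Violation Type"])[:100]
print([row["Case ID"] for row in grouped])
```

Leaving the index at its default of 0 would return the summary table, whose columns differ from the detail table, so the column selection in this example would no longer match.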
Output
The Data Selector calculator displays a table with the exact columns you specified, in the order you selected them. The table structure is dynamic and depends on your column selections.
Output Characteristics
Column Structure: Only the columns you selected from "Columns to Include" appear in the output. Column names, data types, and formatting are preserved from the source calculator.
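Because only exactly matching column names are selected, it can help to check a column list against the source before relying on the output. The helper below is a hypothetical sketch of such a check (`missing_columns` is not a mindzie function); it uses exact, case-sensitive comparison.

```python
def missing_columns(requested: list[str], available: list[str]) -> list[str]:
    """Return requested names with no exact (case-sensitive) match in the source."""
    known = set(available)
    return [name for name in requested if name not in known]


available = ["Case ID", "Supplier Name", "Duration", "Total Value"]

# Exact matches: nothing is reported as missing
print(missing_columns(["Case ID", "Duration"], available))

# Wrong capitalization and a trailing space both count as mismatches
print(missing_columns(["case id", "Duration "], available))
```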
Row Count: Determined by the Maximum Rows setting:
- If Maximum Rows = 0 or blank: All rows from the source table
- If Maximum Rows > 0: Up to that many rows (may be fewer if source has fewer rows)
Row Order: Determined by sorting settings:
- If no sort column specified: Maintains the same order as the source calculator
- If sort column specified: Rows are ordered according to the sort column and direction
Interactive Features
Click on rows: In many cases, clicking on a row will drill down to show the underlying cases or details, just as you could in the source calculator.
Export capabilities: The refined output can be exported to Excel or CSV files, making it ideal for sharing with stakeholders who don't have access to the mindzie platform.
Email integration: This calculator's output is commonly used with automated email delivery to send focused data subsets to process owners and executives on a scheduled basis.
Dashboard widgets: The simplified, focused output is perfect for embedding in dashboard widgets where space is limited.
Usage Tips
- Always ensure the source calculator has executed successfully before running Data Selector
- Use the preview feature in the calculator configuration to see available columns from your source
- Column names are case-sensitive - they must match exactly as they appear in the source
- When combining sorting with row limits, sorting is applied first, then the row limit (enabling "top N" selections)
- If the source calculator returns no results or fails with an error, Data Selector will produce an empty table
- Multiple Data Selector calculators can be used in sequence to progressively refine data
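The tip about sorting before limiting is what makes "top N" selections work. The small Python sketch below, using hypothetical duration values, contrasts the two possible orders of operation.

```python
# Hypothetical case durations in days, listed in source order
durations = [12.0, 47.3, 5.5, 42.8, 38.5]

# Sort first, then limit: the three largest values overall
sort_then_limit = sorted(durations, reverse=True)[:3]

# Limit first, then sort: only the first three source rows, reordered
limit_then_sort = sorted(durations[:3], reverse=True)

print(sort_then_limit)
print(limit_then_sort)
```

Only the first ordering returns the three slowest cases; the second merely reorders whichever rows happened to come first in the source.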
Common Patterns
Dashboard Pattern: Complex calculator -> Data Selector (select key columns, top N rows) -> Dashboard widget
Email Pattern: Analysis calculator -> Data Selector (focus on actionable data) -> Automated email delivery
Export Pattern: Comprehensive analysis -> Data Selector (simplify for external stakeholders) -> Excel export
Pipeline Pattern: Calculator A -> Data Selector 1 (refine) -> Calculator B (further analysis) -> Data Selector 2 (final output)
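The Pipeline Pattern can be pictured as function composition, where each stage consumes the table produced by the previous one. The sketch below is illustrative only: `data_selector` and `average_by` are hypothetical stand-ins for a Data Selector block and for "Calculator B", and the sample data is made up.

```python
from collections import defaultdict


def data_selector(rows, columns, sort_column=None, descending=False, max_rows=0):
    """Hypothetical stand-in for a Data Selector block."""
    if sort_column is not None:
        rows = sorted(rows, key=lambda r: r[sort_column], reverse=descending)
    if max_rows:  # 0 means "no limit"
        rows = rows[:max_rows]
    return [{c: r[c] for c in columns} for r in rows]


def average_by(rows, group_column, value_column):
    """Hypothetical 'Calculator B': mean of value_column per group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_column]].append(r[value_column])
    return [{group_column: g, "Average": sum(v) / len(v)} for g, v in groups.items()]


calculator_a = [  # hypothetical output of Calculator A
    {"Case ID": "A1", "Supplier": "Acme", "Duration": 10.0, "Notes": "x"},
    {"Case ID": "A2", "Supplier": "Acme", "Duration": 20.0, "Notes": "y"},
    {"Case ID": "B1", "Supplier": "Bolt", "Duration": 40.0, "Notes": "z"},
]

stage1 = data_selector(calculator_a, ["Supplier", "Duration"])  # Data Selector 1: refine
stage2 = average_by(stage1, "Supplier", "Duration")             # Calculator B: further analysis
final = data_selector(stage2, ["Supplier", "Average"], "Average", True, 1)  # Data Selector 2
print(final)
```

Each stage sees only the columns the previous stage passed along, which is why a column dropped early (here "Notes") cannot be recovered later in the chain.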
The Data Selector is particularly valuable when you need to present analysis results to stakeholders who need focused, actionable information rather than comprehensive analytical detail. It bridges the gap between detailed process mining analysis and clear, decision-ready reporting.
This documentation is part of the mindzie Studio process mining platform.