Overview
The LLM Prompts calculator generates comprehensive, AI-ready summaries of your process mining data that can be consumed by Large Language Models (LLMs). This calculator serves as the data bridge between mindzieStudio and AI chatbot systems, powering features like mindzie Copilot.
IMPORTANT: This is an administrator-only calculator designed for AI integration and chatbot functionality. It creates structured prompts containing process statistics, activity patterns, and performance metrics specifically formatted for consumption by AI assistants. Regular users interact with AI capabilities through the mindzie Copilot interface rather than using this calculator directly.
This calculator controls data sharing through five privacy levels (Level 0 through Level 4), ensuring you maintain control over what information is shared with external LLM services while still enabling natural language analysis of your process data.
Common Uses
- Power AI chatbot assistants that answer natural language questions about your process data
- Enable users to ask questions like "Which activity causes the most delays?" and get AI-generated insights
- Provide context to Large Language Models for automated process analysis and recommendations
- Generate comprehensive dataset summaries optimized for AI consumption and interpretation
- Control privacy by limiting what process data is shared with external LLM services
- Support different trust levels for on-premise versus cloud-based AI services
Settings
Data Level: Controls how much process data is shared with the LLM. This is the primary privacy control.
- Level 0 (Off) - Disables AI capabilities entirely. No data shared with LLM services.
- Level 1 (No Data) - AI can answer generic process mining questions but has no access to your dataset.
- Level 2 (Activity and Attribute Names) - Shares only column names and data types. AI understands your dataset structure but not values.
- Level 3 (Activities, Attributes, and Calculated Values) - Shares aggregated statistics like durations and frequencies. No raw case data.
- Level 4 (All Data) - Complete statistical profile including all calculated metrics. Maximum AI capability. Note: Raw case records are never shared at any level.
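The level gating described above can be sketched as a simple lookup. This is a hypothetical illustration only; the section names and function are not the actual mindzieStudio implementation.

```python
# Hypothetical sketch of how the Data Level gates which prompt sections can be
# shared; names and structure are illustrative, not the actual implementation.
SECTION_MIN_LEVEL = {
    "activities_and_attributes": 2,  # names and data types only
    "attribute_breakdown": 3,
    "time_between_activities": 3,
    "duration_histogram": 3,
    "dataset_information": 3,
    "start_end_frequencies": 3,
    "resource_frequency": 3,
    "variant_information": 3,
}

def active_sections(data_level, enabled):
    """Return the sections that are both enabled and permitted at this level."""
    if data_level <= 1:
        return []  # Level 0 (Off) and Level 1 (No Data): nothing is shared
    return [name for name, min_level in SECTION_MIN_LEVEL.items()
            if enabled.get(name, False) and data_level >= min_level]
```

At Level 2 only the activities-and-attributes section can pass the gate, even if every toggle is enabled; at Levels 0 and 1 the result is always empty.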
Include Activities and Attributes: When enabled, shares activity names with case counts and percentages, plus complete lists of case and event attributes with data types. Active at data levels 2, 3, and 4. This helps the AI understand what activities and attributes exist in your process.
Include Attribute Breakdown: When enabled, provides detailed value distributions for categorical attributes, showing counts and percentages for each value. Active at data levels 3 and 4. Attributes with over 100 categories are automatically skipped to avoid overwhelming the AI with too much detail.
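The 100-category skip rule can be sketched as follows. Function and constant names are illustrative assumptions, not the product's actual code.

```python
# Hypothetical sketch of the Attribute Breakdown filter described above:
# categorical attributes with more than 100 distinct values are skipped.
from collections import Counter

MAX_CATEGORIES = 100

def attribute_breakdown(values):
    """Return {value: percentage}, or None when the attribute is skipped."""
    counts = Counter(values)
    if len(counts) > MAX_CATEGORIES:
        return None  # too many categories; omitted from the prompt
    total = len(values)
    return {v: round(100 * c / total, 1) for v, c in counts.items()}
```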
Include Time Between Activities: When enabled, shares activity pair performance data including time between activities, case counts, percentages, and mean durations. Limited to the top 100 activity pairs. Active at data levels 3 and 4. This helps the AI identify bottlenecks and delays in your process.
Include Duration Histogram: When enabled, provides the distribution of case durations organized into buckets. Active at data levels 3 and 4. This helps the AI understand typical versus outlier case durations in your process.
Include Dataset Information: When enabled, shares overall dataset statistics including start and end times, case counts, event counts, duration statistics, and attribute counts. Active at data levels 3 and 4. This gives the AI a high-level view of your dataset's scope and characteristics.
Include Start and End Frequencies: When enabled, shows which activities cases start and end with, along with percentages. Active at data levels 3 and 4. This helps the AI understand process entry and exit points and identify common starting and ending patterns.
Include Resource Frequency: When enabled, provides case percentages for each resource, limited to the top 100 resources. Active at data levels 3 and 4. Only included if a Resource column exists in your dataset. This helps the AI identify workload distribution and potential resource bottlenecks.
Include Variant Information: When enabled, provides process variant statistics including variant sequences, case percentages, and mean durations for each variant. Limited to the top 100 variants. Active at data levels 3 and 4. This helps the AI understand which process paths are most common and their relative performance.
Prefix Text: Optional text to prepend to the generated prompt. Can be used to add custom context or instructions before the main data sections. Currently stored but not actively used in the main calculation.
Postfix Text: Optional text to append to the generated prompt. Can be used to add custom context or instructions after the main data sections. Currently stored but not actively used in the main calculation.
Examples
Example 1: Enabling AI-Powered Process Analysis
Scenario: You want to enable the mindzie Copilot AI assistant to answer natural language questions about your order-to-cash process. You trust the LLM service provider and want to share comprehensive process statistics to maximize the AI's analytical capabilities.
Settings:
- Data Level: Level 4 (All Data)
- Include Activities and Attributes: Enabled
- Include Attribute Breakdown: Enabled
- Include Time Between Activities: Enabled
- Include Duration Histogram: Enabled
- Include Dataset Information: Enabled
- Include Start and End Frequencies: Enabled
- Include Resource Frequency: Enabled
- Include Variant Information: Enabled
Output:
The calculator generates a comprehensive prompt containing:
Dataset Information:
- 2,456 cases covering October 1 to December 31, 2024
- Average case duration: 8.5 days
- 18 unique activities
- 15 case attributes and 12 event attributes
Activity Statistics:
- Create Order: 100% of cases
- Check Inventory: 98% of cases
- Ship: 95% of cases
- Invoice: 94% of cases
- Payment: 89% of cases
Time Between Activities (showing delays):
- Invoice to Payment: Mean 4.2 days
- Check Inventory to Ship: Mean 3.1 days
- Create Order to Check Inventory: Mean 1.8 days
Variant Analysis:
- Top variant (32% of cases): Create Order, Check Inventory, Ship, Invoice, Payment - 3.2 days average
- Second variant (28% of cases): Create Order, Check Inventory, Backorder, Ship, Invoice, Payment - 8.5 days average
Resource Distribution:
- Order Processing Team: 45% of cases
- Warehouse Team: 38% of cases
- Finance Team: 35% of cases
Estimated tokens: 6,200 tokens (4.8% of 128K LLM capacity)
Insights: With all data sections enabled, the AI assistant has comprehensive context about your order-to-cash process. Users can now ask questions like "Why do some orders take twice as long as others?" and the AI can identify that the second variant includes a backorder step that adds 5.3 days on average. The AI can spot that the Invoice-to-Payment delay of 4.2 days represents nearly half the average case duration, suggesting payment collection as an improvement opportunity. The token count of 6,200 represents only 5% of modern LLM capacity, leaving ample room for conversation history and complex questions.
Example 2: Privacy-Aware Metadata Sharing
Scenario: Your company policy requires that sensitive process data cannot be shared with external cloud-based LLM services. However, you want to enable basic AI assistance that can guide users on how to use mindzieStudio features based on understanding your dataset structure without seeing actual values.
Settings:
- Data Level: Level 2 (Activity and Attribute Names)
- Include Activities and Attributes: Enabled
- All other sections: Disabled (automatically excluded at Level 2)
Output:
The calculator generates a minimal prompt containing:
Activity Names:
- Create Invoice (2,156 cases - 100%)
- Match PO (2,089 cases - 96.9%)
- Match Receipt (1,867 cases - 86.6%)
- Approve Invoice (2,145 cases - 99.5%)
- Pay Invoice (2,001 cases - 92.8%)
Case Attributes:
- Invoice_Number (String)
- Vendor_Name (String)
- Invoice_Amount (Decimal)
- Currency (String)
- Payment_Terms (String)
- Department (String)
Event Attributes:
- Activity (String)
- Timestamp (DateTime)
- Resource (String)
- Approval_Level (String)
Estimated tokens: 450 tokens
Insights: At Level 2, the AI can understand your dataset structure and help users navigate mindzieStudio features. For example, when a user asks "How can I analyze invoice processing by vendor?", the AI can see that a Vendor_Name attribute exists and recommend using the Breakdown by Categories calculator with Vendor_Name as the category. However, the AI cannot answer questions about specific vendors or actual processing statistics because no values or calculated metrics are shared. This privacy-aware approach enables helpful guidance while maintaining data confidentiality and complying with strict data governance policies.
Example 3: Selective Data Sharing for Performance
Scenario: You want to enable AI analysis focused on process flow and bottleneck identification, but you want to minimize token usage to reduce LLM API costs and improve response times. You don't need resource or attribute analysis for your current use case.
Settings:
- Data Level: Level 3 (Activities, Attributes, and Calculated Values)
- Include Activities and Attributes: Enabled
- Include Attribute Breakdown: Disabled
- Include Time Between Activities: Enabled
- Include Duration Histogram: Enabled
- Include Dataset Information: Enabled
- Include Start and End Frequencies: Enabled
- Include Resource Frequency: Disabled
- Include Variant Information: Enabled
Output:
The calculator generates a focused prompt containing process flow data:
Dataset Overview:
- 1,847 purchase orders
- October 1 - December 31, 2024
- Average duration: 8.5 days
Time Between Activities:
- Submit Request to First Approval: Mean 3.2 days (bottleneck identified)
- First Approval to Second Approval: Mean 1.1 days
- Second Approval to PO Creation: Mean 0.8 days
- PO Creation to Vendor Confirmation: Mean 2.4 days
Duration Histogram:
- 0-3 days: 412 cases (22%)
- 3-7 days: 628 cases (34%)
- 7-14 days: 521 cases (28%)
- 14+ days: 286 cases (16%)
Process Variants:
- Standard approval path (65%): 7.2 days average
- Expedited path (20%): 3.1 days average
- Escalation path (15%): 15.8 days average
Estimated tokens: 2,100 tokens (roughly a 66% reduction from the full-data prompt)
Insights: By disabling the Attribute Breakdown and Resource Frequency sections, you reduce token consumption by roughly 66% while maintaining full capability for process flow analysis. The AI can still identify that the Submit-to-First-Approval delay of 3.2 days is the primary bottleneck, and that escalation cases take more than twice as long as standard cases. This selective sharing approach reduces LLM API costs from approximately $0.062 per query to $0.021 per query (assuming $0.01 per 1,000 tokens), making AI-assisted analysis more cost-effective for organizations processing thousands of queries monthly.
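The savings arithmetic in this example works out as below, assuming the flat $0.01 per 1,000 input tokens rate stated above; the token counts are the example's illustrative figures.

```python
# Arithmetic behind the savings in this example, assuming the flat
# $0.01 per 1,000 input tokens rate stated above; figures are illustrative.
full_tokens = 6200      # Level 4 prompt with all sections enabled
focused_tokens = 2100   # Level 3 prompt with two sections disabled
rate_per_token = 0.01 / 1000

reduction = 1 - focused_tokens / full_tokens  # fractional token reduction
full_cost = full_tokens * rate_per_token      # cost per query, full prompt
focused_cost = focused_tokens * rate_per_token
```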
Example 4: Token Budget Management and Cost Estimation
Scenario: As a system administrator, you need to understand the token consumption and estimated costs for different data sharing configurations before enabling AI features organization-wide.
Settings:
- Data Level: Level 4 (All Data)
- All sections: Enabled
Output:
The calculator provides comprehensive token metrics:
Section Breakdown:
- Activities and Attributes: 1,240 tokens (3,100 characters)
- Attribute Breakdown: 2,341 tokens (5,852 characters)
- Time Between Activities: 892 tokens (2,230 characters)
- Duration Histogram: 324 tokens (810 characters)
- Dataset Information: 187 tokens (468 characters)
- Start and End Frequencies: 156 tokens (390 characters)
- Resource Frequency: 412 tokens (1,030 characters)
- Variant Information: 621 tokens (1,552 characters)
Total Statistics:
- Total characters: 15,432
- Total words: 3,124
- Estimated tokens: 6,173 tokens
- Capacity used: 4.8% of 128K token window
- Estimated cost per query: $0.062 (at $0.01 per 1K tokens)
Insights: The token usage analysis reveals that Attribute Breakdown is the most expensive section at 2,341 tokens, consuming 38% of the total budget. If cost reduction is needed, disabling this single section would cut token usage by 38% while maintaining process flow analysis capabilities. The 6,173 token prompt uses less than 5% of modern LLM context windows (128K tokens for GPT-4 or Claude), leaving ample capacity for conversation history and complex multi-turn interactions. At an estimated $0.062 per query with current OpenAI pricing, an organization expecting 1,000 AI queries per month should budget approximately $62 monthly for LLM API costs, not including response tokens.
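The budgeting arithmetic above can be reproduced directly from the example's figures (6,173 prompt tokens, $0.01 per 1K tokens, 1,000 queries per month); response tokens are excluded, as noted.

```python
# Monthly budgeting arithmetic for the figures above; response tokens excluded.
prompt_tokens = 6173
rate_per_1k = 0.01
queries_per_month = 1000

cost_per_query = prompt_tokens / 1000 * rate_per_1k   # ~$0.062 per query
monthly_budget = cost_per_query * queries_per_month   # ~$62 per month
```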
Example 5: Troubleshooting AI Assistant Responses
Scenario: Users report that the AI assistant cannot answer questions about resource workload distribution. You need to verify what data the AI has access to and identify the issue.
Settings:
- Data Level: Level 4 (All Data)
- Include Resource Frequency: Disabled (this is the problem)
- All other sections: Enabled
Output:
When the calculator runs without resource frequency data, the generated prompt contains:
Resource Information:
- "There are no resources selected for this dataset."
Insights: The diagnostic output reveals why the AI cannot answer resource-related questions: the Include Resource Frequency toggle is disabled. Even at Level 4 (All Data), individual sections must be explicitly enabled to be shared with the AI. After enabling the Include Resource Frequency setting, the calculator generates comprehensive resource statistics showing that Jane Smith handles 42% of all cases while other resources average only 12%, explaining the workload imbalance users were asking about. This highlights that the Data Level setting controls the privacy boundary, while the individual section toggles control which specific analyses are available to the AI within that privacy level.
Example 6: Monitoring AI Data Sharing in Regulated Industries
Scenario: Your healthcare organization uses mindzieStudio to analyze patient treatment processes. Compliance requires that no patient-identifiable information or specific case data be shared with external AI services, but you want to enable AI assistance for aggregate process analysis that could improve patient care efficiency.
Settings:
- Data Level: Level 3 (Activities, Attributes, and Calculated Values)
- Include Activities and Attributes: Enabled
- Include Attribute Breakdown: Disabled (avoids sharing specific attribute values)
- Include Time Between Activities: Enabled
- Include Duration Histogram: Enabled
- Include Dataset Information: Enabled
- Include Start and End Frequencies: Enabled
- Include Resource Frequency: Disabled (avoids sharing clinician names)
- Include Variant Information: Enabled
Output:
The calculator generates a compliance-friendly prompt:
Dataset Summary:
- 845 treatment episodes
- January 1 - March 31, 2025
- Average duration: 4.2 days
Process Flow:
- Patient Registration to Initial Assessment: Mean 2.1 hours
- Initial Assessment to Treatment Plan: Mean 8.4 hours
- Treatment Plan to Treatment Start: Mean 14.2 hours
Variant Analysis:
- Standard treatment path (72%): 3.8 days average
- Complex care path (18%): 7.2 days average
- Emergency accelerated path (10%): 1.5 days average
No patient names, case identifiers, or resource names are included in the prompt.
Insights: This configuration enables the AI to identify that the Treatment-Plan-to-Treatment-Start delay of 14.2 hours represents a significant bottleneck in patient care, potentially delaying treatment initiation. The AI can recommend focusing improvement efforts on this specific transition without ever receiving patient-identifiable information. By operating at Level 3 with Attribute Breakdown and Resource Frequency disabled, the organization complies with healthcare data privacy regulations while still benefiting from AI-powered process analysis. The AI can suggest "Focus on reducing the 14-hour delay between treatment planning and treatment initiation" without knowing which specific patients experienced delays or which clinicians were involved, enabling evidence-based process improvement while maintaining patient confidentiality.
Output
The LLM Prompts calculator generates a structured output designed for consumption by AI assistants and Large Language Models:
Message Sections: The calculator organizes data into multiple named sections, each with its own statistics. Each section includes metadata about word count, character count, and estimated token consumption. This modular structure allows the AI to understand which type of information comes from which analysis.
Comprehensive Statistics: At the bottom of the output, the calculator displays aggregate metrics including total word count, total character count, and estimated token count. These metrics help administrators understand the capacity requirements and estimate API costs when integrating with commercial LLM services.
Token Estimation: The calculator estimates token consumption using a ratio of 2.5 characters per token, which works well in practice for English text mixed with JSON data structures. This estimate helps organizations budget for LLM API costs and ensure prompts fit within the context window limits of their chosen AI service (typically 128,000 tokens for modern models such as GPT-4 or Claude).
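The documented 2.5 characters-per-token ratio amounts to a one-line estimator. Rounding to the nearest integer reproduces Example 4's figures (15,432 characters yields 6,173 tokens); the exact rounding rule is an assumption on our part.

```python
# The documented 2.5 characters-per-token ratio as a one-line estimator.
# Nearest-integer rounding is an assumption, but matches Example 4's figures.
def estimate_tokens(text):
    return round(len(text) / 2.5)
```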
JSON-Formatted Tables: All data sections are formatted as JSON structures that LLMs can easily parse and understand. This structured format enables the AI to accurately interpret activity frequencies, duration statistics, variant information, and other process metrics without ambiguity.
Capacity Indicators: For sections with large volumes of data (resources, variants, activity pairs), the calculator automatically limits output to the top 100 items and includes a note explaining the limitation. This prevents overwhelming the LLM with excessive detail while focusing on the most significant process elements.
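The top-100 cap can be sketched as below; the ranking key and function name are assumptions for illustration, though the documented behavior (keep the most significant 100 items and note the truncation) is what the sketch implements.

```python
# Hypothetical sketch of the top-100 cap applied to large sections
# (resources, variants, activity pairs): keep the 100 most frequent items
# and flag truncation so an explanatory note can be added to the prompt.
def cap_top_items(case_counts, limit=100):
    """case_counts maps item name -> case count; returns (kept, was_truncated)."""
    ranked = sorted(case_counts.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:limit]), len(ranked) > limit
```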
Privacy Status Messages: When Data Level is set to Level 0 or Level 1, the calculator generates a message stating "The settings do not allow to share any data with the Copilot" instead of process statistics. This makes it clear to both administrators and AI systems why no data is available.
Section-Specific Content: Depending on the Data Level and enabled sections, the output may include activities and attribute names (Level 2+), attribute value distributions (Level 3+), time between activities (Level 3+), duration histograms (Level 3+), dataset summary statistics (Level 3+), process start and end patterns (Level 3+), resource workload distribution (Level 3+), and variant performance metrics (Level 3+).
Interactive Integration: While this calculator's output is designed for AI consumption, the results appear in mindzieStudio's standard calculator output format. Administrators can review the generated prompts to understand exactly what information is being shared with LLM services and verify compliance with data governance policies.
This documentation is part of the mindzie Studio process mining platform.