Metadata

Overview

The Metadata calculator displays comprehensive technical information about how your dataset was generated, extracted, and configured. This zero-configuration calculator provides essential metadata including versioning details, ETL configuration, timezone settings, and core column mappings.

Unlike calculators that analyze process data, Metadata reveals the technical foundation of your dataset: when it was extracted, which versions of the ETL pipeline were used, how timestamps are interpreted, and which column names map to core process mining concepts like case ID and activity.

Common Uses

  • Verify data freshness by checking extraction timestamp and hours since last update
  • Troubleshoot timezone issues by reviewing timezone configuration and local time settings
  • Document data lineage for compliance and audit requirements
  • Validate ETL configuration by confirming transformer version and settings
  • Support technical troubleshooting by identifying core column names for custom scripts
  • Track dataset versioning across multiple environments (development, test, production)

Settings

This calculator requires no configuration. It automatically retrieves all metadata from your dataset and displays it in a single two-column table.

The only available settings are the standard output fields:

Title: Optional custom title for the output (defaults to "Metadata")

Description: Optional description to provide context about this metadata view

Examples

Example 1: Verifying Data Freshness for Decision-Making

Scenario: Your finance team is preparing for a monthly business review meeting and needs to confirm they're analyzing the most current accounts payable data. Stale data could lead to incorrect conclusions about payment performance.

Settings:

  • Title: "Data Currency Check"
  • Description: "AP Process - Monthly Review"

Output:

The calculator displays a two-column table showing all dataset metadata. Key metrics for data freshness include:

  • Last successful data extraction: 2025-10-19 6:00:00 AM
  • Hours since last extraction: 2.5
  • Extraction Version: 3.2.1
  • Current Time: 2025-10-19 8:30:00 AM
  • TimeZoneName: Eastern Standard Time
  • ProcessDisplayName: Accounts Payable Process

Insights: The data was extracted just 2.5 hours ago at 6:00 AM this morning, confirming it reflects yesterday's completed work. The team can confidently proceed with their analysis knowing they're working with current data. If the "Hours since last extraction" had shown several days, they would need to request a data refresh before the meeting.
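The "Hours since last extraction" figure is simple arithmetic that you can reproduce outside the calculator, for example when scripting your own checks. A minimal Python sketch using the two timestamps from this example:

```python
from datetime import datetime

# Timestamps as shown in the Metadata output above
last_extraction = datetime(2025, 10, 19, 6, 0, 0)   # 6:00:00 AM
current_time = datetime(2025, 10, 19, 8, 30, 0)     # 8:30:00 AM

# "Hours since last extraction" is simply the elapsed time in hours
hours_since = (current_time - last_extraction).total_seconds() / 3600
print(f"Hours since last extraction: {hours_since}")  # 2.5
```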

Example 2: Troubleshooting Timezone Discrepancies

Scenario: Users report that process timestamps don't match the times they see in the source ERP system. Cases that start at 8:00 AM in the ERP appear to start at 12:00 PM in the dashboard. You suspect a timezone configuration issue.

Settings:

  • Title: "Timezone Configuration Review"
  • Description: "Investigating timestamp interpretation issues"

Output:

The Metadata calculator reveals the timezone configuration:

  • TimeZoneName: UTC
  • IsLocalTime: False
  • Current Time: 2025-10-19 12:30:00 PM
  • Start Time: StartTime
  • End Time: EndTime
  • UseDateOnlySorting: False

Insights: The dataset is configured to display all times in UTC (IsLocalTime: False), which explains the four-hour discrepancy. The business operates in Eastern Time, which is UTC-4 during daylight saving time, so an event recorded at 8:00 AM local time appears as 12:00 PM in the data. The team needs to either reconfigure the ETL to use Eastern Time or educate users that all times are displayed in UTC. Either step prevents misinterpretation of process timing and performance metrics.
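The four-hour offset can be verified with a short script. This sketch assumes Python's standard zoneinfo module and the IANA zone name America/New_York for Eastern Time (on Windows you may need the tzdata package for zoneinfo to resolve zone names):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# An event displayed as 12:00 PM in the UTC-configured dataset
utc_event = datetime(2025, 10, 19, 12, 0, tzinfo=ZoneInfo("UTC"))

# Convert to the business's local zone; Eastern observes DST in October,
# so the offset is UTC-4 (EDT) on this date
local_event = utc_event.astimezone(ZoneInfo("America/New_York"))
print(local_event.strftime("%Y-%m-%d %I:%M %p %Z"))
```

Running the conversion confirms that 12:00 PM UTC corresponds to 8:00 AM Eastern on this date.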

Example 3: Data Lineage Documentation for Audit Compliance

Scenario: Your company's internal audit team requires documentation of data sources, extraction methods, and versioning for all process mining analyses used in compliance reporting. They need to verify the traceability and reliability of your invoice processing analysis.

Settings:

  • Title: "Data Lineage - Q4 2025 Compliance Report"
  • Description: "Invoice Processing Analysis Metadata"

Output:

The Metadata table provides comprehensive lineage information:

  • ProcessDisplayName: Invoice Processing
  • TransformerFilename: InvoiceProcessing_SAP_Config.json
  • TransformerVersion: 2.1.0
  • Extraction Version: 1.8.3
  • EngineAttributeVersion: 8.0.2
  • ProcessAttributeVersion: 3.4.1
  • Last successful data extraction: 2025-10-15 11:45:00 PM
  • Etl Notes: Full extraction from SAP ECC Production
  • Description: Q4 2025 invoice processing for compliance reporting
  • BaseCurrency: USD

Insights: The audit team can now trace exactly how the data was generated: extracted from SAP ECC Production on October 15th using transformer configuration version 2.1.0 and extraction pipeline version 1.8.3. The documented versions allow them to verify that approved, validated ETL processes were used. The "Etl Notes" confirm the data source was the production environment, not a test system. This complete lineage trail satisfies audit requirements for data provenance.

Example 4: Supporting Custom Python Script Development

Scenario: A data analyst is developing a custom Python script to export specific case attributes for further analysis in R. They need to know the exact column names used in the dataset to write correct queries.

Settings:

  • Title: "Column Mapping Reference"
  • Description: "Core column names for custom scripts"

Output:

The Metadata calculator displays the core column mappings:

  • CaseId: PurchaseOrderNumber
  • Activity: ProcessStep
  • Start Time: EventTimestamp
  • End Time: EventTimestamp
  • Resource: PerformedBy
  • ExpectedOrder: StepSequence

Insights: The analyst discovers that this dataset uses custom column names rather than defaults. The case identifier is stored in "PurchaseOrderNumber" (not "CaseId"), activities are in "ProcessStep" (not "Activity"), and resources are in "PerformedBy" (not "Resource"). Armed with these exact column names, the analyst can write accurate SQL queries and Python scripts that reference the correct fields. Without this information, the script would fail with column-not-found errors.
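One practical way to use these mappings in a script is to rename the dataset's custom columns back to canonical names before any downstream logic runs. A minimal pandas sketch; the mapping values come from the Metadata output above, while the sample rows are invented for illustration:

```python
import pandas as pd

# Custom-to-canonical mapping taken from the Metadata output above
COLUMN_MAP = {
    "PurchaseOrderNumber": "CaseId",
    "ProcessStep": "Activity",
    "EventTimestamp": "StartTime",
    "PerformedBy": "Resource",
}

# Hypothetical event-log rows using the dataset's custom column names
events = pd.DataFrame({
    "PurchaseOrderNumber": ["PO-1001", "PO-1001", "PO-1002"],
    "ProcessStep": ["Create PO", "Approve PO", "Create PO"],
    "EventTimestamp": ["2025-10-18 09:00", "2025-10-18 10:30", "2025-10-18 11:00"],
    "PerformedBy": ["alice", "bob", "alice"],
})

# Rename once at the boundary; everything downstream uses standard names
canonical = events.rename(columns=COLUMN_MAP)
case_counts = canonical.groupby("CaseId")["Activity"].count()
```

Centralizing the mapping in one dictionary means a future change to the dataset's column names requires editing a single place rather than every query.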

Example 5: Version Compatibility Check Across Environments

Scenario: Your organization maintains three process mining environments: development, test, and production. Before promoting a new dashboard to production, you need to verify that all environments use compatible versions of the data extraction pipeline to ensure consistent behavior.

Settings:

  • Title: "Version Compatibility - Production Environment"
  • Description: "Pre-deployment verification"

Output:

Production environment metadata shows:

  • Derived Attribute Version: 2.3.1
  • Extraction Version: 1.9.0
  • ProcessAttributeVersion: 3.5.0
  • EngineAttributeVersion: 8.1.0
  • TransformerVersion: 2.2.0

Compared against test environment (from a separate Metadata calculator):

  • Derived Attribute Version: 2.3.1 (MATCH)
  • Extraction Version: 1.9.0 (MATCH)
  • ProcessAttributeVersion: 3.4.1 (MISMATCH - Production newer)
  • EngineAttributeVersion: 8.1.0 (MATCH)
  • TransformerVersion: 2.2.0 (MATCH)

Insights: The environments are largely compatible, with four out of five versions matching exactly. However, production has a newer ProcessAttributeVersion (3.5.0 vs 3.4.1), indicating that production has additional or modified process-specific attributes. Before deploying the dashboard from test to production, the team needs to verify whether it depends on attributes that exist in test but may have changed in production. This proactive check prevents deployment failures and ensures consistent analysis across environments.
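A comparison like this can be automated once you have the metadata from both environments. A sketch with the version numbers taken from this example; the dictionary shape is an assumption, since the calculator itself only displays a table:

```python
def parse(version: str) -> tuple:
    """Turn a dotted version string like '3.5.0' into (3, 5, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

production = {
    "Derived Attribute Version": "2.3.1",
    "Extraction Version": "1.9.0",
    "ProcessAttributeVersion": "3.5.0",
    "EngineAttributeVersion": "8.1.0",
    "TransformerVersion": "2.2.0",
}
test_env = {
    "Derived Attribute Version": "2.3.1",
    "Extraction Version": "1.9.0",
    "ProcessAttributeVersion": "3.4.1",
    "EngineAttributeVersion": "8.1.0",
    "TransformerVersion": "2.2.0",
}

# Report every field that differs, noting which environment is ahead
mismatches = {}
for key, prod_ver in production.items():
    if prod_ver != test_env[key]:
        newer = "Production" if parse(prod_ver) > parse(test_env[key]) else "Test"
        mismatches[key] = f"{test_env[key]} vs {prod_ver} ({newer} newer)"
```

Numeric comparison matters here: comparing the raw strings would rank "3.10.0" below "3.4.1".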

Example 6: Monitoring Automated ETL Pipeline Health

Scenario: Your data engineering team runs a nightly ETL job that should refresh process mining data by 6:00 AM each morning. The operations team needs a way to quickly verify the pipeline ran successfully without checking log files.

Settings:

  • Title: "ETL Pipeline Status"
  • Description: "Nightly extraction monitoring - Order-to-Cash"

Output:

The Metadata calculator shows:

  • Last successful data extraction: 2025-10-18 5:45:00 AM
  • Hours since last extraction: 26.5
  • Extraction Version: 1.9.0
  • Etl Notes: Incremental extraction completed successfully
  • Current Time: 2025-10-19 8:15:00 AM

Insights: The "Hours since last extraction" shows 26.5 hours, meaning the last successful extraction was yesterday morning, not this morning. The nightly job has failed. The operations team immediately investigates and discovers a database connection timeout that prevented last night's extraction from completing. By catching this early in the morning, they can rerun the extraction before business users notice they're looking at day-old data. Without this monitoring, users might make operational decisions based on stale information without realizing it.

Output

The Metadata calculator produces a single table with two columns displaying all available dataset metadata.

Table Structure:

Name: The name of each metadata property or configuration setting

Value: The corresponding value for that property

Categories of Information

The metadata is organized into several logical groups:

Versioning Information:

  • Derived Attribute Version: Version of derived attributes schema
  • Extraction Version: Version identifier from ETL extraction
  • ProcessAttributeVersion: Process-specific attribute schema version
  • EngineAttributeVersion: Engine attribute schema version
  • TransformerVersion: Version of the data transformer used

Process Configuration:

  • ProcessName: Internal process identifier
  • ProcessDisplayName: Human-readable process name
  • BaseCurrency: Currency used for monetary calculations

Time Configuration:

  • TimeZoneName: Configured timezone for the dataset
  • IsLocalTime: Whether timestamps are in local time (versus UTC)
  • Current Time: Current time based on timezone settings
  • UseDateOnlySorting: Whether events are sorted by date only (ignoring time)

Core Column Mapping:

  • CaseId: Name of the case identifier column
  • Activity: Name of the activity column
  • Start Time: Name of the start time column
  • End Time: Name of the end time column
  • Resource: Name of the resource column
  • ExpectedOrder: Name of the expected order column

ETL Configuration:

  • TransformerFilename: Name of the transformer/configuration file
  • Order Event Algorithm: Algorithm used for event ordering
  • Last successful data extraction: Timestamp of last successful ETL run
  • Hours since last extraction: Calculated age of the data
  • Etl Notes: Notes from the ETL process
  • Notes: General dataset notes
  • Description: Dataset description

Understanding the Output

Data Freshness: Check "Hours since last extraction" to determine if your data is current. Values over 24-48 hours may indicate ETL pipeline issues requiring investigation.

Timezone Interpretation: The combination of "TimeZoneName" and "IsLocalTime" determines how timestamps are displayed. If IsLocalTime is False, all times are shown in UTC regardless of the TimeZoneName setting.

Version Tracking: All version fields (Extraction Version, TransformerVersion, etc.) help track which ETL pipeline and schema versions generated the data. This is critical for troubleshooting issues across environment deployments.

Column Names: The core column mappings show the actual column names used in your dataset, which may differ from standard defaults if custom mapping was configured during data extraction.

Null Values: Some properties may show empty values or "Unknown" if that information wasn't available during extraction or hasn't been configured.


This documentation is part of the mindzie Studio process mining platform.
