Supported Data Formats

Learn about supported file formats, data structures, and column mapping requirements for process mining datasets.

CSV (Comma-Separated Values)

The most commonly used format for process mining data, with flexible parsing options.

Format Specifications

Option     | Description                     | Default          | Example
-----------|---------------------------------|------------------|--------------------------
delimiter  | Field separator character       | comma (,)        | semicolon (;), tab (\t)
encoding   | Character encoding              | UTF-8            | ISO-8859-1, Windows-1252
hasHeader  | First row contains column names | true             | true, false
quoteChar  | Text qualifier character        | double quote (") | single quote (')

Sample CSV Structure

CaseID,Activity,Timestamp,Resource,Amount
PO-001,Create Order,2024-01-15T09:00:00Z,buyer.smith,1500.00
PO-001,Approve Order,2024-01-15T10:30:00Z,manager.jones,1500.00
PO-001,Send to Supplier,2024-01-15T11:00:00Z,system.auto,1500.00
PO-002,Create Order,2024-01-15T09:15:00Z,buyer.brown,2750.50
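
The parsing options above correspond closely to what a generic CSV reader exposes. As a rough illustration only (this is not the product's importer, and `read_events` is a hypothetical helper), the sample can be read with Python's standard `csv` module:

```python
import csv
import io

SAMPLE = """CaseID,Activity,Timestamp,Resource,Amount
PO-001,Create Order,2024-01-15T09:00:00Z,buyer.smith,1500.00
PO-001,Approve Order,2024-01-15T10:30:00Z,manager.jones,1500.00
PO-001,Send to Supplier,2024-01-15T11:00:00Z,system.auto,1500.00
PO-002,Create Order,2024-01-15T09:15:00Z,buyer.brown,2750.50
"""

def read_events(text, delimiter=",", quote_char='"', has_header=True):
    """Parse CSV text into a list of dicts, one per event row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quote_char)
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return []
    if has_header:
        header, data = rows[0], rows[1:]
    else:
        # Assumption: without a header row, columns get positional names.
        header = [f"column{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(header, row)) for row in data]

events = read_events(SAMPLE)
print(len(events), events[0]["Activity"])  # prints: 4 Create Order
```

When reading from disk, the `encoding` option would be passed to `open(path, encoding="UTF-8", newline="")` before handing the file to the reader.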

Column Mapping Configuration

{
  "mapping": [
    {
      "sourceColumn": "CaseID",
      "targetColumn": "CaseID",
      "dataType": "string",
      "role": "case_id"
    },
    {
      "sourceColumn": "Activity",
      "targetColumn": "Activity",
      "dataType": "string",
      "role": "activity"
    },
    {
      "sourceColumn": "Timestamp",
      "targetColumn": "Timestamp",
      "dataType": "datetime",
      "role": "timestamp",
      "format": "ISO8601"
    }
  ],
  "options": {
    "hasHeader": true,
    "delimiter": ",",
    "encoding": "UTF-8"
  }
}
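
To make the effect of such a mapping concrete, here is a minimal sketch of applying it to one parsed row. The helpers `convert`, `apply_mapping`, and `column_for_role` are hypothetical names, and only the `ISO8601` timestamp format is handled:

```python
from datetime import datetime

MAPPING = [
    {"sourceColumn": "CaseID", "targetColumn": "CaseID",
     "dataType": "string", "role": "case_id"},
    {"sourceColumn": "Activity", "targetColumn": "Activity",
     "dataType": "string", "role": "activity"},
    {"sourceColumn": "Timestamp", "targetColumn": "Timestamp",
     "dataType": "datetime", "role": "timestamp", "format": "ISO8601"},
]

def convert(value, data_type):
    """Convert a raw string to the declared data type (string or datetime only)."""
    if data_type == "datetime":
        # Replace the trailing "Z" so older Python versions accept the value.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    return value

def apply_mapping(row, mapping):
    """Keep only mapped columns, renamed to their target and converted."""
    return {
        m["targetColumn"]: convert(row[m["sourceColumn"]], m["dataType"])
        for m in mapping
    }

def column_for_role(mapping, role):
    """Return the target column that carries the given role."""
    return next(m["targetColumn"] for m in mapping if m["role"] == role)

row = {"CaseID": "PO-001", "Activity": "Create Order",
       "Timestamp": "2024-01-15T09:00:00Z", "Resource": "buyer.smith"}
event = apply_mapping(row, MAPPING)
```

Unmapped source columns such as `Resource` are dropped in this sketch; whether the actual importer keeps them as extra attributes is not stated here.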

Excel Files (.xlsx, .xls, .xlsm)

Microsoft Excel workbooks with support for multiple worksheets and advanced formatting.

Supported Features

File Types

  • .xlsx (Excel 2007+)
  • .xls (Excel 97-2003)
  • .xlsm (Macro-enabled)

Worksheet Handling

  • Multiple worksheet support
  • Specific sheet selection
  • Range-based import

Data Recognition

  • Automatic date/time detection
  • Numeric format preservation
  • Text formatting cleanup

Excel Import Configuration

{
  "worksheetName": "ProcessEvents",
  "range": "A1:E1000",
  "hasHeader": true,
  "startRow": 1,
  "mapping": [
    {
      "sourceColumn": "Order ID",
      "targetColumn": "CaseID",
      "dataType": "string"
    },
    {
      "sourceColumn": "Event Date",
      "targetColumn": "Timestamp",
      "dataType": "datetime",
      "format": "MM/dd/yyyy HH:mm:ss"
    }
  ]
}
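
The `range` value uses A1-style notation. As an illustration of what it selects, the following sketch (with hypothetical helper names, independent of any Excel library) converts a range such as `A1:E1000` into column and row bounds:

```python
import re

def column_index(letters):
    """Convert a column label such as 'A' or 'AB' to a 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index

def parse_range(cell_range):
    """Return (first_col, first_row, last_col, last_row) for an A1-style range."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", cell_range)
    if match is None:
        raise ValueError(f"not an A1-style range: {cell_range!r}")
    c1, r1, c2, r2 = match.groups()
    return column_index(c1), int(r1), column_index(c2), int(r2)

first_col, first_row, last_col, last_row = parse_range("A1:E1000")
columns = last_col - first_col + 1
# With hasHeader set to true, the first row of the range holds column names.
data_rows = last_row - first_row
print(columns, data_rows)  # prints: 5 999
```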

XES (eXtensible Event Stream)

The IEEE standard format for process mining event logs (IEEE 1849), with full support for log, trace, and event attributes and partial support for extensions.

XES Specification Support

Element    | Support Level | Description
-----------|---------------|------------------------------------------------
Log        | Full          | Log-level attributes and metadata
Trace      | Full          | Case-level attributes and events
Event      | Full          | Activity-level data and attributes
Extensions | Partial       | Standard extensions (concept, time, lifecycle)

Sample XES Structure

<?xml version="1.0" encoding="UTF-8" ?>
<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <extension name="Concept" prefix="concept" uri="http://www.xes-standard.org/concept.xesext"/>
  <extension name="Time" prefix="time" uri="http://www.xes-standard.org/time.xesext"/>

  <trace>
    <string key="concept:name" value="PO-001"/>

    <event>
      <string key="concept:name" value="Create Order"/>
      <date key="time:timestamp" value="2024-01-15T09:00:00.000Z"/>
      <string key="org:resource" value="buyer.smith"/>
    </event>

    <event>
      <string key="concept:name" value="Approve Order"/>
      <date key="time:timestamp" value="2024-01-15T10:30:00.000Z"/>
      <string key="org:resource" value="manager.jones"/>
    </event>
  </trace>
</log>
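
To show how the trace and event hierarchy relates to the flat CaseID/Activity/Timestamp layout, the sketch below flattens the sample with Python's standard `xml.etree.ElementTree`. The function names and the output column names are assumptions made for illustration:

```python
import xml.etree.ElementTree as ET

NS = {"xes": "http://www.xes-standard.org/"}

XES = """<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="PO-001"/>
    <event>
      <string key="concept:name" value="Create Order"/>
      <date key="time:timestamp" value="2024-01-15T09:00:00.000Z"/>
      <string key="org:resource" value="buyer.smith"/>
    </event>
    <event>
      <string key="concept:name" value="Approve Order"/>
      <date key="time:timestamp" value="2024-01-15T10:30:00.000Z"/>
      <string key="org:resource" value="manager.jones"/>
    </event>
  </trace>
</log>"""

def attributes(element):
    """Collect key/value pairs from the direct children of a trace or event."""
    return {
        child.get("key"): child.get("value")
        for child in element
        if child.get("key") is not None  # nested <event> elements have no key
    }

def flatten(xes_text):
    """Return one flat row per event, carrying the trace name as the case ID."""
    log = ET.fromstring(xes_text)
    rows = []
    for trace in log.findall("xes:trace", NS):
        case_id = attributes(trace).get("concept:name")
        for event in trace.findall("xes:event", NS):
            attrs = attributes(event)
            rows.append({
                "CaseID": case_id,
                "Activity": attrs.get("concept:name"),
                "Timestamp": attrs.get("time:timestamp"),
                "Resource": attrs.get("org:resource"),
            })
    return rows

rows = flatten(XES)
```

Because the log declares a default namespace, element lookups need the namespace mapping; searching for a bare `trace` tag would find nothing.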

JSON (JavaScript Object Notation)

Structured JSON format for complex event data with nested attributes and flexible schema.

JSON Schema Options

Array of Events

Simple flat structure with event objects.

[
  {
    "caseId": "PO-001",
    "activity": "Create Order",
    "timestamp": "2024-01-15T09:00:00Z",
    "resource": "buyer.smith"
  }
]

Nested Structure

Hierarchical data with case and event nesting.

{
  "cases": [
    {
      "caseId": "PO-001",
      "events": [
        {
          "activity": "Create Order",
          "timestamp": "2024-01-15T09:00:00Z"
        }
      ]
    }
  ]
}
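
The two layouts carry the same information. A minimal sketch of converting the nested structure into the flat array of events, assuming the key names shown in the samples (`flatten_cases` is a hypothetical helper):

```python
import json

NESTED = """
{
  "cases": [
    {
      "caseId": "PO-001",
      "events": [
        {"activity": "Create Order", "timestamp": "2024-01-15T09:00:00Z"}
      ]
    }
  ]
}
"""

def flatten_cases(document):
    """Copy each case's caseId onto every one of its events."""
    return [
        {"caseId": case["caseId"], **event}
        for case in document["cases"]
        for event in case["events"]
    ]

flat = flatten_cases(json.loads(NESTED))
print(json.dumps(flat, indent=2))
```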

JSON Mapping Configuration

{
  "schema": "flat",
  "mapping": [
    {
      "jsonPath": "$.caseId",
      "targetColumn": "CaseID",
      "dataType": "string"
    },
    {
      "jsonPath": "$.activity",
      "targetColumn": "Activity",
      "dataType": "string"
    },
    {
      "jsonPath": "$.timestamp",
      "targetColumn": "Timestamp",
      "dataType": "datetime"
    }
  ]
}
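
Each `jsonPath` in this configuration is evaluated against a single event object. The sketch below resolves only simple dotted paths such as `$.caseId` or `$.a.b`; it is not a full JSONPath implementation, and `resolve` and `map_event` are hypothetical names:

```python
MAPPING = [
    {"jsonPath": "$.caseId", "targetColumn": "CaseID", "dataType": "string"},
    {"jsonPath": "$.activity", "targetColumn": "Activity", "dataType": "string"},
    {"jsonPath": "$.timestamp", "targetColumn": "Timestamp", "dataType": "datetime"},
]

def resolve(path, obj):
    """Resolve a simple '$.a.b' path against a dict (no wildcards or filters)."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path!r}")
    current = obj
    for key in path[2:].split("."):
        current = current[key]
    return current

def map_event(event, mapping):
    """Build a row keyed by target column; values are left as raw strings."""
    return {m["targetColumn"]: resolve(m["jsonPath"], event) for m in mapping}

event = {
    "caseId": "PO-001",
    "activity": "Create Order",
    "timestamp": "2024-01-15T09:00:00Z",
    "resource": "buyer.smith",
}
row = map_event(event, MAPPING)
```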

Data Type Requirements

Data types and the validation rules that apply to each:

String Fields

Text data with length and character validation.

  • UTF-8 encoding required
  • Maximum length: 1000 characters
  • Special character handling
  • Null value support
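
A pre-import check for these rules might look like the sketch below. It assumes the 1000-character limit counts Unicode characters rather than bytes, and `validate_string` is a hypothetical helper, not part of the product:

```python
MAX_LENGTH = 1000

def validate_string(value, allow_null=True):
    """Return a list of problems found in one string value (empty if valid)."""
    if value is None:
        return [] if allow_null else ["null value not allowed"]
    problems = []
    if len(value) > MAX_LENGTH:
        problems.append(f"length {len(value)} exceeds {MAX_LENGTH} characters")
    try:
        value.encode("utf-8")
    except UnicodeEncodeError:
        # Raised for text that cannot be represented in UTF-8, such as lone surrogates.
        problems.append("not encodable as UTF-8")
    return problems

print(validate_string("Create Order"))  # prints: []
print(validate_string("x" * 1001))
```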

DateTime Fields

Timestamp data with timezone support.

  • ISO 8601 format preferred
  • Custom format support
  • Timezone conversion
  • Precision to milliseconds
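
The following sketch illustrates these four points with Python's standard `datetime` module. Two assumptions are made: values without an offset are treated as UTC, and custom formats are given in Python's `strptime` syntax, which differs from pattern strings such as `MM/dd/yyyy HH:mm:ss`:

```python
from datetime import datetime, timezone

def parse_timestamp(text, fmt=None):
    """Parse a timestamp, normalize it to UTC, and truncate to milliseconds."""
    if fmt is None:
        # ISO 8601; replace a trailing "Z" so older Python versions accept it.
        parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    else:
        parsed = datetime.strptime(text, fmt)
    if parsed.tzinfo is None:
        # Assumption: values without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    parsed = parsed.astimezone(timezone.utc)
    return parsed.replace(microsecond=parsed.microsecond // 1000 * 1000)

a = parse_timestamp("2024-01-15T09:00:00Z")
b = parse_timestamp("2024-01-15T11:00:00+02:00")
c = parse_timestamp("01/15/2024 09:00:00", "%m/%d/%Y %H:%M:%S")
print(a == b == c)  # prints: True
```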

Numeric Fields

Integer and decimal number handling.

  • 64-bit integer support
  • Double precision decimals
  • Scientific notation
  • Currency formatting
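
As one possible reading of these rules, the sketch below accepts integers within the signed 64-bit range, decimals, and scientific notation. The currency handling is an assumption: it strips `$`, `€`, `£`, whitespace, and commas used as thousands separators, so it would misread values that use a comma as the decimal mark:

```python
import re

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def parse_number(text):
    """Parse text as an int (64-bit range) or a float (double precision)."""
    # Assumption: commas are thousands separators, not decimal marks.
    cleaned = re.sub(r"[,$€£\s]", "", text)
    if re.fullmatch(r"[+-]?\d+", cleaned):
        value = int(cleaned)
        if not INT64_MIN <= value <= INT64_MAX:
            raise OverflowError(f"{text!r} does not fit in a 64-bit integer")
        return value
    return float(cleaned)  # raises ValueError for non-numeric text

print(parse_number("42"))         # prints: 42
print(parse_number("1.5e3"))      # prints: 1500.0
print(parse_number("$1,500.00"))  # prints: 1500.0
```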

Boolean Fields

True/false value interpretation.

  • true/false (case insensitive)
  • 1/0 numeric values
  • yes/no text values
  • Null handling options
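
A small sketch of this interpretation follows. It assumes that `yes`/`no` are matched case-insensitively like `true`/`false`, and that empty or missing values map to a caller-chosen null value; `parse_bool` is a hypothetical helper:

```python
TRUE_VALUES = {"true", "1", "yes"}
FALSE_VALUES = {"false", "0", "no"}

def parse_bool(text, null_value=None):
    """Interpret text as a boolean; empty or missing input yields null_value."""
    if text is None or text.strip() == "":
        return null_value
    normalized = text.strip().lower()
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    raise ValueError(f"not a recognized boolean: {text!r}")

print(parse_bool("TRUE"), parse_bool("0"), parse_bool("Yes"), parse_bool(""))
# prints: True False True None
```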

Format Validation and Errors

Common validation rules and error handling for different file formats:

Required Columns

Every process mining dataset must include these essential columns:

  • Case ID: Unique identifier for each process instance
  • Activity: Name or description of the process step
  • Timestamp: When the activity occurred (with timezone)

Common Validation Errors

Error Type              | Description                              | Resolution
------------------------|------------------------------------------|--------------------------------------
Missing Required Column | CaseID, Activity, or Timestamp not found | Add missing column or update mapping
Invalid Date Format     | Timestamp not in recognized format       | Specify custom date format pattern
Empty Case ID           | Null or empty values in Case ID column   | Clean data or use row filtering
Duplicate Headers       | Multiple columns with same name          | Rename columns or use column indices
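
Several of these errors can be caught before upload. The sketch below checks a header and its rows for three of them; the function name and message wording are illustrative and do not reproduce the product's actual error output:

```python
from collections import Counter

REQUIRED_COLUMNS = ("CaseID", "Activity", "Timestamp")

def validate(header, rows):
    """Return human-readable error strings for a header and its data rows."""
    errors = []
    duplicates = sorted(name for name, n in Counter(header).items() if n > 1)
    if duplicates:
        errors.append(f"Duplicate Headers: {', '.join(duplicates)}")
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        errors.append(f"Missing Required Column: {', '.join(missing)}")
    if "CaseID" in header:
        idx = header.index("CaseID")
        # Row numbers are 1-based and count data rows only.
        empty = [
            number for number, row in enumerate(rows, start=1)
            if idx >= len(row) or not row[idx].strip()
        ]
        if empty:
            errors.append(f"Empty Case ID: rows {empty}")
    return errors

header = ["CaseID", "Activity", "Activity"]
rows = [["PO-001", "Create Order", "x"], ["", "Approve Order", "y"]]
for error in validate(header, rows):
    print(error)
```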

Best Practices

  • Data Quality: Validate data before import using built-in validation options
  • Performance: Use streaming uploads for files larger than 100 MB
  • Encoding: Always specify UTF-8 encoding for international character support
  • Timestamps: Include timezone information in all timestamp data
  • Testing: Use small sample files to test column mappings before full import
  • Documentation: Document custom formats and mappings for future reference
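
For the testing recommendation, a sample file can be cut from a large CSV without loading it fully into memory. This is a minimal sketch with a hypothetical `take_sample` helper; it assumes a header row and the default comma delimiter:

```python
import csv
import io
from itertools import islice

def take_sample(source, sample_rows=100):
    """Return the header plus the first sample_rows data rows as CSV text."""
    reader = csv.reader(source)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # islice stops reading early, so the rest of the file is never parsed.
    for row in islice(reader, sample_rows + 1):  # +1 for the header row
        writer.writerow(row)
    return out.getvalue()

full = io.StringIO("CaseID,Activity\nPO-001,Create Order\nPO-001,Approve Order\nPO-002,Create Order\n")
print(take_sample(full, sample_rows=2))
```

With a file on disk, `source` would be the object returned by `open(path, encoding="UTF-8", newline="")`.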