Supported Data Formats
Learn about supported file formats, data structures, and column mapping requirements for process mining datasets.
CSV (Comma-Separated Values)
The most commonly used format for process mining data, with flexible parsing options.
Format Specifications
| Option | Description | Default | Example | 
|---|---|---|---|
| delimiter | Field separator character | comma (,) | semicolon (;), tab (\t) | 
| encoding | Character encoding | UTF-8 | ISO-8859-1, Windows-1252 | 
| hasHeader | First row contains column names | true | true, false | 
| quoteChar | Text qualifier character | double quote (") | single quote (') | 
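The options above correspond closely to parameters of most CSV parsers. As a rough illustration only (this is not the product's importer), the sketch below reads delimited text with Python's standard `csv` module. The function name `read_events` and the positional fallback names `col0`, `col1`, ... are assumptions made for this example.

```python
import csv
import io

def read_events(text, delimiter=",", has_header=True, quote_char='"'):
    """Parse delimited text into a list of dicts keyed by column name."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quote_char)
    rows = list(reader)
    if not rows:
        return []
    if has_header:
        header, data = rows[0], rows[1:]
    else:
        # No header row: fall back to positional names (an assumption of this sketch).
        header, data = [f"col{i}" for i in range(len(rows[0]))], rows
    return [dict(zip(header, row)) for row in data]

# Semicolon-delimited input with single-quote text qualifiers.
# For files, the encoding option would be applied when opening the file,
# e.g. open(path, encoding="ISO-8859-1", newline="").
sample = "CaseID;Activity;Timestamp\nPO-001;'Create; Order';2024-01-15T09:00:00Z\n"
events = read_events(sample, delimiter=";", quote_char="'")
print(events[0]["Activity"])  # the quoted field keeps its embedded semicolon
```

The text qualifier matters whenever a field can contain the delimiter itself, as the `Activity` value does here.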
Sample CSV Structure
CaseID,Activity,Timestamp,Resource,Amount
PO-001,Create Order,2024-01-15T09:00:00Z,buyer.smith,1500.00
PO-001,Approve Order,2024-01-15T10:30:00Z,manager.jones,1500.00
PO-001,Send to Supplier,2024-01-15T11:00:00Z,system.auto,1500.00
PO-002,Create Order,2024-01-15T09:15:00Z,buyer.brown,2750.50
Column Mapping Configuration
{
  "mapping": [
    {
      "sourceColumn": "CaseID",
      "targetColumn": "CaseID",
      "dataType": "string",
      "role": "case_id"
    },
    {
      "sourceColumn": "Activity",
      "targetColumn": "Activity",
      "dataType": "string",
      "role": "activity"
    },
    {
      "sourceColumn": "Timestamp",
      "targetColumn": "Timestamp",
      "dataType": "datetime",
      "role": "timestamp",
      "format": "ISO8601"
    }
  ],
  "options": {
    "hasHeader": true,
    "delimiter": ",",
    "encoding": "UTF-8"
  }
}
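To show how a mapping like the one above might be applied to a parsed row, here is a minimal sketch. The helper names `convert` and `apply_mapping` are hypothetical, and the handling of `"format": "ISO8601"` via `datetime.fromisoformat` is an assumption about what that setting means, not documented behavior.

```python
import json
from datetime import datetime

config = json.loads("""
{
  "mapping": [
    {"sourceColumn": "CaseID", "targetColumn": "CaseID", "dataType": "string", "role": "case_id"},
    {"sourceColumn": "Activity", "targetColumn": "Activity", "dataType": "string", "role": "activity"},
    {"sourceColumn": "Timestamp", "targetColumn": "Timestamp", "dataType": "datetime",
     "role": "timestamp", "format": "ISO8601"}
  ]
}
""")

def convert(value, entry):
    """Convert a raw string according to the entry's dataType (hypothetical helper)."""
    if entry["dataType"] == "datetime":
        # fromisoformat() before Python 3.11 does not accept a trailing "Z".
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    return value

def apply_mapping(row, mapping):
    """Rename and convert the mapped columns; unmapped columns are dropped."""
    return {m["targetColumn"]: convert(row[m["sourceColumn"]], m) for m in mapping}

row = {"CaseID": "PO-001", "Activity": "Create Order",
       "Timestamp": "2024-01-15T09:00:00Z", "Resource": "buyer.smith"}
mapped = apply_mapping(row, config["mapping"])

# Look up which target column plays each process mining role.
roles = {m["role"]: m["targetColumn"] for m in config["mapping"]}
print(mapped[roles["timestamp"]].isoformat())
```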
Excel Files (.xlsx, .xls, .xlsm)
Microsoft Excel workbooks with support for multiple worksheets and advanced formatting.
Supported Features
File Types
- .xlsx (Excel 2007+)
- .xls (Excel 97-2003)
- .xlsm (Macro-enabled)
Worksheet Handling
- Multiple worksheet support
- Specific sheet selection
- Range-based import
Data Recognition
- Automatic date/time detection
- Numeric format preservation
- Text formatting cleanup
Excel Import Configuration
{
  "worksheetName": "ProcessEvents",
  "range": "A1:E1000",
  "hasHeader": true,
  "startRow": 1,
  "mapping": [
    {
      "sourceColumn": "Order ID",
      "targetColumn": "CaseID",
      "dataType": "string"
    },
    {
      "sourceColumn": "Event Date",
      "targetColumn": "Timestamp",
      "dataType": "datetime",
      "format": "MM/dd/yyyy HH:mm:ss"
    }
  ]
}
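The `format` value above uses pattern letters such as `MM`, `dd`, and `HH`. When checking sample values outside the tool, it can help to translate such a pattern into a `strptime` format. The sketch below assumes the pattern uses only the six tokens shown; the `TOKENS` table and the `to_strptime` helper are illustrative, not part of the product.

```python
import re
from datetime import datetime

# Pattern tokens assumed for this sketch; "MM" is month, "mm" is minute.
TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}

def to_strptime(pattern):
    """Translate a pattern like 'MM/dd/yyyy HH:mm:ss' in a single regex pass."""
    return re.sub("|".join(TOKENS), lambda m: TOKENS[m.group()], pattern)

fmt = to_strptime("MM/dd/yyyy HH:mm:ss")
ts = datetime.strptime("01/15/2024 09:00:00", fmt)
print(fmt, ts.isoformat())
```

The parsed value carries no timezone, so a pattern like this one leaves the timezone to be supplied separately.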
XES (eXtensible Event Stream)
IEEE standard format (IEEE 1849) for process mining event logs, with full support for log, trace, and event attributes and partial support for extensions.
XES Specification Support
| Element | Support Level | Description | 
|---|---|---|
| Log | Full | Log-level attributes and metadata | 
| Trace | Full | Case-level attributes and events | 
| Event | Full | Activity-level data and attributes | 
| Extensions | Partial | Standard extensions (concept, time, lifecycle) | 
Sample XES Structure
<?xml version="1.0" encoding="UTF-8" ?>
<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <extension name="Concept" prefix="concept" uri="http://www.xes-standard.org/concept.xesext"/>
  <extension name="Time" prefix="time" uri="http://www.xes-standard.org/time.xesext"/>
  <extension name="Organizational" prefix="org" uri="http://www.xes-standard.org/org.xesext"/>
  <trace>
    <string key="concept:name" value="PO-001"/>
    <event>
      <string key="concept:name" value="Create Order"/>
      <date key="time:timestamp" value="2024-01-15T09:00:00.000Z"/>
      <string key="org:resource" value="buyer.smith"/>
    </event>
    <event>
      <string key="concept:name" value="Approve Order"/>
      <date key="time:timestamp" value="2024-01-15T10:30:00.000Z"/>
      <string key="org:resource" value="manager.jones"/>
    </event>
  </trace>
</log>
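To make the trace/event nesting concrete, the following sketch flattens an XES log into one row per event using Python's standard `xml.etree.ElementTree`. The output column names (`CaseID`, `Activity`, `Timestamp`, `Resource`) follow the CSV sample earlier on this page. The helper names are assumptions for illustration, and this is not how the product itself reads XES.

```python
import xml.etree.ElementTree as ET

NS = {"xes": "http://www.xes-standard.org/"}  # default namespace of the sample log

XES = """<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="PO-001"/>
    <event>
      <string key="concept:name" value="Create Order"/>
      <date key="time:timestamp" value="2024-01-15T09:00:00.000Z"/>
      <string key="org:resource" value="buyer.smith"/>
    </event>
    <event>
      <string key="concept:name" value="Approve Order"/>
      <date key="time:timestamp" value="2024-01-15T10:30:00.000Z"/>
      <string key="org:resource" value="manager.jones"/>
    </event>
  </trace>
</log>"""

def attrs(element):
    """Collect key/value pairs from direct attribute children (events have no key)."""
    return {c.get("key"): c.get("value") for c in element if c.get("key") is not None}

def xes_to_rows(xml_text):
    """Return one dict per event, with the trace name repeated as the case ID."""
    root = ET.fromstring(xml_text)
    rows = []
    for trace in root.findall("xes:trace", NS):
        case_id = attrs(trace).get("concept:name")
        for event in trace.findall("xes:event", NS):
            e = attrs(event)
            rows.append({
                "CaseID": case_id,
                "Activity": e.get("concept:name"),
                "Timestamp": e.get("time:timestamp"),
                "Resource": e.get("org:resource"),
            })
    return rows

rows = xes_to_rows(XES)
print(len(rows), rows[0]["Activity"])
```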
JSON (JavaScript Object Notation)
Structured JSON format for complex event data with nested attributes and flexible schema.
JSON Schema Options
Array of Events
Simple flat structure with event objects.
[
  {
    "caseId": "PO-001",
    "activity": "Create Order",
    "timestamp": "2024-01-15T09:00:00Z",
    "resource": "buyer.smith"
  }
]
Nested Structure
Hierarchical data with case and event nesting.
{
  "cases": [
    {
      "caseId": "PO-001",
      "events": [
        {
          "activity": "Create Order",
          "timestamp": "2024-01-15T09:00:00Z"
        }
      ]
    }
  ]
}
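The two layouts carry the same information. As a small sketch of the relationship (the function name `flatten_cases` is made up for this example), the nested structure can be flattened into the array-of-events form by copying each case's `caseId` onto its events:

```python
import json

nested = json.loads("""
{
  "cases": [
    {
      "caseId": "PO-001",
      "events": [
        {"activity": "Create Order", "timestamp": "2024-01-15T09:00:00Z"},
        {"activity": "Approve Order", "timestamp": "2024-01-15T10:30:00Z"}
      ]
    }
  ]
}
""")

def flatten_cases(doc):
    """Turn the nested case/event structure into a flat list of event objects."""
    rows = []
    for case in doc["cases"]:
        for event in case["events"]:
            # Case-level fields are repeated on every event of the case.
            rows.append({"caseId": case["caseId"], **event})
    return rows

flat = flatten_cases(nested)
print(json.dumps(flat[0]))
```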
JSON Mapping Configuration
{
  "schema": "flat",
  "mapping": [
    {
      "jsonPath": "$.caseId",
      "targetColumn": "CaseID",
      "dataType": "string"
    },
    {
      "jsonPath": "$.activity",
      "targetColumn": "Activity",
      "dataType": "string"
    },
    {
      "jsonPath": "$.timestamp",
      "targetColumn": "Timestamp",
      "dataType": "datetime"
    }
  ]
}
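For the flat schema, each `jsonPath` is evaluated against a single event object. The sketch below handles only simple dotted paths such as `$.caseId` or `$.a.b`, which is a deliberate simplification of JSONPath. The `resolve` and `map_event` helpers are hypothetical names.

```python
def resolve(path, obj):
    """Resolve a simple dotted path like '$.caseId' against a dict."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    for key in path[2:].split("."):
        obj = obj[key]
    return obj

def map_event(event, mapping):
    """Build a target row from one event object using the mapping entries."""
    return {m["targetColumn"]: resolve(m["jsonPath"], event) for m in mapping}

mapping = [
    {"jsonPath": "$.caseId", "targetColumn": "CaseID", "dataType": "string"},
    {"jsonPath": "$.activity", "targetColumn": "Activity", "dataType": "string"},
    {"jsonPath": "$.timestamp", "targetColumn": "Timestamp", "dataType": "datetime"},
]
event = {"caseId": "PO-001", "activity": "Create Order",
         "timestamp": "2024-01-15T09:00:00Z", "resource": "buyer.smith"}
row = map_event(event, mapping)
print(row)
```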
Data Type Requirements
Data types and validation rules that apply to dataset columns:
String Fields
Text data with length and character validation.
- UTF-8 encoding required
- Maximum length: 1000 characters
- Special character handling
- Null value support
DateTime Fields
Timestamp data with timezone support.
- ISO 8601 format preferred
- Custom format support
- Timezone conversion
- Precision to milliseconds
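As an illustration of timezone conversion and millisecond precision, the sketch below normalizes an ISO 8601 timestamp to UTC and truncates it to milliseconds. Rejecting timestamps without an offset and truncating rather than rounding are assumptions of this example, not documented behavior.

```python
from datetime import datetime, timezone

def normalize_timestamp(value):
    """Parse an ISO 8601 string, convert to UTC, and keep millisecond precision."""
    # fromisoformat() before Python 3.11 does not accept a trailing "Z".
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        raise ValueError("timestamp has no timezone offset")
    dt = dt.astimezone(timezone.utc)
    # Truncate microseconds down to whole milliseconds.
    return dt.replace(microsecond=dt.microsecond // 1000 * 1000)

ts = normalize_timestamp("2024-01-15T10:30:00.123456+01:00")
print(ts.isoformat(timespec="milliseconds"))
```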
Numeric Fields
Integer and decimal number handling.
- 64-bit integer support
- Double precision decimals
- Scientific notation
- Currency formatting
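The sketch below shows one way the numeric rules above could be interpreted when pre-checking data. It assumes that "currency formatting" means a leading currency symbol and comma thousands separators, which the source does not specify. The name `parse_number` is hypothetical.

```python
import re

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def parse_number(text):
    """Parse an integer (checked against the 64-bit range) or a double."""
    # Assumed currency style: leading symbol, comma as thousands separator.
    cleaned = text.strip().lstrip("$€£").replace(",", "")
    if re.fullmatch(r"[+-]?\d+", cleaned):
        value = int(cleaned)
        if not INT64_MIN <= value <= INT64_MAX:
            raise OverflowError(f"{text!r} does not fit in a 64-bit integer")
        return value
    # float() accepts plain decimals and scientific notation such as 1.5e3.
    return float(cleaned)

print(parse_number("$1,500.00"), parse_number("1.5e3"), parse_number("42"))
```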
Boolean Fields
True/false value interpretation.
- true/false (case insensitive)
- 1/0 numeric values
- yes/no text values
- Null handling options
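The accepted boolean spellings can be summarized in a small lookup. This sketch treats every spelling as case insensitive and makes the result for null or empty input configurable through a `null_as` parameter. Both choices, and the name `parse_bool`, are assumptions for illustration.

```python
TRUE_VALUES = {"true", "1", "yes"}
FALSE_VALUES = {"false", "0", "no"}

def parse_bool(value, null_as=None):
    """Interpret true/false, 1/0, and yes/no; return null_as for null or empty input."""
    if value is None or str(value).strip() == "":
        return null_as
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"not a recognized boolean: {value!r}")

print(parse_bool("TRUE"), parse_bool(0), parse_bool("yes"), parse_bool(""))
```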
Format Validation and Errors
Common validation rules and error handling for different file formats:
Required Columns
Every process mining dataset must include these essential columns:
- Case ID: Unique identifier for each process instance
- Activity: Name or description of the process step
- Timestamp: When the activity occurred (with timezone)
Common Validation Errors
| Error Type | Description | Resolution | 
|---|---|---|
| Missing Required Column | CaseID, Activity, or Timestamp not found | Add missing column or update mapping | 
| Invalid Date Format | Timestamp not in recognized format | Specify custom date format pattern | 
| Empty Case ID | Null or empty values in Case ID column | Clean data or use row filtering | 
| Duplicate Headers | Multiple columns with same name | Rename columns or use column indices | 
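Running the same checks on a sample before import can catch these errors early. The following sketch tests a header and its rows against the four error types in the table. It assumes the columns are already named `CaseID`, `Activity`, and `Timestamp` and that timestamps are ISO 8601. The `validate` function and its message wording are illustrative, not the product's validator.

```python
from collections import Counter
from datetime import datetime

REQUIRED = ("CaseID", "Activity", "Timestamp")

def validate(header, rows):
    """Return a list of validation error messages (empty when the data is clean)."""
    errors = []
    for name in REQUIRED:
        if name not in header:
            errors.append(f"Missing Required Column: {name}")
    for name, count in Counter(header).items():
        if count > 1:
            errors.append(f"Duplicate Headers: {name}")
    if errors:
        return errors  # row-level checks need a usable header
    case_idx, ts_idx = header.index("CaseID"), header.index("Timestamp")
    for n, row in enumerate(rows, start=1):
        if not row[case_idx].strip():
            errors.append(f"Empty Case ID: row {n}")
        try:
            datetime.fromisoformat(row[ts_idx].replace("Z", "+00:00"))
        except ValueError:
            errors.append(f"Invalid Date Format: row {n}")
    return errors

header = ["CaseID", "Activity", "Timestamp"]
rows = [
    ["PO-001", "Create Order", "2024-01-15T09:00:00Z"],
    ["", "Approve Order", "15/01/2024"],
]
for message in validate(header, rows):
    print(message)
```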
Best Practices
- Data Quality: Validate data before import using built-in validation options
- Performance: Use streaming uploads for files larger than 100 MB
- Encoding: Always specify UTF-8 encoding for international character support
- Timestamps: Include timezone information in all timestamp data
- Testing: Use small sample files to test column mappings before full import
- Documentation: Document custom formats and mappings for future reference