Supported Data Formats
Learn about supported file formats, data structures, and column mapping requirements for process mining datasets.
CSV (Comma-Separated Values)
The most commonly used format for process mining data with flexible parsing options.
Format Specifications
| Option | Description | Default | Example |
|---|---|---|---|
| delimiter | Field separator character | comma (,) | semicolon (;), tab (\t) |
| encoding | Character encoding | UTF-8 | ISO-8859-1, Windows-1252 |
| hasHeader | First row contains column names | true | true, false |
| quoteChar | Text qualifier character | double quote (") | single quote (') |
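To make the four parsing options concrete, here is a minimal Python sketch of how a client might pre-read a CSV file with the same settings before uploading it. It uses only the standard `csv` module; the function name `read_csv_events` and the `column_N` fallback names for headerless files are assumptions made for this illustration, not part of the product.

```python
import csv
import io

def read_csv_events(raw, delimiter=",", encoding="utf-8", has_header=True, quote_char='"'):
    """Decode raw CSV bytes and return one dict per data row."""
    text = raw.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter, quotechar=quote_char)
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return []
    if has_header:
        header, data = rows[0], rows[1:]
    else:
        # No header row: fall back to positional names (hypothetical convention).
        header = [f"column_{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(header, row)) for row in data]

# Semicolon-delimited, single-quoted, ISO-8859-1 encoded input.
sample = "CaseID;Activity;Timestamp\nPO-001;'Create; Order';2024-01-15T09:00:00Z\n".encode("iso-8859-1")
events = read_csv_events(sample, delimiter=";", encoding="iso-8859-1", quote_char="'")
```

The quoted field keeps its embedded semicolon, which is the main reason to set `quoteChar` correctly when the delimiter can appear inside values.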
Sample CSV Structure
CaseID,Activity,Timestamp,Resource,Amount
PO-001,Create Order,2024-01-15T09:00:00Z,buyer.smith,1500.00
PO-001,Approve Order,2024-01-15T10:30:00Z,manager.jones,1500.00
PO-001,Send to Supplier,2024-01-15T11:00:00Z,system.auto,1500.00
PO-002,Create Order,2024-01-15T09:15:00Z,buyer.brown,2750.50
Column Mapping Configuration
{
  "mapping": [
    {
      "sourceColumn": "CaseID",
      "targetColumn": "CaseID",
      "dataType": "string",
      "role": "case_id"
    },
    {
      "sourceColumn": "Activity",
      "targetColumn": "Activity",
      "dataType": "string",
      "role": "activity"
    },
    {
      "sourceColumn": "Timestamp",
      "targetColumn": "Timestamp",
      "dataType": "datetime",
      "role": "timestamp",
      "format": "ISO8601"
    }
  ],
  "options": {
    "hasHeader": true,
    "delimiter": ",",
    "encoding": "UTF-8"
  }
}
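The mapping above can be read as "take `sourceColumn`, convert it according to `dataType`, and store it under `targetColumn`". The following Python sketch illustrates that reading on one row of the sample CSV. The helper names `apply_mapping` and `parse_iso8601` are hypothetical, and the conversion rules are an assumption about how the importer behaves, not a description of its internals.

```python
from datetime import datetime, timezone

MAPPING = [
    {"sourceColumn": "CaseID", "targetColumn": "CaseID", "dataType": "string"},
    {"sourceColumn": "Activity", "targetColumn": "Activity", "dataType": "string"},
    {"sourceColumn": "Timestamp", "targetColumn": "Timestamp",
     "dataType": "datetime", "format": "ISO8601"},
]

def parse_iso8601(value):
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)

def apply_mapping(row, mapping):
    """Return a new dict keyed by targetColumn with converted values."""
    mapped = {}
    for entry in mapping:
        raw = row[entry["sourceColumn"]]
        if entry["dataType"] == "datetime":
            mapped[entry["targetColumn"]] = parse_iso8601(raw)
        else:
            mapped[entry["targetColumn"]] = raw
    return mapped

row = {"CaseID": "PO-001", "Activity": "Create Order",
       "Timestamp": "2024-01-15T09:00:00Z", "Resource": "buyer.smith"}
mapped = apply_mapping(row, MAPPING)
```

Columns that are not listed in the mapping (here `Resource`) are simply dropped by this sketch.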
Excel Files (.xlsx, .xls, .xlsm)
Microsoft Excel workbooks with support for multiple worksheets and advanced formatting.
Supported Features
File Types
- .xlsx (Excel 2007+)
- .xls (Excel 97-2003)
- .xlsm (Macro-enabled)
Worksheet Handling
- Multiple worksheet support
- Specific sheet selection
- Range-based import
Data Recognition
- Automatic date/time detection
- Numeric format preservation
- Text formatting cleanup
Excel Import Configuration
{
  "worksheetName": "ProcessEvents",
  "range": "A1:E1000",
  "hasHeader": true,
  "startRow": 1,
  "mapping": [
    {
      "sourceColumn": "Order ID",
      "targetColumn": "CaseID",
      "dataType": "string"
    },
    {
      "sourceColumn": "Event Date",
      "targetColumn": "Timestamp",
      "dataType": "datetime",
      "format": "MM/dd/yyyy HH:mm:ss"
    }
  ]
}
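The `format` value `MM/dd/yyyy HH:mm:ss` uses letter-based pattern tokens. As a rough sketch, assuming the tokens carry their conventional meaning (`yyyy` year, `MM` month, `dd` day, `HH` 24-hour hour, `mm` minute, `ss` second), such a pattern can be translated to Python's `strptime` directives to check sample values locally. The token table below covers only these six tokens and is an assumption, not the importer's full pattern grammar.

```python
import re
from datetime import datetime

# Assumed meaning of the pattern tokens; only these six are handled.
TOKEN_TO_STRPTIME = {
    "yyyy": "%Y", "MM": "%m", "dd": "%d",
    "HH": "%H", "mm": "%M", "ss": "%S",
}
TOKEN_RE = re.compile("|".join(TOKEN_TO_STRPTIME))

def to_strptime_format(pattern):
    """Translate a pattern such as 'MM/dd/yyyy HH:mm:ss' to strptime syntax."""
    return TOKEN_RE.sub(lambda m: TOKEN_TO_STRPTIME[m.group(0)], pattern)

def parse_with_pattern(value, pattern):
    return datetime.strptime(value, to_strptime_format(pattern))

parsed = parse_with_pattern("01/15/2024 09:00:00", "MM/dd/yyyy HH:mm:ss")
```

Note that a pattern like this one carries no timezone, so the parsed value is a naive datetime.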
XES (eXtensible Event Stream)
IEEE standard (IEEE 1849) format for event logs, with full support for log, trace, and event attributes and partial support for extensions.
XES Specification Support
| Element | Support Level | Description |
|---|---|---|
| Log | Full | Log-level attributes and metadata |
| Trace | Full | Case-level attributes and events |
| Event | Full | Activity-level data and attributes |
| Extensions | Partial | Standard extensions (concept, time, lifecycle) |
Sample XES Structure
<?xml version="1.0" encoding="UTF-8" ?>
<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <extension name="Concept" prefix="concept" uri="http://www.xes-standard.org/concept.xesext"/>
  <extension name="Time" prefix="time" uri="http://www.xes-standard.org/time.xesext"/>
  <extension name="Organizational" prefix="org" uri="http://www.xes-standard.org/org.xesext"/>
  <trace>
    <string key="concept:name" value="PO-001"/>
    <event>
      <string key="concept:name" value="Create Order"/>
      <date key="time:timestamp" value="2024-01-15T09:00:00.000Z"/>
      <string key="org:resource" value="buyer.smith"/>
    </event>
    <event>
      <string key="concept:name" value="Approve Order"/>
      <date key="time:timestamp" value="2024-01-15T10:30:00.000Z"/>
      <string key="org:resource" value="manager.jones"/>
    </event>
  </trace>
</log>
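To show how the trace and event elements relate to the Case ID, Activity, and Timestamp columns, the sketch below flattens an XES document into one record per event using Python's standard `xml.etree.ElementTree`. The function name `xes_to_events` and the output key `case_id` are chosen for this illustration only. The sketch reads flat attributes and ignores nested or list-valued ones.

```python
import xml.etree.ElementTree as ET

XES_NS = {"xes": "http://www.xes-standard.org/"}

def xes_to_events(xml_text):
    """Return one dict per <event>, tagged with its trace's concept:name."""
    root = ET.fromstring(xml_text)
    events = []
    for trace in root.findall("xes:trace", XES_NS):
        case_id = None
        # Direct <string> children of the trace are case-level attributes.
        for attr in trace.findall("xes:string", XES_NS):
            if attr.get("key") == "concept:name":
                case_id = attr.get("value")
        for event in trace.findall("xes:event", XES_NS):
            record = {"case_id": case_id}
            for attr in event:
                record[attr.get("key")] = attr.get("value")
            events.append(record)
    return events

sample_xes = """
<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="PO-001"/>
    <event>
      <string key="concept:name" value="Create Order"/>
      <date key="time:timestamp" value="2024-01-15T09:00:00.000Z"/>
    </event>
    <event>
      <string key="concept:name" value="Approve Order"/>
      <date key="time:timestamp" value="2024-01-15T10:30:00.000Z"/>
    </event>
  </trace>
</log>
"""
xes_events = xes_to_events(sample_xes)
```

Because the log declares a default namespace, every element lookup has to be namespace-qualified; omitting the `xes:` prefix would match nothing.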
JSON (JavaScript Object Notation)
Structured JSON format for complex event data with nested attributes and flexible schema.
JSON Schema Options
Array of Events
Simple flat structure with event objects.
[
  {
    "caseId": "PO-001",
    "activity": "Create Order",
    "timestamp": "2024-01-15T09:00:00Z",
    "resource": "buyer.smith"
  }
]
Nested Structure
Hierarchical data with case and event nesting.
{
  "cases": [
    {
      "caseId": "PO-001",
      "events": [
        {
          "activity": "Create Order",
          "timestamp": "2024-01-15T09:00:00Z"
        }
      ]
    }
  ]
}
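The two layouts carry the same information: the nested structure just factors the case identifier out of each event. A small Python sketch of the conversion from nested to flat, assuming the key names `cases`, `caseId`, and `events` shown in the sample (the function name `flatten_cases` is hypothetical):

```python
import json

def flatten_cases(document):
    """Turn {"cases": [{"caseId", "events": [...]}]} into a flat event list."""
    flat = []
    for case in document["cases"]:
        for event in case["events"]:
            # Copy the case identifier onto every event of that case.
            flat.append({"caseId": case["caseId"], **event})
    return flat

nested = json.loads("""
{
  "cases": [
    {
      "caseId": "PO-001",
      "events": [
        {"activity": "Create Order", "timestamp": "2024-01-15T09:00:00Z"},
        {"activity": "Approve Order", "timestamp": "2024-01-15T10:30:00Z"}
      ]
    }
  ]
}
""")
flat_events = flatten_cases(nested)
```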
JSON Mapping Configuration
{
  "schema": "flat",
  "mapping": [
    {
      "jsonPath": "$.caseId",
      "targetColumn": "CaseID",
      "dataType": "string"
    },
    {
      "jsonPath": "$.activity",
      "targetColumn": "Activity",
      "dataType": "string"
    },
    {
      "jsonPath": "$.timestamp",
      "targetColumn": "Timestamp",
      "dataType": "datetime"
    }
  ]
}
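Each `jsonPath` in the configuration above selects one value from an event object. As an illustration, the sketch below resolves simple dotted paths such as `$.caseId` or `$.order.id` against a single event. It deliberately handles only that subset (no wildcards, filters, or array indices), and the helper names `resolve_path` and `map_event` are assumptions made for this example.

```python
def resolve_path(obj, json_path):
    """Resolve a dotted path like '$.a.b' against nested dicts."""
    if not json_path.startswith("$."):
        raise ValueError(f"unsupported path: {json_path}")
    current = obj
    for key in json_path[2:].split("."):
        current = current[key]
    return current

def map_event(event, mapping):
    """Build {targetColumn: value} for one event object."""
    return {entry["targetColumn"]: resolve_path(event, entry["jsonPath"])
            for entry in mapping}

JSON_MAPPING = [
    {"jsonPath": "$.caseId", "targetColumn": "CaseID", "dataType": "string"},
    {"jsonPath": "$.activity", "targetColumn": "Activity", "dataType": "string"},
    {"jsonPath": "$.timestamp", "targetColumn": "Timestamp", "dataType": "datetime"},
]
event = {"caseId": "PO-001", "activity": "Create Order",
         "timestamp": "2024-01-15T09:00:00Z", "resource": "buyer.smith"}
mapped_event = map_event(event, JSON_MAPPING)
```

The sketch leaves values as strings; type conversion according to `dataType` would be a separate step.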
Data Type Requirements
The following data types and validation rules apply to dataset columns:
String Fields
Text data with length and character validation.
- UTF-8 encoding required
- Maximum length: 1000 characters
- Special character handling
- Null value support
DateTime Fields
Timestamp data with timezone support.
- ISO 8601 format preferred
- Custom format support
- Timezone conversion
- Precision to milliseconds
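As an illustration of timezone conversion and millisecond precision, the following Python sketch normalizes an ISO 8601 timestamp with any offset to UTC and renders it with three fractional digits. Rejecting timestamps without an offset and truncating (rather than rounding) below the millisecond are choices made for this sketch, not documented importer behavior.

```python
from datetime import datetime, timezone

def normalize_timestamp(value):
    """Convert an ISO 8601 timestamp to UTC with millisecond precision."""
    if value.endswith("Z"):
        # Older Python versions do not accept "Z" in fromisoformat.
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("timestamp has no timezone offset")
    utc = parsed.astimezone(timezone.utc)
    # timespec="milliseconds" truncates any finer fractional part.
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")

normalized = normalize_timestamp("2024-01-15T10:30:00+01:00")
```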
Numeric Fields
Integer and decimal number handling.
- 64-bit integer support
- Double precision decimals
- Scientific notation
- Currency formatting
Boolean Fields
True/false value interpretation.
- true/false (case insensitive)
- 1/0 numeric values
- yes/no text values
- Null handling options
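The boolean rules listed above can be sketched as a single parsing function. This version treats all three spellings case-insensitively and lets the caller decide what an empty or missing value becomes; both points are assumptions, since the list only states case insensitivity for true/false and does not enumerate the null handling options.

```python
TRUE_VALUES = {"true", "1", "yes"}
FALSE_VALUES = {"false", "0", "no"}

def parse_boolean(value, null_as=None):
    """Interpret true/false, 1/0, and yes/no; return null_as for empty input."""
    if value is None:
        return null_as
    text = str(value).strip().lower()
    if text == "":
        return null_as
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"not a recognized boolean: {value!r}")
```

For example, `parse_boolean("TRUE")` and `parse_boolean(1)` both yield `True`, while `parse_boolean("", null_as=False)` maps a missing value to `False`.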
Format Validation and Errors
Common validation rules and error handling for different file formats:
Required Columns
Every process mining dataset must include these essential columns:
- Case ID: Unique identifier for each process instance
- Activity: Name or description of the process step
- Timestamp: When the activity occurred (with timezone)
Common Validation Errors
| Error Type | Description | Resolution |
|---|---|---|
| Missing Required Column | CaseID, Activity, or Timestamp not found | Add missing column or update mapping |
| Invalid Date Format | Timestamp not in recognized format | Specify custom date format pattern |
| Empty Case ID | Null or empty values in Case ID column | Clean data or use row filtering |
| Duplicate Headers | Multiple columns with same name | Rename columns or use column indices |
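The four error types in the table can be checked locally before upload. The sketch below is one possible pre-flight check in Python: it takes a header list and data rows and returns `(error type, detail)` pairs. The function name `validate_events`, the 1-based data row numbering, and the use of ISO 8601 as the only recognized date format are assumptions for this example.

```python
from collections import Counter
from datetime import datetime

REQUIRED_COLUMNS = ("CaseID", "Activity", "Timestamp")

def validate_events(header, rows):
    """Return a list of (error_type, detail) tuples; empty means valid."""
    errors = []
    for name, count in Counter(header).items():
        if count > 1:
            errors.append(("Duplicate Headers", name))
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    for name in missing:
        errors.append(("Missing Required Column", name))
    if missing:
        return errors  # row checks need all required columns
    case_idx = header.index("CaseID")
    time_idx = header.index("Timestamp")
    for row_number, row in enumerate(rows, start=1):
        if not row[case_idx].strip():
            errors.append(("Empty Case ID", row_number))
        value = row[time_idx]
        if value.endswith("Z"):
            value = value[:-1] + "+00:00"
        try:
            datetime.fromisoformat(value)
        except ValueError:
            errors.append(("Invalid Date Format", row_number))
    return errors

header = ["CaseID", "Activity", "Timestamp"]
rows = [
    ["PO-001", "Create Order", "2024-01-15T09:00:00Z"],
    ["", "Approve Order", "15/01/2024"],
]
found = validate_events(header, rows)
```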
Best Practices
- Data Quality: Validate data before import using built-in validation options
- Performance: Use streaming uploads for files larger than 100 MB
- Encoding: Always specify UTF-8 encoding for international character support
- Timestamps: Include timezone information in all timestamp data
- Testing: Use small sample files to test column mappings before full import
- Documentation: Document custom formats and mappings for future reference