Dataset Creation
Create New Datasets
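Create datasets from CSV files, ZIP packages, or binary files. CSV and binary uploads require column mappings (case ID, activity name, and timestamp) for process mining analysis; ZIP package uploads take only a dataset name and an optional culture setting.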
Connectivity Testing
Unauthorized Ping
GET /api/{tenantId}/{projectId}/dataset/unauthorized-ping
Test endpoint that does not require authentication.
Response
Ping Successful
Authenticated Ping
GET /api/{tenantId}/{projectId}/dataset/ping
Authenticated ping endpoint to verify API access.
Response (200 OK)
Ping Successful (tenant id: {tenantId})
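A connectivity check would typically call the unauthenticated endpoint first and the authenticated one second. The sketch below only builds the two URLs from the paths documented above; the helper name `ping_url` and the placeholder host are illustrative assumptions, and sending the request is left to whichever HTTP client is in use.

```python
def ping_url(base_url: str, tenant_id: str, project_id: str,
             authenticated: bool = True) -> str:
    """Build the ping URL for a tenant/project (hypothetical helper)."""
    endpoint = "ping" if authenticated else "unauthorized-ping"
    return f"{base_url.rstrip('/')}/api/{tenant_id}/{project_id}/dataset/{endpoint}"


if __name__ == "__main__":
    base = "https://your-mindzie-instance.com"  # placeholder host
    print(ping_url(base, "TENANT", "PROJECT", authenticated=False))
    print(ping_url(base, "TENANT", "PROJECT"))
```

If the unauthenticated URL responds but the authenticated one does not, the problem is more likely the token than the network path.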
List All Datasets
GET /api/{tenantId}/{projectId}/dataset
Retrieves all datasets within the specified project.
Response (200 OK)
{
  "datasets": [
    {
      "datasetId": "550e8400-e29b-41d4-a716-446655440000",
      "datasetName": "Purchase Order Process",
      "datasetDescription": "Event log from SAP",
      "projectId": "660e8400-e29b-41d4-a716-446655440000",
      "caseIdColumnName": "CaseID",
      "activityColumnName": "Activity",
      "timeColumnName": "Timestamp",
      "resourceColumnName": "Resource",
      "beginTimeColumnName": null,
      "expectedOrderColumnName": null,
      "useDateOnlySorting": false,
      "useOnlyEventColumns": false,
      "dateCreated": "2024-01-15T10:30:00Z",
      "dateModified": "2024-01-15T14:45:00Z",
      "createdBy": "user@example.com",
      "modifiedBy": "user@example.com"
    }
  ]
}
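A common follow-up to listing datasets is looking one up by name and reading its column mappings. The sketch below works on an already-decoded response shaped like the example above; the helper names `find_dataset_id` and `column_mapping` are hypothetical, and exact, case-sensitive name matching is an assumption.

```python
from typing import Optional

# Mapping keys as they appear in the list response
MAPPING_KEYS = [
    "caseIdColumnName", "activityColumnName", "timeColumnName",
    "resourceColumnName", "beginTimeColumnName", "expectedOrderColumnName",
]


def find_dataset_id(listing: dict, name: str) -> Optional[str]:
    """Return the datasetId of the first dataset whose name matches exactly."""
    for ds in listing.get("datasets", []):
        if ds.get("datasetName") == name:
            return ds.get("datasetId")
    return None


def column_mapping(ds: dict) -> dict:
    """Collect the column mappings of one dataset entry that are not null."""
    return {k: ds[k] for k in MAPPING_KEYS if ds.get(k) is not None}
```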
Create Dataset from CSV
POST /api/{tenantId}/{projectId}/dataset/csv
Creates a new dataset from a CSV file upload with column mappings.
Request (multipart/form-data)
| Field | Type | Required | Description |
|---|---|---|---|
| `file` | file | Yes | CSV file to upload (max 1GB) |
| `datasetName` | string | Yes | Name for the new dataset |
| `caseIdColumn` | string | Yes | Column name containing case IDs |
| `activityNameColumn` | string | Yes | Column name containing activity names |
| `activityTimeColumn` | string | Yes | Column name containing timestamps |
| `resourceColumn` | string | No | Column name containing resource/performer |
| `startTimeColumn` | string | No | Column name for activity start times |
| `cultureInfo` | string | No | Culture for parsing (default: "en-US") |
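The `cultureInfo` value matters most when dates are ambiguous. The sketch below uses Python's `datetime.strptime` with two explicit format strings to stand in for a month-first convention (as in en-US) and a day-first convention (as in en-GB). It illustrates the ambiguity only; how the server actually parses values is not shown here, and the sample timestamp is made up.

```python
from datetime import datetime

raw = "03/04/2024 08:15"  # illustrative timestamp as it might appear in a CSV

# Month-first reading (en-US style): 4 March 2024
month_first = datetime.strptime(raw, "%m/%d/%Y %H:%M")

# Day-first reading (en-GB style): 3 April 2024
day_first = datetime.strptime(raw, "%d/%m/%Y %H:%M")

print(month_first.date())  # 2024-03-04
print(day_first.date())    # 2024-04-03
```

Both readings parse without error, so a wrong culture can silently shift timestamps rather than fail the upload.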
Response (200 OK)
{
  "datasetId": "550e8400-e29b-41d4-a716-446655440000",
  "caseCount": 5200,
  "eventCount": 150000,
  "invalidValueCount": 0,
  "skippedRowsCount": 0,
  "errors": [],
  "rowIssues": [],
  "statusCode": 200
}
Error Response (422 Unprocessable Entity)
{
  "errors": ["Column 'CaseID' not found in CSV file"],
  "statusCode": 422
}
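Because both the success and the error body carry `errors` and `statusCode`, a client can check the decoded payload in one place. The sketch below is one possible approach, not part of the API: `DatasetImportError` and `check_import` are hypothetical names, and treating a non-empty `errors` list as a failure even when the status is 200 is a conservative assumption.

```python
class DatasetImportError(Exception):
    """Raised when a dataset creation response reports a failure (hypothetical)."""


def check_import(payload: dict) -> dict:
    """Return the payload unchanged if it looks successful, otherwise raise."""
    errors = payload.get("errors") or []
    status = payload.get("statusCode", 200)
    # Assumption: any 4xx/5xx status or any reported error counts as a failure
    if status >= 400 or errors:
        raise DatasetImportError(f"{status}: " + "; ".join(errors))
    return payload
```

With the 422 example above, `check_import` would raise with the message `422: Column 'CaseID' not found in CSV file`.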
Create Dataset from ZIP Package
POST /api/{tenantId}/{projectId}/dataset/package
Creates a new dataset from a ZIP package containing data files.
Request (multipart/form-data)
| Field | Type | Required | Description |
|---|---|---|---|
| `file` | file | Yes | ZIP package file (max 1GB) |
| `datasetName` | string | Yes | Name for the new dataset |
| `cultureInfo` | string | No | Culture for parsing (default: "en-US") |
Response (200 OK)
{
  "datasetId": "550e8400-e29b-41d4-a716-446655440000",
  "caseCount": 5200,
  "eventCount": 150000,
  "invalidValueCount": 0,
  "skippedRowsCount": 0,
  "errors": [],
  "rowIssues": [],
  "statusCode": 200
}
Create Dataset from Binary
POST /api/{tenantId}/{projectId}/dataset/binary
Creates a new dataset from a binary format file with column mappings.
Request (multipart/form-data)
| Field | Type | Required | Description |
|---|---|---|---|
| `file` | file | Yes | Binary file to upload (max 1GB) |
| `datasetName` | string | Yes | Name for the new dataset |
| `caseIdColumn` | string | Yes | Column name containing case IDs |
| `activityNameColumn` | string | Yes | Column name containing activity names |
| `activityTimeColumn` | string | Yes | Column name containing timestamps |
| `resourceColumn` | string | No | Column name containing resource/performer |
| `startTimeColumn` | string | No | Column name for activity start times |
Response (200 OK)
Same structure as CSV creation response.
Implementation Examples
cURL - CSV Upload
curl -X POST "https://your-mindzie-instance.com/api/12345678-1234-1234-1234-123456789012/87654321-4321-4321-4321-210987654321/dataset/csv" \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-F "file=@event_log.csv" \
-F "datasetName=Purchase Orders" \
-F "caseIdColumn=CaseID" \
-F "activityNameColumn=Activity" \
-F "activityTimeColumn=Timestamp" \
-F "resourceColumn=User" \
-F "cultureInfo=en-US"
cURL - ZIP Package Upload
curl -X POST "https://your-mindzie-instance.com/api/12345678-1234-1234-1234-123456789012/87654321-4321-4321-4321-210987654321/dataset/package" \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-F "file=@data_package.zip" \
-F "datasetName=SAP Export" \
-F "cultureInfo=en-US"
Python
import os

import requests

TENANT_ID = '12345678-1234-1234-1234-123456789012'
PROJECT_ID = '87654321-4321-4321-4321-210987654321'
BASE_URL = 'https://your-mindzie-instance.com'


class DatasetUploader:
    def __init__(self, token):
        self.headers = {'Authorization': f'Bearer {token}'}

    def create_from_csv(self, file_path, dataset_name, case_id_col, activity_col, time_col,
                        resource_col=None, start_time_col=None, culture='en-US'):
        """Create dataset from CSV file."""
        url = f'{BASE_URL}/api/{TENANT_ID}/{PROJECT_ID}/dataset/csv'
        # The file must stay open until the request has been sent
        with open(file_path, 'rb') as f:
            # Send only the file name, not the full local path
            files = {'file': (os.path.basename(file_path), f, 'text/csv')}
            data = {
                'datasetName': dataset_name,
                'caseIdColumn': case_id_col,
                'activityNameColumn': activity_col,
                'activityTimeColumn': time_col,
                'cultureInfo': culture
            }
            # Optional mappings are included only when provided
            if resource_col:
                data['resourceColumn'] = resource_col
            if start_time_col:
                data['startTimeColumn'] = start_time_col
            response = requests.post(url, headers=self.headers, files=files, data=data)
        if response.ok:
            return response.json()
        raise RuntimeError(f'Upload failed: {response.text}')

    def create_from_package(self, file_path, dataset_name, culture='en-US'):
        """Create dataset from ZIP package."""
        url = f'{BASE_URL}/api/{TENANT_ID}/{PROJECT_ID}/dataset/package'
        with open(file_path, 'rb') as f:
            files = {'file': (os.path.basename(file_path), f, 'application/zip')}
            data = {
                'datasetName': dataset_name,
                'cultureInfo': culture
            }
            response = requests.post(url, headers=self.headers, files=files, data=data)
        if response.ok:
            return response.json()
        raise RuntimeError(f'Upload failed: {response.text}')

    def list_datasets(self):
        """List all datasets in the project."""
        url = f'{BASE_URL}/api/{TENANT_ID}/{PROJECT_ID}/dataset'
        response = requests.get(url, headers=self.headers)
        # Fail on HTTP errors instead of decoding an error body as a listing
        response.raise_for_status()
        return response.json()


# Usage
uploader = DatasetUploader('your-auth-token')

# Create from CSV
result = uploader.create_from_csv(
    'event_log.csv',
    'Purchase Order Process',
    'CaseID',
    'Activity',
    'Timestamp',
    resource_col='User'
)
print(f"Created dataset: {result['datasetId']}")
print(f"Cases: {result['caseCount']}, Events: {result['eventCount']}")

# List all datasets
datasets = uploader.list_datasets()
for ds in datasets['datasets']:
    print(f"- {ds['datasetName']} ({ds['datasetId']})")
JavaScript/Node.js
const TENANT_ID = '12345678-1234-1234-1234-123456789012';
const PROJECT_ID = '87654321-4321-4321-4321-210987654321';
const BASE_URL = 'https://your-mindzie-instance.com';

class DatasetUploader {
  constructor(token) {
    this.token = token;
  }

  async createFromCsv(file, datasetName, caseIdCol, activityCol, timeCol, options = {}) {
    const url = `${BASE_URL}/api/${TENANT_ID}/${PROJECT_ID}/dataset/csv`;
    const formData = new FormData();
    formData.append('file', file);
    formData.append('datasetName', datasetName);
    formData.append('caseIdColumn', caseIdCol);
    formData.append('activityNameColumn', activityCol);
    formData.append('activityTimeColumn', timeCol);
    formData.append('cultureInfo', options.culture || 'en-US');
    if (options.resourceColumn) {
      formData.append('resourceColumn', options.resourceColumn);
    }
    if (options.startTimeColumn) {
      formData.append('startTimeColumn', options.startTimeColumn);
    }
    const response = await fetch(url, {
      method: 'POST',
      headers: { 'Authorization': `Bearer ${this.token}` },
      body: formData
    });
    if (response.ok) {
      return await response.json();
    } else {
      throw new Error(`Upload failed: ${await response.text()}`);
    }
  }

  async listDatasets() {
    const url = `${BASE_URL}/api/${TENANT_ID}/${PROJECT_ID}/dataset`;
    const response = await fetch(url, {
      headers: { 'Authorization': `Bearer ${this.token}` }
    });
    return await response.json();
  }
}

// Usage (browser)
const uploader = new DatasetUploader('your-auth-token');
const fileInput = document.getElementById('csvFile');
fileInput.addEventListener('change', async (e) => {
  const file = e.target.files[0];
  const result = await uploader.createFromCsv(
    file,
    'My Dataset',
    'CaseID',
    'Activity',
    'Timestamp',
    { resourceColumn: 'User' }
  );
  console.log(`Created: ${result.datasetId}`);
  console.log(`Cases: ${result.caseCount}, Events: ${result.eventCount}`);
});
Response Fields
| Field | Type | Description |
|---|---|---|
| `datasetId` | GUID | ID of the created dataset |
| `caseCount` | integer | Number of unique cases imported |
| `eventCount` | integer | Total number of events imported |
| `invalidValueCount` | integer | Number of invalid values encountered |
| `skippedRowsCount` | integer | Number of rows skipped due to errors |
| `errors` | array | List of error messages |
| `rowIssues` | array | Detailed information about row-level issues |
| `statusCode` | integer | HTTP status code |
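The fields in this table map naturally onto a small typed record on the client side. The sketch below is one way to do that in Python; the class name `ImportResult`, its snake_case attribute names, and the `is_clean` convenience property are assumptions for illustration. The element type of `rowIssues` is not specified here, so it is kept as a plain list.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ImportResult:
    """Client-side view of a dataset creation response (hypothetical)."""
    dataset_id: str
    case_count: int
    event_count: int
    invalid_value_count: int = 0
    skipped_rows_count: int = 0
    errors: List[str] = field(default_factory=list)
    row_issues: List[Any] = field(default_factory=list)
    status_code: int = 200

    @classmethod
    def from_dict(cls, payload: dict) -> "ImportResult":
        """Build a result from the decoded JSON body of a successful upload."""
        return cls(
            dataset_id=payload["datasetId"],
            case_count=payload["caseCount"],
            event_count=payload["eventCount"],
            invalid_value_count=payload.get("invalidValueCount", 0),
            skipped_rows_count=payload.get("skippedRowsCount", 0),
            errors=list(payload.get("errors", [])),
            row_issues=list(payload.get("rowIssues", [])),
            status_code=payload.get("statusCode", 200),
        )

    @property
    def is_clean(self) -> bool:
        """True when no invalid values, skipped rows, errors, or row issues were reported."""
        return not (self.invalid_value_count or self.skipped_rows_count
                    or self.errors or self.row_issues)
```

An import with `is_clean == False` may still have created a dataset, so the counts are worth logging before deciding whether to re-upload.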
Best Practices
- Validate Column Names: Ensure column names match exactly (case-sensitive)
- Check Culture Settings: Use appropriate culture for date/number formats
- Handle Large Files: Monitor upload progress for files approaching 1GB
- Review Row Issues: Check the `rowIssues` array for data quality problems
- Unique Dataset Names: Dataset names must be unique within a project
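The first practice can be checked locally before uploading. The sketch below reads only the header row of a CSV and reports which required columns are missing, using an exact, case-sensitive comparison. The helper name `missing_columns` is hypothetical, and a comma delimiter with the header on the first row is an assumption about the file.

```python
import csv
import io
from typing import Iterable, List, TextIO


def missing_columns(csv_stream: TextIO, required: Iterable[str]) -> List[str]:
    """Return the required column names absent from the CSV header row."""
    # An empty file yields an empty header, so every required column is missing
    header = next(csv.reader(csv_stream), [])
    present = set(header)
    return [col for col in required if col not in present]


if __name__ == "__main__":
    sample = io.StringIO("CaseID,Activity,Timestamp,User\n1001,Create PO,2024-01-15 10:30,alice\n")
    # 'activity' differs from 'Activity' only by case and is reported as missing
    print(missing_columns(sample, ["CaseID", "activity", "Timestamp"]))
```

For a file on disk, the same function can be called with `open(path, newline="", encoding="utf-8")`; the encoding is also an assumption about the data.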