Dataset Updates
Update Existing Datasets
Update existing datasets with new data from CSV files, ZIP packages, or binary files. Updates preserve the dataset ID and associated configurations.
Update Dataset from CSV
PUT /api/{tenantId}/{projectId}/dataset/{datasetId}/csv
Replaces the data in an existing dataset with new data from a CSV file. The system automatically detects column mappings from the dataset configuration.
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| tenantId | GUID | Yes | The tenant identifier |
| projectId | GUID | Yes | The project identifier |
| datasetId | GUID | Yes | The dataset identifier to update |
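As a minimal sketch, the three path parameters can be validated as GUIDs before they are interpolated into the endpoint path. The helper name `build_update_url` and its `kind` argument are hypothetical and not part of the API; the base URL is the placeholder used in the examples elsewhere in this document.

```python
import uuid

# Endpoint suffixes documented for dataset updates.
UPDATE_KINDS = ("csv", "package", "binary")


def build_update_url(base_url, tenant_id, project_id, dataset_id, kind="csv"):
    """Return the PUT URL for a dataset update (hypothetical helper).

    Raises ValueError if an ID string is not a well-formed GUID or if
    `kind` is not one of the documented suffixes.
    """
    if kind not in UPDATE_KINDS:
        raise ValueError(f"unknown update kind: {kind!r}")
    # uuid.UUID raises ValueError on malformed strings; str() normalizes the form.
    ids = [str(uuid.UUID(value)) for value in (tenant_id, project_id, dataset_id)]
    return f"{base_url.rstrip('/')}/api/{ids[0]}/{ids[1]}/dataset/{ids[2]}/{kind}"
```

Validating on the client side catches a mistyped ID before any file is uploaded.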
Request (multipart/form-data)
| Field | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | CSV file with new data (max 1GB) |
| cultureInfo | string | No | Culture for parsing (default: "en-US") |
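Because uploads are capped at 1GB, a client may want to check the file size locally before sending it. The sketch below assumes "1GB" means 1024**3 bytes, which this document does not confirm; `check_upload_size` and `MAX_UPLOAD_BYTES` are hypothetical names.

```python
import os

# Assumption: "1GB" is read as 1024**3 bytes; the server may count differently.
MAX_UPLOAD_BYTES = 1024 ** 3


def check_upload_size(path, limit=MAX_UPLOAD_BYTES):
    """Return the file size in bytes, or raise ValueError if it exceeds `limit`."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes; limit is {limit} bytes")
    return size
```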
Response (200 OK)
{
  "datasetId": "550e8400-e29b-41d4-a716-446655440000",
  "caseCount": 5500,
  "eventCount": 165000,
  "invalidValueCount": 0,
  "skippedRowsCount": 0,
  "errors": [],
  "rowIssues": [],
  "statusCode": 200
}
Error Responses
Bad Request (400):
- dataset with id '{datasetId}' not found
- can't update '{datasetName}' because it's not an original dataset
Update Dataset from ZIP Package
PUT /api/{tenantId}/{projectId}/dataset/{datasetId}/package
Replaces the data in an existing dataset with new data from a ZIP package.
Request (multipart/form-data)
| Field | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | ZIP package file with new data (max 1GB) |
| cultureInfo | string | No | Culture for parsing (default: "en-US") |
Response (200 OK)
Same structure as CSV update response.
Error Response (422 Unprocessable Entity)
{
  "errors": ["Invalid package structure"],
  "rowIssues": [
    {
      "rowIndex": 15,
      "columnName": "Timestamp",
      "errorType": "ParseError",
      "outcome": "Skipped",
      "message": "Unable to parse date value"
    }
  ],
  "statusCode": 422
}
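A 422 body like the one above can be flattened into readable lines for logging. This is a sketch under the assumption that the payload carries only the fields shown in this document; `describe_validation_failure` is a hypothetical helper, not part of any client library.

```python
def describe_validation_failure(payload):
    """Turn a decoded 422 response body into a list of human-readable lines."""
    # Top-level errors come first, then one line per row issue.
    lines = [f"error: {message}" for message in payload.get("errors", [])]
    for issue in payload.get("rowIssues", []):
        lines.append(
            f"row {issue.get('rowIndex')}, column {issue.get('columnName')}: "
            f"{issue.get('message')} "
            f"({issue.get('errorType')}, {issue.get('outcome')})"
        )
    return lines
```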
Update Dataset from Binary
PUT /api/{tenantId}/{projectId}/dataset/{datasetId}/binary
Replaces the data in an existing dataset with new data from a binary format file.
Request (multipart/form-data)
| Field | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Binary file with new data (max 1GB) |
Response (200 OK)
Same structure as CSV update response.
Update Restrictions
- Original Datasets Only: Only original datasets can be updated. Datasets derived from filters or other transformations cannot be updated directly.
- Preserve Configuration: Updates preserve the dataset ID and all associated configurations (notebooks, blocks, etc.).
- Column Consistency: The new data should have the same column structure as the original dataset.
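The column-consistency restriction can be checked locally before uploading a CSV. The sketch below compares the header row of the new file against a list of expected column names supplied by the caller. This document does not describe how to retrieve the existing columns, nor whether the server matches names case-sensitively, so both are assumptions; `compare_columns` is a hypothetical helper.

```python
import csv
import io


def compare_columns(csv_text, expected_columns):
    """Compare a CSV header row with the expected column names.

    Returns (missing, unexpected): columns expected but absent from the
    header, and columns in the header that were not expected.
    Matching is exact (case-sensitive) after trimming whitespace.
    """
    # An empty input yields an empty header rather than raising StopIteration.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    header = [name.strip() for name in header]
    missing = [name for name in expected_columns if name not in header]
    unexpected = [name for name in header if name not in expected_columns]
    return missing, unexpected
```

If both lists come back empty, the header matches the expected structure; otherwise the upload is likely worth fixing first.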
Implementation Examples
cURL - Update from CSV
curl -X PUT "https://your-mindzie-instance.com/api/12345678-1234-1234-1234-123456789012/87654321-4321-4321-4321-210987654321/dataset/550e8400-e29b-41d4-a716-446655440000/csv" \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-F "file=@updated_event_log.csv" \
-F "cultureInfo=en-US"
cURL - Update from ZIP Package
curl -X PUT "https://your-mindzie-instance.com/api/12345678-1234-1234-1234-123456789012/87654321-4321-4321-4321-210987654321/dataset/550e8400-e29b-41d4-a716-446655440000/package" \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-F "file=@updated_data_package.zip" \
-F "cultureInfo=en-US"
Python
import requests

TENANT_ID = '12345678-1234-1234-1234-123456789012'
PROJECT_ID = '87654321-4321-4321-4321-210987654321'
BASE_URL = 'https://your-mindzie-instance.com'


class DatasetUpdater:
    def __init__(self, token):
        self.headers = {'Authorization': f'Bearer {token}'}

    def update_from_csv(self, dataset_id, file_path, culture='en-US'):
        """Update dataset from CSV file."""
        url = f'{BASE_URL}/api/{TENANT_ID}/{PROJECT_ID}/dataset/{dataset_id}/csv'
        # Keep the file open for the duration of the upload.
        with open(file_path, 'rb') as f:
            files = {'file': (file_path, f, 'text/csv')}
            data = {'cultureInfo': culture}
            response = requests.put(url, headers=self.headers, files=files, data=data)
        if response.ok:
            return response.json()
        else:
            raise Exception(f'Update failed: {response.text}')

    def update_from_package(self, dataset_id, file_path, culture='en-US'):
        """Update dataset from ZIP package."""
        url = f'{BASE_URL}/api/{TENANT_ID}/{PROJECT_ID}/dataset/{dataset_id}/package'
        with open(file_path, 'rb') as f:
            files = {'file': (file_path, f, 'application/zip')}
            data = {'cultureInfo': culture}
            response = requests.put(url, headers=self.headers, files=files, data=data)
        if response.ok:
            return response.json()
        elif response.status_code == 422:
            # Report validation details before raising.
            result = response.json()
            print(f"Validation errors: {result['errors']}")
            for issue in result.get('rowIssues', []):
                print(f"  Row {issue['rowIndex']}: {issue['message']}")
            raise Exception('Data validation failed')
        else:
            raise Exception(f'Update failed: {response.text}')

    def update_from_binary(self, dataset_id, file_path):
        """Update dataset from binary file."""
        url = f'{BASE_URL}/api/{TENANT_ID}/{PROJECT_ID}/dataset/{dataset_id}/binary'
        with open(file_path, 'rb') as f:
            files = {'file': (file_path, f, 'application/octet-stream')}
            response = requests.put(url, headers=self.headers, files=files)
        if response.ok:
            return response.json()
        else:
            raise Exception(f'Update failed: {response.text}')

# Usage
updater = DatasetUpdater('your-auth-token')

# Update from CSV
result = updater.update_from_csv(
    '550e8400-e29b-41d4-a716-446655440000',
    'updated_event_log.csv'
)
print(f"Updated dataset: {result['datasetId']}")
print(f"New case count: {result['caseCount']}")
print(f"New event count: {result['eventCount']}")

# Check for issues
if result['skippedRowsCount'] > 0:
    print(f"Warning: {result['skippedRowsCount']} rows were skipped")
if result['invalidValueCount'] > 0:
    print(f"Warning: {result['invalidValueCount']} invalid values found")
JavaScript/Node.js
const TENANT_ID = '12345678-1234-1234-1234-123456789012';
const PROJECT_ID = '87654321-4321-4321-4321-210987654321';
const BASE_URL = 'https://your-mindzie-instance.com';

class DatasetUpdater {
  constructor(token) {
    this.token = token;
  }

  async updateFromCsv(datasetId, file, culture = 'en-US') {
    const url = `${BASE_URL}/api/${TENANT_ID}/${PROJECT_ID}/dataset/${datasetId}/csv`;
    const formData = new FormData();
    formData.append('file', file);
    formData.append('cultureInfo', culture);
    const response = await fetch(url, {
      method: 'PUT',
      headers: { 'Authorization': `Bearer ${this.token}` },
      body: formData
    });
    if (response.ok) {
      return await response.json();
    } else if (response.status === 422) {
      const result = await response.json();
      throw new Error(`Validation failed: ${result.errors.join(', ')}`);
    } else {
      throw new Error(`Update failed: ${await response.text()}`);
    }
  }

  async updateFromPackage(datasetId, file, culture = 'en-US') {
    const url = `${BASE_URL}/api/${TENANT_ID}/${PROJECT_ID}/dataset/${datasetId}/package`;
    const formData = new FormData();
    formData.append('file', file);
    formData.append('cultureInfo', culture);
    const response = await fetch(url, {
      method: 'PUT',
      headers: { 'Authorization': `Bearer ${this.token}` },
      body: formData
    });
    if (response.ok) {
      return await response.json();
    } else {
      const error = await response.json();
      throw new Error(`Update failed: ${error.errors?.join(', ') || response.statusText}`);
    }
  }
}

// Usage (browser)
const updater = new DatasetUpdater('your-auth-token');
const fileInput = document.getElementById('updateFile');
fileInput.addEventListener('change', async (e) => {
  const file = e.target.files[0];
  const datasetId = '550e8400-e29b-41d4-a716-446655440000';
  try {
    const result = await updater.updateFromCsv(datasetId, file);
    console.log(`Updated: ${result.datasetId}`);
    console.log(`New cases: ${result.caseCount}`);
    console.log(`New events: ${result.eventCount}`);
    if (result.skippedRowsCount > 0) {
      console.warn(`Skipped ${result.skippedRowsCount} rows`);
    }
  } catch (error) {
    console.error('Update failed:', error.message);
  }
});
Response Fields
| Field | Type | Description |
|---|---|---|
| datasetId | GUID | ID of the updated dataset |
| caseCount | integer | Number of unique cases in updated data |
| eventCount | integer | Total number of events in updated data |
| invalidValueCount | integer | Number of invalid values encountered |
| skippedRowsCount | integer | Number of rows skipped due to errors |
| errors | array | List of error messages |
| rowIssues | array | Detailed information about row-level issues |
| statusCode | integer | HTTP status code |
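The fields in the table above map naturally onto a small typed container on the client side. The following is a sketch only: `UpdateResult`, `from_json`, and `has_warnings` are hypothetical names, and treating the count and array fields as optional with defaults is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateResult:
    """Typed view of the update response (hypothetical client-side class)."""
    dataset_id: str
    case_count: int
    event_count: int
    invalid_value_count: int = 0
    skipped_rows_count: int = 0
    errors: list = field(default_factory=list)
    row_issues: list = field(default_factory=list)
    status_code: int = 200

    @classmethod
    def from_json(cls, payload):
        """Build an UpdateResult from the decoded JSON body."""
        return cls(
            dataset_id=payload["datasetId"],
            case_count=payload["caseCount"],
            event_count=payload["eventCount"],
            invalid_value_count=payload.get("invalidValueCount", 0),
            skipped_rows_count=payload.get("skippedRowsCount", 0),
            errors=list(payload.get("errors", [])),
            row_issues=list(payload.get("rowIssues", [])),
            status_code=payload.get("statusCode", 200),
        )

    @property
    def has_warnings(self):
        """True if rows were skipped, values were invalid, or issues were listed."""
        return bool(
            self.skipped_rows_count or self.invalid_value_count or self.row_issues
        )
```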
Row Issue Structure
{
  "rowIndex": 15,
  "columnName": "Timestamp",
  "errorType": "ParseError",
  "outcome": "Skipped",
  "message": "Unable to parse date value '2024-13-45'"
}
| Field | Description |
|---|---|
| rowIndex | Row number with the issue |
| columnName | Column containing the problematic value |
| errorType | Type of error (ParseError, ValidationError, etc.) |
| outcome | What happened (Skipped, DefaultValue, etc.) |
| message | Human-readable error description |
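When an update returns many row issues, counting them by error type and outcome gives a quick overview. This is a minimal sketch that assumes each issue is a dictionary with the fields listed above; `tally_row_issues` is a hypothetical helper.

```python
from collections import Counter


def tally_row_issues(row_issues):
    """Count row issues per (errorType, outcome) pair."""
    return Counter(
        (issue.get("errorType"), issue.get("outcome")) for issue in row_issues
    )
```

Calling `tally_row_issues(result["rowIssues"]).most_common()` lists the most frequent combinations first.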
Best Practices
- Backup First: Consider exporting current data before updates
- Verify Structure: Ensure new data has the same column structure
- Check Results: Review rowIssues and skippedRowsCount after updates
- Test First: Test updates on a non-production dataset first
- Culture Settings: Use the correct culture for date and number formats