Dataset Updates

Update Existing Datasets

Update existing datasets with new data from CSV files, ZIP packages, or binary files. Updates preserve the dataset ID and associated configurations.

Update Dataset from CSV

PUT /api/{tenantId}/{projectId}/dataset/{datasetId}/csv

Replaces the data in an existing dataset with new data from a CSV file. The system automatically detects column mappings from the dataset configuration.

Path Parameters

Parameter   Type   Required   Description
tenantId    GUID   Yes        The tenant identifier
projectId   GUID   Yes        The project identifier
datasetId   GUID   Yes        The dataset identifier to update

Request (multipart/form-data)

Field         Type     Required   Description
file          file     Yes        CSV file with new data (max 1GB)
cultureInfo   string   No         Culture for parsing (default: "en-US")
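Because uploads are capped at 1GB, it can save a failed round trip to check the file size locally first. The following is a minimal sketch, not part of the API: `check_upload_size` and `MAX_UPLOAD_BYTES` are hypothetical names, and reading "1GB" as 10**9 bytes is a conservative assumption, since the server's exact cutoff is not stated here.

```python
import os

# Assumption: "1GB" is read conservatively as 10**9 bytes.
# The server's exact cutoff is not documented in this section.
MAX_UPLOAD_BYTES = 10**9


def check_upload_size(file_path, limit=MAX_UPLOAD_BYTES):
    """Return the file size in bytes, or raise ValueError if it exceeds the limit."""
    size = os.path.getsize(file_path)
    if size > limit:
        raise ValueError(f'{file_path} is {size} bytes; the limit is {limit} bytes')
    return size
```

Calling `check_upload_size('updated_event_log.csv')` before the PUT request raises early for oversized files instead of waiting for the server to reject them.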

Response (200 OK)

{
  "datasetId": "550e8400-e29b-41d4-a716-446655440000",
  "caseCount": 5500,
  "eventCount": 165000,
  "invalidValueCount": 0,
  "skippedRowsCount": 0,
  "errors": [],
  "rowIssues": [],
  "statusCode": 200
}

Error Responses

Bad Request (400):

  • dataset with id '{datasetId}' not found
  • can't update '{datasetName}' because it's not an original dataset
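Client code may want to tell these two 400 cases apart. The sketch below maps the message text to a distinct exception type. It assumes the messages above appear verbatim somewhere in the response body, which this section does not confirm; the exception classes and `error_for_bad_request` are hypothetical names.

```python
class DatasetNotFoundError(Exception):
    """Hypothetical: the dataset ID does not exist."""


class NotOriginalDatasetError(Exception):
    """Hypothetical: the dataset is derived and cannot be updated directly."""


def error_for_bad_request(body_text):
    """Map a 400 response body to an exception instance.

    Assumption: the documented messages appear as text in the body.
    """
    text = body_text.lower()
    if 'not an original dataset' in text:
        return NotOriginalDatasetError(body_text)
    if 'not found' in text:
        return DatasetNotFoundError(body_text)
    # Unrecognized message: fall back to a generic error.
    return ValueError(body_text)
```

A caller could then write `raise error_for_bad_request(response.text)` when `response.status_code == 400`.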

Update Dataset from ZIP Package

PUT /api/{tenantId}/{projectId}/dataset/{datasetId}/package

Replaces the data in an existing dataset with new data from a ZIP package. The path parameters are the same as for the CSV update.

Request (multipart/form-data)

Field         Type     Required   Description
file          file     Yes        ZIP package file with new data (max 1GB)
cultureInfo   string   No         Culture for parsing (default: "en-US")

Response (200 OK)

Same structure as CSV update response.

Error Response (422 Unprocessable Entity)

{
  "errors": ["Invalid package structure"],
  "rowIssues": [
    {
      "rowIndex": 15,
      "columnName": "Timestamp",
      "errorType": "ParseError",
      "outcome": "Skipped",
      "message": "Unable to parse date value"
    }
  ],
  "statusCode": 422
}
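A 422 body like the one above can be flattened into readable lines for logs or user-facing messages. This is a minimal sketch that assumes the payload has the shape shown in the example; `describe_validation_failure` is a hypothetical helper name.

```python
def describe_validation_failure(payload):
    """Turn a 422 response payload into a list of human-readable lines."""
    # Top-level errors come first.
    lines = [f'error: {message}' for message in payload.get('errors', [])]
    # Then one line per row-level issue.
    for issue in payload.get('rowIssues', []):
        lines.append(
            f"row {issue.get('rowIndex')}, column {issue.get('columnName')}: "
            f"{issue.get('message')} "
            f"({issue.get('errorType')}, {issue.get('outcome')})"
        )
    return lines
```

Using `.get()` throughout keeps the helper from raising if a field is absent from a particular response.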

Update Dataset from Binary

PUT /api/{tenantId}/{projectId}/dataset/{datasetId}/binary

Replaces the data in an existing dataset with new data from a binary format file. The path parameters are the same as for the CSV update.

Request (multipart/form-data)

Field   Type   Required   Description
file    file   Yes        Binary file with new data (max 1GB)

Response (200 OK)

Same structure as CSV update response.

Update Restrictions

  • Original Datasets Only: Only original datasets can be updated. Datasets derived from filters or other transformations cannot be updated directly.
  • Preserve Configuration: Updates preserve the dataset ID and all associated configurations (notebooks, blocks, etc.).
  • Column Consistency: The new data should have the same column structure as the original dataset.
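The column consistency restriction can be checked locally before uploading a CSV. The sketch below compares the header row of the new file against a list of expected column names. It assumes you keep a copy of the original CSV (or its column list) yourself and that the file uses a comma delimiter; how the server treats column order or extra columns is not described here. `read_header` and `compare_columns` are hypothetical helpers.

```python
import csv


def read_header(csv_path, delimiter=','):
    """Return the first row of a CSV file as a list of column names."""
    # utf-8-sig strips a byte-order mark if the file was exported with one.
    with open(csv_path, newline='', encoding='utf-8-sig') as f:
        return next(csv.reader(f, delimiter=delimiter), [])


def compare_columns(expected, actual):
    """Report columns missing from, or unexpected in, the new data."""
    expected_set, actual_set = set(expected), set(actual)
    return {
        'missing': [c for c in expected if c not in actual_set],
        'unexpected': [c for c in actual if c not in expected_set],
    }
```

For example, `compare_columns(read_header('original.csv'), read_header('updated_event_log.csv'))` returns two empty lists when both files share the same column names (`original.csv` is an illustrative file name).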

Implementation Examples

cURL - Update from CSV

curl -X PUT "https://your-mindzie-instance.com/api/12345678-1234-1234-1234-123456789012/87654321-4321-4321-4321-210987654321/dataset/550e8400-e29b-41d4-a716-446655440000/csv" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -F "file=@updated_event_log.csv" \
  -F "cultureInfo=en-US"

cURL - Update from ZIP Package

curl -X PUT "https://your-mindzie-instance.com/api/12345678-1234-1234-1234-123456789012/87654321-4321-4321-4321-210987654321/dataset/550e8400-e29b-41d4-a716-446655440000/package" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -F "file=@updated_data_package.zip" \
  -F "cultureInfo=en-US"

Python

import requests

TENANT_ID = '12345678-1234-1234-1234-123456789012'
PROJECT_ID = '87654321-4321-4321-4321-210987654321'
BASE_URL = 'https://your-mindzie-instance.com'

class DatasetUpdater:
    def __init__(self, token):
        self.headers = {'Authorization': f'Bearer {token}'}

    def update_from_csv(self, dataset_id, file_path, culture='en-US'):
        """Update dataset from CSV file."""
        url = f'{BASE_URL}/api/{TENANT_ID}/{PROJECT_ID}/dataset/{dataset_id}/csv'

        with open(file_path, 'rb') as f:
            files = {'file': (file_path, f, 'text/csv')}
            data = {'cultureInfo': culture}

            response = requests.put(url, headers=self.headers, files=files, data=data)

        if response.ok:
            return response.json()
        else:
            raise Exception(f'Update failed: {response.text}')

    def update_from_package(self, dataset_id, file_path, culture='en-US'):
        """Update dataset from ZIP package."""
        url = f'{BASE_URL}/api/{TENANT_ID}/{PROJECT_ID}/dataset/{dataset_id}/package'

        with open(file_path, 'rb') as f:
            files = {'file': (file_path, f, 'application/zip')}
            data = {'cultureInfo': culture}

            response = requests.put(url, headers=self.headers, files=files, data=data)

        if response.ok:
            return response.json()
        elif response.status_code == 422:
            result = response.json()
            print(f"Validation errors: {result['errors']}")
            for issue in result.get('rowIssues', []):
                print(f"  Row {issue['rowIndex']}: {issue['message']}")
            raise Exception('Data validation failed')
        else:
            raise Exception(f'Update failed: {response.text}')

    def update_from_binary(self, dataset_id, file_path):
        """Update dataset from binary file."""
        url = f'{BASE_URL}/api/{TENANT_ID}/{PROJECT_ID}/dataset/{dataset_id}/binary'

        with open(file_path, 'rb') as f:
            files = {'file': (file_path, f, 'application/octet-stream')}

            response = requests.put(url, headers=self.headers, files=files)

        if response.ok:
            return response.json()
        else:
            raise Exception(f'Update failed: {response.text}')

# Usage
updater = DatasetUpdater('your-auth-token')

# Update from CSV
result = updater.update_from_csv(
    '550e8400-e29b-41d4-a716-446655440000',
    'updated_event_log.csv'
)
print(f"Updated dataset: {result['datasetId']}")
print(f"New case count: {result['caseCount']}")
print(f"New event count: {result['eventCount']}")

# Check for issues
if result['skippedRowsCount'] > 0:
    print(f"Warning: {result['skippedRowsCount']} rows were skipped")
if result['invalidValueCount'] > 0:
    print(f"Warning: {result['invalidValueCount']} invalid values found")

JavaScript/Node.js

const TENANT_ID = '12345678-1234-1234-1234-123456789012';
const PROJECT_ID = '87654321-4321-4321-4321-210987654321';
const BASE_URL = 'https://your-mindzie-instance.com';

class DatasetUpdater {
  constructor(token) {
    this.token = token;
  }

  async updateFromCsv(datasetId, file, culture = 'en-US') {
    const url = `${BASE_URL}/api/${TENANT_ID}/${PROJECT_ID}/dataset/${datasetId}/csv`;

    const formData = new FormData();
    formData.append('file', file);
    formData.append('cultureInfo', culture);

    const response = await fetch(url, {
      method: 'PUT',
      headers: { 'Authorization': `Bearer ${this.token}` },
      body: formData
    });

    if (response.ok) {
      return await response.json();
    } else if (response.status === 422) {
      const result = await response.json();
      throw new Error(`Validation failed: ${result.errors.join(', ')}`);
    } else {
      throw new Error(`Update failed: ${await response.text()}`);
    }
  }

  async updateFromPackage(datasetId, file, culture = 'en-US') {
    const url = `${BASE_URL}/api/${TENANT_ID}/${PROJECT_ID}/dataset/${datasetId}/package`;

    const formData = new FormData();
    formData.append('file', file);
    formData.append('cultureInfo', culture);

    const response = await fetch(url, {
      method: 'PUT',
      headers: { 'Authorization': `Bearer ${this.token}` },
      body: formData
    });

    if (response.ok) {
      return await response.json();
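    } else {
      // Error bodies may not be JSON, so read the text first and parse defensively.
      const text = await response.text();
      let message = text || response.statusText;
      try {
        const error = JSON.parse(text);
        if (Array.isArray(error.errors) && error.errors.length > 0) {
          message = error.errors.join(', ');
        }
      } catch {
        // Not JSON; keep the raw text.
      }
      throw new Error(`Update failed: ${message}`);
    }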
  }
}

// Usage (browser)
const updater = new DatasetUpdater('your-auth-token');
const fileInput = document.getElementById('updateFile');

fileInput.addEventListener('change', async (e) => {
  const file = e.target.files[0];
  const datasetId = '550e8400-e29b-41d4-a716-446655440000';

  try {
    const result = await updater.updateFromCsv(datasetId, file);

    console.log(`Updated: ${result.datasetId}`);
    console.log(`New cases: ${result.caseCount}`);
    console.log(`New events: ${result.eventCount}`);

    if (result.skippedRowsCount > 0) {
      console.warn(`Skipped ${result.skippedRowsCount} rows`);
    }
  } catch (error) {
    console.error('Update failed:', error.message);
  }
});

Response Fields

Field               Type      Description
datasetId           GUID      ID of the updated dataset
caseCount           integer   Number of unique cases in updated data
eventCount          integer   Total number of events in updated data
invalidValueCount   integer   Number of invalid values encountered
skippedRowsCount    integer   Number of rows skipped due to errors
errors              array     List of error messages
rowIssues           array     Detailed information about row-level issues
statusCode          integer   HTTP status code

Row Issue Structure

{
  "rowIndex": 15,
  "columnName": "Timestamp",
  "errorType": "ParseError",
  "outcome": "Skipped",
  "message": "Unable to parse date value '2024-13-45'"
}

Field        Description
rowIndex     Row number with the issue
columnName   Column containing the problematic value
errorType    Type of error (ParseError, ValidationError, etc.)
outcome      What happened (Skipped, DefaultValue, etc.)
message      Human-readable error description
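When an update reports many row issues, grouping them by column, error type, and outcome makes patterns easier to spot than reading them one by one. The following sketch assumes each entry in `rowIssues` follows the structure above; `count_row_issues` is a hypothetical helper.

```python
from collections import Counter


def count_row_issues(row_issues):
    """Count row issues per (columnName, errorType, outcome) combination."""
    return Counter(
        (issue.get('columnName'), issue.get('errorType'), issue.get('outcome'))
        for issue in row_issues
    )
```

Calling `count_row_issues(result['rowIssues']).most_common()` lists the combinations from most to least frequent, which helps decide whether a problem is a one-off bad value or a systematic format mismatch in one column.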

Best Practices

  • Backup First: Consider exporting current data before updates
  • Verify Structure: Ensure new data has the same column structure
  • Check Results: Review rowIssues and skippedRowsCount after updates
  • Test First: Test updates on a non-production dataset first
  • Culture Settings: Use the correct culture for date and number formats