Overview
The Attribute-Activity Matrix calculator provides a comprehensive cross-tabulation showing the relationship between attributes and activities in your event log. For each combination of attribute and activity, it displays the number of cases that have values, helping administrators understand data completeness patterns and identify data quality issues.
IMPORTANT: This is an administrator-only calculator designed for technical analysis and data quality assessment. The matrix it generates shows how attributes are populated across activities, which helps you understand data extraction patterns, identify missing data, and validate event log structure.
Its main users are system administrators and data quality specialists who work on troubleshooting, validation, or dataset optimization.
Common Uses
- Identify which activities populate specific attributes to understand data flow through the process
- Detect missing attribute values for specific activities that should have data
- Validate that critical attributes are populated at the expected process stages
- Diagnose data extraction issues by identifying systematic gaps in attribute population
- Understand attribute dependencies on specific activities for ETL design
- Document which activities contribute data to which attributes for technical specifications
Settings
This calculator requires no calculator-specific configuration; the examples below set only a title and description. When executed, it automatically generates a matrix of all attributes (both case-level and event-level) against all activities. Each cell value is the number of cases where that attribute has a value for that activity.
Note: For datasets with many attributes and activities, this matrix can be very large. The calculator displays the complete matrix, which may require scrolling to review all combinations.
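The counting rule described above can be illustrated with a minimal Python sketch. This is not the calculator's implementation. The event layout, the field names (case_id, activity), and the choice to treat None and empty strings as "no value" are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical event log: one dict per event. Field names are assumptions.
events = [
    {"case_id": "PO-1", "activity": "Create PO", "ApproverName": None},
    {"case_id": "PO-1", "activity": "L1 Approval", "ApproverName": "Kim"},
    {"case_id": "PO-2", "activity": "Create PO", "ApproverName": None},
    {"case_id": "PO-2", "activity": "L1 Approval", "ApproverName": "Lee"},
    # Same case repeats the activity; it must still count once.
    {"case_id": "PO-2", "activity": "L1 Approval", "ApproverName": "Lee"},
    {"case_id": "PO-3", "activity": "Create PO", "ApproverName": None},
    {"case_id": "PO-3", "activity": "L1 Approval", "ApproverName": ""},
]


def attribute_activity_matrix(events, attributes):
    """Count distinct cases with a non-empty value per (attribute, activity)."""
    cases = defaultdict(set)  # (attribute, activity) -> set of case ids
    activities = []           # activities in order of first appearance
    for event in events:
        if event["activity"] not in activities:
            activities.append(event["activity"])
        for attr in attributes:
            # Assumption: None and "" both mean "no value".
            if event.get(attr) not in (None, ""):
                cases[(attr, event["activity"])].add(event["case_id"])
    return {
        attr: {act: len(cases[(attr, act)]) for act in activities}
        for attr in attributes
    }


matrix = attribute_activity_matrix(events, ["ApproverName"])
print(matrix)
```

Collecting case ids in a set, rather than incrementing a counter per event, keeps the cell value a count of cases even when an activity repeats within a case.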
Examples
Example 1: Validating Approval Data Completeness
Scenario: You have implemented a new approval tracking system and need to verify that approval-related attributes are being populated correctly at each approval stage in your purchase order process.
Settings:
- Title: "Approval Attribute Population Analysis"
- Description: "Validate approval data capture across P2P process"
Output:
The calculator displays a matrix with activities as columns and attributes as rows. For the approval-related attributes, you see:
| Attribute | Create PO | Submit for Approval | L1 Approval | L2 Approval | Finance Approval | Send to Vendor |
|---|---|---|---|---|---|---|
| ApproverName | 0 | 0 | 1,847 | 456 | 234 | 0 |
| ApprovalLevel | 0 | 0 | 1,847 | 456 | 234 | 0 |
| ApprovalTimestamp | 0 | 0 | 1,847 | 456 | 234 | 0 |
| ApprovalComments | 0 | 0 | 1,523 | 398 | 189 | 0 |
| DelegatedBy | 0 | 0 | 234 | 67 | 23 | 0 |
Insights: The matrix confirms that approval attributes are populated only during approval activities (L1 Approval, L2 Approval, and Finance Approval), with zero population during other activities, as expected. ApproverName, ApprovalLevel, and ApprovalTimestamp show identical counts at each approval stage (1,847, 456, and 234), indicating complete data capture for these three attributes. ApprovalComments shows lower population (1,523 cases instead of 1,847 at L1 Approval), so 324 cases lack approval comments. This may be acceptable if comments are optional, but it warrants investigation. DelegatedBy appears only for a subset of approvals, which is consistent with capturing delegation scenarios.
Example 2: Identifying Data Extraction Gaps
Scenario: After merging data from multiple source systems in your order-to-cash process, you suspect that some attributes are not being populated consistently across all expected activities.
Settings:
- Title: "Multi-Source Data Completeness Check"
- Description: "Validate attribute population from CRM, ERP, and shipping systems"
Output:
| Attribute | Create Order | Credit Check | Pick Items | Pack Items | Ship | Generate Invoice | Receive Payment |
|---|---|---|---|---|---|---|---|
| CustomerName | 2,456 | 2,456 | 2,456 | 2,456 | 2,456 | 2,456 | 2,456 |
| CreditScore | 2,456 | 2,456 | 2,456 | 2,456 | 2,456 | 2,456 | 2,456 |
| WarehouseLocation | 0 | 0 | 2,456 | 2,456 | 2,456 | 0 | 0 |
| CarrierName | 0 | 0 | 0 | 0 | 2,456 | 0 | 0 |
| TrackingNumber | 0 | 0 | 0 | 0 | 2,234 | 0 | 0 |
| InvoiceAmount | 0 | 0 | 0 | 0 | 0 | 2,456 | 2,456 |
| PaymentMethod | 0 | 0 | 0 | 0 | 0 | 0 | 1,987 |
Insights: The matrix confirms the expected structure and exposes two data quality issues. CustomerName and CreditScore are case-level attributes (populated across all activities for all cases), which is expected. WarehouseLocation correctly appears only for warehouse activities (Pick Items, Pack Items, Ship), and CarrierName is fully populated at Ship. However, TrackingNumber shows only 2,234 cases instead of the expected 2,456 at the Ship activity, so 222 shipments lack tracking numbers, a critical gap requiring investigation. PaymentMethod shows only 1,987 cases at Receive Payment instead of the expected 2,456, so 469 payments lack payment method data, which suggests an integration issue with the payment system.
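The gap check in this example can be sketched in Python as follows. The cell values are copied from a subset of the matrix above. The helper name find_partial_cells is hypothetical, and the sketch assumes every case passes through every activity, so the total case count (2,456) is the expected value for any populated cell.

```python
# Column order and cell values copied from the Example 2 matrix (subset of rows).
activities = [
    "Create Order", "Credit Check", "Pick Items", "Pack Items",
    "Ship", "Generate Invoice", "Receive Payment",
]
matrix = {
    "WarehouseLocation": [0, 0, 2456, 2456, 2456, 0, 0],
    "CarrierName":       [0, 0, 0, 0, 2456, 0, 0],
    "TrackingNumber":    [0, 0, 0, 0, 2234, 0, 0],
    "PaymentMethod":     [0, 0, 0, 0, 0, 0, 1987],
}
TOTAL_CASES = 2456  # assumption: every case reaches every activity


def find_partial_cells(matrix, activities, total_cases):
    """Return (attribute, activity, missing cases) for partially populated cells."""
    gaps = []
    for attr, counts in matrix.items():
        for activity, count in zip(activities, counts):
            # Zero cells are skipped: the attribute may simply not apply there.
            if 0 < count < total_cases:
                gaps.append((attr, activity, total_cases - count))
    return gaps


gaps = find_partial_cells(matrix, activities, TOTAL_CASES)
for attr, activity, missing in gaps:
    print(f"{attr} at {activity}: {missing} cases missing")
```

Where not every case reaches an activity, the expected count would need to be the number of cases that actually include that activity rather than the overall total.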
Example 3: Understanding Attribute Lifecycle
Scenario: You need to document when specific attributes become available during the process lifecycle to guide downstream analytics and reporting design.
Settings:
- Title: "Attribute Lifecycle Documentation"
- Description: "Map when each attribute is populated in invoice processing"
Output:
| Attribute | Receive Invoice | Validate Invoice | Match to PO | Approve Payment | Schedule Payment | Make Payment | Close Case |
|---|---|---|---|---|---|---|---|
| InvoiceNumber | 3,456 | 3,456 | 3,456 | 3,456 | 3,456 | 3,456 | 3,456 |
| VendorID | 3,456 | 3,456 | 3,456 | 3,456 | 3,456 | 3,456 | 3,456 |
| PONumber | 0 | 0 | 3,456 | 3,456 | 3,456 | 3,456 | 3,456 |
| MatchStatus | 0 | 0 | 3,456 | 3,456 | 3,456 | 3,456 | 3,456 |
| ApprovedAmount | 0 | 0 | 0 | 3,456 | 3,456 | 3,456 | 3,456 |
| PaymentDate | 0 | 0 | 0 | 0 | 3,456 | 3,456 | 3,456 |
| ActualPaymentDate | 0 | 0 | 0 | 0 | 0 | 3,456 | 3,456 |
| ClosureReason | 0 | 0 | 0 | 0 | 0 | 0 | 3,456 |
Insights: This matrix shows the attribute lifecycle. InvoiceNumber and VendorID are populated from the beginning (case-level attributes set at invoice receipt). PONumber and MatchStatus become available starting at the Match to PO activity, so they cannot be used in analyses of earlier process stages. ApprovedAmount appears at Approve Payment and persists through subsequent activities. PaymentDate (the scheduled date) appears at Schedule Payment, while ActualPaymentDate appears only from Make Payment onward, which distinguishes planned from actual dates. ClosureReason is populated only at the final activity. This lifecycle understanding is critical for designing analytics that depend on specific attributes.
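One way to turn this matrix into a lifecycle map is to record the first activity at which each attribute has a non-zero count. The Python sketch below uses values copied from a subset of the rows above. The helper name first_available is hypothetical, and the sketch assumes the columns are listed in process order.

```python
# Column order and cell values copied from the Example 3 matrix (subset of rows).
activities = [
    "Receive Invoice", "Validate Invoice", "Match to PO", "Approve Payment",
    "Schedule Payment", "Make Payment", "Close Case",
]
matrix = {
    "InvoiceNumber":     [3456, 3456, 3456, 3456, 3456, 3456, 3456],
    "PONumber":          [0, 0, 3456, 3456, 3456, 3456, 3456],
    "ApprovedAmount":    [0, 0, 0, 3456, 3456, 3456, 3456],
    "PaymentDate":       [0, 0, 0, 0, 3456, 3456, 3456],
    "ActualPaymentDate": [0, 0, 0, 0, 0, 3456, 3456],
    "ClosureReason":     [0, 0, 0, 0, 0, 0, 3456],
}


def first_available(counts, activities):
    """Return the first activity with a non-zero count, or None if never populated."""
    for activity, count in zip(activities, counts):
        if count > 0:
            return activity
    return None


lifecycle = {attr: first_available(counts, activities) for attr, counts in matrix.items()}
for attr, activity in lifecycle.items():
    print(f"{attr}: available from {activity}")
```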
Example 4: Detecting Systematic Data Quality Issues
Scenario: Users report inconsistent data availability in analyses. You need to identify whether certain activities systematically fail to populate expected attributes.
Settings:
- Title: "Systematic Data Gap Analysis"
- Description: "Identify activities with missing attribute population"
Output:
| Attribute | Verify Request | Assign Resource | Start Work | Quality Check | Complete Work | Document Results |
|---|---|---|---|---|---|---|
| RequestID | 5,678 | 5,678 | 5,678 | 5,678 | 5,678 | 5,678 |
| AssignedTo | 0 | 5,678 | 5,678 | 5,678 | 5,678 | 5,678 |
| WorkCategory | 0 | 5,678 | 5,678 | 5,678 | 5,678 | 5,678 |
| StartTime | 0 | 0 | 5,678 | 5,678 | 5,678 | 5,678 |
| QualityScore | 0 | 0 | 0 | 4,234 | 4,234 | 4,234 |
| CompletionNotes | 0 | 0 | 0 | 0 | 5,678 | 5,678 |
| DocumentationLink | 0 | 0 | 0 | 0 | 0 | 3,456 |
Insights: The matrix reveals a critical data quality issue. QualityScore should be populated at Quality Check for all cases (5,678), but only 4,234 cases have this attribute, meaning 1,444 cases (25%) are missing quality scores. This is a systematic gap that could indicate a problem with the quality inspection system or data extraction. Additionally, DocumentationLink is missing for 2,222 cases (39%) at the Document Results activity, suggesting that documentation is being skipped for a significant portion of work. These systematic gaps need immediate attention to ensure data integrity.
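The missing counts and percentages quoted above follow from simple arithmetic, sketched here in Python. The function name missing_share is hypothetical, and percentages are rounded to the nearest whole number to match the figures in the text.

```python
def missing_share(populated, expected):
    """Return (missing cases, missing percentage rounded to a whole number)."""
    if expected <= 0:
        raise ValueError("expected case count must be positive")
    missing = expected - populated
    return missing, round(100 * missing / expected)


# QualityScore at Quality Check: 4,234 of 5,678 cases populated.
quality = missing_share(4234, 5678)
# DocumentationLink at Document Results: 3,456 of 5,678 cases populated.
documentation = missing_share(3456, 5678)

print(quality)
print(documentation)
```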
Example 5: Validating Multi-System Integration
Scenario: Your process integrates data from three different systems (CRM, ERP, and logistics), and you need to verify that attributes from each system are correctly associated with the appropriate activities.
Settings:
- Title: "Multi-System Integration Validation"
- Description: "Verify attribute population from CRM, ERP, and logistics systems"
Output:
| Attribute | Enter Order (CRM) | Reserve Inventory (ERP) | Allocate Stock (ERP) | Dispatch (Logistics) | Deliver (Logistics) | Confirm Receipt (CRM) |
|---|---|---|---|---|---|---|
| CustomerID (CRM) | 8,945 | 8,945 | 8,945 | 8,945 | 8,945 | 8,945 |
| SalesRepID (CRM) | 8,945 | 8,945 | 8,945 | 8,945 | 8,945 | 8,945 |
| SKU (ERP) | 8,945 | 8,945 | 8,945 | 8,945 | 8,945 | 8,945 |
| InventoryLocation (ERP) | 0 | 8,945 | 8,945 | 8,945 | 8,945 | 8,945 |
| StockLevel (ERP) | 0 | 8,945 | 8,945 | 8,945 | 8,945 | 8,945 |
| CarrierID (Logistics) | 0 | 0 | 0 | 8,945 | 8,945 | 8,945 |
| DeliveryStatus (Logistics) | 0 | 0 | 0 | 8,945 | 8,945 | 8,945 |
| ReceivedBy (CRM) | 0 | 0 | 0 | 0 | 0 | 7,234 |
Insights: The matrix validates that most system integrations are working correctly. CRM attributes (CustomerID, SalesRepID) are available throughout the process, as expected for case-level attributes. SKU, although sourced from ERP, is also present from Enter Order onward, which suggests it is stored at case level. The other ERP attributes (InventoryLocation, StockLevel) correctly appear starting from the Reserve Inventory activity. Logistics attributes (CarrierID, DeliveryStatus) properly appear from Dispatch onward. However, there is a significant issue with the ReceivedBy attribute: only 7,234 of 8,945 cases have it populated at Confirm Receipt, meaning 1,711 deliveries (19%) lack confirmation of who received the order. This requires investigation into the CRM confirmation workflow.
Example 6: Planning Attribute Enrichment Strategy
Scenario: You want to identify which attributes have sparse population and might benefit from enrichment with reference data or improved data capture processes.
Settings:
- Title: "Attribute Enrichment Opportunity Analysis"
- Description: "Identify sparse attributes needing enrichment"
Output:
| Attribute | Submit Claim | Review Documents | Assess Damage | Approve Amount | Issue Payment | Close Claim |
|---|---|---|---|---|---|---|
| ClaimNumber | 12,456 | 12,456 | 12,456 | 12,456 | 12,456 | 12,456 |
| PolicyNumber | 12,456 | 12,456 | 12,456 | 12,456 | 12,456 | 12,456 |
| AdjusterID | 0 | 12,456 | 12,456 | 12,456 | 12,456 | 12,456 |
| AdjusterName | 0 | 0 | 0 | 0 | 0 | 0 |
| DamageCategory | 0 | 0 | 12,456 | 12,456 | 12,456 | 12,456 |
| EstimatedCost | 0 | 0 | 12,456 | 12,456 | 12,456 | 12,456 |
| ApprovalReason | 0 | 0 | 0 | 12,456 | 12,456 | 12,456 |
| PaymentMethodCode | 0 | 0 | 0 | 0 | 12,456 | 12,456 |
| PaymentMethodName | 0 | 0 | 0 | 0 | 0 | 0 |
Insights: The matrix reveals excellent enrichment opportunities. AdjusterID is populated for all cases from Review Documents onward (12,456 cases), but AdjusterName is never populated. Enriching AdjusterID with adjuster names from an employee lookup table would make analyses more user-friendly. Similarly, PaymentMethodCode is populated for all payments (12,456 cases) but PaymentMethodName is missing. Enriching payment method codes with descriptive names would significantly improve reporting readability. These enrichments would add substantial value with minimal effort since the reference IDs are already present.
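The enrichment described above amounts to a lookup from an existing ID to a descriptive name. The Python sketch below shows the idea outside the platform. The reference table, the sample events, and the enrich helper are all hypothetical, and mindzie Studio's own enrichment features may work differently.

```python
# Hypothetical employee lookup table: AdjusterID -> AdjusterName.
adjuster_names = {"ADJ-01": "Dana Ortiz", "ADJ-02": "Sam Patel"}

# Hypothetical events; field names are assumptions for illustration.
events = [
    {"case_id": "CL-1", "activity": "Review Documents", "AdjusterID": "ADJ-01"},
    {"case_id": "CL-2", "activity": "Review Documents", "AdjusterID": "ADJ-02"},
    {"case_id": "CL-3", "activity": "Review Documents", "AdjusterID": "ADJ-99"},  # not in lookup
    {"case_id": "CL-4", "activity": "Submit Claim", "AdjusterID": None},          # no ID yet
]


def enrich(events, id_field, name_field, lookup):
    """Return copies of the events with name_field filled from the lookup table."""
    enriched = []
    for event in events:
        row = dict(event)  # copy so the original events stay unchanged
        # Unknown or missing IDs yield None instead of raising an error.
        row[name_field] = lookup.get(event.get(id_field))
        enriched.append(row)
    return enriched


enriched = enrich(events, "AdjusterID", "AdjusterName", adjuster_names)
for row in enriched:
    print(row["case_id"], row["AdjusterName"])
```

IDs that are absent from the reference table stay unnamed, so rerunning the matrix after enrichment would show any lookup gaps as partial population of AdjusterName.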
Output
The Attribute-Activity Matrix calculator displays a comprehensive matrix table with the following structure:
Rows: Each row represents one attribute from your event log, including both case-level attributes (which apply to the entire case) and event-level attributes (which may vary by activity).
Columns: Each column represents one unique activity from your process.
Cell Values: Each cell contains a number representing how many cases have a value for that attribute at that activity. A value of 0 means the attribute is not populated for any cases at that activity.
Understanding Cell Values
Case-Level Attributes: For case-level attributes (like CustomerID, OrderNumber, etc.), the cell value will be the same across all activities for that row, showing the total number of cases where the attribute has a value.
Event-Level Attributes: For event-level attributes (like ApproverName, WarehouseLocation, etc.), the cell values vary by activity, showing where in the process that attribute gets populated.
Zero Values: A cell value of 0 indicates that the attribute is never populated at that activity, which may be expected behavior or may indicate a data quality issue depending on your process.
Interactive Features
Sort and Filter: Click column headers to sort the matrix by activity. Use browser search to quickly locate specific attributes of interest.
Export Results: Export the complete matrix to Excel or CSV for detailed offline analysis, documentation, or sharing with technical teams.
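An exported CSV can be loaded for offline analysis with a short script such as the Python sketch below. The file layout is an assumption: a header row with "Attribute" in the first column followed by one column per activity, and counts that may contain thousands separators. Check the actual export before relying on it.

```python
import csv
import io

# Hypothetical export content; the real export layout may differ.
exported = '''Attribute,Create PO,L1 Approval,L2 Approval
ApproverName,0,"1,847",456
ApprovalComments,0,"1,523",398
'''


def load_matrix(text):
    """Parse exported CSV text into {attribute: {activity: case count}}."""
    reader = csv.DictReader(io.StringIO(text))
    matrix = {}
    for row in reader:
        attribute = row.pop("Attribute")
        # Strip thousands separators such as "1,847" before converting.
        matrix[attribute] = {
            activity: int(value.replace(",", "")) for activity, value in row.items()
        }
    return matrix


matrix = load_matrix(exported)
print(matrix["ApprovalComments"]["L1 Approval"])
```

For a file on disk, the same function could be fed the text read from the file instead of the inline sample.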
Large Matrices: For processes with many activities and attributes, the matrix may be very large. Use horizontal and vertical scrolling to navigate it.
Interpreting Population Patterns
Consistent Population: If an attribute shows the same non-zero value across all activities, it is most likely a case-level attribute, which applies to the whole case regardless of activity.
Progressive Population: If an attribute shows zero values for early activities and non-zero values for later activities, it is first populated at a specific process stage and remains available from that stage onward.
Partial Population: If an attribute shows a value less than the total case count, some cases are missing that attribute, indicating potential data quality issues or optional fields.
Activity-Specific Population: If an attribute shows non-zero values only for specific activities, it is an event-level attribute relevant only to those activities.
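These four patterns can be expressed as a rough classification rule, sketched below in Python. The classify function is hypothetical and rests on two assumptions: the counts are listed in process order, and the total case count is the expected value for any populated cell. Partial population takes priority over the other labels, and an extra "never populated" label covers all-zero rows.

```python
def classify(counts, total_cases):
    """Label one matrix row with a population pattern."""
    nonzero = [c for c in counts if c > 0]
    if not nonzero:
        return "never populated"
    if any(c < total_cases for c in nonzero):
        return "partial"
    if len(nonzero) == len(counts):
        return "consistent"
    first = next(i for i, c in enumerate(counts) if c > 0)
    # Populated from some stage through to the end of the process.
    if all(c > 0 for c in counts[first:]):
        return "progressive"
    return "activity-specific"


# Rows taken from the examples in this document.
labels = {
    "RequestID": classify([5678, 5678, 5678, 5678, 5678, 5678], 5678),
    "StartTime": classify([0, 0, 5678, 5678, 5678, 5678], 5678),
    "QualityScore": classify([0, 0, 0, 4234, 4234, 4234], 5678),
    "CarrierName": classify([0, 0, 0, 0, 2456, 0, 0], 2456),
    "AdjusterName": classify([0, 0, 0, 0, 0, 0], 12456),
}
for attribute, label in labels.items():
    print(f"{attribute}: {label}")
```

In processes where not every case reaches every activity, an activity-specific attribute would be labeled "partial" by this rule, so the labels are a starting point for review rather than a verdict.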
Performance Considerations
- Large Datasets: For datasets with hundreds of attributes and activities, this calculator may require significant time to process
- Resource Usage: The calculator scans all attribute-activity combinations, which is computationally intensive
- Best Practices: Run this calculator during off-peak hours for very large datasets
Administrative Access
This calculator is restricted to users with the Administrator role. Regular users who need to understand dataset characteristics should use the Dataset Information calculator instead, which provides summary metrics without the detailed attribute-activity breakdown.
This documentation is part of the mindzie Studio process mining platform.