Attribute-Activity Matrix

Overview

The Attribute-Activity Matrix calculator cross-tabulates the attributes and activities in your event log. For each attribute-activity combination, it displays the number of cases in which the attribute has a value at that activity, helping administrators understand data completeness patterns and identify data quality issues.

IMPORTANT: This is an administrator-only calculator designed for technical analysis and data quality assessment. It generates a matrix showing how attributes are populated across different activities, which is essential for understanding data extraction patterns, identifying missing data, and validating event log structure.

This calculator is primarily used by system administrators and data quality specialists who need to understand attribute population patterns across process activities for troubleshooting, validation, or dataset optimization.

Common Uses

Identify which activities populate specific attributes to understand data flow through the process
Detect missing attribute values for specific activities that should have data
Validate that critical attributes are populated at the expected process stages
Diagnose data extraction issues by identifying systematic gaps in attribute population
Understand attribute dependencies on specific activities for ETL design
Document which activities contribute data to which attributes for technical specifications

Settings

This calculator requires no specific configuration settings. When executed, it automatically generates a matrix showing all attributes (both case-level and event-level) against all activities, with cell values indicating the number of cases where that attribute has a value for that activity.
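The counting rule described above can be sketched in a few lines of Python. This is an illustration only, not the mindzie implementation: the event structure (the `case_id`, `activity`, and `attributes` keys) and the function name `attribute_activity_matrix` are hypothetical, and treating `None` and empty strings as "no value" is an assumption about what counts as unpopulated.

```python
from collections import defaultdict

def attribute_activity_matrix(events):
    """Count, per (attribute, activity), the distinct cases with a non-empty value."""
    cases_seen = defaultdict(set)  # (attribute, activity) -> set of case ids
    for event in events:
        for attribute, value in event["attributes"].items():
            # Assumption: None and "" both mean the attribute has no value.
            if value not in (None, ""):
                cases_seen[(attribute, event["activity"])].add(event["case_id"])
    return {key: len(case_ids) for key, case_ids in cases_seen.items()}

# Made-up sample events for illustration.
events = [
    {"case_id": "PO-1", "activity": "L1 Approval",
     "attributes": {"ApproverName": "Ana", "ApprovalComments": ""}},
    {"case_id": "PO-1", "activity": "L1 Approval",
     "attributes": {"ApproverName": "Ana", "ApprovalComments": None}},
    {"case_id": "PO-2", "activity": "L1 Approval",
     "attributes": {"ApproverName": "Ben", "ApprovalComments": "ok"}},
    {"case_id": "PO-2", "activity": "L2 Approval",
     "attributes": {"ApproverName": "Cy"}},
]

matrix = attribute_activity_matrix(events)
for (attribute, activity), case_count in sorted(matrix.items()):
    print(f"{attribute} @ {activity}: {case_count}")
```

Because the sketch counts distinct cases rather than events, a case that repeats an activity contributes only once to a cell. Combinations that never occur are absent from the dictionary, which corresponds to a 0 in the displayed matrix.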

Note: For datasets with many attributes and activities, this matrix can be very large. The calculator displays the complete matrix, which may require scrolling to review all combinations.

Examples

Example 1: Validating Approval Data Completeness

Scenario: You have implemented a new approval tracking system and need to verify that approval-related attributes are being populated correctly at each approval stage in your purchase order process.

Settings:

Title: "Approval Attribute Population Analysis"
Description: "Validate approval data capture across P2P process"

Output:

The calculator displays a matrix with activities as columns and attributes as rows. For the approval-related attributes, you see:

Attribute	L1 Approval	L2 Approval	Finance Approval
ApproverName	1,847	456	234
ApprovalLevel	1,847	456	234
ApprovalTimestamp	1,847	456	234
ApprovalComments	1,523	398	189
DelegatedBy	234	67	23

Insights: The matrix confirms that approval attributes are correctly populated only during approval activities (L1, L2, and Finance Approval), with zero population during other activities as expected. All 1,847 cases that reach L1 Approval have ApproverName, ApprovalLevel, and ApprovalTimestamp populated, indicating complete data capture. However, ApprovalComments shows lower population (1,523 cases instead of 1,847 at L1), revealing that 324 cases lack approval comments. This may be acceptable if comments are optional, but it warrants investigation. The DelegatedBy attribute appears only for a subset of approvals, correctly capturing delegation scenarios.

Example 2: Identifying Data Extraction Gaps

Scenario: After merging data from multiple source systems in your order-to-cash process, you suspect that some attributes are not being populated consistently across all expected activities.

Settings:

Title: "Multi-Source Data Completeness Check"
Description: "Validate attribute population from CRM, ERP, and shipping systems"

Output:

Attribute	Create Order	Credit Check	Pick Items	Pack Items	Ship	Generate Invoice	Receive Payment
CustomerName	2,456	2,456	2,456	2,456	2,456	2,456	2,456
CreditScore	2,456	2,456	2,456	2,456	2,456	2,456	2,456
WarehouseLocation	0	0	2,456	2,456	2,456	0	0
CarrierName	0	0	0	0	2,456	0	0
TrackingNumber	0	0	0	0	2,234	0	0
InvoiceAmount	0	0	0	0	0	2,456	2,456
PaymentMethod	0	0	0	0	0	0	1,987

Insights: The matrix reveals two data quality issues alongside several expected patterns. CustomerName and CreditScore are case-level attributes (populated across all activities for all cases), which is expected. WarehouseLocation correctly appears only for warehouse activities (Pick, Pack, Ship). However, TrackingNumber shows only 2,234 cases instead of the expected 2,456 at the Ship activity, revealing that 222 shipments lack tracking numbers, a critical gap requiring investigation. PaymentMethod shows only 1,987 cases at Receive Payment instead of the expected 2,456, indicating that 469 payments lack payment method data and suggesting an integration issue with the payment system.

Example 3: Understanding Attribute Lifecycle

Scenario: You need to document when specific attributes become available during the process lifecycle to guide downstream analytics and reporting design.

Settings:

Title: "Attribute Lifecycle Documentation"
Description: "Map when each attribute is populated in invoice processing"

Output:

Attribute	Receive Invoice	Validate Invoice	Match to PO	Approve Payment	Schedule Payment	Make Payment	Close Case
InvoiceNumber	3,456	3,456	3,456	3,456	3,456	3,456	3,456
VendorID	3,456	3,456	3,456	3,456	3,456	3,456	3,456
PONumber	0	0	3,456	3,456	3,456	3,456	3,456
MatchStatus	0	0	3,456	3,456	3,456	3,456	3,456
ApprovedAmount	0	0	0	3,456	3,456	3,456	3,456
PaymentDate	0	0	0	0	3,456	3,456	3,456
ActualPaymentDate	0	0	0	0	0	3,456	3,456
ClosureReason	0	0	0	0	0	0	3,456

Insights: This matrix clearly shows the attribute lifecycle. InvoiceNumber and VendorID are populated from the beginning (case-level attributes set at invoice receipt). PONumber and MatchStatus become available from the Match to PO activity onward, so they cannot be used for earlier process stages. ApprovedAmount appears at Approve Payment and persists through subsequent activities. PaymentDate (scheduled date) appears at Schedule Payment, while ActualPaymentDate only appears at Make Payment, distinguishing planned from actual dates. ClosureReason is populated only at the final activity. This lifecycle understanding is critical for designing analytics that depend on specific attributes.
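For documentation purposes, the lifecycle reading above can be derived mechanically from the matrix. The following is a minimal sketch, assuming the matrix has been copied into a dictionary of rows and that the column order reflects process order (the calculator does not necessarily guarantee this). The function name `first_populated_activity` is hypothetical.

```python
def first_populated_activity(activities, matrix):
    """Map each attribute to the first activity (in column order) with a non-zero count."""
    lifecycle = {}
    for attribute, counts in matrix.items():
        lifecycle[attribute] = next(
            (activity for activity, count in zip(activities, counts) if count > 0),
            None,  # the attribute is never populated
        )
    return lifecycle

# Columns and a subset of rows taken from the Example 3 output.
activities = ["Receive Invoice", "Validate Invoice", "Match to PO", "Approve Payment",
              "Schedule Payment", "Make Payment", "Close Case"]
matrix = {
    "InvoiceNumber":     [3456, 3456, 3456, 3456, 3456, 3456, 3456],
    "PONumber":          [0, 0, 3456, 3456, 3456, 3456, 3456],
    "ApprovedAmount":    [0, 0, 0, 3456, 3456, 3456, 3456],
    "PaymentDate":       [0, 0, 0, 0, 3456, 3456, 3456],
    "ActualPaymentDate": [0, 0, 0, 0, 0, 3456, 3456],
    "ClosureReason":     [0, 0, 0, 0, 0, 0, 3456],
}

lifecycle = first_populated_activity(activities, matrix)
for attribute, activity in lifecycle.items():
    print(f"{attribute}: available from {activity}")
```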

Example 4: Detecting Systematic Data Quality Issues

Scenario: Users report inconsistent data availability in analyses. You need to identify whether certain activities systematically fail to populate expected attributes.

Settings:

Title: "Systematic Data Gap Analysis"
Description: "Identify activities with missing attribute population"

Output:

Attribute	Verify Request	Assign Resource	Start Work	Quality Check	Complete Work	Document Results
RequestID	5,678	5,678	5,678	5,678	5,678	5,678
AssignedTo	0	5,678	5,678	5,678	5,678	5,678
WorkCategory	0	5,678	5,678	5,678	5,678	5,678
StartTime	0	0	5,678	5,678	5,678	5,678
QualityScore	0	0	0	4,234	4,234	4,234
CompletionNotes	0	0	0	0	5,678	5,678
DocumentationLink	0	0	0	0	0	3,456

Insights: The matrix reveals a critical data quality issue. QualityScore should be populated at Quality Check for all cases (5,678), but only 4,234 cases have this attribute, meaning 1,444 cases (25%) are missing quality scores. This is a systematic gap that could indicate a problem with the quality inspection system or data extraction. Additionally, DocumentationLink is missing for 2,222 cases (39%) at the Document Results activity, suggesting that documentation is being skipped for a significant portion of work. These systematic gaps need immediate attention to ensure data integrity.
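The gap figures quoted above follow from simple arithmetic on the cell values. A small sketch, with a hypothetical helper name `population_gap`, shows the calculation using the Example 4 numbers:

```python
def population_gap(populated_cases, expected_cases):
    """Return (missing case count, missing share in percent, rounded to one decimal)."""
    missing = expected_cases - populated_cases
    return missing, round(100 * missing / expected_cases, 1)

total_cases = 5678
quality_gap = population_gap(4234, total_cases)        # QualityScore at Quality Check
documentation_gap = population_gap(3456, total_cases)  # DocumentationLink at Document Results

print(quality_gap)        # (1444, 25.4)
print(documentation_gap)  # (2222, 39.1)
```

The expected count is taken here to be the total number of cases, which holds only if every case passes through the activity in question. Where that is not so, the number of cases reaching the activity is the more appropriate baseline.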

Example 5: Validating Multi-System Integration

Scenario: Your process integrates data from three different systems (CRM, ERP, and logistics), and you need to verify that attributes from each system are correctly associated with the appropriate activities.

Settings:

Title: "Multi-System Integration Validation"
Description: "Verify attribute population from CRM, ERP, and logistics systems"

Output:

Attribute	Enter Order (CRM)	Reserve Inventory (ERP)	Allocate Stock (ERP)	Dispatch (Logistics)	Deliver (Logistics)	Confirm Receipt (CRM)
CustomerID (CRM)	8,945	8,945	8,945	8,945	8,945	8,945
SalesRepID (CRM)	8,945	8,945	8,945	8,945	8,945	8,945
SKU (ERP)	8,945	8,945	8,945	8,945	8,945	8,945
InventoryLocation (ERP)	0	8,945	8,945	8,945	8,945	8,945
StockLevel (ERP)	0	8,945	8,945	8,945	8,945	8,945
CarrierID (Logistics)	0	0	0	8,945	8,945	8,945
DeliveryStatus (Logistics)	0	0	0	8,945	8,945	8,945
ReceivedBy (CRM)	0	0	0	0	0	7,234

Insights: The matrix validates that most system integrations are working correctly. CRM attributes (CustomerID, SalesRepID) are available throughout the process, as expected for case-level attributes. ERP attributes (InventoryLocation, StockLevel) correctly appear starting from the Reserve Inventory activity. Logistics attributes (CarrierID, DeliveryStatus) properly appear from Dispatch onward. However, there is a significant issue with the ReceivedBy attribute: only 7,234 cases out of 8,945 have it populated at Confirm Receipt, meaning 1,711 deliveries (19%) lack confirmation of who received the order. This requires investigation into the CRM confirmation workflow.

Example 6: Planning Attribute Enrichment Strategy

Scenario: You want to identify which attributes have sparse population and might benefit from enrichment with reference data or improved data capture processes.

Settings:

Title: "Attribute Enrichment Opportunity Analysis"
Description: "Identify sparse attributes needing enrichment"

Output:

Attribute	Submit Claim	Review Documents	Assess Damage	Approve Amount	Issue Payment	Close Claim
ClaimNumber	12,456	12,456	12,456	12,456	12,456	12,456
PolicyNumber	12,456	12,456	12,456	12,456	12,456	12,456
AdjusterID	0	12,456	12,456	12,456	12,456	12,456
AdjusterName	0	0	0	0	0	0
DamageCategory	0	0	12,456	12,456	12,456	12,456
EstimatedCost	0	0	12,456	12,456	12,456	12,456
ApprovalReason	0	0	0	12,456	12,456	12,456
PaymentMethodCode	0	0	0	0	12,456	12,456
PaymentMethodName	0	0	0	0	0	0

Insights: The matrix reveals excellent enrichment opportunities. AdjusterID is populated for all cases from Review Documents onward (12,456 cases), but AdjusterName is never populated. Enriching AdjusterID with adjuster names from an employee lookup table would make analyses more user-friendly. Similarly, PaymentMethodCode is populated for all payments (12,456 cases) but PaymentMethodName is missing. Enriching payment method codes with descriptive names would significantly improve reporting readability. These enrichments would add substantial value with minimal effort since the reference IDs are already present.
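The two steps implied above, spotting attributes that are never populated and then filling them from reference data, can be sketched as follows. This is a sketch under assumptions: the helper names `never_populated` and `enrich`, the lookup table, and the sample record are all hypothetical and made up for illustration. In practice the enrichment would be configured in your data pipeline or enrichment tooling.

```python
def never_populated(matrix):
    """Return the attributes whose count is 0 at every activity."""
    return [attribute for attribute, counts in matrix.items() if not any(counts)]

def enrich(record, id_field, name_field, lookup):
    """Return a copy of record with name_field filled from lookup[record[id_field]]."""
    enriched = dict(record)
    enriched[name_field] = lookup.get(record.get(id_field))
    return enriched

# A subset of rows taken from the Example 6 output.
matrix = {
    "AdjusterID":        [0, 12456, 12456, 12456, 12456, 12456],
    "AdjusterName":      [0, 0, 0, 0, 0, 0],
    "PaymentMethodCode": [0, 0, 0, 0, 12456, 12456],
    "PaymentMethodName": [0, 0, 0, 0, 0, 0],
}
candidates = never_populated(matrix)
print(candidates)

# Hypothetical reference data and record, invented for this example.
adjuster_names = {"ADJ-017": "Jordan Lee"}
claim = {"ClaimNumber": "CL-1001", "AdjusterID": "ADJ-017"}
enriched_claim = enrich(claim, "AdjusterID", "AdjusterName", adjuster_names)
print(enriched_claim)
```

An ID that is absent from the lookup table yields `None` rather than raising an error, so unmatched records remain visible as gaps after enrichment.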

Output

The Attribute-Activity Matrix calculator displays a comprehensive matrix table with the following structure:

Rows: Each row represents one attribute from your event log, including both case-level attributes (which apply to the entire case) and event-level attributes (which may vary by activity).

Columns: Each column represents one unique activity from your process.

Cell Values: Each cell contains a number representing how many cases have a value for that attribute at that activity. A value of 0 means the attribute is not populated for any cases at that activity.

Understanding Cell Values

Case-Level Attributes: For case-level attributes (like CustomerID, OrderNumber, etc.), the cell value will be the same across all activities for that row, showing the total number of cases where the attribute has a value.

Event-Level Attributes: For event-level attributes (like ApproverName, WarehouseLocation, etc.), the cell values vary by activity, showing where in the process that attribute gets populated.

Zero Values: A cell value of 0 indicates that the attribute is never populated at that activity, which may be expected behavior or may indicate a data quality issue depending on your process.

Interactive Features

Sort and Filter: Click column headers to sort the matrix by activity. Use browser search to quickly locate specific attributes of interest.

Export Results: Export the complete matrix to Excel or CSV for detailed offline analysis, documentation, or sharing with technical teams.
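Once exported, the matrix can be loaded for offline analysis with a short script. The sketch below assumes a particular CSV layout (a header row with an attribute-name column followed by one column per activity, and counts that may contain thousands separators). The actual export layout may differ, so adjust the parsing accordingly. The function name `load_matrix` and the sample text are hypothetical.

```python
import csv
import io

def load_matrix(csv_text):
    """Parse an exported matrix into {attribute: {activity: count}}."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    activities = header[1:]  # assumption: the first column holds the attribute name
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Strip thousands separators such as "2,234" before converting.
        counts = [int(cell.replace(",", "")) for cell in row[1:]]
        matrix[row[0]] = dict(zip(activities, counts))
    return matrix

# Made-up sample in the assumed layout, using values from Example 2.
sample = '''Attribute,Ship,Receive Payment
TrackingNumber,"2,234",0
PaymentMethod,0,"1,987"
'''

matrix = load_matrix(sample)
print(matrix["TrackingNumber"]["Ship"])
```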

Large Matrices: For processes with many activities and attributes, the matrix may be very large. Use horizontal and vertical scrolling to navigate the full matrix.

Interpreting Population Patterns

Consistent Population: If an attribute shows the same non-zero value across all activities, it is typically a case-level attribute populated early in the process.

Progressive Population: If an attribute shows zero values for early activities and non-zero values for later activities, it indicates the attribute is populated at a specific process stage.

Partial Population: If an attribute shows a value less than the total case count, some cases are missing that attribute, indicating potential data quality issues or optional fields.

Activity-Specific Population: If an attribute shows non-zero values only for specific activities, it is an event-level attribute relevant only to those activities.
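These patterns can be applied to a row programmatically. The following is a rough sketch, not a feature of the calculator: the function name `classify_row` and the label strings are hypothetical, the rules are one possible reading of the descriptions above, and the "progressive" check assumes the columns are listed in process order.

```python
def classify_row(counts, total_cases):
    """Label one matrix row with the population patterns described above."""
    first = next((i for i, count in enumerate(counts) if count > 0), None)
    if first is None:
        return "never populated"
    if first == 0 and len(set(counts)) == 1:
        pattern = "consistent"         # same non-zero value at every activity
    elif first > 0 and all(count > 0 for count in counts[first:]):
        pattern = "progressive"        # zeros first, then populated to the end
    else:
        pattern = "activity-specific"  # populated only at some activities
    if any(0 < count < total_cases for count in counts):
        pattern += ", partial"         # some cases lack a value
    return pattern

# Rows taken from the Example 2 output (2,456 cases in total).
total_cases = 2456
rows = {
    "CustomerName":      [2456, 2456, 2456, 2456, 2456, 2456, 2456],
    "WarehouseLocation": [0, 0, 2456, 2456, 2456, 0, 0],
    "TrackingNumber":    [0, 0, 0, 0, 2234, 0, 0],
    "InvoiceAmount":     [0, 0, 0, 0, 0, 2456, 2456],
    "PaymentMethod":     [0, 0, 0, 0, 0, 0, 1987],
}
labels = {attribute: classify_row(counts, total_cases) for attribute, counts in rows.items()}
for attribute, label in labels.items():
    print(f"{attribute}: {label}")
```

A row can match more than one description. For instance, an attribute populated only at the final activity is both progressive and activity-specific, and the sketch reports it as progressive.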

Performance Considerations

Large Datasets: For datasets with hundreds of attributes and activities, this calculator may require significant time to process
Resource Usage: The calculator scans all attribute-activity combinations, which is computationally intensive
Best Practices: Run this calculator during off-peak hours for very large datasets

Administrative Access

This calculator is restricted to users with the Administrator role. Regular users who need to understand dataset characteristics should use the Dataset Information calculator instead, which provides summary metrics without the detailed attribute-activity breakdown.

This documentation is part of the mindzie Studio process mining platform.