Snapshot Analysis
sz_snapshot takes a point-in-time snapshot of the entity resolution results and generates summary reports. These reports answer high-level questions about the data: how many entities exist, how records are distributed across data sources, and where cross-source matches occur.
ENTITY_ID values in your database will most likely differ from those shown here, as they depend on load order. Use the ENTITY_ID values returned by your commands in subsequent steps. If you are using the truth set, DATA_SOURCE and RECORD_ID values will be the same.
Viewing snapshot results in sz_explorer
To view the snapshot reports interactively, load the snapshot file when starting sz_explorer:
sz_explorer -s truthset_snapshot.json
Or load it after sz_explorer is already running:
load truthset_snapshot.json
This unlocks the snapshot report commands: data_source_summary, cross_source_summary, entity_source_summary, entity_size_breakdown, and principles_used. Each command displays a table with a prompt to drill into specific rows for more detail.
data_source_summary
The data_source_summary command shows how records from each data source resolved into entities.

| Column | Description |
|---|---|
| Data Source | Name of the data source |
| Records | Total number of records from this data source |
| Entities | Number of distinct entities these records resolved to |
| Compression | Percentage of records that were duplicates (higher means more deduplication) |
| Matched Records | Number of records that matched at least one other record |
| Matched Entities | Number of entities containing matched records |
| Ambiguous Matches | Entities where a record could plausibly belong to more than one entity |
| Possible Matches | Entities sharing important attributes but with some disagreements |
| Possible Relationships | Entities related through lesser attributes like shared addresses |
An entity count significantly lower than the record count indicates many records belong to the same real-world person or organization. The Compression column shows the percentage of records that were duplicates within each data source.
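You can cross-check the Compression column with a few lines of Python. The sketch assumes, following the description above, that the number of duplicates equals the record count minus the entity count. Senzing's exact formula may differ. The counts are hypothetical and do not come from the truth set.

```python
def compression_pct(records: int, entities: int) -> float:
    """Percentage of a data source's records that were duplicates.

    Assumption: duplicates = records - entities. This is an illustration,
    not Senzing's implementation.
    """
    if records <= 0:
        raise ValueError("records must be positive")
    if not 0 < entities <= records:
        raise ValueError("entities must be between 1 and records")
    duplicates = records - entities
    return round(100.0 * duplicates / records, 2)


# Hypothetical counts for one data source.
print(compression_pct(records=1000, entities=640))  # 36.0
print(compression_pct(records=250, entities=250))   # 0.0
```

A data source whose entity count equals its record count has 0% compression, meaning no two of its records resolved to the same entity.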
Use sz_explorer to drill into specific entities and understand why records resolved together.
Selecting a data source drills into its match level breakdown:

This shows how the matched CUSTOMERS records break down by match level. Most matches are high-confidence Matches, with smaller counts for Possible Matches and Possible Relationships.
Selecting a match level drills into the match keys that fired:

Each match key shows which combination of attributes caused the match. For example, +NAME+DOB means a name and date of birth matched, while +NAME+ADDRESS means a name and address matched.
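To read a match key programmatically, split it into its feature names. The Python sketch below does this with a regular expression. It makes three assumptions that this page does not confirm: a "+" prefix marks a contributing feature, a "-" prefix marks a disagreeing feature, and feature names use only uppercase letters, digits, and underscores. The function name `parse_match_key` is hypothetical.

```python
import re


def parse_match_key(match_key: str) -> dict[str, list[str]]:
    """Split a match key such as "+NAME+DOB" into feature names.

    Assumptions: "+" marks a feature that contributed to the match, "-" marks
    a feature that disagreed, and names match [A-Z0-9_]+.
    """
    tokens = re.findall(r"([+-])([A-Z0-9_]+)", match_key)
    return {
        "matched": [name for sign, name in tokens if sign == "+"],
        "disagreed": [name for sign, name in tokens if sign == "-"],
    }


print(parse_match_key("+NAME+DOB"))
# {'matched': ['NAME', 'DOB'], 'disagreed': []}
print(parse_match_key("+NAME+ADDRESS"))
# {'matched': ['NAME', 'ADDRESS'], 'disagreed': []}
```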
Selecting a match key drills into individual entities:

The entity detail view shows all records resolved to this entity, including the specific attribute values that drove the match.
cross_source_summary
The cross_source_summary command shows matches between records from different data sources, revealing connections that span data silos.

| Column | Description |
|---|---|
| From Data Source | The originating data source in the cross-source pair |
| To Data Source | The target data source in the cross-source pair |
| Matched Records | Number of records that matched across these two data sources |
| Matched Entities | Number of entities containing cross-source matched records |
| Ambiguous Matches | Cross-source matches where entity membership is uncertain |
| Possible Matches | Cross-source entities sharing important attributes but with disagreements |
| Possible Relationships | Cross-source entities related through lesser attributes |
Cross-source matches drive compliance screening, fraud detection, and risk assessment workflows. For example, CUSTOMERS-to-WATCHLIST matches represent customer records that resolved to the same entity as a known risk entry.
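The Python sketch below shows what "Matched Entities" means for a pair of data sources. For each pair, it counts how many entities contain records from both sources. The entity-to-source mapping is hypothetical input, and REFERENCE is an illustrative third source name. The sketch treats pairs as unordered, whereas the report lists a From and a To data source. It also counts entities, not matched records.

```python
from collections import Counter
from itertools import combinations


def cross_source_entity_counts(entities: dict[int, list[str]]) -> Counter:
    """Count entities that contain records from each unordered pair of sources."""
    counts: Counter = Counter()
    for sources in entities.values():
        # Deduplicate and sort so each pair is counted at most once per entity.
        for pair in combinations(sorted(set(sources)), 2):
            counts[pair] += 1
    return counts


# Hypothetical ENTITY_ID -> DATA_SOURCE of each record in that entity.
entities = {
    1: ["CUSTOMERS", "CUSTOMERS", "WATCHLIST"],
    2: ["CUSTOMERS", "REFERENCE"],
    3: ["WATCHLIST"],
    4: ["CUSTOMERS", "REFERENCE", "WATCHLIST"],
}
for (source_a, source_b), n in sorted(cross_source_entity_counts(entities).items()):
    print(f"{source_a} <-> {source_b}: {n}")
# CUSTOMERS <-> REFERENCE: 2
# CUSTOMERS <-> WATCHLIST: 2
# REFERENCE <-> WATCHLIST: 1
```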
Selecting a data source pair drills into its match level breakdown:

The CUSTOMERS-to-WATCHLIST pair shows the count of matches at each confidence level. High-confidence Matches indicate records that Senzing is confident belong to the same real-world entity across these two data sources.
Selecting a match level drills into the match keys:

The match keys show which attributes drove each cross-source match and what types of identifying information connect records across data sources.
Selecting a match key drills into individual entities:

This entity resolved records from both CUSTOMERS and WATCHLIST, meaning the same real-world person appeared in both data sources.
entity_source_summary
The entity_source_summary command groups entities by which combination of data sources contributed records.

| Column | Description |
|---|---|
| Data Sources | The combination of data sources that contributed records to entities in this group |
| Entities | Number of entities composed of records from exactly this combination of sources |
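
The Python sketch below shows the grouping this report performs. Each entity is keyed by the exact, deduplicated set of data sources behind its records. The input mapping is hypothetical, and joining source names with " + " is only a display choice made here.

```python
from collections import Counter


def entity_source_summary(entities: dict[int, list[str]]) -> Counter:
    """Count entities by the exact combination of contributing data sources."""
    return Counter(" + ".join(sorted(set(sources))) for sources in entities.values())


# Hypothetical ENTITY_ID -> DATA_SOURCE of each record in that entity.
entities = {
    1: ["CUSTOMERS", "CUSTOMERS", "WATCHLIST"],
    2: ["CUSTOMERS"],
    3: ["WATCHLIST"],
    4: ["WATCHLIST", "CUSTOMERS"],
    5: ["CUSTOMERS", "CUSTOMERS"],
}
for combo, n in sorted(entity_source_summary(entities).items()):
    print(f"{combo}: {n}")
# CUSTOMERS: 2
# CUSTOMERS + WATCHLIST: 2
# WATCHLIST: 1
```

Entity 5 counts under CUSTOMERS alone. It has two records, but both come from the same source.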
Selecting a row drills into the entities for that source combination, showing a paginated entity list with full entity detail:

entity_size_breakdown
The entity_size_breakdown command shows the distribution of how many records make up each entity.

| Column | Description |
|---|---|
| Size Group | Number of records per entity in this group |
| Entity Count | Number of entities with this many records |
| Review Count | Number of entities flagged for review due to feature anomalies |
| Review Features | Feature types that triggered the review flag (e.g., GENDER, DOB, ADDRESS) |
The Entity Count column shows how many entities exist at each size. Entities in Size Group 1 are “singletons,” records that did not match any other record in the system. Large entities (those with many records) should be reviewed. They may represent:
- Legitimate matches: A long-time customer with records across multiple systems and name/address changes over the years.
- Over-resolution: Records that Senzing matched but that belong to different people. Use the how command in sz_explorer to review how records joined an entity step by step.
- Data quality issues: Duplicate submissions, test records, or data entry errors inflating entity size.
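The Python sketch below shows how Size Group and Entity Count relate. It tallies entities by record count, reports the singletons, and lists entities at or above a size cutoff. The entity sizes are hypothetical. `LARGE_ENTITY_THRESHOLD` is an illustrative value, not a Senzing setting.

```python
from collections import Counter

LARGE_ENTITY_THRESHOLD = 4  # hypothetical cutoff; pick one that suits your data


def size_breakdown(entity_sizes: dict[int, int]) -> dict[int, int]:
    """Map Size Group (records per entity) to Entity Count, smallest size first."""
    return dict(sorted(Counter(entity_sizes.values()).items()))


# Hypothetical ENTITY_ID -> number of records resolved to that entity.
entity_sizes = {1: 1, 2: 1, 3: 2, 4: 4, 5: 1, 6: 2}

breakdown = size_breakdown(entity_sizes)
print(breakdown)                           # {1: 3, 2: 2, 4: 1}
print("singletons:", breakdown.get(1, 0))  # singletons: 3
large = [eid for eid, size in entity_sizes.items() if size >= LARGE_ENTITY_THRESHOLD]
print("review candidates:", large)         # review candidates: [4]
```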
The Review Count and Review Features columns flag entities that contain more of a specific attribute than expected. For example, 5 entities in Size Group 2 have conflicting GENDER values, 1 entity in Size Group 4 has conflicting DOB values, and 1 entity in Size Group 5 has conflicting ADDRESS values.
Entities flagged in the Review Features column typically indicate one of three situations:
- Data quality problems: Typos, misspellings, or bad data causing attribute conflicts within a legitimately resolved entity.
- Intentional obfuscation: Someone altered identifying information to avoid detection, resulting in conflicting attributes.
- Overmatching: Records that belong to separate people were incorrectly resolved into the same entity.
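The Python sketch below shows the kind of check behind the Review Features column. It flags an entity when the entity holds more distinct values of a feature type than expected. The per-feature limits and the sample entity are hypothetical. This is not how Senzing computes its review flags.

```python
# Hypothetical limits on distinct values per feature type within one entity.
EXPECTED_MAX_VALUES = {"GENDER": 1, "DOB": 1, "ADDRESS": 3}


def review_features(features: dict[str, set[str]]) -> list[str]:
    """Return feature types whose distinct-value count exceeds its limit."""
    return sorted(
        feature_type
        for feature_type, values in features.items()
        if feature_type in EXPECTED_MAX_VALUES
        and len(values) > EXPECTED_MAX_VALUES[feature_type]
    )


# Hypothetical two-record entity whose records disagree on GENDER.
entity_features = {
    "NAME": {"ROBERT SMITH", "BOB SMITH"},
    "GENDER": {"M", "F"},
    "DOB": {"1978-12-11"},
}
print(review_features(entity_features))  # ['GENDER']
```

A flag only means the entity needs review. Deciding whether the cause is a data quality problem, intentional obfuscation, or overmatching still requires looking at the records.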
Selecting a row drills into the entities of that size, showing a paginated entity list with full entity detail:

From the entity detail view, press H to run how and see the step-by-step resolution path:

The how decision tree shows how each record entered the entity, including the MATCH_KEY and principle that fired at each step. This is useful for large or flagged entities to see which attributes drove the resolution. For a full walkthrough of the how command and its different views, see Using how.
principles_used
The principles_used report shows how many entity relationships were established at each match level. Senzing performs Principle Based Entity Resolution, where each match level represents a different level of confidence.

| Column | Description |
|---|---|
| Match Level | The confidence level at which entities were related |
| Count | Number of entity relationships at this match level |
Selecting a match level drills into the specific principles, then into match keys, and finally into individual entity detail.
The match levels used in these reports mean the following:

| Match Level | What It Means |
|---|---|
| Matches | Records resolved to the same entity with high confidence. |
| Possible matches | Records that share enough in common to warrant review but did not resolve. |
| Possibly related | Records that appear to be related (such as family members) but are distinct entities. |
| Ambiguous matches | Records that could plausibly belong to the same entity but the evidence is not definitive. |
| Disclosed relations | Known relationships declared in the input data (for example, a company and its subsidiaries). |
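
The Python sketch below shows how the principles_used counts roll up. It tallies a list of observed match levels and prints one line per level in the order of the table above. The list of observations is hypothetical. The report already provides these counts, so this only shows the aggregation.

```python
from collections import Counter

# Match levels in the order used by the table above.
MATCH_LEVELS = [
    "Matches",
    "Possible matches",
    "Possibly related",
    "Ambiguous matches",
    "Disclosed relations",
]

# Hypothetical list with one entry per relationship found.
levels_found = [
    "Matches", "Matches", "Matches",
    "Possible matches",
    "Possibly related", "Possibly related",
    "Disclosed relations",
]

counts = Counter(levels_found)
for level in MATCH_LEVELS:
    # Levels with no relationships still get a row, with a count of 0.
    print(f"{level:<20} {counts.get(level, 0)}")
```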
Using snapshot analysis effectively
Snapshot reports are most valuable when compared over time or against known baselines:
- Initial assessment: Run a snapshot after loading data to establish a baseline.
- After changes: After updating data mapping or adjusting entity resolution configuration with Senzing Support, take a new snapshot to measure the impact.
- Ongoing monitoring: Periodic snapshots track data quality trends as new records are added and existing records are updated or deleted.
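To compare a new snapshot with a baseline, compute the difference for each data source and each metric. The Python sketch below does this. This page does not describe the snapshot JSON layout, so the sketch assumes you have already copied the figures into nested dictionaries, either by hand or with your own code. The dictionary shape, metric names, and numbers are hypothetical.

```python
def compare_snapshots(
    baseline: dict[str, dict[str, int]],
    current: dict[str, dict[str, int]],
) -> dict[str, dict[str, int]]:
    """Per data source, return current minus baseline for every metric seen."""
    deltas: dict[str, dict[str, int]] = {}
    for source in sorted(baseline.keys() | current.keys()):
        before = baseline.get(source, {})
        after = current.get(source, {})
        # A source or metric missing from one snapshot is treated as 0 there.
        deltas[source] = {
            metric: after.get(metric, 0) - before.get(metric, 0)
            for metric in sorted(before.keys() | after.keys())
        }
    return deltas


# Hypothetical data_source_summary figures from two snapshots.
baseline = {"CUSTOMERS": {"records": 1000, "entities": 700}}
current = {
    "CUSTOMERS": {"records": 1000, "entities": 650},
    "WATCHLIST": {"records": 50, "entities": 48},
}
print(compare_snapshots(baseline, current))
# {'CUSTOMERS': {'entities': -50, 'records': 0}, 'WATCHLIST': {'entities': 48, 'records': 50}}
```

In this example, the CUSTOMERS entity count drops while its record count stays the same. That pattern means more CUSTOMERS records resolved together after the change. Whether those merges are correct still needs review in sz_explorer.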
Next steps
If you have any questions, contact Senzing Support. Support is 100% FREE!