Entity Exploration

Search, retrieve, and investigate entities interactively with sz_explorer.

This page assumes Senzing is already installed and the truth set demo data is loaded. If not, start with the Loading the Truth Set page.

The examples and screenshots on this page are based on the truth set demo data. ENTITY_ID values in your database will most likely differ from those shown here, as they depend on load order. Use the ENTITY_ID values returned by your commands in subsequent steps. If you are using the truth set, DATA_SOURCE and RECORD_ID values will be the same.

The following examples demonstrate basic exploration of the truth set using sz_explorer, located in the Senzing project’s /bin directory.

Run sz_explorer from the command line.

Run help to view the available commands.

Screenshot

This page focuses on the ad hoc commands listed above. These commands can be used at any time, with or without a snapshot. Common use cases include:

Investigating how a match was made or why one was not, in support of end-user inquiries.
Exporting data and capturing screenshots when reporting issues to Senzing Support.

The snapshot and audit commands shown in the help output are covered in Snapshot Analysis and Auditing.

Senzing is an SDK, and sz_explorer is a tool built with it. You can incorporate the same search and retrieval calls into decision-making workflows, or into a UI that displays entities to end users and lets them ask why and how. Type show_last_call after any command to see which Senzing call was made, which flags were used, and what Senzing returned.

Using search

Run help search to view the usage details.

All commands include usage help with examples.

Screenshot

Run search robert smith to find matching entities:

Screenshot

The search returns 3 entities that satisfy the search criteria:

"ENTITY_ID": 3 is a Robert Smith with 4 "DATA_SOURCE": "CUSTOMERS" records.
"ENTITY_ID": 200002 is a Robert Smith from "DATA_SOURCE": "WATCHLIST".
"ENTITY_ID": 16 is a Robert E Smith Sr whose records come from "DATA_SOURCE": "CUSTOMERS" and "DATA_SOURCE": "WATCHLIST".

In addition:

The MATCH_KEY column shows which attributes matched (+NAME) and which principle was satisfied. Senzing performs Principle Based Entity Resolution. Principles are covered in detail later; the most important part at this stage is the MATCH_KEY.
The match score comes from a simple scoring algorithm that ranks the strongest matches first. It sums the scores of each searched attribute, giving more weight to the name. Since this search used only a name, the match score equals the name score.
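The ranking idea can be illustrated with a weighted sum. This is a minimal sketch, not sz_explorer's actual code: the function name `match_score` and the 0.5 weight applied to non-name attributes are hypothetical assumptions, chosen so that a name-only search scores exactly the name score.

```python
from __future__ import annotations


def match_score(feature_scores: dict[str, int], other_weight: float = 0.5) -> float:
    """Hypothetical ranking score: NAME counts in full, and every other
    searched attribute counts at a reduced weight (0.5 is an assumption)."""
    total = 0.0
    for feature, score in feature_scores.items():
        total += score if feature == "NAME" else score * other_weight
    return total


# Hypothetical per-attribute scores for three search results.
results = {
    "entity_a": {"NAME": 90, "DOB": 100},
    "entity_b": {"NAME": 100},
    "entity_c": {"NAME": 70, "DOB": 100},
}

# Strongest matches first.
ranked = sorted(results, key=lambda k: match_score(results[k]), reverse=True)
print(ranked)  # ['entity_a', 'entity_c', 'entity_b']
```

With a name-only search, such as `match_score({"NAME": 97})`, the result is simply the name score.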

Using compare

Run compare search to see the entities returned from the prior search side-by-side.

Screenshot

The side-by-side view shows:

The two customers share the same ADDRESS, but their DOB values are about 24 years apart. The difference is consistent with a father and son relationship.
The "DATA_SOURCE": "WATCHLIST" entity in the last column does not appear to be related to either customer.
A status line like lines 1-46/46 (END) in the lower left corner indicates a scrolling window. Use the arrow keys to navigate and press Q to quit.

When the compare table is wider than the terminal window, it opens in a scrollable pager. Use the left and right arrow keys to scroll horizontally and see all columns:

Screenshot: compare with horizontal scrolling

To retrieve a specific entity, use the get command.

Using get

Run help get to view the usage details.

Screenshot

Entity detail

Run get 3 detail to retrieve the detail view of "ENTITY_ID": 3.

Screenshot

The output starts with a grid of the records that belong to the entity:

The first column shows which data source and record IDs were resolved to this entity as well as the MATCH_KEY and rule that fired when the record was loaded.
The second column shows the features on each record that were used for resolution, such as NAME, DOB, ADDRESS, EMAIL, PHONE, and other identifiers.
The third column shows all the other data for each record.

Below the records is a tree view of the entities related to this one, grouped by match level. For more on match levels, see Understanding match levels.

The ADDITIONAL_DATA column shows:

Robert has one active and three inactive records.
The earliest record date is 1/5/15, and the latest is 1/2/18.
The pattern of repeated inactivation and re-registration with different identifying information each year warrants investigation.
Both his father and spouse appear in "DATA_SOURCE": "WATCHLIST".

Resolution decisions

"DATA_SOURCE":"CUSTOMERS", "RECORD_ID":"1001" and "DATA_SOURCE":"CUSTOMERS", "RECORD_ID":"1005" have very little in common. The how command shows the resolution path for this entity.

Run how 3 to view the decision tree that determined all these records belong to the same entity.

Screenshot

The how decision tree view:

Read the decision tree from the bottom up.
Most features have one score, but names can have up to 3: first, last, and combined, in that order.
In step 1, "DATA_SOURCE":"CUSTOMERS", "RECORD_ID":"1002" and "DATA_SOURCE":"CUSTOMERS", "RECORD_ID":"1004" came together first and created virtual entity V2-S1, which was used in step 2 to match "DATA_SOURCE":"CUSTOMERS", "RECORD_ID":"1001", and so on. In each step, the entity may learn new features for use in the next step. In this case, the entity gained the EMAIL used in step 2 and the PHONE used in step 3.
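The way a virtual entity learns features step by step can be modeled with plain sets. This is a hypothetical data model for illustration only: the feature values are placeholders rather than the truth set records, the `merge` helper is invented for this sketch, and scoring (whether a shared feature is actually enough to match) is not modeled.

```python
from __future__ import annotations


def merge(entity: set[str], record: set[str]) -> tuple[set[str], set[str]]:
    """Fold a record into a virtual entity. Returns the merged feature set and
    the features that were new to the entity (what it 'learned')."""
    return entity | record, record - entity


# Placeholder features, not the actual truth set records.
rec_1 = {"NAME:n1", "DOB:d1"}
rec_2 = {"NAME:n1", "DOB:d1", "EMAIL:e1"}
rec_3 = {"NAME:n2", "EMAIL:e1", "PHONE:p1"}
rec_4 = {"NAME:n3", "PHONE:p1"}

print(rec_1 & rec_3)  # set(): nothing in common on their own

# Step 1: two records create a virtual entity, which learns EMAIL:e1.
step1, learned1 = merge(rec_1, rec_2)
# Step 2: the learned EMAIL is shared with rec_3, which adds PHONE:p1.
step2, learned2 = merge(step1, rec_3)
# Step 3: the learned PHONE is shared with rec_4.
step3, learned3 = merge(step2, rec_4)

print(sorted(step1 & rec_3))  # ['EMAIL:e1']
print(sorted(step2 & rec_4))  # ['PHONE:p1']
```

This mirrors why records with little in common can still end up in one entity: each step can expose a feature that the next record shares.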

The how report is covered in detail in the how section below. For now, press enter to continue.

Using why

The why command explains why two entities did not resolve to the same entity. There are two possible reasons:

The entities scored below the resolution threshold, though they may still be related.
The entities could not find each other because they had no matching candidate keys, or because all of their matching candidate keys had gone generic.

A why result helps explain why two entities are related but not resolved to the same entity. For additional assistance, contact Senzing Support.
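The two reasons can be told apart with a simple decision. This is a hypothetical simplification: the function name, the key sets, and the single numeric threshold are assumptions for illustration, whereas Senzing's real decision is driven by its configured principles rather than one number.

```python
from __future__ import annotations


def why_not_resolved(shared_keys: set[str], generic_keys: set[str],
                     score: int, threshold: int) -> str:
    """Hypothetical classifier for the two reasons entities stay separate."""
    usable = shared_keys - generic_keys
    if not usable:
        # No shared key, or every shared key went generic: never compared.
        return "not candidates"
    if score < threshold:
        # They were compared, but the comparison scored too low.
        return "below threshold"
    return "would resolve"


print(why_not_resolved(set(), set(), 0, 90))           # not candidates
print(why_not_resolved({"k1"}, {"k1"}, 0, 90))         # not candidates
print(why_not_resolved({"k1", "k2"}, {"k1"}, 70, 90))  # below threshold
```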

Run help why to view the usage details.

Screenshot

The previous search for Robert Smith returned three different entities. The why command explains why the first and third did not resolve to the same entity.

Feature scoring

Run why 3 16 to compare them.

Screenshot

The upper portion of the table shows "ENTITY_ID": 3 on the left and "ENTITY_ID": 16 on the right.

The data sources row shows that "ENTITY_ID": 3 has 4 "DATA_SOURCE": "CUSTOMERS" records and "ENTITY_ID": 16 has 1 "DATA_SOURCE": "CUSTOMERS" and 1 "DATA_SOURCE": "WATCHLIST" record.
The why result shows the current MATCH_KEY and rule between the two entities.
The MATCH_KEY shows the list of features that contributed to the match, both positively and negatively. The principle is also displayed and is the actual reason for the match. For questions about a specific principle, contact Senzing Support. For more detail, see Principle Based Entity Resolution.
The cross relation is what is stored in the database, and it should always equal the why result. In rare cases the two can differ; reevaluating the entities corrects this. If this occurs, contact Senzing Support.
Below the header are the features for each entity, with the best scoring pair on top.
On the NAME row:
- Robert Smith ("ENTITY_ID": 3) was compared with Robbie Smith ("ENTITY_ID": 16), with a full name score of 97. The surname scored 100 (exact match), and the given name scored 95 (recognized nickname).
- The [2] in brackets after Robert Smith on the left indicates 2 entities share this exact name.
- Bob J Smith on the left is another name for "ENTITY_ID": 3. The Bob Smith and B Smith names are grayed out and have a # sign in the bracket indicator because they are suppressed in favor of a more complete name.
The DOB row is colored red because it scored 58 and detracted from the match. It is also red in the why_result above.
On the ADDRESS row, the best matching address scored 99 and contributed to the match.
On the remaining rows, only the entity on the left had a PHONE and only the entity on the right had a DRLIC, so there was nothing to match.
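Name suppression, as with Bob Smith and B Smith above, can be approximated with a token-coverage rule. This is a hypothetical heuristic for illustration, not Senzing's algorithm: a name counts as a derivative when it has fewer tokens than another name and each of its tokens equals, or is the initial of, a distinct token in that fuller name.

```python
from __future__ import annotations


def is_derivative(name: str, fuller: str) -> bool:
    """Hypothetical rule: `name` is a derivative of `fuller` if it has fewer
    tokens and each token equals, or is the initial of, a distinct token."""
    short, remaining = name.lower().split(), fuller.lower().split()
    if len(short) >= len(remaining):
        return False
    for token in short:
        # Greedy: take the first unused token of the fuller name that fits.
        match = next(
            (t for t in remaining
             if t == token or (len(token) == 1 and t.startswith(token))),
            None,
        )
        if match is None:
            return False
        remaining.remove(match)
    return True


def suppressed(names: list[str]) -> list[str]:
    """Names that are derivatives of some other, more complete name."""
    return [n for n in names if any(is_derivative(n, o) for o in names if o != n)]


print(suppressed(["Bob J Smith", "Bob Smith", "B Smith"]))  # ['Bob Smith', 'B Smith']
```

The greedy token matching keeps the sketch short; a real implementation would also need to handle nicknames, name order, and punctuation.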

Candidate keys

Before entities can be scored, they must find each other through candidate keys. The lower portion of the why output shows the keys that placed them on a short list of candidates for comparison.

Screenshot

The lower portion of the why screen shows the candidate keys that were created for each entity:

Highlighted in blue are the keys that matched.
To keep the system fast, keys can “go generic”, which means they are no longer used to generate candidates.
The "NAME_KEY":"RPRT|SM0" is a metaphone for Robert Smith (and also for Robbie Smith), and [3] different entities share this key.
If there is an exclamation point in front of the number like [!120], that key is no longer being used to find candidates.

A set of configurable thresholds determines when keys go generic. Rather than continually raising these thresholds, which would slow down the system, Senzing creates many keys. Since the NAME_KEY for Robert Smith might go generic, Senzing also creates composite keys such as NAMEADDR_KEY and NAMEDATE_KEY. It is far less likely that all of these would go generic.
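Candidate generation with generic keys can be sketched as an inverted index from key to entity IDs. Everything here is a simplified assumption: the key strings are placeholders (real NAME_KEY values are metaphone-based, such as RPRT|SM0), and `GENERIC_THRESHOLD` and the helper functions are hypothetical rather than Senzing configuration settings or APIs.

```python
from __future__ import annotations

from collections import defaultdict

GENERIC_THRESHOLD = 3  # hypothetical: more entities than this and a key goes generic

# Inverted index: candidate key -> IDs of entities that have it.
index = defaultdict(set)


def make_keys(name_key: str, address: str, dob: str) -> set[str]:
    """A plain name key plus composite keys that are much less likely to go generic."""
    return {
        f"NAME_KEY:{name_key}",
        f"NAMEADDR_KEY:{name_key}|{address}",
        f"NAMEDATE_KEY:{name_key}|{dob}",
    }


def add_entity(entity_id: int, keys: set[str]) -> None:
    for key in keys:
        index[key].add(entity_id)


def candidates(keys: set[str]) -> set[int]:
    """Entities reachable through keys that have not gone generic."""
    found: set[int] = set()
    for key in keys:
        entities = index.get(key, set())
        if len(entities) <= GENERIC_THRESHOLD:
            found |= entities
    return found


# Five hypothetical entities share one name key but differ in address and DOB.
for eid in range(1, 6):
    add_entity(eid, make_keys("NK1", f"ADDR{eid}", f"DOB{eid}"))

# NAME_KEY:NK1 is now generic (5 entities), but the composite key still finds entity 2.
print(candidates(make_keys("NK1", "ADDR2", "DOB9")))  # {2}
```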

To learn more about how Senzing entity resolution works, see Entity Resolution Processes.

Using how

This section covers the how command and its views in detail.

Run help how to view the usage details.

Screenshot

The following example uses an entity with a more complex resolution path.

Run search maria sentosa

Screenshot

Decision tree view

Run how 24 to see how those 5 records resolved to the same entity.

Screenshot

The how decision tree is the default view:

Read the decision tree from the bottom up. The two interim entities created along the way are combined in the last step to form the final entity.
Each step shows the scores of all compared features, along with the MATCH_KEY and the principle that was satisfied.
Each step has one of three types:
- “creating a virtual entity” by combining 2 records,
- “adding a record to a virtual entity”, or
- “combining virtual entities”.

In straightforward cases, two records create a virtual entity in step 1 and additional records are added to it. In more complex cases, two or more interim entities are created before they accumulate enough attributes to be joined. The Maria entity above follows the more complex path.
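The three step types follow from what each step combines. A minimal sketch, assuming (hypothetically) that each side of a step is summarized only by its record count, where one record is a plain record and more than one is a virtual entity; the function name and the example path are invented for illustration.

```python
def step_type(left_records: int, right_records: int) -> str:
    """Classify a how step by what it combines: one record on a side is a
    plain record; more than one means that side is a virtual entity."""
    left_virtual, right_virtual = left_records > 1, right_records > 1
    if not left_virtual and not right_virtual:
        return "creating a virtual entity"
    if left_virtual and right_virtual:
        return "combining virtual entities"
    return "adding a record to a virtual entity"


# Hypothetical five-record path with two interim entities.
path = [(1, 1), (1, 1), (2, 1), (3, 2)]
for number, (left, right) in enumerate(path, start=1):
    print(number, step_type(left, right))
```

Steps 1 and 2 each create an interim entity, step 3 grows the first one to three records, and step 4 combines the two interim entities into the final five-record entity.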

The how output is a series of why comparisons. Instead of showing why two entities did not match, each step shows how each record entered the entity.

Columnar view

Press C at the prompt to see the columnar view.

Screenshot

Read this view from left to right.
The first two columns show step 1.
The NAME is highlighted in yellow because it did not score high enough for a close name match. However, the given name scored 100, producing a partial name match. This is why the MATCH_KEY starts with PNAME. Principle 110 allows a partial name match when several other important features match, including DOB, ADDRESS, and EMAIL.
Step 1 reveals a more complete NAME and a new ADDRESS, which were used to match records in the remaining two steps.

A columnar view can be very wide. If it scrolls off the screen, use the arrow keys to scroll left, right, up, and down, and press Q to quit.

The columnar view shows what is learned at each step. It only shows how each record enters the entity. Steps that combine virtual entities are not included.

Summary view

Press S at the prompt to display the summary view.

The summary view provides a comprehensive overview of the entity and its resolution.

Screenshot

The resolution summary at the top summarizes the decision tree.

It lists the number and type of steps required.
It highlights steps of interest, including any low-scoring names and steps that combine virtual entities. For large entities with many steps, this section identifies the most significant ones.
It concludes with the principles and match keys that fired.
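A resolution summary like this is essentially a tally over the steps. The sketch below is hypothetical: the (step number, step type, name score) tuple shape and the `LOW_NAME_SCORE` cutoff of 80 are assumptions for illustration, not values or structures taken from sz_explorer.

```python
from collections import Counter

LOW_NAME_SCORE = 80  # hypothetical cutoff for a "low-scoring name"


def summarize(steps):
    """steps: list of (step number, step type, name score) tuples (hypothetical shape).
    Returns the count of each step type and the step numbers worth a closer look."""
    counts = Counter(step_type for _, step_type, _ in steps)
    of_interest = [
        number
        for number, step_type, name_score in steps
        if name_score < LOW_NAME_SCORE or step_type == "combining virtual entities"
    ]
    return counts, of_interest


steps = [
    (1, "creating a virtual entity", 72),
    (2, "creating a virtual entity", 100),
    (3, "adding a record to a virtual entity", 95),
    (4, "combining virtual entities", 90),
]
counts, of_interest = summarize(steps)
print(dict(counts))
print(of_interest)  # [1, 4]
```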

The entity summary below shows the record count and feature breakdown.

Of the 4 NAME values, 3 are grayed out with a [#1]. The # indicates a suppressed name. Senzing computes the most complete name and identifies which others are derivatives of it.
- For matching, if Barry Smith and Betty Smith both have an AKA of B Smith, their more complete names are compared, even though B Smith matches exactly.
- This information is also used in best-name calculation.
After each feature value, the number in brackets [] indicates how many entities use that exact value. The blue number in parentheses () indicates how many records in this entity reported that value.
Looking at the ADDRESS values, the blue (3) shows that 9304 W 15th is the most common, which is useful for a best-address calculation. The bracketed [2] indicates that another entity shares the 638 Downey St address.
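Both counters can be derived with simple tallies. A minimal sketch with hypothetical data: `record_addresses` lists the ADDRESS reported by each record in one entity, and `entities_using` maps each address to every entity that uses it, including this one. Picking the value reported by the most records is only one plausible best-address rule, not necessarily Senzing's.

```python
from collections import Counter

# Hypothetical: the ADDRESS reported by each record in one entity ("E1").
record_addresses = ["ADDR_A", "ADDR_A", "ADDR_B", "ADDR_A", "ADDR_C"]
# Hypothetical: every entity (including "E1") that uses each address.
entities_using = {"ADDR_A": {"E1"}, "ADDR_B": {"E1", "E2"}, "ADDR_C": {"E1"}}

record_counts = Counter(record_addresses)


def label(value: str) -> str:
    """Render a value as: value [entities using it] (records reporting it)."""
    return f"{value} [{len(entities_using.get(value, ()))}] ({record_counts[value]})"


for value, _ in record_counts.most_common():
    print(label(value))

# One plausible best-address rule: the value reported by the most records.
best_address = record_counts.most_common(1)[0][0]
print(best_address)  # ADDR_A
```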

Investigating shared features

Run search addr_full = 638 Downey St, Salem, OR to find entities at that address.

Then run compare search to see them side-by-side.

Screenshot

Maria is in "DATA_SOURCE": "WATCHLIST" and shares an ADDRESS with Susan. This overlap raises several questions:

Whether the address was used fraudulently.
Whether Susan is connected to the same activity.
Whether they simply occupied the address at different times.

Senzing tracks feature usage across all entities, supporting both entity resolution and the identification of potential threats and fraud patterns.
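Finding feature values shared by more than one entity can be sketched as an inverted index over those values. The data and function name below are hypothetical; this only illustrates the idea and is not Senzing's implementation.

```python
from collections import defaultdict


def shared_values(entity_features):
    """Map each feature value used by more than one entity to those entities."""
    users = defaultdict(set)
    for entity, values in entity_features.items():
        for value in values:
            users[value].add(entity)
    return {value: ents for value, ents in users.items() if len(ents) > 1}


# Hypothetical ADDRESS values per entity.
entity_addresses = {
    "E1": {"ADDR_A", "ADDR_B"},
    "E2": {"ADDR_B"},
    "E3": {"ADDR_C"},
}
shared = shared_values(entity_addresses)
print({value: sorted(ents) for value, ents in shared.items()})  # {'ADDR_B': ['E1', 'E2']}
```

Each shared value is a lead worth reviewing, much like the address shared by Maria and Susan.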

Using why with search

The why command can also be used with search results. Refer to the previous section or run help search to review the syntax.

No keys matched

Run search barry smith to check for matches.

Screenshot

No entities were found, meaning no keys matched. Run why search to view the keys it generated:

Screenshot

The [0] indicates that no entities exist with the name Barry Smith or with any of its metaphone NAME_KEY values.

A NAME alone may not be sufficient to find a record. Adding a date of birth to the search may improve results.

Found but scored too low

Run search bubby smith | date_of_birth: 12/11/1978

Screenshot

No results were returned, but the message changed to “entities were found but did not score high enough”.

Run why search 3 if the expected result was "ENTITY_ID": 3.
Run why search if the expected entity ID is unknown.

Screenshot

The result shows that Bubby vs Bob J does not score high enough. The low name score indicates the search name is too distant. Searching for Bobby instead may yield better results.

Refining the search

Run search bobby smith | date_of_birth: 12/11/1978

Screenshot

The search returns two entities. The top result has the same DOB.

Using tree

The tree command displays relationships at multiple degrees of separation.

Run help tree to view the usage details.

Screenshot

The following example demonstrates the tree view using an organization entity.

Run search universal exports

Screenshot

The search returns 4 entities. Retrieve the Worldwide entity to examine the hierarchy.

Run get 103 to view the entity.

Screenshot

This entity is the global parent of the other three, and its relationships include ownership information. The get command shows a one-degree tree view. To see two degrees:

Run tree 103 degree 2

Screenshot

The tree also shows the principals behind Universal Exports USA. The tree command uses a single call to the Senzing SDK.
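Degrees of separation amount to a breadth-first walk over relationships. This sketch runs over a hypothetical in-memory graph with invented entity labels; sz_explorer itself retrieves the network with the single SDK call mentioned above, so this only illustrates what "degree" means.

```python
from collections import deque


def within_degrees(graph, start, max_degree):
    """Breadth-first walk returning {entity: degree} for every entity reachable
    from start in at most max_degree relationship hops."""
    degree = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if degree[current] == max_degree:
            continue  # don't expand past the requested degree
        for neighbor in graph.get(current, ()):
            if neighbor not in degree:
                degree[neighbor] = degree[current] + 1
                queue.append(neighbor)
    return degree


# Hypothetical relationships: a parent, two subsidiaries, and one principal.
graph = {
    "parent": ["sub_usa", "sub_uk"],
    "sub_usa": ["parent", "principal_1"],
    "sub_uk": ["parent"],
    "principal_1": ["sub_usa"],
}
print(within_degrees(graph, "parent", 1))  # {'parent': 0, 'sub_usa': 1, 'sub_uk': 1}
print(within_degrees(graph, "parent", 2))  # also includes 'principal_1': 2
```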

Using show_last_call

Run show_last_call to see the SDK calls made by the last command.

Screenshot

The last command used the find_network_by_entity_id SDK call. For full API documentation, see https://www.senzing.com/docs/ .

Using export

The export command extracts the original JSON records that make up an entity. Exported records can be loaded into a test system for further debugging, or attached to a Senzing support ticket for investigation.

Run help export to view the usage details.

Screenshot

Run export 3, 16 to /tmp/export.jsonl.

Specify a directory with write permission.

Screenshot

The exported file contains the original JSON records for both entities, suitable for loading into another system for testing or debugging.
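Assuming the export uses JSON Lines (one JSON record per line), as the .jsonl extension suggests, the file can be written and read back with Python's standard json module. This is a minimal sketch, not the export command's code; the records contain only DATA_SOURCE and RECORD_ID as placeholders, and a temporary directory stands in for /tmp.

```python
import json
import os
import tempfile


def write_jsonl(path, records):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Minimal placeholder records; real exported records carry all their attributes.
records = [
    {"DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1001"},
    {"DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1005"},
]
path = os.path.join(tempfile.mkdtemp(), "export.jsonl")
write_jsonl(path, records)
print(read_jsonl(path) == records)  # True
```

Keeping one record per line means a file can be streamed or split record by record when loading it into a test system.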

The export command is also used for building truth sets. Because the best truth sets are based on real data, complex examples of entities that matched or did not match can be exported as truth set records. See How to create an entity resolution truth set.

Next steps

If you have any questions, contact Senzing Support. Support is 100% FREE!