2.2. Duplicate analysis

Duplicate analysis is a key tool for data quality control, especially for large volumes of data and imported catalogs. It identifies duplicate candidates but does not eliminate them automatically; instead, its results form the basis for downstream cleansing processes.

In practice, the process consists of the following steps:

Automatic generation of clusters, where each cluster contains parts that are similar to one another.
Downstream manual annotation to determine which parts in a cluster are main parts and which are duplicates.
Export of the results to a CSV file.
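
The three steps above can be illustrated with a minimal Python sketch. It is not the product's implementation: the similarity measure (a text comparison of part names with `difflib`), the threshold of 0.8, the function names `build_clusters` and `export_csv`, and the CSV columns are all assumptions chosen for illustration only. The actual analysis may compare parts on entirely different criteria.

```python
import csv
import io
from difflib import SequenceMatcher


def is_similar(a, b, threshold=0.8):
    """Hypothetical similarity test: compares part names as text, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def build_clusters(parts, threshold=0.8):
    """Step 1: group parts into clusters of similar parts.

    Each part is compared with the first member of every existing cluster.
    Only clusters with at least two members are returned, because a part
    without a similar counterpart is not a duplicate candidate.
    """
    clusters = []
    for part in parts:
        for cluster in clusters:
            if is_similar(part, cluster[0], threshold):
                cluster.append(part)
                break
        else:
            clusters.append([part])
    return [c for c in clusters if len(c) > 1]


def export_csv(clusters, main_parts, out):
    """Step 3: write one row per part with its cluster and annotated role.

    `main_parts` stands in for the result of the manual annotation (step 2):
    the set of parts a user has marked as main parts.
    """
    writer = csv.writer(out)
    writer.writerow(["cluster_id", "part", "role"])  # assumed column layout
    for cluster_id, cluster in enumerate(clusters, start=1):
        for part in cluster:
            role = "main" if part in main_parts else "duplicate"
            writer.writerow([cluster_id, part, role])


# Example data (invented for illustration).
parts = ["Hex bolt M8x40", "Hex Bolt M8 x 40", "Washer 8.4", "HEX BOLT M8X40"]
clusters = build_clusters(parts)

# Step 2 is manual in the real process; here the decision is hard-coded.
main_parts = {"Hex bolt M8x40"}

buffer = io.StringIO()
export_csv(clusters, main_parts, buffer)
```

Note that the sketch never deletes anything: as described above, the analysis only marks candidates, and any cleansing happens in a later step based on the exported file.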
