Cluster Analysis

Identifying similarity structures

In cluster analysis, the aim is to find similarity structures in data spaces. These data spaces can, for example, be spanned by physical measurement variables. The objective of cluster analysis is to group the existing measurement points so that points within a group are as similar as possible and points in different groups are as different as possible.

Detecting anomalies

Clusters determined in this manner can be interpreted using domain knowledge and used to classify new, previously unseen measurement points and/or to detect anomalies. The distribution of anomalies in the data space can provide information about the causes of errors.

Clustering with the k-means algorithm

One of the best-known methods of cluster analysis is the k-means algorithm. It starts with a predetermined number k of randomly placed cluster centers. Each data point is assigned to its nearest center, which minimizes the sum of the (squared) distances between the data points and their assigned centers for the current centers. The centers are then recomputed as the mean of the points in each cluster, and the data points are reassigned. These two steps are repeated until the assignments no longer change. The result is a locally optimal cluster configuration, which can depend on the initial choice of centers.
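
The following is a minimal sketch of this assign-and-update loop (often called Lloyd's algorithm), written in plain Python for illustration rather than taken from any particular product. It assumes the measurement points are given as equal-length tuples of numbers; the function name k_means and its parameters max_iter and seed are hypothetical choices made for this example.

```python
import random
from math import dist


def k_means(points, k, max_iter=100, seed=0):
    """Illustrative k-means on a list of equal-length numeric tuples.

    Returns (centers, labels). The result is a local optimum of the
    within-cluster sum of squared distances and depends on the start.
    """
    rng = random.Random(seed)
    # Initialization: pick k distinct data points at random as start centers.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


if __name__ == "__main__":
    data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
            (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
    centers, labels = k_means(data, k=2)
    print(centers, labels)
```

Because the outcome depends on the random start, practical implementations typically run k-means several times with different seeds and keep the configuration with the smallest total squared distance.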

Identifying fault states

One area in which cluster analysis can be useful is identifying the operating states of multi-state systems. In an intact system, these states correspond to the similarity structures found in the measurement data. The measurement data can be, for example, sensor data such as torque or currents that describe the system state. The illustration of a cluster analysis shows an example of three identified operating states of a motor in the voltage-torque space (see colored ellipses). Measurement data that cannot be clearly assigned to any of these states are classified as outliers (orange).
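
One simple way to decide whether a measurement "can be clearly assigned" is sketched below: assign it to the nearest cluster center, and flag it as an outlier if even that center is farther away than a threshold. This is an assumed rule chosen for illustration, not the method behind the illustration; the names assign_or_flag, OUTLIER and max_distance, as well as the example center values, are hypothetical.

```python
from math import dist

OUTLIER = -1  # hypothetical label for points not clearly assigned to any state


def assign_or_flag(point, centers, max_distance):
    """Return the index of the nearest center, or OUTLIER if even the
    nearest center is farther away than max_distance (an assumed,
    application-specific threshold)."""
    nearest = min(range(len(centers)), key=lambda j: dist(point, centers[j]))
    if dist(point, centers[nearest]) > max_distance:
        return OUTLIER
    return nearest


if __name__ == "__main__":
    # Made-up centers for three operating states in a 2-D feature space.
    centers = [(12.0, 0.5), (24.0, 1.2), (36.0, 2.0)]
    print(assign_or_flag((24.3, 1.1), centers, max_distance=1.0))
    print(assign_or_flag((30.0, 3.5), centers, max_distance=1.0))
```

A single Euclidean threshold treats all directions and clusters alike. Since the illustration shows elliptical clusters, and voltage and torque have different units, a per-cluster threshold, scaled features, or a distance that accounts for each cluster's spread would usually fit such data better.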

The condition of the motor can then be evaluated based on the number and distribution of the outliers. Outliers can also provide information about the type of fault or wear process involved. This can speed up fault identification and, consequently, reduce machine downtime.
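
As an illustration of evaluating the "number and distribution" of outliers, the sketch below computes the share of outliers in a batch of measurements and counts near which operating state they occur. It assumes the same nearest-center distance rule with a hypothetical threshold max_distance; the function name outlier_summary is also hypothetical, and how these figures translate into a quality verdict is application-specific and not defined here.

```python
from collections import Counter
from math import dist


def outlier_summary(points, centers, max_distance):
    """Return (outlier_rate, Counter mapping nearest-center index -> outliers).

    A point counts as an outlier if it lies farther than max_distance
    (an assumed threshold) from its nearest center."""
    near_counts = Counter()
    for p in points:
        # Distance to, and index of, the nearest center.
        d, j = min((dist(p, c), j) for j, c in enumerate(centers))
        if d > max_distance:
            near_counts[j] += 1
    rate = sum(near_counts.values()) / len(points) if points else 0.0
    return rate, near_counts
```

Comparing these values across time windows is one possible use: a rising outlier rate, or outliers accumulating near one particular operating state, could be a reason to inspect the motor more closely.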

Your benefit from the cluster analysis

Recognize errors in good time


Reduce maintenance costs


Any questions? We are happy to help.

Katana is your big data analytics expert for Industry 4.0 applications. With the help of the very latest cloud technology and the expertise of our own data scientists and data engineers, we facilitate fast, highly flexible analysis of your industrial data and the operationalization of profit-generating smart services in your company’s system landscape. Contact us and team up with a strong partner to transform your digitalization plans into a reality.