Unsupervised Clustering for Fault Diagnosis in Nuclear Power Plant Components

Abstract The development of empirical classification models for fault diagnosis usually requires a process of training based on a set of examples. In practice, data collected during plant operation contain signals measured in faulty conditions, but they are ‘unlabeled’, i.e., the indication of the type of fault is usually not available. Then, the objective of the present work is to develop a methodology for the identification of transients of similar characteristics, under the conjecture that faults of the same type lead to similar behavior in the measured signals. The proposed methodology is based on the combined use of Haar wavelet transform, fuzzy similarity, spectral clustering and the Fuzzy C-Means algorithm. A procedure for interpreting the fault cause originating the similar transients is proposed, based on the identification of prototypical behaviors. Its performance is tested with respect to an artificial case study and then applied on transients originated by different faults in the pressurizer ...


Introduction
Fault diagnosis can be seen as a classification problem in which a class identifying the type of the fault needs to be associated to a vector of values of measured signals [Zio et al., 2006a]. Due to the complexity of the phenomena involved and the highly non-linear interrelationships between the causes that determine the equipment behavior and the signal evolutions, it is usually difficult to develop analytical models for fault diagnosis [Venkatasubramian et al., 2003].
An attractive alternative is to resort to empirical classification models (classifiers) whose parameters are tuned through an iterative process, called training, based on a set of examples constituted by signals labelled with the corresponding class of fault under which conditions they have been measured [D'Antone, 1992;Reifman, 1997;Sheng et al., 2004;Zio, 2007;. Methodological approaches have been proposed for fault diagnosis in components of Nuclear Power Plants (NPPs) [Cheon et al., 1993;Kim et al., 1996;Reifman, 1997;Zio et al., 2006a;Zio et al., 2006b;Di Maio et al., 2011]. However, application in practice is limited because of lack of examples for classifier training. Indeed, although data collected during plant operation contain also signals measured in faulty conditions, the information on the fault class is usually not available.
The objective of this work is to develop a methodology for the identification of transients originated by faults of the same class, on the conjecture that they lead to similar behaviors of the measured signals. To this aim, the problem is formulated as one of clustering, in which the vectors of measured signal values are partitioned into a small number of homogeneous clusters so that those belonging to the same cluster are as similar as possible, and as dissimilar as possible to those belonging to the other clusters. In this work, the task of clustering of measured signals is addressed by means of a modified spectral clustering algorithm. The similarity between measured signals is computed by using a fuzzy similarity measure [Joentgen et al., 1999;Na et al., 2004;Xia et al., 2011] in the space of wavelet transforms [Strang et al., 1996;Ikonomopoulos et al., 1998] in order to account for the signals evolution in time (in the following also referred to as "trajectories"). A similarity graph is built, in which each vertex represents a trajectory and the weight associated to the edge connecting two vertices is the value of (fuzzy) similarity between the two corresponding trajectories. Spectral analysis techniques are finally applied in order to find an optimal partition of the graph [von Luxburg, 2007].
The proposed methodology has been tested on an artificial case and then applied to a case study concerning simulated faults in a pressurizer of a Pressurized Water Reactor (PWR) NPP.
The remaining of the paper is organized as follows: Section 2 states the problem; Sections 3.1 and 3.2 sketch the methodology proposed for unsupervised clustering of transients; Section 4 presents the artificial case study used to verify the performance of the proposed methodology; Section 5 presents the case study concerning the pressurizer of the PWR; finally, in Section 6 some conclusions and remarks are drawn. , i.e., ,

Problem statement
The objective of the present work is to partition the N trajectories i X into an unknown number of clusters, C, each one containing transients characterized by similar behavior.

Methodology
The methodology here proposed is based on spectral clustering [Strang et al., 1996]. The main characteristic of spectral clustering is that it allows partitioning objects (in our case, vectors of measured signals) into clusters by using only a measure of similarity between them. A similarity graph G = (V, E) is introduced, in which each vertex v i in this graph represents an object and a weight is associated to each edge e ij connecting vertices i and j, to measure the similarity between objects i and j [von Luxburg, 2007]. Clustering then aims at finding a partition of the graph such that the edges between elements belonging to different groups of the partition have small weights (which means that objects in different clusters are dissimilar from each other) and the edges connecting elements within the same group have large weights (which means that objects within the same cluster are similar to each other) [Alpert et al., 1999]. Section 3.1 illustrates the method proposed to measure similarity between the trajectories, whereas Section 3.2 illustrates the details of the spectral clustering algorithm.

Similarity measure between trajectories
The notion of similarity is strongly related to the objective of the application: in our case, we want a similarity measure that takes large values for trajectories of the same class (transients caused by the same type of fault) and small values for trajectories of different classes.
When looking at the similarity between trajectories, the functional behaviour of the signals is the focus of the analysis irrespective of the numerical values which may be quite dissimilar due to the presence of outliers, noise or different scaling and translating factors [Angstenberger, 2001].
Possible causes of difference in the signal numerical values from transients of the same class are the magnitude of intensity of the faults, the plant operational state, the measurement noise. For this reason, the definition of the similarity measure between two transients should not be based on the magnitude of the signal values, but rather it should consider the functional characteristics of the signal trajectories, e.g., form, slope, curvature [Joentgen et al., 1999]. To catch these characteristics pre-processing of the transient data can be performed. Section 3.1.1 illustrates the data preprocessing technique applied in this work, whereas Section 3.1.2 defines the similarity measure adopted.

Wavelet transform pre-processing
Wavelet transforms have been chosen due to their effectiveness in catching the functional behaviour of the signals in problems of transient classification. [Roverso, 2000], [Roverso, 2003] and [Baraldi et al., 2009] have shown the improved performance of transient classification algorithms when they are fed by wavelet features instead of the direct signal values. In the present work, the Haar wavelet transform [Ogden, 1997] is applied on a sliding window of the signal timeseries. For each signal k=1,...,Z, the retained wavelet features are: the mean value, w 1 , the maximum wavelet coefficient, w 2 , and the minimum wavelet coefficient, w 3 . The first feature is proportional to the average value in the time window and captures the general trend of the signal in the windows in a compact way. The features w 2 and w 3 capture variations in the signal within the window (e.g., downward or upward trends, step changes, spikes, etc.). The window size T s is selected so as to correspond to wavelet dyadic decomposition values (i.e., powers of 2), and consecutive windows are chosen with an overlap of T s -1 to avoid missing features that might be present at the window borders.

Fuzzy similarity measure
After the data pre-processing, the similarity between transient i and transient j can be computed by considering matrixes i Y and j Y . A fuzzy similarity measure is considered, which determines the degree of closeness of the two trajectories with reference to the pointwise difference between the corresponding feature values . In particular the pointwise difference The similarity measure should allow for a gradual transition [Binaghi et al., 1993;Joentgen et al., 1999]. This is here achieved by evaluating the pointwise difference of two trajectories with reference to an "approximately zero" fuzzy set (FS) specified by a function which maps ij  into a value ij  of membership to the condition of "approximately zero": values of ij  close to 0 indicate that the signal evolutions in the two transients i and j are very different, whereas values close to 1 indicate high similarity.
Common membership functions can be used for the definition of the FS, e.g. triangular, trapezoidal, and bell-shaped [Dubois et al., 1988]. In the applications illustrated in this work, the following bell-shaped function is used: The arbitrary parameters  and  can be set by the analyst to shape the desired interpretation of similarity into the fuzzy set: the larger the value of the ratio   2 ln   , the narrower the fuzzy set and the stronger the definition of similarity .

Spectral clustering
The computation of the fuzzy similarity between all possible pairs of trajectories originates the similarity matrix W of size [N, N], whose generic element ij  represents the fuzzy similarity between trajectories i and j. The diagonal components ij  are set to 1 and the matrix is symmetric From the matrix W a similarity graph G = (V,E) is constructed, where each vertex v i represents the i-th trajectory and the weight associated to the edge e ij connecting the two vertices i and j is the similarity value ij  [von Luxburg, 2007]. The original problem of identifying groups of similar trajectories can be reformulated in that of finding a partition of the similarity graph such that the edges connecting elements of different groups have small weights and the edges connecting elements within a group have large weights [Alpert et al., 1999]. The spectral clustering algorithm is based on the following steps:

-Step 1: normalized Graph Laplacian Matrix
Compute: -the normalized graph Laplacian matrix: where L D W  and I is the identity matrix of size [N, N].

-Step 2: eigenvalues and eigenvectors of L sym
The information on the structure of a graph can be obtained from its spectrum [Zhao et al., 2007]. Given sym L , compute the first C eigenvalues 12 , ,..., C    and corresponding eigenvectors 12 , ,..., C u u u . The first C eigenvalues are such that they are very small whereas λ C+1 is relatively large [von Luxburg, 2007].

-Step 3: Number of clusters
The number of clusters is set equal to C, according to the eigengap heuristic theory [Mohar, 1997].

-Step 4: Feature extraction
The relevant information on the structure of the matrix W is obtained by considering the eigenvectors 12 , ,..., C u u u associated to the C smallest eigenvalues of its laplacian matrix sym L .
The square matrix W is transformed into a reduced matrix U of size [N, C], in which the C columns of U are the eigenvectors 12 , ,..., C u u u . Thus, the i-th trajectory similarity with other trajectories is captured in the C-dimensional vector i u corresponding to the i-th row of the matrix U . A matrix T is formed from U by normalizing its rows [von Luxburg, 2007]: It has been shown that this change of representation enhances the cluster properties in the data, so that clusters can be more easily identified [von Luxburg, 2007].

-Step 5: Unsupervised clustering
In this work, we resort to the Fuzzy C-Means (FCM) algorithm to partition the data into C clusters [Bezdek, 1981;Leguizamon et al., 1996;Alata et al., 2008]. FCM originates from hard C-Means clustering: the difference is that it allows elements (trajectories, in our case) to belong to two or more clusters [Klir and Yuan, 1995]. For each i-th element, the algorithms provides its membership m ic to all clusters, 1,2,..., c C  . If needed, crisp assignment can be obtained, e.g., by considering the cluster to whom the element belongs with the largest membership value. A prototypical trajectory can be identified for each cluster by considering the trajectory with the largest membership value to the cluster. The analysis of such trajectories can guide understanding, identification and interpretation of the fault types.

The artificial case study
The performance of the methodology has been firstly verified with respect to an artificial case study built by simulating N=150 trajectories of C=5 different classes in a Z=4 dimensional signal space (Figures 1-4). Transient length T is 100 time steps. Each of the 5 classes of transients is formed by 30 trajectories characterized by a priori established functional behaviours of the 4 signals (e.g., linear, parabolic and exponential). All the transients of the same class differ only for different values of the parameters governing the functional behaviour (e.g., the slope value of a linear functional behaviour) whereas two transients of different classes have at least one signal with a different functional behaviour [Baraldi et al., 2012].
Since the information on the class of each trajectory and on the total number of classes is not expected to be known in real industrial applications, it is not used to drive the partitioning of the transients into clusters but only to verify the performance of the proposed methodology. The similarity matrix W of size 150·150 obtained by computing the similarity measure between all possible pairs of trajectories is shown in Figure 5: the larger the similarity ij  , the brighter the shade of the ij-element of the matrix.
A large number of edges have large weights ij  (i.e., the vertices are strongly connected, which means that the corresponding trajectories are similar), but it is not easy to distinguish a partition of the graph in groups of trajectories. If the trajectories were sorted in such a way that similar trajectories were in consecutive rows of the matrix, e.g., all trajectories of class 1 in rows and columns from 1 to 30, all trajectories of class 2 in rows and columns from 31 to 60 and so on, the representation of the matrix would lead to a checkboard-like structure [Kluger et al., 2003]. This is   Figure 6 shows the 150 eigenvalues obtained by applying spectral analysis to matrix W , as described in Section 3.2. Since the first five eigenvalues are very close to zero and the sixth is remarkably larger, the number of clusters C is set equal to 5. In practice, the problem of clustering the 150 trajectories i X is now reduced to the problem of finding five clusters among the 150 5-dimensional vectors

Fault diagnosis in the pressurizer of a PWR
A case study regarding a pressurizer of a PWR NPP has been considered. Figure 9 is a schematic representation of the pressurizer system for which a Matlab SIMULINK model has been developed, based on the application of the mass and energy conservation equations to the two regions of vapor and liquid; exchanges between the two regions, due to evaporation of liquid and condensation of steam, are taken into account [Kuridan et al., 1998;Todreas et al., 1990]. The system of nonlinear differential equations describing the model is detailed in . In order to represent a realistic situation, the simulations have been carried out based on the operational parameters of a standard PWR pressurizer (Table 1). Furthermore, the total mass of water entering/exiting the pressurizer during a surge line mass flow transient has been related to the temperature variations of the coolant in the Primary Heat Transport (PHT) system.
In order to test the method on pseudo-realistic data, white noise has been added to each signal according to engineering considerations on the sensors accuracy [Hashemian, 2004;Johnson, 2008]. Table 2 reports the standard deviations of the considered noises.  A block diagram of the model identifying the inputs, state of the system, outputs and controller variables is shown in Figure 10. The control of the level L and the pressure P in the pressurizer is achieved through a feedback control scheme which reproduces that used in a standard PWR pressurizer. According to the control scheme illustrated in , the pressure f P and level f L are the controlled signals as well as the controller input signals; the sprayers mass flow rate sp m , and the heaters power Q are the controller outputs.
The present case study focuses on some faults which can occur to the pressurizer control system and can lead to undesired behaviors of the pressurizer. In particular, three different classes of faults are taken into account (Table 3): 1) heaters fail stuck with a fixed power output value Q ; 2) sprayers fail stuck with a fixed mass flow rate sp m ; 3) the communication between the controller and actuators fails: sprayers and heaters receive from the controller the command to provide a wrong quantity of water and heat, respectively. faults are not distinguishable from normal condition transients.

Class 1
Heaters fail stuck with a fixed power output value Q

Class 2
Sprayers fail stuck with a fixed mass flow rate sp m

Normal conditions
Heaters, sprayers and communication between controller and actuators work in normal conditions As an example, Figures 11-17 show the consequences of a class 1 fault (i.e., heaters fail stuck with a fixed power output value Q ) on the behavior of the pressurizer. The considered fault consists in a blockage of the heaters at time t=78 s during the out-surge transient of Figure 11 characterized by a surge mass flow rate of -9.51 Kg/s. The out-surge flow produces a reduction in the pressure, P, and liquid and steam temperatures, L T and V T , which induces the controller to turn on the heaters (Figure 13). In case of a nominal transient (black continuous line in Figure 15), the pressure promptly starts increasing and after 1000 s reaches the desired value, whereas in the case of faulty transient, due to the reduced power provided by the heaters, the pressure recovery is slower and after 1200 s its value is still 1 bar lower than the required (gray dotted line in Figure 15).   suggests that the heaters are providing an insufficient power, i.e., a class 1 fault as previously described (Table 3). Cluster 1 trajectory is characterized by a slightly faster increase of the pressure than in normal conditions: this should lead the expert to identifying a class 3 fault caused by the heaters providing higher power than required, i.e., a class 1 fault with b>1. Finally, cluster 2 is characterized by a slightly slower increase of temperature and pressure than that in nominal case: a possible cause can be a lower power provided by the heaters, as in class 3 fault with b<1.

Figure 22 Prototypical trajectories (noisy thin line) and filtered signals (smooth thick line)
Using the information on the true class of the simulated transients, we can verify the correspondence between clusters and fault classes. Table 4 reports the number of transients of each fault class contained in the 4 clusters obtained. Notice that both clusters 1 and 2 are formed by transients of class 3 (communication between the controller and actuators fails): the former consists in faulty transients caused by heaters providing more power than necessary (b>1, Figure 23), the latter by less power than necessary (b<1, Figure 24). All the transients of class 1 (heathers fail stuck at a fixed power) are contained in cluster 3, together with 3 transients of class 3 characterized by a very large reduction of the power provided by the heaters (b«1, Figure 25 Cluster 2 considered of out-surge transients. In fact, in this case, sprayers are never called in operation and, therefore, their failure cannot affect the transient functional behaviour. Furthermore, transients of class 1 characterized by values of b≈1 are those in which the actuators receive from the controller the command of providing a quantity of water or heat practically equal to the correct one and, therefore, show the same functional behaviour of the normal condition transients. These results show that the methodology is capable of distinguishing transients characterized by similar functional behaviour of the signals.

Conclusions
We have developed a methodology for the identification of groups of transients with similar behaviour because originated by faults of the same type. We have combined Haar wavelets transform, fuzzy similarity, spectral analysis and Fuzzy C-Means clustering. We have shown applications to an artificial case study and to the identification of transients in the pressurizer of a PWR.
The main conclusions of the analysis are: 1) Haar wavelets allow capturing the functional behavior, i.e., the shape of the transient; 2) spectral analysis allows identifying the number of clusters of similar trajectories and extracting their most relevant features; 3) the FCM algorithm allows finding the clusters of similar transients and identifying a prototypical trajectory for each cluster, which can then be used for fault understanding and interpretation.
A drawback of the methodology is that if new transients become available, it is not possible to dynamically update the clustering of the trajectories, but we have to repeat spectral analysis and FCM clustering on the new similarity matrix extended to contain the fuzzy similarity between all the new and old trajectories. To overcome this limitation, in the future we intend to look at an incremental learning framework.
The continuation of this work will also consider the application to real datasets collected during NPP operation and the development of an empirical classification scheme based on a supervised technique which exploits the cluster results gained from the analysis here presented.