## Calculation And Functioning Of

PCA operates by searching for the variables accounting for the greatest amount of variance. The items under investigation (samples, people, or descriptive terms) that are correlated and contribute the greatest amount of variance relative to the total are grouped into the first principal component. The second component is derived in the same way: it accounts for the greatest amount of the variance remaining after the first component has been extracted. The process continues in that fashion until all the variance has been accounted for. In practice, the investigator is rarely interested in all the components. Usually, the first few account for so much of the total variance that the remaining components can be ignored, because they do not supply enough additional information to justify retaining them for further use or evaluation. The consequence of PCA is that the volume and complexity of the data set are reduced. The investigator may give up 5-15% of the information contained in the whole mass of data. In return, there will be fewer entities to work with, perhaps three components instead of the original number of samples (or people, descriptive terms, or whatever was subjected to PCA).

PCA is useful not only for data reduction but also for learning whether products thought to be different are in fact different. A common problem is to judge whether products sold under different brand names are indeed different sensorially. The same producer may, for example, market the same product under two or more different labels.
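The idea of keeping only the first few components can be sketched numerically. The following is a minimal illustration, not the procedure used in any of the cited studies: the data are synthetic, and the function names `explained_variance_ratio` and `components_needed`, the 90% target, and the array layout (rows are observations, columns are variables) are all assumptions made for the example.

```python
import numpy as np

def explained_variance_ratio(data):
    """Fraction of the total variance carried by each principal component.

    `data` is assumed to be an (n_observations, n_variables) array.
    """
    centered = data - data.mean(axis=0)
    # Singular values of the centered data give the component variances,
    # already sorted from largest to smallest.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2 / (data.shape[0] - 1)
    return variances / variances.sum()

def components_needed(ratios, target=0.90):
    """Smallest number of leading components whose cumulative share reaches `target`."""
    cumulative = np.cumsum(ratios)
    count = int(np.searchsorted(cumulative, target)) + 1
    return min(count, len(ratios))

# Synthetic data (hypothetical): 20 observations of 12 variables that are
# driven by 3 underlying factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 3))
loadings = rng.normal(size=(3, 12))
data = latent @ loadings + 0.1 * rng.normal(size=(20, 12))

ratios = explained_variance_ratio(data)
k = components_needed(ratios, target=0.90)
print("components retained:", k)
print("share of variance retained:", round(float(ratios[:k].sum()), 3))
```

Raising or lowering `target` corresponds to giving up less or more of the information in the original data.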

Not quite in line with the illustration above is another type of problem (8). Nine brands of Scotch whisky and one of Irish whiskey were analyzed. Upon applying PCA it was learned that some of the brands were so close together in their sensory qualities that they should be treated as one set of samples, not two or three different samples.

A cooperative research project was conducted between the food chemistry and technology department of the University of Helsinki and the University of Georgia (9). The Finns had selected five kinds of Finnish sour rye bread, which they considered to be representative of different types, for sensorial and chemical examination. The chemical data permitted the five breads to be separated except for breads no. 1 and 4. The sensory data did the same except that breads no. 1 and 2 could not be separated. Combining the two kinds of data permitted all five kinds of bread to be resolved, but a pictorial representation of just how close together the five kinds were was desired. Figure 1 illustrates the positions of the five types of bread, relative to one another, in three-dimensional space. Other ways of providing evidence that the samples differed will be described later. PCA was used here as a preliminary check on the validity of the assumption that the five kinds of bread were different.

Perhaps the theory underlying PCA should have been given first, but it is hoped that the reader will find the explanation of what PCA is about more understandable having first seen some of the results of its application. In the case of the Finnish sour rye bread, the data mass was reduced from judgments by 20 panelists for 12 sensory characteristics and several chemical measurements to five components, ie, the five coordinates of the five kinds of bread. Figure 2 depicts what is going on. The kinds of measurements represented by the numbers 1, 2, and 4 are correlated with each other because they lie along the same, and the longest, axis. They account for the greatest amount of variance; thus they represent the first component. Measurement values 3 and 5 lie along the next longest axis, and they account for the next greatest amount of variance once the variance of the first component has been removed. They thus constitute the second component. Measurement value 6 is off by itself. It, therefore, constitutes the third component. Although the three axes in the drawing are not at right angles to each other, one of the goals of PCA is to derive components that are orthogonal, ie, at right angles, to each other. The drawing was purposely made not to be ideal in terms of orthogonality, because in practice the axis of each component is not always exactly at a right angle to the other axes.

Figure 2. Depiction of basis for calculation of principal components. Regression curve through measurement types 2, 1, and 4 establishes first component, regression curve through measurement types 5 and 3 establishes second component, curve through measurement type 6 illustrates formation of a third component, the regression curve for each being orthogonal to the others.

The drawing depicts the way PCA was originally calculated. The problem of calculating principal components was solved by fitting a line to the longest axis of the ellipsoid of objects such that the sum of the squared residual distances was at a minimum for highly correlated objects (10). The same was then done for the next longest axis and those next most highly correlated. The components are orthogonal to each other because, once the first component has been extracted, subsequent ones are not correlated with it: the variance of the first is no longer in play to allow correlation to exist. To carry out PCA calculations today, Hotelling's (11) procedure is generally used. It operates on the variance-covariance matrix, or the correlation matrix, extracting its eigenvalues and eigenvectors to accomplish the same purpose. The procedure selects the variables accounting for the greatest proportion of the variance, then arrays and rotates them so that they are in order of decreasing variance and are orthogonal.
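A matrix-based calculation of this kind can be sketched as an eigendecomposition of the correlation (or variance-covariance) matrix. The code below is an illustrative sketch only, not a reproduction of Hotelling's original iterative method. The function name `pca_eigen` and the synthetic six-measurement data set, loosely patterned on Figure 2 (measurements 1, 2, and 4 correlated; 3 and 5 correlated; 6 on its own), are assumptions made for the example.

```python
import numpy as np

def pca_eigen(data, use_correlation=True):
    """PCA by eigendecomposition of the correlation or covariance matrix.

    Returns eigenvalues (component variances, largest first), eigenvectors
    (one component per column), and the component scores of each observation.
    """
    centered = data - data.mean(axis=0)
    if use_correlation:
        # Standardizing first makes the covariance matrix a correlation matrix.
        centered = centered / data.std(axis=0, ddof=1)
    matrix = np.cov(centered, rowvar=False)
    # eigh is intended for symmetric matrices and returns ascending eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    scores = centered @ eigenvectors
    return eigenvalues, eigenvectors, scores

# Synthetic data (hypothetical): six measurement types driven by three
# independent factors, plus a small amount of noise.
rng = np.random.default_rng(1)
n = 200
f1, f2, f3 = rng.normal(size=(3, n))

def noise():
    return 0.2 * rng.normal(size=n)

data = np.column_stack([
    3.0 * f1 + noise(),  # measurement 1
    3.0 * f1 + noise(),  # measurement 2
    2.0 * f2 + noise(),  # measurement 3
    3.0 * f1 + noise(),  # measurement 4
    2.0 * f2 + noise(),  # measurement 5
    1.0 * f3 + noise(),  # measurement 6
])

eigenvalues, eigenvectors, scores = pca_eigen(data)
print("component variances:", np.round(eigenvalues, 3))
# The component axes are mutually orthogonal, and the scores are uncorrelated.
print("axes orthogonal:", np.allclose(eigenvectors.T @ eigenvectors, np.eye(6)))
```

In this formulation the components are orthogonal by construction, because the eigenvectors of a symmetric matrix can always be chosen to be mutually orthogonal.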

Figure 3 provides an illustration of the use of PCA to reduce data dimensionality. This study was mentioned above (7). Forty-five gas chromatographic (GC) peaks were useful in resolving product differences, but only 7 peaks were needed to be 100% correct in classifying 47 samples of the three different kinds of onions. For many, a pictorial display is more convincing than numerical data. To take advantage of the information contained in the 7 peaks, yet still be able to show that the onions did indeed differ, PCA was applied to the 7 peaks to reduce them to three components. The graph shows quite clearly that the three kinds of onions occupy three distinct locations in space.

Figure 3. Principal component analysis applied to yellow and red globe onions and to Vidalia onions based on reducing seven gas chromatographic peaks to three components, four replications each.

A more useful purpose of PCA than merely reducing a number of variables to a smaller number of components for visual display is its use in classifying panelists undergoing training for possible membership on a sensory panel. Reference was made above to biological differences often being the cause of errors in measurement. Humans are no exception. Everyone is an individual, responding in different ways when called on to be a sensory panelist. That is true even after training. Some of the variation can be overcome by training panelists to respond nearly alike to the same stimulus, but it can never be fully overcome. In deciding whether to admit an individual to membership on a sensory panel, it must be considered whether the individual will be in reasonable concert with other members of the panel, or will respond so individually as to be a statistical outlier. Figure 4 shows PCA applied to the results of 11 panelists when they examined a baked bean product for the intensity of 14 attributes (12). Panelist 9 is clearly an outlier. Panelists 1, 2, and 3 do not agree with the others as closely as is desirable. If panelist 9 is to be retained, additional training will be needed in an effort to bring that panelist's responses into line with the rest of the panel. The sensory leader might decide that panelists 1, 2, and 3 might likewise improve in performance with additional guidance and training.

One of the advantages sensory technologists have today is that PCA can be readily applied to the responses of panelists, and if there is a program to graph the results, a visual display can be created so that panelists can see how far they are from the rest. Knowing that, panelists can often readjust their mental scaling processes to bring their responses into line with the rest of the panel. For any of the illustrations above, the basic information is supplied as numerical output. The investigator generally just examines it to make decisions. Naturally, numerical output is the only way of examining results beyond the third component. Graphic illustrations were given here merely to make clear the kinds of information obtainable from PCA.
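The panelist screening described above can be sketched with synthetic numbers. This is only an illustration under stated assumptions: the ratings are invented rather than taken from the baked bean study (12), the function name `panelist_scores` is hypothetical, and "distance from the panel centroid in the plane of the first two components" is one simple way to express what the eye picks out on a plot such as Figure 4.

```python
import numpy as np

def panelist_scores(ratings, n_components=2):
    """Project each panelist's attribute ratings onto the leading components.

    `ratings` is assumed to be an (n_panelists, n_attributes) array.
    """
    centered = ratings - ratings.mean(axis=0)
    # Rows of vt are the component directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Synthetic ratings (hypothetical): 11 panelists scoring 14 attributes.
rng = np.random.default_rng(2)
consensus = rng.uniform(2.0, 8.0, size=14)
ratings = consensus + 0.3 * rng.normal(size=(11, 14))
# Make panelist 9 (row index 8) deviate strongly from the consensus profile.
ratings[8] += 3.0 * rng.choice([-1.0, 1.0], size=14)

scores = panelist_scores(ratings)
# After centering, the panel centroid sits at the origin of the score plot.
distance = np.linalg.norm(scores, axis=1)
farthest = int(np.argmax(distance)) + 1
print("panelist farthest from the rest:", farthest)
print("distances:", np.round(distance, 2))
```

Plotting the two columns of `scores` against each other gives the kind of display that lets panelists see how far they are from the rest of the panel.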

In reducing the dimensionality of the data, some information is given up, but usually not much. In return for the loss of information, relations among the entities being evaluated, whatever they are, are made clearer. The entities can be products as in the first illustration, gas chromatographic peaks in the third, the responses of people in the fourth, or any pertinent category such as descriptive terms, geographical location, or demographic information.
