# Cpk data for averaged results and single values different on same data

Discussion in 'SPC - Statistical Process Control' started by Alan Charles, Jun 25, 2019.

1. ### Alan Charles (New Member)

Hi All,
The Cpk calculation gives me two different results on the same data. For example, if I take 1000 data points and calculate Cpk, I get, say, 1.6. But these 1000 points are actually made up of subsets of 10 data points each, so there are 100 means. If I calculate Cpk on these 100 means, my result is 1.3. Can someone please explain how the calculation and the averaging affect my results, given that it is the same data, just treated differently?
Thanks

2. ### Miner (Moderator, Staff Member)

It is difficult to say without the raw data, since Cpk uses both the mean and the standard deviation. The means should not be very different whether averaged or not. However, when you average data, the standard deviation of the averages is the standard deviation of the individual measurements divided by SQRT(n), or in your case by SQRT(10), provided the process is stable and the measurements are independent. This should have the effect of increasing the Cpk, which is the opposite of what you are seeing, so without the data it is difficult to assess the actual cause. What do the measurements look like over time? Are they stable and in control? If not, the results will be unpredictable. Do you have mixtures of different process streams?
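
A minimal worked version of the point above, assuming a stable process with independent readings. With equal subgroup sizes the grand mean of the 100 averages equals the mean of the 1000 points, so only the sigma changes:

$$
C_{pk} = \frac{\min(USL - \bar{x},\; \bar{x} - LSL)}{3\sigma}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
$$

$$
C_{pk,\text{means}} = \frac{\min(USL - \bar{x},\; \bar{x} - LSL)}{3\sigma/\sqrt{n}} = \sqrt{n}\, C_{pk} \approx \sqrt{10} \times 1.6 \approx 5.1
$$

Under those assumptions the averaged figure should be roughly three times larger than the individual figure, not smaller.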

3. ### Bev D (Moderator, Staff Member)

What Miner said.

You should never "Cpk" subgroup means. Process capability indices are based on the spread of the individual points, all 1000 of them in your case.
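
A small simulation sketch of why Cpk on subgroup means is misleading for a stable process. The spec limits (LSL = 4, USL = 16), process mean (10) and standard deviation (1.25) are hypothetical, chosen so the individual-value Cpk comes out near 1.6; the `cpk` helper is an illustrative function, not from any particular SPC package.

```python
import numpy as np

def cpk(mean, sigma, lsl, usl):
    """Cpk = min(USL - mean, mean - LSL) / (3 * sigma)."""
    return min(usl - mean, mean - lsl) / (3.0 * sigma)

# Hypothetical spec limits and a stable, homogeneous process:
# 100 subgroups of 10 readings, all drawn from the same normal distribution.
LSL, USL = 4.0, 16.0
rng = np.random.default_rng(1)
data = rng.normal(loc=10.0, scale=1.25, size=(100, 10))

individuals = data.ravel()   # all 1000 points
means = data.mean(axis=1)    # the 100 subgroup means

cpk_individuals = cpk(individuals.mean(), individuals.std(ddof=1), LSL, USL)
cpk_means = cpk(means.mean(), means.std(ddof=1), LSL, USL)

print(f"Cpk on 1000 individuals:   {cpk_individuals:.2f}")  # close to 1.6
print(f"Cpk on 100 subgroup means: {cpk_means:.2f}")        # roughly sqrt(10) times larger
```

Because the averages hide most of the piece-to-piece spread, the index computed on them describes the averages, not the parts the customer receives.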

The result you describe is counterintuitive, as Miner said, unless your process is not homogeneous (if it has large shifts, drifts or cycling, that would increase the variation of the subgroup means beyond what would be expected from simple 'sampling error'); hence the request for the data.
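
A sketch of the non-homogeneous case, using hypothetical numbers: each subgroup's centre shifts (sd 1.75) on top of short-term noise (sd 1.25). The thread does not say which sigma produced the 1.6, so the sketch assumes the individual-value Cpk used the pooled within-subgroup standard deviation, the usual short-term sigma for Cpk. That assumption matters: if both figures used the plain overall standard deviation, the means could not score noticeably lower, because with equal subgroup sizes the standard deviation of the k = 100 means can exceed the overall standard deviation of the N = 1000 points by at most a factor of sqrt((N - 1)/(n(k - 1))) = sqrt(999/990), about 1.005. The `ratio` below compares the observed spread of the means with the pure sampling error sigma_within / sqrt(n).

```python
import numpy as np

def cpk(mean, sigma, lsl, usl):
    """Cpk = min(USL - mean, mean - LSL) / (3 * sigma)."""
    return min(usl - mean, mean - lsl) / (3.0 * sigma)

LSL, USL = 4.0, 16.0   # hypothetical spec limits
N_SUB, N = 100, 10     # 100 subgroups of 10 readings
rng = np.random.default_rng(2)

# Hypothetical non-homogeneous process: a random shift per subgroup
# (sd 1.75) plus short-term noise within each subgroup (sd 1.25).
shifts = rng.normal(0.0, 1.75, size=(N_SUB, 1))
data = 10.0 + shifts + rng.normal(0.0, 1.25, size=(N_SUB, N))

means = data.mean(axis=1)

# Pooled within-subgroup standard deviation (short-term sigma).
sigma_within = np.sqrt(data.var(axis=1, ddof=1).mean())

# Observed spread of the means versus what sampling error alone would give.
ratio = means.std(ddof=1) / (sigma_within / np.sqrt(N))

cpk_within = cpk(data.mean(), sigma_within, LSL, USL)       # individuals, short-term sigma
cpk_means = cpk(means.mean(), means.std(ddof=1), LSL, USL)  # subgroup means

print(f"SD of means / expected sampling error: {ratio:.1f}")  # well above 1
print(f"Cpk, individuals (within sigma): {cpk_within:.2f}")
print(f"Cpk, subgroup means:             {cpk_means:.2f}")    # lower than the line above
```

In a stable process `ratio` sits near 1; a value well above 1 is the numerical counterpart of the shifts, drifts or cycling that a time series plot would show directly.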

In reality the math is nowhere near as insightful as the time series plot of the data. Mathematical manipulation of data is only useful if you understand the underlying requirements for any given formula to be 'correct'; in other words, understanding of the variation precedes mathematics. And understanding of the formula must precede its use.