# Calculation of Cpk

Discussion in 'SPC - Statistical Process Control' started by DennisK, Jun 29, 2018.

Dear all,

One of our customers requires SPC. As we do not have any experience with this subject, I'm currently reading a lot about it and experimenting in Excel.
In our production process we perform measurements at the end of the process to release batches. I have gathered some data to experiment with calculating Cpk and Ppk.

For a certain product I have data from three different days, 36 values in total. For the calculation of Ppk I take the standard deviation of the whole dataset (36 samples).
However, the standard deviation (sigma estimator) for Cpk is calculated as R-bar/d2. d2 is a constant that can be found in a table, and this is where I get confused. Say that on day 1 I measured 9 samples, so the subgroup size is 9 and, according to the table, d2 is 2.970. On the second day I also measured 9 samples, so d2 is again 2.970. However, on the third day I measured 18 samples, so d2 becomes 3.640.
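A minimal Python sketch of the two sigma estimates described above. It assumes hypothetical measurements and spec limits (the attached dataset is not reproduced here) and uses only the two d2 values quoted in the post. One common way to handle unequal subgroup sizes, shown here, is to divide each subgroup's range by the d2 for its own size and average the results; the sketch also computes the per-day Cpk the question asks about. The function names and the LSL/USL values are illustrative, not from the thread.

```python
import statistics

# d2 constants quoted in the post for subgroup sizes 9 and 18.
# Other sizes would need to be added from a published d2 table.
D2 = {9: 2.970, 18: 3.640}

def subgroup_sigma(values):
    """Range-based sigma estimate for one subgroup: R / d2(n)."""
    return (max(values) - min(values)) / D2[len(values)]

def sigma_hat(subgroups):
    """Average of the per-subgroup R / d2(n) estimates.

    Each subgroup uses the d2 for its own size, so 9-sample and
    18-sample days can be combined into one within-subgroup sigma.
    """
    estimates = [subgroup_sigma(g) for g in subgroups]
    return sum(estimates) / len(estimates)

def capability_indices(mean, sigma, lsl, usl):
    """(Cp, Cpk) when given a within sigma; (Pp, Ppk) when given an overall sigma."""
    spread = (usl - lsl) / (6 * sigma)
    centered = min(usl - mean, mean - lsl) / (3 * sigma)
    return spread, centered

# Hypothetical measurements and spec limits, for illustration only.
day1 = [10.02, 9.98, 10.01, 9.97, 10.03, 10.00, 9.99, 10.04, 9.96]
day2 = [10.05, 10.01, 10.03, 9.99, 10.06, 10.02, 10.00, 10.04, 10.02]
day3 = [9.98, 10.02, 10.00, 9.95, 10.03, 10.01, 9.99, 10.05, 9.97,
        10.00, 10.02, 9.98, 10.04, 10.01, 9.96, 10.00, 10.03, 9.99]
LSL, USL = 9.85, 10.15

subgroups = [day1, day2, day3]
all_values = [x for g in subgroups for x in g]
grand_mean = statistics.fmean(all_values)

# Within-subgroup sigma (R-bar/d2 style) for Cp/Cpk.
cp, cpk = capability_indices(grand_mean, sigma_hat(subgroups), LSL, USL)
# Overall sample standard deviation (like Excel STDEV.S) for Pp/Ppk.
pp, ppk = capability_indices(grand_mean, statistics.stdev(all_values), LSL, USL)

# The alternative asked about in the post: one Cpk per day, each with its own d2.
per_day_cpk = [capability_indices(statistics.fmean(g), subgroup_sigma(g), LSL, USL)[1]
               for g in subgroups]

print(f"Cp={cp:.2f} Cpk={cpk:.2f} Pp={pp:.2f} Ppk={ppk:.2f}")
print("per-day Cpk:", [round(c, 2) for c in per_day_cpk])
```

Averaging the per-subgroup estimates treats every day equally regardless of its size; weighting by subgroup size is an equally defensible choice, so it is worth agreeing on one with the customer.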

How can I calculate Cpk now? Do I have to calculate Cpk for every single batch/day? So for day one I calculate Cpk for the 9 samples, and the same for day two. And for day three I calculate Cpk using d2 for 18 samples?

I have included the dataset in Excel in the attachment.

Dennis

Hi Dennis:
We have some ace statisticians here. Someone will be along shortly to help...

SPC isn't really done "after the fact", as it involves controlling the process by monitoring measurements while the process runs. This sounds more like you are doing a Six Sigma project than SPC per se.
I suppose you could "control" the next batch using the previous batches' measurements, but that may miss some of the variables, inherent and otherwise, in the process. This will be an interesting thread to follow.

That is exactly my point. Our organization is not yet ready to perform SPC during the production process, although our customer requires it. So for now I'm trying to do some kind of SPC afterwards, partly to get familiar with SPC, but then the questions from my opening post come up...

I suppose it depends on your process. Are there any changes from day to day, such as new material? If so, you may want to calculate daily. You just need to calculate some numbers and see what makes the most sense. Good luck.

Do you have process parameter(s) that influence the product characteristics used for the capability analysis?

Prior to production, the 'tool setter' sets up the parameters. Once everything is set correctly, all the operator needs to do is push the buttons. During the process he is not supposed to change parameters, except when dimensions are found out of tolerance during inspection.

1st (and most important) comment: A capability study IS NOT SPC.

A capability study is a prediction of whether a new process, one you have not had much experience with, can in fact be controlled statistically. In other words, given a (relatively) very short initial run, can we infer that the process will be capable in the long term? It does a fair job at this, but there are those who totally dislike capability studies, and they have good reasons.

SPC usually involves control charts of some sort (X-bar R, X-bar s, p charts, etc.). This is a different thing. It uses statistics to set up a guard band (the control limits) that gives you advance warning that a process is heading towards making bad parts BEFORE it actually starts making them. You could just pluck guard bands out of the air, for example by reducing your tolerance by, say, 80%. The problem with this simplified approach is that your guard bands may not be optimized. Remember, the goal is to know to adjust the process BEFORE you make bad parts. Scrap is expensive, as is lost machine time.
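To make "uses statistics to set up a guard band" concrete, here is a sketch of X-bar chart control limits. It assumes equal subgroups of 5 and the standard table value d2 = 2.326 for n = 5. The limits are the grand mean ± 3·(R-bar/d2)/√n, which is the same as the familiar A2·R-bar form. The data are hypothetical, and for simplicity the limits are computed from all subgroups, including the one that ends up flagged; in practice you would set them from a stable baseline period.

```python
import math
import statistics

D2_N5 = 2.326  # d2 for subgroups of 5, from a standard d2 table

def xbar_limits(subgroups, d2=D2_N5):
    """Return (LCL, center line, UCL) for an X-bar chart using R-bar/d2."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("this sketch assumes equal subgroup sizes")
    xbarbar = statistics.fmean([statistics.fmean(g) for g in subgroups])
    rbar = statistics.fmean([max(g) - min(g) for g in subgroups])
    half_width = 3 * (rbar / d2) / math.sqrt(n)  # equals A2 * R-bar
    return xbarbar - half_width, xbarbar, xbarbar + half_width

# Hypothetical subgroups of 5; the last one has shifted upward.
subgroups = [
    [5.01, 4.99, 5.00, 5.02, 4.98],
    [5.00, 5.03, 4.99, 5.01, 5.02],
    [4.98, 5.00, 5.01, 4.99, 5.02],
    [5.06, 5.08, 5.05, 5.07, 5.09],
]
lcl, center, ucl = xbar_limits(subgroups)
flagged = [i for i, g in enumerate(subgroups)
           if not lcl <= statistics.fmean(g) <= ucl]
print(f"LCL={lcl:.4f} CL={center:.4f} UCL={ucl:.4f} flagged={flagged}")
```

The point of the flagged subgroup is the "advance warning": its parts may still be inside tolerance, but the mean has moved far enough to justify looking at the process.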

I do not know why people utter the words "Are you using SPC? Then what's your capability?" (or some form of this). But they do.

For your Pp and Ppk, you have it right. Simply use the Excel STDEV.S function (not STDEV.P) to get your sigma. For Cp and Cpk, you typically want sigma "hat", which is an estimator of the within-subgroup sigma. Key point: there are BUNCHES of spreadsheets out there that just use the same sigma for both. But ... sigma "hat" is what is called the pooled standard deviation. Instead of looking at the standard deviation of the whole data set (as the Excel function does), you look at the standard deviations of the subgroups and then combine them (strictly, the pooled standard deviation averages the subgroup variances, weighted by degrees of freedom, and takes the square root). Think of Pp and Ppk as being based on the noise of the whole group, whereas Cp and Cpk are based on the within-group noise. Thus ... a process that is consistent part to part but drifts over a shift (from, say, grinder wear) will have a really good Cp/Cpk (within-group variation is small) but a less good Pp/Ppk (over the shift the mean is moving, which appears as more variation overall).
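A sketch of the within versus overall distinction, using hypothetical data for a process that is consistent part to part but drifts over a shift. In Python's standard library, `statistics.stdev` divides by n - 1 like Excel's STDEV.S, and `statistics.pstdev` divides by n like STDEV.P. The within-subgroup sigma here is the simple average of subgroup standard deviations, and the spec limits are made up.

```python
import statistics

# Same within-subgroup pattern every hour, but the mean creeps up
# (hypothetical data standing in for, e.g., grinder wear over a shift).
pattern = [-0.02, -0.01, 0.0, 0.01, 0.02]
subgroups = [[10.0 + 0.03 * hour + p for p in pattern] for hour in range(8)]
all_values = [x for g in subgroups for x in g]

within = statistics.fmean([statistics.stdev(g) for g in subgroups])
overall = statistics.stdev(all_values)      # n - 1 divisor, like STDEV.S
overall_p = statistics.pstdev(all_values)   # n divisor, like STDEV.P

LSL, USL = 9.8, 10.5  # hypothetical spec limits
cp = (USL - LSL) / (6 * within)   # uses within-subgroup noise
pp = (USL - LSL) / (6 * overall)  # uses whole-data-set noise, drift included

print(f"within={within:.4f} overall={overall:.4f} Cp={cp:.2f} Pp={pp:.2f}")
```

Because every hour has the same spread, the within estimate ignores the drift entirely, while the overall standard deviation absorbs it; that gap is exactly the "really good Cp/Cpk but less good Pp/Ppk" situation.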

The easiest way to skin this cat in Excel is to calculate STDEV.S for each subgroup, then average all these results to get sigma "hat." If you really want to know how to do it properly, Google will show you. But in my experience, my easy way only differs after the third significant figure. More than good enough.
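A sketch comparing the "easy way" (average of per-subgroup STDEV.S values) with the pooled standard deviation, which averages the subgroup variances weighted by their degrees of freedom (n - 1) and then takes the square root. The three subgroups are hypothetical; how closely the two estimates agree depends on how similar the subgroup spreads are.

```python
import math
import statistics

def average_of_sds(subgroups):
    """The 'easy way': mean of the per-subgroup sample standard deviations."""
    return statistics.fmean([statistics.stdev(g) for g in subgroups])

def pooled_sd(subgroups):
    """Pooled SD: sqrt of the degrees-of-freedom-weighted mean variance."""
    weighted = sum((len(g) - 1) * statistics.variance(g) for g in subgroups)
    dof = sum(len(g) - 1 for g in subgroups)
    return math.sqrt(weighted / dof)

# Hypothetical subgroups of 5 with slightly different spreads.
subgroups = [
    [4.98, 5.00, 5.01, 4.99, 5.02],
    [5.01, 4.97, 5.00, 5.03, 4.99],
    [5.00, 5.02, 4.99, 5.01, 4.98],
]
easy = average_of_sds(subgroups)
pooled = pooled_sd(subgroups)
print(f"average of SDs={easy:.5f} pooled={pooled:.5f}")
```

With equal subgroup sizes the pooled value can never be smaller than the simple average of the SDs, so the easy way tends to read slightly low when subgroup spreads differ a lot.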

One remaining question is where to draw the line. I now have 3 subgroups, 2018-1-22 / 2018-3-5 / 2018-4-16, and when I pool these subgroups together I'm probably heading towards a Ppk value, as this spans a period of 4 months.
The best thing to do is to calculate Cpk for each batch, right? So I get three Cpk values...

Considering that Cp/Ppk are really based on sigma, the number of samples in the study drives the accuracy of sigma. How many data points are in these subgroups? (I'm used to fairly high-volume manufacturing, so I'm thinking this is tens of thousands a month.) Why not do the three months separately and trend the metrics to show stability over time? You also need to be careful that you don't capture an assignable cause in the data set. You can still get the metric, it's just math, but the underlying assumption of the metric is that it is based on a stable process. A data set with an assignable cause in it is NOT from a stable process. (One can argue that neither is one with an adjustment. I'm more OK with adjustments in there because they ARE part of the process. But a machine crashing is not.) Bear in mind that anything you include that takes the data away from normal renders the Cp/Cpk results "less good." If you're trying to get a customer off your back and they don't care, go for it. But if they know their statistics, lumping in this many points may raise questions. If it were me, I'd say "Here are my current metrics, based on the last n days of production ..." where n would be enough time to get between 40 and 100 points in the study. But I'd put the data on a run chart first and look for assignable causes, which would show up as big discontinuities in the plot.
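A sketch of "do the three months separately and trend the metrics": group (period, value) records, compute Ppk per period, and skip periods with fewer points than a chosen minimum (40 here, the low end of the range suggested above). The data are simulated with a fixed random seed purely for illustration, and the `min_points` threshold, the period labels, and the spec limits are assumptions. This does not replace looking at a run chart for discontinuities.

```python
import random
import statistics
from collections import defaultdict

def ppk(values, lsl, usl):
    """Ppk from the overall sample standard deviation."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return min(usl - m, m - lsl) / (3 * s)

def ppk_by_period(records, lsl, usl, min_points=40):
    """records: iterable of (period_label, value).

    Returns {period: (n, Ppk)} with Ppk = None when n < min_points.
    """
    groups = defaultdict(list)
    for period, value in records:
        groups[period].append(value)
    result = {}
    for period in sorted(groups):
        vals = groups[period]
        result[period] = (len(vals),
                          ppk(vals, lsl, usl) if len(vals) >= min_points else None)
    return result

# Simulated (hypothetical) data: three months with 50 points, one with only 12.
rng = random.Random(1)
records = [(month, rng.gauss(10.0, 0.03))
           for month in ("2018-01", "2018-03", "2018-04") for _ in range(50)]
records += [("2018-05", rng.gauss(10.0, 0.03)) for _ in range(12)]

trend = ppk_by_period(records, lsl=9.85, usl=10.15)
for period, (n, value) in trend.items():
    print(period, n, "too few points" if value is None else f"Ppk={value:.2f}")
```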

To reiterate what Walker said, Cpk and Ppk are NOT SPC. Capability indexes are actually just statistical alchemy (see my paper in the “Resources” tab), but many customers don’t understand how to measure true capability and don’t care. They are simply checking their own blind ‘quality’ box. That rant aside, for Cpk to have any value at all, the samples must be planned. Simply taking some data that’s lying around and doing math on it may allow you to check your customer’s box, but that’s all. And if your Cpk is too small to make the grade, you’ll have no way to improve it other than tricks with math. If it’s large enough to make the grade only because you didn’t collect the sample correctly, you and your customer will be surprised when your process doesn’t make 100% good parts.

ncwalker is correct that capability indices are based on the knowledge (not theory, belief or hope) that the process is stable. However, this doesn’t require the process to be homogeneous; non-homogeneity is what some will call an assignable cause. Processes can be stable AND non-homogeneous. This is why rational subgrouping was developed for SPC. A non-homogeneous process is simply one in which the primary causal factor of the mean is NOT the primary causal factor of the standard deviation. Non-homogeneous processes are far more common than homogeneous ones. The difference between Cpk and Ppk is driven by this non-homogeneity.
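One way to see why the Cpk/Ppk gap comes from non-homogeneity: the total sum of squares of the data splits exactly into a within-subgroup part and a between-subgroup part. The within part is what sigma "hat" (Cpk) reflects, while the overall standard deviation (Ppk) also picks up the between part. A sketch with hypothetical lots whose means differ (for example, because of a material change) while the spread inside each lot stays the same:

```python
import statistics

def ss_decomposition(subgroups):
    """Return (total, within, between) sums of squares; total == within + between."""
    all_values = [x for g in subgroups for x in g]
    grand = statistics.fmean(all_values)
    means = [statistics.fmean(g) for g in subgroups]
    total = sum((x - grand) ** 2 for x in all_values)
    within = sum((x - m) ** 2 for g, m in zip(subgroups, means) for x in g)
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(subgroups, means))
    return total, within, between

# Hypothetical lots: same spread inside each lot, different lot means.
lots = [
    [9.98, 10.00, 10.02, 9.99, 10.01],
    [10.05, 10.07, 10.09, 10.06, 10.08],
    [9.93, 9.95, 9.97, 9.94, 9.96],
]
total, within, between = ss_decomposition(lots)
print(f"within share {within / total:.0%}, between share {between / total:.0%}")
```

Here the process can be perfectly stable from lot to lot in its spread, yet most of the total variation sits between lots, so Ppk will come out far below Cpk.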

I guess it really matters what you are trying to do.