
Relationship between a response and predictor variables

Discussion in 'DOE - Design of Experiments' started by essegn, Oct 6, 2016.

  1. essegn

    essegn Member

    Joined:
    Feb 5, 2016
    Messages:
    47
    Likes Received:
    5
    Trophy Points:
    7
    Dear Gentlemen,

    I am trying to find out which process parameter or parameters (temperature, time, pressure, etc.) have an effect on the process response. But I do not mean via DOE.

    There is a process which is not really stable. All the parameters are already set and used in the same way, but the process performance still changes due to several factors such as material purity, machine condition, etc.

    Do I need to use a regression analysis?

    I thought so, but today I found that in regression analysis a relationship between a predictor variable and a response does not mean the variable causes the response.
    But I need to find out which predictor variable (or combination / interaction of variables) increases the variance of the process.

    Thank you very much in advance.

    Peter
     
  2. Miner

    Miner Moderator Staff Member

    Joined:
    Jul 30, 2015
    Messages:
    576
    Likes Received:
    492
    Trophy Points:
    62
    Location:
    Greater Milwaukee USA
    The only way to conclusively prove a causal relationship is through experimentation (e.g., DOE). That is, you deliberately manipulate the variable (i.e., flip the switch) and observe the change in the response (does the light turn on/off?). To be completely certain, you must replicate this several times. Experimentation may be analyzed in many ways: a DOE is traditionally analyzed using ANOVA, but it could also be analyzed using regression, so it is not regression itself that is the obstacle to proving causality.

    Where the problem arises is when you perform an observational study. That is, you do not manipulate the variables, but simply observe them as they vary. Regression is often the only analytical tool suitable for this analysis. It is the observational nature of the study that prevents conclusions about causality, not the tool used for analysis.

    However, there is a simple solution. Perform the observational study, analyze it with regression, and develop a model. Use the model to predict the results under several scenarios, then run a confirmation experiment under each scenario. If the experiments confirm the predictions, you have demonstrated causality.
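    As an illustration of that observe-model-predict-confirm workflow, here is a minimal Python/NumPy sketch. The factor names and numbers are invented for illustration, not taken from this thread's data:

    ```python
    import numpy as np

    # Hypothetical observational data: temperature, time, and a response.
    rng = np.random.default_rng(0)
    temp = rng.uniform(150, 200, 40)
    time = rng.uniform(10, 30, 40)
    response = 2.0 * temp - 1.5 * time + rng.normal(0, 3, 40)

    # Step 1: fit a multiple-regression model by ordinary least squares.
    X = np.column_stack([np.ones_like(temp), temp, time])
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)

    # Step 2: use the model to predict a planned confirmation scenario.
    scenario = np.array([1.0, 180.0, 20.0])  # intercept term, temp=180, time=20
    predicted = scenario @ coef

    # Step 3 (on the shop floor, not in code): deliberately run the process
    # at temp=180, time=20 several times; if the observed responses match
    # `predicted`, causality has been demonstrated.
    ```

    Minitab's regression tools perform the same least-squares fit; the point here is only the predict-then-confirm loop.
    
    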
     
    Bev D likes this.
  3. Bev D

    Bev D Moderator Staff Member

    Joined:
    Jul 30, 2015
    Messages:
    605
    Likes Received:
    663
    Trophy Points:
    92
    Location:
    Maine
    "Correlation does not imply causation" is great advice. It doesn't say that you can't determine which factors cause a response or output characteristic to vary. It is actually a warning that you must replicate your results - something that is often left out of 'traditional' statistical teaching.

    In your case I would probably start with a multi-vari chart to narrow the field of input factors. Or you can try multiple linear regression but that is pretty sophisticated for a beginner.


    Some great introduction articles are:

    Robert D Zaciewski and Lou Nemeth, “The Multi-Vari Chart: An Underutilized Quality Tool”, Quality Progress, October 1995, pp. 81-83

    “A Painless Look at Using Statistical Techniques to Find the Root Cause of a Problem”, http://www.processexcellencenetwork.com/lean-six-sigma-business-transformation/articles/a-painless-look-at-using-statistical-techniques-/

    Steiner, Stefan H. and MacKay, R. Jock, “Strategies for Variability Reduction”, Quality Engineering, Volume 10, Issue 1, September 1997, pp. 125-136

    Steiner, Stefan H. and MacKay, R. Jock, “Statistical Engineering: A Case Study”, Quality Progress, June 2006, pp. 33-39


    Two books that I always recommend are:

    Steiner, Stefan H., MacKay, R. Jock, Statistical Engineering: An Algorithm for Reducing Variation in Manufacturing Processes, ASQ Quality Press, 2005.

    Moen, Ronald D., Nolan, Thomas, W., Provost, Lloyd P., Quality Improvement through Planned Experimentation 2nd Edition, McGraw-Hill, 1999
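    To illustrate the multi-vari idea of partitioning variation into families (within-part, part-to-part, time-to-time), here is a minimal Python/NumPy sketch on made-up numbers. A real multi-vari study would plot these groupings; this only summarizes each family as a range so the dominant one stands out:

    ```python
    import numpy as np

    # Hypothetical data: 3 time periods x 4 parts x 3 repeat readings,
    # built so that part-to-part variation dominates (offsets up to 3.0),
    # with smaller within-part (+/-0.2) and time-to-time (+/-0.3) effects.
    time_effect = np.array([0.0, 0.3, -0.3]).reshape(3, 1, 1)
    part_effect = np.array([[0.0, 1.0, 3.0, -2.0]] * 3).reshape(3, 4, 1)
    reading_noise = np.tile([-0.2, 0.0, 0.2], (3, 4, 1))
    data = 10.0 + time_effect + part_effect + reading_noise

    within_part = np.ptp(data, axis=2).mean()                 # range across repeat readings
    part_to_part = np.ptp(data.mean(axis=2), axis=1).mean()   # range across parts
    time_to_time = np.ptp(data.mean(axis=(1, 2)))             # range across time periods

    # The largest family points to where the dominant input factors live.
    families = {"within": within_part, "part": part_to_part, "time": time_to_time}
    dominant = max(families, key=families.get)
    ```

    Here the part-to-part family clearly dominates, so the search for causes would focus on factors that differ between parts.
    
    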
     
    essegn and Miner like this.
  4. Miner

    Miner Moderator Staff Member

    I can second the recommendation on this book. Excellent thought processes.
     
  5. Bev D

    Bev D Moderator Staff Member

    An alternative approach is to plot all inputs and outputs and watch for correlation. This method has been dubbed EFAST (every factor at the same time) by Donald Wheeler. See his most recent article in Quality Digest: Input and Output Charts and EFAST Studies.

    If you aren't already recording the input factor results this approach can take longer than the approach outlined in the articles and books I posted above.
     
  6. Bev D

    Bev D Moderator Staff Member

    As an aside I don't like his use of the term EFAST. It's a cute play on OFAT (One Factor At a Time experiments) but FAST is a type of value analysis approach developed by Charles Bytheway. FAST stands for "Function Analysis Systems Technique". The New Science of Fixing Things (John Allen and David Hartshorne) coined the term E-FAST several years ago as an advancement on Bytheway's approach to deal with energy functions (effort-flow).
     
  7. ncwalker

    ncwalker Well-Known Member

    Joined:
    Sep 21, 2015
    Messages:
    261
    Likes Received:
    168
    Trophy Points:
    42
    Location:
    North Carolina
    I am going to add a couple of statements to help you choose a method from the excellent recommendations above.

    IN GENERAL - if you select the DOE route, you will get a much more accurate conclusion, BUT... this usually involves interrupting the process, during which you are not making product (money). That doesn't go over well with your operations department. And the interruption can be quite extensive if you have a lot of factors to test OR it takes a long time to set up the factors for each of your experiments.

    IN GENERAL - the E-FAST approach, or any other "observe and look for correlation" approach that does NOT interrupt the process, is good because, well, it doesn't interrupt the process. BUT it may take a long time to actually find the predictor. This is especially true if you are looking for something that has a low frequency of occurrence, because you are basically waiting for the bad thing to happen and hoping to catch the right inputs. If your cost of poor quality is high, this also won't go over well with your operations department.

    My recommendation would be to start with an E-FAST sort of study. Multi-Vari is my go-to. Something called "Process Search" can also yield good results. What you want to do is eliminate as many possible GROUPINGS of predictors as you can before you start interrupting the process with more formal DOE-type experiments. Then you conduct the formal DOE on the reduced set and really nail down your predictor/response relationship. Most times, this approach works.
     
    essegn and Bev D like this.
  8. essegn

    essegn Member

    Thank you guys for all your replies. I really appreciate this kind of experience sharing.
    I have tried Multiple Regression from the Assistant menu (limited to 5 variables) and then ANOVA - General Linear Model - Fit General Model (no limit on the number of variables).
    Both worked well for me and showed very similar results; moreover, the results were close to what I expected.

    Ncwalker, maybe my question is stupid, but what do you mean by E-FAST? I have tried to google the term, but without success.
     
  9. Bev D

    Bev D Moderator Staff Member

    See my post above regarding the article on EFAST studies by Donald Wheeler.
     
  10. essegn

    essegn Member

    Dear All,

    I am stuck with the interpretation of a regression analysis in Minitab. I would like to prove that process parameters have an effect on the process output.
    The collected data set consists of 4 continuous variables and 1 categorical X variable (sample size 37), but I picked 2 of them to check with regression analyses.

    Minitab - Stat - Regression - Regression - Fit Regression Model

    R-sq(adj) - 51.48%
    both parameters are statistically significant (p < 0.05)

    Minitab - Stat - Regression - Fit Line Plot

    R-sq(adj) - 43% - linear
    R-sq(adj) - 37% - quadratic


    Minitab - Assistant - Regression - Multiple Regression

    R-sq(adj) - 63%
    - 3 unusual X-values

    How would you interpret the results?
    Why is there a difference among the R-sq(adj) results?
    Would it be reasonable to fix these variables, or should I start from the beginning (to find other PIVs)?

    Many thanks for your inputs.

    Peter
     
  11. Miner

    Miner Moderator Staff Member

    I would have to see your analysis in Minitab to venture an opinion. The only thing I can tell you from this is that the Fitted Line Plot runs linear, quadratic, and cubic analyses with the R^2 for each, to allow you to make your own decision on which fits best. Also, the Fitted Line Plot can only handle one predictor at a time, while you mentioned using 2 predictors. Therefore, that R^2 is almost assuredly going to be smaller, since it only has one explanatory variable in the model. Your initial regression said both were significant.
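    The point that a single-predictor model reports a smaller R-sq(adj) can be shown with a quick simulation. A hedged Python/NumPy sketch on simulated data (not the data from this thread), using the same sample size of 37:

    ```python
    import numpy as np

    def adj_r_squared(X, y):
        """Ordinary least squares fit; returns adjusted R^2.
        X must already contain an intercept column."""
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        n, p = X.shape  # p counts the intercept column
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        return 1.0 - (ss_res / (n - p)) / (ss_tot / (n - 1))

    # Simulated data with two genuinely active predictors, n = 37.
    rng = np.random.default_rng(2)
    n = 37
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = x1 + x2 + rng.normal(0, 1.0, n)

    ones = np.ones(n)
    r2_single = adj_r_squared(np.column_stack([ones, x1]), y)     # one predictor
    r2_both = adj_r_squared(np.column_stack([ones, x1, x2]), y)   # both predictors

    # Dropping a real predictor pushes its variation into the error term,
    # so the single-predictor adjusted R^2 comes out smaller.
    ```
    
    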
     
    essegn likes this.
  12. essegn

    essegn Member

    Thank you for your reply. With the Fitted Line Plot, the results are as follows:

    1. predictor R-sq(adj) - 43% - linear
    2. predictor R-sq(adj) - 37% - quadratic

    What should I do so you can see the analysis from Minitab? Should I post screenshots here or upload the measured data?
    I do not understand why the differences among the results are so big.
     
  13. Miner

    Miner Moderator Staff Member

    You can upload your data in Excel. Your fitted line example suggests that the linear model is a better fit than the quadratic. However, I emphasize again that this is misleading since you have two significant factors. You must form your model using multiple regression.
     
  14. essegn

    essegn Member

    Hi Miner,

    Could you please analyse the attached data?
    As I mentioned earlier in this thread, I found that the two significant parameters are "Prozess" and "Totalzeit".

    Please let me know if my results are correct.

    Thank you.
     

    Attached File(s): 1. Scan for viruses before using. 2. Report any 'bad' files by reporting this post. 3. Use at your own Risk.:

  15. Miner

    Miner Moderator Staff Member

    I attached my analysis below. It is in Minitab 17 format. If you have an earlier version, I can also save it as version 16 or 15. I also attached a key graph.

    This was a fairly complicated analysis, as there was multicollinearity, several interactions, and quadratic relationships, as well as a very influential categorical variable. If you perform many experiments such as this, you should learn how to do multiple regression.
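    Multicollinearity of the kind mentioned above is commonly screened with variance inflation factors (VIFs). A minimal Python/NumPy sketch on made-up predictors (the variable names here are hypothetical, not from the attached data set):

    ```python
    import numpy as np

    def vif(X):
        """Variance inflation factors for the columns of X (no intercept column).
        VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
        column j on all the other columns."""
        X = np.asarray(X, dtype=float)
        n, k = X.shape
        ones = np.ones((n, 1))
        factors = []
        for j in range(k):
            y = X[:, j]
            others = np.column_stack([ones, np.delete(X, j, axis=1)])
            coef, *_ = np.linalg.lstsq(others, y, rcond=None)
            resid = y - others @ coef
            r2 = 1.0 - (resid @ resid) / (((y - y.mean()) ** 2).sum())
            factors.append(1.0 / (1.0 - r2))
        return factors

    # Made-up predictors: x2 nearly duplicates x1, so both show inflated VIFs;
    # x3 is independent of the others and stays near 1.
    rng = np.random.default_rng(3)
    x1 = rng.normal(size=50)
    x2 = x1 + rng.normal(0, 0.1, 50)
    x3 = rng.normal(size=50)
    vifs = vif(np.column_stack([x1, x2, x3]))
    ```

    A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity.
    
    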



    Note: Thanks Atul for allowing attachment of MPJ files.
     

    Attached File(s):

    Last edited: Mar 24, 2017
    Atul Khandekar likes this.
  16. Atul Khandekar

    Atul Khandekar Administrator Staff Member

    Joined:
    Jul 24, 2015
    Messages:
    376
    Likes Received:
    266
    Trophy Points:
    62
    Location:
    Pune, India
    You can zip and attach as .zip file. I also added .MPJ to attachment file types.
     
  17. essegn

    essegn Member

    Hi Miner,

    I really appreciate your help and the time you invested in my analysis. To be honest, I was able to open essegn.mpj directly from your attachment.
    I got exactly the same results as you did.

    It is kind of embarrassing, but the same attempt with Multiple Regression did not work before. Back then I chose the two significant parameters from Minitab - Stat - Regression - Regression - Fit Regression Model.
    Compared to the new results, one of those parameters was wrong anyway. Once more, many thanks for your help.

    This is not the first time that I have gotten confusing results from Minitab. I cannot remember the exact error message, but it was something like: the analysis cannot be performed with all variables because there are not enough measurements. As I already wrote, today I was able to perform this analysis with all the variables. Do you have any experience with that?

    The interaction between SiO2 and TopCoat is really strange, because the variables themselves are not significant at all. But I suppose this is an advantage of multiple regression.
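    An interaction can indeed be significant even when neither variable matters on its own. A small Python/NumPy illustration on simulated data (not the SiO2/TopCoat data): the response depends only on the product of the two predictors, so each predictor alone explains almost nothing.

    ```python
    import numpy as np

    def r_squared(predictors, y):
        """Plain R^2 from an OLS fit with an intercept column added."""
        X = np.column_stack([np.ones(len(y)), predictors])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        return 1.0 - (resid @ resid) / (((y - y.mean()) ** 2).sum())

    # Simulated response driven purely by the x1*x2 interaction.
    rng = np.random.default_rng(4)
    n = 200
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = x1 * x2 + rng.normal(0, 0.5, n)

    r2_mains = r_squared(np.column_stack([x1, x2]), y)              # main effects only
    r2_with_int = r_squared(np.column_stack([x1, x2, x1 * x2]), y)  # plus interaction

    # The main effects alone explain almost nothing; the interaction
    # term carries essentially all of the signal.
    ```
    
    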

    Peter
     
    Atul Khandekar likes this.