DOE with non parametric data

Discussion in 'DOE - Design of Experiments' started by mmmarta, May 16, 2023.

  1. mmmarta

    mmmarta Member

    Joined:
    Apr 12, 2023
    Messages:
    7
    Likes Received:
    0
    Trophy Points:
    1
    Hi all,

    We have to run a DOE (using Minitab), but we know our distribution will be non-parametric.
    Do you know if this could be an issue when analyzing the data, or do you have any recommendations?

    Thank you in advance,
    Marta
     
  2. Miner

    Miner Moderator Staff Member

    Joined:
    Jul 30, 2015
    Messages:
    576
    Likes Received:
    492
    Trophy Points:
    62
    Location:
    Greater Milwaukee USA
    Please explain what you mean by nonparametric, as the term is often used incorrectly. Do you mean that you do have interval or ratio data, but it is not normally distributed? Or do you mean that you do not have interval or ratio data?

    If you mean the former, the raw data does NOT need to be normally distributed. The actual assumption is that the RESIDUALS are normally distributed. Even this assumption is not critical with a larger dataset as ANOVA is very robust against this particular assumption.
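
A minimal sketch of the point above, using made-up data rather than anything from this thread: a hypothetical 2x2 full factorial with invented effect sizes and normally distributed noise. Pooled across runs, the raw response is clearly skewed because the factor effects move the mean around, yet the residuals (response minus fitted value) are symmetric. The design, effect sizes, and the `skewness` helper are all assumptions for illustration.

```python
import random
import statistics

random.seed(1)

def skewness(values):
    """Sample skewness: third standardized moment (about 0 for symmetric data)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

# Hypothetical 2x2 full factorial in coded units (-1/+1), 200 replicates per run.
design = [(-1, -1), (+1, -1), (-1, +1), (+1, +1)]
raw, residuals = [], []
for a, b in design:
    true_mean = 30 + 10 * a + 15 * b + 5 * a * b  # invented effects
    cell = [random.gauss(true_mean, 1.0) for _ in range(200)]
    cell_mean = statistics.fmean(cell)
    raw.extend(cell)
    # With all terms in the model, the fitted value of a run is its cell mean.
    residuals.extend(y - cell_mean for y in cell)

raw_skew = skewness(raw)
resid_skew = skewness(residuals)
print(f"skewness of pooled raw response: {raw_skew:.2f}")
print(f"skewness of residuals:           {resid_skew:.2f}")
```

A normality test on the pooled raw response would fail here even though the model assumption is satisfied, which is why the residuals are the thing to check.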
     
  3. mmmarta

    mmmarta Member

    Hi Miner,

    Thank you for your reply. I will try to explain my problem better.

    Each DOE run returns a distribution, not a single value.
    When we perform a DOE run, the output distribution does not fit any "classic" distribution (we ran a goodness-of-fit test).

    My concern is also which statistic of the distribution I should use as the output of a run.
    For example, if my output distribution were Gaussian, I know the mean would be a good statistic to characterize it. In my case, I am not sure which parameter of the output is best for comparing the runs. I can imagine that this depends on the particular distribution, but I would like to have your opinion on that.

    I hope I have been clear. Thank you in advance,
    Marta
     

    Attached File(s): 1. Scan for viruses before using. 2. Report any 'bad' files by reporting this post. 3. Use at your own Risk.:

  4. Miner

    Miner Moderator Staff Member

    You have several options depending on your needs, but you will have to weigh the pros/cons of each option.
    • Central tendency: mean, median or mode. I would probably select median, but do not have the context to better shape this decision.
    • Spread: If this data is typical, it fits a 3-parameter Weibull or 3-parameter Gamma distribution fairly well, so the Scale parameter could be used to quantify variation or spread.
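
The two options above can be sketched as follows, outside Minitab and with simulated data. Each run's output distribution is reduced to one location value (the median) and one spread value, and those per-run summaries become the DOE responses. Note the swap: instead of the fitted Weibull or Gamma Scale parameter, this sketch uses the interquartile range as a distribution-free stand-in for spread. The 2x2 design, the shifted-Weibull generator, and the function names are hypothetical.

```python
import random
import statistics

random.seed(2)

def summarize_run(samples):
    """Reduce one run's output distribution to a location and a spread value."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return {"median": statistics.median(samples), "iqr": q3 - q1}

def simulate_run(a, b, n=2000):
    """Invented process: factor A shifts the threshold, factor B changes the scale."""
    threshold = 5 + 2 * a
    scale = 3 + 1 * b
    return [threshold + random.weibullvariate(scale, 1.5) for _ in range(n)]

design = [(-1, -1), (+1, -1), (-1, +1), (+1, +1)]
summaries = [summarize_run(simulate_run(a, b)) for a, b in design]

def main_effect(column, response):
    """Mean response at the +1 level minus mean response at the -1 level."""
    high = [s[response] for run, s in zip(design, summaries) if run[column] == +1]
    low = [s[response] for run, s in zip(design, summaries) if run[column] == -1]
    return statistics.fmean(high) - statistics.fmean(low)

effect_a_median = main_effect(0, "median")
effect_b_median = main_effect(1, "median")
effect_a_iqr = main_effect(0, "iqr")
effect_b_iqr = main_effect(1, "iqr")
print(f"A on median: {effect_a_median:.2f}, B on median: {effect_b_median:.2f}")
print(f"A on IQR:    {effect_a_iqr:.2f}, B on IQR:    {effect_b_iqr:.2f}")
```

Analyzing location and spread as two separate responses shows which factors shift the output and which ones widen it; in this invented process, A mostly moves the median while B mostly changes the spread.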
     
  5. mmmarta

    mmmarta Member

    Thank you Miner!

    Just another question regarding the goodness of fit of the data I sent you.
    Usually, when the p-value is not higher than 0.05, I conclude that the null hypothesis is rejected, i.e. the distribution does not fit the data. This also happens for the 3-parameter Weibull and 3-parameter Gamma distributions in Minitab. Do you also look at the AD value to decide whether the distribution fits the data? What is the relation between the AD value and the p-value?

    Thank you!
    Marta
     
  6. Miner

    Miner Moderator Staff Member

    Any goodness-of-fit test will start failing when you have extremely large sample sizes, as you have here, because even trivial departures from the fitted distribution become statistically significant. I made the determination primarily visually, using the probability plots. AD stands for the Anderson-Darling statistic, which is a smaller-the-better statistic: the smaller the AD value, the closer the fit, and the larger the p-value. Below is an example from randomly generated normal data of the same sample size. Measured data will often fail more frequently because of the resolution of the measurement device. This can be seen in the chunkiness of the data in the picture below.

    upload_2023-5-17_10-14-5.png upload_2023-5-17_10-17-47.png
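
The sample-size and resolution effects described above can be reproduced with simulated data. The sketch below computes the Anderson-Darling statistic for a normal fit by hand, so it is not Minitab's implementation, and the 0.5-unit gauge resolution is an invented value. It applies the usual small-sample adjustment for the case where mean and standard deviation are estimated from the data; for that adjusted statistic, the commonly tabulated 5% critical value is about 0.752, and larger AD values correspond to smaller p-values.

```python
import math
import random
import statistics

def anderson_darling_normal(values):
    """Adjusted A-squared for a normal fit with mean and sd estimated from the data."""
    xs = sorted(values)
    n = len(xs)
    fitted = statistics.NormalDist(statistics.fmean(xs), statistics.stdev(xs))
    cdf = [fitted.cdf(x) for x in xs]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Small-sample adjustment used when both parameters are estimated.
    return a2 * (1 + 0.75 / n + 2.25 / n ** 2)

random.seed(3)
exact = [random.gauss(100, 1) for _ in range(5000)]
# Same values as read from a hypothetical gauge with 0.5-unit resolution.
rounded = [round(x * 2) / 2 for x in exact]

ad_exact = anderson_darling_normal(exact)
ad_rounded_small = anderson_darling_normal(rounded[:50])
ad_rounded_large = anderson_darling_normal(rounded)
print(f"exact normal data, n=5000: AD = {ad_exact:.2f}")
print(f"rounded data,      n=50:   AD = {ad_rounded_small:.2f}")
print(f"rounded data,      n=5000: AD = {ad_rounded_large:.2f}")
```

The rounded values come from exactly the same normal process, yet at n=5000 the chunkiness alone pushes AD far above the critical value, while the same rounding at n=50 produces a much smaller statistic. This is why a probability plot is a more useful judge of fit at very large sample sizes.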
     
  7. mmmarta

    mmmarta Member

    Thank you very much for the help!