
R-sq vs. R-sq (pred)

Discussion in 'Capability - Process, Machine, Gage …' started by essegn, Apr 11, 2020.

  1. essegn

    essegn Member

    Joined:
    Feb 5, 2016
    Messages:
    47
    Likes Received:
    4
    Trophy Points:
    7
I have done a quick check of a fairly big data set to see whether it is worth spending time on a deeper analysis.
I would like to know whether any predictor(s) are able to explain the response.

I chose PLS Regression with cross-validation.
I included all the predictors without checking significance, and without 2-way interactions.

The results are:
R-sq: 65% with 4 components
R-sq (pred): 0%

R-sq is the percentage of variation in the response that is explained by the model.
R-sq (pred) - predicted R-sq: how well the model predicts the response for new observations.

    ---------

What does it mean for the process?
With this data I can explain 65% of the variance in the response.
But the predictive ability is 0%, or close to 0%.

How could such information be used?
OK, 65% can be explained, but if the predictive ability is 0%, is there no point in spending time on process improvement (with the current data)?
     
  2. Miner

    Miner Moderator Staff Member

    Joined:
    Jul 30, 2015
    Messages:
    376
    Likes Received:
    282
    Trophy Points:
    62
    Location:
    Greater Milwaukee USA
    There are three versions of R^2 used in regression to assess the adequacy of the model:
    • R^2: As you stated, R^2 is the percentage of variation explained by the model. However, it has a very serious drawback: as you add terms to the model, R^2 will continue to increase even when the terms add no value.
    • R^2 (adj): Enter R^2 (adjusted). The adjustment penalizes you for adding terms. Therefore, it only increases when the added term adds value; otherwise, it will decrease. However, you can still add a term that appears to add value but actually creates a more complicated model that over-fits the data.
    • R^2 (pred): Enter R^2 (predicted). This protects against over-fitting your model, which is what you appear to have done. It uses a portion of your data to predict the unused portion and assesses the quality of that prediction. In your case, choose a simpler model with fewer terms. Your results may also be this poor due to experimental noise, measurement variation, missing terms, etc.
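The three bullets can be illustrated with ordinary least squares (a sketch assuming scikit-learn and synthetic data, not the original analysis): adding junk polynomial terms pushes R^2 up, while the leave-one-out R^2 (pred), computed from the PRESS statistic, collapses.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(1)
n = 20
x = rng.uniform(0, 1, n)
y = 2.0 * x + rng.normal(scale=0.2, size=n)  # the true relationship is linear

def r2_and_r2pred(degree):
    # Polynomial terms x, x^2, ..., x^degree (intercept added by the model)
    X = np.vander(x, degree + 1, increasing=True)[:, 1:]
    r2 = LinearRegression().fit(X, y).score(X, y)
    # Leave-one-out predictions -> PRESS -> R-sq (pred)
    y_loo = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
    press = np.sum((y - y_loo) ** 2)
    r2_pred = 1 - press / np.sum((y - y.mean()) ** 2)
    return r2, r2_pred

r2_lin, pred_lin = r2_and_r2pred(1)    # the honest one-term model
r2_poly, pred_poly = r2_and_r2pred(9)  # eight junk terms added
# R^2 never drops as terms are added, but R^2 (pred) collapses.
```

Comparing the two pairs shows the asymmetry Miner describes: `r2_poly` is at least as high as `r2_lin`, while `pred_poly` falls well below `pred_lin`.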
     
    essegn likes this.
  3. essegn

    essegn Member

    Thank you, Miner.
    You are right. I forgot to delete one outlier; this is why the prediction was so poor.
     
  4. Miner

    Miner Moderator Staff Member

    Do you know the reason for the outlier? Bad measurement, transposed digits? If you don't know the reason, you should be careful about removing an outlier.
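One common way to flag such a point before deciding what to do with it is a robust z-score based on the median and the MAD, which a single outlier cannot drag around the way the mean and standard deviation can. A minimal sketch (illustrative numbers only, not the actual data; the 3.5 cutoff is a common rule of thumb):

```python
import numpy as np

# Illustrative values only: one entry typed as 105 instead of ~1050.
y = np.array([1040.0, 1052.0, 1047.0, 1061.0, 105.0, 1055.0])

# Robust z-score: median and MAD instead of mean and SD.
med = np.median(y)
mad = np.median(np.abs(y - med))
z = (y - med) / (1.4826 * mad)  # 1.4826 makes the MAD consistent with the SD

suspects = np.where(np.abs(z) > 3.5)[0]
print(suspects)  # flags only the 105 entry (index 4)
```

This only identifies the suspect; as noted above, the reason for the value still has to be traced back to the measurement record before removing it.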
     
  5. essegn

    essegn Member

    I suspect that the outlier was caused by a typo: 105 entered instead of ca. 1050. However, I still need to compare the value with the measurement file.
     
