# R-sq vs. R-sq (pred)

Discussion in 'Capability - Process, Machine, Gage …' started by essegn, Apr 11, 2020.

1. ### essegn (Member)

I have quick-checked a fairly big data set to see whether it is worth spending time on a deeper analysis.
I would like to know whether any predictor(s) are able to explain the response.

I chose PLS regression with cross-validation.
I included all the predictors without checking significance, and without 2-way interactions.

The results are:
• R-sq: 65% (with 4 components)
• R-sq (pred): 0%

R-sq is the percentage of variation in the response that is explained by the model.
R-sq (pred) is the predicted R-sq from the cross-validation.

---------

What does it mean for the process?
With the data I can explain 65% of the variance in the response.
But the predictive ability is 0%, or near 0%.

How could such information be used?
Okay, 65% can be explained, but if the predictive ability is 0%, is there no point in spending time on process improvement (with the actual data)?
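That gap is easy to reproduce on invented data. Below is a minimal Python sketch, using plain least squares with leave-one-out cross-validation as a stand-in for Minitab's PLS (the data, seed, and function names are made up for illustration): a model can post a respectable in-sample R-sq while its predicted R-sq sits at or below zero.

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented data: 20 observations, 10 pure-noise predictors.
n, p = 20, 10
X = rng.normal(size=(n, p))
y = rng.normal(size=n)  # response unrelated to any predictor

def fit_r2(X, y):
    """In-sample R-sq of an ordinary least-squares fit (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sstot = (y - y.mean()) @ (y - y.mean())
    return 1 - resid @ resid / sstot

def pred_r2(X, y):
    """Predicted R-sq: leave each point out, refit, predict it (PRESS)."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        A = np.column_stack([np.ones(keep.sum()), X[keep]])
        beta, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        press += (y[i] - np.concatenate([[1.0], X[i]]) @ beta) ** 2
    sstot = (y - y.mean()) @ (y - y.mean())
    return 1 - press / sstot

print(f"R-sq        = {fit_r2(X, y):.1%}")   # typically sizeable despite pure noise
print(f"R-sq (pred) = {pred_r2(X, y):.1%}")  # typically at or below zero
```

Note that predicted R-sq can come out negative (PRESS larger than the total sum of squares); some packages simply report it as 0%.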

2. ### Miner (Moderator, Staff Member)

There are three versions of R^2 used in regression to assess the adequacy of the model:
• R^2: As you stated, R^2 is the percentage of variation explained by the model. However, it has a very serious drawback: as you add terms to the model, R^2 will continue to increase even when those terms add no value.
• R^2 (adj): Enter R^2 (adjusted). The adjustment penalizes you for adding terms, so it only increases when the added term adds value; otherwise, it decreases. However, you can still add a term that appears to add value but actually creates a more complicated model that over-fits the data.
• R^2 (pred): Enter R^2 (predicted). This protects against over-fitting your model, which is what you appear to have done. It uses a portion of your data to predict the unused portion and assesses the quality of the prediction. In your case, choose a simpler model with fewer terms. Your results may also be so poor due to experimental noise, measurement variation, missing terms, etc.
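For an ordinary least-squares fit (simpler than PLS, but the same idea), all three statistics can be computed in a few lines. A sketch under those assumptions; `regression_r2_metrics` is an invented helper name, and predicted R^2 uses the standard hat-matrix shortcut for leave-one-out residuals, e_i / (1 - h_ii):

```python
import numpy as np

def regression_r2_metrics(X, y):
    """R^2, adjusted R^2, and predicted R^2 for an OLS fit with intercept.

    Predicted R^2 is based on the PRESS statistic, computed with the
    hat-matrix shortcut: the leave-one-out residual is e_i / (1 - h_ii).
    """
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    H = A @ np.linalg.pinv(A)          # hat (projection) matrix
    e = y - H @ y                      # ordinary residuals
    sstot = (y - y.mean()) @ (y - y.mean())
    sse = e @ e
    press = np.sum((e / (1 - np.diag(H))) ** 2)
    r2 = 1 - sse / sstot
    r2_adj = 1 - (sse / (n - p - 1)) / (sstot / (n - 1))
    r2_pred = 1 - press / sstot
    return r2, r2_adj, r2_pred
```

Because each leave-one-out residual is at least as large in magnitude as the ordinary residual, PRESS >= SSE, so R^2 (pred) can never exceed R^2, and it goes negative when PRESS exceeds the total sum of squares.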

3. ### essegn (Member)

Thank you Miner.
You are right. I forgot to delete one outlier, which is why the prediction was so poor.

4. ### Miner (Moderator, Staff Member)

Do you know the reason for the outlier? Bad measurement, transposed digits? If you don't know the reason, you should be careful about removing an outlier.
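One way to make that call less subjective is to quantify how unusual the point is before deleting it. A hedged sketch using internally studentized residuals (the data and the `flag_outliers` helper are invented, and the |t| > 3 cutoff is a common rule of thumb, not a verdict):

```python
import numpy as np

def flag_outliers(X, y, threshold=3.0):
    """Indices whose internally studentized residual exceeds `threshold`."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    H = A @ np.linalg.pinv(A)        # hat (projection) matrix
    e = y - H @ y                    # ordinary residuals
    h = np.diag(H)                   # leverages
    mse = e @ e / (n - A.shape[1])
    t = e / np.sqrt(mse * (1 - h))   # internally studentized residuals
    return np.flatnonzero(np.abs(t) > threshold)
```

A flagged point still deserves a root-cause check (bad measurement, transposed digits) before it is dropped.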
