Shrinkage (statistics)

In statistics, shrinkage is the reduction in the effects of sampling variation. In regression analysis, a fitted relationship appears to perform less well on a new data set than on the data set used for fitting. In particular, the value of the coefficient of determination 'shrinks'. This idea is complementary to overfitting and, separately, to the standard adjustment made in the coefficient of determination to compensate for the subjunctive effects of further sampling, like controlling for the potential of new explanatory terms improving the model by chance: that is, the adjustment formula itself provides "shrinkage". But the adjustment formula yields an artificial shrinkage.

A shrinkage estimator is an estimator that, either explicitly or implicitly, incorporates the effects of shrinkage. In loose terms this means that a naive or raw estimate is improved by combining it with other information. The term relates to the notion that the improved estimate is made closer to the value supplied by the 'other information' than the raw estimate. In this sense, shrinkage is used to regularize ill-posed inference problems.

Shrinkage is implicit in Bayesian inference and penalized likelihood inference, and explicit in James–Stein-type inference. In contrast, simple types of maximum-likelihood and least-squares estimation procedures do not include shrinkage effects, although they can be used within shrinkage estimation schemes.

Description

Many standard estimators can be improved, in terms of mean squared error (MSE), by shrinking them towards zero (or any other finite constant value). In other words, the improvement in the estimate from the corresponding reduction in the width of the confidence interval can outweigh the worsening of the estimate introduced by biasing the estimate towards zero (see bias-variance tradeoff).

Assume that the expected value of the raw estimate is not zero and consider other estimators obtained by multiplying the raw estimate by a certain parameter. A value for this parameter can be specified so as to minimize the MSE of the new estimate. For this value of the parameter, the new estimate will have a smaller MSE than the raw one. Thus it has been improved. An effect here may be to convert an unbiased raw estimate to an improved biased one.

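As a minimal numerical sketch of this argument (assuming a normally distributed raw estimate with known mean and sampling variance, and using Python with NumPy; all values are illustrative): if θ is the true value and V is the variance of the raw estimate, the estimator obtained by multiplying the raw estimate by a has MSE a²V + (a − 1)²θ², which is minimized at a = θ² / (θ² + V).

    # Minimal sketch: scaling an unbiased estimate towards zero can reduce its MSE.
    import numpy as np

    rng = np.random.default_rng(0)

    theta = 2.0          # true value being estimated (assumed known for the demo)
    sampling_var = 4.0   # variance of the raw, unbiased estimator
    n_trials = 100_000

    # One raw estimate per simulated experiment.
    raw = rng.normal(loc=theta, scale=np.sqrt(sampling_var), size=n_trials)

    # MSE of the scaled estimator a*raw is a^2 * var + (a - 1)^2 * theta^2,
    # minimized at a = theta^2 / (theta^2 + var).
    a_opt = theta**2 / (theta**2 + sampling_var)
    shrunk = a_opt * raw

    print(f"optimal multiplier a = {a_opt:.3f}")
    print(f"MSE raw    = {np.mean((raw - theta) ** 2):.3f}")     # about sampling_var
    print(f"MSE shrunk = {np.mean((shrunk - theta) ** 2):.3f}")  # smaller, despite the bias

In practice the true value θ is unknown, so the shrinkage factor must itself be estimated from the data; the simulation only illustrates that such a factor exists and that it trades a small bias for a larger reduction in variance.
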
Examples

A well-known example arises in the estimation of the population variance by the sample variance. For a sample size of n, the use of a divisor n − 1 in the usual formula (Bessel's correction) gives an unbiased estimator, while other divisors have lower MSE, at the expense of bias. The optimal choice of divisor (weighting of shrinkage) depends on the excess kurtosis of the population, as discussed at mean squared error: variance, but one can always do better (in terms of MSE) than the unbiased estimator; for the normal distribution a divisor of n + 1 gives one which has the minimum mean squared error.

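This divisor comparison can be checked numerically. The following sketch (Python with NumPy; the sample size and population variance are arbitrary choices for the illustration) estimates the MSE of the variance estimator with divisors n − 1, n, and n + 1 on repeated normal samples:

    # Empirical MSE of the variance estimator sum((x - xbar)^2) / d
    # for divisors d = n - 1, n, n + 1, using normally distributed samples.
    import numpy as np

    rng = np.random.default_rng(1)

    n = 10               # sample size
    true_var = 9.0       # variance of the simulated population
    n_trials = 200_000

    samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(n_trials, n))
    ss = np.sum((samples - samples.mean(axis=1, keepdims=True)) ** 2, axis=1)

    for divisor in (n - 1, n, n + 1):
        estimates = ss / divisor
        mse = np.mean((estimates - true_var) ** 2)
        bias = np.mean(estimates) - true_var
        print(f"divisor {divisor:>2}: MSE = {mse:7.3f}, bias = {bias:+.3f}")

    # Expected pattern for normal data: the divisor n - 1 is unbiased but has the
    # largest MSE of the three, while n + 1 has the smallest MSE.
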
Methods

Types of regression that involve shrinkage estimates include ridge regression, where coefficients derived from a regular least squares regression are brought closer to zero by multiplying by a constant (the shrinkage factor), and lasso regression, where coefficients are brought closer to zero by adding or subtracting a constant.

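The two shrinkage patterns can be made concrete in the simplified case of an orthonormal design matrix, where both estimators have closed forms (a sketch in Python with NumPy; the coefficients and penalty value are illustrative assumptions):

    # How ridge and lasso act on least squares coefficients when X'X = I:
    # ridge multiplies every coefficient by 1 / (1 + lam), while the lasso
    # soft-thresholds each coefficient by lam / 2 (penalty written as lam * sum|b|).
    import numpy as np

    ols_coefs = np.array([4.0, -2.5, 0.8, -0.3, 0.05])
    lam = 1.0  # arbitrary penalty strength for the illustration

    ridge_coefs = ols_coefs / (1.0 + lam)  # proportional shrinkage
    lasso_coefs = np.sign(ols_coefs) * np.maximum(np.abs(ols_coefs) - lam / 2, 0.0)

    print("OLS:  ", ols_coefs)
    print("ridge:", ridge_coefs)   # every coefficient pulled towards zero by the same factor
    print("lasso:", lasso_coefs)   # small coefficients are set exactly to zero

For a general design matrix these closed forms no longer hold exactly, but the qualitative contrast remains: ridge shrinks coefficients smoothly towards zero, while the lasso can set small coefficients exactly to zero.
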
The use of shrinkage estimators in the context of regression analysis, where there may be a large number of explanatory variables, has been described by Copas. Here the values of the estimated regression coefficients are shrunk towards zero with the effect of reducing the mean square error of the predicted values from the model when applied to new data. A later paper by Copas applies shrinkage in a context where the problem is to predict a binary response on the basis of binary explanatory variables.

Hausser and Strimmer "develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, ...it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. ...method is fully analytic and hence computationally inexpensive. Moreover, ...procedure simultaneously provides estimates of the entropy and of the cell frequencies. ...The proposed shrinkage estimators of entropy and mutual information, as well as all other investigated entropy estimators, have been implemented in R (R Development Core Team, 2008). A corresponding R package 'entropy' was deposited in the R archive CRAN and is accessible at the URL https://cran.r-project.org/web/packages/entropy/ under the GNU General Public License."

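The general James-Stein shrinkage idea behind such an entropy estimator can be sketched as follows (in Python with NumPy rather than R): the observed cell frequencies are shrunk towards the uniform distribution with a data-driven intensity and then plugged into the entropy formula. The intensity formula below follows the generic James-Stein pattern and is an illustration only; it is not the code of the "entropy" R package, whose details may differ.

    # Sketch of a James-Stein-type shrinkage estimate of entropy from count data.
    import numpy as np

    def shrinkage_entropy(counts):
        counts = np.asarray(counts, dtype=float)
        n = counts.sum()                  # total number of observations
        p = counts.size                   # number of cells
        theta_ml = counts / n             # maximum-likelihood cell frequencies
        target = np.full(p, 1.0 / p)      # shrinkage target: the uniform distribution

        # Data-driven shrinkage intensity (generic James-Stein form: roughly the
        # estimated variance of the raw frequencies over their squared distance
        # from the target), clipped to [0, 1].
        denom = (n - 1.0) * np.sum((target - theta_ml) ** 2)
        lam = 1.0 if denom == 0 else min(1.0, max(0.0, (1.0 - np.sum(theta_ml ** 2)) / denom))

        theta_shrink = lam * target + (1.0 - lam) * theta_ml
        nonzero = theta_shrink > 0
        entropy = -np.sum(theta_shrink[nonzero] * np.log(theta_shrink[nonzero]))
        return entropy, lam

    # Example: a severely undersampled distribution over 20 cells.
    counts = [6, 3, 1] + [0] * 17
    h, lam = shrinkage_entropy(counts)
    print(f"shrinkage intensity = {lam:.3f}, entropy estimate = {h:.3f} nats")

In this sketch the unobserved cells receive a small positive frequency, which illustrates how such a procedure returns cell-frequency estimates alongside the entropy value.
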
Statistical software

Hausser, Jean. "entropy". entropy package for R. Retrieved 2013-03-23.

See also

Additive smoothing
Boosting (machine learning)
Chapman estimator
Decision stump
Principal component regression
Regularization (mathematics)
Shrinkage estimation in the estimation of covariance matrices
Stein's example
Tikhonov regularization

References

Everitt, B.S. (2002). Cambridge Dictionary of Statistics (2nd ed.). Cambridge University Press. ISBN 0-521-81099-X.
Copas, J.B. (1983). "Regression, Prediction and Shrinkage". Journal of the Royal Statistical Society, Series B. 45 (3): 311–354. JSTOR 2345402. MR 0737642.
Copas, J.B. (1993). "The shrinkage of point scoring methods". Journal of the Royal Statistical Society, Series C. 42 (2): 315–331. JSTOR 2986235.
Hausser, Jean; Strimmer (2009). "Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks" (PDF). Journal of Machine Learning Research. 10: 1469–1484.
