Knowledge

Gittins index

Source πŸ“

67:. He first takes the "Scheduling Problem" and reduces it to a machine which has to perform jobs and has a set time period, every hour or day for example, to finish each job in. The machine is given a reward value, based on finishing or not within the time period, and a probability value of whether it will finish or not for each job is calculated. The problem is "to decide which job to process next at each stage so as to maximize the total expected reward." He then moves on to the "Multi–armed bandit problem" where each pull on a " 2264:
significantly. A problem setting to prevent such early terminations consists of defining the optimization as maximization of the future ratio seen by each state. An indexation is conjectured to exist for this problem, be computable as simple variation on existing restart-in-state or state elimination algorithms and evaluated to work well in practice.
425: 99:
with a reward function and with a probability of termination. It is a measure of the reward that can be achieved by the process evolving from that state on, under the probability that it will be terminated in future. The "index policy" induced by the Gittins index, consisting of choosing at any time
2255:
While conventional Gittins indices induce a policy to optimize the accrual of a reward, a common problem setting consists of optimizing the ratio of accrued rewards. For example, this is a case for systems to maximize bandwidth, consisting of data over time, or minimize power consumption, consisting
199:
algorithm. Cowan, W. and Katehakis (2014), provide a solution to the problem, with potentially non-Markovian, uncountable state space reward processes, under frameworks in which, either the discount factors may be non-uniform and vary over time, or the periods of activation of each bandit may be not
172:
could be used to calculate the index of a state without requiring its calculation for all states with higher index values. LCM Kallenberg provided a parametric LP implementation to compute the indices for all states of a Markov chain. Further, Katehakis and Veinott demonstrated that the index is
106:
such as the one of dynamic allocation, where a decision-maker has to maximize the total reward by distributing a limited amount of effort to a number of competing projects, each returning a stochastic reward. If the projects are independent from each other and only one project at a time may evolve,
40:
To illustrate the theory we can take two examples from a developing sector, such as from electricity generating technologies: wind power and wave power. If we are presented with the two technologies when they are both proposed as ideas we cannot say which will be better in the long run as we have no
2259:
This class of problems is different from the optimization of a semi-Markov reward process, because the latter one might select states with a disproportionate sojourn time just for accruing a higher reward. Instead, it corresponds to the class of linear-fractional markov reward optimization problem.
44:
If we were to make a judgment call at that early time in development we could be condemning one technology to being put on the shelf and the other would be developed and put into operation. If we develop both technologies we would be able to make a judgment call on each by comparing the progress of
75:
and has an unknown probability of success. There are multiple "bandits" and the distribution of successful pulls is calculated and different for each machine. Gittins states that the problem here is "to decide which arm to pull next at each stage so as to maximize the total expected reward from an
24:
with certain properties, namely: the process has an ultimate termination state and evolves with an option, at each intermediate state, of terminating. Upon terminating at a given state, the reward achieved is the sum of the probabilistic expected rewards associated with every state from the actual
189:
algorithm. This approach also has the advantage of calculating the index for one specific state without having to calculate all the greater indexes and it is valid under more general state space conditions. A faster algorithm for the calculation of all indices was obtained in 2004 by Sonin as a
2263:
However, a detrimental aspect of such ratio optimizations is that, once the achieved ratio in some state is high, the optimization might select states leading to a low ratio because they bear a high probability of termination, so that the process is likely to terminate before the ratio drops
194:
for the optimal stopping of a Markov chain. In this algorithm the termination probability of the process may depend on the current state rather than being a fixed factor. A faster algorithm was proposed in 2007 by NiΓ±o-Mora by exploiting the structure of a parametric simplex to reduce the
2246:
In queueing theory, Gittins index is used to determine the optimal scheduling of jobs, e.g., in an M/G/1 queue. The mean completion time of jobs under a Gittins index schedule can be determined using the SOAP approach. Note that the dynamics of the queue are intrinsically Markovian, and
136:
and his collaborators demonstrated in a Markovian framework that the optimal solution of the general case is an index policy whose "dynamic allocation index" is computable in principle for every state of each project as a function of the single project's dynamics. In parallel to Gittins,
970: 219: 41:
data, as yet, to base our judgments on. It would be easy to say that wave power would be too problematic to develop as it seems easier to put up many wind turbines than to make the long floating generators, tow them out to sea and lay the cables necessary.
1338: 1872: 131:
Questions about the optimal stopping policies in the context of clinical trials have been open from the 1940s and in the 1960s a few authors analyzed simple models leading to optimal index policies, but it was only in the 1970s that
1746: 1619: 79:
Gittins says that "Both the problems described above involve a sequence of decisions, each of which is based on more information than its predecessors, and these both problems may be tackled by dynamic allocation indices."
653: 200:
be fixed or uniform, subject instead to a possibly stochastic duration of activation before a change to a different bandit is allowed. The solution is based on generalized restart-in-state indices.
2247:
stochasticity is due to the arrival and service processes. This is in contrast to most of the works in the learning literature, where stochasticity is explicitly accounted through a noise term.
2175: 168:
was first addressed by Varaiya and his collaborators with an algorithm that computes the indexes from the largest first down to the smallest and by Chen and Katehakis who showed that standard
815: 420:{\displaystyle \nu (i)=\sup _{\tau >0}{\frac {\left\langle \sum _{t=0}^{\tau -1}\beta ^{t}R\right\rangle _{Z(0)=i}}{\left\langle \sum _{t=0}^{\tau -1}\beta ^{t}\right\rangle _{Z(0)=i}}}} 565: 1969: 1038: 766: 2118: 1204: 532: 2207: 1904: 1522: 1473: 1435: 1075: 457: 2236: 1998: 1752: 804: 1388: 1176: 1126: 676: 2056: 2027: 1361: 486: 1493: 1196: 1149: 1099: 506: 1632: 45:
each technology at a set time interval such as every three months. The decisions we make about investment in the next stage would be based on those results.
2482: 1530: 2936: 2900: 2848: 2791: 577: 71:" lever is allocated a reward function for a successful pull, and a zero reward for an unsuccessful pull. The sequence of successes forms a 119:
and the Gittins index policy is a known good heuristic but no optimal solution exists in general. In fact, in general this problem is
2865:
Scully, Ziv and Harchol-Balter, Mor and Scheller-Wolf, Alan (2018). "SOAP: One Clean Analysis of All Age-Based Scheduling Policies".
2756:
Scully, Ziv and Harchol-Balter, Mor and Scheller-Wolf, Alan (2018). "SOAP: One Clean Analysis of All Age-Based Scheduling Policies".
2960: 2920: 2806: 965:{\displaystyle v(i,k)=\sup _{\tau >0}\left\langle \sum _{t=0}^{\tau -1}\beta ^{t}R+\beta ^{t}k\right\rangle _{Z(0)=i}} 3007: 2603: 2571: 2542: 2509:
Varaiya, P.; Walrand, J.; Buyukkoc, C. (May 1985). "Extensions of the multiarmed bandit problem: The discounted case".
2124: 2678:
Ni, Mora J (2007). "A (2/3)^n Fast-Pivoting Algorithm for the Gittins Index and Optimal Stopping of a Markov Chain".
2569:
Kallenberg, Lodewijk C. M. (1986). "A Note on M. N. Katehakis' and Y.-R. Chen's Computation of the Gittins Index".
2389:
Gittins, J. C.; Jones, D. M. (1979). "A Dynamic Allocation Index for the Discounted Multiarmed Bandit Problem".
537: 3124: 1909: 2952: 2477: 981: 700: 145: 115:
problems) and the Gittins index policy is optimal. If multiple projects can evolve, the problem is called
2064: 3129: 1333:{\displaystyle h(i)=\sup _{\pi }\left\langle \sum _{t=0}^{\tau -1}\beta ^{t}R\right\rangle _{Z(0)=i}} 3119: 2908: 2842: 2692: 3005:; Veinott, Arthur F. Jr. (1987). "The multi-armed bandit problem: decomposition and computation". 2601:; Veinott, Arthur F. Jr. (1987). "The multi-armed bandit problem: decomposition and computation". 511: 174: 3075: 2651:
Sonin I (2008). "A generalized Gittins index for a Markov chain and its recursive calculation".
2183: 1882: 1498: 1081:
and Veinott (1987) associates to every state the action of restarting from one arbitrary state
2687: 1867:{\displaystyle Q^{\tau }(i)=\left\langle 1-\prod _{t=0}^{\tau -1}\beta \right\rangle _{Z(0)=i}} 1449: 1396: 1051: 433: 56: 2894: 2785: 2287:
Cowan, Robin (July 1991). "Tortoises and Hares: Choice among technologies of unknown merit".
2212: 1974: 112: 774: 160:
and conjectured that the same index would be a good heuristic in a more general setup named
2864: 2755: 1366: 1154: 1104: 679: 661: 196: 92: 29: 2032: 2003: 1346: 694:
The dynamic programming formulation in terms of retirement process, given by Whittle, is:
462: 8: 195:
computational effort of the pivot steps and thereby achieving the same complexity as the
153: 149: 100:
the stochastic process with the currently highest Gittins index, is the solution of some
3081:
Cowan, Robin (1991). "Tortoises and Hares: Choice Among Technologies of Unknown Merit".
3098: 3032: 3024: 2990: 2930: 2882: 2816: 2773: 2705: 2628: 2620: 2495: 2459: 2451: 2406: 2352: 2344: 2340: 2304: 1741:{\displaystyle R^{\tau }(i)=\left\langle \sum _{t=0}^{\tau -1}R\right\rangle _{Z(0)=i}} 1478: 1181: 1134: 1084: 491: 169: 108: 96: 60: 21: 55:
suggests a solution for problems such as this. He takes the two basic functions of a "
3045: 3002: 2956: 2916: 2886: 2777: 2709: 2598: 2537: 1078: 72: 2463: 2370:
Mitten L (1960). "An Analytic Solution to the Least Cost Testing Sequence Problem".
2356: 3090: 3057: 3016: 2982: 2970: 2874: 2765: 2736: 2697: 2660: 2612: 2580: 2551: 2518: 2491: 2441: 2433: 2398: 2336: 2296: 102: 3036: 2632: 2944: 1614:{\displaystyle \alpha (i)=\sup _{\tau >0}{\frac {R^{\tau }(i)}{Q^{\tau }(i)}}} 138: 133: 120: 52: 2915:. Monographs on Statistics and Applied Probability. London: Chapman & Hall. 2826: 2812: 3061: 2986: 2741: 2724: 2664: 3113: 2522: 25:
terminating state to the ultimate terminal state, inclusive. The index is a
2804: 2701: 2540:(1986). "Linear programming for finite state multi-armed bandit problems". 165: 68: 3020: 2616: 2584: 2555: 2327:
Gittins, J. C. (1979). "Bandit Processes and Dynamic Allocation Indices".
3048:(2014). "Multi-armed Bandits under General Depreciation and Commitment". 648:{\displaystyle \langle X\rangle _{c}\doteq \sum _{x\in \chi }xP\{X=x|c\}} 89: 26: 2446: 1495:, a generalization introduced by Sonin (2008) defines the Gittins index 3102: 3028: 2994: 2867:
Proceedings of the ACM on Measurement and Analysis of Computing Systems
2758:
Proceedings of the ACM on Measurement and Analysis of Computing Systems
2624: 2455: 2424:
Weitzman, Martin L. (1979). "Optimal Search for the Best Alternative".
2410: 2348: 2308: 534:
is the probability that the stochastic process does not terminate, and
2951:. Wiley-Interscience Series in Systems and Optimization. foreword by 2821: 488:
is the utility (also called reward) associated to the discrete state
123:
and it is generally accepted that no feasible solution can be found.
3094: 2878: 2769: 2437: 2402: 2300: 2907: 2329:
Journal of the Royal Statistical Society. Series B (Methodological)
1524:
as the maximum discounted total reward per chance of termination.
3077:
Matlab/Octave implementation of the index computation algorithms
2973:(November 1992). "On the Gittins index for multiarmed bandits". 1178:
if one can always choose to continue or restart from that state
20:
is a measure of the reward that can be achieved through a given
2805:
Di Gregorio, Lorenzo and Frascolla, Valerio (October 1, 2019).
2725:"Multi-armed bandits under general depreciation and commitment" 181:
and can be calculated exactly by solving that problem with the
3043: 164:. The question of how to actually calculate the index for 63:" problem and shows how these problems can be solved using 3050:
Probability in the Engineering and Informational Sciences
2729:
Probability in the Engineering and Informational Sciences
141:
established the same result in the economics literature.
2508: 1077:
is a Markov chain with rewards, the interpretation of
2913:
Bandit problems: Sequential allocation of experiments
2723:
Cowan, Wesley; Katehakis, Michael N. (January 2015).
2480:(1980). "Multi-armed Bandits and the Gittins Index". 2215: 2186: 2127: 2067: 2035: 2006: 1977: 1912: 1885: 1755: 1635: 1533: 1501: 1481: 1452: 1399: 1369: 1349: 1207: 1184: 1157: 1151:
is the highest total reward which can be achieved on
1137: 1107: 1087: 1054: 984: 818: 777: 703: 664: 580: 540: 514: 494: 465: 436: 222: 567:
is the conditional expectation operator given 
2483:Journal of the Royal Statistical Society, Series B 2230: 2201: 2169: 2112: 2050: 2021: 1992: 1963: 1898: 1866: 1740: 1613: 1516: 1487: 1467: 1429: 1382: 1355: 1332: 1190: 1170: 1143: 1120: 1093: 1069: 1032: 964: 798: 760: 670: 647: 559: 526: 500: 480: 451: 419: 2170:{\displaystyle \alpha (i)\neq k\nu (i),\forall k} 1101:, thereby constructing a Markov decision process 689: 88:In applied mathematics, the "Gittins index" is a 3111: 3001: 2597: 1550: 1224: 975:with the same notation as above. It holds that 841: 719: 239: 1043: 213:The classical definition by Gittins et al. is: 177:constructed over the Markov chain and known as 50:Bandit Processes and Dynamic Allocation Indices 2722: 2646: 2644: 2642: 2180:this observation leads Sonin to conclude that 2943: 2808:Handover Optimality in Heterogeneous Networks 2535: 2935:: CS1 maint: multiple names: authors list ( 2899:: CS1 maint: multiple names: authors list ( 2847:: CS1 maint: multiple names: authors list ( 2790:: CS1 maint: multiple names: authors list ( 2388: 2238:is the "true meaning" of the Gittins index. 755: 722: 642: 622: 588: 581: 548: 541: 2969: 2639: 208: 2955:. Chichester: John Wiley & Sons, Ltd. 2677: 2568: 560:{\displaystyle \langle \cdot \rangle _{c}} 203: 2820: 2740: 2691: 2445: 148:demonstrated that the index emerges as a 144:Soon after the seminal paper of Gittins, 2423: 2369: 1964:{\displaystyle \prod _{j=0}^{t-1}\beta } 2650: 2476: 2326: 1033:{\displaystyle \nu (i)=(1-\beta )w(i).} 761:{\displaystyle w(i)=\inf\{k:v(i,k)=k\}} 3112: 2511:IEEE Transactions on Automatic Control 2322: 2320: 2318: 2250: 3080: 2949:Multi-armed bandit allocation indices 2286: 2282: 2280: 2278: 2276: 185:algorithm, or approximately with the 2113:{\displaystyle \alpha (i)=h(i)=w(i)} 1441: 2315: 95:value associated to the state of a 13: 3008:Mathematics of Operations Research 2653:Statistics and Probability Letters 2604:Mathematics of Operations Research 2572:Mathematics of Operations Research 2543:Mathematics of Operations Research 2496:10.1111/j.2517-6161.1980.tb01111.x 2341:10.1111/j.2517-6161.1979.tb01068.x 2273: 2241: 2161: 156:formulation of the problem called 14: 3141: 3069: 2975:The Annals of Applied Probability 2372:Journal of Industrial Engineering 1131:The Gittins Index of that state 2798: 2749: 2716: 2671: 2591: 1446:If the probability of survival 2562: 2529: 2502: 2470: 2417: 2382: 2363: 2225: 2219: 2196: 2190: 2155: 2149: 2137: 2131: 2107: 2101: 2092: 2086: 2077: 2071: 2045: 2039: 2016: 2010: 1987: 1981: 1958: 1955: 1949: 1943: 1853: 1847: 1835: 1832: 1826: 1820: 1772: 1766: 1727: 1721: 1709: 1706: 1700: 1694: 1652: 1646: 1605: 1599: 1584: 1578: 1543: 1537: 1511: 1505: 1462: 1456: 1424: 1418: 1409: 1403: 1319: 1313: 1301: 1298: 1292: 1279: 1217: 1211: 1064: 1058: 1024: 1018: 1012: 1000: 994: 988: 951: 945: 917: 914: 908: 902: 834: 822: 793: 781: 746: 734: 713: 707: 690:Retirement process formulation 635: 475: 469: 446: 440: 404: 398: 335: 329: 317: 314: 308: 302: 232: 226: 35: 1: 2858: 83: 76:infinite sequence of pulls." 2680:INFORMS Journal on Computing 1044:Restart-in-state formulation 7: 2911:and Fristedt, Bert (1985). 527:{\displaystyle \beta <1} 10: 3146: 2202:{\displaystyle \alpha (i)} 1899:{\displaystyle \beta ^{t}} 1517:{\displaystyle \alpha (i)} 126: 65:Dynamic allocation indices 48:In a paper in 1979 called 3062:10.1017/S0269964814000217 2742:10.1017/S0269964814000217 2665:10.1016/j.spl.2008.01.049 1468:{\displaystyle \beta (i)} 1430:{\displaystyle h(i)=w(i)} 1070:{\displaystyle Z(\cdot )} 459:is a stochastic process, 452:{\displaystyle Z(\cdot )} 173:the expected reward of a 2523:10.1109/TAC.1985.1103989 2267: 1363:indicates a policy over 209:Dynamic allocation index 2987:10.1214/aoap/1177005588 2231:{\displaystyle \nu (i)} 1993:{\displaystyle \nu (i)} 204:Mathematical definition 175:Markov decision process 2702:10.1287/ijoc.1060.0206 2232: 2203: 2171: 2114: 2052: 2023: 1994: 1971:in the definitions of 1965: 1939: 1900: 1868: 1816: 1742: 1690: 1615: 1518: 1489: 1469: 1431: 1384: 1357: 1334: 1265: 1192: 1172: 1145: 1122: 1095: 1071: 1034: 966: 888: 800: 799:{\displaystyle v(i,k)} 762: 672: 649: 561: 528: 502: 482: 453: 421: 378: 288: 107:the problem is called 3125:Design of experiments 3021:10.1287/moor.12.2.262 3003:Katehakis, Michael N. 2829:on September 28, 2020 2617:10.1287/moor.12.2.262 2599:Katehakis, Michael N. 2585:10.1287/moor.11.1.184 2556:10.1287/moor.11.1.180 2538:Katehakis, Michael N. 2256:of energy over time. 2233: 2204: 2172: 2115: 2058:, then it holds that 2053: 2024: 1995: 1966: 1913: 1901: 1869: 1790: 1743: 1664: 1616: 1519: 1490: 1475:depends on the state 1470: 1432: 1385: 1383:{\displaystyle M_{i}} 1358: 1335: 1239: 1193: 1173: 1171:{\displaystyle M_{i}} 1146: 1123: 1121:{\displaystyle M_{i}} 1096: 1072: 1035: 967: 862: 801: 763: 673: 671:{\displaystyle \chi } 650: 562: 529: 503: 483: 454: 422: 352: 262: 192:elimination algorithm 113:Stochastic scheduling 3083:The Economic Journal 2289:The Economic Journal 2213: 2184: 2125: 2065: 2051:{\displaystyle h(i)} 2033: 2022:{\displaystyle w(i)} 2004: 1975: 1910: 1883: 1753: 1633: 1531: 1499: 1479: 1450: 1397: 1367: 1356:{\displaystyle \pi } 1347: 1205: 1182: 1155: 1135: 1105: 1085: 1052: 982: 816: 775: 701: 662: 578: 538: 512: 492: 481:{\displaystyle R(i)} 463: 434: 220: 197:Gaussian elimination 2251:Fractional problems 190:consequence of his 154:dynamic programming 150:Lagrange multiplier 3130:Sequential methods 2228: 2199: 2167: 2110: 2048: 2019: 1990: 1961: 1896: 1864: 1738: 1611: 1564: 1514: 1485: 1465: 1427: 1380: 1353: 1330: 1232: 1188: 1168: 1141: 1118: 1091: 1067: 1030: 962: 855: 796: 758: 668: 645: 615: 557: 524: 498: 478: 449: 417: 253: 158:retirement process 109:multi-armed bandit 97:stochastic process 61:multi-armed bandit 22:stochastic process 2962:978-0-471-92059-5 2922:978-0-412-24810-8 2659:(12): 1526–1533. 1609: 1549: 1488:{\displaystyle i} 1442:Generalized index 1390:. It holds that 1223: 1191:{\displaystyle i} 1144:{\displaystyle i} 1094:{\displaystyle i} 840: 600: 501:{\displaystyle i} 415: 238: 103:stopping problems 73:Bernoulli process 3137: 3106: 3089:(407): 801–814. 3065: 3040: 2998: 2981:(4): 1024–1033. 2966: 2940: 2934: 2926: 2909:Berry, Donald A. 2904: 2898: 2890: 2853: 2852: 2846: 2838: 2836: 2834: 2825:. Archived from 2824: 2802: 2796: 2795: 2789: 2781: 2753: 2747: 2746: 2744: 2720: 2714: 2713: 2695: 2675: 2669: 2668: 2648: 2637: 2636: 2595: 2589: 2588: 2566: 2560: 2559: 2533: 2527: 2526: 2506: 2500: 2499: 2474: 2468: 2467: 2449: 2421: 2415: 2414: 2386: 2380: 2379: 2367: 2361: 2360: 2324: 2313: 2312: 2295:(407): 801–814. 2284: 2237: 2235: 2234: 2229: 2208: 2206: 2205: 2200: 2176: 2174: 2173: 2168: 2119: 2117: 2116: 2111: 2057: 2055: 2054: 2049: 2028: 2026: 2025: 2020: 1999: 1997: 1996: 1991: 1970: 1968: 1967: 1962: 1938: 1927: 1905: 1903: 1902: 1897: 1895: 1894: 1873: 1871: 1870: 1865: 1863: 1862: 1842: 1838: 1815: 1804: 1765: 1764: 1747: 1745: 1744: 1739: 1737: 1736: 1716: 1712: 1689: 1678: 1645: 1644: 1620: 1618: 1617: 1612: 1610: 1608: 1598: 1597: 1587: 1577: 1576: 1566: 1563: 1523: 1521: 1520: 1515: 1494: 1492: 1491: 1486: 1474: 1472: 1471: 1466: 1436: 1434: 1433: 1428: 1389: 1387: 1386: 1381: 1379: 1378: 1362: 1360: 1359: 1354: 1339: 1337: 1336: 1331: 1329: 1328: 1308: 1304: 1291: 1290: 1275: 1274: 1264: 1253: 1231: 1197: 1195: 1194: 1189: 1177: 1175: 1174: 1169: 1167: 1166: 1150: 1148: 1147: 1142: 1127: 1125: 1124: 1119: 1117: 1116: 1100: 1098: 1097: 1092: 1076: 1074: 1073: 1068: 1039: 1037: 1036: 1031: 971: 969: 968: 963: 961: 960: 940: 936: 932: 931: 898: 897: 887: 876: 854: 805: 803: 802: 797: 767: 765: 764: 759: 677: 675: 674: 669: 654: 652: 651: 646: 638: 614: 596: 595: 566: 564: 563: 558: 556: 555: 533: 531: 530: 525: 507: 505: 504: 499: 487: 485: 484: 479: 458: 456: 455: 450: 426: 424: 423: 418: 416: 414: 413: 393: 389: 388: 387: 377: 366: 345: 344: 324: 320: 298: 297: 287: 276: 255: 252: 183:policy iteration 179:Restart in State 69:one armed bandit 59:Problem" and a " 3145: 3144: 3140: 3139: 3138: 3136: 3135: 3134: 3120:Decision theory 3110: 3109: 3095:10.2307/2233856 3072: 2963: 2928: 2927: 2923: 2892: 2891: 2879:10.1145/3179419 2861: 2856: 2843:cite conference 2840: 2839: 2832: 2830: 2803: 2799: 2783: 2782: 2770:10.1145/3179419 2754: 2750: 2721: 2717: 2676: 2672: 2649: 2640: 2596: 2592: 2567: 2563: 2536:Chen, Yih Ren; 2534: 2530: 2507: 2503: 2475: 2471: 2438:10.2307/1910412 2422: 2418: 2403:10.2307/2335176 2387: 2383: 2368: 2364: 2325: 2316: 2301:10.2307/2233856 2285: 2274: 2270: 2253: 2244: 2242:Queueing theory 2214: 2211: 2210: 2185: 2182: 2181: 2126: 2123: 2122: 2066: 2063: 2062: 2034: 2031: 2030: 2005: 2002: 2001: 1976: 1973: 1972: 1928: 1917: 1911: 1908: 1907: 1906:is replaced by 1890: 1886: 1884: 1881: 1880: 1843: 1805: 1794: 1783: 1779: 1778: 1760: 1756: 1754: 1751: 1750: 1717: 1679: 1668: 1663: 1659: 1658: 1640: 1636: 1634: 1631: 1630: 1593: 1589: 1588: 1572: 1568: 1567: 1565: 1553: 1532: 1529: 1528: 1500: 1497: 1496: 1480: 1477: 1476: 1451: 1448: 1447: 1444: 1398: 1395: 1394: 1374: 1370: 1368: 1365: 1364: 1348: 1345: 1344: 1309: 1286: 1282: 1270: 1266: 1254: 1243: 1238: 1234: 1233: 1227: 1206: 1203: 1202: 1183: 1180: 1179: 1162: 1158: 1156: 1153: 1152: 1136: 1133: 1132: 1112: 1108: 1106: 1103: 1102: 1086: 1083: 1082: 1053: 1050: 1049: 1046: 983: 980: 979: 941: 927: 923: 893: 889: 877: 866: 861: 857: 856: 844: 817: 814: 813: 776: 773: 772: 702: 699: 698: 692: 663: 660: 659: 634: 604: 591: 587: 579: 576: 575: 551: 547: 539: 536: 535: 513: 510: 509: 493: 490: 489: 464: 461: 460: 435: 432: 431: 394: 383: 379: 367: 356: 351: 347: 346: 325: 293: 289: 277: 266: 261: 257: 256: 254: 242: 221: 218: 217: 211: 206: 187:value iteration 162:Restless bandit 139:Martin Weitzman 129: 117:Restless bandit 86: 53:John C. Gittins 38: 12: 11: 5: 3143: 3133: 3132: 3127: 3122: 3108: 3107: 3078: 3071: 3070:External links 3068: 3067: 3066: 3046:M.N. Katehakis 3044:Cowan, W. and 3041: 3015:(2): 262–268. 2999: 2967: 2961: 2941: 2921: 2905: 2873:(1). ACM: 16. 2860: 2857: 2855: 2854: 2813:5G World Forum 2797: 2764:(1). ACM: 16. 2748: 2715: 2693:10.1.1.77.5127 2686:(4): 596–606. 2670: 2638: 2611:(2): 262–268. 2590: 2579:(1): 184–186. 2561: 2550:(1): 180–183. 2528: 2517:(5): 426–439. 2501: 2490:(2): 143–149. 2478:Whittle, Peter 2469: 2432:(3): 641–654. 2416: 2397:(3): 561–565. 2381: 2362: 2335:(2): 148–177. 2314: 2271: 2269: 2266: 2252: 2249: 2243: 2240: 2227: 2224: 2221: 2218: 2198: 2195: 2192: 2189: 2178: 2177: 2166: 2163: 2160: 2157: 2154: 2151: 2148: 2145: 2142: 2139: 2136: 2133: 2130: 2120: 2109: 2106: 2103: 2100: 2097: 2094: 2091: 2088: 2085: 2082: 2079: 2076: 2073: 2070: 2047: 2044: 2041: 2038: 2018: 2015: 2012: 2009: 1989: 1986: 1983: 1980: 1960: 1957: 1954: 1951: 1948: 1945: 1942: 1937: 1934: 1931: 1926: 1923: 1920: 1916: 1893: 1889: 1877: 1876: 1875: 1874: 1861: 1858: 1855: 1852: 1849: 1846: 1841: 1837: 1834: 1831: 1828: 1825: 1822: 1819: 1814: 1811: 1808: 1803: 1800: 1797: 1793: 1789: 1786: 1782: 1777: 1774: 1771: 1768: 1763: 1759: 1748: 1735: 1732: 1729: 1726: 1723: 1720: 1715: 1711: 1708: 1705: 1702: 1699: 1696: 1693: 1688: 1685: 1682: 1677: 1674: 1671: 1667: 1662: 1657: 1654: 1651: 1648: 1643: 1639: 1622: 1621: 1607: 1604: 1601: 1596: 1592: 1586: 1583: 1580: 1575: 1571: 1562: 1559: 1556: 1552: 1548: 1545: 1542: 1539: 1536: 1513: 1510: 1507: 1504: 1484: 1464: 1461: 1458: 1455: 1443: 1440: 1439: 1438: 1426: 1423: 1420: 1417: 1414: 1411: 1408: 1405: 1402: 1377: 1373: 1352: 1341: 1340: 1327: 1324: 1321: 1318: 1315: 1312: 1307: 1303: 1300: 1297: 1294: 1289: 1285: 1281: 1278: 1273: 1269: 1263: 1260: 1257: 1252: 1249: 1246: 1242: 1237: 1230: 1226: 1222: 1219: 1216: 1213: 1210: 1187: 1165: 1161: 1140: 1115: 1111: 1090: 1066: 1063: 1060: 1057: 1045: 1042: 1041: 1040: 1029: 1026: 1023: 1020: 1017: 1014: 1011: 1008: 1005: 1002: 999: 996: 993: 990: 987: 973: 972: 959: 956: 953: 950: 947: 944: 939: 935: 930: 926: 922: 919: 916: 913: 910: 907: 904: 901: 896: 892: 886: 883: 880: 875: 872: 869: 865: 860: 853: 850: 847: 843: 839: 836: 833: 830: 827: 824: 821: 808:value function 795: 792: 789: 786: 783: 780: 769: 768: 757: 754: 751: 748: 745: 742: 739: 736: 733: 730: 727: 724: 721: 718: 715: 712: 709: 706: 691: 688: 667: 656: 655: 644: 641: 637: 633: 630: 627: 624: 621: 618: 613: 610: 607: 603: 599: 594: 590: 586: 583: 554: 550: 546: 543: 523: 520: 517: 497: 477: 474: 471: 468: 448: 445: 442: 439: 428: 427: 412: 409: 406: 403: 400: 397: 392: 386: 382: 376: 373: 370: 365: 362: 359: 355: 350: 343: 340: 337: 334: 331: 328: 323: 319: 316: 313: 310: 307: 304: 301: 296: 292: 286: 283: 280: 275: 272: 269: 265: 260: 251: 248: 245: 241: 237: 234: 231: 228: 225: 210: 207: 205: 202: 128: 125: 85: 82: 37: 34: 9: 6: 4: 3: 2: 3142: 3131: 3128: 3126: 3123: 3121: 3118: 3117: 3115: 3104: 3100: 3096: 3092: 3088: 3084: 3079: 3076: 3074: 3073: 3063: 3059: 3055: 3051: 3047: 3042: 3038: 3034: 3030: 3026: 3022: 3018: 3014: 3010: 3009: 3004: 3000: 2996: 2992: 2988: 2984: 2980: 2976: 2972: 2968: 2964: 2958: 2954: 2953:Peter Whittle 2950: 2946: 2945:Gittins, J.C. 2942: 2938: 2932: 2924: 2918: 2914: 2910: 2906: 2902: 2896: 2888: 2884: 2880: 2876: 2872: 2868: 2863: 2862: 2850: 2844: 2828: 2823: 2818: 2814: 2810: 2809: 2801: 2793: 2787: 2779: 2775: 2771: 2767: 2763: 2759: 2752: 2743: 2738: 2734: 2730: 2726: 2719: 2711: 2707: 2703: 2699: 2694: 2689: 2685: 2681: 2674: 2666: 2662: 2658: 2654: 2647: 2645: 2643: 2634: 2630: 2626: 2622: 2618: 2614: 2610: 2606: 2605: 2600: 2594: 2586: 2582: 2578: 2574: 2573: 2565: 2557: 2553: 2549: 2545: 2544: 2539: 2532: 2524: 2520: 2516: 2512: 2505: 2497: 2493: 2489: 2485: 2484: 2479: 2473: 2465: 2461: 2457: 2453: 2448: 2443: 2439: 2435: 2431: 2427: 2420: 2412: 2408: 2404: 2400: 2396: 2392: 2385: 2377: 2373: 2366: 2358: 2354: 2350: 2346: 2342: 2338: 2334: 2330: 2323: 2321: 2319: 2310: 2306: 2302: 2298: 2294: 2290: 2283: 2281: 2279: 2277: 2272: 2265: 2261: 2257: 2248: 2239: 2222: 2216: 2193: 2187: 2164: 2158: 2152: 2146: 2143: 2140: 2134: 2128: 2121: 2104: 2098: 2095: 2089: 2083: 2080: 2074: 2068: 2061: 2060: 2059: 2042: 2036: 2013: 2007: 1984: 1978: 1952: 1946: 1940: 1935: 1932: 1929: 1924: 1921: 1918: 1914: 1891: 1887: 1859: 1856: 1850: 1844: 1839: 1829: 1823: 1817: 1812: 1809: 1806: 1801: 1798: 1795: 1791: 1787: 1784: 1780: 1775: 1769: 1761: 1757: 1749: 1733: 1730: 1724: 1718: 1713: 1703: 1697: 1691: 1686: 1683: 1680: 1675: 1672: 1669: 1665: 1660: 1655: 1649: 1641: 1637: 1629: 1628: 1627: 1626: 1625: 1602: 1594: 1590: 1581: 1573: 1569: 1560: 1557: 1554: 1546: 1540: 1534: 1527: 1526: 1525: 1508: 1502: 1482: 1459: 1453: 1421: 1415: 1412: 1406: 1400: 1393: 1392: 1391: 1375: 1371: 1350: 1325: 1322: 1316: 1310: 1305: 1295: 1287: 1283: 1276: 1271: 1267: 1261: 1258: 1255: 1250: 1247: 1244: 1240: 1235: 1228: 1220: 1214: 1208: 1201: 1200: 1199: 1185: 1163: 1159: 1138: 1129: 1113: 1109: 1088: 1080: 1061: 1055: 1027: 1021: 1015: 1009: 1006: 1003: 997: 991: 985: 978: 977: 976: 957: 954: 948: 942: 937: 933: 928: 924: 920: 911: 905: 899: 894: 890: 884: 881: 878: 873: 870: 867: 863: 858: 851: 848: 845: 837: 831: 828: 825: 819: 812: 811: 810: 809: 790: 787: 784: 778: 752: 749: 743: 740: 737: 731: 728: 725: 716: 710: 704: 697: 696: 695: 687: 685: 681: 665: 639: 631: 628: 625: 619: 616: 611: 608: 605: 601: 597: 592: 584: 574: 573: 572: 570: 552: 544: 521: 518: 515: 495: 472: 466: 443: 437: 410: 407: 401: 395: 390: 384: 380: 374: 371: 368: 363: 360: 357: 353: 348: 341: 338: 332: 326: 321: 311: 305: 299: 294: 290: 284: 281: 278: 273: 270: 267: 263: 258: 249: 246: 243: 235: 229: 223: 216: 215: 214: 201: 198: 193: 188: 184: 180: 176: 171: 167: 166:Markov chains 163: 159: 155: 151: 147: 146:Peter Whittle 142: 140: 135: 124: 122: 118: 114: 111:(one type of 110: 105: 104: 98: 94: 91: 81: 77: 74: 70: 66: 62: 58: 54: 51: 46: 42: 33: 31: 28: 23: 19: 18:Gittins index 3086: 3082: 3053: 3049: 3012: 3006: 2978: 2974: 2948: 2912: 2895:cite journal 2870: 2866: 2831:. Retrieved 2827:the original 2822:1908.09991v2 2807: 2800: 2786:cite journal 2761: 2757: 2751: 2735:(1): 51–76. 2732: 2728: 2718: 2683: 2679: 2673: 2656: 2652: 2608: 2602: 2593: 2576: 2570: 2564: 2547: 2541: 2531: 2514: 2510: 2504: 2487: 2481: 2472: 2447:1721.1/31303 2429: 2426:Econometrica 2425: 2419: 2394: 2390: 2384: 2375: 2371: 2365: 2332: 2328: 2292: 2288: 2262: 2258: 2254: 2245: 2179: 1878: 1623: 1445: 1342: 1130: 1047: 974: 807: 770: 693: 683: 657: 568: 429: 212: 191: 186: 182: 178: 161: 157: 143: 130: 116: 101: 87: 78: 64: 49: 47: 43: 39: 17: 15: 2971:Weber, R.R. 121:NP-complete 36:Terminology 3114:Categories 2859:References 2391:Biometrika 678:being the 84:Definition 57:scheduling 3056:: 51–76. 2931:cite book 2887:216145213 2833:April 18, 2778:216145213 2710:122785013 2688:CiteSeerX 2217:ν 2188:α 2162:∀ 2147:ν 2141:≠ 2129:α 2069:α 1979:ν 1941:β 1933:− 1915:∏ 1888:β 1818:β 1810:− 1807:τ 1792:∏ 1788:− 1762:τ 1684:− 1681:τ 1666:∑ 1642:τ 1595:τ 1574:τ 1555:τ 1535:α 1503:α 1454:β 1351:π 1288:π 1268:β 1259:− 1256:τ 1241:∑ 1229:π 1079:Katehakis 1062:⋅ 1010:β 1007:− 986:ν 925:β 891:β 882:− 879:τ 864:∑ 846:τ 666:χ 612:χ 609:∈ 602:∑ 598:≐ 589:⟩ 582:⟨ 549:⟩ 545:⋅ 542:⟨ 516:β 444:⋅ 381:β 372:− 369:τ 354:∑ 291:β 282:− 279:τ 264:∑ 244:τ 224:ν 2947:(1989). 2464:32530881 2378:(1): 17. 2357:17724147 2209:and not 1840:⟩ 1781:⟨ 1714:⟩ 1661:⟨ 1306:⟩ 1236:⟨ 938:⟩ 859:⟨ 682:of  391:⟩ 349:⟨ 322:⟩ 259:⟨ 3103:2233856 3029:3689689 2995:2959678 2625:3689689 2456:1910412 2411:2335176 2349:2985029 2309:2233856 806:is the 152:from a 134:Gittins 127:History 3101:  3037:656323 3035:  3027:  2993:  2959:  2919:  2885:  2776:  2708:  2690:  2633:656323 2631:  2623:  2462:  2454:  2409:  2355:  2347:  2307:  1624:where 1343:where 771:where 680:domain 430:where 93:scalar 30:scalar 3099:JSTOR 3033:S2CID 3025:JSTOR 2991:JSTOR 2883:S2CID 2817:arXiv 2774:S2CID 2706:S2CID 2629:S2CID 2621:JSTOR 2460:S2CID 2452:JSTOR 2407:JSTOR 2353:S2CID 2345:JSTOR 2305:JSTOR 2268:Notes 658:with 2957:ISBN 2937:link 2917:ISBN 2901:link 2849:link 2835:2020 2792:link 2029:and 1558:> 849:> 519:< 247:> 90:real 27:real 16:The 3091:doi 3087:101 3058:doi 3017:doi 2983:doi 2875:doi 2766:doi 2737:doi 2698:doi 2661:doi 2613:doi 2581:doi 2552:doi 2519:doi 2492:doi 2442:hdl 2434:doi 2399:doi 2337:doi 2297:doi 2293:101 1879:If 1551:sup 1225:sup 1048:If 842:sup 720:inf 240:sup 3116:: 3097:. 3085:. 3054:29 3052:. 3031:. 3023:. 3013:12 3011:. 2989:. 2977:. 2933:}} 2929:{{ 2897:}} 2893:{{ 2881:. 2869:. 2845:}} 2841:{{ 2815:. 2811:. 2788:}} 2784:{{ 2772:. 2760:. 2733:29 2731:. 2727:. 2704:. 2696:. 2684:19 2682:. 2657:78 2655:. 2641:^ 2627:. 2619:. 2609:12 2607:. 2577:11 2575:. 2548:11 2546:. 2515:30 2513:. 2488:42 2486:. 2458:. 2450:. 2440:. 2430:47 2428:. 2405:. 2395:66 2393:. 2376:11 2374:. 2351:. 2343:. 2333:41 2331:. 2317:^ 2303:. 2291:. 2275:^ 2000:, 1198:. 1128:. 686:. 571:: 508:, 170:LP 32:. 3105:. 3093:: 3064:. 3060:: 3039:. 3019:: 2997:. 2985:: 2979:2 2965:. 2939:) 2925:. 2903:) 2889:. 2877:: 2871:2 2851:) 2837:. 2819:: 2794:) 2780:. 2768:: 2762:2 2745:. 2739:: 2712:. 2700:: 2667:. 2663:: 2635:. 2615:: 2587:. 2583:: 2558:. 2554:: 2525:. 2521:: 2498:. 2494:: 2466:. 2444:: 2436:: 2413:. 2401:: 2359:. 2339:: 2311:. 2299:: 2226:) 2223:i 2220:( 2197:) 2194:i 2191:( 2165:k 2159:, 2156:) 2153:i 2150:( 2144:k 2138:) 2135:i 2132:( 2108:) 2105:i 2102:( 2099:w 2096:= 2093:) 2090:i 2087:( 2084:h 2081:= 2078:) 2075:i 2072:( 2046:) 2043:i 2040:( 2037:h 2017:) 2014:i 2011:( 2008:w 1988:) 1985:i 1982:( 1959:] 1956:) 1953:j 1950:( 1947:Z 1944:[ 1936:1 1930:t 1925:0 1922:= 1919:j 1892:t 1860:i 1857:= 1854:) 1851:0 1848:( 1845:Z 1836:] 1833:) 1830:t 1827:( 1824:Z 1821:[ 1813:1 1802:0 1799:= 1796:t 1785:1 1776:= 1773:) 1770:i 1767:( 1758:Q 1734:i 1731:= 1728:) 1725:0 1722:( 1719:Z 1710:] 1707:) 1704:t 1701:( 1698:Z 1695:[ 1692:R 1687:1 1676:0 1673:= 1670:t 1656:= 1653:) 1650:i 1647:( 1638:R 1606:) 1603:i 1600:( 1591:Q 1585:) 1582:i 1579:( 1570:R 1561:0 1547:= 1544:) 1541:i 1538:( 1512:) 1509:i 1506:( 1483:i 1463:) 1460:i 1457:( 1437:. 1425:) 1422:i 1419:( 1416:w 1413:= 1410:) 1407:i 1404:( 1401:h 1376:i 1372:M 1326:i 1323:= 1320:) 1317:0 1314:( 1311:Z 1302:] 1299:) 1296:t 1293:( 1284:Z 1280:[ 1277:R 1272:t 1262:1 1251:0 1248:= 1245:t 1221:= 1218:) 1215:i 1212:( 1209:h 1186:i 1164:i 1160:M 1139:i 1114:i 1110:M 1089:i 1065:) 1059:( 1056:Z 1028:. 1025:) 1022:i 1019:( 1016:w 1013:) 1004:1 1001:( 998:= 995:) 992:i 989:( 958:i 955:= 952:) 949:0 946:( 943:Z 934:k 929:t 921:+ 918:] 915:) 912:t 909:( 906:Z 903:[ 900:R 895:t 885:1 874:0 871:= 868:t 852:0 838:= 835:) 832:k 829:, 826:i 823:( 820:v 794:) 791:k 788:, 785:i 782:( 779:v 756:} 753:k 750:= 747:) 744:k 741:, 738:i 735:( 732:v 729:: 726:k 723:{ 717:= 714:) 711:i 708:( 705:w 684:X 643:} 640:c 636:| 632:x 629:= 626:X 623:{ 620:P 617:x 606:x 593:c 585:X 569:c 553:c 522:1 496:i 476:) 473:i 470:( 467:R 447:) 441:( 438:Z 411:i 408:= 405:) 402:0 399:( 396:Z 385:t 375:1 364:0 361:= 358:t 342:i 339:= 336:) 333:0 330:( 327:Z 318:] 315:) 312:t 309:( 306:Z 303:[ 300:R 295:t 285:1 274:0 271:= 268:t 250:0 236:= 233:) 230:i 227:(

Index

stochastic process
real
scalar
John C. Gittins
scheduling
multi-armed bandit
one armed bandit
Bernoulli process
real
scalar
stochastic process
stopping problems
multi-armed bandit
Stochastic scheduling
NP-complete
Gittins
Martin Weitzman
Peter Whittle
Lagrange multiplier
dynamic programming
Markov chains
LP
Markov decision process
Gaussian elimination
domain
Katehakis



Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.

↑