Knowledge

Query optimization

Source 📝

135:
structures are complex, in most cases, and especially for not-very-simple queries, the needed data for a query can be collected from a database by accessing it in different ways, through different data-structures, and in different orders. Each different way typically requires different processing time. Processing times of the same query may have large variance, from a fraction of a second to hours, depending on the chosen method. The purpose of query optimization, which is an automated process, is to find the way to process a given query in minimum time. The large possible variance in time justifies performing query optimization, though finding the exact optimal query plan, among all possibilities, is typically very complex, time-consuming by itself, may be too costly, and often practically impossible. Thus query optimization typically tries to approximate the optimum by comparing several common-sense alternatives to provide in a reasonable time a "good enough" plan which typically does not deviate much from the best possible result.
482:
but must be to find a query plan that realizes the best compromise between different cost metrics. What the best compromise is depends on user preferences (e.g., some users might prefer a cheaper plan while others prefer a faster plan in a cloud scenario). The goal of optimization is therefore either to find the best query plan based on some specification of user preferences provided as input to the optimizer (e.g., users can define weights between different cost metrics to express relative importance or define hard cost bounds on certain metrics) or to generate an approximation of the set of Pareto-optimal query plans (i.e., plans such that no other plan has better cost according to all metrics) such that the user can select the preferred cost tradeoff out of that plan set.
216: 207: 36: 1287: 1297: 1307: 481:
Different cost metrics might conflict with each other (e.g., there might be one plan with minimal execution time and a different plan with minimal monetary execution fees in a cloud computing scenario). Therefore, the goal of optimization cannot be to find a query plan that minimizes all cost metrics
473:
scenario for instance, one should compare query plans not only in terms of how much time they take to execute but also in terms of how much money their execution costs. Or in the context of approximate query optimization, it is possible to execute query plans on randomly selected samples of the input
134:
123-45-6789," or more complex like "find the average salary of all the employed married men in California between the ages 30 to 39 who earn less than their spouses." The result of a query is generated by processing the rows in a database in a way that yields the requested information. Since database
490:
Multi-objective parametric query optimization generalizes parametric and multi-objective query optimization. Plans are compared according to multiple cost metrics and plan costs may depend on parameters whose values are unknown at optimization time. The cost of a query plan is therefore modeled as a
477:
Multi-objective query optimization models the cost of a query plan as a cost vector where each vector component represents cost according to a different cost metric. Classical query optimization can be considered as a special case of multi-objective query optimization where the dimension of the cost
147:
and the quality of the choice; the optimizer may not choose the best answer on its own. Different qualities of database management systems have different ways of balancing these two. Cost-based query optimizers evaluate the resource footprint of various query plans and use this as the basis for plan
460:
The goal of optimization is usually to generate all query plans that could be optimal for any of the possible parameter value combinations. This yields a set of relevant query plans. At run time, the best plan is selected out of that set once the true parameter values become known. The advantage of
456:
Classical query optimization associates each query plan with one scalar cost value. Parametric query optimization assumes that query plan cost depends on parameters whose values are unknown at optimization time. Such parameters can for instance represent the selectivity of query predicates that are
188:
of "plan nodes". A plan node encapsulates a single operation that is required to execute the query. The nodes are arranged as a tree, in which intermediate results flow from the bottom of the tree to the top. Each node has zero or more child nodes—those are nodes whose output is fed as input to the
447:
Classical query optimization assumes that query plans are compared according to one single cost metric, usually execution time, and that the cost of each query plan can be calculated without uncertainty. Both assumptions are sometimes violated in practice and multiple extensions of classical query
266:
The performance of a query plan is determined largely by the order in which the tables are joined. For example, when joining 3 tables A, B, C of size 10 rows, 10,000 rows, and 1,000,000 rows, respectively, a query plan that joins B and C first can take several orders-of-magnitude more time to
341:
into a select-project-join query, but not always. Query plans for nested SQL queries can also be chosen using the same dynamic programming algorithm as used for join ordering, but this can lead to an enormous escalation in query optimization time. So some database management systems use an
189:
parent node. For example, a join node will have two child nodes, which represent the two join operands, whereas a sort node would have a single child node (the input to be sorted). The leaves of the tree are nodes which produce results by scanning the disk, for example by performing an
122:
Generally, the query optimizer cannot be accessed directly by users: once queries are submitted to the database server, and parsed by the parser, they are then passed to the query optimizer where optimization occurs. However, some database engines allow guiding the query optimizer with
498:
to show which operations have the highest processing cost. Microsoft SMS, ApexSQLPlan, Hana, and Tableau are some examples. Fixing these issues found in these plans can shave tens of percent execution time, and in some cases can cut two-dimensional searches to linear ones.
457:
not fully specified at optimization time but will be provided at execution time. Parametric query optimization therefore associates each query plan with a cost function that maps from a multi-dimensional parameter space to a one-dimensional cost space.
474:
data in order to obtain approximate results with reduced execution overhead. In such cases, alternative query plans must be compared in terms of their execution time but also in terms of the precision or reliability of the data they generate.
491:
function from a multi-dimensional parameter space to a multi-dimensional cost space. The goal of optimization is to generate the set of query plans that can be optimal for each possible combination of parameter values and user preferences.
148:
selection. These assign an estimated "cost" to each possible query plan, and choose the plan with the smallest cost. Costs are used to estimate the runtime cost of evaluating the query, in terms of the number of I/O operations required,
350:
One of the hardest problems in query optimization is to accurately estimate the costs of alternative query plans. Optimizers cost query plans using a mathematical model of query execution costs that relies heavily on estimates of the
435:), and it is very hard to estimate the selectivity of the conjunct in general. Poor cardinality estimates and uncaught correlation are one of the main reasons why query optimizers pick poor query plans. This is one reason why a 448:
optimization have been studied in the research literature that overcome those limitations. Those extended problem variants differ in how they model the cost of single query plans and in terms of their optimization goal.
298:
in the query, an index scan can also be used. For each relation, the optimizer records the cheapest way to scan the relation, as well as the cheapest way to scan the relation that produces records in a particular sorted
156:. The set of query plans examined is formed by examining the possible access paths (e.g., primary index access, secondary index access, full file scan) and various relational table join techniques (e.g., 313:
Sort order can avoid a redundant sort operation later on in processing the query. Second, a particular sort order can speed up a subsequent join because it clusters the data in a particular way.
306:. It will preserve the cheapest way to join each pair of relations, in addition to the cheapest way to join each pair of relations that produces its output according to a particular sort order. 302:
The optimizer then considers combining each pair of relations for which a join condition exists. For each pair, the optimizer will consider the available join algorithms implemented by the
359:
of predicates in the query. Traditionally, database systems estimate selectivities through fairly detailed statistics on the distribution of values in each column, such as
330: 322: 309:
Then all three-relation query plans are computed, by joining each two-relation plan produced by the previous phase with the remaining relations in the query.
897: 152:, amount of disk buffer space, disk storage service time, and interconnect usage between units of parallelism, and other factors determined from the 880: 545: 892: 165: 334: 321:
A SQL query to a modern relational DBMS does more than just selections and joins. In particular, SQL queries often nest several layers of
338: 502:
One of the primary and simplest optimization checklists is to use operations that most RDMS are designed to execute efficiently. See
355:, or number of tuples, flowing through each edge in a query plan. Cardinality estimation in turn depends on estimates of the 100: 461:
parametric query optimization is that optimization (which is in general a very expensive operation) is avoided at run time.
1331: 1290: 963: 852: 79: 57: 50: 295: 176:
to solve the query—and physical optimization—which is used to determine the means of carrying out each operation.
1310: 1016: 363:. This technique works well for estimation of selectivities of individual predicates. However many queries have 17: 469:
There are often other cost metrics in addition to execution time that are relevant to compare query plans. In a
172:
query. There are two types of optimization. These consist of logical optimization—which generates a sequence of
130:
A query is a request for information from a database. It can be as simple as "find the address of a person with
1267: 914: 660: 1206: 608: 290:
in the query are computed. Every relation in the query can be accessed via a sequential scan. If there is an
764:
Ioannidis, Yannis; Ng, Raymond T.; Shim, Kyuseok; Sellis, Timos K. (1997). "Parametric Query Optimization".
1336: 1201: 1232: 951: 643:; Lorie, R. A.; Price, T. G. (1979). "Access Path Selection in a Relational Database Management System". 1155: 1145: 921: 303: 1242: 975: 44: 778: 115:
attempts to determine the most efficient way to execute a given query by considering the possible
1191: 845: 149: 1272: 1227: 904: 773: 436: 131: 61: 1247: 1001: 515: 356: 439:
should regularly update the database statistics, especially after major data loads/unloads.
1300: 1237: 1119: 1089: 958: 909: 640: 8: 1257: 1150: 1135: 1062: 887: 364: 272: 185: 1252: 1114: 1006: 946: 838: 814: 791: 586: 173: 215: 206: 1072: 926: 656: 636: 604: 590: 1262: 1109: 1099: 1067: 795: 783: 746: 648: 616: 576: 291: 287: 96: 1170: 1140: 1094: 875: 645:
Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data
520: 470: 153: 1222: 1160: 1104: 1077: 970: 931: 326: 108: 830: 168:). The search space can become quite large depending on the complexity of the 1325: 1041: 1026: 750: 734: 676: 143:
There is a trade-off between the amount of time spent figuring out the best
1341: 279: 267:
execute than one that joins A and C first. Most query optimizers determine
787: 652: 621: 581: 564: 1031: 1011: 352: 1175: 1084: 1046: 1021: 495: 360: 268: 225: 190: 157: 144: 124: 116: 161: 27:
Feature to efficiently execute queries efficiently in DBMS softwares
1036: 991: 861: 525: 503: 819: 712: 613:
Proceedings of the ACM Symposium on Principles of Database Systems
695:"NoSQL Architect: Data modeling/query optimization for DynamoDB" 485: 941: 635: 342:
alternative rule-based approach that uses a query graph model.
427:. Query predicates are often highly correlated (for example, 936: 104: 996: 811:
Approximation Schemes for Many-Objective Query Optimization
694: 478:
space (i.e., the number of cost vector components) is one.
609:"An Overview of Query Optimization in Relational Systems" 276: 169: 337:
operators. In some cases such nested SQL queries can be
316: 282:
database project . This algorithm works in two stages:
763: 464: 184:Most query optimizers represent query plans as a 1323: 451: 860: 735:"Multi-Objective Parametric Query Optimization" 846: 486:Multi-objective parametric query optimization 809:Trummer, Immanuel; Koch, Christoph (2014). 808: 733:Trummer, Immanuel; Koch, Christoph (2015). 732: 294:on a relation that can be used to answer a 853: 839: 325:blocks (Select-Project-Join), by means of 818: 777: 620: 603: 580: 562: 138: 80:Learn how and when to remove this message 43:This article includes a list of general 14: 1324: 629: 597: 556: 101:relational database management systems 834: 728: 726: 317:Query planning for nested SQL queries 677:"Oracle SQL cost based optimization" 29: 1306: 24: 723: 465:Multi-objective query optimization 345: 49:it lacks sufficient corresponding 25: 1353: 179: 1305: 1295: 1286: 1285: 563:Ioannidis, Yannis (March 1996). 257:first and joins the result with 245:first and joins the result with 214: 205: 196: 34: 1296: 802: 286:First, all ways to access each 813:. SIGMOD. pp. 1299–1310. 757: 705: 687: 669: 538: 13: 1: 531: 452:Parametric query optimization 442: 103:and other databases such as 7: 1332:Database management systems 862:Database management systems 509: 235:R(A, B) ⋈ S(B, C) ⋈ T(A, C) 10: 1358: 1268:Object–relational database 494:A number of tools display 1281: 1243:Federated database system 1215: 1184: 1128: 1055: 984: 976:Blockchain-based database 868: 368: 751:10.1145/2949741.2949748 275:algorithm pioneered by 64:more precise citations. 1273:Transaction processing 1228:Database normalization 1171:Query rewriting system 546:"IBM Knowledge Center" 437:database administrator 367:of predicates such as 193:or a sequential scan. 139:General considerations 132:Social Security number 1248:Referential integrity 788:10.1007/s007780050037 653:10.1145/582095.582099 622:10.1145/275487.275492 582:10.1145/234313.234367 569:ACM Computing Surveys 516:Join selection factor 496:query execution plans 1238:Distributed database 713:"EXPLAIN QUERY PLAN" 565:"Query optimization" 1337:Database algorithms 1258:Relational calculus 1136:Concurrency control 639:; Astrahan, M. M.; 273:dynamic programming 249:, the second joins 1253:Relational algebra 1197:Query optimization 1002:Armstrong's axioms 681:www.dba-oracle.com 647:. pp. 23–34. 615:. pp. 34–43. 605:Chaudhuri, Surajit 237:; the first joins 174:relational algebra 93:Query optimization 1319: 1318: 927:Wide-column store 922:Document-oriented 641:Chamberlin, D. D. 90: 89: 82: 16:(Redirected from 1349: 1309: 1308: 1299: 1298: 1289: 1288: 1263:Relational model 1233:Database storage 1110:Stored procedure 855: 848: 841: 832: 831: 825: 824: 822: 806: 800: 799: 781: 761: 755: 754: 730: 721: 720: 709: 703: 702: 691: 685: 684: 673: 667: 666: 633: 627: 626: 624: 601: 595: 594: 584: 560: 554: 553: 542: 434: 430: 426: 425: 424:'Accord' 422: 419: 416: 413: 410: 407: 404: 401: 398: 395: 392: 389: 386: 383: 380: 377: 374: 371: 357:selection factor 260: 256: 252: 248: 244: 240: 236: 218: 209: 85: 78: 74: 71: 65: 60:this article by 51:inline citations 38: 37: 30: 21: 1357: 1356: 1352: 1351: 1350: 1348: 1347: 1346: 1322: 1321: 1320: 1315: 1277: 1223:Database models 1211: 1180: 1166:Query optimizer 1141:Data dictionary 1124: 1095:Transaction log 1051: 1007:Codd's 12 rules 980: 910:Column-oriented 876:Object-oriented 864: 859: 829: 828: 807: 803: 762: 758: 731: 724: 711: 710: 706: 693: 692: 688: 675: 674: 670: 663: 637:Selinger, P. G. 634: 630: 602: 598: 561: 557: 544: 543: 539: 534: 521:Query rewriting 512: 488: 471:cloud computing 467: 454: 445: 432: 428: 423: 420: 417: 414: 411: 408: 406:'Honda' 405: 402: 399: 396: 393: 390: 387: 384: 381: 378: 375: 372: 369: 348: 346:Cost estimation 319: 264: 263: 262: 261: 258: 254: 250: 246: 242: 238: 234: 232: 221: 220: 219: 211: 210: 199: 182: 154:data dictionary 150:CPU path length 141: 113:query optimizer 109:graph databases 86: 75: 69: 66: 56:Please help to 55: 39: 35: 28: 23: 22: 18:Query optimizer 15: 12: 11: 5: 1355: 1345: 1344: 1339: 1334: 1317: 1316: 1314: 1313: 1303: 1293: 1282: 1279: 1278: 1276: 1275: 1270: 1265: 1260: 1255: 1250: 1245: 1240: 1235: 1230: 1225: 1219: 1217: 1216:Related topics 1213: 1212: 1210: 1209: 1204: 1199: 1194: 1192:Administration 1188: 1186: 1182: 1181: 1179: 1178: 1173: 1168: 1163: 1161:Query language 1158: 1153: 1148: 1143: 1138: 1132: 1130: 1126: 1125: 1123: 1122: 1117: 1112: 1107: 1102: 1097: 1092: 1087: 1082: 1081: 1080: 1075: 1070: 1059: 1057: 1053: 1052: 1050: 1049: 1044: 1039: 1034: 1029: 1024: 1019: 1014: 1009: 1004: 999: 994: 988: 986: 982: 981: 979: 978: 973: 968: 967: 966: 956: 955: 954: 944: 939: 934: 929: 924: 919: 918: 917: 907: 902: 901: 900: 895: 885: 884: 883: 872: 870: 866: 865: 858: 857: 850: 843: 835: 827: 826: 801: 772:(2): 132–151. 756: 722: 717:www.sqlite.org 704: 686: 668: 661: 628: 596: 575:(1): 121–123. 555: 536: 535: 533: 530: 529: 528: 526:Sargable query 523: 518: 511: 508: 487: 484: 466: 463: 453: 450: 444: 441: 429:model='Accord' 347: 344: 318: 315: 311: 310: 307: 300: 231:triangle query 230: 223: 222: 213: 212: 204: 203: 202: 201: 200: 198: 195: 181: 180:Implementation 178: 140: 137: 88: 87: 42: 40: 33: 26: 9: 6: 4: 3: 2: 1354: 1343: 1340: 1338: 1335: 1333: 1330: 1329: 1327: 1312: 1304: 1302: 1294: 1292: 1284: 1283: 1280: 1274: 1271: 1269: 1266: 1264: 1261: 1259: 1256: 1254: 1251: 1249: 1246: 1244: 1241: 1239: 1236: 1234: 1231: 1229: 1226: 1224: 1221: 1220: 1218: 1214: 1208: 1205: 1203: 1200: 1198: 1195: 1193: 1190: 1189: 1187: 1183: 1177: 1174: 1172: 1169: 1167: 1164: 1162: 1159: 1157: 1154: 1152: 1149: 1147: 1144: 1142: 1139: 1137: 1134: 1133: 1131: 1127: 1121: 1118: 1116: 1113: 1111: 1108: 1106: 1103: 1101: 1098: 1096: 1093: 1091: 1088: 1086: 1083: 1079: 1076: 1074: 1071: 1069: 1066: 1065: 1064: 1061: 1060: 1058: 1054: 1048: 1045: 1043: 1042:Surrogate key 1040: 1038: 1035: 1033: 1030: 1028: 1027:Candidate key 1025: 1023: 1020: 1018: 1015: 1013: 1010: 1008: 1005: 1003: 1000: 998: 995: 993: 990: 989: 987: 983: 977: 974: 972: 969: 965: 962: 961: 960: 957: 953: 950: 949: 948: 945: 943: 940: 938: 935: 933: 930: 928: 925: 923: 920: 916: 913: 912: 911: 908: 906: 903: 899: 896: 894: 891: 890: 889: 886: 882: 879: 878: 877: 874: 873: 871: 867: 863: 856: 851: 849: 844: 842: 837: 836: 833: 821: 816: 812: 805: 797: 793: 789: 785: 780: 779:10.1.1.33.696 775: 771: 767: 760: 752: 748: 744: 740: 736: 729: 727: 718: 714: 708: 700: 696: 690: 682: 678: 672: 664: 658: 654: 650: 646: 642: 638: 632: 623: 618: 614: 610: 606: 600: 592: 588: 583: 578: 574: 570: 566: 559: 551: 547: 541: 537: 527: 524: 522: 519: 517: 514: 513: 507: 505: 500: 497: 492: 483: 479: 475: 472: 462: 458: 449: 440: 438: 366: 362: 358: 354: 343: 340: 336: 332: 328: 324: 314: 308: 305: 301: 297: 293: 289: 285: 284: 283: 281: 278: 274: 270: 233: 227: 224:Two possible 217: 208: 197:Join ordering 194: 192: 187: 177: 175: 171: 167: 163: 159: 155: 151: 146: 136: 133: 128: 126: 120: 118: 114: 110: 106: 102: 98: 94: 84: 81: 73: 63: 59: 53: 52: 46: 41: 32: 31: 19: 1196: 1165: 810: 804: 769: 765: 759: 742: 738: 716: 707: 699:volisoft.org 698: 689: 680: 671: 644: 631: 612: 599: 572: 568: 558: 549: 540: 501: 493: 489: 480: 476: 468: 459: 455: 446: 433:make='Honda' 365:conjunctions 349: 320: 312: 271:order via a 265: 229: 183: 166:product join 142: 129: 121: 112: 92: 91: 76: 67: 48: 1311:WikiProject 1202:Replication 1090:Transaction 1032:Foreign key 1012:CAP theorem 959:Multi-model 745:: 221–232. 550:www.ibm.com 353:cardinality 226:query plans 117:query plans 62:introducing 1326:Categories 1176:Query plan 1129:Components 1047:Unique key 964:comparison 898:comparison 888:Relational 881:comparison 662:089791001X 532:References 443:Extensions 361:histograms 335:not exists 191:index scan 158:merge join 145:query plan 70:March 2013 45:references 1185:Functions 1120:Partition 947:In-memory 905:Key–value 820:1404.0046 774:CiteSeerX 339:flattened 296:predicate 162:hash join 1291:Category 1207:Sharding 1063:Relation 1037:Superkey 992:Database 985:Concepts 607:(1998). 591:47190708 510:See also 504:Sargable 431:implies 327:group by 288:relation 280:System R 228:for the 99:of many 1301:Outline 1100:Trigger 1056:Objects 796:3060505 97:feature 58:improve 1115:Cursor 1073:column 942:NewSQL 794:  776:  659:  589:  370:select 333:, and 331:exists 299:order. 111:. The 47:, but 1105:Index 1068:table 971:Cloud 937:NoSQL 932:Graph 869:Types 815:arXiv 792:S2CID 587:S2CID 418:model 391:where 373:count 292:index 277:IBM's 125:hints 105:NoSQL 95:is a 1156:ODBC 1146:JDBC 1085:View 1022:Null 1017:CRUD 997:ACID 952:list 915:list 893:list 766:VLDB 739:VLDB 657:ISBN 400:make 385:from 304:DBMS 269:join 253:and 241:and 186:tree 107:and 1342:SQL 1151:XQJ 1078:row 784:doi 747:doi 649:doi 617:doi 577:doi 409:and 323:SPJ 170:SQL 1328:: 790:. 782:. 768:. 743:45 741:. 737:. 725:^ 715:. 697:. 679:. 655:. 611:. 585:. 573:28 571:. 567:. 548:. 506:. 329:, 164:, 160:, 127:. 119:. 854:e 847:t 840:v 823:. 817:: 798:. 786:: 770:6 753:. 749:: 719:. 701:. 683:. 665:. 651:: 625:. 619:: 593:. 579:: 552:. 421:= 415:. 412:R 403:= 397:. 394:R 388:R 382:) 379:* 376:( 259:T 255:S 251:R 247:R 243:T 239:S 83:) 77:( 72:) 68:( 54:. 20:)

Index

Query optimizer
references
inline citations
improve
introducing
Learn how and when to remove this message
feature
relational database management systems
NoSQL
graph databases
query plans
hints
Social Security number
query plan
CPU path length
data dictionary
merge join
hash join
product join
SQL
relational algebra
tree
index scan
A query plan for the triangle query R(A, B) ⋈ S(B, C) ⋈ T(A, C) that uses binary joins. It joins S and T first, then joins the result with R.
A query plan for the triangle query R(A, B) ⋈ S(B, C) ⋈ T(A, C) that uses binary joins. It joins R and S first, then joins the result with T.
query plans
join
dynamic programming
IBM's
System R

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.