135:
structures are complex, in most cases, and especially for not-very-simple queries, the needed data for a query can be collected from a database by accessing it in different ways, through different data-structures, and in different orders. Each different way typically requires different processing time. Processing times of the same query may have large variance, from a fraction of a second to hours, depending on the chosen method. The purpose of query optimization, which is an automated process, is to find the way to process a given query in minimum time. The large possible variance in time justifies performing query optimization, though finding the exact optimal query plan, among all possibilities, is typically very complex, time-consuming by itself, may be too costly, and often practically impossible. Thus query optimization typically tries to approximate the optimum by comparing several common-sense alternatives to provide in a reasonable time a "good enough" plan which typically does not deviate much from the best possible result.
482:
but must be to find a query plan that realizes the best compromise between different cost metrics. What the best compromise is depends on user preferences (e.g., some users might prefer a cheaper plan while others prefer a faster plan in a cloud scenario). The goal of optimization is therefore either to find the best query plan based on some specification of user preferences provided as input to the optimizer (e.g., users can define weights between different cost metrics to express relative importance or define hard cost bounds on certain metrics) or to generate an approximation of the set of Pareto-optimal query plans (i.e., plans such that no other plan has better cost according to all metrics) such that the user can select the preferred cost tradeoff out of that plan set.
216:
207:
36:
1287:
1297:
1307:
481:
Different cost metrics might conflict with each other (e.g., there might be one plan with minimal execution time and a different plan with minimal monetary execution fees in a cloud computing scenario). Therefore, the goal of optimization cannot be to find a query plan that minimizes all cost metrics
473:
scenario for instance, one should compare query plans not only in terms of how much time they take to execute but also in terms of how much money their execution costs. Or in the context of approximate query optimization, it is possible to execute query plans on randomly selected samples of the input
134:
123-45-6789," or more complex like "find the average salary of all the employed married men in
California between the ages 30 to 39 who earn less than their spouses." The result of a query is generated by processing the rows in a database in a way that yields the requested information. Since database
490:
Multi-objective parametric query optimization generalizes parametric and multi-objective query optimization. Plans are compared according to multiple cost metrics and plan costs may depend on parameters whose values are unknown at optimization time. The cost of a query plan is therefore modeled as a
477:
Multi-objective query optimization models the cost of a query plan as a cost vector where each vector component represents cost according to a different cost metric. Classical query optimization can be considered as a special case of multi-objective query optimization where the dimension of the cost
147:
and the quality of the choice; the optimizer may not choose the best answer on its own. Different qualities of database management systems have different ways of balancing these two. Cost-based query optimizers evaluate the resource footprint of various query plans and use this as the basis for plan
460:
The goal of optimization is usually to generate all query plans that could be optimal for any of the possible parameter value combinations. This yields a set of relevant query plans. At run time, the best plan is selected out of that set once the true parameter values become known. The advantage of
456:
Classical query optimization associates each query plan with one scalar cost value. Parametric query optimization assumes that query plan cost depends on parameters whose values are unknown at optimization time. Such parameters can for instance represent the selectivity of query predicates that are
188:
of "plan nodes". A plan node encapsulates a single operation that is required to execute the query. The nodes are arranged as a tree, in which intermediate results flow from the bottom of the tree to the top. Each node has zero or more child nodes—those are nodes whose output is fed as input to the
447:
Classical query optimization assumes that query plans are compared according to one single cost metric, usually execution time, and that the cost of each query plan can be calculated without uncertainty. Both assumptions are sometimes violated in practice and multiple extensions of classical query
266:
The performance of a query plan is determined largely by the order in which the tables are joined. For example, when joining 3 tables A, B, C of size 10 rows, 10,000 rows, and 1,000,000 rows, respectively, a query plan that joins B and C first can take several orders-of-magnitude more time to
341:
into a select-project-join query, but not always. Query plans for nested SQL queries can also be chosen using the same dynamic programming algorithm as used for join ordering, but this can lead to an enormous escalation in query optimization time. So some database management systems use an
189:
parent node. For example, a join node will have two child nodes, which represent the two join operands, whereas a sort node would have a single child node (the input to be sorted). The leaves of the tree are nodes which produce results by scanning the disk, for example by performing an
122:
Generally, the query optimizer cannot be accessed directly by users: once queries are submitted to the database server, and parsed by the parser, they are then passed to the query optimizer where optimization occurs. However, some database engines allow guiding the query optimizer with
498:
to show which operations have the highest processing cost. Microsoft SMS, ApexSQLPlan, Hana, and
Tableau are some examples. Fixing these issues found in these plans can shave tens of percent execution time, and in some cases can cut two-dimensional searches to linear ones.
457:
not fully specified at optimization time but will be provided at execution time. Parametric query optimization therefore associates each query plan with a cost function that maps from a multi-dimensional parameter space to a one-dimensional cost space.
474:
data in order to obtain approximate results with reduced execution overhead. In such cases, alternative query plans must be compared in terms of their execution time but also in terms of the precision or reliability of the data they generate.
491:
function from a multi-dimensional parameter space to a multi-dimensional cost space. The goal of optimization is to generate the set of query plans that can be optimal for each possible combination of parameter values and user preferences.
148:
selection. These assign an estimated "cost" to each possible query plan, and choose the plan with the smallest cost. Costs are used to estimate the runtime cost of evaluating the query, in terms of the number of I/O operations required,
350:
One of the hardest problems in query optimization is to accurately estimate the costs of alternative query plans. Optimizers cost query plans using a mathematical model of query execution costs that relies heavily on estimates of the
435:), and it is very hard to estimate the selectivity of the conjunct in general. Poor cardinality estimates and uncaught correlation are one of the main reasons why query optimizers pick poor query plans. This is one reason why a
448:
optimization have been studied in the research literature that overcome those limitations. Those extended problem variants differ in how they model the cost of single query plans and in terms of their optimization goal.
298:
in the query, an index scan can also be used. For each relation, the optimizer records the cheapest way to scan the relation, as well as the cheapest way to scan the relation that produces records in a particular sorted
156:. The set of query plans examined is formed by examining the possible access paths (e.g., primary index access, secondary index access, full file scan) and various relational table join techniques (e.g.,
313:
Sort order can avoid a redundant sort operation later on in processing the query. Second, a particular sort order can speed up a subsequent join because it clusters the data in a particular way.
306:. It will preserve the cheapest way to join each pair of relations, in addition to the cheapest way to join each pair of relations that produces its output according to a particular sort order.
302:
The optimizer then considers combining each pair of relations for which a join condition exists. For each pair, the optimizer will consider the available join algorithms implemented by the
359:
of predicates in the query. Traditionally, database systems estimate selectivities through fairly detailed statistics on the distribution of values in each column, such as
330:
322:
309:
Then all three-relation query plans are computed, by joining each two-relation plan produced by the previous phase with the remaining relations in the query.
897:
152:, amount of disk buffer space, disk storage service time, and interconnect usage between units of parallelism, and other factors determined from the
880:
545:
892:
165:
334:
321:
A SQL query to a modern relational DBMS does more than just selections and joins. In particular, SQL queries often nest several layers of
338:
502:
One of the primary and simplest optimization checklists is to use operations that most RDMS are designed to execute efficiently. See
355:, or number of tuples, flowing through each edge in a query plan. Cardinality estimation in turn depends on estimates of the
100:
461:
parametric query optimization is that optimization (which is in general a very expensive operation) is avoided at run time.
1331:
1290:
963:
852:
79:
57:
50:
295:
176:
to solve the query—and physical optimization—which is used to determine the means of carrying out each operation.
1310:
1016:
363:. This technique works well for estimation of selectivities of individual predicates. However many queries have
17:
469:
There are often other cost metrics in addition to execution time that are relevant to compare query plans. In a
172:
query. There are two types of optimization. These consist of logical optimization—which generates a sequence of
130:
A query is a request for information from a database. It can be as simple as "find the address of a person with
1267:
914:
660:
1206:
608:
290:
in the query are computed. Every relation in the query can be accessed via a sequential scan. If there is an
764:
Ioannidis, Yannis; Ng, Raymond T.; Shim, Kyuseok; Sellis, Timos K. (1997). "Parametric Query
Optimization".
1336:
1201:
1232:
951:
643:; Lorie, R. A.; Price, T. G. (1979). "Access Path Selection in a Relational Database Management System".
1155:
1145:
921:
303:
1242:
975:
44:
778:
115:
attempts to determine the most efficient way to execute a given query by considering the possible
1191:
845:
149:
1272:
1227:
904:
773:
436:
131:
61:
1247:
1001:
515:
356:
439:
should regularly update the database statistics, especially after major data loads/unloads.
1300:
1237:
1119:
1089:
958:
909:
640:
8:
1257:
1150:
1135:
1062:
887:
364:
272:
185:
1252:
1114:
1006:
946:
838:
814:
791:
586:
173:
215:
206:
1072:
926:
656:
636:
604:
590:
1262:
1109:
1099:
1067:
795:
783:
746:
648:
616:
576:
291:
287:
96:
1170:
1140:
1094:
875:
645:
Proceedings of the 1979 ACM SIGMOD International
Conference on Management of Data
520:
470:
153:
1222:
1160:
1104:
1077:
970:
931:
326:
108:
830:
168:). The search space can become quite large depending on the complexity of the
1325:
1041:
1026:
750:
734:
676:
143:
There is a trade-off between the amount of time spent figuring out the best
1341:
279:
267:
execute than one that joins A and C first. Most query optimizers determine
787:
652:
621:
581:
564:
1031:
1011:
352:
1175:
1084:
1046:
1021:
495:
360:
268:
225:
190:
157:
144:
124:
116:
161:
27:
Feature to efficiently execute queries efficiently in DBMS softwares
1036:
991:
861:
525:
503:
819:
712:
613:
Proceedings of the ACM Symposium on
Principles of Database Systems
695:"NoSQL Architect: Data modeling/query optimization for DynamoDB"
485:
941:
635:
342:
alternative rule-based approach that uses a query graph model.
427:. Query predicates are often highly correlated (for example,
936:
104:
996:
811:
Approximation
Schemes for Many-Objective Query Optimization
694:
478:
space (i.e., the number of cost vector components) is one.
609:"An Overview of Query Optimization in Relational Systems"
276:
169:
337:
operators. In some cases such nested SQL queries can be
316:
282:
database project . This algorithm works in two stages:
763:
464:
184:Most query optimizers represent query plans as a
1323:
451:
860:
735:"Multi-Objective Parametric Query Optimization"
846:
486:Multi-objective parametric query optimization
809:Trummer, Immanuel; Koch, Christoph (2014).
808:
733:Trummer, Immanuel; Koch, Christoph (2015).
732:
294:on a relation that can be used to answer a
853:
839:
325:blocks (Select-Project-Join), by means of
818:
777:
620:
603:
580:
562:
138:
80:Learn how and when to remove this message
43:This article includes a list of general
14:
1324:
629:
597:
556:
101:relational database management systems
834:
728:
726:
317:Query planning for nested SQL queries
677:"Oracle SQL cost based optimization"
29:
1306:
24:
723:
465:Multi-objective query optimization
345:
49:it lacks sufficient corresponding
25:
1353:
179:
1305:
1295:
1286:
1285:
563:Ioannidis, Yannis (March 1996).
257:first and joins the result with
245:first and joins the result with
214:
205:
196:
34:
1296:
802:
286:First, all ways to access each
813:. SIGMOD. pp. 1299–1310.
757:
705:
687:
669:
538:
13:
1:
531:
452:Parametric query optimization
442:
103:and other databases such as
7:
1332:Database management systems
862:Database management systems
509:
235:R(A, B) ⋈ S(B, C) ⋈ T(A, C)
10:
1358:
1268:Object–relational database
494:A number of tools display
1281:
1243:Federated database system
1215:
1184:
1128:
1055:
984:
976:Blockchain-based database
868:
368:
751:10.1145/2949741.2949748
275:algorithm pioneered by
64:more precise citations.
1273:Transaction processing
1228:Database normalization
1171:Query rewriting system
546:"IBM Knowledge Center"
437:database administrator
367:of predicates such as
193:or a sequential scan.
139:General considerations
132:Social Security number
1248:Referential integrity
788:10.1007/s007780050037
653:10.1145/582095.582099
622:10.1145/275487.275492
582:10.1145/234313.234367
569:ACM Computing Surveys
516:Join selection factor
496:query execution plans
1238:Distributed database
713:"EXPLAIN QUERY PLAN"
565:"Query optimization"
1337:Database algorithms
1258:Relational calculus
1136:Concurrency control
639:; Astrahan, M. M.;
273:dynamic programming
249:, the second joins
1253:Relational algebra
1197:Query optimization
1002:Armstrong's axioms
681:www.dba-oracle.com
647:. pp. 23–34.
615:. pp. 34–43.
605:Chaudhuri, Surajit
237:; the first joins
174:relational algebra
93:Query optimization
1319:
1318:
927:Wide-column store
922:Document-oriented
641:Chamberlin, D. D.
90:
89:
82:
16:(Redirected from
1349:
1309:
1308:
1299:
1298:
1289:
1288:
1263:Relational model
1233:Database storage
1110:Stored procedure
855:
848:
841:
832:
831:
825:
824:
822:
806:
800:
799:
781:
761:
755:
754:
730:
721:
720:
709:
703:
702:
691:
685:
684:
673:
667:
666:
633:
627:
626:
624:
601:
595:
594:
584:
560:
554:
553:
542:
434:
430:
426:
425:
424:'Accord'
422:
419:
416:
413:
410:
407:
404:
401:
398:
395:
392:
389:
386:
383:
380:
377:
374:
371:
357:selection factor
260:
256:
252:
248:
244:
240:
236:
218:
209:
85:
78:
74:
71:
65:
60:this article by
51:inline citations
38:
37:
30:
21:
1357:
1356:
1352:
1351:
1350:
1348:
1347:
1346:
1322:
1321:
1320:
1315:
1277:
1223:Database models
1211:
1180:
1166:Query optimizer
1141:Data dictionary
1124:
1095:Transaction log
1051:
1007:Codd's 12 rules
980:
910:Column-oriented
876:Object-oriented
864:
859:
829:
828:
807:
803:
762:
758:
731:
724:
711:
710:
706:
693:
692:
688:
675:
674:
670:
663:
637:Selinger, P. G.
634:
630:
602:
598:
561:
557:
544:
543:
539:
534:
521:Query rewriting
512:
488:
471:cloud computing
467:
454:
445:
432:
428:
423:
420:
417:
414:
411:
408:
406:'Honda'
405:
402:
399:
396:
393:
390:
387:
384:
381:
378:
375:
372:
369:
348:
346:Cost estimation
319:
264:
263:
262:
261:
258:
254:
250:
246:
242:
238:
234:
232:
221:
220:
219:
211:
210:
199:
182:
154:data dictionary
150:CPU path length
141:
113:query optimizer
109:graph databases
86:
75:
69:
66:
56:Please help to
55:
39:
35:
28:
23:
22:
18:Query optimizer
15:
12:
11:
5:
1355:
1345:
1344:
1339:
1334:
1317:
1316:
1314:
1313:
1303:
1293:
1282:
1279:
1278:
1276:
1275:
1270:
1265:
1260:
1255:
1250:
1245:
1240:
1235:
1230:
1225:
1219:
1217:
1216:Related topics
1213:
1212:
1210:
1209:
1204:
1199:
1194:
1192:Administration
1188:
1186:
1182:
1181:
1179:
1178:
1173:
1168:
1163:
1161:Query language
1158:
1153:
1148:
1143:
1138:
1132:
1130:
1126:
1125:
1123:
1122:
1117:
1112:
1107:
1102:
1097:
1092:
1087:
1082:
1081:
1080:
1075:
1070:
1059:
1057:
1053:
1052:
1050:
1049:
1044:
1039:
1034:
1029:
1024:
1019:
1014:
1009:
1004:
999:
994:
988:
986:
982:
981:
979:
978:
973:
968:
967:
966:
956:
955:
954:
944:
939:
934:
929:
924:
919:
918:
917:
907:
902:
901:
900:
895:
885:
884:
883:
872:
870:
866:
865:
858:
857:
850:
843:
835:
827:
826:
801:
772:(2): 132–151.
756:
722:
717:www.sqlite.org
704:
686:
668:
661:
628:
596:
575:(1): 121–123.
555:
536:
535:
533:
530:
529:
528:
526:Sargable query
523:
518:
511:
508:
487:
484:
466:
463:
453:
450:
444:
441:
429:model='Accord'
347:
344:
318:
315:
311:
310:
307:
300:
231:triangle query
230:
223:
222:
213:
212:
204:
203:
202:
201:
200:
198:
195:
181:
180:Implementation
178:
140:
137:
88:
87:
42:
40:
33:
26:
9:
6:
4:
3:
2:
1354:
1343:
1340:
1338:
1335:
1333:
1330:
1329:
1327:
1312:
1304:
1302:
1294:
1292:
1284:
1283:
1280:
1274:
1271:
1269:
1266:
1264:
1261:
1259:
1256:
1254:
1251:
1249:
1246:
1244:
1241:
1239:
1236:
1234:
1231:
1229:
1226:
1224:
1221:
1220:
1218:
1214:
1208:
1205:
1203:
1200:
1198:
1195:
1193:
1190:
1189:
1187:
1183:
1177:
1174:
1172:
1169:
1167:
1164:
1162:
1159:
1157:
1154:
1152:
1149:
1147:
1144:
1142:
1139:
1137:
1134:
1133:
1131:
1127:
1121:
1118:
1116:
1113:
1111:
1108:
1106:
1103:
1101:
1098:
1096:
1093:
1091:
1088:
1086:
1083:
1079:
1076:
1074:
1071:
1069:
1066:
1065:
1064:
1061:
1060:
1058:
1054:
1048:
1045:
1043:
1042:Surrogate key
1040:
1038:
1035:
1033:
1030:
1028:
1027:Candidate key
1025:
1023:
1020:
1018:
1015:
1013:
1010:
1008:
1005:
1003:
1000:
998:
995:
993:
990:
989:
987:
983:
977:
974:
972:
969:
965:
962:
961:
960:
957:
953:
950:
949:
948:
945:
943:
940:
938:
935:
933:
930:
928:
925:
923:
920:
916:
913:
912:
911:
908:
906:
903:
899:
896:
894:
891:
890:
889:
886:
882:
879:
878:
877:
874:
873:
871:
867:
863:
856:
851:
849:
844:
842:
837:
836:
833:
821:
816:
812:
805:
797:
793:
789:
785:
780:
779:10.1.1.33.696
775:
771:
767:
760:
752:
748:
744:
740:
736:
729:
727:
718:
714:
708:
700:
696:
690:
682:
678:
672:
664:
658:
654:
650:
646:
642:
638:
632:
623:
618:
614:
610:
606:
600:
592:
588:
583:
578:
574:
570:
566:
559:
551:
547:
541:
537:
527:
524:
522:
519:
517:
514:
513:
507:
505:
500:
497:
492:
483:
479:
475:
472:
462:
458:
449:
440:
438:
366:
362:
358:
354:
343:
340:
336:
332:
328:
324:
314:
308:
305:
301:
297:
293:
289:
285:
284:
283:
281:
278:
274:
270:
233:
227:
224:Two possible
217:
208:
197:Join ordering
194:
192:
187:
177:
175:
171:
167:
163:
159:
155:
151:
146:
136:
133:
128:
126:
120:
118:
114:
110:
106:
102:
98:
94:
84:
81:
73:
63:
59:
53:
52:
46:
41:
32:
31:
19:
1196:
1165:
810:
804:
769:
765:
759:
742:
738:
716:
707:
699:volisoft.org
698:
689:
680:
671:
644:
631:
612:
599:
572:
568:
558:
549:
540:
501:
493:
489:
480:
476:
468:
459:
455:
446:
433:make='Honda'
365:conjunctions
349:
320:
312:
271:order via a
265:
229:
183:
166:product join
142:
129:
121:
112:
92:
91:
76:
67:
48:
1311:WikiProject
1202:Replication
1090:Transaction
1032:Foreign key
1012:CAP theorem
959:Multi-model
745:: 221–232.
550:www.ibm.com
353:cardinality
226:query plans
117:query plans
62:introducing
1326:Categories
1176:Query plan
1129:Components
1047:Unique key
964:comparison
898:comparison
888:Relational
881:comparison
662:089791001X
532:References
443:Extensions
361:histograms
335:not exists
191:index scan
158:merge join
145:query plan
70:March 2013
45:references
1185:Functions
1120:Partition
947:In-memory
905:Key–value
820:1404.0046
774:CiteSeerX
339:flattened
296:predicate
162:hash join
1291:Category
1207:Sharding
1063:Relation
1037:Superkey
992:Database
985:Concepts
607:(1998).
591:47190708
510:See also
504:Sargable
431:implies
327:group by
288:relation
280:System R
228:for the
99:of many
1301:Outline
1100:Trigger
1056:Objects
796:3060505
97:feature
58:improve
1115:Cursor
1073:column
942:NewSQL
794:
776:
659:
589:
370:select
333:, and
331:exists
299:order.
111:. The
47:, but
1105:Index
1068:table
971:Cloud
937:NoSQL
932:Graph
869:Types
815:arXiv
792:S2CID
587:S2CID
418:model
391:where
373:count
292:index
277:IBM's
125:hints
105:NoSQL
95:is a
1156:ODBC
1146:JDBC
1085:View
1022:Null
1017:CRUD
997:ACID
952:list
915:list
893:list
766:VLDB
739:VLDB
657:ISBN
400:make
385:from
304:DBMS
269:join
253:and
241:and
186:tree
107:and
1342:SQL
1151:XQJ
1078:row
784:doi
747:doi
649:doi
617:doi
577:doi
409:and
323:SPJ
170:SQL
1328::
790:.
782:.
768:.
743:45
741:.
737:.
725:^
715:.
697:.
679:.
655:.
611:.
585:.
573:28
571:.
567:.
548:.
506:.
329:,
164:,
160:,
127:.
119:.
854:e
847:t
840:v
823:.
817::
798:.
786::
770:6
753:.
749::
719:.
701:.
683:.
665:.
651::
625:.
619::
593:.
579::
552:.
421:=
415:.
412:R
403:=
397:.
394:R
388:R
382:)
379:*
376:(
259:T
255:S
251:R
247:R
243:T
239:S
83:)
77:(
72:)
68:(
54:.
20:)
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.