Input enhancement (computer science)

643:. If the algorithm encounters a pair of characters that do not match, it accesses the table for the character's shift value and shifts accordingly. Horspool's algorithm takes the key (K) and the text (M) and outputs either the index of the matching substring or the string “Key not found” depending on the result. Pseudocode for Horspool's algorithm is presented below: 776:

Implementing a preprocessor is the concept in which a program takes an input and processes it into an output to be used by another program entirely. This sounds like input enhancement, but the application of preprocessor applies to the generic program that processes the source input to be outputted in a format that a compiler can read and can then be compiled.

352:, often called the text. If a matched character occurs, it checks the second character of the key to see if it matches. If it does, the next character is checked and so on until the string matches or the subsequent character doesn't match and the entire key shifts a single character. This continues until the key is found or until the text is exhausted. 483:-1 characters in the key. The algorithm for the shift table generator is given the key and an alphabet of possible characters that could appear in the string (K) as input and returns the shift table (T). Pseudocode for the shift table generator and an example of the shift table for the string ‘POTATO’ is displayed below: 58:

algorithms is measured by the slowest component, the addition of the sorting component is negligible if the search is less efficient. Unfortunately, presorting is usually the slowest component of the algorithm. Contrasting, a searching algorithm without a presort is almost always slower than that with a presort.

771:

Precomputing and input enhancement can sometimes be used synonymously. More specifically, precomputation is the calculation of a given input before anything else is done to the input. Oftentimes a table is generated to be looked back on during the actual execution of the algorithm. Input enhancement

378:

Because of the necessity of more efficient string matching algorithms, several faster algorithms have been developed, with most of them utilizing the idea of input enhancement. The key is preprocessed to gather information about what to look for in the text and that information is stored in order to

336:

are the forefront of the internet and the online world. When given a keyword or a string that needs to be searched among millions upon millions of words, it would take an unbelievable amount of time to match this string character per character. Input enhancement allows an input to be altered to make

301:

The benefits of putting data in a tree are great, especially if the data is being manipulated or repeatedly searched through. Binary search trees are the most simplest, yet most common type of tree for this implementation. The insertion, deletion, and searching of items in a tree are all worst case

474:

This may seem like it is not more efficient than the brute-force algorithm since it has to check all of the characters on every check. However, this is not the case. Horspool's algorithm utilizes a shift table to store the number of characters the algorithm should shift if it runs into a specific

775:

When speaking about altering inputs, preprocessing is often misused. In computer science, a preprocessor and preprocessing are entirely different. When preprocessing is used in context, the usual intention is to portray the concept of input enhancement, and not that of utilizing a preprocessor.

57:

Presorting is the technique of sorting an input before attempting to search it. Because the addition of a sorting component to an algorithm is added to the runtime of the searching algorithm, and not multiplied, it only competes for the slowest portion of the algorithm. Since the efficiency of

187:

Without a presort, at worst case, this algorithm would require every element to be checked against every other element with two possible outcomes: either there is no duplicate element in the array, or the last two elements in the array are the duplicates. This results in an

61:

The sorting portion of the algorithm processes the input of the problem before the searching portion of the algorithm is even reached. Having the elements of the input sorted in some sort of order makes the search trivial in practice. The simplest sorting algorithms –

310:). This makes the repeated searching of elements even quicker for large inputs. There are many different types of binary search trees that work more efficiently and even self-balance upon addition and removal of items, like the AVL tree which has a worst case O(log 44:

Input enhancement when searching has been an essential component of the algorithm world for some time in computer science. The main idea behind this principle is that the efficiency of a search is much faster when the time is taken to create or sort a

772:

that calculates values and assigns them to elements of the input can be classified as precomputation, but the similarities stop there. There are sections of input enhancement that do not utilize precomputing and the terms should not be mutually used.

475:

character. The input is precomputed into a table with every possible character that can be encountered in the text. The shift size is computed with two options: one, if the character is in not in the key, then the shift size is

297:

to name just a small few – have been developed to properly store, access, and manipulate data while maintaining their structure. Trees are a principal data structure for dictionary implementation.

32:, or both. The altered input is usually stored and accessed to simplify the problem. By exploiting the structure and properties of the inputs, input enhancement creates various speed-ups in the efficiency of the 195:

Now compare this to a similar algorithm that utilizes presorting. This algorithm sorts the inputted array, and then checks each pair of elements for a duplicate. The pseudocode is presented below:

277:

Creating data structures to more efficiently search through data is also a form of input enhancement. Placing data into a tree to store and search through inputs is another popular technique.

317:

Taking the time to put the inputted data into such a structure will have great speed-ups for repeated searching of elements, as opposed to searching through the data that hasn't enhanced.

379:

refer back to them when necessary. The accessing of this information is constant time and greatly increases the runtime efficiency of the algorithms that use it, most famously the

479:, the length of the key; or two, if the character appears in the key, then its shift value is the distance from the rightmost occurrence of the character in the first 631:

After the shift table is constructed in the input enhancement stage, the algorithm lines up the key and starts executing. The algorithm executes until a matching

367:). On average case, the maximum number of check trials would never be reached and only a few would be executed, resulting in an average time efficiency of O( 238:

As previously stated, the least efficient part of this algorithm is the sorting of the array, which, if an efficient sort is selected, would run in O(

387:. These algorithms, for the most part, use the same methods to obtain its efficiency with the main difference being on how the key is composed. 751:). This places Horspool's algorithm, which utilizes input enhancement, in a much faster class than the brute-force algorithm for this problem. 117:

A simple example of the benefits of presorting can be seen with an algorithm that checks an array for unique elements: If an array of

395:

As a demonstration of input enhancement in string matching, one should examine a simplified version of the Boyer-Moore algorithm,

384: 261:

This simple example demonstrates what is capable with an input enhancement technique such as presorting. The algorithm went from

396: 809: 795: 380: 348:

characters, often called the key or pattern, the string would be compared to every single character of a longer string

467:

does occur again in the key. If this occurs, the key is shifted to align the rightmost occurrence if the character

824: 24:

is the principle that processing a given input to a problem and altering it in a specific way will increase

110:). Utilizing these sorting algorithms, a search algorithm that incorporates presorting will yield these 453:

doesn't occur again in the key. If this occurs, the entire key can be shifted the length of the key.

435:

is in the key. If this occurs, the key is shifted to align the rightmost occurrence of the character

121:

elements is given, return true if every element in the array is unique, otherwise return false. The

266: 449:

matches with the last character in the key but the other characters don't fully match the key and

246:). But after the array is sorted, the array only needs to be traversed once, which would run in O( 262: 278: 329: 8: 767:. Although they are related, there are several important differences that must be noted. 743:

Although it may not be evident, the worst case runtime efficiency of this algorithm is O(

281:

are used throughout computer science and many different types of trees –

421:

is not in the key. If this occurs, the entire key can be shifted the length of the key.

764: 341: 290: 282: 49:

of the given input before attempting to search for the element in said data structure.

805: 791: 355:

This algorithm is extremely inefficient. The maximum number of check trials would be

344:

algorithm for this problem would perform as follows: When presented with a string of

333: 17: 325: 25: 760: 111: 67: 63: 46: 747:). Fortunately, on texts that are random, the runtime efficiency is linear, O( 818: 294: 71: 463:

matches the key but the other characters don't fully match the key and

122: 83: 632: 95: 33: 286: 79: 29: 639:

is found or the key overlaps the last characters of text

269:

runtime which will result in speed-ups for large inputs.

788:

Introduction to The Design & Analysis of Algorithms

363:+1 trials, making the big-O efficiency at worst case O( 459:: The fourth and last possible case is that character 411:. There are 4 possible cases of what can happen next. 407:

and compares the character. Let's call this character

759:

Input enhancement is often used interchangeably with

427:: The second possible case is that the character 816: 445:: The third possible case is that the character 417:: The first possible case is that the character 78:), while the more advanced sorting algorithms – 314:) for all searching, inserting, and deletion. 306:), but are most often executed in O(log 399:algorithm. The algorithm starts at the 390: 86:– which have a worst case runtime of O( 817: 74:– all have a worst case runtime of O( 754: 328:is a complex issue in the world of 13: 431:is not the current character, but 320: 14: 836: 802:Concepts of Programming Languages 337:this process that much faster. 201:presortUniqueElementSearch(A) 98:– which has a worst case of O( 1: 780: 52: 381:Knuth–Morris–Pratt algorithm 39: 7: 800:Sebesta, Robert W. (2012). 653:shiftTableGenerator(K) 272: 10: 841: 804:(Tenth Edition). Pearson. 790:(Third Edition). Pearson. 649:HorspoolsAlgorithm(K), M) 546:Shift Table for 'POTATO' 403:th character of the text 102:) but is almost always O( 250:). This results in an O( 786:Levitin, Anany (2012). 489:shiftTableGenerator(K) 131:uniqueElementSearch(A) 825:Software optimization 385:Boyer–Moore algorithm 391:Horspool's algorithm 125:is presented below: 679: := 0 547: 283:binary search trees 545: 26:runtime efficiency 810:978-0-13-139531-2 796:978-0-13-231681-1 629: 628: 22:input enhancement 832: 755:Related concepts 740:“Key not found” 725:+ 1 701:+ 1 548: 544: 30:space efficiency 18:computer science 840: 839: 835: 834: 833: 831: 830: 829: 815: 814: 783: 757: 741: 543: 393: 326:String matching 323: 321:String matching 291:red–black trees 275: 236: 185: 55: 42: 12: 11: 5: 838: 828: 827: 813: 812: 798: 782: 779: 778: 777: 773: 761:precomputation 756: 753: 690:– 1 and K = M 645: 627: 626: 623: 620: 617: 614: 611: 608: 605: 602: 599: 596: 593: 589: 588: 585: 582: 579: 576: 573: 570: 567: 564: 561: 558: 555: 485: 392: 389: 334:search engines 322: 319: 274: 271: 258:) efficiency. 197: 192:) efficiency. 127: 114:efficiencies. 68:selection sort 64:insertion sort 54: 51: 47:data structure 41: 38: 9: 6: 4: 3: 2: 837: 826: 823: 822: 820: 811: 807: 803: 799: 797: 793: 789: 785: 784: 774: 770: 769: 768: 766: 765:preprocessing 762: 752: 750: 746: 739: 735: 731: 728: 724: 720: 717: 714: 711: 707: 704: 700: 696: 693: 689: 685: 682: 678: 675: 671: 667: 664: 660: 656: 652: 648: 644: 642: 638: 634: 624: 621: 618: 615: 612: 609: 606: 603: 600: 597: 594: 591: 590: 586: 583: 580: 577: 574: 571: 568: 565: 562: 559: 556: 554: 550: 549: 541: 538: 534: 530: 526: 523: 519: 516: 513: 509: 505: 502: 498: 495: 492: 488: 484: 482: 478: 472: 470: 466: 462: 458: 454: 452: 448: 444: 440: 438: 434: 430: 426: 422: 420: 416: 412: 410: 406: 402: 398: 388: 386: 382: 376: 374: 370: 366: 362: 358: 353: 351: 347: 343: 338: 335: 331: 327: 318: 315: 313: 309: 305: 299: 296: 292: 288: 284: 280: 270: 268: 264: 259: 257: 253: 249: 245: 241: 235: 232: 229: 225: 222: 218: 215: 211: 208: 204: 200: 196: 193: 191: 184: 181: 178: 174: 171: 168: 165: 161: 157: 154: 151: 147: 144: 140: 137: 134: 130: 126: 124: 120: 115: 113: 109: 105: 101: 97: 93: 89: 85: 81: 77: 73: 69: 65: 59: 50: 48: 37: 35: 31: 27: 23: 19: 801: 787: 758: 748: 744: 742: 737: 733: 729: 726: 722: 718: 715: 712: 709: 705: 702: 698: 694: 691: 687: 683: 680: 676: 673: 669: 665: 662: 658: 654: 650: 646: 640: 636: 630: 592:shift value 552: 539: 536: 532: 528: 524: 521: 517: 514: 511: 507: 503: 500: 496: 493: 490: 486: 480: 476: 473: 468: 464: 460: 456: 455: 450: 446: 442: 441: 436: 432: 428: 424: 423: 418: 414: 413: 408: 404: 400: 394: 377: 372: 368: 364: 360: 356: 354: 349: 345: 339: 324: 316: 311: 307: 303: 300: 276: 267:linearithmic 260: 255: 251: 247: 243: 239: 237: 233: 231:return false 230: 227: 223: 220: 216: 213: 209: 206: 205:sort(A) 202: 198: 194: 189: 186: 182: 180:return false 179: 176: 172: 169: 166: 163: 159: 155: 152: 149: 145: 142: 138: 135: 132: 128: 118: 116: 107: 103: 99: 91: 87: 75: 60: 56: 43: 21: 15: 531:T] := 520: := 0 342:brute-force 330:programming 265:runtime to 234:return true 212: := 0 183:return true 141: := 0 72:bubble sort 781:References 551:character 510:T := 397:Horspool's 123:pseudocode 84:merge sort 53:Presorting 736:+ T] 697: := 657: := 647:algorithm 633:substring 487:algorithm 332:now that 295:2–3 trees 287:AVL trees 263:quadratic 199:algorithm 158: := 129:algorithm 96:quicksort 40:Searching 34:algorithm 819:Category 661:– 1 635:of text 383:and the 273:In trees 94:) – and 80:heapsort 808: 794: 738:return 716:return 540:return 535:– 1 – 457:Case 4 443:Case 3 425:Case 2 415:Case 1 293:, and 226:A = A 175:A = A 70:, and 681:while 663:while 279:Trees 112:big-O 806:ISBN 792:ISBN 763:and 727:else 713:then 672:– 1 527:– 2 506:– 1 499:= 0 340:The 254:log 242:log 228:then 219:– 1 177:then 162:+ 1 148:– 1 106:log 90:log 749:n/m 581:... 575:... 566:... 515:for 494:for 375:). 207:for 153:for 136:for 28:or 16:In 821:: 745:mn 732:= 721:– 708:= 703:if 692:do 686:≤ 674:do 668:≤ 651:is 625:6 587:_ 542:T 529:do 522:to 508:do 501:to 491:is 471:. 439:. 365:mn 302:O( 289:, 285:, 224:if 221:do 214:to 203:is 188:O( 173:if 170:do 164:to 150:do 143:to 133:is 82:, 66:, 36:. 20:, 734:i 730:i 723:n 719:i 710:m 706:k 699:k 695:k 688:m 684:k 677:k 670:m 666:i 659:n 655:i 641:m 637:m 622:6 619:6 616:1 613:6 610:5 607:4 604:6 601:6 598:6 595:2 584:Z 578:T 572:P 569:O 563:C 560:B 557:A 553:x 537:j 533:n 525:n 518:j 512:m 504:s 497:i 481:n 477:n 469:x 465:x 461:x 451:x 447:x 437:x 433:x 429:x 419:x 409:x 405:m 401:n 373:n 371:+ 369:m 361:n 359:- 357:m 350:m 346:n 312:n 308:n 304:n 256:n 252:n 248:n 244:n 240:n 217:n 210:i 190:n 167:n 160:i 156:j 146:n 139:i 119:n 108:n 104:n 100:n 92:n 88:n 76:n

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.

Knowledge

Input enhancement (computer science)

Index