Berkeley RISC

Research project into RISC-based microprocessor design

Berkeley RISC is one of two seminal research projects into reduced instruction set computer (RISC) based microprocessor design taking place under the Defense Advanced Research Projects Agency VLSI Project. RISC was led by David Patterson (who coined the term RISC) at the University of California, Berkeley between 1980 and 1984. The other project took place a short distance away at Stanford University under their MIPS effort starting in 1981 and running until 1984.

Berkeley's project was so successful that it became the name for all similar designs to follow; even the MIPS would become known as a "RISC processor". The Berkeley RISC design was later commercialized by Sun Microsystems as the SPARC architecture, and inspired the ARM architecture.

The RISC concept

Main article: Reduced instruction set computer

Both RISC and MIPS were developed from the realization that the vast majority of programs used only a small minority of a processor's available instruction set. In a famous 1978 paper, Andrew S. Tanenbaum demonstrated that a complex 10,000 line high-level program could be represented using a simplified instruction set architecture using an 8-bit fixed-length opcode. This was roughly the same conclusion reached at IBM, whose studies of their own code running on mainframes like the IBM 360 used only a small subset of all the instructions available. Both of these studies suggested that one could produce a much simpler CPU that would still run most real-world code. Another finding, not fully explored at the time, was Tanenbaum's note that 81% of the constants were either 0, 1, or 2.

These realizations were taking place as the microprocessor market was moving from 8-bit to 16-bit, with 32-bit designs about to appear. Those designs were premised on the goal of replicating some of the more well-respected existing ISAs from the mainframe and minicomputer world. For instance, the National Semiconductor NS32000 started out as an effort to produce a single-chip implementation of the VAX-11, which had a rich instruction set with a wide variety of addressing modes. The Motorola 68000 was similar in general layout. To provide this rich set of instructions, CPUs used microcode to decode the user-visible instruction into a series of internal operations. This microcode represented perhaps 1⁄4 to 1⁄3 of the transistors of the overall design.

If, as these other papers suggested, the majority of these opcodes would never be used in practice, then this significant resource was being wasted. If one were to simply build the same processor with the unused instructions removed it would be smaller and thus less expensive, while if one instead used those transistors to improve performance instead of decoding instructions that would not be used, a faster processor was possible. The RISC concept was to take advantage of both of these, producing a CPU that was the same level of complexity as the 68000, but much faster.

To do this, RISC concentrated on adding many more registers, small bits of memory holding temporary values that can be accessed very rapidly. This contrasts with normal main memory, which might take several cycles to access. By providing more registers, and making sure the compilers actually used them, programs should run much faster. Additionally, the speed of the processor would be more closely defined by its clock speed, because less of its time would be spent waiting for memory accesses. Transistor for transistor, a RISC design would outperform a conventional CPU.

On the downside, the instructions being removed were generally performing several "sub-instructions". For instance, the ADD instruction of a traditional design would generally come in several flavours, one that added the numbers in two registers and placed it in a third, another that added numbers found in main memory and put the result in a register, etc. The RISC designs, on the other hand, included only a single flavour of any particular instruction; the ADD, for instance, would always use registers for all operands. This forced the programmer to write additional instructions to load the values from memory, if needed, making a RISC program "less dense".

In the era of expensive memory this was a real concern, notably because memory was also much slower than the CPU. Since a RISC design's ADD would actually require four instructions (two loads, an add, and a save), the machine would have to do much more memory access to read the extra instructions, potentially slowing it down considerably. This was offset to some degree by the fact that the new designs used what was then a very large instruction word of 32 bits, allowing small constants to be folded directly into the instruction instead of having to be loaded separately. Additionally, the results of one operation are often used soon after by another, so by skipping the write to memory and storing the result in a register, the program did not end up much larger, and could in theory run much faster. For instance, a string of instructions carrying out a series of mathematical operations might require only a few loads from memory, while the majority of the numbers being used would be either constants in the instructions, or intermediate values left in the registers from prior calculations. In a sense, in this technique some registers are used to shadow memory locations, so that the registers are used as proxies for the memory locations until their final values after a group of instructions have been determined.

To the casual observer, it was not clear that the RISC concept would improve performance, and it might even make it worse. The only way to be sure was to simulate it. The results of such simulations were clear; in test after test, every simulation showed an enormous overall benefit in performance from this design.

Where the two projects, RISC and MIPS, differed was in the handling of the registers. MIPS simply added lots of registers and left it to the compilers (or assembly language programmers) to make use of them. RISC, on the other hand, added circuitry to the CPU to assist the compiler. RISC used the concept of register windows, in which the entire "register file" was broken down into blocks, allowing the compiler to "see" one block for global variables, and another for local variables.

The idea was to make one particularly common instruction, the procedure call, extremely easy to implement. Almost all programming languages use a system known as an activation record or stack frame for each procedure which contains the address from which the procedure was called, the data (parameters) that were passed in, and space for any result values that need to be returned. In the vast majority of cases these frames are small, typically with three or fewer inputs and one or no outputs (and sometimes an input is reused as an output). In the Berkeley design, then, a register window was a set of several registers, enough of them that the entire procedure stack frame would most likely fit entirely within the register window.

In this case, the call into and return from a procedure is simple and extremely fast. A single instruction is used to set up a new block of registers (a new register window) and then, with operands passed into the procedure in the "low end" of the new window, the program jumps into the procedure. On return, the results are placed in the window at the same end, and the procedure exits. The register windows are set up to overlap at the ends, so that the results from the call simply "appear" in the window of the caller, with no data having to be copied. Thus the common procedure call does not have to interact with main memory, greatly accelerating it.

On the downside, this approach means that procedures with large numbers of local variables are problematic, and ones with fewer lead to registers, an expensive resource, being wasted. There are a finite number of register windows in the design, e.g., eight, so procedures can only be nested that many levels deep before the register windowing mechanism reaches its limit; once the last window is reached, no new window can be set up for another nested call. And if procedures are only nested a few levels deep, registers in the windows above the deepest call nesting level can never be accessed at all, so these are completely wasted. It was Stanford's work on compilers that led them to ignore the register window concept, believing that an efficient compiler could make better use of the registers than a fixed system in hardware. (The same reasoning would apply for a smart assembly language programmer.)

RISC I

RISC I die shot. Most of the chip is occupied by the register file (bottom left area). Control logic only occupies the small top right corner.

The first attempt to implement the RISC concept was originally named Gold. Work on the design started in 1980 as part of a VLSI design course, but the then-complicated design crashed almost all existing design tools. The team had to spend considerable amounts of time improving or re-writing the tools, and even with these new tools it took just under an hour to extract the design on a VAX-11/780.

The final design, named RISC I, was published at the Association for Computing Machinery (ACM) International Symposium on Computer Architecture (ISCA) in 1981. It had 44,500 transistors implementing 31 instructions and a register file containing 78 32-bit registers. This allowed for six register windows containing 14 registers. Of those 14 registers, 4 were overlapped from the prior window. The total is then 10 × 6 registers in windows + 18 globals = 78 registers. The control and instruction decode section occupied only 6% of the die, whereas the typical design of the era used about 50% for the same role. The register file took up most of that space.

RISC I also featured a two-stage instruction pipeline for additional speed, but without the complex instruction re-ordering of more modern designs. This makes conditional branches a problem, because the compiler has to fill the instruction following a conditional branch (the so-called branch delay slot), with something selected to be "safe" (i.e., not dependent on the outcome of the conditional). Sometimes the only suitable instruction in this case is NOP. A notable number of later RISC-style designs still require the consideration of branch delay.

After a month of validation and debugging, the design was sent to the innovative MOSIS service for production on June 22, 1981, using a 2 μm (2,000 nm) process. A variety of delays forced them to abandon their masks four separate times, and wafers with working examples did not arrive back at Berkeley until May 1982. The first working RISC I "computer" (actually a checkout board) ran on June 11. In testing, the chips proved to have lower performance than expected. In general, an instruction would take 2 μs to complete, while the original design allotted for about 0.4 μs (five times as fast). The precise reasons for this problem were never fully explained. However, throughout testing it was clear that certain instructions did run at the expected speed, suggesting the problem was physical, not logical.

Had the design worked at full speed, performance would have been excellent. Simulations using a variety of small programs, comparing the 4 MHz RISC I to the 5 MHz 32-bit VAX 11/780 and the 5 MHz 16-bit Zilog Z8000, showed this clearly. Program size was about 30% larger than the VAX but very close to that of the Z8000, validating the argument that the higher code density of CISC designs was not actually all that impressive in reality. In terms of overall performance, the RISC I was twice as fast as the VAX, and about four times that of the Z8000. The programs ended up performing about the same overall number of memory accesses because the large register file dramatically improved the odds the needed operand was already on-chip.

It is important to put this performance in context. Even though the RISC design had run slower than the VAX, it made no difference to the importance of the design. RISC allowed for the production of a true 32-bit processor on a real chip die using what was already an older fab. Traditional designs simply could not do this; with so much of the chip surface dedicated to decoder logic, a true 32-bit design like the Motorola 68020 required newer fabs before becoming practical. Using the same fabs, RISC I could have largely outperformed the competition.

On February 12, 2015, IEEE installed a plaque at UC Berkeley to commemorate the contribution of RISC-I. The plaque reads:

"UC Berkeley students designed and built the first VLSI reduced instruction set computer in 1981. The simplified instructions of RISC-I reduced the hardware for instruction decode and control, which enabled a flat 32-bit address space, a large set of registers, and pipelined execution. A good match to C programs and the Unix operating system, RISC-I influenced instruction sets widely used today, including those for game consoles, smartphones and tablets."

RISC II

RISC II die shot

While the RISC I design ran into delays, work at Berkeley had already turned to the new Blue design. Work on Blue progressed more slowly than Gold, due both to the lack of a pressing need now that Gold was going to fab, and to changeovers in the classes and students staffing the effort. This pace also allowed them to add in several new features that would end up improving the design considerably.

The key difference was simpler cache circuitry that eliminated one line per bit (from three to two), dramatically shrinking the register file size. The change also required much tighter bus timing, but this was a small price to pay, and to meet those needs several other parts of the design were sped up as well.

The savings due to the new design were tremendous. Whereas Gold contained a total of 78 registers in 6 windows, Blue contained 138 registers broken into 8 windows of 16 registers each, with another 10 globals. This expansion of the register file increases the chance that a given procedure can fit all of its local storage in registers, and increases the nesting depth. Nevertheless, the larger register file required fewer transistors, and the final Blue design, fabbed as RISC II, implemented all of the RISC instruction set with only 40,760 transistors.

The other major change was to include an instruction-format expander, which invisibly "up-converted" 16-bit instructions into a 32-bit format. This allowed smaller instructions, typically things with one or no operands, like NOP, to be stored in memory in a smaller 16-bit format, and for two such instructions to be packed into a single machine word. The instructions would be invisibly expanded back to 32-bit versions before they reached the arithmetic logic unit (ALU), meaning that no changes were needed in the core logic. This simple technique yielded a surprising 30% improvement in code density, making an otherwise identical program on Blue run faster than on Gold due to the decreased number of memory accesses.

RISC II proved to be much more successful in silicon and in testing outperformed almost all minicomputers on almost all tasks. For instance, performance ranged from 85% of VAX speed to 256% on a variety of loads. RISC II was also benched against the famous Motorola 68000, then considered to be the best commercial chip implementation, and outperformed it by 140% to 420%.

Follow-ons

Work on the original RISC designs ended with RISC II, but the concept lived on at Berkeley. The basic core was re-used in SOAR in 1984, basically a RISC converted to run Smalltalk (in the same way that it could be claimed RISC ran C), and later in the similar VLSI-BAM that ran Prolog instead of Smalltalk. Another effort was SPUR, which was a full set of chips needed to build a full 32-bit workstation.

RISC is less famous, but more influential, for being the basis of the commercial SPARC processor design from Sun Microsystems. It was the SPARC that first clearly demonstrated the power of the RISC concept; when they shipped in the first Sun-4s they outperformed anything on the market. This led to virtually every Unix vendor hurrying for a RISC design of their own, leading to designs like the DEC Alpha and PA-RISC, while Silicon Graphics (SGI) purchased MIPS Computer Systems. By 1986, most large chip vendors followed, working on efforts like the Motorola 88000, Fairchild Clipper, AMD 29000 and the PowerPC. On February 13, 2015, IEEE installed a plaque at Oracle Corporation in Santa Clara. It reads:

"Sun Microsystems introduced the Scalable Processor Architecture (SPARC) RISC in 1987. Building on UC Berkeley RISC and Sun compiler and operating system developments, SPARC architecture was highly adaptable to evolving semiconductor, software, and system technology and user needs. The architecture delivered the highest performance, scalable workstations and servers, for engineering, business, Internet, and cloud computing uses."

Techniques developed for and alongside the idea of the reduced instruction set have also been adopted in successively more powerful implementations and extensions of the traditional "complex" x86 architecture. Much of a modern microprocessor's transistor count is devoted to large caches, many pipeline stages, superscalar instruction dispatch, branch prediction and other modern techniques which are applicable regardless of instruction architecture. The amount of silicon dedicated to instruction decoding on a modern x86 implementation is proportionately quite small, so the distinction between "complex" and RISC processor implementations has become blurred.
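The overlapping register windows described in this article can be illustrated with a small model. The sketch below is an assumption-laden toy, not a description of the actual Berkeley hardware: the class name `RegisterWindowFile`, the helper `add_via_windows`, and the simple linear layout of the backing store are all hypothetical, and the default sizes (six windows of 14 registers with 4 shared) merely borrow the RISC I figures quoted in the text. It shows only the two behaviours the article describes: operands written to the caller's high end appear in the callee's low end without copying, and nesting is limited by the number of windows.

```python
class RegisterWindowFile:
    """Toy model of overlapping register windows (hypothetical, linear layout)."""

    def __init__(self, windows=6, window_size=14, overlap=4):
        if not 0 < overlap < window_size:
            raise ValueError("overlap must be smaller than the window size")
        self.windows = windows
        self.window_size = window_size
        self.overlap = overlap
        # Each new window starts this many registers after the previous one.
        self.stride = window_size - overlap
        # Backing store: adjacent windows share `overlap` registers.
        self.regs = [0] * (windows * self.stride + overlap)
        self.current = 0  # index of the active window

    def _physical(self, reg):
        # Translate a window-relative register number to a backing-store index.
        if not 0 <= reg < self.window_size:
            raise IndexError(f"register {reg} is outside the window")
        return self.current * self.stride + reg

    def read(self, reg):
        return self.regs[self._physical(reg)]

    def write(self, reg, value):
        self.regs[self._physical(reg)] = value

    def call(self):
        # Entering a procedure only moves the window; no data is copied.
        if self.current + 1 >= self.windows:
            raise OverflowError("no free register window for another nested call")
        self.current += 1

    def ret(self):
        if self.current == 0:
            raise RuntimeError("return without a matching call")
        self.current -= 1


def add_via_windows(rf, a, b):
    """Pass two operands through the shared registers and read the result back."""
    out = rf.window_size - rf.overlap  # first shared register in the caller's high end
    rf.write(out, a)
    rf.write(out + 1, b)
    rf.call()
    # Callee: the operands appear in the low end of the new window.
    rf.write(0, rf.read(0) + rf.read(1))
    rf.ret()
    # Caller: the result appears in the same shared register.
    return rf.read(out)
```

Calling `add_via_windows(RegisterWindowFile(), 2, 3)` passes both operands and returns the sum without touching any modelled main memory. Raising an error on a seventh nested level is a simplification chosen to mirror the nesting limit discussed in the text.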
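The code-density argument in the article (a register-only ADD needing two loads, an add, and a save, offset by constants folded into the instruction and by intermediate values kept in registers) can be made concrete with a toy load/store machine. This is a minimal sketch under stated assumptions: the `run` function and the mnemonics `load`, `store`, `add`, and `addi` are hypothetical names chosen for illustration and are not the RISC I instruction set.

```python
def run(program, memory):
    """Execute (op, *args) tuples on a toy load/store machine.

    Returns the number of instructions executed. `memory` is a dict that is
    updated in place by store instructions.
    """
    regs = {}
    for op, *args in program:
        if op == "load":        # load rd, addr: register <- memory
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "store":     # store rs, addr: memory <- register
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "add":       # add rd, rs1, rs2: registers only
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "addi":      # addi rd, rs, const: constant held in the instruction
            rd, rs, const = args
            regs[rd] = regs[rs] + const
        else:
            raise ValueError(f"unknown operation: {op}")
    return len(program)


# c = a + b: the four-instruction sequence mentioned in the text.
MEMORY_ADD = [
    ("load", "r1", "a"),
    ("load", "r2", "b"),
    ("add", "r3", "r1", "r2"),
    ("store", "r3", "c"),
]

# d = (a + b) + 1: the intermediate sum stays in r3 and the constant 1 is
# folded into the instruction, so only one extra instruction is needed.
CHAINED = [
    ("load", "r1", "a"),
    ("load", "r2", "b"),
    ("add", "r3", "r1", "r2"),
    ("addi", "r4", "r3", 1),
    ("store", "r4", "d"),
]
```

Running `run(MEMORY_ADD, {"a": 2, "b": 3})` executes four instructions for a single addition, while `CHAINED` adds a second operation at the cost of only one more instruction, which is the effect the article attributes to immediates and register reuse.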


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
