Audio time stretching and pitch scaling

Time stretching is the process of changing the speed or duration of an audio signal without affecting its pitch. Pitch scaling is the opposite: the process of changing the pitch without affecting the speed. Pitch shift is pitch scaling implemented in an effects unit and intended for live performance. Pitch control is a simpler process which affects pitch and speed simultaneously by slowing down or speeding up a recording.

These processes are often used to match the pitches and tempos of two pre-recorded clips for mixing when the clips cannot be reperformed or resampled. Time stretching is often used to adjust radio commercials and the audio of television advertisements to fit exactly into the 30 or 60 seconds available. It can also be used to conform longer material to a designated time slot, such as a 1-hour broadcast.
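
Resampling

The simplest way to change the duration or pitch of an audio recording is to change the playback speed. For a digital audio recording, this can be accomplished through sample rate conversion. With this method, the frequencies in the recording are always scaled by the same ratio as the speed, which transposes the perceived pitch up or down. Slowing down the recording to increase its duration also lowers the pitch. Speeding it up for a shorter duration raises the pitch, creating the so-called Chipmunk effect.

When resampling audio to a notably lower pitch, it may be preferable for the source audio to have a higher sample rate. Slowing down the playback rate reproduces the signal at a lower effective resolution and therefore reduces the perceived clarity of the sound. Conversely, when resampling audio to a notably higher pitch, it may be preferable to incorporate an interpolation filter. Frequencies that surpass the Nyquist frequency (half the sampling rate of the audio reproduction software or device) create usually undesired distortions, a phenomenon known as aliasing.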
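
As a rough illustration of why plain resampling couples speed and pitch, the following sketch reads a signal faster or slower by linear interpolation and reports its strongest FFT bin. It assumes NumPy, and the function names are illustrative rather than taken from any library. No anti-aliasing filter is applied, so this demonstrates the coupling only and is not a production sample rate converter.

```python
import numpy as np


def change_playback_speed(x: np.ndarray, speed: float) -> np.ndarray:
    """Naive varispeed: read x `speed` times faster using linear interpolation.

    Played back at the original sample rate, the result lasts 1/speed as long
    and every frequency is multiplied by `speed`. No anti-aliasing filter is
    applied, so speeding up material with high-frequency content can alias.
    """
    n_out = int(len(x) / speed)
    positions = np.arange(n_out) * speed          # fractional read positions
    return np.interp(positions, np.arange(len(x)), x)


def dominant_frequency(x: np.ndarray, fs: float) -> float:
    """Frequency in Hz of the largest FFT magnitude bin."""
    k = int(np.argmax(np.abs(np.fft.rfft(x))))
    return k * fs / len(x)


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    tone = np.sin(2 * np.pi * 440 * t)            # 1 s of a 440 Hz tone
    for speed in (2.0, 0.5):
        y = change_playback_speed(tone, speed)
        print(speed, len(y), dominant_frequency(y, fs))
```

At speed 2.0 the 1-second, 440 Hz tone becomes 4000 samples peaking at 880 Hz. At 0.5 it becomes 16000 samples peaking at 220 Hz: duration and pitch always move together.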

Frequency domain

Phase vocoder

One way of stretching the length of a signal without affecting the pitch is to build a phase vocoder after Flanagan, Golden, and Portnoff.

Basic steps:

1. Compute the instantaneous frequency/amplitude relationship of the signal using the STFT, which is the discrete Fourier transform of a short, overlapping and smoothly windowed block of samples;
2. apply some processing to the Fourier transform magnitudes and phases (like resampling the FFT blocks); and
3. perform an inverse STFT by taking the inverse Fourier transform on each chunk and adding the resulting waveform chunks, also called overlap and add (OLA).

The phase vocoder handles sinusoid components well. However, early implementations introduced considerable smearing on transient ("beat") waveforms at all non-integer compression/expansion rates, which made the results phasey and diffuse. Recent improvements give better results at all compression/expansion ratios, but a residual smearing effect remains.

The phase vocoder technique can also be used to perform pitch shifting, chorusing, timbre manipulation, harmonizing, and other unusual modifications, all of which can be changed as a function of time.
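
The three steps above can be sketched in Python with NumPy. This is a minimal, unoptimized illustration under these assumptions:

- a Hann window and 1024-point frames;
- an analysis hop of 256 samples;
- a synthesis hop rounded from alpha times the analysis hop.

The function name is hypothetical, and the sketch makes no attempt to reduce the transient smearing described above.

```python
import numpy as np


def phase_vocoder_stretch(x, alpha, n_fft=1024, hop_a=256):
    """Time-stretch x by roughly `alpha` (>1 = longer) without changing pitch.

    Minimal sketch: Hann-windowed STFT (step 1), phase propagation from the
    estimated instantaneous frequency of each bin (step 2), inverse FFT and
    overlap-add with window-sum normalisation (step 3). Requires len(x) >= n_fft.
    """
    hop_s = int(round(alpha * hop_a))
    window = np.hanning(n_fft)

    # Step 1: STFT of short, overlapping, windowed blocks.
    n_frames = 1 + (len(x) - n_fft) // hop_a
    frames = np.stack([x[m * hop_a:m * hop_a + n_fft] * window
                       for m in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    magnitudes, phases = np.abs(spectra), np.angle(spectra)

    bin_freq = 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft   # rad/sample
    out = np.zeros(hop_s * (n_frames - 1) + n_fft)
    norm = np.zeros_like(out)
    synth_phase = phases[0].copy()

    for m in range(n_frames):
        if m > 0:
            # Step 2: the deviation of the measured phase advance from the
            # bin's nominal advance, wrapped to [-pi, pi), gives the
            # instantaneous frequency; advance the output phase by hop_s
            # samples at that frequency.
            delta = phases[m] - phases[m - 1] - bin_freq * hop_a
            delta = (delta + np.pi) % (2 * np.pi) - np.pi
            synth_phase += (bin_freq + delta / hop_a) * hop_s
        # Step 3: inverse FFT, re-window, overlap-add at the synthesis hop.
        frame = np.fft.irfft(magnitudes[m] * np.exp(1j * synth_phase), n=n_fft)
        start = m * hop_s
        out[start:start + n_fft] += frame * window
        norm[start:start + n_fft] += window ** 2

    return out / np.maximum(norm, 1e-3)


if __name__ == "__main__":
    fs = 8000
    tone = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
    longer = phase_vocoder_stretch(tone, 2.0)
    peak_hz = np.argmax(np.abs(np.fft.rfft(longer))) * fs / len(longer)
    print(len(tone), len(longer), round(float(peak_hz)))
```

For the 440 Hz test tone, the output is close to twice as long (the sketch drops the final partial frame) and its spectral peak stays near 440 Hz. Propagating phase at each bin's instantaneous frequency is what lets the frames be spread apart without transposing the pitch.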

Sinusoidal spectral modeling

Figure: Sinusoidal analysis/synthesis system (based on McAulay & Quatieri 1988, p. 161)

Another method for time stretching relies on a spectral model of the signal. In this method, peaks are identified in frames using the STFT of the signal, and sinusoidal "tracks" are created by connecting peaks in adjacent frames. The tracks are then re-synthesized at a new time scale. This method can yield good results on both polyphonic and percussive material, especially when the signal is separated into sub-bands. However, it is more computationally demanding than other methods.
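
A compact Python/NumPy sketch of this analysis, tracking, and resynthesis loop follows. It is a deliberate simplification, not the McAulay and Quatieri system:

- it keeps only the few strongest spectral peaks per frame;
- it links each peak greedily to the nearest-frequency track from the previous frame;
- it drives one oscillator per track, with frequency and amplitude interpolated linearly over a stretched hop.

The helper names, the 20 Hz matching threshold, and the other defaults are assumptions chosen for illustration.

```python
import numpy as np


def spectral_peaks(frame, fs, max_peaks):
    """(frequency_hz, amplitude) for the strongest local maxima of |FFT|."""
    n = len(frame)
    window = np.hanning(n)
    mag = np.abs(np.fft.rfft(frame * window))
    is_max = (mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])
    bins = np.nonzero(is_max)[0] + 1
    bins = bins[np.argsort(mag[bins])[::-1][:max_peaks]]
    peaks = []
    for k in bins:
        # Parabolic interpolation of the log magnitude refines the peak bin.
        a, b, c = np.log(mag[k - 1:k + 2] + 1e-12)
        offset = 0.5 * (a - c) / (a - 2 * b + c)
        peaks.append(((k + offset) * fs / n, 2 * mag[k] / window.sum()))
    return peaks


def sinusoidal_stretch(x, fs, alpha, n_fft=1024, hop=256, max_peaks=8,
                       max_jump_hz=20.0):
    """Analyse x into sinusoidal tracks and resynthesise them alpha times longer."""
    hop_s = int(round(alpha * hop))
    n_frames = 1 + (len(x) - n_fft) // hop

    # Analysis and tracking: tracks[i] maps frame index -> (freq, amp).
    tracks, active = [], []
    for m in range(n_frames):
        claimed = []
        for f, a in spectral_peaks(x[m * hop:m * hop + n_fft], fs, max_peaks):
            near = [i for i in active if i not in claimed
                    and abs(tracks[i][m - 1][0] - f) < max_jump_hz]
            if near:   # continue the closest track from the previous frame
                i = min(near, key=lambda j: abs(tracks[j][m - 1][0] - f))
            else:      # otherwise start a new track
                i = len(tracks)
                tracks.append({})
            tracks[i][m] = (f, a)
            claimed.append(i)
        active = claimed

    # Synthesis: one oscillator per track, with frequency and amplitude
    # interpolated linearly across each (stretched) synthesis hop.
    out = np.zeros(hop_s * (n_frames - 1))
    ramp = np.arange(hop_s) / hop_s
    for track in tracks:
        phase = 0.0
        for m in range(n_frames - 1):
            if m not in track and m + 1 not in track:
                continue
            # A newborn track fades in, a dying one fades out, at constant frequency.
            f0, a0 = track.get(m, (track.get(m + 1, (0.0, 0.0))[0], 0.0))
            f1, a1 = track.get(m + 1, (f0, 0.0))
            freq = f0 + (f1 - f0) * ramp
            amp = a0 + (a1 - a0) * ramp
            ph = phase + 2 * np.pi * np.cumsum(freq) / fs
            out[m * hop_s:(m + 1) * hop_s] += amp * np.sin(ph)
            phase = ph[-1]
    return out
```

Stretching happens only in the synthesis loop. The same track data is simply played out over hops of `hop_s` instead of `hop` samples, so frequencies, and hence pitch, are untouched.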

Time domain

Figure: Modelling a monophonic sound as observation along a helix of a function with a cylinder domain

SOLA

Rabiner and Schafer in 1978 put forth an alternate solution that works in the time domain. It attempts to find the period (or, equivalently, the fundamental frequency) of a given section of the wave using some pitch detection algorithm. This is commonly the peak of the signal's autocorrelation, or sometimes cepstral processing. It then crossfades one period into another.

This is called time-domain harmonic scaling or the synchronized overlap-add method (SOLA). It performs somewhat faster than the phase vocoder on slower machines. However, it fails when the autocorrelation mis-estimates the period of a signal with complicated harmonics (such as orchestral pieces).

Adobe Audition (formerly Cool Edit Pro) seems to solve this by looking for the period closest to a center period that the user specifies. This period should be an integer multiple of the tempo and lie between 30 Hz and the lowest bass frequency.

This method is much more limited in scope than phase-vocoder-based processing, but it can be made much less processor-intensive for real-time applications. It gives the most coherent results for single-pitched sounds such as voice or musically monophonic instrument recordings.

High-end commercial audio processing packages take one of two routes, producing the highest-quality time stretching. Some combine the two techniques, for example by separating the signal into sinusoid and transient waveforms. Others use techniques based on the wavelet transform or artificial neural network processing.
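
The period-finding and crossfading idea can be sketched in Python with NumPy:

- `estimate_period` picks the autocorrelation peak inside an assumed lag range (50 to 1000 Hz by default).
- `stretch_by_periods` is a crude, hypothetical time-domain stretch. It repeats or drops whole periods and crossfades over one period whenever the sequence jumps.

This is neither Rabiner and Schafer's exact procedure nor Adobe Audition's. Like the method described above, it only behaves well on steady, single-pitched input.

```python
import numpy as np


def estimate_period(x, fs, fmin=50.0, fmax=1000.0):
    """Period in samples: lag of the largest autocorrelation value between
    fs/fmax and fs/fmin (a basic autocorrelation pitch detector)."""
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]    # lags 0, 1, 2, ...
    lo, hi = int(fs / fmax), min(int(fs / fmin), len(ac) - 1)
    return lo + int(np.argmax(ac[lo:hi + 1]))


def stretch_by_periods(x, period, alpha):
    """Crude period-synchronous stretch: emit whole input periods, repeating
    (alpha > 1) or skipping (alpha < 1) them, and crossfade over one period
    wherever the sequence jumps."""
    n_in = len(x) // period
    n_out = max(1, int(round(n_in * alpha)))
    fade_in = np.arange(period) / period
    out, prev = [], -1
    for k in range(n_out):
        src = min(int(k / alpha), n_in - 1)
        seg = x[src * period:(src + 1) * period]
        natural = prev + 1                 # period that would follow naturally
        if prev >= 0 and src != natural and natural < n_in:
            # Fade from the natural continuation into the chosen period.
            cont = x[natural * period:(natural + 1) * period]
            seg = (1 - fade_in) * cont + fade_in * seg
        out.append(seg)
        prev = src
    return np.concatenate(out)


if __name__ == "__main__":
    fs = 8000
    voice_like = np.sin(2 * np.pi * 200 * np.arange(2048) / fs)
    p = estimate_period(voice_like, fs)
    print(p, len(stretch_by_periods(voice_like, p, 2.0)))
```

For a steady 200 Hz tone at 8 kHz, the detected period is 40 samples. Doubling the number of periods doubles the duration while every period, and therefore the pitch, is unchanged. A mis-estimated period would make each crossfade blend mismatched waveforms, which is the failure mode noted above for signals with complicated harmonics.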
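
Frame-based approach

Figure: Frame-based approach of many TSM procedures

In order to preserve an audio signal's pitch when stretching or compressing its duration, many time-scale modification (TSM) procedures follow a frame-based approach. Given an original discrete-time audio signal, this strategy's first step is to split the signal into short analysis frames of fixed length. The analysis frames are spaced by a fixed number of samples, called the analysis hopsize $H_a \in \mathbb{N}$. To achieve the actual time-scale modification, the analysis frames are then temporally relocated to have a synthesis hopsize $H_s \in \mathbb{N}$. This frame relocation changes the signal's duration by a stretching factor of $\alpha = H_s / H_a$. However, simply superimposing the unmodified analysis frames typically results in undesired artifacts such as phase discontinuities or amplitude fluctuations. To prevent these artifacts, the analysis frames are adapted to form synthesis frames before the time-scale modified output signal is reconstructed.

The strategy of how to derive the synthesis frames from the analysis frames is a key difference among different TSM procedures.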
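
The relocation step on its own can be sketched in Python with NumPy, leaving the analysis frames unmodified (plain overlap-add). With $H_a = 256$ and $H_s = 384$, for example, $\alpha = 1.5$. The function name and the Hann window are assumptions. Because nothing aligns the phases, the sketch produces exactly the artifacts described above whenever $H_s \neq H_a$. Actual TSM procedures differ in how they adapt the frames before this overlap-add.

```python
import numpy as np


def ola_relocate(x, h_a, h_s, frame_len=1024):
    """Plain overlap-add time-scale modification (no frame adaptation).

    Hann-windowed analysis frames taken every h_a samples are placed every h_s
    samples and summed; dividing by the summed windows compensates for the
    overlap. The duration changes by alpha = h_s / h_a. Requires
    len(x) >= frame_len and, for smooth coverage, h_s <= frame_len // 2.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // h_a
    out = np.zeros((n_frames - 1) * h_s + frame_len)
    norm = np.zeros_like(out)
    for m in range(n_frames):
        analysis = x[m * h_a:m * h_a + frame_len] * window   # analysis frame m
        start = m * h_s                                      # its synthesis position
        out[start:start + frame_len] += analysis
        norm[start:start + frame_len] += window
    return out / np.maximum(norm, 1e-3)
```

As a sanity check, `ola_relocate(x, 256, 256)` returns the input unchanged away from the edges. With `h_s=384`, a pure tone comes out about 1.5 times longer but with audible phase jumps where misaligned frames overlap. Those jumps are what the synthesis-frame adaptation is meant to remove.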
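
Pitch scaling

Figure: Frequency shifting provided by Bode Frequency Shifter does not keep frequency ratio and harmony.
Figure: Pitch shifting (frequency scaling) is provided on Eventide Harmonizer.

These techniques can also be used to transpose an audio sample while holding speed or duration constant. This may be accomplished by time stretching and then resampling back to the original length. Alternatively, the frequency of the sinusoids in a sinusoidal model may be altered directly, and the signal reconstructed at the appropriate time scale.

Transposing can be called frequency scaling or pitch shifting, depending on perspective.

For example, one could move the pitch of every note up by a perfect fifth while keeping the tempo the same. One can view this transposition as "pitch shifting": "shifting" each note up 7 keys on a piano keyboard, or adding a fixed amount (7 semitones) in linear pitch space. One can view the same transposition as "frequency scaling": "scaling" (multiplying) the frequency of every note by 3/2, or by $2^{7/12} \approx 1.498$ in twelve-tone equal temperament.

Musical transposition preserves the ratios of the harmonic frequencies that determine the sound's timbre. This is unlike the frequency shift performed by amplitude modulation, which adds a fixed frequency offset to the frequency of every note. (In theory one could perform a literal pitch scaling in which the musical pitch space location is scaled, so that a higher note would be shifted by a greater interval in linear pitch space than a lower note. However, that is highly unusual and not musical.)

Time domain processing works much better here, as smearing is less noticeable, but scaling vocal samples distorts the formants into a sort of Alvin and the Chipmunks-like effect, which may be desirable or undesirable. One process that preserves the formants and character of a voice analyzes the signal with a channel vocoder or LPC vocoder plus any of several pitch detection algorithms. It then resynthesizes the signal at a different fundamental frequency.

A detailed description of older analog recording techniques for pitch shifting can be found at Alvin and the Chipmunks § Recording technique.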
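
The difference between transposition (frequency scaling) and frequency shifting can be checked with a few lines of plain Python on the partials of an ideal harmonic tone. The helper names are illustrative only.

```python
def harmonics(f0, count=4):
    """Partials of an ideal harmonic tone: f0, 2*f0, 3*f0, ..."""
    return [f0 * k for k in range(1, count + 1)]


def pitch_scale(freqs, semitones):
    """Transposition (frequency scaling): multiply every partial by 2**(s/12)."""
    ratio = 2 ** (semitones / 12)
    return [f * ratio for f in freqs]


def frequency_shift(freqs, offset_hz):
    """Frequency shifting: add the same offset in Hz to every partial."""
    return [f + offset_hz for f in freqs]


def partial_ratios(freqs):
    """Each partial divided by the lowest one, rounded for readability."""
    return [round(f / freqs[0], 4) for f in freqs]


tone = harmonics(220.0)                              # 220, 440, 660, 880 Hz
fifth = pitch_scale(tone, 7)                         # up a perfect fifth
shifted = frequency_shift(tone, fifth[0] - tone[0])  # same new fundamental

print(round(2 ** (7 / 12), 4))    # 1.4983, close to the just ratio 3/2
print(partial_ratios(fifth))      # [1.0, 2.0, 3.0, 4.0]: harmonic ratios kept
print(partial_ratios(shifted))    # non-integer ratios: the tone becomes inharmonic
```

Both versions start on the same new fundamental (about 329.6 Hz). Only the scaled version keeps its partials at integer multiples of it, which is why transposition preserves timbre and frequency shifting does not.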

Speed hearing and speed talking

For the specific case of speech, time stretching can be performed using PSOLA.

Time-compressed speech is the representation of verbal text in compressed time. While one might expect speeding up to reduce comprehension, Herb Friedman says that "Experiments have shown that the brain works most efficiently if the information rate through the ears—via speech—is the 'average' reading rate, which is about 200–300 wpm (words per minute), yet the average rate of speech is in the neighborhood of 100–150 wpm."

Listening to time-compressed speech is seen as the equivalent of speed reading.

In consumer software

Pitch-corrected audio timestretch is found in every modern web browser as part of the HTML standard for media playback. Similar controls are ubiquitous in media applications and frameworks such as GStreamer and Unity.

See also

Beatmatching
Dynamic tonality: real-time changes of tuning and timbre
Pitch correction
Scrubbing (audio)
Nightcore

References

Allen, Jont B. (June 1977). "Short Time Spectral Analysis, Synthesis, and Modification by Discrete Fourier Transform". IEEE Transactions on Acoustics, Speech, and Signal Processing. ASSP-25 (3): 235–238.
McAulay, R. J.; Quatieri, T. F. (1988). "Speech Processing Based on a Sinusoidal Model". The Lincoln Laboratory Journal. 1 (2): 153–167. Archived from the original (PDF) on 2012-05-21. Retrieved 2014-09-07.
Malah, David (April 1979). "Time-domain algorithms for harmonic bandwidth reduction and time scaling of speech signals". IEEE Transactions on Acoustics, Speech, and Signal Processing. ASSP-27 (2): 121–133.
"Dolby, The Chipmunks And NAB2004". Archived from the original on 2008-05-27.
Driedger, Jonathan; Müller, Meinard (2016). "A Review of Time-Scale Modification of Music Signals". Applied Sciences. 6 (2): 57. doi:10.3390/app6020057.
"Variable speech". www.atarimagazines.com.
Variable Speech. Creative Computing Vol. 9, No. 7, July 1983, p. 122.
"Listen to podcasts in half the time". Archived from the original on 2011-08-29. Retrieved 2008-07-24.
"Speeding iPods". Archived from the original on 2006-09-02.
"HTMLMediaElement.playbackRate - Web APIs". MDN. Retrieved 1 September 2021.

External links

Time Stretching and Pitch Shifting Overview: a comprehensive overview of current time and pitch modification techniques by Stephan Bernsee
Stephan Bernsee's smbPitchShift C source code: C source code for doing frequency domain pitch manipulation
pitchshift.js from KievII: a JavaScript pitch shifter based on smbPitchShift code, from the open source KievII library
The Phase Vocoder: A Tutorial: a good description of the phase vocoder
New Phase-Vocoder Techniques for Pitch-Shifting, Harmonizing and Other Exotic Effects
A new Approach to Transient Processing in the Phase Vocoder
PICOLA and TDHS
How to build a pitch shifter: theory, equations, figures and performances of a real-time guitar pitch shifter running on a DSP chip
ZTX Time Stretching Library: free and commercial versions of a popular third-party time stretching library for iOS, Linux, Windows and Mac OS X
Elastique by zplane: commercial cross-platform library, mainly used by DJ and DAW manufacturers
Voice Synth from Qneo: specialized synthesizer for creative voice sculpting
TSM toolbox: free MATLAB implementations of various time-scale modification procedures
PaulStretch (archived 2023-02-02): a well-known algorithm for extreme (>10×) time stretching
Bungee: open source and commercial libraries for real-time audio stretching
Rubber Band: open-source library for time stretching and pitch shifting
SoundTouch: open-source library for changing the tempo, pitch and playback rate