Knowledge

Audio time stretching and pitch scaling


"Timestretch" redirects here. For the album, see Timestretch (album).

Time stretching is the process of changing the speed or duration of an audio signal without affecting its pitch. Pitch scaling is the opposite: the process of changing the pitch without affecting the speed. Pitch shift is pitch scaling implemented in an effects unit and intended for live performance. Pitch control is a simpler process which affects pitch and speed simultaneously by slowing down or speeding up a recording.

These processes are often used to match the pitches and tempos of two pre-recorded clips for mixing when the clips cannot be reperformed or resampled. Time stretching is often used to adjust radio commercials and the audio of television advertisements to fit exactly into the 30 or 60 seconds available. It can be used to conform longer material to a designated time slot, such as a 1-hour broadcast.

Resampling

The simplest way to change the duration or pitch of an audio recording is to change the playback speed. For a digital audio recording, this can be accomplished through sample rate conversion. When using this method, the frequencies in the recording are always scaled at the same ratio as the speed, transposing its perceived pitch up or down in the process. Slowing down the recording to increase duration also lowers the pitch, while speeding it up for a shorter duration respectively raises the pitch, creating the so-called Chipmunk effect. When resampling audio to a notably lower pitch, it may be preferred that the source audio is of a higher sample rate, as slowing down the playback rate will reproduce an audio signal of a lower resolution, and therefore reduce the perceived clarity of the sound. On the contrary, when resampling audio to a notably higher pitch, it may be preferred to incorporate an interpolation filter, as frequencies that surpass the Nyquist frequency (determined by the sampling rate of the audio reproduction software or device) will create usually undesired sound distortions, a phenomenon that is also known as aliasing.

Frequency domain

Phase vocoder

Main article: Phase vocoder

One way of stretching the length of a signal without affecting the pitch is to build a phase vocoder after Flanagan, Golden, and Portnoff.

Basic steps:

1. compute the instantaneous frequency/amplitude relationship of the signal using the STFT, which is the discrete Fourier transform of a short, overlapping and smoothly windowed block of samples;
2. apply some processing to the Fourier transform magnitudes and phases (like resampling the FFT blocks); and
3. perform an inverse STFT by taking the inverse Fourier transform on each chunk and adding the resulting waveform chunks, also called overlap and add (OLA).

The phase vocoder handles sinusoid components well, but early implementations introduced considerable smearing on transient ("beat") waveforms at all non-integer compression/expansion rates, which renders the results phasey and diffuse. Recent improvements allow better quality results at all compression/expansion ratios but a residual smearing effect still remains.

The phase vocoder technique can also be used to perform pitch shifting, chorusing, timbre manipulation, harmonizing, and other unusual modifications, all of which can be changed as a function of time.

[Figure: Sinusoidal analysis/synthesis system (based on McAulay & Quatieri 1988, p. 161)]

Sinusoidal spectral modeling

See also: Spectral modeling synthesis

Another method for time stretching relies on a spectral model of the signal. In this method, peaks are identified in frames using the STFT of the signal, and sinusoidal "tracks" are created by connecting peaks in adjacent frames. The tracks are then re-synthesized at a new time scale. This method can yield good results on both polyphonic and percussive material, especially when the signal is separated into sub-bands. However, this method is more computationally demanding than other methods.

[Figure: Modelling a monophonic sound as observation along a helix of a function with a cylinder domain]

Time domain

SOLA

See also: PSOLA

Rabiner and Schafer in 1978 put forth an alternate solution that works in the time domain: attempt to find the period (or equivalently the fundamental frequency) of a given section of the wave using some pitch detection algorithm (commonly the peak of the signal's autocorrelation, or sometimes cepstral processing), and crossfade one period into another.

This is called time-domain harmonic scaling or the synchronized overlap-add method (SOLA) and performs somewhat faster than the phase vocoder on slower machines but fails when the autocorrelation mis-estimates the period of a signal with complicated harmonics (such as orchestral pieces).

Adobe Audition (formerly Cool Edit Pro) seems to solve this by looking for the period closest to a center period that the user specifies, which should be an integer multiple of the tempo, and between 30 Hz and the lowest bass frequency.

This is much more limited in scope than the phase vocoder based processing, but can be made much less processor intensive, for real-time applications. It provides the most coherent results for single-pitched sounds like voice or musically monophonic instrument recordings.

High-end commercial audio processing packages either combine the two techniques (for example by separating the signal into sinusoid and transient waveforms), or use other techniques based on the wavelet transform, or artificial neural network processing, producing the highest-quality time stretching.

Frame-based approach

[Figure: Frame-based approach of many TSM procedures]

In order to preserve an audio signal's pitch when stretching or compressing its duration, many time-scale modification (TSM) procedures follow a frame-based approach. Given an original discrete-time audio signal, this strategy's first step is to split the signal into short analysis frames of fixed length. The analysis frames are spaced by a fixed number of samples, called the analysis hopsize $H_a \in \mathbb{N}$. To achieve the actual time-scale modification, the analysis frames are then temporally relocated to have a synthesis hopsize $H_s \in \mathbb{N}$. This frame relocation results in a modification of the signal's duration by a stretching factor of $\alpha = H_s / H_a$. However, simply superimposing the unmodified analysis frames typically results in undesired artifacts such as phase discontinuities or amplitude fluctuations. To prevent these kinds of artifacts, the analysis frames are adapted to form synthesis frames, prior to the reconstruction of the time-scale modified output signal.

The strategy of how to derive the synthesis frames from the analysis frames is a key difference among different TSM procedures.

Speed hearing and speed talking

For the specific case of speech, time stretching can be performed using PSOLA.

Time-compressed speech is the representation of verbal text in compressed time. While one might expect speeding up to reduce comprehension, Herb Friedman says that "Experiments have shown that the brain works most efficiently if the information rate through the ears—via speech—is the 'average' reading rate, which is about 200–300 wpm (words per minute), yet the average rate of speech is in the neighborhood of 100–150 wpm."

Listening to time-compressed speech is seen as the equivalent of speed reading.

Pitch scaling

[Figure: Pitch shifting (frequency scaling) is provided on Eventide Harmonizer]

[Figure: Frequency shifting provided by Bode Frequency Shifter does not keep frequency ratio and harmony.]

These techniques can also be used to transpose an audio sample while holding speed or duration constant. This may be accomplished by time stretching and then resampling back to the original length. Alternatively, the frequency of the sinusoids in a sinusoidal model may be altered directly, and the signal reconstructed at the appropriate time scale.

Transposing can be called frequency scaling or pitch shifting, depending on perspective.

For example, one could move the pitch of every note up by a perfect fifth, keeping the tempo the same. One can view this transposition as "pitch shifting", "shifting" each note up 7 keys on a piano keyboard, or adding a fixed amount on the Mel scale, or adding a fixed amount in linear pitch space. One can view the same transposition as "frequency scaling", "scaling" (multiplying) the frequency of every note by 3/2.

Musical transposition preserves the ratios of the harmonic frequencies that determine the sound's timbre, unlike the frequency shift performed by amplitude modulation, which adds a fixed frequency offset to the frequency of every note. (In theory one could perform a literal pitch scaling in which the musical pitch space location is scaled, but that is highly unusual, and not musical.)

Time domain processing works much better here, as smearing is less noticeable, but scaling vocal samples distorts the formants into a sort of Alvin and the Chipmunks-like effect, which may be desirable or undesirable. A process that preserves the formants and character of a voice involves analyzing the signal with a channel vocoder or LPC vocoder plus any of several pitch detection algorithms and then resynthesizing it at a different fundamental frequency.

A detailed description of older analog recording techniques for pitch shifting can be found at Alvin and the Chipmunks § Recording technique.

In consumer software

Pitch-corrected audio timestretch is found in every modern web browser as part of the HTML standard for media playback. Similar controls are ubiquitous in media applications and frameworks such as GStreamer and Unity.

See also

Beatmatching
Dynamic tonality: real-time changes of tuning and timbre
Pitch correction
Scrubbing (audio)
Nightcore

References

1. "Dolby, The Chipmunks And NAB2004". Archived from the original on 2008-05-27.
2. "Variable speech". www.atarimagazines.com.
3. Jont B. Allen (June 1977). "Short Time Spectral Analysis, Synthesis, and Modification by Discrete Fourier Transform". IEEE Transactions on Acoustics, Speech, and Signal Processing. ASSP-25 (3): 235–238.
4. McAulay, R. J.; Quatieri, T. F. (1988). "Speech Processing Based on a Sinusoidal Model" (PDF). The Lincoln Laboratory Journal. 1 (2): 153–167. Archived from the original (PDF) on 2012-05-21. Retrieved 2014-09-07.
5. David Malah (April 1979). "Time-domain algorithms for harmonic bandwidth reduction and time scaling of speech signals". IEEE Transactions on Acoustics, Speech, and Signal Processing. ASSP-27 (2): 121–133.
6. Jonathan Driedger and Meinard Müller (2016). "A Review of Time-Scale Modification of Music Signals". Applied Sciences. 6 (2): 57. doi:10.3390/app6020057.
7. Variable Speech, Creative Computing Vol. 9, No. 7 / July 1983 / p. 122.
8. "Listen to podcasts in half the time". Archived from the original on 2011-08-29. Retrieved 2008-07-24.
9. "Speeding iPods". Archived from the original on 2006-09-02.
10. "HTMLMediaElement.playbackRate - Web APIs". MDN. Retrieved 1 September 2021.

External links

Time Stretching and Pitch Shifting Overview: a comprehensive overview of current time and pitch modification techniques by Stephan Bernsee
Stephan Bernsee's smbPitchShift C source code: C source code for doing frequency domain pitch manipulation
pitchshift.js from KievII: a Javascript pitchshifter based on smbPitchShift code, from the open source KievII library
The Phase Vocoder: A Tutorial: a good description of the phase vocoder
New Phase-Vocoder Techniques for Pitch-Shifting, Harmonizing and Other Exotic Effects
A new Approach to Transient Processing in the Phase Vocoder
PICOLA and TDHS
How to build a pitch shifter: theory, equations, figures and performances of a real-time guitar pitch shifter running on a DSP chip
ZTX Time Stretching Library: free and commercial versions of a popular 3rd party time stretching library for iOS, Linux, Windows and Mac OS X
Elastique by zplane: commercial cross-platform library, mainly used by DJ and DAW manufacturers
Voice Synth from Qneo: specialized synthesizer for creative voice sculpting
TSM toolbox: free MATLAB implementations of various Time-Scale Modification procedures
PaulStretch at the Wayback Machine (archived 2023-02-02): a well-known algorithm for extreme (>10×) time stretching
Bungee: open source and commercial libraries for real time audio stretching
Rubber Band: open source library for time stretching and pitch shifting
SoundTouch: open-source library for changing the tempo, pitch and playback rate

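
Worked example: frame-based overlap-add

The frame-based approach relocates analysis frames spaced by an analysis hopsize H_a to a synthesis hopsize H_s, giving a stretching factor of H_s / H_a. The sketch below shows only the naive baseline, in which windowed analysis frames are superimposed unmodified. That is exactly the case the article says tends to produce phase discontinuities and amplitude fluctuations, so procedures such as SOLA or the phase vocoder adapt the frames first. The function names, the Hann window, and the normalisation by the summed window are assumptions made for this illustration.

```python
import math


def hann(n_len):
    """Periodic Hann window of length n_len."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_len) for n in range(n_len)]


def ola_stretch(x, frame_len, hop_a, hop_s):
    """Naive overlap-add time stretch with stretching factor hop_s / hop_a.

    Analysis frames start every hop_a samples in the input and are
    re-placed every hop_s samples in the output, unmodified.
    """
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = hann(frame_len)
    n_frames = (len(x) - frame_len) // hop_a + 1
    out_len = (n_frames - 1) * hop_s + frame_len
    y = [0.0] * out_len
    norm = [0.0] * out_len          # summed window, used to undo gain ripple
    for m in range(n_frames):
        a = m * hop_a               # analysis position in the input
        s = m * hop_s               # synthesis position in the output
        for n in range(frame_len):
            y[s + n] += window[n] * x[a + n]
            norm[s + n] += window[n]
    return [v / w if w > 1e-8 else 0.0 for v, w in zip(y, norm)]
```

With `hop_a == hop_s` the interior of the signal is returned unchanged; with `hop_s = 2 * hop_a` the output is roughly twice as long. On real music the naive version would be expected to sound rough, which motivates the frame adaptation step described in the article.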

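
Worked example: period estimation by autocorrelation

The SOLA section describes finding the period of a section of the wave, commonly from the peak of the signal's autocorrelation. A minimal sketch of that search follows, assuming a short mono excerpt as a list of floats. `estimate_period` and its lag bounds are hypothetical names for this illustration, and the brute-force sum is chosen for readability rather than speed.

```python
import math


def estimate_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation.

    The lag is measured in samples; sample_rate / lag gives the
    corresponding fundamental frequency estimate.
    """
    if not 0 < min_lag <= max_lag < len(x):
        raise ValueError("need 0 < min_lag <= max_lag < len(x)")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation at this lag.
        score = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

The search range matters: if it also covers multiples of the true period, a multiple can score almost as high, which is one way an estimate of this kind can go wrong on material with complicated harmonics.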

fundamental frequency

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.