Voice computing

[Image: The Amazon Echo, an example of a voice computer]

Voice computing is the discipline that develops hardware or software to process voice inputs.

It spans many other fields including human-computer interaction, conversational computing, linguistics, natural language processing, automatic speech recognition, speech synthesis, audio engineering, digital signal processing, cloud computing, data science, ethics, law, and information security.

Voice computing has become increasingly significant in modern times, especially with the advent of smart speakers like the Amazon Echo and Google Assistant, a shift towards serverless computing, and improved accuracy of speech recognition and text-to-speech models.

History

Voice computing has a rich history. First, scientists like Wolfgang von Kempelen started to build speech machines to produce the earliest synthetic speech sounds. This led to further work by Thomas Edison to record audio with dictation machines and play it back in corporate settings. In the 1950s-1960s there were primitive attempts to build automated speech recognition systems by Bell Labs, IBM, and others. However, it was not until the 1980s, when Hidden Markov Models were used to recognize up to 1,000 words, that speech recognition systems became relevant.

Date  Event
1784  Wolfgang von Kempelen creates the Acoustic-Mechanical speech machine.
1879  Thomas Edison invents the first dictation machine.
1952  Bell Labs releases Audrey, capable of recognizing spoken digits with 90% accuracy.
1962  IBM Shoebox can recognize up to 16 words.
1971  Harpy is created, which can understand over 1,000 words.
1986  IBM Tangora uses Hidden Markov Models to predict phonemes in speech.
2006  National Security Agency begins research in hotword detection during normal conversations.
2008  Google launches a voice application, bringing speech recognition to mobile devices.
2011  Apple releases Siri on iPhone.
2014  Amazon releases Amazon Echo to make voice computing relevant to the public at large.

Around 2011, Siri emerged on Apple iPhones as the first voice assistant accessible to consumers. This innovation led to a dramatic shift to building voice-first computing architectures. The PS4 was released by Sony in North America in 2013 (70+ million devices), Amazon released the Amazon Echo in 2014 (30+ million devices), Microsoft released Cortana (2015 - 400 million Windows 10 users), Google released Google Assistant (2016 - 2 billion active monthly users on Android phones), and Apple released HomePod (2018 - 500,000 devices sold and 1 billion devices active with iOS/Siri). These shifts, along with advancements in cloud infrastructure (e.g. Amazon Web Services) and codecs, have solidified the voice computing field and made it widely relevant to the public at large.

Hardware

A voice computer is assembled hardware and software to process voice inputs.

Note that voice computers do not necessarily need a screen, such as in the traditional Amazon Echo. In other embodiments, traditional laptop computers or mobile phones could be used as voice computers. Moreover, there are increasingly more interfaces for voice computers with the advent of IoT-enabled devices, such as within cars or televisions.

As of September 2018, there were over 20,000 types of devices compatible with Amazon Alexa.

Software

Voice computing software can read/write, record, clean, encrypt/decrypt, playback, transcode, transcribe, compress, publish, featurize, model, and visualize voice files.

Here are some popular software packages related to voice computing:

Package name              Description
FFmpeg                    for transcoding audio files from one format to another (e.g. .WAV --> .MP3).
Audacity                  for recording and filtering audio.
SoX                       for manipulating audio files and removing environmental noise.
Natural Language Toolkit  for featurizing transcripts with things like parts of speech.
LibROSA                   for visualizing audio file spectrograms and featurizing audio files.
OpenSMILE                 for featurizing audio files with things like mel-frequency cepstrum coefficients.
CMU Sphinx                for transcribing speech files into text.
Pyttsx3                   for playing back audio files (text-to-speech).
Pycryptodome              for encrypting and decrypting audio files.
AudioFlux                 for audio and music analysis, feature extraction.

Applications

Voice computing applications span many industries including voice assistants, healthcare, e-commerce, finance, supply chain, agriculture, text-to-speech, security, marketing, customer support, recruiting, cloud computing, microphones, speakers, and podcasting. Voice technology is projected to grow at a CAGR of 19-25% by 2025, making it an attractive industry for startups and investors alike.

Legal considerations

In the United States, states have varying telephone call recording laws. In some states, it is legal to record a conversation with the consent of only one party; in others, the consent of all parties is required.

Moreover, COPPA is a significant law to protect minors using the Internet. With an increasing number of minors interacting with voice computing devices (e.g. Amazon Alexa), on October 23, 2017 the Federal Trade Commission relaxed the COPPA rule so that children can issue voice searches and commands.

Lastly, GDPR is a new European law that governs the right to be forgotten and many other clauses for EU citizens. GDPR is also clear that companies need to outline clear measures to obtain consent if audio recordings are made and define the purpose and scope as to how these recordings will be used, e.g., for training purposes. The bar for valid consent has been raised under the GDPR. Consent must be freely given, specific, informed, and unambiguous; tacit consent is no longer sufficient.

Research conferences

There are many research conferences that relate to voice computing. Some of these include:

- International Conference on Acoustics, Speech, and Signal Processing
- Interspeech
- AVEC
- IEEE Int'l Conf. on Automatic Face and Gesture Recognition
- ACII2019 The 8th Int'l Conf. on Affective Computing and Intelligent Interaction

Developer community

Google Assistant has roughly 2,000 actions as of January 2018.

There are over 50,000 Alexa skills worldwide as of September 2018.

In June 2017, Google released AudioSet, a large-scale collection of human-labeled 10-second sound clips drawn from YouTube videos. It contains 1,010,480 videos of human speech files, or 2,793.5 hours in total. It was released as part of the IEEE ICASSP 2017 Conference.

In November 2017, Mozilla Foundation released the Common Voice Project, a collection of speech files to help contribute to the larger open source machine learning community. The voicebank is currently 12 GB in size, with more than 500 hours of English-language voice data that have been collected from 112 countries since the project's inception in June 2017. This dataset has already resulted in creative projects like the DeepSpeech model, an open source transcription model.

See also

- Speech recognition
- Natural language processing
- Voice user interface
- Audio codec
- Ubiquitous computing
- Hands-free computing

References

1. Schwoebel, J. (2018). An Introduction to Voice Computing in Python. Boston; Seattle, Atlanta: NeuroLex Laboratories. https://neurolex.ai/voicebook
2. Timeline for Speech Recognition. https://medium.com/swlh/the-past-present-and-future-of-speech-recognition-technology-cf13c179aaf
3. Voicebot.ai. https://voicebot.ai/2018/09/02/amazon-alexa-now-has-50000-skills-worldwide-is-on-20000-devices-used-by-3500-brands/
4. FFmpeg. https://www.ffmpeg.org/
5. Audacity. https://www.audacityteam.org/
6. SoX. http://sox.sourceforge.net/
7. NLTK. https://www.nltk.org/
8. LibROSA. https://librosa.github.io/librosa/
9. OpenSMILE. https://www.audeering.com/technology/opensmile/
10. "PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop: Cmusphinx/Pocketsphinx". GitHub. 29 March 2020.
11. Pyttsx3. https://github.com/nateshmbhat/pyttsx3
12. Pycryptodome. https://pycryptodome.readthedocs.io/en/latest/
13. AudioFlux. https://github.com/libAudioFlux/audioFlux/
14. Businesswire. https://www.businesswire.com/news/home/20180417006122/en/Global-Speech-Voice-Recognition-Market-2018-Forecast
15. Techcrunch. https://techcrunch.com/2017/10/24/ftc-relaxes-coppa-rule-so-kids-can-issue-voice-searches-and-commands/
16. "Federal Register :: Request Access".
17. IAPP. https://iapp.org/news/a/how-do-the-rules-on-audio-recording-change-under-the-gdpr/
18. Interspeech 2018. http://interspeech2018.org/
19. AVEC 2018. http://avec2018.org/
20. 2018 FG. https://fg2018.cse.sc.edu/
21. ACII 2019. http://acii-conf.org/2019/
22. Voicebot.ai. https://voicebot.ai/2018/01/24/google-assistant-app-total-reaches-nearly-2400-thats-not-real-number-really-1719/
23. Voicebot.ai. https://voicebot.ai/2018/09/02/amazon-alexa-now-has-50000-skills-worldwide-is-on-20000-devices-used-by-3500-brands/
24. Google AudioSet. https://research.google.com/audioset/
25. Audioset data. https://research.google.com/audioset/dataset/speech.html
26. Gemmeke, J. F., Ellis, D. P., Freedman, D., Jansen, A., Lawrence, W., Moore, & Ritter, M. (2017, March). Audio set: An ontology and human-labeled dataset for audio events. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on (pp. 776-780). IEEE.
27. Common Voice Project. https://voice.mozilla.org/
28. Common Voice Project. https://blog.mozilla.org/blog/2017/11/29/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset/
29. Mozilla's large repository of voice data will shape the future of machine learning. https://opensource.com/article/18/4/common-voice
30. DeepSpeech. https://github.com/mozilla/DeepSpeech

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
