20:
487:
Voice computing applications span many industries including voice assistants, healthcare, e-commerce, finance, supply chain, agriculture, text-to-speech, security, marketing, customer support, recruiting, cloud computing, microphones, speakers, and podcasting. Voice technology is projected to grow at
576:
released the Common Voice
Project, a collection of speech files intended to support the wider open source machine learning community. The voicebank is currently 12 GB in size, with more than 500 hours of English-language voice data collected from 112 countries since the project's
522:
and many other clauses for EU citizens. The GDPR also makes clear that companies must outline measures for obtaining consent when audio recordings are made and must define the purpose and scope of how these recordings will be used, e.g., for training purposes. The bar for valid consent has been raised
941:
Gemmeke, J. F., Ellis, D. P., Freedman, D., Jansen, A., Lawrence, W., Moore, R. C., Plakal, M., & Ritter, M. (2017, March). Audio Set: An ontology and human-labeled dataset for audio events. In
Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on (pp. 776-780).
568:
released AudioSet, a large-scale collection of human-labeled 10-second sound clips drawn from YouTube videos. It contains 1,010,480 video clips of human speech, or 2,793.5 hours in total. It was released as part of the IEEE ICASSP 2017 conference.
819:
535:
507:
is a significant law to protect minors using the
Internet. With an increasing number of minors interacting with voice computing devices (e.g. the Amazon Alexa), on October 23, 2017 the
740:"PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop: Cmusphinx/Pocketsphinx"
964:
364:
Voice computing software can read/write, record, clean, encrypt/decrypt, playback, transcode, transcribe, compress, publish, featurize, model, and visualize voice files.
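As a small illustration of the read/write operations listed above, the following is a minimal sketch that uses only the Python standard library (`wave`, `struct`, `math`). The function names `write_tone` and `read_info`, the 440 Hz test tone, and the 16 kHz sample rate are assumptions chosen for illustration; none of them comes from the packages described in this article.

```python
import math
import struct
import wave


def write_tone(path, freq_hz=440.0, seconds=1.0, rate=16000):
    """Write a mono, 16-bit PCM sine tone to `path`; return the frame count.

    The tone stands in for recorded speech (an illustrative assumption).
    """
    n_frames = int(seconds * rate)
    frames = bytearray()
    for i in range(n_frames):
        # Half of full scale, to leave headroom below the 16-bit limit.
        sample = int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / rate))
        frames += struct.pack("<h", sample)  # little-endian signed 16-bit
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wf.setframerate(rate)
        wf.writeframes(bytes(frames))
    return n_frames


def read_info(path):
    """Read basic metadata back from a WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width": wf.getsampwidth(),
            "rate": rate,
            "frames": frames,
            "seconds": frames / rate,
        }
```

Calling `write_tone("tone.wav")` and then `read_info("tone.wav")` round-trips a file through both operations. Recording, transcoding, and transcription generally require third-party tools and are not covered by this sketch.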
290:
emerged on Apple iPhones as the first voice assistant accessible to consumers. This innovation led to a dramatic shift to building voice-first computing architectures.
1011:
905:
655:
893:
128:
started to build speech machines to produce the earliest synthetic speech sounds. This led to further work by Thomas Edison to record audio with
795:
577:
inception in June 2017. This dataset has already resulted in creative projects like the DeepSpeech model, an open source transcription model.
314:(2018 - 500,000 devices sold and 1 billion devices active with iOS/Siri). These shifts, along with advancements in cloud infrastructure (e.g.
500:. In some states, it is legal to record a conversation with the consent of only one party; in others, the consent of all parties is required.
965:
https://blog.mozilla.org/blog/2017/11/29/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset/
807:
643:
349:
could be used as voice computers. Moreover, the number of interfaces for voice computers has grown with the advent of
1031:
1021:
523:
under the GDPR. Consent must be freely given, specific, informed, and unambiguous; tacit consent is no longer sufficient.
1026:
833:
630:
Schwoebel, J. (2018). An
Introduction to Voice Computing in Python. Boston; Seattle; Atlanta: NeuroLex Laboratories.
906:
https://voicebot.ai/2018/09/02/amazon-alexa-now-has-50000-skills-worldwide-is-on-20000-devices-used-by-3500-brands/
656:
https://voicebot.ai/2018/09/02/amazon-alexa-now-has-50000-skills-worldwide-is-on-20000-devices-used-by-3500-brands/
894:
https://voicebot.ai/2018/01/24/google-assistant-app-total-reaches-nearly-2400-thats-not-real-number-really-1719/
796:
https://www.businesswire.com/news/home/20180417006122/en/Global-Speech-Voice-Recognition-Market-2018-Forecast
497:
132:
and play it back in corporate settings. In the 1950s-1960s there were primitive attempts to build automated
54:
1006:
591:
50:
808:
https://techcrunch.com/2017/10/24/ftc-relaxes-coppa-rule-so-kids-can-issue-voice-searches-and-commands/
38:
1016:
356:
As of
September 2018, there were over 20,000 types of devices compatible with Amazon Alexa.
66:
508:
415:
240:
92:
Voice computing has become increasingly significant in modern times, especially with the advent of
42:
395:
644:
https://medium.com/swlh/the-past-present-and-future-of-speech-recognition-technology-cf13c179aaf
322:, have solidified the voice computing field and made it widely relevant to the public at large.
519:
488:
a compound annual growth rate (CAGR) of 19-25% through 2025, making it an attractive industry for startups and investors alike.
165:
930:
611:
606:
596:
531:
Many research conferences relate to voice computing, including:
229:
145:
105:
86:
148:
were used to recognize up to 1,000 words that speech recognition systems became relevant.
8:
315:
294:
was released by Sony in North
America in 2013 (70+ million devices), Amazon released the
586:
573:
350:
337:
Note that voice computers do not necessarily need a screen, as with the traditional
133:
129:
109:
180:
125:
62:
975:
Mozilla's large repository of voice data will shape the future of machine learning.
976:
342:
303:
273:
101:
58:
834:
https://iapp.org/news/a/how-do-the-rules-on-audio-recording-change-under-the-gdpr/
727:
442:
for extracting features from audio files, such as mel-frequency cepstral coefficients (MFCCs).
771:
421:
70:
549:
ACII2019 The 8th Int'l Conf. on
Affective Computing and Intelligent Interaction
511:
relaxed the COPPA rule so that children can issue voice searches and commands.
113:
93:
34:
is the discipline that develops hardware or software to process voice inputs.
1000:
783:
346:
176:
739:
74:
254:
launches a voice application, bringing speech recognition to mobile devices.
759:
601:
387:
338:
295:
277:
206:
97:
46:
24:
988:
918:
302:
released
Cortana (2015 - 400 million Windows 10 users), Google released
447:
307:
262:
19:
715:
437:
432:
for visualizing audio file spectrograms and featurizing audio files.
299:
191:
137:
536:
International
Conference on Acoustics, Speech, and Signal Processing
367:
Here are some popular software packages related to voice computing:
561:
There were over 50,000 Alexa skills worldwide as of
September 2018.
243:
begins research in hotword detection during normal conversations.
311:
679:
631:
390:
audio files from one format to another (e.g. .WAV --> .MP3).
744:
565:
558:
Google Assistant had roughly 2,000 actions as of January 2018.
410:
for manipulating audio files and removing environmental noise.
381:
319:
306:(2016 - 2 billion monthly active users on Android phones), and
251:
195:
78:
845:
691:
952:
881:
869:
504:
217:
334:
is assembled hardware and software to process voice inputs.
546:
IEEE Int'l Conf. on Automatic Face and Gesture Recognition
515:
287:
124:
Voice computing has a rich history. First, scientists like
667:
198:, capable of recognizing spoken digits with 90% accuracy.
405:
291:
280:
to make voice computing relevant to the public at large.
141:
82:
931:
https://research.google.com/audioset/dataset/speech.html
703:
857:
144:, and others. However, it was not until the 1980s that
353:-enabled devices, such as within cars or televisions.
220:is created, which can understand over 1,000 words.
476:for audio and music analysis and feature extraction.
998:
977:https://opensource.com/article/18/4/common-voice
168:creates the Acoustic-Mechanical speech machine.
728:https://www.audeering.com/technology/opensmile/
460:for synthesizing speech from text and playing it back (text-to-speech).
772:https://pycryptodome.readthedocs.io/en/latest/
496:In the United States, the states have varying
420:for featurizing transcripts with things like
468:for encrypting and decrypting audio files.
784:https://github.com/libAudioFlux/audioFlux/
820:"Federal Register :: Request Access"
452:for transcribing speech files into text.
18:
526:
518:is a new European law that governs the
491:
999:
760:https://github.com/nateshmbhat/pyttsx3
553:
989:https://github.com/mozilla/DeepSpeech
919:https://research.google.com/audioset/
37:It spans many other fields including
341:. In other embodiments, traditional
400:for recording and filtering audio.
13:
716:https://librosa.github.io/librosa/
14:
1043:
642:Timeline for Speech Recognition.
27:, an example of a voice computer
298:in 2014 (30+ million devices),
232:to predict phonemes in speech.
209:can recognize up to 16 words.
1:
680:https://www.audacityteam.org/
632:https://neurolex.ai/voicebook
617:
498:telephone call recording laws
55:automatic speech recognition
7:
846:http://interspeech2018.org/
692:http://sox.sourceforge.net/
592:Natural language processing
580:
359:
325:
108:, and improved accuracy of
51:natural language processing
10:
1048:
953:https://voice.mozilla.org/
882:http://acii-conf.org/2019/
870:https://fg2018.cse.sc.edu/
119:
39:human-computer interaction
67:digital signal processing
509:Federal Trade Commission
416:Natural Language Toolkit
265:releases Siri on iPhone
241:National Security Agency
43:conversational computing
668:https://www.ffmpeg.org/
16:Discipline in computing
963:Common Voice Project.
951:Common Voice Project.
28:
704:https://www.nltk.org/
520:right to be forgotten
166:Wolfgang von Kempelen
22:
858:http://avec2018.org/
612:Hands-free computing
607:Ubiquitous computing
597:Voice user interface
527:Research conferences
492:Legal considerations
230:Hidden Markov Models
146:Hidden Markov Models
106:serverless computing
87:information security
554:Developer community
316:Amazon Web Services
844:Interspeech 2018.
587:Speech recognition
574:Mozilla Foundation
572:In November 2017,
179:invents the first
134:speech recognition
130:dictation machines
110:speech recognition
104:, a shift towards
29:
917:Google AudioSet.
480:
479:
284:
283:
228:IBM Tangora uses
181:dictation machine
126:Wolfgang von Kempelen
63:audio engineering
1039:
748:. 29 March 2020.
343:laptop computers
304:Google Assistant
151:
150:
102:Google Assistant
59:speech synthesis
929:AudioSet data.
422:parts of speech
362:
328:
122:
71:cloud computing
32:Voice computing
794:Businesswire.
787:
775:
770:Pycryptodome.
564:In June 2017,
332:voice computer
114:text-to-speech
94:smart speakers
904:Voicebot.ai.
901:
895:
892:Voicebot.ai.
654:Voicebot.ai.
465:Pycryptodome
373:Package name
347:mobile phones
286:Around 2011,
177:Thomas Edison
987:DeepSpeech.
880:ACII 2019.
806:Techcrunch.
540:Interspeech
530:
513:
502:
495:
486:
483:Applications
376:Description
75:data science
36:
31:
30:
856:AVEC 2018.
782:AudioFlux.
726:OpenSMILE.
602:Audio codec
388:transcoding
339:Amazon Echo
296:Amazon Echo
278:Amazon Echo
207:IBM Shoebox
136:systems by
98:Amazon Echo
47:linguistics
25:Amazon Echo
678:Audacity.
618:References
503:Moreover,
473:AudioFlux
448:CMU Sphinx
868:2018 FG.
758:Pyttsx3.
714:LibROSA.
438:OpenSMILE
310:released
300:Microsoft
276:releases
194:releases
192:Bell Labs
138:Bell Labs
96:like the
666:FFmpeg.
581:See also
514:Lastly,
457:Pyttsx3
429:LibROSA
396:Audacity
360:Software
326:Hardware
116:models.
312:HomePod
120:History
832:IAPP.
745:GitHub
702:NLTK.
566:Google
382:FFmpeg
320:codecs
318:) and
274:Amazon
252:Google
196:Audrey
157:Event
85:, and
79:ethics
942:IEEE.
690:SoX.
543:AVEC
505:COPPA
308:Apple
270:2014
263:Apple
259:2011
248:2008
237:2006
225:1986
218:Harpy
214:1971
203:1962
188:1952
173:1879
162:1784
154:Date
516:GDPR
386:for
288:Siri
112:and
100:and
23:The
406:SoX
351:IoT
345:or
292:PS4
142:IBM
83:law
742:.
424:.
330:A
183:.
140:,
89:.
81:,
77:,
73:,
69:,
65:,
61:,
57:,
53:,
49:,
45:,
41:,
908:.
822:.