{"id":3149,"date":"2021-06-28T17:36:02","date_gmt":"2021-06-28T21:36:02","guid":{"rendered":"https:\/\/engineering.jhu.edu\/nsa\/?page_id=3149"},"modified":"2022-03-18T12:51:02","modified_gmt":"2022-03-18T16:51:02","slug":"emotional-speech","status":"publish","type":"page","link":"https:\/\/engineering.jhu.edu\/nsa\/research\/emotional-speech\/","title":{"rendered":"Emotional Speech"},"content":{"rendered":"<p>Emotion is the cornerstone of human interactions. In fact, the\u00a0<em>manner<\/em> in which something is said can convey just as much information as the words being spoken. Emotional cues in speech are conveyed through vocal inflections known as <em>prosody<\/em>. Key attributes of prosody include the relative pitch, duration, and intensity of the speech signal. Together, these features encode stress, intonation, and rhythm, all of which influence how emotion is perceived. Although general patterns relating prosody to emotion have been identified, machine classification and synthesis of emotional speech remain unreliable.<\/p>\n<p><a href=\"https:\/\/engineering.jhu.edu\/nsa\/wp-content\/uploads\/2020\/08\/block_architectures2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-2798 alignleft\" src=\"https:\/\/engineering.jhu.edu\/nsa\/wp-content\/uploads\/2020\/08\/block_architectures2-300x153.png\" alt=\"\" width=\"300\" height=\"153\" srcset=\"https:\/\/engineering.jhu.edu\/nsa\/wp-content\/uploads\/2020\/08\/block_architectures2-300x153.png 300w, https:\/\/engineering.jhu.edu\/nsa\/wp-content\/uploads\/2020\/08\/block_architectures2-1024x523.png 1024w, https:\/\/engineering.jhu.edu\/nsa\/wp-content\/uploads\/2020\/08\/block_architectures2-768x392.png 768w, https:\/\/engineering.jhu.edu\/nsa\/wp-content\/uploads\/2020\/08\/block_architectures2-1536x784.png 1536w, https:\/\/engineering.jhu.edu\/nsa\/wp-content\/uploads\/2020\/08\/block_architectures2-2048x1046.png 2048w, 
https:\/\/engineering.jhu.edu\/nsa\/wp-content\/uploads\/2020\/08\/block_architectures2-500x255.png 500w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a>This project tackles the problem of multi-speaker <em>emotion conversion<\/em>, which refers to modifying the perceived affect of a speech utterance without changing its linguistic content or speaker identity. To start, we curated the <a href=\"https:\/\/engineering.jhu.edu\/nsa\/vesus\/\">VESUS dataset<\/a>, one of the largest collections of parallel emotional speech utterances. Building on this dataset, we introduced a new paradigm for emotion conversion that combines deformable curve registration of the prosodic features with a novel variational cycle GAN architecture that aligns the\u00a0<em>distribution<\/em> of prosodic embeddings across classes. We are now developing methods for open-loop duration modification that leverage a learned attention mechanism.<\/p>\n<p>Beyond advancing speech technology, our framework has the potential to improve human-to-human interactions by modifying natural speech. Consider autism, which is characterized by a blunted ability to recognize and respond to emotional cues. Suppose we could artificially amplify spoken emotional cues to the point where an individual with autism can accurately perceive them. Over time, it may be possible to retrain the brain of a patient with autism to use the appropriate neural pathways.<\/p>\n<h2>Selected Publications<\/h2>\n<ul>\n<li style=\"margin-bottom: 10px;\">R. Shankar, H.-W. Hsieh, N. Charon, A. Venkataraman. <a href=\"https:\/\/www.isca-speech.org\/archive\/Interspeech_2020\/abstracts\/1323.html\"><em>Multispeaker Emotion Conversion via a Chained Encoder-Decoder-Predictor Network and Latent Variable Regularization<\/em><\/a>. In Proc. Interspeech: Conference of the International Speech Communication Association, 3391-3395, 2020.<\/li>\n<li style=\"margin-bottom: 10px;\">R. Shankar, J. Sager, A. Venkataraman. 
<a href=\"https:\/\/www.isca-speech.org\/archive\/Interspeech_2020\/abstracts\/1325.html\"><em>Non-parallel Emotion Conversion using a Pair Discrimination Deep-Generative Hybrid Model<\/em><\/a>. In Proc. Interspeech: Conference of the International Speech Communication Association, 3396-3400, 2020.<\/li>\n<li style=\"margin-bottom: 10px;\">J. Sager, J. Reinhold, R. Shankar, A. Venkataraman. <a href=\"https:\/\/www.isca-speech.org\/archive\/Interspeech_2019\/abstracts\/1413.html\"><em>VESUS: A Crowd-Annotated Database to Study Emotion Production and Perception in Spoken English<\/em><\/a>. In Proc. Interspeech: Conference of the International Speech Communication Association, 316-320, 2019. <span style=\"color: #800080;\"><strong>Selected for an Oral Presentation (&lt;20% of Papers)<\/strong><\/span><\/li>\n<li style=\"margin-bottom: 10px;\">R. Shankar, J. Sager, A. Venkataraman. <a href=\"https:\/\/www.isca-speech.org\/archive\/Interspeech_2019\/abstracts\/2512.html\"><em>A Multi-Speaker Emotion Morphing Model Using Highway Networks and Maximum Likelihood Objective<\/em><\/a>. In Proc. Interspeech: Conference of the International Speech Communication Association, 2848-2852, 2019. <span style=\"color: #800080;\"><strong>Selected for an Oral Presentation (&lt;20% of Papers)<\/strong><\/span><\/li>\n<\/ul>\n<h2><a href=\"https:\/\/engineering.jhu.edu\/nsa\/research\/\">Back to Research Overview<\/a><\/h2>\n","protected":false},"excerpt":{"rendered":"<p>Emotion is the cornerstone of human interactions. In fact, the\u00a0manner in which something is said can convey just as much information as the words being spoken. Emotional cues in speech are conveyed through vocal inflections known as prosody. 
Key attributes &hellip; <a href=\"https:\/\/engineering.jhu.edu\/nsa\/research\/emotional-speech\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1476,"featured_media":0,"parent":30,"menu_order":4,"comment_status":"closed","ping_status":"closed","template":"sidebar-page.php","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"coauthors":[],"class_list":["post-3149","page","type-page","status-publish","hentry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Emotional Speech - Neural Systems Analysis Laboratory<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/engineering.jhu.edu\/nsa\/research\/emotional-speech\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Emotional Speech - Neural Systems Analysis Laboratory\" \/>\n<meta property=\"og:description\" content=\"Emotion is the cornerstone of human interactions. In fact, the\u00a0manner in which something is said can convey just as much information as the words being spoken. Emotional cues in speech are conveyed through vocal inflections known as prosody. 
Key attributes &hellip; Continue reading &rarr;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/engineering.jhu.edu\/nsa\/research\/emotional-speech\/\" \/>\n<meta property=\"og:site_name\" content=\"Neural Systems Analysis Laboratory\" \/>\n<meta property=\"article:modified_time\" content=\"2022-03-18T16:51:02+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/engineering.jhu.edu\/nsa\/wp-content\/uploads\/2020\/08\/block_architectures2-300x153.png\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n\t<meta name=\"twitter:label2\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data2\" content=\"Archana Venkataraman\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/engineering.jhu.edu\\\/nsa\\\/research\\\/emotional-speech\\\/\",\"url\":\"https:\\\/\\\/engineering.jhu.edu\\\/nsa\\\/research\\\/emotional-speech\\\/\",\"name\":\"Emotional Speech - Neural Systems Analysis 
Laboratory\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/engineering.jhu.edu\\\/nsa\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/engineering.jhu.edu\\\/nsa\\\/research\\\/emotional-speech\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/engineering.jhu.edu\\\/nsa\\\/research\\\/emotional-speech\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/engineering.jhu.edu\\\/nsa\\\/wp-content\\\/uploads\\\/2020\\\/08\\\/block_architectures2-300x153.png\",\"datePublished\":\"2021-06-28T21:36:02+00:00\",\"dateModified\":\"2022-03-18T16:51:02+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/engineering.jhu.edu\\\/nsa\\\/research\\\/emotional-speech\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/engineering.jhu.edu\\\/nsa\\\/research\\\/emotional-speech\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/engineering.jhu.edu\\\/nsa\\\/research\\\/emotional-speech\\\/#primaryimage\",\"url\":\"https:\\\/\\\/engineering.jhu.edu\\\/nsa\\\/wp-content\\\/uploads\\\/2020\\\/08\\\/block_architectures2.png\",\"contentUrl\":\"https:\\\/\\\/engineering.jhu.edu\\\/nsa\\\/wp-content\\\/uploads\\\/2020\\\/08\\\/block_architectures2.png\",\"width\":4000,\"height\":2042},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/engineering.jhu.edu\\\/nsa\\\/research\\\/emotional-speech\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/engineering.jhu.edu\\\/nsa\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Research\",\"item\":\"https:\\\/\\\/engineering.jhu.edu\\\/nsa\\\/research\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Emotional Speech\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/engineering.jhu.edu\\\/nsa\\\/#website\",\"url\":\"https:\\\/\\\/engineering.jhu.edu\\\/nsa\\\/\",\"name\":\"Neural Systems Analysis Laboratory\",\"description\":\"Decoding the Brain, One Snapshot 
at a Time\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/engineering.jhu.edu\\\/nsa\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Emotional Speech - Neural Systems Analysis Laboratory","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/engineering.jhu.edu\/nsa\/research\/emotional-speech\/","og_locale":"en_US","og_type":"article","og_title":"Emotional Speech - Neural Systems Analysis Laboratory","og_description":"Emotion is the cornerstone of human interactions. In fact, the\u00a0manner in which something is said can convey just as much information as the words being spoken. Emotional cues in speech are conveyed through vocal inflections known as prosody. Key attributes &hellip; Continue reading &rarr;","og_url":"https:\/\/engineering.jhu.edu\/nsa\/research\/emotional-speech\/","og_site_name":"Neural Systems Analysis Laboratory","article_modified_time":"2022-03-18T16:51:02+00:00","og_image":[{"url":"https:\/\/engineering.jhu.edu\/nsa\/wp-content\/uploads\/2020\/08\/block_architectures2-300x153.png","type":"","width":"","height":""}],"twitter_misc":{"Est. 
reading time":"2 minutes","Written by":"Archana Venkataraman"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/engineering.jhu.edu\/nsa\/research\/emotional-speech\/","url":"https:\/\/engineering.jhu.edu\/nsa\/research\/emotional-speech\/","name":"Emotional Speech - Neural Systems Analysis Laboratory","isPartOf":{"@id":"https:\/\/engineering.jhu.edu\/nsa\/#website"},"primaryImageOfPage":{"@id":"https:\/\/engineering.jhu.edu\/nsa\/research\/emotional-speech\/#primaryimage"},"image":{"@id":"https:\/\/engineering.jhu.edu\/nsa\/research\/emotional-speech\/#primaryimage"},"thumbnailUrl":"https:\/\/engineering.jhu.edu\/nsa\/wp-content\/uploads\/2020\/08\/block_architectures2-300x153.png","datePublished":"2021-06-28T21:36:02+00:00","dateModified":"2022-03-18T16:51:02+00:00","breadcrumb":{"@id":"https:\/\/engineering.jhu.edu\/nsa\/research\/emotional-speech\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/engineering.jhu.edu\/nsa\/research\/emotional-speech\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/engineering.jhu.edu\/nsa\/research\/emotional-speech\/#primaryimage","url":"https:\/\/engineering.jhu.edu\/nsa\/wp-content\/uploads\/2020\/08\/block_architectures2.png","contentUrl":"https:\/\/engineering.jhu.edu\/nsa\/wp-content\/uploads\/2020\/08\/block_architectures2.png","width":4000,"height":2042},{"@type":"BreadcrumbList","@id":"https:\/\/engineering.jhu.edu\/nsa\/research\/emotional-speech\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/engineering.jhu.edu\/nsa\/"},{"@type":"ListItem","position":2,"name":"Research","item":"https:\/\/engineering.jhu.edu\/nsa\/research\/"},{"@type":"ListItem","position":3,"name":"Emotional Speech"}]},{"@type":"WebSite","@id":"https:\/\/engineering.jhu.edu\/nsa\/#website","url":"https:\/\/engineering.jhu.edu\/nsa\/","name":"Neural Systems Analysis 
Laboratory","description":"Decoding the Brain, One Snapshot at a Time","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/engineering.jhu.edu\/nsa\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/engineering.jhu.edu\/nsa\/wp-json\/wp\/v2\/pages\/3149","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/engineering.jhu.edu\/nsa\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/engineering.jhu.edu\/nsa\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/engineering.jhu.edu\/nsa\/wp-json\/wp\/v2\/users\/1476"}],"replies":[{"embeddable":true,"href":"https:\/\/engineering.jhu.edu\/nsa\/wp-json\/wp\/v2\/comments?post=3149"}],"version-history":[{"count":7,"href":"https:\/\/engineering.jhu.edu\/nsa\/wp-json\/wp\/v2\/pages\/3149\/revisions"}],"predecessor-version":[{"id":3466,"href":"https:\/\/engineering.jhu.edu\/nsa\/wp-json\/wp\/v2\/pages\/3149\/revisions\/3466"}],"up":[{"embeddable":true,"href":"https:\/\/engineering.jhu.edu\/nsa\/wp-json\/wp\/v2\/pages\/30"}],"wp:attachment":[{"href":"https:\/\/engineering.jhu.edu\/nsa\/wp-json\/wp\/v2\/media?parent=3149"}],"wp:term":[{"taxonomy":"author","embeddable":true,"href":"https:\/\/engineering.jhu.edu\/nsa\/wp-json\/wp\/v2\/coauthors?post=3149"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}