Bespoke Voice Design

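最先端の予測性能を持つ合成音声品質の自動評価システム UTMOS について [On UTMOS, an automatic quality assessment system for synthetic speech with state-of-the-art prediction performance]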

佐伯高明 and 高道慎之介

日本音響学会誌 (Journal of the Acoustical Society of Japan), 2024

(Invited article / 招待記事)

@article{saeki24asj-kaisetsu_utmos,
  title = {最先端の予測性能を持つ合成音声品質の自動評価システム UTMOS について},
  author = {高明, 佐伯 and 慎之介, 高道},
  year = {2024},
  journal = {日本音響学会誌},
  note = {(Invited article / 招待記事)},
  memo = {本研究は科研費 21H04900，22H03639，23H03418，23K18474，JST創発的研究支援事業 JP23KJ0828，ムーンショット JPMJPS2011 の助成を受けた．本解説記事の執筆に際し，東京大学大学院の関健太郎氏の助言を受けた．}
}

Speech Analysis

Who Finds This Voice Attractive? A Large-Scale Experiment Using In-the-Wild Data

Hitoshi Suda, Aya Watanabe, and Shinnosuke Takamichi

In Proceedings of Interspeech, 2024

@inproceedings{suda24interspeech_sukikirai,
  abbr_publisher = {Proceedings of Interspeech},
  booktitle = {Proceedings of Interspeech},
  title = {Who Finds This Voice Attractive? A Large-Scale Experiment Using In-the-Wild Data},
  author = {Suda, Hitoshi and Watanabe, Aya and Takamichi, Shinnosuke},
  year = {2024},
  memo = {This work was supported by JSPS KAKENHI Grant Number 23K20017, 21H04900, 22H03639, and 23H03418, and JST FOREST JPMJFR226V. This paper is based on results obtained from a project, JPNP20006, commissioned by the New Energy and Industrial Technology Development Organization (NEDO).}
}

インターネット時代の音声コーパスの作成 [Building speech corpora in the Internet era]

高道慎之介

日本音響学会誌 (Journal of the Acoustical Society of Japan), 2024

(Invited article / 招待記事)

@article{takamichi24asj_invited-article-dark-data,
  title = {インターネット時代の音声コーパスの作成},
  author = {慎之介, 高道},
  year = {2024},
  journal = {日本音響学会誌},
  note = {(Invited article / 招待記事)},
  memo = {本研究は科研費 21H04900，22H03639，23H03418，23K18474，JST創発的研究支援事業 JP23KJ0828，ムーンショット JPMJPS2011 の助成を受けた．また，本稿の執筆にあたり東京大学 大学院情報理工学系研究科 修士課程 関 健太郎氏からの助言を受けた．}
}

F0に基づいて伸縮された画像文字からの音声合成 [Speech synthesis from text images stretched according to F0]

大中緋慧, 宮崎亮一, and 高道慎之介

In 日本音響学会春季研究発表会 (ASJ Spring Meeting), 2024

@inproceedings{ohnaka24asjs_vtts-width,
  abbr_publisher = {日本音響学会春季研究発表会},
  booktitle = {日本音響学会春季研究発表会},
  title = {{F0}に基づいて伸縮された画像文字からの音声合成},
  author = {緋慧, 大中 and 亮一, 宮崎 and 慎之介, 高道},
  year = {2024},
  memo = {本研究は，科研費 22H03639，21H04900 による補助を受けた}
}

YODAS：YouTube 動画から構築される多言語大規模音声データセット [YODAS: a large-scale multilingual speech dataset built from YouTube videos]

Xinjian Li, 高道慎之介, 佐伯高明, William Chen, 塩田さやか, and 渡部晋治

In 日本音響学会春季研究発表会 (ASJ Spring Meeting), 2024

@inproceedings{li24asjs_yodas,
  abbr_publisher = {日本音響学会春季研究発表会},
  booktitle = {日本音響学会春季研究発表会},
  title = {{YODAS：YouTube} 動画から構築される多言語大規模音声データセット},
  author = {Li, Xinjian and 慎之介, 高道 and 高明, 佐伯 and Chen, William and 塩田さやか and 晋治, 渡部},
  year = {2024},
  memo = {本研究は，アメリカ国立科学財団資金番号 #2138259, #2138286, #2138307, #2137603, #2138296 により支援された，PSC Bridges2 と NCSA Delta via ACCESS allocation CIS210014 を使用した．また本研究は科研費 21H04900, 22H03639，23H03418，JST 創発的研究支援事業 JP23KJ0828，ムーンショット JPMJPS2011 の助成を受けた．}
}

NecoBERT：音声合成のために事前学習された自己教師あり学習モデル [NecoBERT: a self-supervised learning model pre-trained for speech synthesis]

中田亘, 佐伯高明, 齋藤佑樹, 高道慎之介, and 猿渡洋

In 日本音響学会春季研究発表会 (ASJ Spring Meeting), Mar 2024

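対照学習モデルによる音声-声質表現文の埋め込み表現獲得 [Learning embeddings of speech and voice-characteristics descriptions with a contrastive learning model]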

渡邊亞椰, 高道慎之介, 齋藤佑樹, 中田亘, 辛徳泰, and 猿渡洋

In 日本音響学会春季研究発表会 (ASJ Spring Meeting), Mar 2024

@inproceedings{watanabe24asjs_coconut-embedding,
  abbr_publisher = {日本音響学会春季研究発表会},
  booktitle = {日本音響学会春季研究発表会},
  title = {対照学習モデルによる音声-声質表現文の埋め込み表現獲得},
  author = {亞椰, 渡邊 and 慎之介, 高道 and 佑樹, 齋藤 and 亘, 中田 and 徳泰, 辛 and 洋, 猿渡},
  year = {2024},
  memo = {本研究は科研費 21H04900, 22H03639，23H03418，JST 創発的研究支援事業 JP23KJ0828，ムーンショット JPMJPS2011 の助成を受けたものです．}
}

Speech Analysis

Do learned speech symbols follow Zipf’s law?

Shinnosuke Takamichi, Hiroki Maeda, Joonyong Park, Daisuke Saito, and Hiroshi Saruwatari

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

@inproceedings{meada24icassp_zipf-law,
  abbr_publisher = {Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  booktitle = {Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title = {Do learned speech symbols follow {Z}ipf's law?},
  author = {Takamichi, Shinnosuke and Maeda, Hiroki and Park, Joonyong and Saito, Daisuke and Saruwatari, Hiroshi},
  year = {2024}
}

Emotion-controllable Speech Synthesis using Emotion Soft Label, Utterance-level Prosodic Factors, and Word-level Prominence

Xuan Luo, Shinnosuke Takamichi, Yuki Saito, Tomoki Koriyama, and Hiroshi Saruwatari

APSIPA Transactions, Mar 2024

@article{luo24apsipa-trans_emotion-synthesis,
  title = {Emotion-controllable Speech Synthesis using Emotion Soft Label, Utterance-level Prosodic Factors, and Word-level Prominence},
  author = {Luo, Xuan and Takamichi, Shinnosuke and Saito, Yuki and Koriyama, Tomoki and Saruwatari, Hiroshi},
  year = {2024},
  journal = {APSIPA Transactions},
}

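「キミは私の声、好きかな?」大規模主観評価による声質好感度コーパスの構築とその分析 ["Do you like my voice?": construction and analysis of a voice attractiveness corpus through large-scale subjective evaluation]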

須田仁志, 渡邊亞椰, and 高道慎之介

In 情報処理学会音声言語処理研究会 (IPSJ SIG on Spoken Language Processing), Mar 2024

@inproceedings{suda24slp_voice-attractiveness,
  abbr_publisher = {情報処理学会 音声言語処理研究会},
  booktitle = {情報処理学会 音声言語処理研究会},
  title = {「キミは私の声、好きかな?」大規模主観評価による声質好感度コーパスの構築とその分析},
  author = {仁志, 須田 and 亞椰, 渡邊 and 慎之介, 高道},
  year = {2024},
  memo = {本研究は JSPS 科研費 23K20017，21H04900，22H03639，23H03418，JST 創発的研究支援事業 JPMJFR226V の助成を受けたものです．この成果は，国立研究開発法人新エネルギー・産業技術総合開発機構（NEDO）の委託業務（JPNP20006）の結果得られたものです．}
}

Music Synthesis

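複数のオーディオエフェクトが適用された楽音に対するエフェクトチェイン推定と原音復元 [Effect chain estimation and dry signal recovery for musical signals processed with multiple audio effects]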

武伯寒, 渡邉研斗, 中塚貴之, Tian Cheng, 中野倫靖, 後藤真孝, 高道慎之介, and 猿渡洋

In 日本音響学会春季研究発表会 (ASJ Spring Meeting), Mar 2024

@inproceedings{take24asjs_audio-effect,
  abbr_publisher = {日本音響学会春季研究発表会},
  booktitle = {日本音響学会春季研究発表会},
  title = {複数のオーディオエフェクトが適用された楽音に対するエフェクトチェイン推定と原音復元},
  author = {伯寒, 武 and 研斗, 渡邉 and 貴之, 中塚 and Cheng, Tian and 倫靖, 中野 and 真孝, 後藤 and 慎之介, 高道 and 洋, 猿渡},
  year = {2024},
  memo = {本研究は科研費 21H04900, 22H03639，23H03418，JST 創発的研究支援事業 JP23KJ0828，ムーンショット JPMJPS2011 の助成を受けたものです．}
}

Music Synthesis

Audio Effect Chain Estimation and Dry Signal Recovery from Multi-Effect-Processed Musical Signals

Osamu Take, Kento Watanabe, Takayuki Nakatsuka, Tian Cheng, Tomoyasu Nakano, Masataka Goto, Shinnosuke Takamichi, and Hiroshi Saruwatari

In Proceedings of International Conference on Digital Audio Effects (DAFx), 2024

@inproceedings{take24dafx_effect-chain,
  abbr_publisher = {Proceedings of International Conference on Digital Audio Effects (DAFx)},
  booktitle = {Proceedings of International Conference on Digital Audio Effects (DAFx)},
  title = {Audio Effect Chain Estimation and Dry Signal Recovery from Multi-Effect-Processed Musical Signals},
  author = {Take, Osamu and Watanabe, Kento and Nakatsuka, Takayuki and Cheng, Tian and Nakano, Tomoyasu and Goto, Masataka and Takamichi, Shinnosuke and Saruwatari, Hiroshi},
  memo = {This work is supported by JSPS KAKENHI 21H04900, 22H03639, and 23H03418, JST FOREST JPMJFR226V, and Moonshot R&D Grant Number JPMJPS2011.},
  year = {2024}
}

2023

Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control

Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, and Hiroshi Saruwatari

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023

@inproceedings{watanabe23asru_coconut-corpus,
  abbr_publisher = {IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  booktitle = {IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  title = {{Coco-Nut}: Corpus of {J}apanese Utterance and Voice Characteristics Description for Prompt-based Control},
  author = {Watanabe, Aya and Takamichi, Shinnosuke and Saito, Yuki and Nakata, Wataru and Xin, Detai and Saruwatari, Hiroshi},
  year = {2023}
}

HumanDiffusion: diffusion model using perceptual gradients

Yota Ueda, Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, and Hiroshi Saruwatari

In Proceedings of Interspeech, 2023

@inproceedings{ueda23interspeech_humandiffusion,
  abbr_publisher = {Proceedings of Interspeech},
  booktitle = {Proceedings of Interspeech},
  title = {{HumanDiffusion}: diffusion model using perceptual gradients},
  author = {Ueda, Yota and Takamichi, Shinnosuke and Saito, Yuki and Takamune, Norihiro and Saruwatari, Hiroshi},
  year = {2023}
}

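Coco-Nut: 自由記述文による声質制御に向けた多話者音声・声質自由記述ペアデータセット [Coco-Nut: a paired dataset of multi-speaker speech and free-form voice characteristics descriptions for free-text voice control]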

渡邊亞椰, 高道慎之介, 齋藤佑樹, 辛徳泰, and 猿渡洋

In 日本音響学会秋季研究発表会 (ASJ Autumn Meeting), 2023

@inproceedings{watanabe23asja_coconut,
  abbr_publisher = {日本音響学会秋季研究発表会},
  booktitle = {日本音響学会秋季研究発表会},
  title = {{Coco-Nut}: 自由記述文による声質制御に向けた多話者音声・声質自由記述ペアデータセット},
  author = {亞椰, 渡邊 and 慎之介, 高道 and 佑樹, 齋藤 and 徳泰, 辛 and 洋, 猿渡},
  year = {2023}
}

学習・評価ループを用いたデータ選択によるダークデータからの音声合成 [Speech synthesis from dark data via data selection with a training-evaluation loop]

関健太郎, 高道慎之介, 佐伯高明, and 猿渡洋

In 日本音響学会春季研究発表会 (ASJ Spring Meeting), Mar 2023

@inproceedings{seki23asjs_dark-data,
  abbr_publisher = {日本音響学会春季研究発表会},
  booktitle = {日本音響学会春季研究発表会},
  title = {学習・評価ループを用いたデータ選択によるダークデータからの音声合成},
  author = {健太郎, 関 and 慎之介, 高道 and 高明, 佐伯 and 洋, 猿渡},
  year = {2023}
}

jaCappella corpus: A Japanese a cappella vocal ensemble corpus

Tomohiko Nakamura, Shinnosuke Takamichi, Naoko Tanji, Satoru Fukayama, and Hiroshi Saruwatari

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

@inproceedings{nakamura23icassp_jacappella,
  abbr_publisher = {Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  booktitle = {Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title = {{jaCappella} corpus: A Japanese a cappella vocal ensemble corpus},
  author = {Nakamura, Tomohiko and Takamichi, Shinnosuke and Tanji, Naoko and Fukayama, Satoru and Saruwatari, Hiroshi},
  year = {2023}
}

Mid-attribute Speaker Generation using Optimal-Transport-based Interpolation of Gaussian Mixture Models

Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Detai Xin, and Hiroshi Saruwatari

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

@inproceedings{watanabe23icassp_mid-attribute,
  abbr_publisher = {Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  booktitle = {Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title = {Mid-attribute Speaker Generation using Optimal-Transport-based Interpolation of Gaussian Mixture Models},
  author = {Watanabe, Aya and Takamichi, Shinnosuke and Saito, Yuki and Xin, Detai and Saruwatari, Hiroshi},
  year = {2023}
}

Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images

Hien Ohnaka, Shinnosuke Takamichi, Keisuke Imoto, Yuki Okamoto, Kazuki Fujii, and Hiroshi Saruwatari

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

@inproceedings{ohnaka23icassp_visual-onoma-to-wave,
  abbr_publisher = {Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  booktitle = {Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title = {Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images},
  author = {Ohnaka, Hien and Takamichi, Shinnosuke and Imoto, Keisuke and Okamoto, Yuki and Fujii, Kazuki and Saruwatari, Hiroshi},
  year = {2023}
}