介入可能音分析

人間の介入可能性を考慮した音響情景分析のための深層分析合成基盤の開拓とその深化

Title / タイトル

人間の介入可能性を考慮した音響情景分析のための深層分析合成基盤の開拓とその深化(2023-2026, 科研費基盤B 分担)

Projects / プロジェクト

This project aims to build a foundation for acoustic scene analysis that achieves high performance while allowing for human intervention. Specifically, we aim to establish deep analysis-synthesis, a methodology for sound source separation that offers both high separation performance and room for human intervention, by integrating deep acoustic synthesis (a technology that combines synthesizers established in signal processing with deep learning) and deep source separation (source separation based on deep learning). Rather than pursuing practical use through improvements to training alone, this technology could enable acoustic scene analysis methods that adapt, through human intervention, to a wide range of real-world settings containing elements that are hard to foresee in advance. This should make it possible to actively incorporate human prior and expert knowledge.

本研究は，人間の介入可能性を考慮しつつ高性能に動作する音響情景分析基盤の構築を目指す．具体的には，深層音響合成（信号処理で確立されたシンセサイザーと深層学習を組み合わせた技術）と，深層音源分離（深層学習を用いた音源分離）技術を融合し，高い分離性能と人間の介入可能性をもつ音源分離の方法論，深層分析合成を創出することを目指す．この技術を応用することで，学習のみを工夫して実用を目指すのではなく，人が介入することで事前に予見し難い要素を含む様々な現場に適応できる音響情景分析手法が実現しうる．これにより，人間の先験的・専門的知識を能動的に導入することが可能となるはずである．

Member / メンバ

Tomohiko Nakamura / 中村友彦（産総研，代表）
Shinnosuke Takamichi / 高道慎之介（慶應義塾大学）
Kohei Yatabe / 矢田部浩平（東京農工大学）

Acknowledgement / 謝辞

JSPS KAKENHI 23K28108 (English)
JSPS 科研費 23K28108 (日本語)

Website / ウェブサイト

https://kaken.nii.ac.jp/ja/grant/KAKENHI-PROJECT-23K28108/

Reference / 発表文献

(Hyodo et al., 2024)
(Suda et al., 2024)
(Take et al., 2024)
(Seki et al., 2024)
(武 et al., 2024)
(高道, 2024)
(Li et al., 2024)
(渡邊 et al., 2024)
(佐伯 et al., 2024)
(Watanabe et al., 2023)
(Ueda et al., 2023)
(渡邊 et al., 2023)
(前田 et al., 2023)
(Saeki et al., 2024)
(Park et al., 2023)
(松永 et al., 2024)
(武 et al., 2024)
(兵藤 et al., 2024)
(信川 et al., 2025)

References

2025

変分オートエンコーダによるドラムからボーカルパーカッションへの楽器音変換と評価

信川凜佳, 北村優輝士, 中村友彦, 高道慎之介, and 猿渡洋

In 情報処理学会音楽情報科学研究会, Mar 2025

@inproceedings{nobukawa25mus_drum-to-vocal,
  abbr_publisher = {情報処理学会 音楽情報科学研究会},
  booktitle = {情報処理学会 音楽情報科学研究会},
  title = {変分オートエンコーダによるドラムからボーカルパーカッションへの楽器音変換と評価},
  author = {信川, 凜佳 and 北村, 優輝士 and 中村, 友彦 and 高道, 慎之介 and 猿渡, 洋},
  year = {2025},
}

2024

DNN-based ensemble singing voice synthesis with interactions between singers

Hiroaki Hyodo, Shinnosuke Takamichi, Tomohiko Nakamura, Junya Koguchi, and Hiroshi Saruwatari

In Proceedings of IEEE Spoken Language Technology Workshop (SLT), 2024

@inproceedings{hyodo24slt_chorus,
  abbr_publisher = {Proceedings of IEEE Spoken Language Technology Workshop (SLT)},
  booktitle = {Proceedings of IEEE Spoken Language Technology Workshop (SLT)},
  title = {DNN-based ensemble singing voice synthesis with interactions between singers},
  author = {Hyodo, Hiroaki and Takamichi, Shinnosuke and Nakamura, Tomohiko and Koguchi, Junya and Saruwatari, Hiroshi},
  year = {2024}
}

音声分析

Who Finds This Voice Attractive? A Large-Scale Experiment Using In-the-Wild Data

Hitoshi Suda, Aya Watanabe, and Shinnosuke Takamichi

In Proceedings of Interspeech, 2024

@inproceedings{suda24interspeech_sukikirai,
  abbr_publisher = {Proceedings of Interspeech},
  booktitle = {Proceedings of Interspeech},
  title = {Who Finds This Voice Attractive? A Large-Scale Experiment Using In-the-Wild Data},
  author = {Suda, Hitoshi and Watanabe, Aya and Takamichi, Shinnosuke},
  year = {2024},
  memo = {This work was supported by JSPS KAKENHI Grant Number 23K20017, 21H04900, 22H03639, and 23H03418, and JST FOREST JPMJFR226V. This paper is based on results obtained from a project, JPNP20006, commissioned by the New Energy and Industrial Technology Development Organization (NEDO).}
}

楽音合成

Audio Effect Chain Estimation and Dry Signal Recovery from Multi-Effect-Processed Musical Signals

Osamu Take, Kento Watanabe, Takayuki Nakatsuka, Tian Cheng, Tomoyasu Nakano, Masataka Goto, Shinnosuke Takamichi, and Hiroshi Saruwatari

In Proceedings of International Conference on Digital Audio Effects (DAFx), 2024

@inproceedings{take24dafx_effect-chain,
  abbr_publisher = {Proceedings of International Conference on Digital Audio Effects (DAFx)},
  booktitle = {Proceedings of International Conference on Digital Audio Effects (DAFx)},
  title = {Audio Effect Chain Estimation and Dry Signal Recovery from Multi-Effect-Processed Musical Signals},
  author = {Take, Osamu and Watanabe, Kento and Nakatsuka, Takayuki and Cheng, Tian and Nakano, Tomoyasu and Goto, Masataka and Takamichi, Shinnosuke and Saruwatari, Hiroshi},
  memo = {This work is supported by JSPS KAKENHI 21H04900, 22H03639, and 23H03418, JST FOREST JPMJFR226V, and Moonshot R&D Grant Number JPMJPS2011.},
  year = {2024}
}

音声変換

Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals

Kentaro Seki, Shinnosuke Takamichi, Norihiro Takamune, Yuki Saito, Kanami Imamura, and Hiroshi Saruwatari

In Proceedings of Interspeech, 2024

@inproceedings{seki24interspeech_spatial-voice-conversion,
  abbr_publisher = {Proceedings of Interspeech},
  booktitle = {Proceedings of Interspeech},
  title = {Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals},
  author = {Seki, Kentaro and Takamichi, Shinnosuke and Takamune, Norihiro and Saito, Yuki and Imamura, Kanami and Saruwatari, Hiroshi},
  year = {2024},
  memo = {This work is supported by Research Grant S of the Tateishi Science and Technology Foundation.}
}

コーパス

音環境に適応するテキスト音声合成のための一人称視点コーパス構築

武伯寒, 高道慎之介, 関健太郎, 坂東宜昭, and 猿渡洋

In 情報処理学会音声言語処理研究会, Mar 2024

@inproceedings{take24slp_1st-person-tts,
  abbr_publisher = {情報処理学会 音声言語処理研究会},
  booktitle = {情報処理学会 音声言語処理研究会},
  title = {音環境に適応するテキスト音声合成のための一人称視点コーパス構築},
  author = {武, 伯寒 and 高道, 慎之介 and 関, 健太郎 and 坂東, 宜昭 and 猿渡, 洋},
  year = {2024},
  memo = {本研究の一部は，科研費 22H03639，23K18474，JST 創発的研究支援事業 JP23KJ0828，及び JST ムーンショット型研究開発事業 JPMJMS2011 の助成を受け実施しました．また，原稿の作成に際して，渡邊 亞椰さんには図の作成でご協力頂きました．この場を借りて感謝申し上げます．}
}

コーパス

インターネット時代の音声コーパスの作成

高道慎之介

日本音響学会誌, Mar 2024

(Invited article / 招待記事)

@article{takamichi24asj_invited-article-dark-data,
  title = {インターネット時代の音声コーパスの作成},
  author = {高道, 慎之介},
  year = {2024},
  journal = {日本音響学会誌},
  note = {(Invited article / 招待記事)},
  memo = {本研究は科研費 21H04900，22H03639，23H03418，23K18474，JST創発的研究支援事業 JP23KJ0828，ムーンショット JPMJPS2011 の助成を受けた．また，本稿の執筆にあたり東京大学 大学院情報理工学系研究科 修士課程 関 健太郎氏からの助言を受けた．}
}

コーパス

YODAS：YouTube 動画から構築される多言語大規模音声データセット

Xinjian Li, 高道慎之介, 佐伯高明, William Chen, 塩田さやか, and 渡部晋治

In 日本音響学会春季研究発表会, Mar 2024

@inproceedings{li24asjs_yodas,
  abbr_publisher = {日本音響学会春季研究発表会},
  booktitle = {日本音響学会春季研究発表会},
  title = {{YODAS：YouTube} 動画から構築される多言語大規模音声データセット},
  author = {Li, Xinjian and 高道, 慎之介 and 佐伯, 高明 and Chen, William and 塩田, さやか and 渡部, 晋治},
  year = {2024},
  memo = {本研究は，アメリカ国立科学財団資金番号 #2138259, #2138286, #2138307, #2137603, #2138296 により支援された，PSC Bridges2 と NCSA Delta via ACCESS allocation CIS210014 を使用した．また本研究は科研費 21H04900, 22H03639，23H03418，JST 創発的研究支援事業 JP23KJ0828，ムーンショット JPMJPS2011 の助成を受けた．}
}

音声合成

対照学習モデルによる音声-声質表現文の埋め込み表現獲得

渡邊亞椰, 高道慎之介, 齋藤佑樹, 中田亘, 辛徳泰, and 猿渡洋

In 日本音響学会春季研究発表会, Mar 2024

@inproceedings{watanabe24asjs_coconut-embedding,
  abbr_publisher = {日本音響学会春季研究発表会},
  booktitle = {日本音響学会春季研究発表会},
  title = {対照学習モデルによる音声-声質表現文の埋め込み表現獲得},
  author = {渡邊, 亞椰 and 高道, 慎之介 and 齋藤, 佑樹 and 中田, 亘 and 辛, 徳泰 and 猿渡, 洋},
  year = {2024},
  memo = {本研究は科研費 21H04900, 22H03639，23H03418，JST 創発的研究支援事業 JP23KJ0828，ムーンショット JPMJPS2011 の助成を受けたものです.}
}

音声評価

テキスト生成の自動評価尺度に基づく音声生成の自動評価

佐伯高明, マイティソウミ, 高道慎之介, 渡部晋治, and 猿渡洋

In 電子情報通信学会音声研究会, Mar 2024

@inproceedings{saeki24sp_speechevaluation,
  abbr_publisher = {電子情報通信学会 音声研究会},
  booktitle = {電子情報通信学会 音声研究会},
  title = {テキスト生成の自動評価尺度に基づく音声生成の自動評価},
  author = {佐伯, 高明 and マイティ, ソウミ and 高道, 慎之介 and 渡部, 晋治 and 猿渡, 洋},
  year = {2024},
  memo = {JSPS 科研費 23H03418，23K18474，22H03639，21H05054，22KJ0838，ムーンショット研究開発費 JPMJPS2011，および JST FOREST JPMJFR226V によって支援された．}
}

音声復元

SelfRemaster: Self-Supervised Speech Restoration for Historical Audio Resources

Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, and Hiroshi Saruwatari

IEEE Access, Mar 2024

@article{saeki24access_selfremaster,
  title = {{SelfRemaster}: Self-Supervised Speech Restoration for Historical Audio Resources},
  author = {Saeki, Takaaki and Takamichi, Shinnosuke and Nakamura, Tomohiko and Tanji, Naoko and Saruwatari, Hiroshi},
  year = {2024},
  journal = {IEEE Access},
}

音声認識

Cocktail Machine Speech Chain: 重複あり音声を用いた音声認識・音声合成モデルの統一的学習

松永裕太, 高道慎之介, 上乃聖, and 猿渡洋

In 日本音響学会春季研究発表会, Mar 2024

@inproceedings{matsunaga24asjs_cocktail-speech-chain,
  abbr_publisher = {日本音響学会春季研究発表会},
  booktitle = {日本音響学会春季研究発表会},
  title = {Cocktail Machine Speech Chain: 重複あり音声を用いた音声認識・音声合成モデルの統一的学習},
  author = {松永, 裕太 and 高道, 慎之介 and 上乃, 聖 and 猿渡, 洋},
  year = {2024},
  memo = {本研究は，JST 次世代研究者挑戦的研究プログラム JPMJSP2108，ムーンショット JPMJPS2011，JST 創発的研究支援事業 JP23KJ0828，科研費 21H05054, 22H03639，23H03418 の支援と，東京大学の齋藤佑樹博士, 佐伯高明氏の協力を受け実施したものです.}
}

楽音合成

複数のオーディオエフェクトが適用された楽音に対するエフェクトチェイン推定と原音復元

武伯寒, 渡邉研斗, 中塚貴之, Tian Cheng, 中野倫靖, 後藤真孝, 高道慎之介, and 猿渡洋

In 日本音響学会春季研究発表会, Mar 2024

@inproceedings{take24asjs_audio-effect,
  abbr_publisher = {日本音響学会春季研究発表会},
  booktitle = {日本音響学会春季研究発表会},
  title = {複数のオーディオエフェクトが適用された楽音に対するエフェクトチェイン推定と原音復元},
  author = {武, 伯寒 and 渡邉, 研斗 and 中塚, 貴之 and Cheng, Tian and 中野, 倫靖 and 後藤, 真孝 and 高道, 慎之介 and 猿渡, 洋},
  year = {2024},
  memo = {本研究は科研費 21H04900, 22H03639，23H03418， JST 創発的研究支援事業 JP23KJ0828，ムーンショット JPMJPS2011 の助成を受けたものです}
}

二重唱の歌い出しタイミングに対する同時性知覚の刺激閾調査

兵藤弘明, 高道慎之介, and 猿渡洋

In 日本音響学会秋季研究発表会, 2024

@inproceedings{hyodo24asja_duet-timing,
  abbr_publisher = {日本音響学会秋季研究発表会},
  booktitle = {日本音響学会秋季研究発表会},
  title = {二重唱の歌い出しタイミングに対する同時性知覚の刺激閾調査},
  author = {兵藤, 弘明 and 高道, 慎之介 and 猿渡, 洋},
  year = {2024}
}

2023

音声合成

Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control

Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, and Hiroshi Saruwatari

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023

@inproceedings{watanabe23asru_coconut-corpus,
  abbr_publisher = {IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  booktitle = {IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  title = {{Coco-Nut}: Corpus of {J}apanese Utterance and Voice Characteristics Description for Prompt-based Control},
  author = {Watanabe, Aya and Takamichi, Shinnosuke and Saito, Yuki and Nakata, Wataru and Xin, Detai and Saruwatari, Hiroshi},
  year = {2023}
}

音声合成

HumanDiffusion: diffusion model using perceptual gradients

Yota Ueda, Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, and Hiroshi Saruwatari

In Proceedings of Interspeech, 2023

@inproceedings{ueda23interspeech_humandiffusion,
  abbr_publisher = {Proceedings of Interspeech},
  booktitle = {Proceedings of Interspeech},
  title = {HumanDiffusion: diffusion model using perceptual gradients},
  author = {Ueda, Yota and Takamichi, Shinnosuke and Saito, Yuki and Takamune, Norihiro and Saruwatari, Hiroshi},
  year = {2023}
}

音声合成

Coco-Nut: 自由記述文による声質制御に向けた多話者音声・声質自由記述ペアデータセット

渡邊亞椰, 高道慎之介, 齋藤佑樹, 辛徳泰, and 猿渡洋

In 日本音響学会秋季研究発表会, 2023

@inproceedings{watanabe23asja_coconut,
  abbr_publisher = {日本音響学会秋季研究発表会},
  booktitle = {日本音響学会秋季研究発表会},
  title = {Coco-Nut: 自由記述文による声質制御に向けた多話者音声・声質自由記述ペアデータセット},
  author = {渡邊, 亞椰 and 高道, 慎之介 and 齋藤, 佑樹 and 辛, 徳泰 and 猿渡, 洋},
  year = {2023}
}

音声合成

深層学習で獲得される音声シンボルは自然言語シンボルと同様に Zipf 則に従うか？

前田紘希, 高道慎之介, 朴浚鎔, and 猿渡洋

In 日本音響学会秋季研究発表会, 2023

@inproceedings{maeda23asja_zipf,
  abbr_publisher = {日本音響学会秋季研究発表会},
  booktitle = {日本音響学会秋季研究発表会},
  title = {深層学習で獲得される音声シンボルは自然言語シンボルと同様に {Zipf} 則に従うか？},
  author = {前田, 紘希 and 高道, 慎之介 and 朴, 浚鎔 and 猿渡, 洋},
  year = {2023}
}

音声分析

How Generative Spoken Language Model Encodes Noisy Speech: Investigation from Phonetics to Syntactics

Joonyong Park, Shinnosuke Takamichi, Tomohiko Nakamura, Kentaro Seki, Detai Xin, and Hiroshi Saruwatari

In Proceedings of Interspeech, 2023

@inproceedings{park23interspeech_gslm,
  abbr_publisher = {Proceedings of Interspeech},
  booktitle = {Proceedings of Interspeech},
  title = {How Generative Spoken Language Model Encodes Noisy Speech: Investigation from Phonetics to Syntactics},
  author = {Park, Joonyong and Takamichi, Shinnosuke and Nakamura, Tomohiko and Seki, Kentaro and Xin, Detai and Saruwatari, Hiroshi},
  year = {2023}
}