Avatar-Symbiotic Society / アバター共生社会

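Realization of an Avatar-Symbiotic Society where Everyone Can Perform Active Roles without Constraint / 誰もが自在に活躍できるアバター共生社会の実現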

Title / タイトル

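Realization of an Avatar-Symbiotic Society where Everyone Can Perform Active Roles without Constraint (2020-2025, research participant, Cabinet Office Moonshot Research and Development Program) (English)
誰もが自在に活躍できるアバター共生社会の実現 (2020-2025, 内閣府ムーンショット型研究開発制度 研究参加者) (日本語)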

Projects / プロジェクト

(TBA)

Member / メンバ

(TBA)

Acknowledgement / 謝辞

Moonshot R&D Grant Number JPMJPS2011 (English)
JST ムーンショット型研究開発事業 JPMJMS2011 (日本語)

Website / ウェブサイト

https://avatar-ss.org/

Reference / 発表文献

(Take et al., 2024)
(佐伯 & 高道, 2024)
(Saeki et al., 2024)
(佐伯 et al., 2024)
(高道, 2024)
(岡本 et al., 2024)
(Li et al., 2024)
(中田 et al., 2024)
(Seki et al., 2024)
(武 et al., 2024)
(Takamichi et al., 2024)
(Seki et al., 2024)
(松永 et al., 2024)
(Watanabe et al., 2023)
(渡邊 et al., 2023)
(前田 et al., 2023)
(Luo et al., 2024)
(関 et al., 2023)
(Watanabe et al., 2023)
(Nakano et al., 2023)
(Park et al., 2023)
(Seki et al., 2023)
(Matsunaga et al., 2023)
(武 et al., 2024)
(Take et al., 2024)
(Ishikawa et al., 2024)
(Nakata et al., 2024)
(中田 et al., 2024)
(石川 et al., 2024)
(関 et al., 2025)
(武 et al., 2025)
(朴 et al., 2026)
(朴 et al., 2025)
(Seki et al., 2025)
(Seki et al., 2025)
(Seki et al., 2025)
(Kishi et al., 2026)
(高道 et al., 2025)
(関 et al., 2026)
(Sato et al., 2025)
(Park et al., 2025)
(関 et al., 2026)
(佐藤 et al., 2025)
(岸 et al., 2026)

References

2026

ニューラルオーディオコーデックにおける雑音頑健性分析 ～ Zipf則・Heaps則に基づく言語統計構造と劣化音声の関係 ～

朴浚鎔 , 高道慎之介 , David M. Chan , 神藤駿介 , 齋藤佑樹 , and 猿渡洋

In 電子情報通信学会音声研究会 , Mar 2026

@inproceedings{park26speasip_neural-codec-robustness,
  abbr_publisher = {電子情報通信学会 音声研究会},
  booktitle = {電子情報通信学会 音声研究会},
  title = {ニューラルオーディオコーデックにおける雑音頑健性分析 ～ Zipf則・Heaps則に基づく言語統計構造と劣化音声の関係 ～},
  author = {朴, 浚鎔 and 高道, 慎之介 and Chan, David M. and 神藤, 駿介 and 齋藤, 佑樹 and 猿渡, 洋},
  year = {2026},
  month = mar,
}

AudioBERTScore: Objective Evaluation of Environmental Sound Synthesis Based on Similarity of Audio embedding Sequences

Minoru Kishi , Ryosuke Sakai , Shinnosuke Takamichi , Yusuke Kanamori , and Yuki Okamoto

In Proceedings of Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI) , Jan 2026

@inproceedings{kishi26aaai_audiobertscore,
  abbr_publisher = {Proceedings of Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI)},
  booktitle = {Proceedings of Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI)},
  title = {AudioBERTScore: Objective Evaluation of Environmental Sound Synthesis Based on Similarity of Audio embedding Sequences},
  author = {Kishi, Minoru and Sakai, Ryosuke and Takamichi, Shinnosuke and Kanamori, Yusuke and Okamoto, Yuki},
  year = {2026},
  month = jan,
}

Spatial Audio Captioning: 複数音源状況下における空間情報を伴う説明文の生成とその評価

関健太郎 , 岡本悠希 , 山岡洸瑛 , 齋藤佑樹 , 高道慎之介 , and 猿渡洋

In 日本音響学会春季研究発表会 , Mar 2026

@inproceedings{seki26asjs_spatial-captioning,
  abbr_publisher = {日本音響学会春季研究発表会},
  booktitle = {日本音響学会春季研究発表会},
  title = {Spatial Audio Captioning: 複数音源状況下における空間情報を伴う説明文の生成とその評価},
  author = {関, 健太郎 and 岡本, 悠希 and 山岡, 洸瑛 and 齋藤, 佑樹 and 高道, 慎之介 and 猿渡, 洋},
  year = {2026},
  month = mar,
}

TTSOps 2.0: テキスト音声合成におけるデータ収集・前処理・学習プロセスの統合的最適化

関健太郎 , 齋藤佑樹 , 高道慎之介 , 佐伯高明 , and 猿渡洋

In 日本音響学会春季研究発表会 , Mar 2026

@inproceedings{seki26asjs_ttsops2,
  abbr_publisher = {日本音響学会春季研究発表会},
  booktitle = {日本音響学会春季研究発表会},
  title = {TTSOps 2.0: テキスト音声合成におけるデータ収集・前処理・学習プロセスの統合的最適化},
  author = {関, 健太郎 and 齋藤, 佑樹 and 高道, 慎之介 and 佐伯, 高明 and 猿渡, 洋},
  year = {2026},
  month = mar,
}

既存データセットとの意図しない重複を避ける環境音評価データセットの半自動構築法

岸秀 , 高道慎之介 , 滝沢力 , 金森勇介 , 砺波紀之 , 永瀬亮太郎 , 井本桂右 , and 岡本悠希

In 電子情報通信学会応用音響研究会 , Mar 2026

@inproceedings{kishi26speasip_environmental-sound-dataset,
  abbr_publisher = {電子情報通信学会 応用音響研究会},
  booktitle = {電子情報通信学会 応用音響研究会},
  title = {既存データセットとの意図しない重複を避ける環境音評価データセットの半自動構築法},
  author = {岸, 秀 and 高道, 慎之介 and 滝沢, 力 and 金森, 勇介 and 砺波, 紀之 and 永瀬, 亮太郎 and 井本, 桂右 and 岡本, 悠希},
  year = {2026},
  month = mar,
}

2025

データ単位前処理自動選択による音声合成コーパスのデータクレンジング

関健太郎 , 高道慎之介 , 佐伯高明 , and 猿渡洋

In 日本音響学会春季研究発表会 , Mar 2025

@inproceedings{seki25asjs_tts-ops,
  abbr_publisher = {日本音響学会春季研究発表会},
  booktitle = {日本音響学会春季研究発表会},
  title = {データ単位前処理自動選択による音声合成コーパスのデータクレンジング},
  author = {関, 健太郎 and 高道, 慎之介 and 佐伯, 高明 and 猿渡, 洋},
  year = {2025},
}

Corpus / コーパス

音環境に適応する音声合成能力を搭載した音声対話システムの構築と実証実験に基づく検討

武伯寒 , 高道慎之介 , 関健太郎 , and 猿渡洋

In 情報処理学会音声言語処理研究会 , Mar 2025

@inproceedings{take25speasip_egotts-dialogue,
  abbr_publisher = {情報処理学会 音声言語処理研究会},
  booktitle = {情報処理学会 音声言語処理研究会},
  title = {音環境に適応する音声合成能力を搭載した音声対話システムの構築と実証実験に基づく検討},
  author = {武, 伯寒 and 高道, 慎之介 and 関, 健太郎 and 猿渡, 洋},
  year = {2025},
}

音声トークンの言語に関する分析

朴浚鎔 , 高道慎之介 , David M. Chan , 神藤駿介 , 齋藤佑樹 , and 猿渡洋

In 情報処理学会音声言語処理研究会 , Jun 2025

@inproceedings{park25speasip_language-token,
  abbr_publisher = {情報処理学会 音声言語処理研究会},
  booktitle = {情報処理学会 音声言語処理研究会},
  title = {音声トークンの言語に関する分析},
  author = {朴, 浚鎔 and 高道, 慎之介 and Chan, David M. and 神藤, 駿介 and 齋藤, 佑樹 and 猿渡, 洋},
  year = {2025},
  month = jun
}

Active Learning for Text-to-Speech Synthesis with Informative Sample Collection

Kentaro Seki , Shinnosuke Takamichi , Takaaki Saeki , and Hiroshi Saruwatari

In Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) , Oct 2025

@inproceedings{seki25apsipa_active-tts,
  abbr_publisher = {Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)},
  booktitle = {Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)},
  title = {Active Learning for Text-to-Speech Synthesis with Informative Sample Collection},
  author = {Seki, Kentaro and Takamichi, Shinnosuke and Saeki, Takaaki and Saruwatari, Hiroshi},
  year = {2025},
  month = oct,
}

Toward Data-Efficient Speech Synthesis: Active Learning–Based Corpus Construction for Multi-Speaker Text-to-Speech Synthesis

Kentaro Seki , Yuki Saito , Shinnosuke Takamichi , Takaaki Saeki , and Hiroshi Saruwatari

IEEE Access, Oct 2025

@article{seki25ieee-access_data-efficient-tts,
  title = {Toward Data-Efficient Speech Synthesis: Active Learning–Based Corpus Construction for Multi-Speaker Text-to-Speech Synthesis},
  author = {Seki, Kentaro and Saito, Yuki and Takamichi, Shinnosuke and Saeki, Takaaki and Saruwatari, Hiroshi},
  year = {2025},
  journal = {IEEE Access},
}

TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data

Kentaro Seki , Shinnosuke Takamichi , Takaaki Saeki , and Hiroshi Saruwatari

IEEE Transactions on Audio, Speech, and Language Processing, Nov 2025

@article{seki25taslp_ttsops,
  title = {TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data},
  author = {Seki, Kentaro and Takamichi, Shinnosuke and Saeki, Takaaki and Saruwatari, Hiroshi},
  year = {2025},
  month = nov,
  journal = {IEEE Transactions on Audio, Speech, and Language Processing},
}

音声・音響・音楽を扱うオープン基盤モデルの構築に向けたデータセット策定

高道慎之介 , 和田仰 , 小川諒 , 山岡洸瑛 , 中田亘 , 淺井航平 , 関健太郎 , 岡本悠希 , 齋藤佑樹 , 小川哲司 , and 3 more authors

In 言語処理学会全国大会 , Mar 2025

@inproceedings{takamichi25nlp_foundation,
  abbr_publisher = {言語処理学会 全国大会},
  booktitle = {言語処理学会 全国大会},
  title = {音声・音響・音楽を扱うオープン基盤モデルの構築に向けたデータセット策定},
  author = {高道, 慎之介 and 和田, 仰 and 小川, 諒 and 山岡, 洸瑛 and 中田, 亘 and 淺井, 航平 and 関, 健太郎 and 岡本, 悠希 and 齋藤, 佑樹 and 小川, 哲司 and 猿渡, 洋 and 中村, 友彦 and 深山, 覚},
  year = {2025},
}

Constructing an In-the-Wild Spoken Dialogue Dataset Based on YouTube Dialogue Videos

Yuki Sato , Sanae Yamashita , Shinnosuke Takamichi , and Ryuichiro Higashinaka

In Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) , Oct 2025

@inproceedings{sato25apsipa_youtube-dialogue,
  abbr_publisher = {Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)},
  booktitle = {Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)},
  title = {Constructing an In-the-Wild Spoken Dialogue Dataset Based on YouTube Dialogue Videos},
  author = {Sato, Yuki and Yamashita, Sanae and Takamichi, Shinnosuke and Higashinaka, Ryuichiro},
  year = {2025},
  month = oct,
}

Analysing the Language of Neural Audio Codecs

Joonyong Park , Shinnosuke Takamichi , David M. Chan , Shunsuke Kando , Yuki Saito , and Hiroshi Saruwatari

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) , Dec 2025

@inproceedings{park25asru_analysis-neural-audio-codec,
  abbr_publisher = {IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  booktitle = {IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  title = {Analysing the Language of Neural Audio Codecs},
  author = {Park, Joonyong and Takamichi, Shinnosuke and Chan, David M. and Kando, Shunsuke and Saito, Yuki and Saruwatari, Hiroshi},
  year = {2025},
  month = dec,
}

YouTube上の対話動画に基づく音声対話データセットの構築

佐藤友紀 , 高道慎之介 , and 東中竜一郎

In 日本音響学会春季研究発表会 , Mar 2025

@inproceedings{sato25asjs_youtube-dialogue,
  abbr_publisher = {日本音響学会春季研究発表会},
  booktitle = {日本音響学会春季研究発表会},
  title = {YouTube上の対話動画に基づく音声対話データセットの構築},
  author = {佐藤, 友紀 and 高道, 慎之介 and 東中, 竜一郎},
  year = {2025},
}

2024

Corpus / コーパス

SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis

Osamu Take , Shinnosuke Takamichi , Kentaro Seki , Yoshiaki Bando , and Hiroshi Saruwatari

In Proceedings of Interspeech , Sep 2024

@inproceedings{take24interspeech_saslaw,
  abbr_publisher = {Proceedings of Interspeech},
  booktitle = {Proceedings of Interspeech},
  title = {SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis},
  author = {Take, Osamu and Takamichi, Shinnosuke and Seki, Kentaro and Bando, Yoshiaki and Saruwatari, Hiroshi},
  year = {2024},
  memo = {Part of this work was supported by JSPS KAKENHI Grant Number 23H03418 and 22H03639, Moonshot R&D Grant Number JPMJPS2011, and JST FOREST JPMJFR226V. The authors also thank Aya Watanabe for her support in designing the figures for this paper.}
}

Corpus / コーパス

最先端の予測性能を持つ合成音声品質の自動評価システム UTMOS について

佐伯高明 , and 高道慎之介

日本音響学会誌, Dec 2024

(Invited article / 招待記事)

@article{saeki24asj-kaisetsu_utmos,
  title = {最先端の予測性能を持つ合成音声品質の自動評価システム UTMOS について},
  author = {佐伯, 高明 and 高道, 慎之介},
  year = {2024},
  journal = {日本音響学会誌},
  note = {(Invited article / 招待記事)},
  memo = {本研究は科研費 21H04900，22H03639，23H03418，23K18474，JST創発的研究支援事業 JP23KJ0828，ムーンショット JPMJPS2011 の助成を受けた．本解説記事の執筆に際し，東京大学大学院の関健太郎氏の助言を受けた．}
}

Speech synthesis / 音声合成

Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis

Takaaki Saeki , Soumi Maiti , Xinjian Li , Shinji Watanabe , Shinnosuke Takamichi , and Hiroshi Saruwatari

IEEE Transactions on Audio, Speech, and Language Processing, Dec 2024

@article{saeki24taslp_text-inductive-tts,
  title = {Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis},
  author = {Saeki, Takaaki and Maiti, Soumi and Li, Xinjian and Watanabe, Shinji and Takamichi, Shinnosuke and Saruwatari, Hiroshi},
  year = {2024},
  journal = {IEEE Transactions on Audio, Speech, and Language Processing}
}

Speech evaluation / 音声評価

テキスト生成の自動評価尺度に基づく音声生成の自動評価

佐伯高明 , マイティソウミ , 高道慎之介 , 渡部晋治 , and 猿渡洋

In 電子情報通信学会音声研究会 , Dec 2024

@inproceedings{saeki24sp_speechevaluation,
  abbr_publisher = {電子情報通信学会 音声研究会},
  booktitle = {電子情報通信学会 音声研究会},
  title = {テキスト生成の自動評価尺度に基づく音声生成の自動評価},
  author = {佐伯, 高明 and マイティ, ソウミ and 高道, 慎之介 and 渡部, 晋治 and 猿渡, 洋},
  year = {2024},
  memo = {JSPS 科研費 23H03418，23K18474，22H03639，21H05054，22KJ0838，ムーンショット研究開発費 JPMJPS2011，および JST FOREST JPMJFR226V によって支援された．}
}

Corpus / コーパス

インターネット時代の音声コーパスの作成

高道慎之介

日本音響学会誌, Dec 2024

(Invited article / 招待記事)

@article{takamichi24asj_invited-article-dark-data,
  title = {インターネット時代の音声コーパスの作成},
  author = {高道, 慎之介},
  year = {2024},
  journal = {日本音響学会誌},
  note = {(Invited article / 招待記事)},
  memo = {本研究は科研費 21H04900，22H03639，23H03418，23K18474，JST創発的研究支援事業 JP23KJ0828，ムーンショット JPMJPS2011 の助成を受けた．また，本稿の執筆にあたり東京大学 大学院情報理工学系研究科 修士課程 関 健太郎氏からの助言を受けた．}
}

Corpus / コーパス

環境音に対する日本語自由記述文コーパスとベンチマーク分析

岡本悠希 , 高道慎之介 , 森松亜依 , 渡邊亞椰 , 井本桂右 , and 山下洋一

In 言語処理学会全国大会 , Mar 2024

@inproceedings{okamoto24nlp_multi-lingual-audiocaps,
  abbr_publisher = {言語処理学会 全国大会},
  booktitle = {言語処理学会 全国大会},
  title = {環境音に対する日本語自由記述文コーパスとベンチマーク分析},
  author = {岡本, 悠希 and 高道, 慎之介 and 森松, 亜依 and 渡邊, 亞椰 and 井本, 桂右 and 山下, 洋一},
  year = {2024},
  memo = {本研究は，ムーンショット JPMJPS2011，JST 創発的研究支援事業 JP23KJ0828，科研費 21H05054，21H04900，22H0363，23H03418，23K16908 の支援を受け実施した．}
}

Corpus / コーパス

YODAS：YouTube 動画から構築される多言語大規模音声データセット

Xinjian Li , 高道慎之介 , 佐伯高明 , William Chen , 塩田さやか , and 渡部晋治

In 日本音響学会春季研究発表会 , Mar 2024

@inproceedings{li24asjs_yodas,
  abbr_publisher = {日本音響学会春季研究発表会},
  booktitle = {日本音響学会春季研究発表会},
  title = {{YODAS：YouTube} 動画から構築される多言語大規模音声データセット},
  author = {Li, Xinjian and 高道, 慎之介 and 佐伯, 高明 and Chen, William and 塩田, さやか and 渡部, 晋治},
  year = {2024},
  memo = {本研究は，アメリカ国立科学財団資金番号 #2138259, #2138286, #2138307, #2137603, #2138296 により支援された，PSC Bridges2 と NCSA Delta via ACCESS allocation CIS210014 を使用した．また本研究は科研費 21H04900, 22H03639，23H03418，JST 創発的研究支援事業 JP23KJ0828，ムーンショット JPMJPS2011 の助成を受けた．}
}

NecoBERT：音声合成のために事前学習された自己教師あり学習モデル

中田亘 , 佐伯高明 , 齋藤佑樹 , 高道慎之介 , and 猿渡洋

In 日本音響学会春季研究発表会 , Mar 2024

Speech synthesis / 音声合成

Diversity-based core-set selection for text-to-speech with linguistic and acoustic features

Kentaro Seki , Shinnosuke Takamichi , Takaaki Saeki , and Hiroshi Saruwatari

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , Apr 2024

@inproceedings{seki24icassp_core-set-selection,
  abbr_publisher = {Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  booktitle = {Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title = {Diversity-based core-set selection for text-to-speech with linguistic and acoustic features},
  author = {Seki, Kentaro and Takamichi, Shinnosuke and Saeki, Takaaki and Saruwatari, Hiroshi},
  year = {2024}
}

Corpus / コーパス

音環境に適応するテキスト音声合成のための一人称視点コーパス構築

武伯寒 , 高道慎之介 , 関健太郎 , 坂東宜昭 , and 猿渡洋

In 情報処理学会音声言語処理研究会 , Mar 2024

@inproceedings{take24slp_1st-person-tts,
  abbr_publisher = {情報処理学会 音声言語処理研究会},
  booktitle = {情報処理学会 音声言語処理研究会},
  title = {音環境に適応するテキスト音声合成のための一人称視点コーパス構築},
  author = {武, 伯寒 and 高道, 慎之介 and 関, 健太郎 and 坂東, 宜昭 and 猿渡, 洋},
  year = {2024},
  memo = {本研究の一部は，科研費 22H03639，23K18474，JST 創発的研究支援事業 JP23KJ0828，及び JST ムーンショット型研究開発事業 JPMJMS2011 の助成を受け実施しました．また，原稿の作成に際して，渡邊亞椰さんには図の作成でご協力頂きました．この場を借りて感謝申し上げます．}
}

Speech analysis / 音声分析

Do learned speech symbols follow Zipf’s law?

Shinnosuke Takamichi , Hiroki Maeda , Joonyong Park , Daisuke Saito , and Hiroshi Saruwatari

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , Apr 2024

@inproceedings{meada24icassp_zipf-law,
  abbr_publisher = {Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  booktitle = {Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title = {Do learned speech symbols follow {Z}ipf's law?},
  author = {Takamichi, Shinnosuke and Maeda, Hiroki and Park, Joonyong and Saito, Daisuke and Saruwatari, Hiroshi},
  year = {2024}
}

Speech recognition / 音声認識

Cocktail Machine Speech Chain: 重複あり音声を用いた音声認識・音声合成モデルの統一的学習

松永裕太 , 高道慎之介 , 上乃聖 , and 猿渡洋

In 日本音響学会春季研究発表会 , Mar 2024

@inproceedings{matsunaga24asjs_cocktail-speech-chain,
  abbr_publisher = {日本音響学会春季研究発表会},
  booktitle = {日本音響学会春季研究発表会},
  title = {Cocktail Machine Speech Chain: 重複あり音声を用いた音声認識・音声合成モデルの統一的学習},
  author = {松永, 裕太 and 高道, 慎之介 and 上乃, 聖 and 猿渡, 洋},
  year = {2024},
  memo = {本研究は，JST 次世代研究者挑戦的研究プログラム JPMJSP2108，ムーンショット JPMJPS2011，JST 創発的研究支援事業 JP23KJ0828，科研費 21H05054, 22H03639，23H03418 の支援と，東京大学の齋藤佑樹博士, 佐伯高明氏の協力を受け実施したものです.}
}

Speech synthesis / 音声合成

Emotion-controllable Speech Synthesis using Emotion Soft Label, Utterance-level Prosodic Factors, and Word-level Prominence

Xuan Luo , Shinnosuke Takamichi , Yuki Saito , Tomoki Koriyama , and Hiroshi Saruwatari

APSIPA Transactions, Mar 2024

@article{luo24apsipa-trans_emotion-synthesis,
  title = {Emotion-controllable Speech Synthesis using Emotion Soft Label, Utterance-level Prosodic Factors, and Word-level Prominence},
  author = {Luo, Xuan and Takamichi, Shinnosuke and Saito, Yuki and Koriyama, Tomoki and Saruwatari, Hiroshi},
  year = {2024},
  journal = {APSIPA Transactions},
}

Music synthesis / 楽音合成

複数のオーディオエフェクトが適用された楽音に対するエフェクトチェイン推定と原音復元

武伯寒 , 渡邉研斗 , 中塚貴之 , Tian Cheng , 中野倫靖 , 後藤真孝 , 高道慎之介 , and 猿渡洋

In 日本音響学会春季研究発表会 , Mar 2024

@inproceedings{take24asjs_audio-effect,
  abbr_publisher = {日本音響学会春季研究発表会},
  booktitle = {日本音響学会春季研究発表会},
  title = {複数のオーディオエフェクトが適用された楽音に対するエフェクトチェイン推定と原音復元},
  author = {武, 伯寒 and 渡邉, 研斗 and 中塚, 貴之 and Cheng, Tian and 中野, 倫靖 and 後藤, 真孝 and 高道, 慎之介 and 猿渡, 洋},
  year = {2024},
  memo = {本研究は科研費 21H04900, 22H03639，23H03418， JST 創発的研究支援事業 JP23KJ0828，ムーンショット JPMJPS2011 の助成を受けたものです}
}

Music synthesis / 楽音合成

Audio Effect Chain Estimation and Dry Signal Recovery from Multi-Effect-Processed Musical Signals

Osamu Take , Kento Watanabe , Takayuki Nakatsuka , Tian Cheng , Tomoyasu Nakano , Masataka Goto , Shinnosuke Takamichi , and Hiroshi Saruwatari

In Proceedings of International Conference on Digital Audio Effects (DAFx) , Sep 2024

@inproceedings{take24dafx_effect-chain,
  abbr_publisher = {Proceedings of International Conference on Digital Audio Effects (DAFx)},
  booktitle = {Proceedings of International Conference on Digital Audio Effects (DAFx)},
  title = {Audio Effect Chain Estimation and Dry Signal Recovery from Multi-Effect-Processed Musical Signals},
  author = {Take, Osamu and Watanabe, Kento and Nakatsuka, Takayuki and Cheng, Tian and Nakano, Tomoyasu and Goto, Masataka and Takamichi, Shinnosuke and Saruwatari, Hiroshi},
  memo = {This work is supported by JSPS KAKENHI 21H04900, 22H03639, and 23H03418, JST FOREST JPMJFR226V, and Moonshot R&D Grant Number JPMJPS2011.},
  year = {2024}
}

Real-Time Noise Estimation for Lombard-Effect Speech Synthesis in Human–Avatar Dialogue Systems

Yuto Ishikawa , Osamu Take , Tomohiko Nakamura , Norihiro Takamune , Yuki Saito , Shinnosuke Takamichi , and Hiroshi Saruwatari

In Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) , Dec 2024

@inproceedings{ishikawa24apsipa_lombard,
  abbr_publisher = {Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)},
  booktitle = {Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)},
  title = {Real-Time Noise Estimation for Lombard-Effect Speech Synthesis in Human–Avatar Dialogue Systems},
  author = {Ishikawa, Yuto and Take, Osamu and Nakamura, Tomohiko and Takamune, Norihiro and Saito, Yuki and Takamichi, Shinnosuke and Saruwatari, Hiroshi},
  year = {2024}
}

NecoBERT: Self-Supervised Learning Model Trained by Masked Language Modeling on Rich Acoustic Features Derived from Neural Audio Codec

Wataru Nakata , Takaaki Saeki , Yuki Saito , Shinnosuke Takamichi , and Hiroshi Saruwatari

In Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) , Dec 2024

@inproceedings{wataru24apsipa_necobert,
  abbr_publisher = {Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)},
  booktitle = {Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)},
  title = {NecoBERT: Self-Supervised Learning Model Trained by Masked Language Modeling on Rich Acoustic Features Derived from Neural Audio Codec},
  author = {Nakata, Wataru and Saeki, Takaaki and Saito, Yuki and Takamichi, Shinnosuke and Saruwatari, Hiroshi},
  year = {2024}
}

J-CHAT: 音声言語モデルのための大規模日本語対話音声コーパス

中田亘 , 関健太郎 , 谷中瞳 , 齋藤佑樹 , 高道慎之介 , and 猿渡洋

In 日本音響学会秋季研究発表会 , Sep 2024

@inproceedings{nakata24asja_j-chat,
  abbr_publisher = {日本音響学会秋季研究発表会},
  booktitle = {日本音響学会秋季研究発表会},
  title = {J-CHAT: 音声言語モデルのための大規模日本語対話音声コーパス},
  author = {中田, 亘 and 関, 健太郎 and 谷中, 瞳 and 齋藤, 佑樹 and 高道, 慎之介 and 猿渡, 洋},
  year = {2024}
}

人間とアバターとの対話システムにおける拡散性雑音下リアルタイム推定雑音を用いたLombard効果模擬音声合成のための検討

石川悠人 , 武伯寒 , 中村友彦 , 高宗典玄 , 齋藤佑樹 , 高道慎之介 , and 猿渡洋

In 日本音響学会秋季研究発表会 , Sep 2024

@inproceedings{ishikawa24asja_lombard,
  abbr_publisher = {日本音響学会秋季研究発表会},
  booktitle = {日本音響学会秋季研究発表会},
  title = {人間とアバターとの対話システムにおける拡散性雑音下リアルタイム推定雑音を用いたLombard効果模擬音声合成のための検討},
  author = {石川, 悠人 and 武, 伯寒 and 中村, 友彦 and 高宗, 典玄 and 齋藤, 佑樹 and 高道, 慎之介 and 猿渡, 洋},
  year = {2024}
}

2023

Speech synthesis / 音声合成

Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control

Aya Watanabe , Shinnosuke Takamichi , Yuki Saito , Wataru Nakata , Detai Xin , and Hiroshi Saruwatari

In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) , Dec 2023

@inproceedings{watanabe23asru_coconut-corpus,
  abbr_publisher = {IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  booktitle = {IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  title = {{Coco-Nut}: Corpus of {J}apanese Utterance and Voice Characteristics Description for Prompt-based Control},
  author = {Watanabe, Aya and Takamichi, Shinnosuke and Saito, Yuki and Nakata, Wataru and Xin, Detai and Saruwatari, Hiroshi},
  year = {2023}
}

Speech synthesis / 音声合成

Coco-Nut: 自由記述文による声質制御に向けた多話者音声・声質自由記述ペアデータセット

渡邊亞椰 , 高道慎之介 , 齋藤佑樹 , 辛徳泰 , and 猿渡洋

In 日本音響学会秋季研究発表会 , Sep 2023

@inproceedings{watanabe23asja_coconut,
  abbr_publisher = {日本音響学会秋季研究発表会},
  booktitle = {日本音響学会秋季研究発表会},
  title = {Coco-Nut: 自由記述文による声質制御に向けた多話者音声・声質自由記述ペアデータセット},
  author = {渡邊, 亞椰 and 高道, 慎之介 and 齋藤, 佑樹 and 辛, 徳泰 and 猿渡, 洋},
  year = {2023}
}

Speech synthesis / 音声合成

深層学習で獲得される音声シンボルは自然言語シンボルと同様に Zipf 則に従うか？

前田紘希 , 高道慎之介 , 朴浚鎔 , and 猿渡洋

In 日本音響学会秋季研究発表会 , Sep 2023

@inproceedings{maeda23asja_zipf,
  abbr_publisher = {日本音響学会秋季研究発表会},
  booktitle = {日本音響学会秋季研究発表会},
  title = {深層学習で獲得される音声シンボルは自然言語シンボルと同様に {Zipf} 則に従うか？},
  author = {前田, 紘希 and 高道, 慎之介 and 朴, 浚鎔 and 猿渡, 洋},
  year = {2023}
}

Speech synthesis / 音声合成

学習・評価ループを用いたデータ選択によるダークデータからの音声合成

関健太郎 , 高道慎之介 , 佐伯高明 , and 猿渡洋

In 日本音響学会春季研究発表会 , Mar 2023

@inproceedings{seki23asjs_dark-data,
  abbr_publisher = {日本音響学会春季研究発表会},
  booktitle = {日本音響学会春季研究発表会},
  title = {学習・評価ループを用いたデータ選択によるダークデータからの音声合成},
  author = {関, 健太郎 and 高道, 慎之介 and 佐伯, 高明 and 猿渡, 洋},
  year = {2023}
}

Speech synthesis / 音声合成

Mid-attribute Speaker Generation using Optimal-Transport-based Interpolation of Gaussian Mixture Models

Aya Watanabe , Shinnosuke Takamichi , Yuki Saito , Detai Xin , and Hiroshi Saruwatari

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , Jun 2023

@inproceedings{watanabe23icassp_mid-attribute,
  abbr_publisher = {Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  booktitle = {Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title = {Mid-attribute Speaker Generation using Optimal-Transport-based Interpolation of Gaussian Mixture Models},
  author = {Watanabe, Aya and Takamichi, Shinnosuke and Saito, Yuki and Xin, Detai and Saruwatari, Hiroshi},
  year = {2023}
}

Speech synthesis / 音声合成

vTTS: visual-text to speech

Yoshifumi Nakano , Takaaki Saeki , Shinnosuke Takamichi , Katsuhito Sudoh , and Hiroshi Saruwatari

In Proceedings of IEEE Spoken Language Technology Workshop (SLT) , Jan 2023

@inproceedings{nakano23slt_visual-text-to-speech,
  abbr_publisher = {Proceedings of IEEE Spoken Language Technology Workshop (SLT)},
  booktitle = {Proceedings of IEEE Spoken Language Technology Workshop (SLT)},
  title = {{vTTS}: visual-text to speech},
  author = {Nakano, Yoshifumi and Saeki, Takaaki and Takamichi, Shinnosuke and Sudoh, Katsuhito and Saruwatari, Hiroshi},
  year = {2023}
}

Speech analysis / 音声分析

How Generative Spoken Language Model Encodes Noisy Speech: Investigation from Phonetics to Syntactics

Joonyong Park , Shinnosuke Takamichi , Tomohiko Nakamura , Kentaro Seki , Detai Xin , and Hiroshi Saruwatari

In Proceedings of Interspeech , Aug 2023

@inproceedings{park23interspeech_gslm,
  abbr_publisher = {Proceedings of Interspeech},
  booktitle = {Proceedings of Interspeech},
  title = {How Generative Spoken Language Model Encodes Noisy Speech: Investigation from Phonetics to Syntactics},
  author = {Park, Joonyong and Takamichi, Shinnosuke and Nakamura, Tomohiko and Seki, Kentaro and Xin, Detai and Saruwatari, Hiroshi},
  year = {2023}
}

Speech synthesis / 音声合成

Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection

Kentaro Seki , Shinnosuke Takamichi , Takaaki Saeki , and Hiroshi Saruwatari

In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , Jun 2023

@inproceedings{seki23icassp_dark-data,
  abbr_publisher = {Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  booktitle = {Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title = {Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection},
  author = {Seki, Kentaro and Takamichi, Shinnosuke and Saeki, Takaaki and Saruwatari, Hiroshi},
  year = {2023}
}

Speech synthesis / 音声合成

Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion

Yuta Matsunaga , Takaaki Saeki , Shinnosuke Takamichi , and Hiroshi Saruwatari

In Proceedings of Speech Synthesis Workshop (SSW) , Aug 2023

@inproceedings{matsunaga23ssw_filler-synthesis,
  abbr_publisher = {Proceedings of Speech Synthesis Workshop (SSW)},
  booktitle = {Proceedings of Speech Synthesis Workshop (SSW)},
  title = {Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion},
  author = {Matsunaga, Yuta and Saeki, Takaaki and Takamichi, Shinnosuke and Saruwatari, Hiroshi},
  year = {2023}
}