Hayato Futami

Education and work experience
2016-2020 Kyoto University, Informatics

2020-2022 Kyoto University, Graduate School of Informatics

Student at the Speech and Audio Processing Lab [link], supervised by Professor Tatsuya Kawahara. Research theme: "Bidirectional Transformer-based Language Modeling for End-to-End Automatic Speech Recognition".

2022- Sony Group Corporation

Research engineer at the Speech and Language AI Lab. Joint research project with Carnegie Mellon University [link] (Professor Shinji Watanabe).

[LinkedIn] [GitHub] [X (Twitter)] [Google Scholar]

Publications (1st author)
  • "Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs" [link], Interspeech2025
  • "Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model" [link], Interspeech2024
  • "Phoneme-aware Encoding for Prefix-tree-based Contextual ASR" [link], ICASSP2024
  • "The Pipeline System of ASR and NLU with MLM-based Data Augmentation toward STOP Low-resource Challenge" [link], ICASSP2023 (Won 1st place at the SLU Grand Challenge)
  • "Streaming Joint Speech Recognition and Disfluency Detection" [link], ICASSP2023
  • "Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM" [link], Interspeech2022
  • "Distilling the Knowledge of BERT for CTC-based ASR" [link], 2021
  • "ASR Rescoring and Confidence Estimation with ELECTRA" [link], ASRU2021
  • "Distilling the Knowledge of BERT for Sequence-to-Sequence ASR" [link], Interspeech2020 (Nominated for the ISCA Best Student Paper Award)
Selected publications (co-author)
  • "Whale: Large-Scale multilingual ASR model with w2v-BERT and E-Branchformer with large speech data", Yosuke Kashiwagi et al., 2025 [link]
  • "Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens", Yosuke Kashiwagi et al., ICASSP2025 [link]
  • "UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions", Siddhant Arora et al., NAACL2024 [link]
  • "Decoder-only architecture for streaming end-to-end speech recognition", Emiru Tsunoo et al., Interspeech2024 [link]
  • "Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting", Yosuke Kashiwagi et al., Interspeech2024 [link]
Awards
  • ASJ, Student Award, 2021 [link]
  • IPSJ Yamashita SIG Research Award, 2021 [link]
  • SIG-SLP, Yahoo! Japan Award, 2020 [link]
  • IPSJ National Convention, Best Paper Award, 2020 [link]
  • IPSJ National Convention, Student Encouragement Award, 2020 [link]
Internship experience
  • Patentfield (2021-2022), ML engineer (NLP)
  • Hacarus (2018-2020), ML engineer (Image diagnosis)
  • CO-CONV (2017-2018), Software engineer (Visual C++)
  • DeNA (2021), Speech synthesis
  • LINE (2019)
  • Yahoo Japan (2019)
Certifications
  • Applied Information Technology Engineer Examination (2022/04)
  • TOEIC score 970 (2024/11)