About me - Hayato Futami

Hayato Futami

Work experiences

2016-2020 Kyoto University, Informatics

2020-2022 Kyoto University, Graduate School of Informatics

Student at the Speech and Audio Processing Lab [link], supervised by Professor Tatsuya Kawahara. My research theme was "Bidirectional Transformer-based Language Modeling for End-to-End Automatic Speech Recognition".

2022- Sony Group Corporation

Research engineer at the Speech and Language AI Lab. Working on a joint research project with Carnegie Mellon University [link] (Professor Shinji Watanabe).

[LinkedIn] [GitHub] [X (Twitter)] [Google Scholar]

Publications (1st author)

"Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs" [link], Interspeech2025
"Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model" [link], Interspeech2024
"Phoneme-aware Encoding for Prefix-tree-based Contextual ASR" [link], ICASSP2024
"The Pipeline System of ASR and NLU with MLM-based Data Augmentation toward STOP Low-resource Challenge" [link], ICASSP2023 (Won 1st place at SLU Grand Challenge)
"Streaming Joint Speech Recognition and Disfluency Detection" [link], ICASSP2023
"Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM" [link], Interspeech2022
"Distilling the Knowledge of BERT for CTC-based ASR" [link], 2021
"ASR Rescoring and Confidence Estimation with ELECTRA" [link], ASRU2021
"Distilling the Knowledge of BERT for Sequence-to-Sequence ASR" [link], Interspeech2020 (Nominated for the ISCA Best Student Paper Award)

Selected publications (co-author)

"Whale: Large-Scale multilingual ASR model with w2v-BERT and E-Branchformer with large speech data", Yosuke Kashiwagi et al., 2025 [link]
"Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens", Yosuke Kashiwagi et al., ICASSP2025 [link]
"UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions", Siddhant Arora et al., NAACL2024 [link]
"Decoder-only architecture for streaming end-to-end speech recognition", Emiru Tsunoo et al., Interspeech2024 [link]
"Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting", Yosuke Kashiwagi et al., Interspeech2024 [link]

Awards

ASJ, Student Award, 2021 [link]
IPSJ Yamashita SIG Research Award, 2021 [link]
SIG-SLP, Yahoo! Japan Award, 2020 [link]
IPSJ National Convention, Best Paper Award, 2020 [link]
IPSJ National Convention, Student Encouragement Award, 2020 [link]

Internship experiences

Patentfield (2021-2022), ML engineer (NLP)
Hacarus (2018-2020), ML engineer (Image diagnosis)
CO-CONV (2017-2018), Software engineer (Visual C++)
DeNA (2021), Speech synthesis
LINE (2019)
Yahoo Japan (2019)

Certifications

Applied Information Technology Engineer Examination (2022/04)
TOEIC score 970 (2024/11)