http://10.10.120.238:8080/xmlui/handle/123456789/433
Title: | Optimal Near-End Speech Intelligibility Improvement Using CLPSO-Based Voice Transformation in Realistic Noisy Environments |
Authors: | Biswas, R.; Nathwani, K. |
Keywords: | CLPSO; PESQ; SDR; Speech intelligibility; STOI |
Issue Date: | 2022 |
Publisher: | Birkhauser |
Abstract: | The proposed work aims to improve the near-end intelligibility of speech at very low signal-to-noise ratios (SNRs). Unlike existing intelligibility improvement methods, the proposed approach does not require prior knowledge of the noise statistics. To this end, the shaping parameters of the voice transformation function (VTF) are optimized. These parameters govern a combined modification comprising formant shifting, nonuniform time scaling, smoothing, and energy redistribution, and they are optimized within a comprehensive learning particle swarm optimization (CLPSO) framework. The optimal parameters are obtained by jointly maximizing the short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and signal-to-distortion ratio (SDR) metrics, which together serve as the CLPSO cost function. The resulting intelligibility improvement is significantly higher than that obtained by applying each modification individually, while speech quality is preserved. In addition, Gaussian process regression is employed to estimate the VTF shaping parameters at arbitrary SNRs other than those used during CLPSO training. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature. |
URI: | https://dx.doi.org/10.1007/s00034-022-02106-3 http://localhost:8080/xmlui/handle/123456789/433 |
ISSN: | 0278-081X |
Appears in Collections: | Journal Article |
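Illustrative sketch (not part of the catalogued record): the abstract describes optimizing VTF shaping parameters with CLPSO against a joint STOI/PESQ/SDR cost. The Python below is a minimal, simplified CLPSO loop written under several assumptions, and it is not the authors' implementation. The function `toy_joint_score` is a hypothetical stand-in that averages three synthetic quadratic terms; a real cost would compute STOI, PESQ, and SDR on transformed speech. The equal weighting, the two-parameter search space, the linear learning-probability spread, and the clamping of positions to the bounds are all assumptions made for illustration.

```python
import random


def toy_joint_score(params):
    """Hypothetical stand-in for a joint STOI/PESQ/SDR score (higher is better).

    Each term peaks at a different setting, so the best trade-off lies
    between the peaks. Real metrics would be computed on audio instead.
    """
    x, y = params
    stoi_like = -((x - 0.6) ** 2 + (y - 0.3) ** 2)
    pesq_like = -((x - 0.4) ** 2 + (y - 0.5) ** 2)
    sdr_like = -((x - 0.5) ** 2 + (y - 0.4) ** 2)
    return (stoi_like + pesq_like + sdr_like) / 3.0  # equal weights: an assumption


def clpso_maximize(score, bounds, n_particles=20, n_iters=200,
                   w=0.7, c=1.5, refresh_gap=5, seed=0):
    """Simplified comprehensive learning PSO that maximizes `score`."""
    rng = random.Random(seed)
    n, dim = n_particles, len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    pbest_val = [score(p) for p in pos]
    # Per-particle learning probability, spread linearly over [0.05, 0.5].
    pc = [0.05 + 0.45 * i / max(n - 1, 1) for i in range(n)]

    def pick_exemplars(i):
        # For each dimension, learn either from the particle's own best or
        # from the winner of a two-particle tournament on personal bests.
        ex = []
        for _ in range(dim):
            if rng.random() < pc[i]:
                a, b = rng.sample(range(n), 2)
                ex.append(a if pbest_val[a] >= pbest_val[b] else b)
            else:
                ex.append(i)
        if all(e == i for e in ex):
            # Force at least one dimension to learn from another particle.
            others = [j for j in range(n) if j != i]
            ex[rng.randrange(dim)] = rng.choice(others)
        return ex

    exemplars = [pick_exemplars(i) for i in range(n)]
    stall = [0] * n
    for _ in range(n_iters):
        for i in range(n):
            if stall[i] >= refresh_gap:
                # No improvement for a while: draw new exemplars.
                exemplars[i] = pick_exemplars(i)
                stall[i] = 0
            for d, (lo, hi) in enumerate(bounds):
                target = pbest[exemplars[i][d]][d]
                v = w * vel[i][d] + c * rng.random() * (target - pos[i][d])
                vmax = 0.2 * (hi - lo)
                vel[i][d] = max(-vmax, min(vmax, v))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = score(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                stall[i] = 0
            else:
                stall[i] += 1
    best = max(range(n), key=lambda i: pbest_val[i])
    return pbest[best], pbest_val[best]


best_params, best_score = clpso_maximize(toy_joint_score, [(0.0, 1.0), (0.0, 1.0)])
print(best_params, best_score)
```

Unlike standard PSO, each dimension of a particle here is pulled toward the personal best of a possibly different particle rather than toward a single global best, which is the defining idea of comprehensive learning. The abstract does not say how the three metrics are combined, so the averaged score above is only one possible choice.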