How to Convert Speech to Text in Python: A Detailed Implementation Guide

November 11, 2021, 17:37:44

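Speech recognition is the ability of computer software to identify words and phrases in spoken language and convert them to human-readable text. In this tutorial, you will learn how to convert speech to text in Python using the SpeechRecognition library. There is no need to build a machine learning model from scratch: the library provides convenient wrappers for several well-known public speech recognition APIs (such as Google Cloud Speech API and IBM Speech To Text).

Related reading: How to Translate Text in Python.

To get started, install the libraries with pip (pydub is used later for long audio files):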

pip3 install SpeechRecognition pydub

Now open a new Python file and import the library:

import speech_recognition as sr

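A nice feature of this library is that it supports several recognition engines, including Google Speech Recognition, Google Cloud Speech API, IBM Speech To Text, and the offline CMU Sphinx engine.

We will use Google Speech Recognition here, because it is simple to use and does not require an API key.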
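Because each engine is exposed as a recognize_<engine>() method on the recognizer, switching engines can be reduced to a name lookup. The helper below is only a sketch: recognize_with is a hypothetical name that is not part of the library, and it assumes the chosen engine's method accepts the audio data as its first argument, as recognize_google() and recognize_sphinx() do.

```python
def recognize_with(recognizer, audio_data, engine="google", **kwargs):
    """Call recognizer.recognize_<engine>(audio_data, **kwargs).

    `recognize_with` is a hypothetical helper, not part of SpeechRecognition.
    """
    method = getattr(recognizer, f"recognize_{engine}", None)
    if method is None:
        raise ValueError(f"unsupported engine: {engine!r}")
    return method(audio_data, **kwargs)
```

With the recognizer r and the audio_data objects created later in this tutorial, recognize_with(r, audio_data) would use Google Speech Recognition, while engine="sphinx" would use CMU Sphinx offline (which additionally requires the pocketsphinx package).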

Reading from a File

Make sure there is an audio file containing English speech in the current directory (to follow along, use the same sample file as this tutorial):

filename = "16-122828-0002.wav"

This file comes from the LibriSpeech dataset, but you can use any WAV audio file you like; just change the filename. Next, initialize the speech recognizer:

# initialize the recognizer
r = sr.Recognizer()

The following code loads the audio file and converts the speech to text using Google Speech Recognition:

# open the file
with sr.AudioFile(filename) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
    print(text)

This takes a few seconds to complete, because the file is uploaded to Google and the result is fetched back. Here is my result:

I believe you're just talking nonsense

The code above works well for small and medium-sized audio files. The next section covers code for large files.
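To decide which approach a given file needs, its duration can be checked first. The sketch below uses only the standard-library wave module and works for uncompressed PCM WAV files. The 60-second cutoff is an assumption chosen for illustration, not a limit documented by the library or by Google.

```python
import wave


def wav_duration_seconds(path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_long_audio(path, limit_seconds=60.0):
    """True if the file is longer than `limit_seconds` (an assumed cutoff)."""
    return wav_duration_seconds(path) > limit_seconds
```

A script could call is_long_audio(filename) and fall back to the chunked approach from the next section when it returns True.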

Reading Large Audio Files

If you want to perform speech recognition on a long audio file, the function below handles it well:

# importing libraries 
import speech_recognition as sr 
import os 
from pydub import AudioSegment
from pydub.silence import split_on_silence

# create a speech recognition object
r = sr.Recognizer()

# a function that splits the audio file into chunks
# and applies speech recognition
def get_large_audio_transcription(path):
    """
    Splitting the large audio file into chunks
    and apply speech recognition on each of these chunks
    """
    # open the audio file using pydub
    sound = AudioSegment.from_wav(path)  
    # split the audio wherever silence lasts 500 milliseconds or more and get chunks
    chunks = split_on_silence(sound,
        # experiment with this value for your target audio file
        min_silence_len = 500,
        # adjust this per requirement
        silence_thresh = sound.dBFS-14,
        # keep 500 ms of silence at the edges of each chunk, adjustable as well
        keep_silence=500,
    )
    folder_name = "audio-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk 
    for i, audio_chunk in enumerate(chunks, start=1):
        # export audio chunk and save it in
        # the `folder_name` directory.
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        with sr.AudioFile(chunk_filename) as source:
            audio_listened = r.record(source)
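            # try converting it to text
            try:
                text = r.recognize_google(audio_listened)
            except sr.UnknownValueError:
                # raised when the speech in this chunk is unintelligible
                print("Error: could not understand", chunk_filename)
            except sr.RequestError as e:
                # raised when the API request fails (e.g. no connection)
                print("Error:", str(e))
            else:
                text = f"{text.capitalize()}. "
                print(chunk_filename, ":", text)
                whole_text += text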
    # return the text for all chunks detected
    return whole_text

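Note: you need to install Pydub with pip for the code above to work.

The function above uses split_on_silence() from the pydub.silence module to split the audio data into chunks wherever there is silence. The min_silence_len parameter is the minimum length of silence (in milliseconds) to be used for a split. silence_thresh is the threshold below which anything quieter is considered silence; I have set it to the average dBFS minus 14. The keep_silence parameter is the amount of silence (in milliseconds) to leave at the beginning and end of each detected chunk.

These parameter values will not suit every audio file, so experiment with them for your own long recordings. After splitting, we iterate over all the chunks, convert each one to text, and concatenate the results. Here is an example run: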

path = "7601-291468-0006.wav"
print("\nFull text:", get_large_audio_transcription(path))

Note: 7601-291468-0006.wav is the sample file for this example. Output:

audio-chunks\chunk1.wav : His abode which you had fixed in a bowery or country seat. 
audio-chunks\chunk2.wav : At a short distance from the city. 
audio-chunks\chunk3.wav : Just at what is now called dutch street. 
audio-chunks\chunk4.wav : Sooner bounded with proofs of his ingenuity. 
audio-chunks\chunk5.wav : Patent smokejacks. 
audio-chunks\chunk6.wav : It required a horse to work some. 
audio-chunks\chunk7.wav : Dutch oven roasted meat without fire. 
audio-chunks\chunk8.wav : Carts that went before the horses. 
audio-chunks\chunk9.wav : Weather cox that turned against the wind and other wrongheaded contrivances. 
audio-chunks\chunk10.wav : So just understand can found it all beholders. 

Full text: His abode which you had fixed in a bowery or country seat. At a short distance from the city. Just at what is now called dutch street. Sooner bounded with proofs of his ingenuity. Patent smokejacks. It required a horse to work some. Dutch oven roasted meat without fire. Carts that went before the horses. Weather cox that turned against the wind and other wrongheaded contrivances. So just understand can found it all beholders.
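To make the roles of min_silence_len, silence_thresh, and keep_silence more concrete, here is a simplified pure-Python illustration of the splitting idea. It is not pydub's implementation: split_ranges is a hypothetical function, the input is assumed to be one loudness value (in dBFS) per millisecond, and overlapping padding between neighbouring chunks is not resolved the way pydub resolves it.

```python
def split_ranges(levels_db, min_silence_len, silence_thresh, keep_silence):
    """Return (start, end) millisecond ranges of the non-silent chunks.

    levels_db: one loudness value per millisecond, in dBFS (an assumed input
    format for this illustration; pydub works on AudioSegment objects).
    """
    # 1. Find silent runs that last at least min_silence_len milliseconds.
    silences = []
    run_start = None
    # The infinite sentinel is never "silent", so it closes a trailing run.
    for i, level in enumerate(list(levels_db) + [float("inf")]):
        if level < silence_thresh:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_silence_len:
                silences.append((run_start, i))
            run_start = None

    # 2. Chunks are the gaps between silences, padded by keep_silence.
    total = len(levels_db)
    chunks = []
    cursor = 0
    for silence_start, silence_end in silences + [(total, total)]:
        if silence_start > cursor:
            chunks.append((max(0, cursor - keep_silence),
                           min(total, silence_start + keep_silence)))
        cursor = silence_end
    return chunks


# 3 ms of speech, 5 ms of silence, 2 ms of speech
levels = [-10.0] * 3 + [-60.0] * 5 + [-10.0] * 2
print(split_ranges(levels, min_silence_len=4, silence_thresh=-30.0, keep_silence=1))
# prints [(0, 4), (7, 10)]
```

Raising min_silence_len to 6 in this example yields a single chunk, (0, 10), because the 5 ms pause no longer counts as a split point. The same effect explains why long recordings with short pauses may need a smaller min_silence_len.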

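In short, get_large_audio_transcription() automatically creates a folder, stores the chunks of the original audio file in it, and then runs speech recognition on each of those chunks.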
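The audio-chunks folder stays on disk after the function returns. If the chunk files are not needed afterwards, one possible variant is to export them into a temporary directory that is deleted automatically. In the sketch below, transcribe_chunks and the transcribe_file callback are hypothetical names; the chunks are assumed to expose an export(path, format=...) method, as pydub AudioSegment objects do.

```python
import os
import tempfile


def transcribe_chunks(chunks, transcribe_file):
    """Export each chunk to a temporary WAV file, transcribe it, join the text.

    `transcribe_file` is a hypothetical callback that takes a file path and
    returns the recognized text (or an empty string / None on failure).
    """
    parts = []
    # The directory and every file in it are removed when the block exits.
    with tempfile.TemporaryDirectory() as folder:
        for i, chunk in enumerate(chunks, start=1):
            chunk_filename = os.path.join(folder, f"chunk{i}.wav")
            chunk.export(chunk_filename, format="wav")
            text = transcribe_file(chunk_filename)
            if text:
                parts.append(text)
    return " ".join(parts)
```

The callback could wrap the sr.AudioFile and recognize_google() calls shown earlier, returning None when recognition fails so that the chunk is skipped.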

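Reading from the Microphone

This requires PyAudio to be installed on your machine. The installation process depends on your operating system:

Windows

You can install it directly with pip: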

pip3 install pyaudio

Linux

You need to install the dependencies first:

sudo apt-get install python-pyaudio python3-pyaudio
pip3 install pyaudio

macOS

You need to install portaudio first, and then you can install PyAudio with pip:

brew install portaudio
pip3 install pyaudio

Now let's use the microphone to convert our speech:

with sr.Microphone() as source:
    # read the audio data from the default microphone
    audio_data = r.record(source, duration=5)
    print("Recognizing...")
    # convert speech to text
    text = r.recognize_google(audio_data)
    print(text)

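This listens to your microphone for 5 seconds and then tries to convert that speech to text. It is very similar to the previous code, but here we use the Microphone() object to read audio from the default microphone, and the duration parameter of record() to stop reading after 5 seconds; the audio data is then uploaded to Google to get the output text.

You can also use the offset parameter of record() to start recording after offset seconds. In addition, you can recognize other languages by passing the language parameter to recognize_google(). For example, to recognize Spanish speech: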

text = r.recognize_google(audio_data, language="es-ES")

See this Stack Overflow answer for the supported languages.
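The duration, offset, and language options described above can be combined in one small helper. This is a sketch rather than part of the library: listen_and_transcribe is a hypothetical name, and the recognizer and source are passed in so that the same function works with a Microphone() or an AudioFile() source.

```python
def listen_and_transcribe(recognizer, source, seconds=5, offset=None,
                          language="en-US"):
    """Record from `source` and return the text recognized by Google.

    seconds:  how long to record (passed to record() as `duration`)
    offset:   seconds to skip before recording starts, or None
    language: language tag such as "en-US" or "es-ES"
    """
    audio_data = recognizer.record(source, duration=seconds, offset=offset)
    return recognizer.recognize_google(audio_data, language=language)


# Usage with the objects from this tutorial (needs a microphone and PyAudio):
# with sr.Microphone() as source:
#     print(listen_and_transcribe(r, source, seconds=5, language="es-ES"))
```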

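Conclusion

As you can see, converting speech to text with this library is quite simple. The library is widely used in the wild; check its official documentation for more. If you do not want to use Python and would rather have a service that does this for you automatically, I suggest audext, which converts your audio to text online quickly and cost-effectively. If you also want to convert text to speech in Python, see the related tutorial.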
