25.6.29 | Added Gummy real-time / one-sentence speech recognition |
25.6.28 | Added Qwen TTS local audio and real-time playback |
Background
Environment setup
A Mac with an Apple M1 chip (other M-series chips also work)
To keep the Python environment manageable, use Miniconda. Download link: Download Anaconda Distribution | Anaconda
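For example, a dedicated environment can be created like this (a sketch; the environment name qwen-audio is just a placeholder):
conda create -n qwen-audio python=3.10
conda activate qwen-audio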
Install the dependency library for Alibaba's models (pip install dashscope): https://bailian.console.aliyun.com/?tab=api#/doc/?type=model&url=https%3A%2F%2Fhelp.aliyun.com%2Fdocument_detail%2F2712193.html%23f3e80b21069aa
Configure the API key environment variable: https://bailian.console.aliyun.com/?tab=api#/doc/?type=model&url=https%3A%2F%2Fhelp.aliyun.com%2Fdocument_detail%2F2803795.html
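On macOS this usually means adding export DASHSCOPE_API_KEY="your-api-key" to ~/.zshrc (same placeholder as in the demo code below; substitute your real key). A minimal Python sanity check that the key is visible to your scripts:

import os

print("DASHSCOPE_API_KEY is set" if os.getenv("DASHSCOPE_API_KEY") else "DASHSCOPE_API_KEY is NOT set")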
To make editing code easier, download and install the most popular editor: VS Code. The latest version already includes GitHub Copilot for free; remember to turn it on.
To fix the problem where VS Code's built-in Python environment cannot find dashscope (because dashscope was installed into a Miniconda environment):
- Press Cmd+Shift+P, then type and select Python: Select Interpreter.
- Pick the miniconda environment where you installed dashscope (e.g. miniconda3/envs/xxx).
- The status bar at the bottom right shows the current environment.
- Verify: which python (see the in-editor check below).
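A minimal in-editor check, run with the interpreter you just selected, that both the interpreter path and the dashscope import resolve as expected:

import sys
import dashscope

print(sys.executable)      # should point into your miniconda env
print(dashscope.__file__)  # should print a path instead of raising ImportError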
Gummy-ASR
Real-time recognition
One-sentence recognition
The official demo, unmodified.
How it works: recording starts as soon as the program launches. The end of recording is detected automatically (judging from the code and the logs, this decision is made on the cloud side), and one minute of audio is the upper limit.
# For prerequisites running the following sample, visit https://help.aliyun.com/document_detail/xxxxx.html
# One-sentence recognition transcribes an audio stream of up to one minute
# (whether captured from an external device such as a microphone, or read
# from a local file) and returns results as a stream.
import pyaudio
import dashscope
from dashscope.audio.asr import *

# If the API key is not configured in the environment variables,
# replace your-api-key with your own API key
# dashscope.api_key = "your-api-key"

mic = None
stream = None


class Callback(TranslationRecognizerCallback):
    def on_open(self) -> None:
        global mic
        global stream
        print("TranslationRecognizerCallback open.")
        mic = pyaudio.PyAudio()
        stream = mic.open(
            format=pyaudio.paInt16, channels=1, rate=16000, input=True
        )

    def on_close(self) -> None:
        global mic
        global stream
        print("TranslationRecognizerCallback close.")
        stream.stop_stream()
        stream.close()
        mic.terminate()
        stream = None
        mic = None

    def on_event(
        self,
        request_id,
        transcription_result: TranscriptionResult,
        translation_result: TranslationResult,
        usage,
    ) -> None:
        print("request id: ", request_id)
        print("usage: ", usage)
        if translation_result is not None:
            print(
                "translation_languages: ",
                translation_result.get_language_list(),
            )
            english_translation = translation_result.get_translation("en")
            print("sentence id: ", english_translation.sentence_id)
            print("translate to english: ", english_translation.text)
            if english_translation.vad_pre_end:
                print(
                    "vad pre end {}, {}, {}".format(
                        transcription_result.pre_end_start_time,
                        transcription_result.pre_end_end_time,
                        transcription_result.pre_end_timemillis,
                    )
                )
        if transcription_result is not None:
            print("sentence id: ", transcription_result.sentence_id)
            print("transcription: ", transcription_result.text)


callback = Callback()

translator = TranslationRecognizerChat(
    model="gummy-chat-v1",
    format="pcm",
    sample_rate=16000,
    transcription_enabled=True,
    translation_enabled=True,
    translation_target_languages=["en"],
    callback=callback,
)
translator.start()
print("Speak into the microphone to try one-sentence speech recognition and translation")
while True:
    if stream:
        data = stream.read(3200, exception_on_overflow=False)
        if not translator.send_audio_frame(data):
            print("sentence end, stop sending")
            break
    else:
        break
translator.stop()
Qwen-TTS
Non-real-time TTS
Save the generated audio to a local file. Docs: https://bailian.console.aliyun.com/?tab=doc#/doc/?type=model&url=https%3A%2F%2Fhelp.aliyun.com%2Fdocument_detail%2F2879134.html&renderType=iframe
Fixed the case where the response of the TTS request is invalid (e.g. a wrong API_KEY).
import os
import requests
import dashscope

text = "那我来给大家推荐一款T恤,这款呢真的是超级好看,这个颜色呢很显气质,而且呢也是搭配的绝佳单品,大家可以闭眼入,真的是非常好看,对身材的包容性也很好,不管啥身材的宝宝呢,穿上去都是很好看的。推荐宝宝们下单哦。"

response = dashscope.audio.qwen_tts.SpeechSynthesizer.call(
    model="qwen-tts",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
)

# ====== Start checking whether the response is valid ======
print(response)
if not hasattr(response, 'output') or response.output is None:
    print("The response has no output field; check your permissions and whether the model is enabled")
    exit()
if not hasattr(response.output, 'audio') or response.output.audio is None:
    print("The response contains no audio data; check the returned content")
    exit()
if not hasattr(response.output.audio, 'url'):
    print("audio in the response has no url field; check the response structure")
    exit()
# ====== Finish checking whether the response is valid ======

audio_url = response.output.audio["url"]
save_path = "downloaded_audio.wav"  # save path, customize as needed

try:
    response = requests.get(audio_url)
    response.raise_for_status()  # raise if the download failed
    with open(save_path, 'wb') as f:
        f.write(response.content)
    print(f"Audio file saved to: {save_path}")
except Exception as e:
    print(f"Download failed: {str(e)}")
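To audition the saved file from Python, here is a minimal playback sketch using the standard-library wave module plus pyaudio; it assumes the downloaded_audio.wav path used above:

import wave
import pyaudio

wf = wave.open("downloaded_audio.wav", "rb")
p = pyaudio.PyAudio()
# Open an output stream matching the WAV file's own format
stream = p.open(
    format=p.get_format_from_width(wf.getsampwidth()),
    channels=wf.getnchannels(),
    rate=wf.getframerate(),
    output=True,
)
data = wf.readframes(1024)
while data:
    stream.write(data)
    data = wf.readframes(1024)
stream.stop_stream()
stream.close()
p.terminate()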
Problem 1: pip install pyaudio fails
Solution:
(1) First install Homebrew: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)", then restart
(2) Then install PortAudio: brew install portaudio
(3) Then install PyAudio itself: pip install pyaudio (verification sketch below)
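To confirm the installation works, a minimal sketch that lists the audio devices pyaudio can see (the microphone should show up with at least one input channel):

import pyaudio

p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    print(i, info["name"], "input channels:", info["maxInputChannels"])
p.terminate()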
The generated TTS audio file ends up on local disk. This suits offline narration scenarios where the audio is produced ahead of time, such as PPT voice-overs.
It is not suitable for real-time voice interaction; for that we need real-time TTS.
Real-time TTS
Running the official demo as-is works fine.
We wrapped the real-time TTS into a function API and provide a demo that tests it.
Function wrapper code: qwen_play_tts.py
# coding=utf-8
import os
import dashscope
import pyaudio
import time
import base64
import numpy as np


def qwen_play_tts(text, voice="Ethan", api_key=None):
    """Stream speech synthesis with Qwen TTS and play it back.

    :param text: text to synthesize
    :param voice: voice to use
    :param api_key: DashScope API key (optional; defaults to the environment variable)
    """
    api_key = api_key or os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        raise ValueError("DASHSCOPE_API_KEY is not set.")

    p = pyaudio.PyAudio()
    stream = p.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=24000,
        output=True,
    )

    responses = dashscope.audio.qwen_tts.SpeechSynthesizer.call(
        model="qwen-tts",
        api_key=api_key,
        text=text,
        voice=voice,
        stream=True,
    )
    for chunk in responses:
        # Each streamed chunk carries base64-encoded 16-bit PCM audio
        audio_string = chunk["output"]["audio"]["data"]
        wav_bytes = base64.b64decode(audio_string)
        audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
        stream.write(audio_np.tobytes())

    time.sleep(0.8)  # short pause so buffered audio finishes before the stream closes
    stream.stop_stream()
    stream.close()
    p.terminate()


# Example call
if __name__ == "__main__":
    sample_text = "你好,这是一段测试语音。"
    qwen_play_tts(sample_text)
API test code: qwen_api_test.py
from qwen_play_tts import qwen_play_tts

qwen_play_tts("这是一个函数调用测试。")