Real-Time Baidu Speech Recognition to Text in Python (Runnable Code): How to Use Baidu AI Cloud Speech Recognition

Published: 2023-07-27 18:35:01

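Using the Baidu Speech Synthesis and Speech Recognition APIs (Java Version)

Official Baidu speech synthesis documentation: https://ai.baidu.com/docs#/TTS-Online-Java-SDK/top
Official Baidu speech recognition documentation: https://ai.baidu.com/docs#/ASR-Online-Java-SDK/top

Source code for this article's project: https://github.com/Blankwhiter/SpeechSynthesizer

Step 1: Register a Baidu account and create an application

Register an account yourself, then create an application and select the speech interfaces during creation. Once the application is created, you receive an AppID, an API Key, and a Secret Key, which are shown on the application details page. If you run into any problems, leave a comment.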
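Since the three values from Step 1 are secrets, one option is to read them from the environment rather than typing them into source code. The following is a minimal sketch under that assumption. The class name BaiduCredentials and the variable names BAIDU_APP_ID, BAIDU_API_KEY, and BAIDU_SECRET_KEY are hypothetical and not part of the Baidu SDK.

```java
import java.util.Map;

/** Hypothetical holder for the three values issued by the Baidu console. */
final class BaiduCredentials {
    final String appId;
    final String apiKey;
    final String secretKey;

    private BaiduCredentials(String appId, String apiKey, String secretKey) {
        this.appId = appId;
        this.apiKey = apiKey;
        this.secretKey = secretKey;
    }

    /** Reads the credentials from a map such as System.getenv(). The key names are assumptions. */
    static BaiduCredentials from(Map<String, String> env) {
        return new BaiduCredentials(
                require(env, "BAIDU_APP_ID"),
                require(env, "BAIDU_API_KEY"),
                require(env, "BAIDU_SECRET_KEY"));
    }

    /** Returns the trimmed value, or fails with a message naming the missing variable. */
    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        try {
            BaiduCredentials credentials = from(System.getenv());
            System.out.println("Loaded credentials for AppID " + credentials.appId);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The loaded fields could then be passed to the AipSpeech constructor used later in this article.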

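Step 2: Set up the development environment

In the Spring Boot project's pom.xml, add fastjson, the Baidu AIP Java SDK, and mp3spi (used to convert MP3 to PCM) under the dependencies node. The pom.xml file is as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>

        <groupId>com.example</groupId>
        <artifactId>speechsynthesizer</artifactId>
        <version>0.0.1-SNAPSHOT</version>
        <packaging>jar</packaging>

        <name>SpeechSynthesizer</name>
        <description>Demo project for Spring Boot</description>

        <parent>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-parent</artifactId>
            <version>2.0.4.RELEASE</version>
        </parent>

        <properties>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
            <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
            <java.version>1.8</java.version>
        </properties>

        <dependencies>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter</artifactId>
            </dependency>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-test</artifactId>
                <scope>test</scope>
            </dependency>
            <dependency>
                <groupId>org.projectlombok</groupId>
                <artifactId>lombok</artifactId>
                <optional>true</optional>
            </dependency>
            <dependency>
                <groupId>com.alibaba</groupId>
                <artifactId>fastjson</artifactId>
                <version>1.2.47</version>
            </dependency>
            <dependency>
                <groupId>com.baidu.aip</groupId>
                <artifactId>java-sdk</artifactId>
                <version>4.1.1</version>
            </dependency>
            <dependency>
                <groupId>com.googlecode.soundlibs</groupId>
                <artifactId>mp3spi</artifactId>
                <version>1.9.5.4</version>
            </dependency>
        </dependencies>

        <build>
            <plugins>
                <plugin>
                    <groupId>org.springframework.boot</groupId>
                    <artifactId>spring-boot-maven-plugin</artifactId>
                </plugin>
            </plugins>
        </build>
    </project>

Step 3: Write the speech synthesis code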

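The code is as follows:

    /**
     * Returns the shared client, created lazily on first use (singleton).
     *
     * @return the shared AipSpeech client
     */
    public static AipSpeech getInstance() {
        if (client == null) {
            synchronized (AipSpeech.class) {
                if (client == null) {
                    client = new AipSpeech(APP_ID, API_KEY, SECRET_KEY);
                }
            }
        }
        return client;
    }

    /**
     * Speech synthesis.
     *
     * @param word       the text to synthesize
     * @param outputPath path of the generated audio file
     * @return true if the audio was written, false otherwise
     */
    public static boolean SpeechSynthesizer(String word, String outputPath) {
        // Maximum text length in bytes
        int maxLength = 1024;
        if (word.getBytes().length >= maxLength) {
            return false;
        }
        // Obtain the AipSpeech client
        AipSpeech client = getInstance();

        // Optional: network connection settings
        client.setConnectionTimeoutInMillis(2000);
        client.setSocketTimeoutInMillis(60000);

        // Optional: proxy server address; use either HTTP or socket, or neither
        // client.setHttpProxy("proxy_host", proxy_port);   // HTTP proxy
        // client.setSocketProxy("proxy_host", proxy_port); // socket proxy

        // Call the API
        TtsResponse res = client.synthesis(word, "zh", 1, null);
        byte[] data = res.getData();
        org.json.JSONObject res1 = res.getResult();
        if (data != null) {
            try {
                Util.writeBytesToFileSystem(data, outputPath);
            } catch (IOException e) {
                e.printStackTrace();
                // The file was not written, so report failure
                return false;
            }
            return true;
        }
        // No audio data: log the error result returned by the service
        if (res1 != null) {
            log.info("result:" + res1.toString());
        }
        return false;
    }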

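Usage example:

    SpeechSynthesizer("简单测试百度语音合成", "d:/SpeechSynthesizer.mp3");

Note: the text passed to speech synthesis must stay under 1024 bytes (the method above rejects anything of 1024 bytes or more). To synthesize longer content, adapt the code to split the text into several requests and stitch the resulting audio together.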
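The splitting half of that suggestion can be sketched as follows. This is only an illustration under stated assumptions: TextChunker is a hypothetical helper, not part of the SDK, and it measures size in UTF-8, whereas the method above uses the platform default encoding via word.getBytes(). Stitching the audio segments together is not covered here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits long text into pieces below a byte limit. */
final class TextChunker {
    /**
     * Splits text into chunks whose encoded size stays strictly below maxBytes.
     * Chunks are cut on code point boundaries, so no character is split in half.
     */
    static List<String> split(String text, int maxBytes, Charset charset) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String ch = new String(Character.toChars(codePoint));
            int size = ch.getBytes(charset).length;
            if (size >= maxBytes) {
                throw new IllegalArgumentException("maxBytes is too small for a single character");
            }
            // Start a new chunk when adding this character would reach the limit
            if (currentBytes + size >= maxBytes) {
                chunks.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
            }
            current.append(ch);
            currentBytes += size;
            i += Character.charCount(codePoint);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        // A deliberately small limit so the split is visible
        for (String chunk : split("简单测试百度语音合成", 10, StandardCharsets.UTF_8)) {
            System.out.println(chunk);
        }
    }
}
```

Each chunk could then be passed to SpeechSynthesizer with its own output path.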

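Step 4: Write the speech recognition code

    /**
     * Speech recognition.
     *
     * @param videoPath path of the audio file
     * @param videoType audio format, for example "pcm"
     * @return the recognition result as formatted JSON
     */
    public static String SpeechRecognition(String videoPath, String videoType) {
        // Obtain the AipSpeech client
        AipSpeech client = getInstance();

        // Optional: network connection settings
        client.setConnectionTimeoutInMillis(2000);
        client.setSocketTimeoutInMillis(60000);

        // Optional: proxy server address; use either HTTP or socket, or neither
        // client.setHttpProxy("proxy_host", proxy_port);   // HTTP proxy
        // client.setSocketProxy("proxy_host", proxy_port); // socket proxy

        // Call the API with a 16000 Hz sample rate
        JSONObject res = client.asr(videoPath, videoType, 16000, null);
        log.info("SpeechRecognition:" + res.toString());
        return res.toString(2);
    }

    /**
     * Converts MP3 to PCM.
     *
     * @param mp3filepath path of the MP3 file
     * @param pcmfilepath path where the PCM file is saved
     * @return true on success, false otherwise
     */
    public static boolean convertMP32Pcm(String mp3filepath, String pcmfilepath) {
        try {
            // Decode the file into a PCM audio stream
            AudioInputStream audioInputStream = getPcmAudioInputStream(mp3filepath);
            if (audioInputStream == null) {
                // Decoding failed; the cause was already printed
                return false;
            }
            // Save the PCM audio. Note that Type.WAVE writes a WAV container
            // (a header followed by the PCM samples).
            AudioSystem.write(audioInputStream, AudioFileFormat.Type.WAVE, new File(pcmfilepath));
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    /**
     * Opens an MP3 file as a PCM audio stream.
     *
     * @param mp3filepath path of the MP3 file
     * @return the PCM stream, or null if the file could not be decoded
     */
    private static AudioInputStream getPcmAudioInputStream(String mp3filepath) {
        File mp3 = new File(mp3filepath);
        AudioInputStream audioInputStream = null;
        AudioFormat targetFormat = null;
        try {
            AudioInputStream in = null;
            MpegAudioFileReader mp = new MpegAudioFileReader();
            in = mp.getAudioInputStream(mp3);
            AudioFormat baseFormat = in.getFormat();
            // 16-bit signed little-endian PCM with the source's sample rate and channel count
            targetFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                    baseFormat.getSampleRate(), 16, baseFormat.getChannels(),
                    baseFormat.getChannels() * 2, baseFormat.getSampleRate(), false);
            audioInputStream = AudioSystem.getAudioInputStream(targetFormat, in);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return audioInputStream;
    }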

Usage example:

    convertMP32Pcm("d:/SpeechSynthesizer.mp3", "d:/SpeechSynthesizer.pcm");
    SpeechRecognition("d:/SpeechSynthesizer.pcm", "pcm");
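One detail worth knowing: convertMP32Pcm calls AudioSystem.write with AudioFileFormat.Type.WAVE, so the file it produces starts with a WAV header even though it is named .pcm. If headerless PCM is wanted instead, one possible approach is to copy the decoded sample bytes directly. The sketch below illustrates the difference using only javax.sound.sampled and one second of synthetic silence in place of a decoded MP3. The class RawPcmWriter and its methods are hypothetical names.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

/** Hypothetical helper comparing headerless PCM output with WAV output. */
final class RawPcmWriter {
    /** Copies the sample bytes of a PCM stream without adding a container header. */
    static byte[] toRawPcm(AudioInputStream pcm) throws IOException {
        // Use a buffer that holds whole frames, since AudioInputStream reads frame by frame
        int frameSize = Math.max(1, pcm.getFormat().getFrameSize());
        byte[] buffer = new byte[frameSize * 1024];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n;
        while ((n = pcm.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Writes the stream as a WAV image, the way convertMP32Pcm does, for comparison. */
    static byte[] toWav(AudioInputStream pcm) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        AudioSystem.write(pcm, AudioFileFormat.Type.WAVE, out);
        return out.toByteArray();
    }

    /** Builds one second of 16 kHz, 16-bit, mono silence as stand-in input. */
    static AudioInputStream silence() {
        AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);
        byte[] samples = new byte[16000 * 2];
        long frames = samples.length / format.getFrameSize();
        return new AudioInputStream(new ByteArrayInputStream(samples), format, frames);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = toRawPcm(silence());
        byte[] wav = toWav(silence());
        System.out.println("raw PCM bytes: " + raw.length);
        System.out.println("WAV bytes:     " + wav.length);
        System.out.println("header bytes:  " + (wav.length - raw.length));
    }
}
```

In the article's code, the stream returned by getPcmAudioInputStream could be passed to a method like toRawPcm and the result written to the .pcm path.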

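Note: the raw PCM recording must use an 8 kHz or 16 kHz sample rate, 16-bit depth, and a single channel. Supported formats are pcm (uncompressed), wav (uncompressed, PCM-encoded), and amr (compressed). Audio is limited to 60 s; anything longer returns an error.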
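These constraints can be checked locally before a request is sent. The following is a rough sketch that encodes only the limits stated in the note above. PcmCheck is a hypothetical helper, and the duration formula assumes headerless PCM data.

```java
/** Hypothetical pre-flight checks for the recording constraints listed above. */
final class PcmCheck {
    static final int MAX_SECONDS = 60;

    /** True for 8 kHz or 16 kHz, 16-bit, mono audio. */
    static boolean isSupported(int sampleRate, int bitsPerSample, int channels) {
        return (sampleRate == 8000 || sampleRate == 16000)
                && bitsPerSample == 16
                && channels == 1;
    }

    /** Duration in seconds of headerless PCM data of the given size. */
    static double durationSeconds(long pcmBytes, int sampleRate, int bitsPerSample, int channels) {
        long bytesPerSecond = (long) sampleRate * (bitsPerSample / 8) * channels;
        return (double) pcmBytes / bytesPerSecond;
    }

    /** True if 16-bit mono PCM of this size fits within the 60 s limit. */
    static boolean withinLimit(long pcmBytes, int sampleRate) {
        return durationSeconds(pcmBytes, sampleRate, 16, 1) <= MAX_SECONDS;
    }

    public static void main(String[] args) {
        long size = 16000L * 2 * 45; // 45 seconds of 16 kHz, 16-bit, mono audio
        System.out.println("supported format: " + isSupported(16000, 16, 1));
        System.out.println("duration (s):     " + durationSeconds(size, 16000, 16, 1));
        System.out.println("within limit:     " + withinLimit(size, 16000));
    }
}
```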

Step 5: Combine everything into a utility class

SpeechUtil.java is as follows:

    import com.baidu.aip.speech.AipSpeech;
    import com.baidu.aip.speech.TtsResponse;
    import com.baidu.aip.util.Util;
    import javazoom.spi.mpeg.sampled.file.MpegAudioFileReader;
    import lombok.extern.slf4j.Slf4j;
    import org.json.JSONObject;

    import javax.sound.sampled.AudioFileFormat;
    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioInputStream;
    import javax.sound.sampled.AudioSystem;
    import java.io.File;
    import java.io.IOException;

    /**
     * Baidu speech utility class.
     */
    @Slf4j
    public class SpeechUtil {

        // Replace these with the values from your own application
        public static final String APP_ID = "11679901";
        public static final String API_KEY = "FMkPBfeCmc7kGQmhHr3prGzN";
        public static final String SECRET_KEY = "WpWbnNu9SDUscwWTs2sQRtw1WXvGssCg";

        // volatile is required for the double-checked locking in getInstance() to be safe
        private static volatile AipSpeech client;

        public static void main(String[] args) throws IOException {
            // SpeechSynthesizer("简单测试百度语音合成", "d:/SpeechSynthesizer.mp3");
            convertMP32Pcm("d:/SpeechSynthesizer.mp3", "d:/SpeechSynthesizer.pcm");
            SpeechRecognition("d:/SpeechSynthesizer.pcm", "pcm");
        }

        /**
         * Returns the shared client, created lazily on first use (singleton).
         *
         * @return the shared AipSpeech client
         */
        public static AipSpeech getInstance() {
            if (client == null) {
                synchronized (AipSpeech.class) {
                    if (client == null) {
                        client = new AipSpeech(APP_ID, API_KEY, SECRET_KEY);
                    }
                }
            }
            return client;
        }

        /**
         * Speech synthesis.
         *
         * @param word       the text to synthesize
         * @param outputPath path of the generated audio file
         * @return true if the audio was written, false otherwise
         */
        public static boolean SpeechSynthesizer(String word, String outputPath) {
            // Maximum text length in bytes
            int maxLength = 1024;
            if (word.getBytes().length >= maxLength) {
                return false;
            }
            // Obtain the AipSpeech client
            AipSpeech client = getInstance();

            // Optional: network connection settings
            client.setConnectionTimeoutInMillis(2000);
            client.setSocketTimeoutInMillis(60000);

            // Optional: proxy server address; use either HTTP or socket, or neither
            // client.setHttpProxy("proxy_host", proxy_port);   // HTTP proxy
            // client.setSocketProxy("proxy_host", proxy_port); // socket proxy

            // Call the API
            TtsResponse res = client.synthesis(word, "zh", 1, null);
            byte[] data = res.getData();
            org.json.JSONObject res1 = res.getResult();
            if (data != null) {
                try {
                    Util.writeBytesToFileSystem(data, outputPath);
                } catch (IOException e) {
                    e.printStackTrace();
                    // The file was not written, so report failure
                    return false;
                }
                return true;
            }
            // No audio data: log the error result returned by the service
            if (res1 != null) {
                log.info("result:" + res1.toString());
            }
            return false;
        }

        /**
         * Speech recognition.
         *
         * @param videoPath path of the audio file
         * @param videoType audio format, for example "pcm"
         * @return the recognition result as formatted JSON
         */
        public static String SpeechRecognition(String videoPath, String videoType) {
            // Obtain the AipSpeech client
            AipSpeech client = getInstance();

            // Optional: network connection settings
            client.setConnectionTimeoutInMillis(2000);
            client.setSocketTimeoutInMillis(60000);

            // Optional: proxy server address; use either HTTP or socket, or neither
            // client.setHttpProxy("proxy_host", proxy_port);   // HTTP proxy
            // client.setSocketProxy("proxy_host", proxy_port); // socket proxy

            // Call the API with a 16000 Hz sample rate
            JSONObject res = client.asr(videoPath, videoType, 16000, null);
            log.info("SpeechRecognition:" + res.toString());
            return res.toString(2);
        }

        /**
         * Converts MP3 to PCM.
         *
         * @param mp3filepath path of the MP3 file
         * @param pcmfilepath path where the PCM file is saved
         * @return true on success, false otherwise
         */
        public static boolean convertMP32Pcm(String mp3filepath, String pcmfilepath) {
            try {
                // Decode the file into a PCM audio stream
                AudioInputStream audioInputStream = getPcmAudioInputStream(mp3filepath);
                if (audioInputStream == null) {
                    // Decoding failed; the cause was already printed
                    return false;
                }
                // Save the PCM audio. Note that Type.WAVE writes a WAV container
                // (a header followed by the PCM samples).
                AudioSystem.write(audioInputStream, AudioFileFormat.Type.WAVE, new File(pcmfilepath));
                return true;
            } catch (IOException e) {
                e.printStackTrace();
                return false;
            }
        }

        /**
         * Opens an MP3 file as a PCM audio stream.
         *
         * @param mp3filepath path of the MP3 file
         * @return the PCM stream, or null if the file could not be decoded
         */
        private static AudioInputStream getPcmAudioInputStream(String mp3filepath) {
            File mp3 = new File(mp3filepath);
            AudioInputStream audioInputStream = null;
            AudioFormat targetFormat = null;
            try {
                AudioInputStream in = null;
                MpegAudioFileReader mp = new MpegAudioFileReader();
                in = mp.getAudioInputStream(mp3);
                AudioFormat baseFormat = in.getFormat();
                // 16-bit signed little-endian PCM with the source's sample rate and channel count
                targetFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                        baseFormat.getSampleRate(), 16, baseFormat.getChannels(),
                        baseFormat.getChannels() * 2, baseFormat.getSampleRate(), false);
                audioInputStream = AudioSystem.getAudioInputStream(targetFormat, in);
            } catch (Exception e) {
                e.printStackTrace();
            }
            return audioInputStream;
        }
    }
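A side note on getInstance(): double-checked locking is only safe under the Java memory model when the shared field is declared volatile. An alternative that needs no explicit locking is the initialization-on-demand holder idiom, sketched below. This is not the article's approach; ClientHolder is a hypothetical name, and FakeClient is a stand-in for AipSpeech so that the example runs without the SDK.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Lazy, thread-safe singleton using the initialization-on-demand holder idiom. */
final class ClientHolder {
    /** Stand-in for AipSpeech that counts how many instances were constructed. */
    static final class FakeClient {
        static final AtomicInteger CREATED = new AtomicInteger();

        FakeClient() {
            CREATED.incrementAndGet();
        }
    }

    private ClientHolder() {
    }

    /** The JVM initializes this nested class once, on first access, in a thread-safe way. */
    private static final class Lazy {
        static final FakeClient INSTANCE = new FakeClient();
    }

    static FakeClient getInstance() {
        return Lazy.INSTANCE;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(ClientHolder::getInstance);
            threads[i].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println("instances created: " + FakeClient.CREATED.get());
    }
}
```

The trade-off is that constructor arguments must be available statically, which is the case for the constants in SpeechUtil.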

Note: your IDE needs the Lombok plugin installed.

Finally, for more detailed configuration options, consult the official Baidu API documentation.

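Appendix

1. Speech synthesis error codes

Error codes returned by the SDK's local parameter checks:

| error_code | error_msg | Remarks |
| --- | --- | --- |
| SDK108 | connection or read data timeout | The connection timed out or reading data timed out |

Error codes returned by the server:

| Error code | Meaning |
| --- | --- |
| 500 | Unsupported input |
| 501 | Invalid input parameters |
| 502 | Token verification failed |
| 503 | Synthesis backend error |

2. Speech recognition error codes

Error codes returned by the SDK's local parameter checks:

| error_code | error_msg | Remarks |
| --- | --- | --- |
| SDK108 | connection or read data timeout | The connection timed out or reading data timed out |

Error codes returned by the server:

| Error code | User input / server | Meaning | Typical fix |
| --- | --- | --- | --- |
| 3300 | User input error | Invalid input parameters | Check the documentation and the demo carefully and verify the input parameters. |
| 3301 | User input error | Audio quality too poor | Upload clear audio. |
| 3302 | User input error | Authentication failed | The token field failed validation. Generate it with the correct API_KEY and SECRET_KEY. |
| 3303 | Server problem | Speech server backend problem | Report the API response to the forum or the QQ group. |
| 3304 | Request limit exceeded | The user's request QPS exceeds the limit | Lower the request rate of the recognition API (QPS is counted per appId; mobile clients sharing an appId are counted together). |
| 3305 | Request limit exceeded | The user's daily PV (daily request count) exceeds the limit | Use "apply for a higher quota"; until it is approved, reduce the daily request volume. |
| 3307 | Server problem | Speech server backend recognition error | For now, keep 16000 Hz audio under 30 s and 8000 Hz audio under 60 s. If the problem persists, report the API response to the forum or the QQ group. |
| 3308 | User input error | Audio too long | Audio must not exceed 60 s; trim it to under 60 s. |
| 3309 | User input error | Audio data problem | The server cannot convert the audio to PCM, possibly because of its length or format. Trim the audio to under 60 s and check that the encoding is 8K or 16K, 16 bits, mono. |
| 3310 | User input error | Input audio file too large | Audio can be submitted in three ways: the speech parameter in the JSON body (Base64-encoded), binary data posted directly, or a URL in the callback parameter. The error corresponds to one of three cases: the JSON exceeds 10M, the directly posted audio exceeds 10M, or the audio at the callback URL exceeds 10M. |
| 3311 | User input error | The sample rate parameter rate is not among the allowed values | Only 8000 and 16000 are currently offered; passing 4000, for example, triggers this error. |
| 3312 | User input error | The audio format parameter format is not among the allowed values | Only pcm, wav, and amr are currently supported; passing mp3, for example, triggers this error. |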
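The recognition error table can also be encoded in code so that callers can react by category. The sketch below is one possible arrangement: AsrErrors and its methods are hypothetical names, the grouping follows the "User input / server" column above, and the retry policy is an assumption rather than anything the Baidu documentation prescribes. Extracting the numeric code from the JSON returned by client.asr is deliberately left out.

```java
/** Hypothetical classification of the server-side recognition error codes listed above. */
final class AsrErrors {
    enum Category { USER_INPUT, SERVER, QUOTA, UNKNOWN }

    /** Maps an error code to the category given in the table. */
    static Category categoryOf(int code) {
        switch (code) {
            case 3300:
            case 3301:
            case 3302:
            case 3308:
            case 3309:
            case 3310:
            case 3311:
            case 3312:
                return Category.USER_INPUT;
            case 3303:
            case 3307:
                return Category.SERVER;
            case 3304:
            case 3305:
                return Category.QUOTA;
            default:
                return Category.UNKNOWN;
        }
    }

    /**
     * Assumption: a retry may help for a general backend fault (3303) or a QPS overrun (3304),
     * but not for a daily quota overrun or for errors caused by the input itself.
     */
    static boolean isRetryable(int code) {
        return code == 3303 || code == 3304;
    }

    public static void main(String[] args) {
        int[] samples = {3301, 3304, 3305, 3307};
        for (int code : samples) {
            System.out.println(code + " -> " + categoryOf(code) + ", retryable: " + isRetryable(code));
        }
    }
}
```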
