whisper.cpp: Testing and Usage
After installing whisper.cpp, I loaded several models and tested each of them.
Test setup
Command to download a model:
sh ./models/download-ggml-model.sh base
Test commands:
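# Convert the recording to 16 kHz, mono, 16-bit PCM WAV
ffmpeg -i samples/meeting.m4a -ar 16000 -ac 1 -c:a pcm_s16le samples/meeting.wav
# Run the transcription. A comment after a trailing backslash would break
# the line continuation, so the flags are described here instead:
#   -f  audio file
#   -m  model to use
#   -l  language (zh = Chinese)
#   -t  number of threads (8)
./build/bin/whisper-cli \
-f samples/meeting.wav \
-m models/ggml-base.bin \
-l zh \
-t 8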
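To repeat the same run for several models, the command above can be wrapped in a small loop. The following is a minimal sketch, not part of whisper.cpp: `run_models` and the `WHISPER_CLI` variable are hypothetical names, and it assumes the models were downloaded to `models/ggml-<name>.bin` as the download script does for `base`.

```shell
# Path to the binary; override WHISPER_CLI if your build lives elsewhere.
WHISPER_CLI="${WHISPER_CLI:-./build/bin/whisper-cli}"

# run_models WAV MODEL...   (hypothetical helper)
# Transcribes WAV once per model name, printing a header before each run.
run_models() {
  rm_wav="$1"
  shift
  for rm_model in "$@"; do
    echo "== ${rm_model} =="
    # Same flags as the single-model command, without -t,
    # so the default thread count applies.
    "$WHISPER_CLI" -f "$rm_wav" -m "models/ggml-${rm_model}.bin" -l zh
  done
}

# Usage:
#   run_models samples/meeting.wav tiny base small
```

Keeping the binary path in a variable makes it easy to point the loop at a different build without editing the function.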
Hardware:
- 7840HS, running on the CPU only, single process (no multi-processing), default 4 threads
| Model | Disk | Mem | Audio length | Transcription time | Speed (× real time) | Result |
|---|---|---|---|---|---|---|
| tiny | 75 MiB | ~273 MB | 34s | 1.1s | 31x | Needs -l zh to recognize Chinese; poor accuracy |
| base | 142 MiB | ~388 MB | 34s | 2.2s | 15x | Needs -l zh to recognize Chinese; poor accuracy |
| small | 466 MiB | ~852 MB | 34s | 9.8s | 3.5x | Needs -l zh to recognize Chinese |
| medium | 1.5 GiB | ~2.1 GB | 34s | 21s | 1.6x | Needs -l zh |
| large-v3 | 2.9 GiB | ~3.9 GB | 34s | 40.7s | 0.83x | |
| large-v3-turbo | 1.6 GB | 2.6 GB | 34s | 33s | 1.03x | |
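The speed column in the table above appears to be the audio length divided by the transcription time, so values above 1 mean faster than real time. A minimal sketch of that arithmetic, where `speed_factor` is a hypothetical helper name and `awk` handles the floating-point division:

```shell
# speed_factor AUDIO_SECONDS TRANSCRIBE_SECONDS   (hypothetical helper)
# Prints how many seconds of audio are processed per second of wall time.
speed_factor() {
  awk -v audio="$1" -v proc="$2" 'BEGIN { printf "%.2f\n", audio / proc }'
}

# Usage, with the base row of the table (34 s of audio in 2.2 s):
#   speed_factor 34 2.2    # prints 15.45
```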
tiny test
The language must be set to Chinese explicitly. Command:
./build/bin/whisper-cli -f samples/test.wav -m models/ggml-tiny.bin -l zh
The same meeting recording is transcribed as follows:
[00:00:00.000 --> 00:00:07.680] 现在来你跟我说两句话
[00:00:07.680 --> 00:00:10.240] 我看一次
[00:00:10.240 --> 00:00:11.780] 两句话
[00:00:11.780 --> 00:00:13.320] 好
[00:00:13.320 --> 00:00:17.400] 请一下落音效果怎么样
[00:00:17.400 --> 00:00:20.480] 你这三两句话
[00:00:20.480 --> 00:00:23.040] 好
[00:00:29.180 --> 00:00:30.180] 好多吗?
whisper_print_timings: load time = 46.42 ms
whisper_print_timings: fallbacks = 1 p / 0 h
whisper_print_timings: mel time = 23.61 ms
whisper_print_timings: sample time = 82.86 ms / 282 runs ( 0.29 ms per run)
whisper_print_timings: encode time = 706.36 ms / 2 runs ( 353.18 ms per run)
whisper_print_timings: decode time = 1.88 ms / 1 runs ( 1.88 ms per run)
whisper_print_timings: batchd time = 193.39 ms / 269 runs ( 0.72 ms per run)
whisper_print_timings: prompt time = 50.64 ms / 96 runs ( 0.53 ms per run)
whisper_print_timings: total time = 1128.47 ms
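The timing lines above have a regular shape, so a single stage can be pulled out for a comparison table. A minimal sketch, where `timing_ms` is a hypothetical helper name; it assumes the line format shown above and, for the usage line, that the timings are written to stderr (hence `2>&1`).

```shell
# timing_ms STAGE   (hypothetical helper)
# Reads whisper-cli log output on stdin and prints the time in ms for one
# stage, e.g. "load", "encode" or "total".
timing_ms() {
  awk -v stage="$1" '
    # Fields: prefix, stage name, "time", "=", value, "ms", ...
    $1 == "whisper_print_timings:" && $2 == stage && $3 == "time" {
      print $5
      exit
    }'
}

# Usage (assumes the timings go to stderr):
#   ./build/bin/whisper-cli -f samples/test.wav -m models/ggml-tiny.bin -l zh 2>&1 | timing_ms total
```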
base test
If no language is specified, the output looks like this:
[00:00:00.000 --> 00:00:02.000] "What do you want to say?"
[00:00:02.000 --> 00:00:05.000] "What do you want to say to me?"
[00:00:05.000 --> 00:00:06.000] "What do you want to say to me?"
[00:00:06.000 --> 00:00:07.000] "What do you want to say to me?"
[00:00:07.000 --> 00:00:08.000] "What do you want to say to me?"
[00:00:08.000 --> 00:00:09.000] "What do you want to say to me?"
[00:00:09.000 --> 00:00:10.000] "What do you want to say to me?"
[00:00:10.000 --> 00:00:11.000] "What do you want to say to me?"
[00:00:11.000 --> 00:00:12.000] "What do you want to say to me?"
[00:00:12.000 --> 00:00:13.000] "What do you want to say to me?"
[00:00:13.000 --> 00:00:14.000] "What do you want to say to me?"
[00:00:14.000 --> 00:00:15.000] "What do you want to say to me?"
[00:00:15.000 --> 00:00:16.000] "What do you want to say to me?"
[00:00:16.000 --> 00:00:17.000] "What do you want to say to me?"
[00:00:17.000 --> 00:00:18.000] "What do you want to say to me?"
[00:00:18.000 --> 00:00:19.000] "What do you want to say to me?"
[00:00:19.000 --> 00:00:20.000] "What do you want to say to me?"
[00:00:20.000 --> 00:00:21.000] "What do you want to say to me?"
[00:00:21.000 --> 00:00:36.000] "What do you want to say to me?"
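In the output above, 18 of the 19 segments carry the same text, which is a quick signal that the run went wrong. A minimal sketch for flagging this automatically, where `max_repeat` is a hypothetical helper name and the input is assumed to be transcript lines in the `[start --> end] text` format shown above:

```shell
# max_repeat   (hypothetical helper)
# Reads transcript lines on stdin and prints how often the most frequent
# segment text occurs. A count close to the number of segments suggests
# the model is stuck repeating itself.
max_repeat() {
  # Drop the leading "[start --> end]" timestamp, then count identical texts.
  sed 's/^\[[^]]*][[:space:]]*//' \
    | sort | uniq -c | sort -rn \
    | awk 'NR == 1 { print $1 }'
}

# Usage:
#   max_repeat < transcript.txt
```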
The language must be set to Chinese explicitly. Command:
./build/bin/whisper-cli -f samples/test.wav -m models/ggml-base.bin -l zh
The same meeting recording is transcribed as follows:
[00:00:00.000 --> 00:00:07.680] 现在来你跟我说两句话
[00:00:07.680 --> 00:00:10.240] 我看一次
[00:00:10.240 --> 00:00:11.780] 两句话
[00:00:11.780 --> 00:00:13.320] 好
[00:00:13.320 --> 00:00:17.400] 请一下落音效果怎么样
[00:00:17.400 --> 00:00:20.480] 你这三两句话
[00:00:20.480 --> 00:00:23.040] 好
[00:00:29.180 --> 00:00:30.180] 好多吗?
whisper_print_timings: load time = 46.42 ms
whisper_print_timings: fallbacks = 1 p / 0 h
whisper_print_timings: mel time = 23.61 ms
whisper_print_timings: sample time = 82.86 ms / 282 runs ( 0.29 ms per run)
whisper_print_timings: encode time = 706.36 ms / 2 runs ( 353.18 ms per run)
whisper_print_timings: decode time = 1.88 ms / 1 runs ( 1.88 ms per run)
whisper_print_timings: batchd time = 193.39 ms / 269 runs ( 0.72 ms per run)
whisper_print_timings: prompt time = 50.64 ms / 96 runs ( 0.53 ms per run)
whisper_print_timings: total time = 1128.47 ms
small
Possibly because the voices in the recording are too noisy, this model also needs -l zh. The result:
[00:00:00.000 --> 00:00:07.460] 现在来你跟我说两句话
[00:00:07.460 --> 00:00:11.640] 我看你是 你想说啥
[00:00:11.640 --> 00:00:17.280] 好 听一下录音效果怎么样
[00:00:17.280 --> 00:00:20.240] 你再说两句话
[00:00:20.240 --> 00:00:22.880] 好
[00:00:29.240 --> 00:00:30.280] 好 听一下
whisper_print_timings: load time = 208.93 ms
whisper_print_timings: fallbacks = 1 p / 0 h
whisper_print_timings: mel time = 24.02 ms
whisper_print_timings: sample time = 515.76 ms / 940 runs ( 0.55 ms per run)
whisper_print_timings: encode time = 5609.63 ms / 2 runs ( 2804.82 ms per run)
whisper_print_timings: decode time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: batchd time = 3176.09 ms / 928 runs ( 3.42 ms per run)
whisper_print_timings: prompt time = 231.88 ms / 96 runs ( 2.42 ms per run)
whisper_print_timings: total time = 9850.00 ms
medium
[00:00:00.000 --> 00:00:08.000] 你跟我說兩句話
[00:00:08.000 --> 00:00:11.000] 好你試
[00:00:11.000 --> 00:00:13.000] 你想說啥
[00:00:13.000 --> 00:00:15.000] 好
[00:00:15.000 --> 00:00:18.000] 聽一下錄音效果怎麼樣
[00:00:18.000 --> 00:00:21.000] 你再說兩句話
[00:00:21.000 --> 00:00:23.000] 好
[00:00:23.000 --> 00:00:33.000] 他怎麼了
whisper_print_timings: load time = 594.03 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 24.38 ms
whisper_print_timings: sample time = 80.54 ms / 253 runs ( 0.32 ms per run)
whisper_print_timings: encode time = 18115.92 ms / 2 runs ( 9057.96 ms per run)
whisper_print_timings: decode time = 62.90 ms / 3 runs ( 20.97 ms per run)
whisper_print_timings: batchd time = 2026.68 ms / 243 runs ( 8.34 ms per run)
whisper_print_timings: prompt time = 329.93 ms / 48 runs ( 6.87 ms per run)
whisper_print_timings: total time = 21424.55 ms
large-v3
[00:00:00.000 --> 00:00:07.600] 现在来你跟我说两句话
[00:00:07.600 --> 00:00:12.000] 我看你是你要说啥
[00:00:12.000 --> 00:00:17.400] 好听一下录音效果怎么样
[00:00:17.400 --> 00:00:20.400] 你再说两句话
[00:00:20.400 --> 00:00:34.440] 他怎么
whisper_print_timings: load time = 1698.18 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 29.04 ms
whisper_print_timings: sample time = 78.72 ms / 232 runs ( 0.34 ms per run)
whisper_print_timings: encode time = 34552.24 ms / 2 runs ( 17276.12 ms per run)
whisper_print_timings: decode time = 77.53 ms / 2 runs ( 38.77 ms per run)
whisper_print_timings: batchd time = 3319.69 ms / 223 runs ( 14.89 ms per run)
whisper_print_timings: prompt time = 547.42 ms / 43 runs ( 12.73 ms per run)
whisper_print_timings: total time = 40676.03 ms
large-v3-turbo
[00:00:00.000 --> 00:00:07.440] 现在来 你跟我说两句话
[00:00:07.440 --> 00:00:17.240] 你是 你要说啥 好 听一下录音效果怎么样
[00:00:17.240 --> 00:00:22.800] 你再说两句话 好
[00:00:22.800 --> 00:00:34.440] 他怎么
whisper_print_timings: load time = 1131.77 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 27.35 ms
whisper_print_timings: sample time = 68.56 ms / 208 runs ( 0.33 ms per run)
whisper_print_timings: encode time = 31061.01 ms / 2 runs ( 15530.50 ms per run)
whisper_print_timings: decode time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: batchd time = 527.58 ms / 201 runs ( 2.62 ms per run)
whisper_print_timings: prompt time = 102.62 ms / 44 runs ( 2.33 ms per run)
whisper_print_timings: total time = 32967.73 ms
Published on 2025-04-21 at 22:53.