
    whisper.cpp: Testing and Usage


    After installing whisper.cpp, I loaded several models and tested each of them.

    Test setup

    Command to download a model:

    sh ./models/download-ggml-model.sh base
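To reproduce all six runs, the download script can be called once per model. A minimal POSIX sh sketch, assuming it is run from the whisper.cpp repository root and that the script saves each model as `models/ggml-<name>.bin` (the paths the `-m` options in this post use); `MODELS`, `model_path` and `download_all` are illustrative names, not part of whisper.cpp:

```sh
#!/bin/sh
# Models benchmarked in this post, as names passed to download-ggml-model.sh
MODELS="tiny base small medium large-v3 large-v3-turbo"

# Path where the download script is assumed to save a model
# (matches the -m paths used in this post)
model_path() {
  printf 'models/ggml-%s.bin\n' "$1"
}

# Download every model in MODELS, skipping any file that already exists.
# Run from the whisper.cpp repository root.
download_all() {
  for m in $MODELS; do
    if [ -f "$(model_path "$m")" ]; then
      echo "skip $m: $(model_path "$m") already exists"
    else
      sh ./models/download-ggml-model.sh "$m" || return 1
    fi
  done
}
```

After sourcing the file, `download_all` fetches whatever is missing; the existence check keeps reruns from downloading the multi-gigabyte large models again.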

    Test commands:

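    # Convert to a 16 kHz, mono, 16-bit PCM WAV file
    ffmpeg -i samples/meeting.m4a -ar 16000 -ac 1 -c:a pcm_s16le samples/meeting.wav
    # Run the transcription
    #   -f  audio file
    #   -m  model to use
    #   -l  language (zh = Chinese)
    #   -t  number of threads (8)
    ./build/bin/whisper-cli \
      -f samples/meeting.wav \
      -m models/ggml-base.bin \
      -l zh \
      -t 8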
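The conversion and transcription steps can be wrapped in one shell function. A minimal POSIX sh sketch, assuming the layout used above (`./build/bin/whisper-cli`, models stored as `models/ggml-<name>.bin`); `wav_path` and `transcribe` are hypothetical helper names, and the defaults (`zh`, 8 threads) simply mirror the command above:

```sh
#!/bin/sh
# Map an input recording to the WAV file whisper-cli will read.
# Assumes the file name has an extension; a .wav input gets a "-16k" suffix
# so ffmpeg never tries to overwrite its own input.
wav_path() {
  case $1 in
    *.wav) printf '%s-16k.wav\n' "${1%.*}" ;;
    *)     printf '%s.wav\n' "${1%.*}" ;;
  esac
}

# Convert to 16 kHz mono 16-bit PCM, then transcribe.
# Usage: transcribe <audio-file> <model-name> [language] [threads]
transcribe() {
  wav=$(wav_path "$1")
  # -y: overwrite a WAV left over from an earlier run
  ffmpeg -y -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le "$wav" || return 1
  ./build/bin/whisper-cli \
    -f "$wav" \
    -m "models/ggml-$2.bin" \
    -l "${3:-zh}" \
    -t "${4:-8}"
}
```

For example, `transcribe samples/meeting.m4a small` would repeat the run with the small model.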

    Device and performance:

    • 7840HS (AMD Ryzen 7), running on the CPU alone, single process (no multiprocessing), default 4 threads
    Model           Disk      Mem       Audio  Time    Speed  Notes
    tiny            75 MiB    ~273 MB   34 s   1.1 s   31x    needs -l zh to recognize Chinese; poor accuracy
    base            142 MiB   ~388 MB   34 s   2.2 s   15x    needs -l zh to recognize Chinese; poor accuracy
    small           466 MiB   ~852 MB   34 s   9.8 s   3.5x   needs -l zh to recognize Chinese
    medium          1.5 GiB   ~2.1 GB   34 s   21 s    1.6x   needs -l zh
    large-v3        2.9 GiB   ~3.9 GB   34 s   40.7 s  0.83x
    large-v3-turbo  1.6 GB    2.6 GB    34 s   33 s    1.03x
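The Speed column appears to be the audio length divided by the transcription time (34 s / 1.1 s ≈ 31x for tiny; values below 1 mean slower than real time). A small awk-based sketch of that arithmetic; `speed` and `speed_table` are my own names:

```sh
#!/bin/sh
# Speed factor = audio duration / processing time; above 1 means faster than real time.
# Usage: speed <audio-seconds> <processing-seconds>
speed() {
  awk -v dur="$1" -v t="$2" 'BEGIN { printf "%.2f\n", dur / t }'
}

# Read "model processing-seconds" pairs on stdin and print each model's speed.
# Usage: speed_table <audio-seconds>
speed_table() {
  awk -v dur="$1" '{ printf "%s %.2fx\n", $1, dur / $2 }'
}
```

`printf 'tiny 1.1\nlarge-v3 40.7\n' | speed_table 34` prints `tiny 30.91x` and `large-v3 0.84x`; the table lists 0.83x for large-v3, so it seems to truncate rather than round.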

    tiny test

    The language must be set to Chinese. With the command ./build/bin/whisper-cli -f samples/test.wav -m models/ggml-tiny.bin -l zh, the same meeting is transcribed as:

    [00:00:00.000 --> 00:00:07.680]  现在来你跟我说两句话
    [00:00:07.680 --> 00:00:10.240]  我看一次
    [00:00:10.240 --> 00:00:11.780]  两句话
    [00:00:11.780 --> 00:00:13.320]  好
    [00:00:13.320 --> 00:00:17.400]  请一下落音效果怎么样
    [00:00:17.400 --> 00:00:20.480]  你这三两句话
    [00:00:20.480 --> 00:00:23.040]  好
    [00:00:29.180 --> 00:00:30.180]  好多吗?
    whisper_print_timings:     load time =    46.42 ms
    whisper_print_timings:     fallbacks =   1 p /   0 h
    whisper_print_timings:      mel time =    23.61 ms
    whisper_print_timings:   sample time =    82.86 ms /   282 runs (     0.29 ms per run)
    whisper_print_timings:   encode time =   706.36 ms /     2 runs (   353.18 ms per run)
    whisper_print_timings:   decode time =     1.88 ms /     1 runs (     1.88 ms per run)
    whisper_print_timings:   batchd time =   193.39 ms /   269 runs (     0.72 ms per run)
    whisper_print_timings:   prompt time =    50.64 ms /    96 runs (     0.53 ms per run)
    whisper_print_timings:    total time =  1128.47 ms
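To collect numbers like the ones in the table, the `whisper_print_timings` summary can be parsed instead of read by eye. A minimal sketch, assuming the line layout shown above; as far as I can tell whisper-cli writes these timings to stderr, hence the `2>&1` in the usage line. `timing_ms` is a hypothetical helper:

```sh
#!/bin/sh
# Print one field (in ms) from whisper-cli's whisper_print_timings summary,
# e.g. "total", "encode" or "load". Reads the log text on stdin.
# Usage: ./build/bin/whisper-cli ... 2>&1 | timing_ms total
timing_ms() {
  awk -v field="$1" '$1 == "whisper_print_timings:" && $2 == field && $3 == "time" { print $5 }'
}
```

For the tiny log above, `timing_ms total` prints 1128.47 and `timing_ms encode` prints 706.36.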
    

    base test

    Without specifying a language, the output looks like this:

    [00:00:00.000 --> 00:00:02.000]   "What do you want to say?"
    [00:00:02.000 --> 00:00:05.000]   "What do you want to say to me?"
    [00:00:05.000 --> 00:00:06.000]   "What do you want to say to me?"
    [00:00:06.000 --> 00:00:07.000]   "What do you want to say to me?"
    [00:00:07.000 --> 00:00:08.000]   "What do you want to say to me?"
    [00:00:08.000 --> 00:00:09.000]   "What do you want to say to me?"
    [00:00:09.000 --> 00:00:10.000]   "What do you want to say to me?"
    [00:00:10.000 --> 00:00:11.000]   "What do you want to say to me?"
    [00:00:11.000 --> 00:00:12.000]   "What do you want to say to me?"
    [00:00:12.000 --> 00:00:13.000]   "What do you want to say to me?"
    [00:00:13.000 --> 00:00:14.000]   "What do you want to say to me?"
    [00:00:14.000 --> 00:00:15.000]   "What do you want to say to me?"
    [00:00:15.000 --> 00:00:16.000]   "What do you want to say to me?"
    [00:00:16.000 --> 00:00:17.000]   "What do you want to say to me?"
    [00:00:17.000 --> 00:00:18.000]   "What do you want to say to me?"
    [00:00:18.000 --> 00:00:19.000]   "What do you want to say to me?"
    [00:00:19.000 --> 00:00:20.000]   "What do you want to say to me?"
    [00:00:20.000 --> 00:00:21.000]   "What do you want to say to me?"
    [00:00:21.000 --> 00:00:36.000]   "What do you want to say to me?"
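A likely reason for the English output: as far as I know, whisper-cli's `-l` option defaults to `en` rather than detecting the language (`-l auto` turns detection on), so without `-l zh` the model is pushed toward English and here ends up repeating one sentence. That symptom, the same segment text over and over, can be flagged with a rough shell heuristic. This is an assumption of mine, not a whisper.cpp feature; `max_repeat` is a hypothetical helper and what count is "too many" is left open:

```sh
#!/bin/sh
# Rough hallucination check: keep only segment lines, strip the
# "[start --> end]" prefix, and print how many times the most
# frequent segment text occurs.
max_repeat() {
  grep '^\[' |
    sed 's/^\[[^]]*][[:space:]]*//' |
    sort | uniq -c | sort -rn |
    head -n 1 | awk '{ print $1 }'
}
```

Fed the 19 segments above (e.g. `./build/bin/whisper-cli ... 2>/dev/null | max_repeat`), it reports 18, since every segment after the first is the same sentence.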

    The language must be set to Chinese. With the command ./build/bin/whisper-cli -f samples/test.wav -m models/ggml-base.bin -l zh, the same meeting is transcribed as:

    [00:00:00.000 --> 00:00:07.680]  现在来你跟我说两句话
    [00:00:07.680 --> 00:00:10.240]  我看一次
    [00:00:10.240 --> 00:00:11.780]  两句话
    [00:00:11.780 --> 00:00:13.320]  好
    [00:00:13.320 --> 00:00:17.400]  请一下落音效果怎么样
    [00:00:17.400 --> 00:00:20.480]  你这三两句话
    [00:00:20.480 --> 00:00:23.040]  好
    [00:00:29.180 --> 00:00:30.180]  好多吗?
    whisper_print_timings:     load time =    46.42 ms
    whisper_print_timings:     fallbacks =   1 p /   0 h
    whisper_print_timings:      mel time =    23.61 ms
    whisper_print_timings:   sample time =    82.86 ms /   282 runs (     0.29 ms per run)
    whisper_print_timings:   encode time =   706.36 ms /     2 runs (   353.18 ms per run)
    whisper_print_timings:   decode time =     1.88 ms /     1 runs (     1.88 ms per run)
    whisper_print_timings:   batchd time =   193.39 ms /   269 runs (     0.72 ms per run)
    whisper_print_timings:   prompt time =    50.64 ms /    96 runs (     0.53 ms per run)
    whisper_print_timings:    total time =  1128.47 ms

    small

    Possibly because the voices in the recording are noisy, small also needs -l zh. The output is:

    
    [00:00:00.000 --> 00:00:07.460]  现在来你跟我说两句话
    [00:00:07.460 --> 00:00:11.640]  我看你是 你想说啥
    [00:00:11.640 --> 00:00:17.280]  好 听一下录音效果怎么样
    [00:00:17.280 --> 00:00:20.240]  你再说两句话
    [00:00:20.240 --> 00:00:22.880]  好
    [00:00:29.240 --> 00:00:30.280]  好 听一下
    
    whisper_print_timings:     load time =   208.93 ms
    whisper_print_timings:     fallbacks =   1 p /   0 h
    whisper_print_timings:      mel time =    24.02 ms
    whisper_print_timings:   sample time =   515.76 ms /   940 runs (     0.55 ms per run)
    whisper_print_timings:   encode time =  5609.63 ms /     2 runs (  2804.82 ms per run)
    whisper_print_timings:   decode time =     0.00 ms /     1 runs (     0.00 ms per run)
    whisper_print_timings:   batchd time =  3176.09 ms /   928 runs (     3.42 ms per run)
    whisper_print_timings:   prompt time =   231.88 ms /    96 runs (     2.42 ms per run)
    whisper_print_timings:    total time =  9850.00 ms
    

    medium

    
    [00:00:00.000 --> 00:00:08.000]  你跟我說兩句話
    [00:00:08.000 --> 00:00:11.000]  好你試
    [00:00:11.000 --> 00:00:13.000]  你想說啥
    [00:00:13.000 --> 00:00:15.000]  好
    [00:00:15.000 --> 00:00:18.000]  聽一下錄音效果怎麼樣
    [00:00:18.000 --> 00:00:21.000]  你再說兩句話
    [00:00:21.000 --> 00:00:23.000]  好
    [00:00:23.000 --> 00:00:33.000]  他怎麼了
    whisper_print_timings:     load time =   594.03 ms
    whisper_print_timings:     fallbacks =   0 p /   0 h
    whisper_print_timings:      mel time =    24.38 ms
    whisper_print_timings:   sample time =    80.54 ms /   253 runs (     0.32 ms per run)
    whisper_print_timings:   encode time = 18115.92 ms /     2 runs (  9057.96 ms per run)
    whisper_print_timings:   decode time =    62.90 ms /     3 runs (    20.97 ms per run)
    whisper_print_timings:   batchd time =  2026.68 ms /   243 runs (     8.34 ms per run)
    whisper_print_timings:   prompt time =   329.93 ms /    48 runs (     6.87 ms per run)
    whisper_print_timings:    total time = 21424.55 ms
    

    large-v3

    [00:00:00.000 --> 00:00:07.600]  现在来你跟我说两句话
    [00:00:07.600 --> 00:00:12.000]  我看你是你要说啥
    [00:00:12.000 --> 00:00:17.400]  好听一下录音效果怎么样
    [00:00:17.400 --> 00:00:20.400]  你再说两句话
    [00:00:20.400 --> 00:00:34.440]  他怎么
    
    whisper_print_timings:     load time =  1698.18 ms
    whisper_print_timings:     fallbacks =   0 p /   0 h
    whisper_print_timings:      mel time =    29.04 ms
    whisper_print_timings:   sample time =    78.72 ms /   232 runs (     0.34 ms per run)
    whisper_print_timings:   encode time = 34552.24 ms /     2 runs ( 17276.12 ms per run)
    whisper_print_timings:   decode time =    77.53 ms /     2 runs (    38.77 ms per run)
    whisper_print_timings:   batchd time =  3319.69 ms /   223 runs (    14.89 ms per run)
    whisper_print_timings:   prompt time =   547.42 ms /    43 runs (    12.73 ms per run)
    whisper_print_timings:    total time = 40676.03 ms
    

    ggml-large-v3-turbo

    
    [00:00:00.000 --> 00:00:07.440]   现在来 你跟我说两句话
    [00:00:07.440 --> 00:00:17.240]   你是 你要说啥 好 听一下录音效果怎么样
    [00:00:17.240 --> 00:00:22.800]   你再说两句话 好
    [00:00:22.800 --> 00:00:34.440]   他怎么
    
    whisper_print_timings:     load time =  1131.77 ms
    whisper_print_timings:     fallbacks =   0 p /   0 h
    whisper_print_timings:      mel time =    27.35 ms
    whisper_print_timings:   sample time =    68.56 ms /   208 runs (     0.33 ms per run)
    whisper_print_timings:   encode time = 31061.01 ms /     2 runs ( 15530.50 ms per run)
    whisper_print_timings:   decode time =     0.00 ms /     1 runs (     0.00 ms per run)
    whisper_print_timings:   batchd time =   527.58 ms /   201 runs (     2.62 ms per run)
    whisper_print_timings:   prompt time =   102.62 ms /    44 runs (     2.33 ms per run)
    whisper_print_timings:    total time = 32967.73 ms
    
    Published on 2025-04-21 at 22:53.
    Source: 求索空间
    Article link: https://blog.askerlab.com/whisper_test