Facebook Releases the wav2letter Toolkit for End-to-End Automatic Speech Recognition | Leiphone

Facebook, speech recognition

2018/01/04 09:44

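Leiphone AI Technology Review reports that Facebook AI Research (FAIR) recently released the wav2letter toolkit, a simple and efficient end-to-end automatic speech recognition (ASR) system that implements the architectures proposed in two research papers. For readers who want to start using it for speech recognition right away, Facebook provides a model pre-trained on the LibriSpeech dataset.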

The system requirements and installation steps, compiled by Leiphone AI Technology Review, are as follows:

Requirements:

OS: macOS or Linux

Torch: installation steps are given below

Training on CPU: Intel MKL

Training on GPU: NVIDIA CUDA Toolkit (cuDNN v5.1 for CUDA 8.0)

Reading audio files: libsndfile

Standard speech features: FFTW

Installation:

MKL

If you plan to train on CPU, installing Intel MKL is strongly recommended.

Add the following lines to your .bashrc file:

# We assume Torch will be installed in $HOME/usr.
# Change according to your needs.
export PATH=$HOME/usr/bin:$PATH

# This is to detect MKL during compilation
# but also to make sure it is found at runtime.
INTEL_DIR=/opt/intel/lib/intel64
MKL_DIR=/opt/intel/mkl/lib/intel64
MKL_INC_DIR=/opt/intel/mkl/include

if [ ! -d "$INTEL_DIR" ]; then
    echo "$ warning: INTEL_DIR out of date"
fi
if [ ! -d "$MKL_DIR" ]; then
    echo "$ warning: MKL_DIR out of date"
fi
if [ ! -d "$MKL_INC_DIR" ]; then
    echo "$ warning: MKL_INC_DIR out of date"
fi

# Make sure MKL can be found by Torch.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$INTEL_DIR:$MKL_DIR
export CMAKE_LIBRARY_PATH=$LD_LIBRARY_PATH
export CMAKE_INCLUDE_PATH=$CMAKE_INCLUDE_PATH:$MKL_INC_DIR
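
The checks above only print a warning; a missing directory is still appended to the library path. A minimal sketch of a stricter variant follows, assuming a POSIX-compatible shell. The helper name append_if_dir is hypothetical and is not part of Torch or wav2letter.

```shell
# Hypothetical helper (not part of Torch or wav2letter):
# print "$1:$2" if $2 is an existing directory, otherwise print $1 unchanged.
append_if_dir() {
    _current=$1
    _dir=$2
    if [ -d "$_dir" ]; then
        if [ -n "$_current" ]; then
            printf '%s:%s\n' "$_current" "$_dir"
        else
            printf '%s\n' "$_dir"
        fi
    else
        echo "warning: $_dir not found, skipping" >&2
        printf '%s\n' "$_current"
    fi
}

# Usage in .bashrc, with the same MKL paths as above:
# LD_LIBRARY_PATH=$(append_if_dir "$LD_LIBRARY_PATH" /opt/intel/lib/intel64)
# LD_LIBRARY_PATH=$(append_if_dir "$LD_LIBRARY_PATH" /opt/intel/mkl/lib/intel64)
# export LD_LIBRARY_PATH
```

Skipping a missing directory keeps stale entries out of the search path, and it avoids the leading colon that results when the variable starts out empty.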

LuaJIT and LuaRocks

The following commands install LuaJIT and LuaRocks under $HOME/usr. For a system-wide installation, remove -DCMAKE_INSTALL_PREFIX=$HOME/usr from the cmake command.

git clone https://github.com/torch/luajit-rocks.git
cd luajit-rocks
mkdir build; cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/usr -DWITH_LUAJIT21=OFF
make -j 4
make install
cd ../..

From here on, we assume luarocks and luajit are on your $PATH. If they are not, and you installed them under $HOME/usr, run ~/usr/bin/luarocks and ~/usr/bin/luajit instead.
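
The fallback described above can be scripted. The sketch below assumes the $HOME/usr prefix used throughout this guide; find_tool is a hypothetical helper name, not something shipped with LuaRocks.

```shell
# Hypothetical helper: print the path of a tool, preferring $PATH and
# falling back to the $HOME/usr/bin prefix used in this guide.
find_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        command -v "$1"
    elif [ -x "$HOME/usr/bin/$1" ]; then
        printf '%s\n' "$HOME/usr/bin/$1"
    else
        echo "error: $1 not found in PATH or $HOME/usr/bin" >&2
        return 1
    fi
}

# Usage:
# LUAROCKS=$(find_tool luarocks) && "$LUAROCKS" install torch
```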

If you intend to use the wav2letter decoder, you need to install KenLM.

KenLM depends on Boost, so install that first:

# make sure boost is installed (with system/thread/test modules)
# actual command might vary depending on your system
sudo apt-get install libboost-dev libboost-system-dev libboost-thread-dev libboost-test-dev

Once Boost is installed, you can install KenLM:

wget https://kheafield.com/code/kenlm.tar.gz
tar xfvz kenlm.tar.gz
cd kenlm
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/usr -DCMAKE_POSITION_INDEPENDENT_CODE=ON
make -j 4
make install
cp -a lib/* ~/usr/lib # libs are not installed by default :(
cd ../..

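OpenMPI and TorchMPI

If you plan to use multiple CPUs/GPUs (or multiple machines), you need to install OpenMPI and TorchMPI.

Disclaimer: we strongly encourage you to recompile OpenMPI. The OpenMPI binaries shipped with standard distributions are compiled with widely varying flags, and certain flags are crucial for compiling and running TorchMPI successfully.

Install OpenMPI first: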

wget https://www.open-mpi.org/software/ompi/v2.1/downloads/openmpi-2.1.2.tar.bz2
tar xfj openmpi-2.1.2.tar.bz2
cd openmpi-2.1.2; mkdir build; cd build
../configure --prefix=$HOME/usr --enable-mpi-cxx --enable-shared --with-slurm --enable-mpi-thread-multiple --enable-mpi-ext=affinity,cuda --with-cuda=/public/apps/cuda/9.0
make -j 20 all
make install

Note: openmpi-3.0.0.tar.bz2 also works, but you must drop the --enable-mpi-thread-multiple flag.
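
The version-dependent flag can be captured in a small helper. This is only a sketch: ompi_configure_flags is a hypothetical name, treating every 3.x release like 3.0.0 is an assumption, and the CUDA path is copied from the command above and will likely differ on your system.

```shell
# Hypothetical helper: print the configure flags from this guide for a
# given OpenMPI version. Assumption: every 3.x release behaves like 3.0.0
# and must not receive --enable-mpi-thread-multiple.
ompi_configure_flags() {
    _version=$1
    _flags="--prefix=$HOME/usr --enable-mpi-cxx --enable-shared --with-slurm"
    case "$_version" in
        3.*) ;;  # flag dropped, per the note above
        *)   _flags="$_flags --enable-mpi-thread-multiple" ;;
    esac
    _flags="$_flags --enable-mpi-ext=affinity,cuda --with-cuda=/public/apps/cuda/9.0"
    printf '%s\n' "$_flags"
}

# Usage, run from the build directory (word splitting of the output is intended):
# ../configure $(ompi_configure_flags 3.0.0)
```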

Now you can install TorchMPI:

MPI_CXX_COMPILER=$HOME/usr/bin/mpicxx ~/usr/bin/luarocks install torchmpi

Torch and other Torch packages

luarocks install torch
luarocks install cudnn # for GPU support
luarocks install cunn # for GPU support

wav2letter packages

git clone https://github.com/facebookresearch/wav2letter.git
cd wav2letter
cd gtn && luarocks make rocks/gtn-scm-1.rockspec && cd ..
cd speech && luarocks make rocks/speech-scm-1.rockspec && cd ..
cd torchnet-optim && luarocks make rocks/torchnet-optim-scm-1.rockspec && cd ..
cd wav2letter && luarocks make rocks/wav2letter-scm-1.rockspec && cd ..
# assuming here you got KenLM in $HOME/kenlm
# and only if you plan to use the decoder:
cd beamer && KENLM_INC=$HOME/kenlm luarocks make rocks/beamer-scm-1.rockspec && cd ..

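Training a wav2letter model

Data preprocessing

The data directory contains scripts for preprocessing different datasets. For now, only the scripts for LibriSpeech and TIMIT are provided.

Here is an example of preprocessing the LibriSpeech ASR dataset: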

wget http://www.openslr.org/resources/12/dev-clean.tar.gz
tar xfvz dev-clean.tar.gz
# repeat for train-clean-100, train-clean-360, train-other-500, dev-other, test-clean, test-other
luajit ~/wav2letter/data/librispeech/create.lua ~/librispeech ~/librispeech-proc
luajit ~/wav2letter/data/utils/create-sz.lua librispeech-proc/train-clean-100 librispeech-proc/train-clean-360 librispeech-proc/train-other-500 librispeech-proc/dev-clean librispeech-proc/dev-other librispeech-proc/test-clean librispeech-proc/test-other
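
The "repeat for" comment above can be expanded into a loop. The sketch assumes every subset lives under the same openslr.org directory as dev-clean, which the first URL suggests but the article does not state. Both function names are hypothetical.

```shell
# Hypothetical helper: print one download URL per LibriSpeech subset.
# Assumption: all subsets share the base URL used for dev-clean above.
librispeech_urls() {
    _base=http://www.openslr.org/resources/12
    for _subset in dev-clean dev-other test-clean test-other \
                   train-clean-100 train-clean-360 train-other-500; do
        printf '%s/%s.tar.gz\n' "$_base" "$_subset"
    done
}

# Download and unpack every subset (large: tens of gigabytes in total).
fetch_librispeech() {
    librispeech_urls | while read -r _url; do
        wget "$_url" && tar xfz "$(basename "$_url")"
    done
}
```

Defining the functions downloads nothing. Call fetch_librispeech from the directory where the archives should be unpacked.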

Training

mkdir experiments
luajit ~/wav2letter/train.lua --train -rundir ~/experiments -runname hello_librispeech -arch ~/wav2letter/arch/librispeech-glu-highdropout -lr 0.1 -lrcrit 0.0005 -gpu 1 -linseg 1 -linlr 0 -linlrcrit 0.005 -onorm target -nthread 6 -dictdir ~/librispeech-proc -datadir ~/librispeech-proc -train train-clean-100 train-clean-360 train-other-500 -valid dev-clean dev-other -test test-clean test-other -gpu 1 -sqnorm -mfsc -melfloor 1 -surround "|" -replabel 2 -progress -wnorm -normclamp 0.2 -momentum 0.9 -weightdecay 1e-05

Multi-GPU training

Using OpenMPI:

mpirun -n 2 --bind-to none ~/torchmpi/scripts/wrap.sh luajit ~/wav2letter/train.lua --train -mpi -gpu 1 ...

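Running the decoder (inference)

Running the decoder requires a small amount of preprocessing.

First, create a letter dictionary that includes the special repetition letters used in wav2letter: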

cat ~/librispeech-proc/letters.lst >> ~/librispeech-proc/letters-rep.lst && echo "1" >> ~/librispeech-proc/letters-rep.lst && echo "2" >> ~/librispeech-proc/letters-rep.lst
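
Because the command above appends with >>, running it twice duplicates every entry. A minimal sketch of a rerunnable variant follows. make_letters_rep is a hypothetical helper, and the number of repetition labels is a parameter (2 here, matching -replabel 2 in the training command).

```shell
# Hypothetical helper: write LETTERS followed by the repetition labels
# 1..N to OUT, overwriting OUT so the command can be rerun safely.
make_letters_rep() {
    _letters=$1
    _out=$2
    _n=$3
    cat "$_letters" > "$_out" || return 1
    _i=1
    while [ "$_i" -le "$_n" ]; do
        echo "$_i" >> "$_out"
        _i=$((_i + 1))
    done
}

# Usage, equivalent to the command above on a fresh directory:
# make_letters_rep ~/librispeech-proc/letters.lst ~/librispeech-proc/letters-rep.lst 2
```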

Next, obtain a language model and preprocess it. Here we use a pre-trained LibriSpeech language model; you can also train your own with KenLM. During preprocessing, the script may warn about words that are transcribed incorrectly. This is not a problem, because such words are rare.

wget http://www.openslr.org/resources/11/3-gram.pruned.3e-7.arpa.gz
luajit ~/wav2letter/data/utils/convert-arpa.lua ~/3-gram.pruned.3e-7.arpa.gz ~/3-gram.pruned.3e-7.arpa ~/dict.lst -preprocess ~/wav2letter/data/librispeech/preprocess.lua -r 2 -letters letters-rep.lst

Optional: use KenLM to convert the model to binary format, which loads faster.

build_binary 3-gram.pruned.3e-7.arpa 3-gram.pruned.3e-7.bin

Now run test.lua to generate the emissions. The command below also reports the letter error rate (LER) and word error rate (WER).

luajit ~/wav2letter/test.lua ~/experiments/hello_librispeech/001_model_dev-clean.bin -progress -show -test dev-clean -save

Once the emissions are saved, run the decoder to compute the WER:

luajit ~/wav2letter/decode.lua ~/experiments/hello_librispeech dev-clean -show -letters ~/librispeech-proc/letters-rep.lst -words ~/dict.lst -lm ~/3-gram.pruned.3e-7.arpa -lmweight 3.1639 -beamsize 25000 -beamscore 40 -nthread 10 -smearing max -show

Pre-trained models:

A fully trained LibriSpeech model is available:

wget https://s3.amazonaws.com/wav2letter/models/librispeech-glu-highdropout.bin

Note: this model was trained within Facebook's own infrastructure, so test.lua must be run with slightly different parameters:

luajit ~/wav2letter/test.lua ~/librispeech-glu-highdropout.bin -progress -show -test dev-clean -save -datadir ~/librispeech-proc/ -dictdir ~/librispeech-proc/ -gfsai

Join the wav2letter community

Facebook:

Google group:


Compiled and edited by Leiphone AI Technology Review.