February 7, 2014
I recently did a fresh installation of Cygwin on a new laptop and tried to run the TeX Live pdflatex on a .tex file. However, I encountered the following error indicating that the multirow.sty package was not installed:
LaTeX Error: File `multirow.sty' not found.
To solve this, I ran Cygwin Setup and selected the texlive-collection-latexextra package from the Publishing category. After that, multirow.sty was installed, and pdflatex was able to find it successfully.
I determined that texlive-collection-latexextra was the package that provides multirow.sty by referring to this page, which gives a useful list of the contents of all of the TeX Live packages.
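Before running pdflatex on a new installation, it can help to check which of a document's packages TeX Live can already locate. This is a minimal sketch, assuming `kpsewhich` (which ships with TeX Live) is on your PATH; the function name `missing_sty` is hypothetical. If `kpsewhich` itself is missing, every file is reported as missing.

```shell
# missing_sty (hypothetical helper): print each .sty file that kpsewhich
# cannot locate. kpsewhich prints the file's path and exits 0 when found.
missing_sty() {
    for sty in "$@"; do
        if ! kpsewhich "$sty" >/dev/null 2>&1; then
            printf '%s\n' "$sty"
        fi
    done
}

# Usage:
#   missing_sty multirow.sty booktabs.sty
```

On Cygwin, `cygcheck -p multirow.sty` should also search the Cygwin package repository for matching file names, although it needs network access.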
January 6, 2011
I ran the HTK command HResults to calculate the performance of a speech recognition experiment, and received the following error:
ERROR [+6550] LoadHTKLabels: Junk at end of HTK transcription
FATAL ERROR - Terminating program HResults
After inspecting the reference MLF file, I discovered a blank line in one of the transcriptions. After deleting this line, the error disappeared, and HResults completed successfully.
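Blank lines like this are easy to miss by eye in a large MLF. The sketch below reports their line numbers and writes a cleaned copy. It assumes blank (or whitespace-only) lines carry no meaning anywhere in your MLF; the function and file names are only examples.

```shell
# Print the line numbers of blank or whitespace-only lines in an MLF.
report_blank_lines() {
    grep -n '^[[:space:]]*$' "$1" | cut -d: -f1
}

# Write the MLF without those lines to stdout; the original is left untouched.
strip_blank_lines() {
    sed '/^[[:space:]]*$/d' "$1"
}

# Usage:
#   report_blank_lines ref.mlf
#   strip_blank_lines ref.mlf > ref.clean.mlf
```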
July 1, 2010
I recently wanted to train some acoustic models using the Switchboard corpus of conversational telephone speech, and I wanted the model names to be compatible with models that I had trained on a different corpus using the CMU pronouncing dictionary. The phone sets of the Switchboard dictionary and the CMU dictionary are very similar, but there are three differences that need to be fixed before they are completely compatible.
First, the Switchboard pronouncing dictionary can be downloaded here (along with the transcriptions). The filename of the dictionary in this release is sw-ms98-dict.text. Then, the CMU pronouncing dictionary can be downloaded here. For my purposes, I am disregarding the lexical stress information provided by the CMU dictionary, so I have removed all of the ‘0’, ‘1’, and ‘2’ labels following the vowels.
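One way to remove the stress labels is to strip a trailing 0, 1, or 2 from every phone while leaving the word itself alone (some CMU entries, such as numbers, contain digits in the word field). This is a sketch, assuming the word is the first whitespace-separated field and that comment lines start with `;;;` as in the CMU distribution; comments are dropped and fields are re-joined with single spaces.

```shell
# strip_stress: remove lexical stress digits from the phones (fields 2..NF)
# of a CMU-style dictionary read from the given files or stdin.
strip_stress() {
    awk '
        /^;;;/ { next }                  # skip comment lines
        {
            for (i = 2; i <= NF; i++)    # phones start at field 2
                sub(/[012]$/, "", $i)    # drop a trailing stress digit
            $1 = $1                      # rebuild the line with single spaces
            print
        }
    ' "$@"
}

# Usage (file names are examples):
#   strip_stress cmudict.txt > cmudict.nostress.txt
```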
If you compare the phones in the two dictionary files, you will see that the following three phones are present in the Switchboard dictionary, but absent in the CMU dictionary: ax, el, and en. To convert the Switchboard dictionary transcriptions to match the CMU phone set, ax should be changed to AH, el should be changed to AH L, and en should be changed to AH N.
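The comparison itself can be automated by extracting the set of phones from each dictionary and diffing the sets with `comm`. This sketch assumes the phones are fields 2 onward, folds case so the lower-case Switchboard phones can be compared with the upper-case CMU phones, and skips lines starting with `#` or `;;;`. If the Switchboard file begins with a header line (the sed script in this post deletes line 1), you may want to drop it first so its tokens are not counted as phones.

```shell
# phone_set: list the distinct phones in a dictionary, lower-cased and sorted.
phone_set() {
    awk '/^(#|;;;)/ { next } { for (i = 2; i <= NF; i++) print tolower($i) }' "$1" |
        LC_ALL=C sort -u
}

# phones_only_in: phones that occur in dictionary $1 but not in dictionary $2.
phones_only_in() {
    _pa=$(mktemp) _pb=$(mktemp)
    phone_set "$1" > "$_pa"
    phone_set "$2" > "$_pb"
    LC_ALL=C comm -23 "$_pa" "$_pb"   # comm requires both inputs to be sorted
    rm -f "$_pa" "$_pb"
}

# Usage (the CMU file name is an example):
#   phones_only_in sw-ms98-dict.text cmudict.nostress.txt
```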
Here is a simple sed script that I wrote to apply these changes to the Switchboard dictionary file sw-ms98-dict.text (it also deletes the first line of the file, comment lines, and empty lines):
$ sed -e '1 d' \
      -e '/^#/ d' \
      -e '/^$/ d' \
      -e 's/ ax/ ah/g' \
      -e 's/ en/ ah n/g' \
      -e 's/ el/ ah l/g' \
      -e 's/ .*/\U&\E/' \
      sw-ms98-dict.text
Note that this script requires GNU sed, because the last expression uses the GNU-specific \U and \E escapes to convert the pronunciation (everything after the first space) to upper case.
June 23, 2010
I was attempting to compile HTK (version 3.4) on a 64-bit Linux system, and I received this error for HSLab:
/usr/bin/ld: skipping incompatible /usr/lib64/libX11.so when searching for -lX11
/usr/bin/ld: cannot find -lX11
collect2: ld returned 1 exit status
make: *** [HSLab] Error 1
I solved this by passing the following two arguments to the configure script:
$ ./configure --without-x --disable-hslab
After the new Makefiles were generated, make all worked and built all of the HTK tools except HSLab.
June 23, 2010
When I ran this HCopy command (using HTK 3.4):
$ HCopy -C config -S train.scp
I received the following warning message for each file:
WARNING [-6371] ValidCodeParms: Using linear spectrum with PLP in HCopy
The feature vector I was attempting to use was defined in the configuration file to be:
TARGETKIND = PLP_0_D_A_Z
The warning message disappeared when I added the following line to the configuration file:
May 20, 2010
Many ESPS commands allow the use of parameter files to set parameters used by the algorithms (instead of specifying them as command-line arguments). For example, the man page of get_f0 (for pitch tracking) lists the parameters min_f0 and max_f0 that will specify the minimum and maximum F0 values to track. The default values for these parameters are 50.0 and 550.0 Hertz. To modify these with speaker-specific values for more accurate pitch tracking, create a parameter file with the following contents:
float min_f0 = 75;
float max_f0 = 300;
The format of each line of the parameter file is:
dataType name = value;
The data type for each parameter should be available in the man page for the command. To run the command using these parameters, you can either simply name the parameter file params and put it in the directory where you run the command, or use the -P command line option to specify the name of the file. For example, the following command will use the parameter file f0params.txt to produce pitch estimates for the audio file filename.wav:
$ get_f0 -P f0params.txt filename.wav filename.f0
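If you track pitch for several speakers, a small helper can generate one parameter file per speaker in the `dataType name = value;` format described above. The function name `write_f0_params` is hypothetical; it only writes the two get_f0 parameters mentioned in this post, and it refuses a minimum that is not below the maximum.

```shell
# write_f0_params MIN MAX OUTFILE (hypothetical helper): write an ESPS
# parameter file that sets get_f0's min_f0 and max_f0.
write_f0_params() {
    min=$1 max=$2 out=$3
    if ! awk -v lo="$min" -v hi="$max" 'BEGIN { exit !(lo < hi) }'; then
        echo "min_f0 must be less than max_f0" >&2
        return 1
    fi
    {
        printf 'float min_f0 = %s;\n' "$min"
        printf 'float max_f0 = %s;\n' "$max"
    } > "$out"
}

# Usage (values and file names are examples):
#   write_f0_params 75 300 f0params.txt
#   get_f0 -P f0params.txt filename.wav filename.f0
```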
March 2, 2010
While using the HTK (version 3.4.1) tool HLStats to build a bigram language model from a corpus of transcriptions of speech files, I encountered a segmentation fault. The command I ran was:
$ HLStats -b train.bigram -o train.wlist train/*.lab
where train.bigram is the filename where the language model should be saved, train.wlist contains a list of all of the unique words in the corpus, and train/ is a directory that stores a .lab orthographic transcription file for each speech file.
By increasing the HTK trace level (-T 8), I was easily able to identify the file that caused the segmentation fault:
$ HLStats -T 8 -b train.bigram -o train.wlist train/*.lab
. . .
Processing file train/XXX.lab
I discovered that the file XXX.lab was empty: the transcriber had not transcribed anything because the speech in that file was inaudible. This appeared to be the cause of the segmentation fault in HLStats, so one solution is to make sure that every label file contains at least one transcribed word.
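Rather than waiting for HLStats to crash, the empty label files can be found up front. This sketch assumes the .lab files live under a single directory (`train/` in this post) and treats a file as blank if it contains no non-whitespace character, which covers both zero-length and whitespace-only files.

```shell
# find_blank_labels DIR: list .lab files under DIR that are empty or
# contain only whitespace. "-exec grep -q ..." is true when the file has a
# non-space character, so "!" keeps only the files without one.
find_blank_labels() {
    find "$1" -type f -name '*.lab' ! -exec grep -q '[^[:space:]]' {} \; -print | sort
}

# Usage:
#   find_blank_labels train
```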
After glancing at the source code in HLStats.c, though, it seemed that HLStats should be able to handle empty label files. Indeed, when I concatenated all of the label files into a Master Label File (MLF), HLStats ran without problems and printed a warning message for each empty label file:
$ HLStats -T 8 -b train.bigram -o train.wlist train.mlf
. . .
Processing file train/XXX.lab
WARNING [-1330] HLStats: Empty file XXX.lab in HLStats
. . .
So, it looks like HLStats isn’t able to process empty label files when they are listed individually as command line arguments.
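Concatenating the label files into an MLF can be done with a short loop. In an MLF, the file starts with `#!MLF!#`, and each entry is a quoted file-name pattern, the label contents, and a line containing a single period. This sketch uses `"*/name.lab"` patterns, which is an assumption on my part (full paths also work); `make_mlf` is a hypothetical name.

```shell
# make_mlf FILE... : write an HTK Master Label File to stdout.
make_mlf() {
    printf '#!MLF!#\n'
    for lab in "$@"; do
        printf '"*/%s"\n' "$(basename "$lab")"
        cat "$lab"
        # Add a newline if the file is non-empty and does not end with one
        if [ -s "$lab" ] && [ -n "$(tail -c 1 "$lab")" ]; then
            printf '\n'
        fi
        printf '.\n'
    done
}

# Usage:
#   make_mlf train/*.lab > train.mlf
```

An empty label file simply produces a pattern line followed immediately by the terminating period.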
March 1, 2010
A simple speech recognition task is to predict the most likely sequence of words using only the acoustic observations, without incorporating a language model. To do this with HTK, you first need to create a word loop grammar. Assuming the file wordList contains a list of the words to be recognized, the following command creates the word loop grammar in the file wordLoop:
$ HBuild wordList wordLoop
The word loop grammar is simply a network in which any word can follow any other word, and every word is equally likely.
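If you don't already have a wordList file, one option is to derive it from the pronouncing dictionary by taking the first field of each line and removing duplicates (words with alternate pronunciations appear more than once). This sketch assumes an HTK-style dictionary with the word in the first whitespace-separated field. If your dictionary contains silence or sentence-boundary entries, you may want to remove them or handle them separately.

```shell
# dict_to_wordlist DICT: print each distinct word in the dictionary, sorted.
dict_to_wordlist() {
    awk 'NF > 0 { print $1 }' "$1" | LC_ALL=C sort -u
}

# Usage:
#   dict_to_wordlist dict > wordList
#   HBuild wordList wordLoop
```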
To recognize a speech file using this word loop grammar, run the following command (assuming that macros and hmmdefs contain the acoustic models, dict contains the pronouncing dictionary, and monophones contains a list of the HMM names):
$ HVite -H macros -H hmmdefs -w wordLoop dict monophones testFile
This type of recognition is slow if there are more than a few words in wordList, and the accuracy is generally low. It is useful, however, for experimental purposes.
July 9, 2009
Ogg Vorbis is an open source audio compression format (unlike MP3, it is not proprietary!). I recently needed to play and manipulate .ogg sound files that I downloaded from LibriVox, an excellent repository of free audiobooks. The sound manipulation program SoX can encode and decode Ogg Vorbis files, but it requires some additional libraries to be installed first.
After compiling and installing SoX 14.3.0 on Ubuntu 9.04, I tried to convert an Ogg Vorbis file to WAV format, but received the following error:
$ sox filename.ogg filename.wav
sox FAIL formats: no handler for detected file type ‘vorbis’
Another sign that something was wrong was the following output from running ./configure before compiling SoX:
OPTIONAL FILE FORMATS
To fix this problem and enable the use of Ogg Vorbis files with SoX, I did the following:
$ sudo apt-get install vorbis-tools
In addition, I downloaded and installed libogg and libvorbis from Xiph.Org (in that order, since libvorbis depends on libogg). After that, I re-configured SoX and saw:
OPTIONAL FILE FORMATS
Then, after re-compiling and re-installing, SoX was able to process the Ogg Vorbis file.
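After re-installing, a quick way to confirm the result is to look for vorbis in the list of formats SoX reports. SoX 14.x prints its supported formats on an "AUDIO FILE FORMATS:" line of its help output; this sketch relies on that layout, which may differ in other versions, and `sox_supports` is a hypothetical name.

```shell
# sox_supports FORMAT: succeed if the installed sox lists FORMAT among its
# audio file formats (based on the "AUDIO FILE FORMATS:" line of sox -h).
sox_supports() {
    sox -h 2>/dev/null | grep '^AUDIO FILE FORMATS:' | grep -qw "$1"
}

# Usage:
#   sox_supports vorbis && echo "Ogg Vorbis support is available"
```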
July 9, 2009
SoX (short for Sound eXchange) is a very useful multi-purpose tool for manipulating sound files. It can convert between many different file formats and can also apply various effects to the signal. It can be downloaded here.
After downloading the source code (version 14.3.0) and successfully compiling it on Ubuntu 9.04, I received the following error when trying to run SoX:
sox: error while loading shared libraries: libsox.so.1: cannot open shared object file: No such file or directory
After a little poking around, I found that libsox.so.1 had been installed to /usr/local/lib, and that /usr/local/lib is included in /etc/ld.so.conf (the system file that lists the directories searched for shared libraries). However, the dynamic linker's cache had not yet been updated to include the newly installed library, so I needed to run the following command to rebuild it:
$ sudo ldconfig
After running that, SoX worked just fine.
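To check for this kind of problem before running a freshly installed program, `ldd` reports any shared library the dynamic linker cannot resolve as "not found". This sketch just extracts those names; run on sox before `sudo ldconfig`, it should list libsox.so.1. The function name is hypothetical.

```shell
# missing_libs PROGRAM: print shared libraries PROGRAM needs that the
# dynamic linker cannot find (ldd marks them "=> not found").
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/=> not found/ { print $1 }'
}

# Usage:
#   missing_libs "$(command -v sox)"
#   ldconfig -p | grep libsox    # is libsox in the linker cache?
```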