HLStats Segmentation Fault

While using the HTK (version 3.4.1) tool HLStats to build a bigram language model from a corpus of transcriptions of speech files, I encountered a segmentation fault. The command I ran was:


$ HLStats -b train.bigram -o train.wlist train/*.lab
Segmentation fault

where train.bigram is the filename where the language model should be saved, train.wlist contains a list of all of the unique words in the corpus, and train/ is a directory that stores a .lab orthographic transcription file for each speech file.

By turning the HTK verbosity to the highest level (-T 8) I was easily able to identify the file that caused the segmentation fault:


$ HLStats -T 8 -b train.bigram -o train.wlist train/*.lab
. . .
Processing file train/XXX.lab
Segmentation fault

The file XXX.lab turned out to be empty: the transcriber had not transcribed anything because the speech in that recording was inaudible. This was apparently the cause of the segmentation fault in HLStats. One workaround, then, is to make sure every label file contains at least one transcribed word.
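To check a corpus for such files before running HLStats, something like the following Python sketch could be used. The function name find_empty_label_files is my own, and treating whitespace-only files as empty is an assumption; I have not checked whether HLStats also crashes on those.

```python
from pathlib import Path


def find_empty_label_files(label_dir):
    """Return the .lab files in label_dir that contain no non-whitespace text."""
    empty = []
    for path in sorted(Path(label_dir).glob("*.lab")):
        # Assumption: a file holding only whitespace counts as empty too.
        if not path.read_bytes().strip():
            empty.append(path)
    return empty


if __name__ == "__main__":
    # "train" is the label directory used in the commands above.
    for path in find_empty_label_files("train"):
        print(path)
```

Any file it lists can then be given a placeholder transcription or left out of the HLStats command line.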

A glance at the source code in HLStats.c, though, suggested that HLStats is meant to handle empty label files. Indeed, when I concatenate all of the label files into a Master Label File (MLF), HLStats runs fine and prints a warning for each empty label file:


$ HLStats -T 8 -b train.bigram -o train.wlist train.mlf
. . .
Processing file train/XXX.lab
WARNING [-1330] HLStats: Empty file XXX.lab in HLStats
. . .

So, it looks like HLStats fails on empty label files only when they are listed individually as command-line arguments, not when they are read from an MLF.
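For anyone wanting to try the MLF workaround, here is a minimal Python sketch of how the label files might be concatenated. It assumes the standard HTK MLF layout: a #!MLF!# header line, then for each file a quoted name pattern, its labels, and a line containing a single period. The function name write_mlf and the "*/" pattern prefix are my own choices for illustration, not necessarily how train.mlf above was built.

```python
from pathlib import Path


def write_mlf(label_dir, mlf_path):
    """Concatenate every .lab file in label_dir into one MLF; return the file count."""
    label_files = sorted(Path(label_dir).glob("*.lab"))
    with open(mlf_path, "w") as mlf:
        mlf.write("#!MLF!#\n")  # MLF header line
        for path in label_files:
            # Assumption: match the label file by name under any directory.
            mlf.write(f'"*/{path.name}"\n')
            for line in path.read_text().splitlines():
                if line.strip():  # skip blank lines inside a label file
                    mlf.write(line.strip() + "\n")
            mlf.write(".\n")  # a lone period ends each entry
    return len(label_files)


if __name__ == "__main__":
    write_mlf("train", "train.mlf")
```

An empty label file produces an entry with just the name line and the terminating period, which is the case where HLStats printed the warning above.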
