A simple speech recognition task is to predict the most likely sequence of words from the acoustic observations alone, without a language model. To do this with HTK, first create a word loop grammar. Assuming the file wordList contains the list of words that may be recognized, the following command writes the word loop grammar to the file wordLoop:
$ HBuild wordList wordLoop
The word loop grammar is a network in which any word can follow any other word, and each word occurs with equal probability.
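The wordList file is a plain list of words, one per line. If a pronouncing dictionary in HTK format is already available (the word comes first on each line, followed by its phones), one way to derive such a list is sketched below. The file names dict.sample and wordList.sample and the three dictionary entries are made up for illustration, and the sketch assumes the dictionary contains no blank lines.

```shell
# Hypothetical three-entry dictionary in HTK format: word first, then phones.
cat > dict.sample <<'EOF'
NO n ow
YES y eh s
YES y ae
EOF

# Keep the first whitespace-separated field of each entry and drop duplicates
# (a word with several pronunciations appears on more than one line).
awk '{print $1}' dict.sample | LC_ALL=C sort -u > wordList.sample

cat wordList.sample
```

The resulting file can then be passed to HBuild in place of wordList.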
To recognize a speech file using this word loop grammar, run the following command (assuming that macros and hmmdefs contain the acoustic models, dict contains the pronouncing dictionary, and monophones contains a list of the HMM names):
$ HVite -H macros -H hmmdefs -w wordLoop dict monophones testFile
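To recognize several files in one run, HVite can read the file list from a script file with -S and write all transcriptions to a single master label file with -i. The following is only a sketch: the names test.scp, recout.mlf, test1.mfc and test2.mfc are assumptions, and the recognizer is invoked only if HVite is found on the PATH.

```shell
# Hypothetical list of files to recognize, one path per line.
printf '%s\n' test1.mfc test2.mfc > test.scp

# Run the recognizer only if HTK is installed.
# -S reads the input files from test.scp, -i collects the output in recout.mlf.
if command -v HVite >/dev/null 2>&1; then
    HVite -H macros -H hmmdefs -S test.scp -i recout.mlf -w wordLoop dict monophones
fi
```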
This type of recognition is slow if there are more than a few words in wordList, and the accuracy is generally low. It is useful, however, for experimental purposes.