I downloaded the library and the demo project, and spent two hours understanding the flow of the demo and figuring out how to use it in a service for an Android app. Below I list the details I can still remember, in the hope that they will be helpful for anyone who wants to use the library.
I am assuming readers of this post are familiar with the basic concepts of the Android platform and have already set up the development environment on their computer. The Eclipse IDE is used in this project.
1. Go to http://cmusphinx.sourceforge.net/wiki/download/ and download the library from http://sourceforge.net/projects/cmusphinx/files/pocketsphinx/0.8/
2. Go to http://cmusphinx.sourceforge.net/wiki/tutorialandroid and download the demo project; follow the steps there to set up the development environment if it has not been set up yet.
3. Create a service that implements RecognitionListener. Remember to import the required packages in the Java code. Most of the code below is copied from the PocketSphinx demo project. Pay attention to the lines I have commented out: since this is a service, it must not contain UI-related code. I have kept some of those lines from the original demo for reference.
package com.me.android.test;

import static edu.cmu.pocketsphinx.SpeechRecognizerSetup.defaultSetup;

import java.io.File;
import java.io.IOException;
import java.util.HashMap;

import android.app.Service;
import android.content.Context;
import android.content.Intent;
import android.os.AsyncTask;
import android.os.IBinder;
import android.util.Log;

import edu.cmu.pocketsphinx.Assets;
import edu.cmu.pocketsphinx.Hypothesis;
import edu.cmu.pocketsphinx.RecognitionListener;
import edu.cmu.pocketsphinx.SpeechRecognizer;

/*
 * Based on the sample from PocketSphinxDemo.
 */
public class PocketSphinxVoiceRecognitionService extends Service implements RecognitionListener {

    public static final String TAG = "PocketSphinxVoiceRecognitionService";

    private static final String KWS_SEARCH = "wakeup";
    private static final String FORECAST_SEARCH = "forecast";
    private static final String DIGITS_SEARCH = "digits";
    private static final String MENU_SEARCH = "menu";
    private static final String KEYPHRASE = "oh mighty computer";

    private SpeechRecognizer recognizer;
    private HashMap<String, Integer> captions;
    public Context context;

    @Override
    public void onCreate() {
        super.onCreate();
        context = getApplicationContext();

        // Map each search name to its caption string resource.
        Log.i(TAG, "onCreate: setup search options");
        captions = new HashMap<String, Integer>();
        captions.put(KWS_SEARCH, R.string.kws_caption);
        captions.put(MENU_SEARCH, R.string.menu_caption);
        captions.put(DIGITS_SEARCH, R.string.digits_caption);
        captions.put(FORECAST_SEARCH, R.string.forecast_caption);
        //setContentView(R.layout.main);
        //((TextView) findViewById(R.id.caption_text)).setText("Preparing the recognizer");

        // Recognizer initialization is time-consuming and involves IO,
        // so we execute it in an async task.
        new AsyncTask<Void, Void, Exception>() {
            @Override
            protected Exception doInBackground(Void... params) {
                try {
                    Log.i(TAG, "AsyncTask: doInBackground: setup recognizer");
                    Assets assets = new Assets(context);
                    File assetDir = assets.syncAssets();
                    setupRecognizer(assetDir);
                } catch (IOException e) {
                    return e;
                }
                return null;
            }

            @Override
            protected void onPostExecute(Exception result) {
                if (result != null) {
                    //((TextView) findViewById(R.id.caption_text)).setText("Failed to init recognizer " + result);
                    Log.e(TAG, "onPostExecute: failed to init recognizer: " + result);
                } else {
                    Log.i(TAG, "AsyncTask: onPostExecute: switch to the digits search");
                    switchSearch(/*KWS_SEARCH*/DIGITS_SEARCH);
                }
            }
        }.execute();
    }

    private void switchSearch(String searchName) {
        recognizer.stop();
        recognizer.startListening(searchName);
        String caption = getResources().getString(captions.get(searchName));
        //((TextView) findViewById(R.id.caption_text)).setText(caption);
    }

    private void setupRecognizer(File assetsDir) {
        File modelsDir = new File(assetsDir, "models");
        recognizer = defaultSetup()
                .setAcousticModel(new File(modelsDir, "hmm/en-us-semi"))
                .setDictionary(new File(modelsDir, "dict/cmu07a.dic"))
                .setRawLogDir(assetsDir)
                .setKeywordThreshold(1e-20f)
                .getRecognizer();
        recognizer.addListener(this);

        // Create keyword-activation search.
        recognizer.addKeyphraseSearch(KWS_SEARCH, KEYPHRASE);

        // Create grammar-based searches.
        File menuGrammar = new File(modelsDir, "grammar/menu.gram");
        recognizer.addGrammarSearch(MENU_SEARCH, menuGrammar);
        File digitsGrammar = new File(modelsDir, "grammar/digits.gram");
        recognizer.addGrammarSearch(DIGITS_SEARCH, digitsGrammar);

        // Create language model search.
        File languageModel = new File(modelsDir, "lm/weather.dmp");
        recognizer.addNgramSearch(FORECAST_SEARCH, languageModel);
    }

    @Override
    public void onBeginningOfSpeech() {
    }

    @Override
    public void onEndOfSpeech() {
        if (DIGITS_SEARCH.equals(recognizer.getSearchName())
                || FORECAST_SEARCH.equals(recognizer.getSearchName())) {
            switchSearch(/*KWS_SEARCH*/DIGITS_SEARCH);
        }
    }

    @Override
    public void onPartialResult(Hypothesis hypothesis) {
        // Partial results can arrive with a null hypothesis; guard against it.
        if (hypothesis == null)
            return;
        String text = hypothesis.getHypstr();
        if (text.equals(KEYPHRASE))
            switchSearch(MENU_SEARCH);
        else if (text.equals(DIGITS_SEARCH))
            switchSearch(DIGITS_SEARCH);
        else if (text.equals(FORECAST_SEARCH))
            switchSearch(FORECAST_SEARCH);
        else {
            //((TextView) findViewById(R.id.result_text)).setText(text);
        }
    }

    @Override
    public void onResult(Hypothesis hypothesis) {
        //((TextView) findViewById(R.id.result_text)).setText("");
        if (hypothesis != null) {
            String text = hypothesis.getHypstr();
            //makeText(getApplicationContext(), text, Toast.LENGTH_SHORT).show();
            Log.i(TAG, "onResult: " + text);
        }
    }

    @Override
    public IBinder onBind(Intent intent) {
        // This is a started service; binding is not supported.
        return null;
    }
}
4. Add the service to the manifest file
<service android:name="com.me.android.test.PocketSphinxVoiceRecognitionService"
android:enabled="true"
android:exported="false" />
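The recognizer records audio from the microphone, so the manifest also needs the RECORD_AUDIO permission (placed outside the <application> element):

<uses-permission android:name="android.permission.RECORD_AUDIO" />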
5. Copy the following folders (the native libraries) from the demo project into the "libs" folder of the current project:
armeabi
armeabi-v7a
mips
x86
6. Copy the library pocketsphinx-android-0.8-nolib.jar into the "libs" folder of the current project.
7. Copy the "assets" folder from the demo project to the current project
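After copying, the recognizer code in step 3 expects roughly the following layout inside the project (paths reconstructed from setupRecognizer() and step 9 below, so treat this as a sketch rather than an exact listing):

assets/
    sync/
        models/
            dict/cmu07a.dic
            grammar/digits.gram
            grammar/menu.gram
            hmm/en-us-semi/    (acoustic model files)
            lm/weather.dmp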
8. Copy the values in strings.xml under the "res/values" folder of the demo project, and paste them into the strings.xml of the current project.
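The service code in step 3 looks up four caption strings by name, so at a minimum the copied strings.xml must contain entries with these names. The texts below are placeholders of my own; only the names matter:

<resources>
    <string name="kws_caption">To wake up, say oh mighty computer</string>
    <string name="menu_caption">To continue, say digits or forecast</string>
    <string name="digits_caption">Say a sequence of digits</string>
    <string name="forecast_caption">Say something about the weather</string>
</resources>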
9. Add your own words to be used in the current project
Open the file digits.gram under "assets/sync/models/grammar" and follow the existing format to add your own words.
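For reference, digits.gram is a JSGF grammar; a sketch of the format (the word list here is illustrative, not the demo's exact content) looks like this:

#JSGF V1.0;

grammar digits;

public <digits> = ( oh | zero | one | two | three | four | five | six | seven | eight | nine )+;

The recognizer cannot match words that are missing from the dictionary (dict/cmu07a.dic), so make sure every word you add is listed there as well.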
10. Copy the following files from the demo project to the current project
assets.xml
custom_rules.xml
build.xml
and adjust project-specific settings in them, such as the project name.
11. In other code, such as the main activity or an app widget provider, check whether this service is already running; if not, start it, as shown below.
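A minimal sketch of that check for the main activity (this assumes the Eclipse-era APIs this project targets, where getRunningServices was the common way to test for a running service):

// Additional imports needed: android.app.ActivityManager, android.content.Context, android.content.Intent
private boolean isRecognitionServiceRunning() {
    ActivityManager am = (ActivityManager) getSystemService(Context.ACTIVITY_SERVICE);
    // Walk the list of running services and look for ours.
    for (ActivityManager.RunningServiceInfo info : am.getRunningServices(Integer.MAX_VALUE)) {
        if (PocketSphinxVoiceRecognitionService.class.getName().equals(
                info.service.getClassName())) {
            return true;
        }
    }
    return false;
}

// For example, in onCreate() of the main activity:
if (!isRecognitionServiceRunning()) {
    startService(new Intent(this, PocketSphinxVoiceRecognitionService.class));
}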
At this point, the project should compile and build, and the APK can be installed and launched on a device.
12. The last but most important step to make the service work and take voice commands:
Project -> Properties -> Builders -> New
Configure the builders and add the asset list builder to the project. Refer to the parameters used by the "Asset List Builder" of the demo project. The asset list builder must be at the top of the builder list.
This builder generates the list of voice recognition assets so they can be synced to the device.
13. Build the project and install the APK on the Android device.
14. The app will now be able to take voice commands that use the words listed in the digits.gram file.