Voice to Text App for Android Tutorial

Voice-to-text services are all the rage these days. Just look at the voice user interface (VUI) systems many of us have in our homes, like Google Home and Alexa. Industry leaders like Google, Amazon, Apple, and Microsoft are all betting on voice by building products like Google Assistant, Siri, Alexa, OneNote dictation, and Cortana. None of these systems would be possible without an underlying technology that converts spoken audio into text.

If you’ve ever wanted to integrate speech to text into your applications, you might have thought that it would be difficult or complicated. However, using Android’s built-in speech recognition API (`android.speech.SpeechRecognizer`), it’s actually quite simple. Many apps out there use the speech recognition feature, such as Google Keep. In this article, we’ll learn how to build a simple speech to text app on Android using the speech recognizer library provided by Google. By the end of this short tutorial, you’ll have a working version of a voice to text dictation app!

Prerequisites

  • Android Studio IDE downloaded and configured on your PC or Mac.
  • An Android Smartphone or Tablet (Unfortunately, Voice Recognition does not work on an emulator).
  • Basic Android knowledge like how to create a layout and run a program.

Step 1.  Create a Simple UI

To get started, create a new project using the “Create New Project” wizard in Android Studio. I’ll name my project “Speech Recognizer”, but you can name it whatever you like.

Start a New Project in Android Studio

I’ll also choose to create an empty activity for this example.

Android Studio Select Blank Activity

 

Android Interface with Microphone Button

For the purpose of this simple speech to text tool, let’s create a new Activity with only two elements on the page: a TextView and a Button or ImageButton. We’ll use the button to turn listening on and off, while the TextView will display our converted speech text on the screen. In this example I’ve also wrapped the TextView in a ScrollView, just in case there is enough text to fill the page! The interface looks pretty simple, but will do just fine for what we need.

Currently, my activity_main.xml file looks like this:

<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <ScrollView
        android:layout_width="match_parent"
        android:layout_height="279dp">

        <TextView
            android:id="@+id/textView"
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:layout_margin="10dp"
            android:layout_marginTop="20dp"
            android:textSize="25sp" />
    </ScrollView>

    <ProgressBar
        android:layout_width="0dp"
        android:layout_height="0dp"
        android:id="@+id/progressbar"
        />

    <ImageButton
        android:id="@+id/recordButton"
        android:layout_width="100dp"
        android:layout_height="100dp"
        android:layout_alignParentBottom="true"
        android:layout_centerHorizontal="true"
        android:layout_marginBottom="112dp"
        android:background="@drawable/microphone_button_off" />

</RelativeLayout>

Step 2. Implement RecognitionListener in our Main Activity

Now that you have a UI with a TextView and a Button, it’s time to get into the meat of the code. To get our speech recognizer to work, we’ll need to implement RecognitionListener. This interface lets us use Google’s voice to text engine and add our own custom actions at different points in the voice recognition life cycle. To do this, change the class declaration in your MainActivity.java to:

public class MainActivity extends AppCompatActivity implements
        RecognitionListener {

 

As I said before, RecognitionListener declares the callback methods that you fill in with your own behavior, and all of them must be implemented. In our Main Activity, we will also want to add the following code (we’ll fill in the required functions later).




// We'll need this to handle the user's answer when we ask for permission to record audio
@Override
public void onRequestPermissionsResult(int requestCode, @NonNull String[] permissions, @NonNull int[] grantResults) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults);
}

@Override
public void onResume() {
    super.onResume();
}

@Override
protected void onPause() {
    super.onPause();
}

@Override
protected void onStop() {
    super.onStop();
}

// Executes when the user has started to speak.
@Override
public void onBeginningOfSpeech() {
    Log.i(LOG_TAG, "onBeginningOfSpeech");
}

// Executes when more sound has been received.
@Override
public void onBufferReceived(byte[] buffer) {
    Log.i(LOG_TAG, "onBufferReceived: " + buffer);
}

// Called after the user stops speaking.
@Override
public void onEndOfSpeech() {

}

// Called when any error has occurred.
// getErrorText() is not part of the Android SDK; it is a small helper you
// write yourself to turn the error code into a readable message.
@Override
public void onError(int errorCode) {
    String errorMessage = getErrorText(errorCode);
    Log.d(LOG_TAG, "FAILED " + errorMessage);
}

// Reserved for future events. RecognitionListener declares it, so it must be implemented.
@Override
public void onEvent(int eventType, Bundle params) {
    Log.i(LOG_TAG, "onEvent");
}

// Called when partial recognition results are available.
@Override
public void onPartialResults(Bundle arg0) {
    Log.i(LOG_TAG, "Partial Results");
}

// Called when the endpointer is ready for the user to start speaking.
@Override
public void onReadyForSpeech(Bundle arg0) {
    Log.i(LOG_TAG, "Ready For Speech");
}

// Called when recognition results are ready.
@Override
public void onResults(Bundle results) {
    Log.i(LOG_TAG, "Results");
}

// The sound level in the audio stream has changed.
@Override
public void onRmsChanged(float rmsdB) {
    Log.i(LOG_TAG, "RMS Changed: " + rmsdB);
    progressBar.setProgress((int) rmsdB);
}
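
The `onError()` callback above calls `getErrorText()`, which is not part of the Android SDK and is not defined anywhere in this tutorial. Here is a minimal sketch of what such a helper might look like. The class name `SpeechErrors` and the message strings are assumptions for illustration. The integer values mirror the `SpeechRecognizer.ERROR_*` constants so the snippet compiles as plain Java; in the app itself you would reference `SpeechRecognizer.ERROR_AUDIO` and friends directly.

```java
// Hypothetical helper: maps a speech recognition error code to a readable message.
final class SpeechErrors {
    // These values mirror android.speech.SpeechRecognizer.ERROR_* so the sketch
    // compiles without the Android SDK. Use the SDK constants in a real app.
    static final int ERROR_NETWORK_TIMEOUT = 1;
    static final int ERROR_NETWORK = 2;
    static final int ERROR_AUDIO = 3;
    static final int ERROR_SERVER = 4;
    static final int ERROR_CLIENT = 5;
    static final int ERROR_SPEECH_TIMEOUT = 6;
    static final int ERROR_NO_MATCH = 7;
    static final int ERROR_RECOGNIZER_BUSY = 8;
    static final int ERROR_INSUFFICIENT_PERMISSIONS = 9;

    private SpeechErrors() {}

    // Returns a short description for the given code, or a generic
    // message that includes the code when it is not recognized.
    static String getErrorText(int errorCode) {
        switch (errorCode) {
            case ERROR_NETWORK_TIMEOUT:          return "Network timeout";
            case ERROR_NETWORK:                  return "Network error";
            case ERROR_AUDIO:                    return "Audio recording error";
            case ERROR_SERVER:                   return "Error from server";
            case ERROR_CLIENT:                   return "Client side error";
            case ERROR_SPEECH_TIMEOUT:           return "No speech input";
            case ERROR_NO_MATCH:                 return "No match";
            case ERROR_RECOGNIZER_BUSY:          return "Recognition service busy";
            case ERROR_INSUFFICIENT_PERMISSIONS: return "Insufficient permissions";
            default:                             return "Unknown error (" + errorCode + ")";
        }
    }
}
```

Inside MainActivity you could either call `SpeechErrors.getErrorText(errorCode)` or paste the `switch` into a private `getErrorText()` method of the activity.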




Step 3. Requesting Permission

One important aspect of the speech recognizer is that it needs the microphone to capture any sound (obviously). Any time an app uses a device feature that can record the user or reach their personal information, such as the microphone, the camera, or files on storage, the user must grant the app permission first. The same goes here: we have to prompt the user for permission to access the microphone before the voice to text engine will work. Ideally, we’ll ask the system whether the user has already granted permission before we start listening, and only prompt them if they have not.

 
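    @Override
    public void onRequestPermissionsResult(int requestCode, @NonNull String[] permissions, @NonNull int[] grantResults) {
        super.onRequestPermissionsResult(requestCode, permissions, grantResults);
        switch (requestCode) {
            case REQUEST_RECORD_PERMISSION:
                if (grantResults.length > 0 && grantResults[0] == PackageManager.PERMISSION_GRANTED) {
                    // Permission granted: start listening for speech
                    speech.startListening(recognizerIntent);
                } else {
                    // Permission denied: tell the user and reset the button to its "off" state
                    Toast.makeText(MainActivity.this, "Permission Denied!", Toast.LENGTH_SHORT).show();
                    recordButtonStatus = false;
                    recordButton.setBackground(getDrawable(R.drawable.microphone_button_off));
                }
                break;
            default:
                break;
        }
    }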
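
The grant check in the callback above can be pulled out into a tiny helper, which makes it easy to reason about the edge cases. This is only a sketch: the class name `PermissionResults` is hypothetical, and the two constants mirror `PackageManager.PERMISSION_GRANTED` and `PackageManager.PERMISSION_DENIED` so that the snippet compiles without the Android SDK. Note that Android can deliver an empty `grantResults` array when the permission dialog is interrupted, which should be treated as a denial.

```java
// Hypothetical helper that isolates the "was the permission granted?" check.
final class PermissionResults {
    // Mirrors android.content.pm.PackageManager.PERMISSION_GRANTED / PERMISSION_DENIED.
    static final int PERMISSION_GRANTED = 0;
    static final int PERMISSION_DENIED = -1;

    private PermissionResults() {}

    // True only when a result was delivered and the first entry is "granted".
    // A null or empty array (interrupted request) counts as not granted.
    static boolean isGranted(int[] grantResults) {
        return grantResults != null
                && grantResults.length > 0
                && grantResults[0] == PERMISSION_GRANTED;
    }
}
```

In the activity, the `if` condition would then read `if (PermissionResults.isGranted(grantResults))`.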

In addition to this piece of code, you will also need to add the following lines to your `AndroidManifest.xml`, inside the `<manifest>` element:

    

<uses-permission android:name="android.permission.RECORD_AUDIO"/>
<uses-permission android:name="android.permission.INTERNET" />

Step 4.  Triggering the listener to begin

Now that we’ve got our permissions function started, it’s time to add a trigger when the user clicks our button.  In this section, we also need to declare our variables and set up the recognizer in the `onCreate()` function:

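// Fields to declare at the top of MainActivity.
// The LOG_TAG text and the request code value are arbitrary choices.
private static final String LOG_TAG = "MainActivity";
private static final int REQUEST_RECORD_PERMISSION = 100;
private TextView returnedText;
private ImageButton recordButton;
private ProgressBar progressBar;
private SpeechRecognizer speech;
private Intent recognizerIntent;
private boolean recordButtonStatus = false;

// Inside onCreate(), after setContentView(R.layout.activity_main):
returnedText = (TextView) findViewById(R.id.textView);
recordButton = (ImageButton) findViewById(R.id.recordButton);
recordButtonStatus = false;

progressBar = (ProgressBar) findViewById(R.id.progressbar);

speech = SpeechRecognizer.createSpeechRecognizer(this);
Log.i(LOG_TAG, "isRecognitionAvailable: " + SpeechRecognizer.isRecognitionAvailable(this));
speech.setRecognitionListener(this);
recognizerIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
recognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_PREFERENCE, "en");
recognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
recognizerIntent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 3);

recordButton.setOnClickListener(new View.OnClickListener() {
    @Override
    public void onClick(View v) {
        if (recordButtonStatus) {
            // Currently listening: stop and switch the button back to "off"
            recordButtonStatus = false;
            progressBar.setIndeterminate(false);
            progressBar.setVisibility(View.INVISIBLE);
            speech.stopListening();
            recordButton.setBackground(getDrawable(R.drawable.microphone_button_off));
        } else {
            // Not listening: ask for microphone permission.
            // Listening starts in onRequestPermissionsResult() once it is granted.
            ActivityCompat.requestPermissions(
                    MainActivity.this,
                    new String[]{Manifest.permission.RECORD_AUDIO},
                    REQUEST_RECORD_PERMISSION);
            recordButtonStatus = true;
            recordButton.setBackground(getDrawable(R.drawable.microphone_button_on));
        }
    }
});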
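
The click handler above always goes through `requestPermissions()`, which works because Android reports the result straight away when the permission has already been granted. If you would rather follow the "check first, then ask" approach from Step 3, one way to think about it is to separate the decision from the Android calls. The sketch below is a hypothetical, Android-free model of that decision; the class and enum names are my own. In the app, `permissionGranted` would come from `ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO) == PackageManager.PERMISSION_GRANTED`.

```java
// Hypothetical, Android-free model of what the record button should do when tapped.
final class RecordButtonLogic {
    enum Action { STOP_LISTENING, START_LISTENING, REQUEST_PERMISSION }

    private RecordButtonLogic() {}

    // currentlyListening: the value of recordButtonStatus before the tap.
    // permissionGranted:  whether RECORD_AUDIO has already been granted.
    static Action onClick(boolean currentlyListening, boolean permissionGranted) {
        if (currentlyListening) {
            return Action.STOP_LISTENING;
        }
        return permissionGranted ? Action.START_LISTENING : Action.REQUEST_PERMISSION;
    }
}
```

The activity's `onClick()` would then switch on the returned action: call `speech.stopListening()`, call `speech.startListening(recognizerIntent)`, or call `ActivityCompat.requestPermissions(...)` as shown earlier.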

Great! Now, when I build the app, start it, and tap the button, I should be prompted for permission to record audio.

Step 5. Add a function to handle the listener output

Now that we know how to trigger the listener to start listening, the speech recognizer does a lot on its own. However, we still need to tell the recognizer where to put the text once we’ve finished our audio input. The correct place for that is the `onResults()` function. In this case, we simply want to display the transcribed text in our `TextView`.

The speech recognizer actually returns an `ArrayList` of possible results for the speech, with the most relevant result in front. So, we’ll just take the result in position 0 and display that.

	
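    @Override
    public void onResults(Bundle results) {
        Log.i(LOG_TAG, "Results");
        ArrayList<String> matches = results
                .getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);

        // The list can be null or empty, so check before reading the top result
        if (matches != null && !matches.isEmpty()) {
            returnedText.setText(matches.get(0));
        }

        // Recognition is finished, so reset the button to its "off" state
        recordButtonStatus = false;
        recordButton.setBackground(getDrawable(R.drawable.microphone_button_off));
    }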
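
If you want to be a little more careful about which candidate you display, the results `Bundle` may also carry a float array of confidence values under `SpeechRecognizer.CONFIDENCE_SCORES` (it is optional and can be missing). Below is a rough sketch of a picker that uses those scores when they are present and falls back to position 0 otherwise. The class and method names are hypothetical, and the snippet is plain Java so it can be tried outside Android. In the app, `matches` would come from `results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)` and `scores` from `results.getFloatArray(SpeechRecognizer.CONFIDENCE_SCORES)`.

```java
import java.util.List;

// Hypothetical helper for choosing which recognition candidate to display.
final class RecognitionResults {
    private RecognitionResults() {}

    // Returns the candidate with the highest confidence score.
    // Falls back to the first candidate when scores are missing or do not
    // line up with the candidates, and to "" when there are no candidates.
    static String bestMatch(List<String> matches, float[] scores) {
        if (matches == null || matches.isEmpty()) {
            return "";
        }
        if (scores == null || scores.length != matches.size()) {
            return matches.get(0);
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            // Strictly greater, so ties keep the earlier (more relevant) candidate
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return matches.get(best);
    }
}
```

In `onResults()` you would then call `returnedText.setText(RecognitionResults.bestMatch(matches, scores))`.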

At this point, I should be able to run my simple speech recognizer and see my words printed on the screen. While this is a very simple example, there is a lot more you can do with this technology.

Conclusion

Now you have a basic voice to text app running on your smartphone or tablet! If you’re like me, you can probably think of some ways that this might be useful. Perhaps you can make a note-taking application for students. Perhaps someone could use this technology when interviewing. Better yet, maybe speech to text could be a feature in an app you are building that uses custom voice commands, similar to how Google Keep and OneNote use speech recognition technology.

Android Speech Recognizer Screen

If you’d like to see a basic note-taking app that I added to the Google Play store and built on top of the code here, please visit the Google Play listing, and feel free to leave suggestions on further ways to improve the app in the comments! If you liked what you learned here, you might also enjoy my article on speeding up your Android WebView with SQLite transactions, or my article on ideas for lucrative side gigs.

Source code for what we learned here can be found here.









 
 
 
 
