Learning Microsoft Cognitive Services - Second Edition: Leverage Machine Learning APIs to build smart applications by Leif Larsen


Author: Leif Larsen
Language: eng
Format: azw3, epub
Tags: COM016000 - COMPUTERS / Computer Vision and Pattern Recognition, COM004000 - COMPUTERS / Intelligence (AI) and Semantics, COM042000 - COMPUTERS / Natural Language Processing
Publisher: Packt Publishing
Published: 2017-10-23T04:00:00+00:00


InverseTextNormalizationResult: Displays phrases such as "one two three four" as "1234", so it is ideal for usages such as "go to second street".

MaskedInverseTextNormalizationResult: Inverse text normalization with a profanity mask. No capitalization or punctuation is applied.

For our use, we are just interested in the DisplayText. With a populated list of recognized phrases, we raise the status update event:

    SpeechToTextEventArgs args = new SpeechToTextEventArgs(
        SttStatus.Success,
        $"STT completed with status: {e.PhraseResponse.RecognitionStatus.ToString()}",
        phrasesToDisplay);

    RaiseSttStatusUpdated(args);
}
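The status-update pattern used above can be looked at in isolation. The following is a minimal sketch, not the book's actual definitions: the shape of SpeechToTextEventArgs, the SttStatus members other than Success, the OnSttStatusUpdated event name, and the ReportPhrases helper are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical status values; only Success appears in the text above.
public enum SttStatus { Success, Error }

// Assumed shape of the event arguments: a status, a message, and the phrases.
public class SpeechToTextEventArgs : EventArgs
{
    public SttStatus Status { get; }
    public string Message { get; }
    public IReadOnlyList<string> Results { get; }

    public SpeechToTextEventArgs(SttStatus status, string message, List<string> results = null)
    {
        Status = status;
        Message = message;
        Results = results ?? new List<string>();
    }
}

public class SpeechToTextSketch
{
    // Hypothetical event name; subscribers receive every status update.
    public event EventHandler<SpeechToTextEventArgs> OnSttStatusUpdated;

    // Null-conditional invoke: does nothing when there are no subscribers.
    protected void RaiseSttStatusUpdated(SpeechToTextEventArgs args)
    {
        OnSttStatusUpdated?.Invoke(this, args);
    }

    // Stand-in for the response handler: collects phrases and reports them.
    public void ReportPhrases(IEnumerable<string> displayTexts)
    {
        var phrasesToDisplay = new List<string>(displayTexts);
        RaiseSttStatusUpdated(new SpeechToTextEventArgs(
            SttStatus.Success, "STT completed", phrasesToDisplay));
    }
}
```

A consumer would subscribe to the event and read the phrases from the event arguments, so the class that talks to the service never needs to know about the UI.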

To use this class, we need a couple of public functions that start speech recognition:

public void StartMicToText()
{
    _micRecClient.StartMicAndRecognition();
    _isMicRecording = true;
}

The StartMicToText method calls the StartMicAndRecognition method on the _micRecClient object, which lets us convert audio spoken into the microphone. This function will be our main way to access this API. A second function handles audio files:

public void StartAudioFileToText(string audioFileName)
{
    using (FileStream fileStream = new FileStream(audioFileName, FileMode.Open, FileAccess.Read))
    {
        int bytesRead = 0;
        byte[] buffer = new byte[1024];

The second function takes the name of the audio file we want to convert. We open the file with read access and are ready to read it:

        try
        {
            do
            {
                bytesRead = fileStream.Read(buffer, 0, buffer.Length);
                _dataRecClient.SendAudio(buffer, bytesRead);
            } while (bytesRead > 0);
        }

As long as data is available, we read from the file into the buffer and pass it to the SendAudio method, which triggers a recognition operation in the service.

If any exceptions occur, we make sure to output the exception message to a debug window. Finally, we need to call the EndAudio method so that the service does not wait for any more data:

        catch (Exception ex)
        {
            Debug.WriteLine($"Exception caught: {ex.Message}");
        }

        finally
        {
            _dataRecClient.EndAudio();
        }
    }
}
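The read-and-send loop can be tried without the speech service. The sketch below mirrors the same do/while structure against a hypothetical IAudioSink interface; IAudioSink, CountingSink, and AudioStreamer.SendAll are illustrative names that are not part of the client library. Unlike the method above, this sketch lets exceptions propagate to the caller instead of logging them.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the data recognition client used in the text.
public interface IAudioSink
{
    void SendAudio(byte[] buffer, int count);
    void EndAudio();
}

// Test double that only counts what it receives.
public class CountingSink : IAudioSink
{
    public int TotalBytes { get; private set; }
    public int SendCalls { get; private set; }
    public bool Ended { get; private set; }

    public void SendAudio(byte[] buffer, int count)
    {
        TotalBytes += count;
        SendCalls++;
    }

    public void EndAudio()
    {
        Ended = true;
    }
}

public static class AudioStreamer
{
    // Reads the source in fixed-size chunks and forwards each chunk to the sink.
    public static void SendAll(Stream source, IAudioSink sink, int bufferSize = 1024)
    {
        byte[] buffer = new byte[bufferSize];
        int bytesRead;
        try
        {
            do
            {
                // Read may return fewer bytes than requested; 0 means end of stream.
                bytesRead = source.Read(buffer, 0, buffer.Length);
                // As in the method above, the final call carries zero bytes.
                sink.SendAudio(buffer, bytesRead);
            } while (bytesRead > 0);
        }
        finally
        {
            // Always signal the end of the audio, even if reading failed.
            sink.EndAudio();
        }
    }
}
```

Passing only bytesRead, rather than the full buffer length, matters on the last chunk, where the buffer is usually only partly filled.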

Before leaving this class, we need to dispose of our API clients. Add the following in the Dispose function:

if (_micRecClient != null)
{
    _micRecClient.EndMicAndRecognition();

    _micRecClient.OnMicrophoneStatus -= OnMicrophoneStatus;
    _micRecClient.OnPartialResponseReceived -= OnPartialResponseReceived;
    _micRecClient.OnResponseReceived -= OnResponseReceived;
    _micRecClient.OnConversationError -= OnConversationErrorReceived;

    _micRecClient.Dispose();
    _micRecClient = null;
}

if (_dataRecClient != null)
{
    _dataRecClient.OnIntent -= OnIntentReceived;
    _dataRecClient.OnPartialResponseReceived -= OnPartialResponseReceived;
    _dataRecClient.OnConversationError -= OnConversationErrorReceived;
    _dataRecClient.OnResponseReceived -= OnResponseReceived;

    _dataRecClient.Dispose();
    _dataRecClient = null;
}

We stop microphone recording, unsubscribe from all events, and dispose and clear the client objects.
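The unsubscribe, dispose, and clear sequence can be illustrated with a small self-contained sketch. FakeClient and ClientOwner are hypothetical types invented for this example; they only stand in for the recognition clients and the class that owns them.

```csharp
using System;

// Hypothetical client exposing one event; stands in for a recognition client.
public class FakeClient : IDisposable
{
    public event EventHandler OnResponseReceived;

    public bool IsDisposed { get; private set; }

    // True while at least one handler is attached to the event.
    public bool HasSubscribers => OnResponseReceived != null;

    public void Dispose()
    {
        IsDisposed = true;
    }
}

public class ClientOwner : IDisposable
{
    private FakeClient _client;

    public ClientOwner(FakeClient client)
    {
        _client = client;
        _client.OnResponseReceived += OnResponseReceived;
    }

    private void OnResponseReceived(object sender, EventArgs e)
    {
        // A real handler would process the response here.
    }

    public void Dispose()
    {
        // The null check makes a second Dispose call a harmless no-op.
        if (_client != null)
        {
            _client.OnResponseReceived -= OnResponseReceived;
            _client.Dispose();
            _client = null;
        }
    }
}
```

Unsubscribing before disposing removes the client's reference to the owner's handlers, and setting the field to null afterwards guards against using a disposed client.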

Make sure the application compiles before moving on. We will look at how to use this class a bit later.


