
Python SDK 2.0

Last Updated: Nov 25, 2019

Download and installation

Note:

  • The SDK only supports Python 3.4 and later.
  • Ensure that the Python package tool setuptools is installed. If it is not installed, run the following command to install it:

    pip install setuptools
  1. Download the Python SDK.
  2. Install the Python SDK. Run the following commands from the SDK directory.

    # Create an egg file.
    python setup.py bdist_egg
    # Install the egg file.
    python setup.py install

Note: The pip and python commands are Python 3 commands.
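
To confirm that the installation succeeded, you can try importing the package. This is only a quick sanity check and assumes that the SDK installs under the package name ali_speech, as used in the sample code below:

    # Prints nothing and exits with status 0 if the SDK can be imported.
    python -c "import ali_speech"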

Key objects

  1. NlsClient: the speech processing client, which acts as a factory for all speech processing classes. You can create an NlsClient instance globally. This object is thread-safe.
  2. SpeechTranscriber: the real-time speech recognition object. You can use this object to set request parameters, send a request, and send audio data. This object is not thread-safe. For a minimal sketch of how the three objects work together, see the example after this list.
    • start: the method used to connect the client to the server. The optional ping_interval parameter specifies the interval at which ping messages are sent to the server, and the ping_timeout parameter specifies the timeout period for receiving the corresponding pong message. Ensure that the value of ping_interval is larger than the value of ping_timeout.
    • send: the method used to send audio data to the server.
    • stop: the method used to stop recognition and disconnect the client from the server.
    • close: the method used to close the network connection to the server.
  3. SpeechTranscriberCallback: the object that holds the callback functions. Implement its methods to handle callback events for recognition results and errors.
    • on_started: the callback that is fired when the client is connected to the server.
    • on_result_changed: the callback that is fired when the client receives an intermediate result.
    • on_sentence_begin: the callback that is fired when the client receives a message indicating that a sentence begins.
    • on_sentence_end: the callback that is fired when the client receives a message indicating that a sentence ends.
    • on_completed: the callback that is fired when the client receives a message indicating that recognition is completed.
    • on_task_failed: the callback that is fired when the client receives an error message.
    • on_channel_closed: the callback that is fired when the client receives a message indicating that the network connection is closed.
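
The following sketch shows how the three objects fit together: an NlsClient produces a SpeechTranscriber bound to a callback, and the transcriber is driven through start, send, stop, and close. This is a minimal illustration only; the file name, appkey, and token are placeholders, and it assumes that SpeechTranscriberCallback provides default implementations for the events that are not overridden here. The full sample at the end of this topic overrides all of them.

    # Minimal lifecycle sketch (illustrative; values are placeholders).
    import time
    import ali_speech
    from ali_speech.callbacks import SpeechTranscriberCallback
    from ali_speech.constant import ASRFormat
    from ali_speech.constant import ASRSampleRate

    class MinimalCallback(SpeechTranscriberCallback):
        # Only two events are handled here; the full sample below overrides all of them.
        def on_completed(self, message):
            print('completed: %s' % message)
        def on_task_failed(self, message):
            print('failed: %s' % message['header']['status_text'])

    client = ali_speech.NlsClient()                              # thread-safe factory; create once
    transcriber = client.create_transcriber(MinimalCallback())   # one transcriber per recognition task
    transcriber.set_appkey('Your appkey')
    transcriber.set_token('Your token')
    transcriber.set_format(ASRFormat.PCM)
    transcriber.set_sample_rate(ASRSampleRate.SAMPLE_RATE_16K)

    if transcriber.start() >= 0:                                 # connect to the server
        with open('nls-sample-16k.wav', 'rb') as f:
            chunk = f.read(3200)
            while chunk:
                if transcriber.send(chunk) < 0:                  # stream audio in small chunks
                    break
                time.sleep(0.1)                                  # pace the stream roughly in real time
                chunk = f.read(3200)
        transcriber.stop()                                       # stop recognition and disconnect
    transcriber.close()                                          # release the network connection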

Notes on SDK calls

  1. You can create an NlsClient object globally and reuse it as needed (see the sketch after this list).
  2. A SpeechTranscriber object cannot be reused. You must create a new SpeechTranscriber object for each recognition task. For example, to process N audio files, you must create N SpeechTranscriber objects to complete N recognition tasks.
  3. Each SpeechTranscriberCallback object corresponds to one SpeechTranscriber object. Do not use one SpeechTranscriberCallback object for multiple SpeechTranscriber objects. Otherwise, you may not be able to distinguish recognition tasks.
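
As a sketch of these rules, the snippet below reuses a single NlsClient for several sequential recognition tasks, while a fresh SpeechTranscriber and SpeechTranscriberCallback are created inside process() for each task. The module name transcribe_demo is hypothetical; it stands for a file containing the process() function from the sample code below.

    # Hypothetical setup: the sample code below is saved as transcribe_demo.py.
    import ali_speech
    from transcribe_demo import process

    client = ali_speech.NlsClient()                    # created once, shared by all tasks
    client.set_log_level('INFO')
    for _ in range(3):                                 # three tasks: three SpeechTranscriber objects,
        process(client, 'Your appkey', 'Your token')   # each with its own SpeechTranscriberCallback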

Sample code

Note: The demo uses an audio file with a sampling rate of 16,000 Hz. To obtain correct recognition results, set the model to the universal model for the project to which the appkey is bound in the Intelligent Speech Interaction console. In actual use, select the model according to the audio sampling rate. For more information about model settings, see Manage projects.

Sample audio file: nls-sample-16k.wav

Example:

    # -*- coding: utf-8 -*-
    import os
    import time
    import threading
    import ali_speech
    from ali_speech.callbacks import SpeechTranscriberCallback
    from ali_speech.constant import ASRFormat
    from ali_speech.constant import ASRSampleRate

    class MyCallback(SpeechTranscriberCallback):
        """
        You can set constructor parameters as required.
        You can set the name parameter in this example to the name of the audio file
        to be recognized. This helps you distinguish tasks in multiple threads.
        """
        def __init__(self, name='default'):
            self._name = name

        def on_started(self, message):
            print('MyCallback.OnRecognitionStarted: %s' % message)

        def on_result_changed(self, message):
            print('MyCallback.OnRecognitionResultChanged: file: %s, task_id: %s, result: %s' % (
                self._name, message['header']['task_id'], message['payload']['result']))

        def on_sentence_begin(self, message):
            print('MyCallback.on_sentence_begin: file: %s, task_id: %s, sentence_id: %s, time: %s' % (
                self._name, message['header']['task_id'], message['payload']['index'],
                message['payload']['time']))

        def on_sentence_end(self, message):
            print('MyCallback.on_sentence_end: file: %s, task_id: %s, sentence_id: %s, time: %s, result: %s' % (
                self._name,
                message['header']['task_id'], message['payload']['index'],
                message['payload']['time'], message['payload']['result']))

        def on_completed(self, message):
            print('MyCallback.OnRecognitionCompleted: %s' % message)

        def on_task_failed(self, message):
            print('MyCallback.OnRecognitionTaskFailed-task_id:%s, status_text:%s' % (
                message['header']['task_id'], message['header']['status_text']))

        def on_channel_closed(self):
            print('MyCallback.OnRecognitionChannelClosed')

    def process(client, appkey, token):
        audio_name = 'nls-sample-16k.wav'
        callback = MyCallback(audio_name)
        transcriber = client.create_transcriber(callback)
        transcriber.set_appkey(appkey)
        transcriber.set_token(token)
        transcriber.set_format(ASRFormat.PCM)
        transcriber.set_sample_rate(ASRSampleRate.SAMPLE_RATE_16K)
        transcriber.set_enable_intermediate_result(False)
        transcriber.set_enable_punctuation_prediction(True)
        transcriber.set_enable_inverse_text_normalization(True)
        try:
            ret = transcriber.start()
            if ret < 0:
                return ret
            print('sending audio...')
            with open(audio_name, 'rb') as f:
                audio = f.read(3200)
                while audio:
                    ret = transcriber.send(audio)
                    if ret < 0:
                        break
                    time.sleep(0.1)
                    audio = f.read(3200)
            transcriber.stop()
        except Exception as e:
            print(e)
        finally:
            transcriber.close()

    def process_multithread(client, appkey, token, number):
        thread_list = []
        for i in range(0, number):
            thread = threading.Thread(target=process, args=(client, appkey, token))
            thread_list.append(thread)
            thread.start()
        for thread in thread_list:
            thread.join()

    if __name__ == "__main__":
        client = ali_speech.NlsClient()
        # Specify the logging level: DEBUG, INFO, WARNING, or ERROR.
        client.set_log_level('INFO')
        appkey = 'Your appkey'
        token = 'Your token'
        process(client, appkey, token)
        # The code for multithreading:
        # process_multithread(client, appkey, token, 2)