Python SDK 2.0

Last Updated: Nov 18, 2019

Download and installation

Note:

  • The SDK only supports Python 3.4 and later.
  • Ensure that the Python packaging tool setuptools is installed. If it is not installed, run the following command to install it:

    pip install setuptools
  1. Download the Python SDK.
  2. Install the Python SDK. Run the following commands from the SDK directory.

    # Create an egg file.
    python setup.py bdist_egg
    # Install the egg file.
    python setup.py install

Note: The pip and python commands above must be the Python 3 versions. On some systems they are named pip3 and python3.
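
Before installing, it can help to confirm that the interpreter you are about to use meets the requirements above. The following is a minimal sketch that uses only the standard library. The function name check_environment is illustrative and is not part of the SDK.

```python
import importlib.util
import sys

MIN_VERSION = (3, 4)  # minimum Python version stated in this document


def check_environment(version_info=sys.version_info):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if tuple(version_info[:2]) < MIN_VERSION:
        problems.append('Python %d.%d or later is required' % MIN_VERSION)
    # find_spec returns None when the package cannot be found.
    if importlib.util.find_spec('setuptools') is None:
        problems.append('setuptools is not installed; run: pip install setuptools')
    return problems


# Print any problems for the interpreter that runs this script.
for problem in check_environment():
    print(problem)
```

Run the script with the same python command that you plan to use for setup.py, so that the check applies to the right interpreter.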

Key objects

  1. NlsClient: the speech processing client, which is equivalent to a factory for all speech processing classes. You can globally create an NlsClient instance. This object is thread-safe.
  2. SpeechRecognizer: the short sentence recognition object. You can use this object to set request parameters, send a request, and send audio data. This object is not thread-safe.
    • start: the method used to connect the client to the server. The ping_interval parameter specifies the interval between pings that the client sends to the server. The ping_timeout parameter specifies how long the client waits for the pong message. Both parameters have default values. If you set them, ensure that the value of ping_interval is larger than the value of ping_timeout.
    • send: the method used to send audio data to the server.
    • stop: the method used to stop recognition and disconnect the client from the server.
    • close: the method used to close the network connection to the server.
  3. SpeechRecognizerCallback: the callback object. The SDK calls the methods of this object to report recognition results and errors.
    • on_started: the callback that is fired when the client is connected to the server.
    • on_result_changed: the callback that is fired when the client receives an intermediate result.
    • on_completed: the callback that is fired when the client receives the recognition completed message, which contains the final recognition result.
    • on_task_failed: the callback that is fired when the client receives an error message.
    • on_channel_closed: the callback that is fired when the network connection to the server is closed.
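
The constraint on the start parameters described above can be checked before the call is made. The following is a minimal sketch. The helper check_ping_settings is hypothetical and is not part of the SDK, and the values 8 and 5 are illustrative, not the SDK defaults.

```python
def check_ping_settings(ping_interval, ping_timeout):
    """Raise ValueError unless ping_interval is larger than ping_timeout."""
    if ping_interval <= ping_timeout:
        raise ValueError(
            'ping_interval (%s) must be larger than ping_timeout (%s)'
            % (ping_interval, ping_timeout))
    return ping_interval, ping_timeout


# Illustrative values only.
interval, timeout = check_ping_settings(8, 5)
# With a real recognizer, the values would then be passed on, for example:
# recognizer.start(ping_interval=interval, ping_timeout=timeout)
```

Validating the pair up front turns a misconfiguration into an immediate, readable error instead of a connection problem that appears later.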

Notes on SDK calls

  1. You can globally create an NlsClient object and reuse it if necessary.
  2. The SpeechRecognizer object cannot be reused. You must create a SpeechRecognizer object for each recognition task. For example, to process N audio files, you must create N SpeechRecognizer objects to complete N recognition tasks.
  3. Each SpeechRecognizerCallback object corresponds to exactly one SpeechRecognizer object. Do not share a SpeechRecognizerCallback object among multiple SpeechRecognizer objects. Otherwise, you may be unable to tell which recognition task a callback belongs to.
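
The three rules above amount to one shared client, plus a fresh recognizer and a fresh callback for every task. The following sketch shows that pattern. StubClient and StubCallback are hypothetical stand-ins so that the example runs without the SDK. In real code they would be ali_speech.NlsClient and your own SpeechRecognizerCallback subclass.

```python
class StubCallback:
    """Stand-in for a SpeechRecognizerCallback subclass (hypothetical)."""

    def __init__(self, name):
        self.name = name


class StubClient:
    """Stand-in for NlsClient so the sketch runs without the SDK (hypothetical)."""

    def create_recognizer(self, callback, url=None):
        # The real client returns a SpeechRecognizer; a dict is enough here.
        return {'callback': callback, 'url': url}


def build_tasks(client, audio_names, callback_factory):
    """Create a fresh callback and a fresh recognizer for every audio file."""
    tasks = []
    for name in audio_names:
        callback = callback_factory(name)                # one callback per task
        recognizer = client.create_recognizer(callback)  # one recognizer per task
        tasks.append((name, recognizer, callback))
    return tasks


client = StubClient()  # a single client is shared by all tasks
tasks = build_tasks(client, ['a.wav', 'b.wav', 'c.wav'], StubCallback)
print(len(tasks))
```

Passing the file name into each callback, as the sample code in this document also does, lets every printed result be traced back to its task.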

Sample code

Note 1: The demo uses an audio file with a sampling rate of 16,000 Hz. To obtain correct recognition results, set the model to the universal model for the project to which the appkey is bound in the Intelligent Speech Interaction console. In actual use, select the model according to the sampling rate of your audio. For more information about model settings, see Manage projects.

nls-sample-16k.wav
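
If you are not sure about the sampling rate of your own audio, you can read it from the WAV header before you choose a model. The following is a minimal sketch that uses the standard wave module. The function names wav_properties and is_16k are illustrative and are not part of the SDK.

```python
import wave


def wav_properties(path):
    """Return (sample_rate_hz, channels, sample_width_bytes) of a WAV file."""
    with wave.open(path, 'rb') as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()


def is_16k(path):
    """Return True if the WAV file at path has a 16,000 Hz sampling rate."""
    return wav_properties(path)[0] == 16000


# Example call, assuming the sample file is in the current directory:
# print(wav_properties('nls-sample-16k.wav'))
```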

Note 2: To access the short sentence recognition service, specify the service URL when you create the recognizer from the NlsClient object:

    recognizer = client.create_recognizer(callback, "ws://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1")

Example:

    # -*- coding: utf-8 -*-
    import os
    import time
    import threading
    import ali_speech
    from ali_speech.callbacks import SpeechRecognizerCallback
    from ali_speech.constant import ASRFormat
    from ali_speech.constant import ASRSampleRate


    class MyCallback(SpeechRecognizerCallback):
        """
        You can set constructor parameters as required.
        You can set the name parameter in this example to the name of the audio file to be recognized. This helps you distinguish tasks in multiple threads.
        """
        def __init__(self, name='default'):
            self._name = name

        def on_started(self, message):
            print('MyCallback.OnRecognitionStarted: %s' % message)

        def on_result_changed(self, message):
            print('MyCallback.OnRecognitionResultChanged: file: %s, task_id: %s, result: %s' % (
                self._name, message['header']['task_id'], message['payload']['result']))

        def on_completed(self, message):
            print('MyCallback.OnRecognitionCompleted: file: %s, task_id:%s, result:%s' % (
                self._name, message['header']['task_id'], message['payload']['result']))

        def on_task_failed(self, message):
            print('MyCallback.OnRecognitionTaskFailed: %s' % message)

        def on_channel_closed(self):
            print('MyCallback.OnRecognitionChannelClosed')


    def process(client, appkey, token):
        audio_name = 'nls-sample-16k.wav'
        callback = MyCallback(audio_name)
        recognizer = client.create_recognizer(callback, "ws://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1")
        recognizer.set_appkey(appkey)
        recognizer.set_token(token)
        recognizer.set_format(ASRFormat.PCM)
        recognizer.set_sample_rate(ASRSampleRate.SAMPLE_RATE_16K)
        recognizer.set_enable_intermediate_result(False)
        recognizer.set_enable_punctuation_prediction(True)
        recognizer.set_enable_inverse_text_normalization(True)
        try:
            ret = recognizer.start()
            if ret < 0:
                return ret
            print('sending audio...')
            with open(audio_name, 'rb') as f:
                audio = f.read(3200)
                while audio:
                    ret = recognizer.send(audio)
                    if ret < 0:
                        break
                    time.sleep(0.1)
                    audio = f.read(3200)
            recognizer.stop()
        except Exception as e:
            print(e)
        finally:
            recognizer.close()


    def process_multithread(client, appkey, token, number):
        thread_list = []
        for i in range(0, number):
            thread = threading.Thread(target=process, args=(client, appkey, token))
            thread_list.append(thread)
            thread.start()
        for thread in thread_list:
            thread.join()


    if __name__ == "__main__":
        client = ali_speech.NlsClient()
        # Specify the logging level: DEBUG, INFO, WARNING, or ERROR.
        client.set_log_level('INFO')
        appkey = 'Your appkey'
        token = 'Your token'
        process(client, appkey, token)
        # The code for multithreading.
        # process_multithread(client, appkey, token, 2)
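
The sample sends 3,200 bytes and then sleeps for 0.1 seconds. Assuming that the audio is 16-bit mono PCM at 16,000 Hz (an assumption; this document states only the sampling rate), 3,200 bytes is 100 ms of audio, so the loop sends data at roughly real-time speed. The following sketch shows that arithmetic, together with one possible way to read raw PCM frames from a WAV file without its header. The helper names chunk_bytes and iter_pcm_chunks are illustrative and are not part of the SDK.

```python
import wave


def chunk_bytes(sample_rate_hz, sample_width_bytes, channels, chunk_ms):
    """Number of bytes of raw PCM audio that cover chunk_ms milliseconds."""
    return sample_rate_hz * sample_width_bytes * channels * chunk_ms // 1000


def iter_pcm_chunks(path, chunk_ms=100):
    """Yield raw PCM chunks from a WAV file, skipping the WAV header."""
    with wave.open(path, 'rb') as wav:
        frames_per_chunk = wav.getframerate() * chunk_ms // 1000
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data


print(chunk_bytes(16000, 2, 1, 100))  # 16000 * 2 * 1 * 100 // 1000 = 3200
```

In the sample's send loop, iterating over iter_pcm_chunks(audio_name) could take the place of the f.read(3200) calls, with the same time.sleep(0.1) between chunks. Whether the header bytes affect recognition is not stated in this document, so treat this as an optional variation.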