static Waveform.from_encoded_bytes_into_numpy(encoded_bytes, start_time_milliseconds=0, end_time_milliseconds=0, frame_rate_hz=0, num_channels=0, convert_to_mono=False, zero_pad_ending=False, resample_mode=0, decoding_backend=0, file_extension='', mime_type='')

Decodes audio stored as bytes, directly returning a NumPy array.

This method is just like from_encoded_bytes(), but it returns a NumPy array of shape (frames, channels) instead of a Waveform object.

See the documentation for from_encoded_bytes() for a complete list of raised exceptions.
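As a rough illustration of the call described above, the following sketch builds a small WAV file in memory and decodes its first half second. The helper names `make_sine_wav_bytes` and `decode_first_half_second` are hypothetical and exist only for this example. It assumes the `babycat` package is installed, so the import sits inside the function that needs it.

```python
import io
import math
import struct
import wave


def make_sine_wav_bytes(frame_rate_hz=8000, duration_seconds=1.0, frequency_hz=440.0):
    """Build a small mono 16-bit WAV file in memory to use as example input."""
    num_frames = int(frame_rate_hz * duration_seconds)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * frequency_hz * i / frame_rate_hz))
        for i in range(num_frames)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 16-bit samples
        wav_file.setframerate(frame_rate_hz)
        wav_file.writeframes(b"".join(struct.pack("<h", s) for s in samples))
    return buffer.getvalue()


def decode_first_half_second(encoded_bytes):
    """Decode the first 500 ms of the audio into a (frames, channels) array."""
    # Imported here so the helper above works without Babycat installed.
    from babycat import Waveform

    return Waveform.from_encoded_bytes_into_numpy(
        encoded_bytes,
        end_time_milliseconds=500,
        zero_pad_ending=True,
    )


if __name__ == "__main__":
    array = decode_first_half_second(make_sine_wav_bytes())
    print(array.shape)
```

All other arguments are left at their defaults, so the audio keeps its original frame rate and channel count.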

Parameters

  • encoded_bytes (bytes) – A bytes object containing an encoded audio file, such as an MP3 file.

  • start_time_milliseconds (int, optional) – We discard any audio before this millisecond offset. By default, this does nothing and the audio is decoded from the beginning. Negative offsets are invalid.

  • end_time_milliseconds (int, optional) – We discard any audio after this millisecond offset. By default, this does nothing and the audio is decoded all the way to the end. If start_time_milliseconds is specified, then end_time_milliseconds must be greater. The resulting

  • frame_rate_hz (int, optional) – A destination frame rate to resample the audio to. Do not specify this parameter if you wish Babycat to preserve the audio’s original frame rate. This does nothing if frame_rate_hz is equal to the audio’s original frame rate.

  • num_channels (int, optional) – Set this to a positive integer n to select the first n channels stored in the audio file. By default, Babycat will return all of the channels in the original audio. This will raise an exception if you specify a num_channels greater than the actual number of channels in the audio.

  • convert_to_mono (bool, optional) – Set to True to average all channels into a single monophonic (mono) channel. If num_channels = n is also specified, then only the first n channels will be averaged. Note that convert_to_mono cannot be set to True while also setting num_channels = 1.

  • zero_pad_ending (bool, optional) – If you set this to True, Babycat will zero-pad the ending of the decoded waveform to ensure that the output waveform’s duration is exactly end_time_milliseconds - start_time_milliseconds. By default, zero_pad_ending = False, in which case the output waveform will be shorter than end_time_milliseconds - start_time_milliseconds if the input audio is shorter than end_time_milliseconds.

  • resample_mode (int, optional) – If you set frame_rate_hz to resample the audio when decoding, you can also set resample_mode to pick which resampling backend to use. The babycat.resample_mode submodule contains the various available resampling algorithms compiled into Babycat. By default, Babycat resamples audio using libsamplerate at its highest-quality setting.

  • decoding_backend (int, optional) – Sets the audio decoding backend to use. Defaults to the Symphonia backend.
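The effect of num_channels, convert_to_mono, and zero_pad_ending on a (frames, channels) array can be sketched with plain NumPy. This only illustrates the behavior described in the list above and is not Babycat's implementation. The helper names are hypothetical, ValueError stands in for whatever exception Babycat actually raises, and the rounding used to turn milliseconds into a frame count is an assumption.

```python
import numpy as np


def select_first_channels(waveform, num_channels):
    """Keep only the first `num_channels` channels (like num_channels = n)."""
    if num_channels > waveform.shape[1]:
        # Stand-in error type; Babycat raises its own exception here.
        raise ValueError("num_channels exceeds the channels in the audio")
    return waveform[:, :num_channels]


def to_mono(waveform):
    """Average all channels into one channel (like convert_to_mono = True)."""
    return waveform.mean(axis=1, keepdims=True)


def zero_pad_ending(waveform, frame_rate_hz, start_ms, end_ms):
    """Append silent frames until the array spans end_ms - start_ms."""
    # Assumed rounding: truncate to a whole number of frames.
    target_frames = (end_ms - start_ms) * frame_rate_hz // 1000
    missing_frames = max(target_frames - waveform.shape[0], 0)
    # Pad only at the end of the frames axis; leave the channels axis alone.
    return np.pad(waveform, ((0, missing_frames), (0, 0)))


if __name__ == "__main__":
    stereo = np.array([[1.0, 3.0], [2.0, 4.0]], dtype=np.float32)
    print(to_mono(select_first_channels(stereo, 2)).shape)
    print(zero_pad_ending(stereo, 1000, 0, 5).shape)
```

Selecting channels before averaging mirrors the documented rule that, when both options are set, only the first n channels are averaged.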


Returns

A NumPy array of shape (frames, channels) containing the decoded audio waveform.

Return type