Skip to content

Conversation

Lana-chan
Copy link

hello! i hope this PR meets the code conventions guidelines. i've tried running the code auto-formatter but it tried to modify some 50 unrelated files instead. please let me know what i can change if this PR is unacceptable.

this PR adds the capability to grab samples from PDM devices (such as certain MEMS microphones) to the ESP32 port using the new ESP-IDF PDM driver. i have tested this on an M5 Cardputer which has a SPM1423 microphone built-in. i have tested that other I2S functions are unaffected (the cardputer also has a NS4168 speaker i have been using). i have also confirmed that other ports which use machine.I2S (rp2, mimxrt, stm32) have compilation unaffected, however i do not have those boards to test physically.

Copy link

codecov bot commented Mar 25, 2024

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.39%. Comparing base (51d05c4) to head (e2d9925).
⚠️ Report is 1740 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #14176   +/-   ##
=======================================
  Coverage   98.39%   98.39%           
=======================================
  Files         161      161           
  Lines       21200    21200           
=======================================
  Hits        20860    20860           
  Misses        340      340           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:    +0 +0.000% RPI_PICO
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS

Signed-off-by: maple "mavica" syrup <maple@maple.pet>
Signed-off-by: maple "mavica" syrup <maple@maple.pet>
Signed-off-by: maple "mavica" syrup <maple@maple.pet>
@miketeachman
Copy link
Contributor

miketeachman commented Mar 26, 2024

This is some great work to get PDM microphones working on the ESP32. There are a number of products that will benefit from PDM protocol support, for example, the T-Watch 2020 V3.

One suggestion is to revisit the MicroPython API for supporting the PDM protocol. At the moment, this PR has the PDM protocol being added to the machine.I2S class. But, I2S and PDM are very different audio protocols. Adding the PDM protocol to the existing I2S class maybe not the best from an architecture and user comprehension viewpoint. To me, it would be better to support the PDM protocol with a separate machine.PDM class. That would allow a PDM interface in Micropython to exactly match the PDM protocol. This would involve adding an extmod/machine_pdm.c implementation to define the machine.PDM class API. You can still use and modify the functions in machine_i2s.c. Some refactoring might be needed and I can imagine a header file or two is needed. This is extra work, but it will improve the overall experience in Micropython wrt to these two protocols.

CircuitPython takes this suggested approach. audiobusio.PDMIn() for PDM microphone support and audiobusio.I2SOut() for I2S amplifier support.

@Lana-chan
Copy link
Author

after reading the extmod/machine_i2s file and everywhere it's called, i fear making a whole new extmod might be a bit of a daunting task for me. i don't wholly disagree that PDM could have a better calling convention than i implemented, but if a PDM class is going to call on machine_i2s methods anyway, would that be the best course of action code-wise?

and if deemed really necessary, are there pointers on how i would even begin with that? i'm not familiar with the code structure, i was able to make my PR because the code changes weren't too massive.

i'd really like to see PDM support as i have a device i currently can't fully use without hacking up micropython, and i'd like to do it in a way that i don't have to distribute my own firmware to others who want to run my micropython code

@jonnor
Copy link
Contributor

jonnor commented Mar 27, 2024

This feature looks great @Lana-chan !
I am very interested to test it on one of my ESP32 devices which have PDM microphones. I have at least a T-Watch 2020 V3, M5Stack AtomS3U and TTGO T-Camera Mic.
Is the feature enabled by default (on this branch), if I build a generic ESP32 (or ESP32 S3) image for these boards? Or do I need to do anything in particular to enable it?

@Lana-chan
Copy link
Author

Yes, it should automatically be included in any device ESP-IDF defines SOC_I2S_SUPPORTS_PDM_RX for. S3 is one of them.

The I2S mic recording examples work with only the modification of the mode to PDM_RX and setting the correct pins.

@jonnor
Copy link
Contributor

jonnor commented Apr 28, 2024

I got around to testing this now. Using a M5Stack AtomS3U. Has SPM1432 microphone, should be same as the M5 Cardputer.
I used the nonblocking record example as a starting point. From https://github.com/miketeachman/micropython-i2s-examples/blob/master/examples/record_mic_to_sdcard_blocking.py
And updated the pinouts for this device (note: official M5Stack docs have the pinout wrapped, schematic is correct). But as there is no SD card, so I recorded to the internal FLASH.

However, with the standard parameters, my voice was definitely higher pitch than normally. Checking with a 440 Hz sinewave it was indeed showing up as 880 Hz in the recorded .wav file. To get normally pitched sounds, I had to pass rate as 2x the specified samplerate. That does not seem entirely correct? Perhaps the PDM clock rate setup is off by a factor 2?

audio_in = I2S(
 ...
    mode=I2S.PDM_RX,
    rate=SAMPLE_RATE_IN_HZ*2,
...
)

@jonnor
Copy link
Contributor

jonnor commented Apr 28, 2024

I am also observing some noise or discontinuities. They seem to be related to the size of buffers being read, However this might be due to the internal flash writing, or something else that is not PDM specific. The non-blocking seems a bit worse than the blocking. Unfortunately, I do not have any I2S audio inputs to compare with.

@Lana-chan
Copy link
Author

Lana-chan commented Apr 28, 2024

However, with the standard parameters, my voice was definitely higher pitch than normally. Checking with a 440 Hz sinewave it was indeed showing up as 880 Hz in the recorded .wav file. To get normally pitched sounds, I had to pass rate as 2x the specified samplerate. That does not seem entirely correct? Perhaps the PDM clock rate setup is off by a factor 2?

yes i noticed this too, and i'm not entirely sure my understanding of PDM is correct but i think this has to do with the downsampling necessary for the conversion to PCM: https://github.com/micropython/micropython/pull/14176/files#diff-e507a794c1ae2e2e69bf262f93220fd9d17d2f8f5245e778d53dc941a06011e9R433

with ESP-IDF by itself others have noticed that you have to use 2x the desired sample rate, so this is understood to me as a fact of the firmware itself: espressif/esp-idf#8660 (comment)

at any rate my implementation simply calls on ESP-IDF's implementation. personally i don't think it should be up to MicroPython to automatically double the rate behind the scenes

I am also observing some noise or discontinuities. They seem to be related to the size of buffers being read, However this might be due to the internal flash writing, or something else that is not PDM specific. The non-blocking seems a bit worse than the blocking.

i can confirm this as well having tried to use it non-blocking, however i don't think there's much that can be done if the program cycle simply cannot read the buffers fast enough for continuous sound capture. i'm not sure a non-PDM input device would function any differently but i also don't have others on hand to test

@jonnor
Copy link
Contributor

jonnor commented May 1, 2024

yes i noticed this too, and i'm not entirely sure my understanding of PDM is correct but i think this has to do with the downsampling necessary for the conversion to PCM: https://github.com/micropython/micropython/pull/14176/files#diff-e507a794c1ae2e2e69bf262f93220fd9d17d2f8f5245e778d53dc941a06011e9R433

I believe this variable pdm_cfg.clk_cfg.dn_sample_mode is highly relevant to the frequency-doubling issue. This sets the PDM oversampling factor (which determines the PDM clock given to the mic, with pdm_clock=pcm_samplerate*oversampling ). According to the esp-idf I2S documentation, can take two values, for either 64x or 128x oversampling. These are the two most common values (by far) for PDM microphones, and both are commonly used.

with ESP-IDF by itself others have noticed that you have to use 2x the desired sample rate, so this is understood to me as a fact of the firmware itself: espressif/esp-idf#8660 (comment)

In the linked issue, they say that this issue has been fixed in a commit in 4.4.x series. This fix should also be in esp-idf 5.x, which MicroPython is currently using?

@Lana-chan
Copy link
Author

Lana-chan commented May 1, 2024

i thought the issue mentioned being fixed was the amplitude issue, the frequency was an unrelated comment to the issue... this is also complicated in that I2S was completely overhauled and incompatible between 4.4 and 5.x, so i can't say for certain the bug isn't present in 5.x. this is the relevant commit: espressif/esp-idf@53a5d51

if you look at the latest 5.x branch, the HAL files for I2S are completely different in this regard

a stereo/mono disparity in recording would explain an exact 2x sample rate. the downsampling modes have multipliers different than 2x (enum names are "8" and "16", effect described in docs is 64 and 128 respectively), i don't believe they're what's causing the issue. the part you are mentioning in the documentation seems to indicate those are the multipliers for the sample rate to the WS clock (WS clock will be downsample factor times desired sample rate) and the downsampling brings it back to the sample rate. to micropython, this should be all invisible.

@jonnor
Copy link
Contributor

jonnor commented May 1, 2024

I am also observing some noise or discontinuities. They seem to be related to the size of buffers being read, However this might be due to the internal flash writing, or something else that is not PDM specific. The non-blocking seems a bit worse than the blocking.

i can confirm this as well having tried to use it non-blocking, however i don't think there's much that can be done if the program cycle simply cannot read the buffers fast enough for continuous sound capture. i'm not sure a non-PDM input device would function any differently but i also don't have others on hand to test

Thank you for confirming this issue. I have ordered some I2S microphones now, hopefully will be able to test that next week.
The application is able to process the buffers fast enough (on average) - to my knowledge. I do not see any indications the buffers overflowed (though I am not entirely sure how to check)? But maybe @miketeachman or others with more experience can give some input on how to debug.

@jonnor
Copy link
Contributor

jonnor commented May 1, 2024

I now modified the machine_i2s.c to support an oversample parameter of 64/128. It makes no difference to the frequency doubling issue. So esp-idf correctly does the appropriate combination of clock configuration and downsampling.
Furthermore, I note that when I have the correct samplerate configured (that is without doing *2), I only get half as many samples as I should... Which might be a hint to what is actually going wrong here?

I think it would be a good idea to expose the PDM oversampling. The PDM clock controls the power modes of the microphone (which also influences acoustical performance). Especially, it is critical to avoid a clock that is out-of-spec or borderline with the power states.
For example 48 kHz * 128x = 6.144 Mhz is out-of-spec, so should use 64x instead.
The SPM1423 has a transition into sleep at 1 Mhz. So 16 kHz * 64x = 1024 khz oversampling should not be used (instead 128x oversampling) to avoid risk of shutting down in case of slight clock drifts. I have spent months debugging such issues...
At 11.025 kHz * 64x that mic won't turn on, must use 128x. On another mic which support a low power state, one would prefer to use 64x for lower power consumption.

EDIT: for reference, here is the code I added for PDM oversample parameter, jonnor@4c1f034

@jonnor
Copy link
Contributor

jonnor commented May 1, 2024

Regarding the intermittent noises or apparent discontinuities. I did a new test now where I recorded into RAM, and then wrote the WAV file afterward. This gave a very clean signal, no apparent periodic noises.

# Based on https://github.com/miketeachman/micropython-i2s-examples/blob/master/examples/record_mic_to_sdcard_blocking.py
# The MIT License (MIT)
# Copyright (c) 2022 Mike Teachman
# https://opensource.org/licenses/MIT

import os
from machine import Pin
from machine import I2S

SCK_PIN = 39
#WS_PIN = 25
SD_PIN = 38
I2S_ID = 0
BUFFER_LENGTH_IN_BYTES = 40000

# ======= AUDIO CONFIGURATION =======
WAV_FILE = "mic.wav"
RECORD_TIME_IN_SECONDS = 4
WAV_SAMPLE_SIZE_IN_BITS = 16
FORMAT = I2S.MONO
SAMPLE_RATE_IN_HZ = 16000
# ======= AUDIO CONFIGURATION =======

format_to_channels = {I2S.MONO: 1, I2S.STEREO: 2}
NUM_CHANNELS = format_to_channels[FORMAT]
WAV_SAMPLE_SIZE_IN_BYTES = WAV_SAMPLE_SIZE_IN_BITS // 8
RECORDING_SIZE_IN_BYTES = (
    RECORD_TIME_IN_SECONDS * SAMPLE_RATE_IN_HZ * WAV_SAMPLE_SIZE_IN_BYTES * NUM_CHANNELS
)


def create_wav_header(sampleRate, bitsPerSample, num_channels, num_samples):
    datasize = num_samples * num_channels * bitsPerSample // 8
    o = bytes("RIFF", "ascii")  # (4byte) Marks file as RIFF
    o += (datasize + 36).to_bytes(
        4, "little"
    )  # (4byte) File size in bytes excluding this and RIFF marker
    o += bytes("WAVE", "ascii")  # (4byte) File type
    o += bytes("fmt ", "ascii")  # (4byte) Format Chunk Marker
    o += (16).to_bytes(4, "little")  # (4byte) Length of above format data
    o += (1).to_bytes(2, "little")  # (2byte) Format type (1 - PCM)
    o += (num_channels).to_bytes(2, "little")  # (2byte)
    o += (sampleRate).to_bytes(4, "little")  # (4byte)
    o += (sampleRate * num_channels * bitsPerSample // 8).to_bytes(4, "little")  # (4byte)
    o += (num_channels * bitsPerSample // 8).to_bytes(2, "little")  # (2byte)
    o += (bitsPerSample).to_bytes(2, "little")  # (2byte)
    o += bytes("data", "ascii")  # (4byte) Data Chunk Marker
    o += (datasize).to_bytes(4, "little")  # (4byte) Data size in bytes
    return o


audio_in = I2S(
    I2S_ID,
    sck=Pin(SCK_PIN),
    #ws=Pin(WS_PIN),
    sd=Pin(SD_PIN),
    mode=I2S.PDM_RX,
    bits=WAV_SAMPLE_SIZE_IN_BITS,
    format=FORMAT,
    rate=SAMPLE_RATE_IN_HZ*2,
    ibuf=BUFFER_LENGTH_IN_BYTES,
)

# allocate sample arrays
# memoryview used to reduce heap allocation in while loop
mic_samples = bytearray(10000)
mic_samples_mv = memoryview(mic_samples)


recording_buffer = bytearray(RECORDING_SIZE_IN_BYTES)
bytes_received = 0

print("Recording size: {} bytes".format(RECORDING_SIZE_IN_BYTES))
print("==========  START RECORDING ==========")
try:
    while bytes_received < RECORDING_SIZE_IN_BYTES:
        # read a block of samples from the I2S microphone
        bytes_read = audio_in.readinto(mic_samples_mv)
        if bytes_read > 0:
            bytes_to_write = min(
                bytes_read, RECORDING_SIZE_IN_BYTES - bytes_received
            )
            recording_buffer[bytes_received:bytes_received+bytes_to_write] = mic_samples_mv[0:bytes_to_write]
            print('FILL', bytes_received, bytes_to_write)
            bytes_received += bytes_read

    print("==========  DONE RECORDING ==========")
except (KeyboardInterrupt, Exception) as e:
    print("caught exception {} {}".format(type(e).__name__, e))


# Write to WAV
wav = open(WAV_FILE, "wb")

# create header for WAV file and write to SD card
wav_header = create_wav_header(
    SAMPLE_RATE_IN_HZ,
    WAV_SAMPLE_SIZE_IN_BITS,
    NUM_CHANNELS,
    SAMPLE_RATE_IN_HZ * RECORD_TIME_IN_SECONDS,
)
wav.write(wav_header)

# write samples to WAV file
wav.write(recording_buffer)

# cleanup
wav.close()
print("Wrote ", WAV_FILE)
audio_in.deinit()

@jonnor
Copy link
Contributor

jonnor commented May 1, 2024

I also did a test where non-blocking read was used, and then doing a 20 ms computation every 100 ms. The resulting audio was clean - great! Code: https://github.com/jonnor/embeddedml/blob/master/handson/micropython-esp32-pdm/record_nonblocking_memory.py

So the noise and discontinuities does not seem to be a software issue (at least not a general one around buffering). Instead, it might be that writing to FLASH causes electromagnetic interference. Or that there are actually tiny sounds being created in the PCB near the microphone when the writing to FLASH (capacitors especially can exhibit slight expansion when exposed to a variable electric field - so-called microphonic effect). I have experienced both previously, on other boards and microcontrollers - it is not as far-fetched as it may sound.

So from my point of view, this code is already at a very useful state, and I would love to see it in mainline MicroPython. Ideally. the oddity about double frequency would be resolved. I would be in favor of doing a *2 multiplication inside the machine_i2s for ESP32, so that it correct at least on the API surface.

@miketeachman
Copy link
Contributor

miketeachman commented May 1, 2024

It's great to see the demonstration of the PDM protocol working on the ESP32 port of MicroPython. This protocol would be an excellent addition to MicroPython.

At this time, the PR demonstrates a PDM protocol proof-of-concept, by making a quick-fix adaptation to the machine.I2S class. This is an excellent first step to showing that it works. However, I strongly believe that more effort is needed before the PR is accepted into the mainline. To support the PDM protocol I recommend adding a separate machine.PDM class to MicroPython. This approach will give MicroPython users two easy-to-use and consistent APIs,machine.I2S and machine.PDM.

There is also the consideration of the other ports. machine.I2S is supported on 3 other ports: stm32, rp2, and mimxrt. Does the API proposed in this PR make sense for the PDM implementations offered by these ports?

Would the proposal to have separate machine classes add more work to the PR? Most definitely. The ESP32 I2S code will likely need to be refactored, to create a common base that is used by both machine.I2S and machine.PDM. This will involve a deep dive into the code to understand how it works and how to cleanly make a common base that supports both I2S and PDM.

I've found that getting a prototype to work is often only 10% of the overall effort. The other 90% involves getting into the software engineering to design a great interface and a solid implementation. MicroPython shows this principle. It can be a lot work, but it yields a better user experience and a code base that can be supported in the long term.

@Lana-chan
Copy link
Author

regarding the doubled sample rate, i found out i was doing something wrong in setting the mono slot config on the ESP-IDF mask, MicroPython interestingly expects you to record stereo no matter which mode you actually want, and in mono it discards half the stream internally before exposing it to the Python side. however, despite it being supposedly a valid mode in the ESP-IDF docs, i can't seem to get a I2S_SLOT_MODE_STEREO slot config to work, it simply triggers a ESP_ERR_INVALID_ARG. i'll try to troubleshoot this further on my own fork at another time, doubling the sample rate in MicroPython is not the correct approach.

regarding exposing the PDM downsampling mode, and for that matter, Mike's comment above, this PR is simply to expose the ESP32-only (and for that matter, only in select ESP32 boards) PDM to PCM hardware. I am not implementing PDM to PCM. if that is not the proper implementation, then we will simply have hardware in the ESP32 board that cannot be used for MicroPython. i don't have the capacity to make the "proper" interface for MicroPython at this time and i hope that someone better than me can step up to finally implement PDM in MicroPython one day.

@jonnor
Copy link
Contributor

jonnor commented May 6, 2024

@Lana-chan those are good findings. If you are able to debug a bit more and figure out a working approach, then I should be able to help out in refactoring into a PDM module. At least for ESP32, which I am currently using a lot. And I can do a verification of API design wrt STM32 DFSDM (which I have worked with before) and RP2 (where someone has made a merge request earlier, #8016).

@DDDanny
Copy link

DDDanny commented Oct 11, 2024

I spend over 16 hours to get something for PDM_RX working by (miss-)usage of 2MHz SPI (ESP32 based - M5 Stack Core 2) - but my mic only returns identical raw samples with magic "192" (1100 0000) - nevertheless loud or in an isolated box ... Therefore I'm willing to work here for this PR but I want to check that the design approach and demand is still valid:
I spent roughly 4h today to understand the mpy repo which seems "simple":
Each one of the extmods defines the protocol functions (with one include to implement the port for exactly this one protocol and delegate to the one underlying driver).

  1. @jonnor / @Lana-chan:
    A. I understood from esp32 port perspective and e.g. the SPM1423 specs: "Data Format: ½ Cycle PDM" could be related to your discussed 2*I2S clock discussion - nevertheless based on the POC I2S-PWM_RX I'm currently in a good mood that - with a clean seperation - also a temporary workaround of factor 2 would be ok within the esp32 port.
    B. And for the amplitude I have seen the same behaviour on e.g. STM devices with that mic on video-plattform: "I'm very close to the mic but it is very low". Should be solvable as audio tools are doing it, but is not a "PDM driver" issue.

  2. @miketeachman : I understood that the expected clean (=extensible&mainainable architecture) solution is to have a seperate extmod/machine_pwm.c together with a (first) ports/esp32/machine_pwm.c where as the board port should at least suffer the current used dependency (idf: version: ">=5.0.4" (4365edb) - ofc best/final with also new .h files
    I can imagine your fear of this PR is that in some month the "next guy" will appear and mix in e.g. the TDM protocol and then there is a dilemma to must include/build unnecessary code to a port-specific firmware because it is "all in one" //Beside of the "docu-hell" - correct?
    As the POC/protoype of @Lana-chan changed only some lines of code this would result in a lot of duplicate code and I understood this is meant by "deep dive&API": My question here is where to put that shared code, because what I have had seen in mpy-extmods so far that this will be the first time such a split would be done?!

In my current solution understanding (as you referenced to CPython "audiobus"-implementation):
A. The right approach could be to have machine_audio sharing all the (internal) reusable functions and doing handling the BLOCKING, NON_BLOCKING and ASYNCIO approach (and ofc also audio helpers like e.g. pitch, amplify etc. could be homed here...). It will have sub-packages like machine_audio_i2s and new machine_audio_pwm and e.g. machine_audio_tdm

or B. Did you have in mind some (internal) shared audio_helper that will cover all the cases (with dependency includes [if/ifdev]/ifndev) and there are still machine_i2s, machine_pdm, machine_tdm as "main modules" that are using this above shared module internally?

The Option A. is from my POV the pythonic approach but I really scare to be the first one with the upcoming challenges to have a running build for that oO.
Compared to Option B.: The shared lib will be included only with the necessary "code-fragments" based on what is supported by the port. The underlying/plain code will look identical to Option A. as both will end up in cases like If this protocol and this mode and this bit and this format then ....

Finally: If I start work implementing this split then the esp32 port implementation will suffer the idf protocol PWM specs (with current only"scope only: MONO & PWM_RX - because I would only be able to test that): So refering to 5.0.4 (https://docs.espressif.com/projects/esp-idf/en/v5.0.4/esp32/api-reference/peripherals/i2s.html)

@miketeachman
Copy link
Contributor

miketeachman commented Oct 21, 2024

I recommend first considering how PDM support will appear at the MicroPython API, and then look at the underlying file and code structure. Here are my thoughts on the MicroPython API:

  1. No changes to machine.I2S class
  2. add a new machine.PDM class to support PDM microphones. Model the semantics and arguments of machine.PDM on machine.I2S. Also, possibly take some ideas for the interface design from the PDM APIs in CircuitPython and Arduino.

Changes to the underlying code implementation will be difficult. The I2S code is difficult to understand. There is port agnostic code in extmod and port specific code in the port directories. The port agnostic code has many conditional macros that be confusing. And, there is a point in extmod/machine_i2s.c where all the port specific code gets included as shown below:

// The port provides implementations of the above in this file.
#include MICROPY_PY_MACHINE_I2S_INCLUDEFILE

All these macro includes make code comprehension rather difficult, even for me, who wrote most of the I2S code. So, don't feel bad if it takes some time to understand what is going on.

Here is an idea on file and code changes to get PDM support:

  1. add a new extmod/machine_pdm.c that contains the MicroPython bindings for the machine.PDM class and as little else as possible.
  2. refactor extmod/machine_i2s.c into two files: extmod/machine_audio_common.c and extmod/machine_i2s.c, where extmod/machine_audio_common.c contains the port agnostic code and machine_i2s.c contains the MicroPython I2S bindings and hopefully little else.
  3. in ports/esp32, take a look at splitting machine_i2s.c into machine_i2s.c and machine_pdm.c. Or, it may be more efficient (code size and memory) to just have one file for both I2S and PDM port implementations, perhaps called machine_audio.c?

It's somewhat difficult to speculate on what the code implementation might look like until you start refactoring.

But, my main recommendation goes back to the MicroPython interface design. Spend time to get it right as that is what users see, not the underlying code structure. Don't let ease of implementation lead you to a compromised API design.

@jonnor
Copy link
Contributor

jonnor commented Oct 23, 2024

On the API design. I think that one should follow machine.I2S as much as possible. However, the constructor needs some adaptation, because the pins/configuration is different for PDM.

class machine.I2S(id, sck, ws, sd, mck=None, mode, bits, format, rate, ibuf)
class machine.PDM(id, clock, data, oversample, bits, format, rate, ibuf) 

Where clock and data are the Pins used. And oversample is the oversampling factor. Which also determines the clock frequency, at rate (sample rate) x oversample. A typical value for oversampling is 64. All other parameters the same.

@jonnor
Copy link
Contributor

jonnor commented Oct 23, 2024

In my current solution understanding (as you referenced to CPython "audiobus"-implementation): A. The right approach could be to have machine_audio sharing all the (internal) reusable functions and doing handling the BLOCKING, NON_BLOCKING and ASYNCIO approach (and ofc also audio helpers like e.g. pitch, amplify etc. could be homed here...).

(emphasis mine).

I do not think that general audio processing code should be in machine - I think that should just give the neccesary hardware access to get the data out. For audio processing, there is a very large set of possible processing algorithms, with many levels of sophistication. It would be hard to provide everything that is relevant inside MicroPython project/build - and also many other users would not use it either - for them, it would just be a waste of FLASH.
Instead, I think that it is preferable to have audio processing functionality in separate modules - preferably installable at runtime. This is doable using native .mpy modules, as demonstrated by emlearn-micropython. Here is one example of audio processing https://github.com/emlearn/emlearn-micropython/tree/master/examples/soundlevel_iir

@DDDanny
Copy link

DDDanny commented Oct 24, 2024

Yes I starred some hours on the macro-mess, but it is "ok and understandable".
I'll adopt the changes from @Lana-chan, but

  • split extmod I2S into three files (audio_common, i2s, pdm) as @miketeachman mentioned and by usage of the good work that @Lana-chan already did
  • and I'll take a look if one esp32/port include file is feasible

and @jonnor - sorry I was unclear here, but with (internal) reusable functions I tried to address/target e.g. the ringbuffer technical "helpers" and not any highlevel audio.

@DDDanny
Copy link

DDDanny commented Dec 17, 2024

  1. Short status update for redesign: My first try failed hard with dynamic typing, I'm working on the second approach which will also split/delegate extmod i2c to audio_common args, calls and reusage.

  2. I have implemented the original PDM functionalliy in mp v1.25.0 from @Lana-chan , extended with the oversample param from @jonnor (64->8S, 128->16S, default MAX).My m5 stack core2 no recording. Finally I identified that my m5 stack core2 pdm mic is internally configured to the "right" slot, but mp mono config with espressif I2S_PDM_SLOT_BOTH will only transfer the left channel... (like the I2S function only RX left in mono, but TX to both channels). - suspiciously hard inverting the clock flag did not work.

  3. Therefore I added a channel-paramter to address format=MONO, channel=RIGHT (t.b.d. later if combined in format or as two params) and the PDM RX works also for MONO, RIGHT without the need to record STEREO by setting slot mask I2S_PDM_SLOT_RIGHT

dirty implementation: https://github.com/DDDanny/micropython/blob/c6abafe46330172a4c63d72176fe736678cbdada/ports/esp32/machine_i2s.c#L430-L471


Maybe now I found another hint for the magic doubling / speed up issue - but I can't explain the root cause so far:

  • use mpy STEREO and espressif internal slot_cfk SLOT_BOTH -> rate*1 (works as expected)
  • use mpy MONO and espressif slot_cfk SLOT_BOTH -> requires rate*2 (that was the original impl.)
  • use mpy MONO and espressif SLOT_LEFT/SLOT_RIGHT -> requires rate*4

I wasn't able to identify if the rate-issue is in "our" or "their" side - so far.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants