Skip to content

DMA support for Pi Pico rp2 port #7641

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Conversation

nickovs
Copy link
Contributor

@nickovs nickovs commented Aug 12, 2021

This PR implements fairly complete support for the DMA controller in the rp2 series of microcontrollers. It provides a class for accessing the DMA channels through a high-level, pythonic interface, a class for setting and manipulating the DMA channel configurations and a utility function that makes it easier to implement complex scatter-gather type chained DMA configurations.

Creating an instance of the rp2.DMA class claims one of the processor's DMA channels. A sensible, per-channel default value for the configuration register can be fetched from the default_config attribute and the components of this configuration can be manipulated through its own attributes.

The read, write, count and config attributes provide read/write access to the respective registers of the DMA controller and while the setup() method allows any or all of these values to be set simultaneously and adds a trigger keyword argument to allow the setup to immediately be triggered. The read and write attributes (or keywords in setuo()) accept either actual addresses or any object that supports the buffer interface. The active attribute provides read/write control of the channel's activity, allowing the user to start and stop the channel and test if it is running.

Standard Micropython interrupt handlers are supported through the irq() method and the channel can be released either by deleting it and allowing it to be garbage-collected or with the explicit close() method.

Direct, unfettered access to the DMA controllers registers is provided through a proxy array() object returned by the registers attribute that maps directly onto the memory-mapped registers. This is necessary for more fine-grained control and is helpful for allowing chaining of DMA channels.

A utility function called buffer_address() is also added to the rp2 module that returns the address of the memory buffer underlying any object that supports the buffer interface. This is necessary for building DMA scatter-gather lists. Passing the registers value to this function returns the address of the registers themselves.

I've tried to strike a balance between simplicity and comprehensiveness but I'm sure that others will have suggestions!

@nickovs nickovs force-pushed the rp2_dma branch 4 times, most recently from 2ff81f0 to 85894b8 Compare August 12, 2021 03:29
@dpgeorge
Copy link
Member

Thanks for the contribution! It looks pretty comprehensive. Can you give some examples of how it would be used? Can it interface with SPI, for example?

I think the main point of discussion is the general approach taken to DMA. So far no port exposes DMA directly, rather it's used under-the-hood to implement other features, like fast SPI. As such:

  1. Reserve DMA for internal use only, so it's not exposed at the Python level. Background and async methods can use it, eg background SPI transfers, background memory copying.
  2. Provide a Python API (like proposed here) so the user has full control over DMA.

The difficulty with option (2) is making it compatible across ports (the usual problem!). DMA is a pretty common feature of MCUs to copy data from A to B, but the implementation differs a lot between different architectures (eg on nRF parts there is EasyDMA which is tied to each peripheral). OTOH I don't see an issue with providing a rp2-specific DMA class (like done here), so portability is not an issue if it's just rp2.DMA.

@dpgeorge
Copy link
Member

the setup() method allows any or all of these values to be set simultaneously

Is this method similar enough to existing config() methods (eg WLAN.config()) that it's worth calling it config?

A utility function called buffer_address() is also added to the rp2 module that returns the address of the memory buffer underlying any object that supports the buffer interface.

Such functionality is already provided by uctypes.addressof() (although without the addition argument to select read/write, which is a nice addition). Would be better to use that existing one.

@hoihu
Copy link
Contributor

hoihu commented Aug 12, 2021

might be useful for streaming frambuffer data to neopixels? DMA write needs then to be mapped to the PIOs shift register. The PIO would then use autopull & output the data in neopixel format, possibly even convert rgb565 colors to rgb888.

@hoihu
Copy link
Contributor

hoihu commented Aug 12, 2021

Great to see this PR! I also think DMA support for the pico is a useful addition.

I was about to write a similar module, although not as a c module but as a pure python modul that uses viper to access registers directly….

DMA irq handling was something that I wasn‘t (yet) able to do. Most other things were relatively easy to implement though.

Kind of raises the question if there is a chance for viper based (or other direct memory) pure-python drivers? Would maybe limit the amount of c code in the repo? I dont want to hijack this PR for this though, just curious….

@robert-hh
Copy link
Contributor

Th eDAM support matches the PIO well. For other devices it may be less useful. Actually I had made such a Python module already with some support for DMA: https://github.com/robert-hh/RP2040-Examples/tree/master/rp2_util

@nickovs
Copy link
Contributor Author

nickovs commented Aug 12, 2021

Thanks for the contribution! It looks pretty comprehensive. Can you give some examples of how it would be used? Can it interface with SPI, for example?

In the simplest use case, using DMA to do a fast memory copy, you just need:

a = bytearray(32*1024)
b = bytearray(32*1024)
d = rp2.DMA()
d.setup(read=a, write=b, count=len(a)//4, config=d.default_config, trigger=True)
# Wait for completion
while d.active:
    pass

To pump byte-wide data out to the TX FIFO of a PIO block, as and when the PIO needs it, you would do something like this:

FIFO_ADDRESS = 0x50200000 + pio_num * 0x100000 + 0x10 + sm_num * 4
DREQ = (pio_num << 3) + sm_num

a = bytearray(1024)
d = rp2.DMA()

c = d.config
c.size = 0
c.inc_write = False
c.treq_sel = DREQ

d.setup(read=a, write=b, count=len(a)//4, config=c, trigger=True)

I was planning to write up full documentation when the API was stable.

I think the main point of discussion is the general approach taken to DMA.
...
The difficulty with option (2) is making it compatible across ports (the usual problem!). DMA is a pretty common feature of MCUs to copy data from A to B, but the implementation differs a lot between different architectures (eg on nRF parts there is EasyDMA which is tied to each peripheral). OTOH I don't see an issue with providing a rp2-specific DMA class (like done here), so portability is not an issue if it's just rp2.DMA.

My feeling was that trying to come up with a cross-platform abstraction of DMA, even if it were possible, would end up very convoluted, take up a lot of code space and, importantly, use valuable CPU cycles which would make it hard to write efficient interrupt handlers. Given that it's so hardware specific and that probably the biggest use case is in conjunction with PIO, that putting it next to the PIO in the rp2 module and making the model closely follow the hardware was the best plan.

@nickovs
Copy link
Contributor Author

nickovs commented Aug 12, 2021

@dpgeorge

the setup() method allows any or all of these values to be set simultaneously

Is this method similar enough to existing config() methods (eg WLAN.config()) that it's worth calling it config?

Yes, it is, and it's worth it! Currently the DMA configuration attribute is called config but I will re-name it ctrl, since that's the name of the register that is used in the RP2040 datasheet.

A utility function called buffer_address() is also added to the rp2 module that returns the address of the memory buffer underlying any object that supports the buffer interface.

Such functionality is already provided by uctypes.addressof() (although without the addition argument to select read/write, which is a nice addition). Would be better to use that existing one.

Yes, you're probably right. uctypes.addressof() doesn't check if the object is nominally writable, but in fact there are cases where you want to be able to write into a memoryview, which are read-only through the buffer interface, so it's probably better to not even try and just use the old function.

On a related note, I was thinking it would be helpful to add a method to the PIO class to get an array proxy for its TX and RX FIFO registers. This would both allow for the extraction of the per-channel register address using addressof() and also offer a non-blocking way to get and put values. If you think it would be useful then let me know and I'll put together a separate PR for that.

@nickovs
Copy link
Contributor Author

nickovs commented Aug 12, 2021

@hoihu

DMA irq handling was something that I wasn‘t (yet) able to do. Most other things were relatively easy to implement though.

Yes, you can do pretty much everything you need with Viper, but you can't sensibly do the interrupt stuff.

Kind of raises the question if there is a chance for viper based (or other direct memory) pure-python drivers? Would maybe limit the amount of c code in the repo? I dont want to hijack this PR for this though, just curious….

Viper is still quite a bit slower than C, and a run-time loaded module would take up RAM rather than sitting in ROM. You could freeze it, but then it's as inflexible as the C that the whole of the rest of Micropython is written in. Also, as noted above, it's currently impractical to do the interrupt handlers in Viper code.

@dpgeorge
Copy link
Member

On a related note, I was thinking it would be helpful to add a method to the PIO class to get an array proxy for its TX and RX FIFO registers. This would both allow for the extraction of the per-channel register address using addressof() and also offer a non-blocking way to get and put values. If you think it would be useful then let me know and I'll put together a separate PR for that.

I do like array/memoryview proxies! But can the same be achieved by simply using machine.mem32[...] and have constants for the required registers? This is how stm32 works and it's very powerful, it gives the user almost full control over peripherals.

Yes, you can do pretty much everything you need with Viper, but you can't sensibly do the interrupt stuff.

As a side note: I've had good success in a few projects (on stm32 targets) relocating the IRQ table to RAM (and redirecting it with VTOR) and then overriding selected IRQ handlers with custom functions. Such functions can be dynamically loaded C code, or inline asm (but I don't think you can make viper functions work... might be nice to provide support for that).

@nickovs
Copy link
Contributor Author

nickovs commented Aug 12, 2021

@dpgeorge

I do like array/memoryview proxies! But can the same be achieved by simply using machine.mem32[...] and have constants for the required registers?

That could work too, although we would still need a change to the PIO class since at the moment if I pass an instance of PIO then the receiver can't extract which PIO block or state machine it represents. The other down side of just using constants is that you need to know the right way to combine the offsets according to which PIO block and state machine you are accessing. It seemed like if we had a method for getting the pointer from the PIO instance then if it all changed in some future RP4184 processor then code would still work.

@hoihu
Copy link
Contributor

hoihu commented Aug 12, 2021

Viper is still quite a bit slower than C

Even though it outputs native instructions? And since most likely you'd access the peripherals directly with viper, you would also bypass HAL calls...

You could freeze it, but then it's as inflexible as the C that the whole of the rest of Micropython is written in

Yes, but there are IMO two differences:

  1. the logic is written in Python and can be adapted by users without C knowledge. Otherwise, whenever a specific addition is needed, you'll need to dig through C files...
  2. you can override a frozen module with a normal py file with the same name with adaptions to sys.path. This allows patches or changes to bring in even if the original content is already frozen.

it's currently impractical to do the interrupt handlers in Viper code.

Yeah. Would be nice to have a generic way of passing a python method callback when a certain irq number is fired. Even if the callback is not written in viper... would probably work for most use cases.

@nickovs
Copy link
Contributor Author

nickovs commented Aug 12, 2021

Viper is still quite a bit slower than C

Even though it outputs native instructions? And since most likely you'd access the peripherals directly with viper, you would also bypass HAL calls...

What's still a lot slower in Viper is access to fields in your object. Accessing an element of a struct is hugely faster than looking up an object's attributes. At the very least your Viper handler will need to look up some context information.

Yes, but there are IMO two differences:

  1. the logic is written in Python and can be adapted by users without C knowledge. Otherwise, whenever a specific addition is needed, you'll need to dig through C files...

I'd counter this by pointing out that we are talking about implementing the low-level DMA driver here, not the user application that sits on top of it. Part of the point of the PR here is that it allows users to write application-level code completely in Python. The interrupt handler in my C code is about reacting the the shared CPU interrupt, checking and clearing the interrupt flags in device registers and working out which of the users' handlers need to be called. Whether you like it or not, this level of code requires digging through files!

I'd also say that if your argument about wanting to write your lowest-level hardware drivers in Python/Viper holds for the DMA controller then why not insist that the UART and SPI and GPIO and Timer and all the other drivers be written in Python?

  1. you can override a frozen module with a normal py file with the same name with adaptions to sys.path. This allows patches or changes to bring in even if the original content is already frozen.

This is a more compelling argument, but the low-level handler is only 10 lines of C that then allows the user to write per-channel handlers in Python. This is something that users are not very likely to need to change.

it's currently impractical to do the interrupt handlers in Viper code.

Yeah. Would be nice to have a generic way of passing a python method callback when a certain irq number is fired. Even if the callback is not written in viper... would probably work for most use cases.

It might useful to be able to do that, for the bits of hardware for which there is not already a driver (or if you really decide that you want to rewrite all the drivers in Python!)

@hoihu
Copy link
Contributor

hoihu commented Aug 12, 2021

What's still a lot slower in Viper is access to fields in your object. Accessing an element of a struct is hugely faster than looking up an object's attributes. At the very least your Viper handler will need to look up some context information.

Ok, I see your point.

Whether you like it or not, this level of code requires digging through files!
why not insist that the UART and SPI and GPIO and Timer and all the other drivers be written in Python?

uh, don't get me wrong, I'm not against the PR as it stands. As I said, I think it is a much needed addition for the RPi port.

I was merely thinking if python-based peripheral drivers could make sense. And yes, possibly also for UARTs, Timers, ADCs etc. But this is certainly not the scope of this �PR, so please ignore my comments above. Sorry for the noise.

@nickovs
Copy link
Contributor Author

nickovs commented Aug 13, 2021

I have switched the implementation of the registers proxy from an array to a memoryview. The main advantage of this is that now you can take a slice of the registers and the address of the resulting slice is the address of the sliced registers (rather than the address of a copy of their values). This lets you write code like dma1.write = dma2.registers[14:16] when you want to use dma1 to send a gather list to dma2.

@tjvr
Copy link
Contributor

tjvr commented Aug 14, 2021

Very excited for this! I started trying to write a DMA class myself, for streaming data into PIO faster than is possible from Python, but I didn't get nearly this far.

The trickiest part I found was trying to come up with a nice API for setting up a DMA channel to write to a specific PIO StateMachine, as you touched on in the thread above.

I'm hoping to see this merged! 🙏

@nickovs
Copy link
Contributor Author

nickovs commented Aug 17, 2021

@dpgeorge Further to your first comment last week, I have submitted PR #7670 with documentation that includes a couple of examples as well as API details.

@nickovs
Copy link
Contributor Author

nickovs commented Aug 21, 2021

What's the chances of this, and #7650 (and #7670) getting merged? The tests are passing, it doesn't really touch any existing code so regression is unlikely and there seems to be interest in the functionality. @dpgeorge @jimmo

@nickovs
Copy link
Contributor Author

nickovs commented Dec 29, 2021

Any further opinions on this and and then other two associated PRs? It would be nice to stop needing custom builds for projects that use DMA.

@nickovs
Copy link
Contributor Author

nickovs commented Mar 20, 2022

@dpgeorge Any chance of getting this merged?

@dpgeorge
Copy link
Member

dpgeorge commented Apr 4, 2022

Sorry, will take a look at this next.

@dpgeorge
Copy link
Member

dpgeorge commented Apr 8, 2022

This addition is well thought out and has good use-cases, and is a very clean implementation that follows the API of existing classes (except perhaps for heavy use of attributes, but that's not a big issue).

But I feel that it might be overkill and possibly going in a slightly wrong direction. Maybe I'm being too critical... but if we distill this API down into the core functionality it provides we get:

  1. a convenient way to get/set the DMA config register (via DMAConfig class)
  2. a way to allocate an unused DMA channel (calling dma_claim_unused_channel())
  3. a way to set DMA read/write/count registers and start the DMA
  4. a way to set DMA IRQ handlers

(1) can already be done via appropriate mem32[...] calls, or using uctypes and its bitfield functionality. Similarly (3) can use mem32[...].

(2) and (4) provide new functionality.

The API provided here is quite low level, the user would still need to read the RP2040 datasheet to understand what values to set the config register to (via DMAConfig). If the API is low-level, does it really provide much more than just getting/setting registers via mem32[...]?

I don't really have any strong objection to this PR, just a feeling that there could be a better way to expose DMA functionality. Eg not expose DMA directly to the user but rather have methods on PIO to do DMA put/get to the state machine FIFO. (SPI and I2S already use DMA behind the scenes.)


My biggest question for this PR is: can it retain the same functionality but in half the code size? Can you take away things while still letting the user do everything?

It seems to me that DMAConfig is not really necessary and instead config values can just be a (32-bit) integer. That would probably halve the code size and not reduce functionality. The default config value can be provided as an int (DMA.default_ctrl is just an int), and maybe even a helper function included to compute a config value from its components, eg create_config(high_priority, size, inc_read, inc_write, ...) -> int. And maybe even the DMA attributes (read, write, count) are not necessary because they can already be set via DMA.config().

@dpgeorge
Copy link
Member

dpgeorge commented Apr 8, 2022

And maybe even the DMA attributes (read, write, count) are not necessary because they can already be set via DMA.config().

Also note that other APIs that have a .config() method allow retrieving a config value by specifying it as a string as a single arg, eg DMA.config('count') would return the count register value.

@nickovs
Copy link
Contributor Author

nickovs commented Oct 23, 2023

When you do addressof(d.registers[5]) you are taking the address of the value that gets fetched from the register at index 5. To take the address of the register at index 5 you need to take the address of a slice that starts at index 5, i.e. addressof(d.registers[5:6]).

As for the registers not being writable, I suspect that there must have been some change to the initialisation process since I wrote the code two years ago, since it worked fine back then! I will investigate.

As for the conflict of DMA channels, the code here claims an un-claimed channel. If the library you are using is just using channels 0 and 1 without properly claiming them I'd argue that's a bug in their code not mine! It's worth noting that properly claiming channels is the other thing that this code does that can't be done at all in pure Python code.

Signed-off-by: Nicko van Someren <nicko@nicko.org>
@nickovs
Copy link
Contributor Author

nickovs commented Oct 23, 2023

It does seem that the semantics of memoryview creation changed at some point in the last couple of years. The code is now fixed and the registers view is writable once more. Thank you @hexagon5un for catching that!

@hexagon5un
Copy link

Awesome, memoryview works for write and read!

Putting it through its paces, I find the following behavior strange:

>>> d = rp2.DMA()
>>> [hex(x) for x in d.registers[0:4]]
['0x0', '0x0', '0x0', '0x0']
>>> d.config(count=5)
>>> [hex(x) for x in d.registers[0:4]]
['0x0', '0x0', '0x0', '0x0']   ## what??

Fleshing it out a bit more:

>>> a = bytes("hello","ascii")
>>> b = bytearray(5)
>>> d = rp2.DMA()
>>> d.config(read=a, write=b, count=5, ctrl=d.pack_ctrl(size=0))
>>> [hex(x) for x in d.registers[0:4]]
['0x2000a422', '0x2000a190', '0x0', '0x3f8031']
>>> d.config(trigger=True)
>>> print(b)
bytearray(b'hello')

It looks like rp2.DMA is queuing the transaction count internally, only writing it (to a trigger register) when trigger is set true in config. But why does it not write COUNT in the first place?

This will mess up configuration for automatic DMA chains where COUNT isn't the trigger. If I had a control DMA feeding to the READ_TRIG register, for instance, maybe to gather words from different memory locations into a single peripheral, this would fail because there's nothing in the COUNT register despite setting up config(count=5, write=addr, ctrl=c_ctrl) command.

A fix?

Because the registers are aliases, you can write to them all without firing off the DMA as long as you don't write to any of the trigger registers. So, for instance, I have this snippet in my hacky DMA lib:

## You can always write to one of these 4 and it will _never_ trigger
READ_ADDR   = 0x00
WRITE_ADDR  = 0x04
TRANS_COUNT = 0x08
CTRL        = 0x10

## Writing to one of these will _always_ trigger
READ_ADDR_TRIG   = 0x3C
WRITE_ADDR_TRIG  = 0x2C
TRANS_COUNT_TRIG = 0x1C
CTRL_TRIG        = 0x0C

So you could have rp2.DMA unconditionally write all data to the "safe" registers and COUNT would get stored.

But then what should/could rp2.DMA do in the case of triggering without writing a value? d.config(trigger=True) with no config data?

One approach is to read an arbitrary register and write the value back to the trigger version of the same register. That won't change any of the configs, but it will trigger. E.g. this is essentially a trigger-NOP: d.registers[3] = d.registers[4].

Another possibility is to break trigger and config functionality apart in the rp2.DMA code: d.config() and d.trigger(). That's further from the hardware, but maybe a cleaner coding flow? config() could always write to safe registers, and trigger() could always execute a trigger-NOP.

@nickovs
Copy link
Contributor Author

nickovs commented Oct 24, 2023

@hexagon5un The register reading behaviour you see when reading the count register is the expected one. From the RP2040 datasheet:

2.5.1.2. Transfer Count
Reading from TRANS_COUNT yields the number of transfers remaining in the current transfer sequence

When you write a count to the count register it is remembered for the next transfer that you trigger, but you can't read it back because that register always returns the number of transfers left in the current sequence.

As for setting up gather chains with DMA, this works just fine as is. You can just follow the example in section 2.5.6.2. The model here is that you have one DMA channel wrote (length, read address) pairs into alias 3 of the registers (d.registers[14:16]) for the other channel, and you set the chain_to field in the control word for the second channel to start another transfer in the first channel. On the first channel you enable 'ring' mode for the writes with a size of 8 bytes, so that all the writes of (length, read address) pairs get sent to the same registers.

I hope this helps!

@hexagon5un
Copy link

hexagon5un commented Oct 24, 2023

Thanks! I see now in the datasheet. How strange that I've never noticed that before! I guess I've never tried to read the count.
It's stored in a DBG register, and as you say, is only loaded in when the next transaction starts.

>>> machine.mem32[dma.BASE+0x804]
5

Thanks again for the explanation.

I'm actually in the process of converting some pretty gnarly chains of DMA/PIO machines into your DMA library's implementation, because the interrupts should actually help me untangle some of it, so I've been double-checking everything along the way, and learning a lot. It's a very interesting exercise. :)

@nickovs
Copy link
Contributor Author

nickovs commented Oct 24, 2023

@hexagon5un You're very welcome. I'm grateful that you're testing it this hard; it's good to have shaken these thing out before the merge. I'm updating the documentation for the DMA at the moment and I'll try to make sure that these sorts of use cases are covered.

@nickovs
Copy link
Contributor Author

nickovs commented Oct 24, 2023

@dpgeorge I'm going to assert that this failing check is not the fault of my code:

Build started
Update-AppveyorBuild -Version $env:appveyor_repo_commit.substring(0,8)
git clone -q --depth=1 https://github.com/micropython/micropython.git C:\projects\micropython
fatal: unable to access 'https://github.com/micropython/micropython.git/': Could not resolve host: github.com
Command exited with code 128

@samskiter
Copy link

Hey, I've been using this branch to access DMA from micropython: #10704 Ive submitted some fixes on top to bring it up to date and add documentation (markb139#1)

What's the pro's cons of each PR? which are the maintainers looking to take forward?

@nickovs
Copy link
Contributor Author

nickovs commented Nov 2, 2023

@samskiter I've not used that other branch, but taking a quick look here are some of the differences:

  • My patch provides a more general interface for specifying the source and destination of the transfer, where the parameters can either be an integer or anything that offers a buffer interface. As a result, whereas I have only one general function for setting the source and destination, the PR you mention offers three different function (write_from, read_to and copy) and it doesn't support transfers from one arbitrary registrer address to another at all.
  • My patch presents a simple interface for referencing the bank of registers for the DMA channel, without knowing anything about where the RP2040 places things, which makes chaining much simpler and more intuitive.
  • My patch provides comprehensive control over all of the DMA control register, rather than only supporting a specific subset of the use cases.
  • The patch that you reference includes a whole extra Timer class, which is clearly useful but rather unrelated.

@samskiter
Copy link

@samskiter I've not used that other branch, but taking a quick look here are some of the differences:

  • My patch provides a more general interface for specifying the source and destination of the transfer, where the parameters can either be an integer or anything that offers a buffer interface. As a result, whereas I have only one general function for setting the source and destination, the PR you mention offers three different function (write_from, read_to and copy) and it doesn't support transfers from one arbitrary registrer address to another at all.
  • My patch presents a simple interface for referencing the bank of registers for the DMA channel, without knowing anything about where the RP2040 places things, which makes chaining much simpler and more intuitive.
  • My patch provides comprehensive control over all of the DMA control register, rather than only supporting a specific subset of the use cases.
  • The patch that you reference includes a whole extra Timer class, which is clearly useful but rather unrelated.

Thanks - I have found some issues with the way Stubs are generated for the RP2 class which I think would be worth bearing in mind when adding the docs for this feature.... #7641 (comment)

@samskiter samskiter mentioned this pull request Nov 2, 2023
@markb139
Copy link

markb139 commented Nov 3, 2023

Author of the other patch here.
Access to the Timer is related in that you can pace DMA transfers by using the Timer. I use this in my code for playing audio at a fixed rate

@nickovs
Copy link
Contributor Author

nickovs commented Nov 3, 2023

@markb139 Thanks for the comment. I can see how that would be useful. Can you explain the difference between your Timer class and the standard machine.Timer class? If there are specific features needed it might make sense to add these to the standard class rather than replicating it.

@mendenm
Copy link

mendenm commented Nov 5, 2023

I really look forward to this appearing in micropython. the rp2 DMA is a critical feature. Really, the claim and release functions are the most critical, since without them conflicts are inevitable. Everything else could be pure python (although it is better compiled in). I am planning to implement the use of a dma channel to compute crc-16-ccitt, which is a cute side feature of a dma channel, which can be invoked by just taking data already in memory and dma-ing it to a non-incrementing address, while collecting the CRC.

@samskiter
Copy link

Really, the claim and release functions are the most critical

Definitely would second this - if there was any doubt or delay getting this PR in I would consider separating just the claim/unclaim part and getting it over the line

@dpgeorge
Copy link
Member

Thanks @nickovs for updating this PR based on the feedback. And thanks to everyone who has tested this, including @hexagon5un . I understand there are subtleties using the DMA (eg ordering of register writes, and the trigger aliases) and it seems the code here has got things working well.

I can see how that would be useful. Can you explain the difference between your Timer class and the standard machine.Timer class? If there are specific features needed it might make sense to add these to the standard class rather than replicating it.

Regarding the timer: according to the datasheet the DMA has four internal pacing timers, and these are independent resources, not related to other timers or machine.Timer. It would probably make sense to expose the functionality of these pacing timers. They are very simple so not worth making a class for them, they only have one register associated with them which sets their rate of counting. Then you select a timer for the DMA DREQ using the TREQ_SEL entry in the CTRL register.

Support for these pacing timers could be added in a separate PR.

}

// Make sure that the value is an integer
o = mp_unary_op(MP_UNARY_OP_INT_MAYBE, o);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably simplest to just use mp_obj_int_get_truncated(o) here. That will do necessary checking and calling MP_UNARY_OP_INT_MAYBE if needed. And it will support large integers that point to an address within the 32-bit address space.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use mp_obj_get_int_truncated().

};

STATIC mp_obj_t rp2_dma_make_new(const mp_obj_type_t *type, size_t n_args, size_t n_kw, const mp_obj_t *args) {
mp_arg_check_num(n_args, n_kw, 0, 0, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth taking an optional argument here, to specify the dma_channel explicitly, as an integer? If nothing is given then this function can use dma_claim_unused_channel().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could do that, but before doing so we would need to decide what to do in the case that the user picks a channel that is already in use. I can't think of a scenario where it makes sense for the user to usurp a channel that someone else is already using. As a result I can see downsides to this but not upsides. If there's a use case then I'd be happy to support it but without one I think it's unnecessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of the case in the comment above #7641 where DMA 0 and 1 were used by another driver.

From your comments, I think your opinion there is that the other driver should also use the dma_claim_unused_channel(), and I guess that's the preferred solution: that all users of DMA must allocate a DMA instance in this way. Eg that other driver could create an rp2.DMA() instance and use dma.channel_id to get the DMA to use.

So, let's leave this as you have it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, another driver written in C can and should call dma_claim_unused_channel() and once there is an official rp2.DMA() class, drivers written in Python can and should use that.


STATIC void rp2_dma_error_if_closed(rp2_dma_obj_t *self) {
if (self->channel == CHANNEL_CLOSED) {
mp_raise_ValueError(MP_ERROR_TEXT("Channel closed"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use lowercase to start an error message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use lowercase.

dest[0] = reg_view;
} else {
// If a Micropython class supports attributes then the locals dict is not searched.
// We do it manually here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, this can be simplified.

if (attr_in == MP_QSTR_active) {
rp2_dma_error_if_closed(self);
uint32_t busy = dma_channel_is_busy(self->channel);
dest[0] = mp_obj_new_bool((mp_int_t)busy);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other existing classes (eg network.WLAN, bluetooth.BLE, rp2.PIO) have an active method with signature:

.active([value])

That allows getting (don't pass the value) and setting (do pass the value) the "active" state of the peripheral. I think this DMA class should match that API and have a DMA.active([value]) method, instead of this attribute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that they do, but that pattern has always bugged me so I was trying to avoid it!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to a method.

} else {
dma_channel_abort(self->channel);
}
dest[0] = MP_OBJ_NULL; // indicate success
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I think this attribute should be a method to match other classes.

Copy link
Member

@dpgeorge dpgeorge Dec 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left as attributes.

active changed to a method.

} else if (attr_in == MP_QSTR_ctrl) {
rp2_dma_error_if_closed(self);
dest[0] = mp_obj_new_int_from_uint((mp_uint_t)reg_block->al1_ctrl);
} else if (attr_in == MP_QSTR_channel_id) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call it channel? That's a little shorter and it I don't think the _id adds much..?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense. Will do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to channel.

} else if (attr_in == MP_QSTR_read) {
uint32_t value = rp2_dma_register_value_from_obj(dest[1], REG_TYPE_ADDR_READ);
dma_channel_set_read_addr(self->channel, (volatile void *)value, false);
dest[0] = MP_OBJ_NULL; // indicate success
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these attribute setters needed? Is there any difference in doing:

dma.config(read=x)
# vs
dma.read = x

?

Similarly, getters could be dma.config("read").

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left as-is.

mp_uint_t mask = ((1 << rp2_dma_ctrl_fields_table[i].length) - 1);
mp_uint_t masked_value = field_value & mask;
if (field_value != masked_value) {
mp_raise_ValueError(MP_ERROR_TEXT("Bad field value"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use lowercase to start error messages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use lowercase.

}

if (remaining) {
mp_raise_type(&mp_type_AttributeError);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be mp_raise_TypeError(NULL). Eg in CPython:

>>> print(z=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'z' is an invalid keyword argument for print()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to a TypeError.

@dpgeorge
Copy link
Member

I have applied changes based on the discussion above and merged this in cfc212b. I also fixed the finaliser so it works, and renamed IRQ_quiet to irq_quiet to match all the other entries in the ctrl register.

@dpgeorge dpgeorge closed this Dec 22, 2023
@dpgeorge
Copy link
Member

I did not push to this PR before merging (like I usually do). Instead I left it for reference. The changes I made before merging were (the only real change to the API is changing active from an attribute to a method):

--- a/ports/rp2/rp2_dma.c
+++ b/ports/rp2/rp2_dma.c
@@ -24,26 +24,19 @@
  * THE SOFTWARE.
  */
 
-
 #include <string.h>
 
-#include "py/binary.h"
 #include "py/runtime.h"
 #include "py/mperrno.h"
-#include "py/mphal.h"
-#include "py/objint.h"
 #include "py/objarray.h"
+#include "shared/runtime/mpirq.h"
 #include "modrp2.h"
 
 #include "hardware/irq.h"
 #include "hardware/dma.h"
-#include "shared/runtime/mpirq.h"
 
 #define CHANNEL_CLOSED 0xff
 
-// Forward declaration
-STATIC const mp_obj_dict_t rp2_dma_locals_dict;
-
 typedef struct _rp2_dma_ctrl_obj_t {
     mp_obj_base_t base;
     uint32_t value;
@@ -73,7 +66,7 @@ STATIC rp2_dma_ctrl_field_t rp2_dma_ctrl_fields_table[] = {
     { MP_QSTR_ring_sel,     10, 1, 0 },
     { MP_QSTR_chain_to,     11, 4, 0 },
     { MP_QSTR_treq_sel,     15, 6, 0 },
-    { MP_QSTR_IRQ_quiet,    21, 1, 0 },
+    { MP_QSTR_irq_quiet,    21, 1, 0 },
     { MP_QSTR_bswap,        22, 1, 0 },
     { MP_QSTR_sniff_en,     23, 1, 0 },
     { MP_QSTR_busy,         24, 1, 1 },
@@ -82,8 +75,8 @@ STATIC rp2_dma_ctrl_field_t rp2_dma_ctrl_fields_table[] = {
     { MP_QSTR_read_err,     30, 1, 0 },
     { MP_QSTR_ahb_err,      31, 1, 1 },
 };
-STATIC uint32_t rp2_dma_ctrl_field_count = sizeof(rp2_dma_ctrl_fields_table) / sizeof(rp2_dma_ctrl_field_t);
 
+STATIC const uint32_t rp2_dma_ctrl_field_count = MP_ARRAY_SIZE(rp2_dma_ctrl_fields_table);
 
 #define REG_TYPE_COUNT       0  // Accept just integers
 #define REG_TYPE_CONF        1  // Accept integers or ctrl values
@@ -99,17 +92,9 @@ STATIC uint32_t rp2_dma_register_value_from_obj(mp_obj_t o, int reg_type) {
         }
     }
 
-    // Make sure that the value is an integer
-    o = mp_unary_op(MP_UNARY_OP_INT_MAYBE, o);
-
-    if (mp_obj_is_int(o)) {
-        return mp_obj_int_get_uint_checked(o);
-    } else {
-        mp_raise_ValueError(MP_ERROR_TEXT("value not an integer"));
-    }
+    return mp_obj_get_int_truncated(o);
 }
 
-
 STATIC void rp2_dma_irq_handler(void) {
     // Main IRQ handler
     uint32_t irq_bits = dma_hw->ints0;
@@ -162,7 +147,7 @@ STATIC mp_obj_t rp2_dma_make_new(const mp_obj_type_t *type, size_t n_args, size_
         mp_raise_OSError(MP_EBUSY);
     }
 
-    rp2_dma_obj_t *self = m_new_obj(rp2_dma_obj_t);
+    rp2_dma_obj_t *self = m_new_obj_with_finaliser(rp2_dma_obj_t);
     self->base.type = &rp2_dma_type;
     self->channel = dma_channel;
 
@@ -172,7 +157,7 @@ STATIC mp_obj_t rp2_dma_make_new(const mp_obj_type_t *type, size_t n_args, size_
 
 STATIC void rp2_dma_error_if_closed(rp2_dma_obj_t *self) {
     if (self->channel == CHANNEL_CLOSED) {
-        mp_raise_ValueError(MP_ERROR_TEXT("Channel closed"));
+        mp_raise_ValueError(MP_ERROR_TEXT("channel closed"));
     }
 }
 
@@ -182,11 +167,7 @@ STATIC void rp2_dma_attr(mp_obj_t self_in, qstr attr_in, mp_obj_t *dest) {
     if (dest[0] == MP_OBJ_NULL) {
         // Load attribute
         dma_channel_hw_t *reg_block = dma_channel_hw_addr(self->channel);
-        if (attr_in == MP_QSTR_active) {
-            rp2_dma_error_if_closed(self);
-            uint32_t busy = dma_channel_is_busy(self->channel);
-            dest[0] = mp_obj_new_bool((mp_int_t)busy);
-        } else if (attr_in == MP_QSTR_read) {
+        if (attr_in == MP_QSTR_read) {
             rp2_dma_error_if_closed(self);
             dest[0] = mp_obj_new_int_from_uint((mp_uint_t)reg_block->read_addr);
         } else if (attr_in == MP_QSTR_write) {
@@ -198,20 +179,15 @@ STATIC void rp2_dma_attr(mp_obj_t self_in, qstr attr_in, mp_obj_t *dest) {
         } else if (attr_in == MP_QSTR_ctrl) {
             rp2_dma_error_if_closed(self);
             dest[0] = mp_obj_new_int_from_uint((mp_uint_t)reg_block->al1_ctrl);
-        } else if (attr_in == MP_QSTR_channel_id) {
+        } else if (attr_in == MP_QSTR_channel) {
             dest[0] = mp_obj_new_int_from_uint(self->channel);
         } else if (attr_in == MP_QSTR_registers) {
             mp_obj_array_t *reg_view = m_new_obj(mp_obj_array_t);
-            mp_obj_memoryview_init(reg_view, 'I' | 0x80, 0, 16, dma_channel_hw_addr(self->channel));
+            mp_obj_memoryview_init(reg_view, 'I' | MP_OBJ_ARRAY_TYPECODE_FLAG_RW, 0, 16, dma_channel_hw_addr(self->channel));
             dest[0] = reg_view;
         } else {
-            // If a Micropython class supports attributes then the locals dict is not searched.
-            // We do it manually here.
-            mp_map_t *locals_map = (mp_map_t *)&rp2_dma_locals_dict.map;
-            mp_map_elem_t *elem = mp_map_lookup(locals_map, MP_OBJ_NEW_QSTR(attr_in), MP_MAP_LOOKUP);
-            if (elem != NULL) {
-                mp_convert_member_lookup(self, &rp2_dma_type, elem->value, dest);
-            }
+            // Continue attribute search in locals dict.
+            dest[1] = MP_OBJ_SENTINEL;
         }
     } else {
         // Set or delete attribute
@@ -222,14 +198,7 @@ STATIC void rp2_dma_attr(mp_obj_t self_in, qstr attr_in, mp_obj_t *dest) {
 
         rp2_dma_error_if_closed(self);
 
-        if (attr_in == MP_QSTR_active) {
-            if (mp_obj_is_true(dest[1])) {
-                dma_channel_start(self->channel);
-            } else {
-                dma_channel_abort(self->channel);
-            }
-            dest[0] = MP_OBJ_NULL; // indicate success
-        } else if (attr_in == MP_QSTR_read) {
+        if (attr_in == MP_QSTR_read) {
             uint32_t value = rp2_dma_register_value_from_obj(dest[1], REG_TYPE_ADDR_READ);
             dma_channel_set_read_addr(self->channel, (volatile void *)value, false);
             dest[0] = MP_OBJ_NULL; // indicate success
@@ -254,7 +223,7 @@ STATIC void rp2_dma_print(const mp_print_t *print, mp_obj_t self_in, mp_print_ki
     mp_printf(print, "DMA(%u)", self->channel);
 }
 
-
+// DMA.config(*, read, write, count, ctrl, trigger)
 STATIC mp_obj_t rp2_dma_config(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
     rp2_dma_obj_t *self = MP_OBJ_TO_PTR(*pos_args);
 
@@ -313,9 +282,28 @@ STATIC mp_obj_t rp2_dma_config(size_t n_args, const mp_obj_t *pos_args, mp_map_t
 }
 STATIC MP_DEFINE_CONST_FUN_OBJ_KW(rp2_dma_config_obj, 1, rp2_dma_config);
 
+// DMA.active([value])
+STATIC mp_obj_t rp2_dma_active(size_t n_args, const mp_obj_t *args) {
+    rp2_dma_obj_t *self = MP_OBJ_TO_PTR(args[0]);
+    rp2_dma_error_if_closed(self);
+
+    if (n_args > 1) {
+        if (mp_obj_is_true(args[1])) {
+            dma_channel_start(self->channel);
+        } else {
+            dma_channel_abort(self->channel);
+        }
+    }
+
+    uint32_t busy = dma_channel_is_busy(self->channel);
+    return mp_obj_new_bool((mp_int_t)busy);
+}
+STATIC MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(rp2_dma_active_obj, 1, 2, rp2_dma_active);
+
 // Default is quiet, unpaced, read and write incrementing, word transfers, enabled
 #define DEFAULT_DMA_CONFIG (1 << 21) | (0x3f << 15) | (1 << 5) | (1 << 4) | (2 << 2) | (1 << 0)
 
+// DMA.pack_ctrl(...)
 STATIC mp_obj_t rp2_dma_pack_ctrl(size_t n_pos_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
     // Pack keyword settings into a control register value, using either the default for this
     // DMA channel or the provided defaults
@@ -327,16 +315,10 @@ STATIC mp_obj_t rp2_dma_pack_ctrl(size_t n_pos_args, const mp_obj_t *pos_args, m
     }
     mp_uint_t remaining = kw_args->used;
 
-    for (mp_uint_t i = 0; i < remaining; i++) {
-        if (!mp_obj_is_int(kw_args->table[i].value)) {
-            mp_raise_ValueError(MP_ERROR_TEXT("Arg not an int"));
-        }
-    }
-
     mp_map_elem_t *default_entry = mp_map_lookup(kw_args, MP_OBJ_NEW_QSTR(MP_QSTR_default), MP_MAP_LOOKUP);
     if (default_entry) {
         remaining--;
-        value = mp_obj_int_get_uint_checked(default_entry->value);
+        value = mp_obj_get_int_truncated(default_entry->value);
     }
 
     for (mp_uint_t i = 0; i < rp2_dma_ctrl_field_count; i++) {
@@ -349,11 +331,11 @@ STATIC mp_obj_t rp2_dma_pack_ctrl(size_t n_pos_args, const mp_obj_t *pos_args, m
             remaining--;
             // Silently ignore read-only fields, to allow the passing of a modified unpack_ctrl() results
             if (!rp2_dma_ctrl_fields_table[i].read_only) {
-                mp_uint_t field_value = mp_obj_int_get_uint_checked(field_entry->value);
+                mp_uint_t field_value = mp_obj_get_int_truncated(field_entry->value);
                 mp_uint_t mask = ((1 << rp2_dma_ctrl_fields_table[i].length) - 1);
                 mp_uint_t masked_value = field_value & mask;
                 if (field_value != masked_value) {
-                    mp_raise_ValueError(MP_ERROR_TEXT("Bad field value"));
+                    mp_raise_ValueError(MP_ERROR_TEXT("bad field value"));
                 }
                 value &= ~(mask << rp2_dma_ctrl_fields_table[i].shift);
                 value |= masked_value << rp2_dma_ctrl_fields_table[i].shift;
@@ -362,22 +344,19 @@ STATIC mp_obj_t rp2_dma_pack_ctrl(size_t n_pos_args, const mp_obj_t *pos_args, m
     }
 
     if (remaining) {
-        mp_raise_type(&mp_type_AttributeError);
+        mp_raise_TypeError(NULL);
     }
 
     return mp_obj_new_int_from_uint(value);
 }
 STATIC MP_DEFINE_CONST_FUN_OBJ_KW(rp2_dma_pack_ctrl_obj, 1, rp2_dma_pack_ctrl);
 
+// DMA.unpack_ctrl(value)
 STATIC mp_obj_t rp2_dma_unpack_ctrl(mp_obj_t value_obj) {
     // Return a dict representing the unpacked fields of a control register value
     mp_obj_t result_dict[rp2_dma_ctrl_field_count * 2];
 
-    if (!mp_obj_is_int(value_obj)) {
-        mp_raise_ValueError(MP_ERROR_TEXT("int required"));
-    }
-
-    mp_uint_t value = mp_obj_int_get_uint_checked(value_obj);
+    mp_uint_t value = mp_obj_get_int_truncated(value_obj);
 
     for (mp_uint_t i = 0; i < rp2_dma_ctrl_field_count; i++) {
         result_dict[i * 2] = MP_OBJ_NEW_QSTR(rp2_dma_ctrl_fields_table[i].name);
@@ -432,7 +411,7 @@ STATIC mp_obj_t rp2_dma_irq(size_t n_args, const mp_obj_t *pos_args, mp_map_t *k
 }
 STATIC MP_DEFINE_CONST_FUN_OBJ_KW(rp2_dma_irq_obj, 1, rp2_dma_irq);
 
-
+// DMA.close()
 STATIC mp_obj_t rp2_dma_close(mp_obj_t self_in) {
     rp2_dma_obj_t *self = MP_OBJ_TO_PTR(self_in);
     uint8_t channel = self->channel;
@@ -456,6 +435,7 @@ STATIC MP_DEFINE_CONST_FUN_OBJ_1(rp2_dma_close_obj, rp2_dma_close);
 
 STATIC const mp_rom_map_elem_t rp2_dma_locals_dict_table[] = {
     { MP_ROM_QSTR(MP_QSTR_config), MP_ROM_PTR(&rp2_dma_config_obj) },
+    { MP_ROM_QSTR(MP_QSTR_active), MP_ROM_PTR(&rp2_dma_active_obj) },
     { MP_ROM_QSTR(MP_QSTR_irq), MP_ROM_PTR(&rp2_dma_irq_obj) },
     { MP_ROM_QSTR(MP_QSTR_close), MP_ROM_PTR(&rp2_dma_close_obj) },
     { MP_ROM_QSTR(MP_QSTR___del__), MP_ROM_PTR(&rp2_dma_close_obj) },
@@ -474,7 +454,6 @@ MP_DEFINE_CONST_OBJ_TYPE(
     locals_dict, &rp2_dma_locals_dict
     );
 
-
 void rp2_dma_init(void) {
     // Set up interrupts.
     memset(MP_STATE_PORT(rp2_dma_irq_obj), 0, sizeof(MP_STATE_PORT(rp2_dma_irq_obj)));

@markb139
Copy link

Is there any documentation or examples ?

@nickovs
Copy link
Contributor Author

nickovs commented Dec 22, 2023

@markb139 Yes, there is documentation coming in #7670. Damien made some changes to few of the names and switched out one attribute for a function, and there was also a recent change to the way the config works, so I need to do a few updates before it gets merged. I will try to get to those this weekend.

@Hoernchen
Copy link

It's great that this was finally merged, but does not still lack a helper function to get aligned memory to make rings work?

@mendenm
Copy link

mendenm commented Dec 24, 2023

If you are willing to waste some memory, the aligned buffers for wrap-around DMA can be done in pure python. I wrote this class, which I can contribute if there is interest. It allocates extra space in a regular bytearray, and then works out the offset to get aligned buffers. If you aren't running right at the limit of available RAM, it is effective.

I also wrote a version which works with the unstriped memory blocks, for very high speed access, but it wastes a lot more memory, and the utility of it is limited. Realistically, I doubt there are many people who need the simultaneous bandwidth of multiple unstriped blocks.

class aligned_normal_buffers:
    """
        make this a class, so it is hard to orphan the memoryviews and corrupt stuff.
        Usually this should get called very early on to create DMA buffers,
        and leave them forever
        since there are very likely pointers to them in DMA registers.
        This is designed to work with powers of 2 for the bufsize, for DMA wrapping and chaining (ping-pong).
        This achieves the alignment by allocating a bigger chunk of memory (n+1 buffers),
        and then offsetting the base of the selected chunks enough to align them.
        It is slightly wasteful, but not likely to be a problem unless you are
        using up all memory with a small number of very big buffers.
    """
    def __init__(self, bufsize, nbuf):
        """ create a set of aligned buffers in cycling blocks on normal SRAM, for DMA wrapping
        """
        self.bufcount = nbuf
        self.bufsize = bufsize
        
        count = bufsize * (nbuf + 1)
        baseblock = bytearray(count)
        addr = uctypes.addressof(baseblock)
        
        #bump base of buffers forward to (presumably 2^n) boundary
        addr  = ((addr + bufsize - 1) //  bufsize) * bufsize
        
        self.aligned_addr = addr
        
        self.buffers = [
            uctypes.bytearray_at(addr + i * bufsize, bufsize)
                for i in range(nbuf)
        ]
        self.baseblock = baseblock
        self.offset = 0
        
    def __getitem__(self, idx):
        return self.buffers[idx]
    
    def __str__(self):
        return (
            f'{str(self.__class__):s}, count = {self.bufcount:d}, size = {self.bufsize:d} at\n' +
            '\n'.join([f"0x{uctypes.addressof(x):08x}" for x in self.buffers])
        )

@andrewleech
Copy link
Contributor

Do you just want aligned buffers to make a (python) ringbuffer? If so, maybe this would work better? #9458

@Hoernchen
Copy link

No, the plan was wrapping transfers from a ring containing data for the aliased dma regs, like the https://github.com/raspberrypi/pico-examples/blob/master/dma/control_blocks/control_blocks.c example.

I guess the unknown amount of heap fragmentation makes this difficult, the only reasonable way that does not waste a lot of ram for larger buffers would be some sort of micropython pre (gc)-init hook that reserves aligned ram.

@nickovs
Copy link
Contributor Author

nickovs commented Dec 24, 2023

@Hoernchen Right now there isn't a good way of getting the memory allocator to allocate aligned buffers, so doing this requires deeper work inside the allocator. That said, you don't need to allocate aligned buffers in order to implement scatter/gather transfers like the one in your example. In that case the ring operation is applied on the register bank but it doesn't need a ring on the memory blocks, so aligned memory is unnecessary.

I plan to finish up the documentation over the holidays and I will include a Python version of the control blocks code you cited in the examples.

@Hoernchen
Copy link

Ah yes, sorry, what I had in mind is actually the opposite and I picked the wrong example, the control buffer ring was supposed to wrap and write to one not incremented dma reg, in which case it does have to be aligned...

@nickovs nickovs deleted the rp2_dma branch December 28, 2023 21:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.