py: Cast mp_obj_t to concrete types explicitly. #1166

pfalcon · 2015-03-19T23:17:41Z

mp_obj_t internal representation doesn't have to be a pointer to object,
it can be anything.

There's also a support for back-conversion in the form of MP_OBJ_UNCAST.
This is kind of optimization/status quo preserver to minimize patching the
existing code and avoid doing potentially expensive MP_OBJ_CAST over and
over. But then one may imagine implementations where MP_OBJ_UNCAST is very
expensive. But such implementations are unlikely interesting in practice.

pfalcon · 2015-03-19T23:18:26Z

Ok, catching it while it hot - following comment #1161 (comment) . @dpgeorge , how do you like this?

dpgeorge · 2015-03-19T23:27:13Z

I was independently thinking about this in the following context: one may want to box types in a different way to the current implementation. Eg nan-boxing, or, more useful, having 64-bit objects on a 32-bit system, such that a pointer would take up only half the object. The reason for the latter would be to have 63-bit integers on a 32-bit system, and 31-bit integers on a 16-bit system.

dpgeorge · 2015-03-19T23:28:56Z

py/objlist.c

@@ -156,7 +156,7 @@ STATIC mp_obj_t list_subscr(mp_obj_t self_in, mp_obj_t index, mp_obj_t value) {
        // delete
 #if MICROPY_PY_BUILTINS_SLICE
        if (MP_OBJ_IS_TYPE(index, &mp_type_slice)) {
-            mp_obj_list_t *self = self_in;
+            mp_obj_list_t *self = MP_OBJ_CAST(self_in);


If MP_OBJ_CAST is something like (self_in+4) and self_in is never referenced again in this function, then the compiler may optimise it such that self reuses the stack/register that self_in previously used. Thus, no pointer to the original self_in exists any more and the object may be GC'd.

Well, this is bigger issue, as mentioned in #1161. The general direction of handling it should be that GC is not async, but may happen only in specific checkpoints (like when allocations is called). And past those checkpoints, MP_OBJ_CAST should be called again. And yes, at the worst case, we will need to add (configurable) volatile there. But let's wait first till it happens.

But this code example here does exactly a sync allocation between CAST and UNCAST! (list_new calls m_new_obj). There are going to be lots of non-obvious bugs introduced by such a change.

This change is null (CAST and UNCAST will be identity for the current codebase of course). This change is intended to allow to experiment with other allocation/gc schemes and identify issues with them, instead of speculating what they may be. It would be nice to have just drop-in patch which at once implements concurrent compacting gc, but I don't think that's realistic (and even if it happens - somebody does it as part of dayjob for example - it may turn out that such patch is not easily mergeable because reshuffles a lot of core code in unpredictable way). I also have a lot of experimental stuff locally, but it's at least clustered. These crumb-like changes are almost impossible to maintain in a branch, so I'd like to have them in mainline.

it may turn out that such patch is not easily mergeable because reshuffles a lot of core code in unpredictable way

I've had a thought about how to deal with this and related issues of configurability for a while, and now seems like a good time to mention it: ultimately I would rewrite uPy at a higher level so that it becomes a description of Python rather than an implementation. Then also write a translator to translate this description to C (or any other language). This allows to you describe exactly what you want to do (eg get a pointer for writing, critical section for GC, write barrier for GC, critical section for multiprocessing, global state access, etc etc) and would allow many implementations to be generated, each with their own compromises/features. It would allow to have highly configurable features, retain current implementation, and allow to experiment with lots of different ways of doing things. This is my dream :)

That is a really neat idea!

This is my dream :)

Sounds nice for a dream, but this would be really long way. And all efficient things appear to be hand-crafted. Also, isn't PyPy does just that - a description of language (not necessarily Python) which can be parametrized, and for which efficient interpreter/compiler are generated (taking several gigabytes and dozen of minutes in the process).

To start with such a meta-python doesn't have to be anything more than the current uPy source code which is preprocessed, with the preprocessing doing almost nothing to begin with. Then we add things as needed. Eg we can resolve #704 by translating all external API names in the processor. And we can add a first argument to all functions that is the global state. Sooner or later we need to do these things and a preprocessor is the only way I can think of doing it properly.

pfalcon · 2015-03-20T00:32:33Z

Yes, there're definitely can be many uses for it. I had in mind exactly the opposite example of yours - to have 16-bit mp_obj_t for small-heap 32-bit systems. And double pointer indirection movable heap.

So, I assume the general scheme is ok and naming is sane too. I then proceed further (not going to convert all in once, but gradually proceed file by file will give warm feeling of progress ;-) ).

dpgeorge · 2015-03-20T12:45:53Z

I had in mind exactly the opposite example of yours - to have 16-bit mp_obj_t for small-heap 32-bit systems.

Nice idea, it would save a lot of RAM on MCUs with <64k heap, but then how do you "point" to ROM objects? You would need some flag in the 16-bit mp_obj_t to indicate it's a ROM object, and then you'd need a lot of magic to enumarte the ROM objects...

So, I assume the general scheme is ok and naming is sane too.

I was thinking of using MP_OBJ_PTR_VALUE and MP_OBJ_NEW_PTR, to match with the SMALL_INT and QSTR versions. It makes it clear then what you are casting to/uncasting from.

Note that we might also need separate macros for ROM versions of ptrs, eg MP_OBJ_NEW_ROM_PTR.

pfalcon · 2015-03-20T13:03:44Z

but then how do you "point" to ROM objects?

No idea so far, would need to experiment ;-).

I was thinking of using MP_OBJ_PTR_VALUE and MP_OBJ_NEW_PTR, to match with the SMALL_INT and QSTR versions. It makes it clear then what you are casting to/uncasting from.

Let's sync this with what's discussed in #1161. So, we need to pairs of macros - one to work with objects, another - with "raw memory". And they will appear very often, so would be nice to have them shorter rather than longer. I thought about MP_OBJ_CAST/MP_OBJ_UNCAST & MP_RAW_CAST/MP_RAW_UNCAST.

To address issue #699.

coveralls · 2015-03-20T22:24:00Z

Coverage increased (+0.0%) to 90.07% when pulling 89221c57d0312742833a4fd88a9d663e638c4d02 on pfalcon:mp-obj-cast into 69922c6 on micropython:master.

pfalcon · 2015-03-20T22:35:59Z

Ok, so this now compiles and represents complete conversion of objlist.c. Completeness was tested by defining mp_obj_t to struct foo*. A typesafety error was found during conversion: 69922c6

Note that these patches assume that memory allocation functions return mp_obj_t, which hopefully very reasonable. But then m_new_obj() and friends should have "prototype" return mp_obj_t, which cannot be done now due to need to include obj.h into misc.h, which is layering violation. So, I propose (as one of next steps) to move those funcs either to obj.h, or to a separate header like alloc.h. That step is also on critical path for #1161 too.

I left macro namings per my original proposal and per argument in #1166 (comment) . @dpgeorge , feel free to insist them to be changed, but then please provide naming for "raw memory" accessor counterparts.

dpgeorge · 2015-03-20T22:41:59Z

py/mpconfig.h

@@ -644,6 +644,16 @@ typedef double mp_float_t;
 #define MP_PLAT_FREE_EXEC(ptr, size) m_del(byte, ptr, size)
 #endif

+// Cast mp_obj_t to object pointer
+#ifndef MP_OBJ_CAST


I don't think these should be here. They should be in obj.h. They are not for a port to configure, but rather for uPy itself to configure based on what object model is being used.

but rather for uPy itself to configure based on what object model is being used.

... which is itself might be configured by port. Of course, there's a long way to GC method being configurable per port, though that can be start.

Other reason I put it to mpconfig.h was that it's wrapped in #ifndef. Are you ok to have them in obj.h wrapped like that?

coveralls · 2015-03-20T23:33:02Z

Coverage increased (+0.0%) to 90.08% when pulling 24e4a64ddd015ac2a83310b49d9e834bcfbb4ff7 on pfalcon:mp-obj-cast into 3cc17c6 on micropython:master.

mp_obj_t internal representation doesn't have to be a pointer to object, it can be anything. There's also a support for back-conversion in the form of MP_OBJ_UNCAST. This is kind of optimization/status quo preserver to minimize patching the existing code and avoid doing potentially expensive MP_OBJ_CAST over and over. But then one may imagine implementations where MP_OBJ_UNCAST is very expensive. But such implementations are unlikely interesting in practice.

pfalcon · 2015-03-22T21:02:41Z

Ok, macros moves to obj.h. Hope the patch is ok for merge now.

coveralls · 2015-03-22T21:06:56Z

Coverage increased (+0.0%) to 90.11% when pulling 689b654 on pfalcon:mp-obj-cast into cfe623a on micropython:master.

dpgeorge · 2015-03-24T00:21:14Z

The problem with all this is that it has to also work with ROM objects. Given an "object" (of type mp_obj_t) it could be a small int, a qstr, a handle to RAM, or a handle to ROM. Handles to ROM will pretty much always have to be a direct pointer, due to linker limitations. That means that in general your MP_OBJ_PTR_VALUE macro must distinguish between RAM and ROM and extract the pointer accordingly. And you also will need to have separate constructors for ROM objects, eg: MP_OBJ_NEW_ROM_SMALL_INT, MP_OBJ_NEW_ROM_QSTR and MP_OBJ_NEW_ROM_PTR.

To test this out I tried making mp_obj_t a union of a double and a void* on a 32-bit system. And indeed you get lots of nasty error messages, eg, comparing unions, initialising them as ROM data, etc.

So at this point it's not clear that the approach we have is general enough for realistic applications, at least those beyond something as simple as (ptr + 4).

pfalcon · 2015-03-24T09:52:46Z

or a handle to ROM. Handles to ROM will pretty much always have to be a direct pointer, due to linker limitations.

Well, linker limitation is not CPU architecture limitation, and per the latter, "ROM handle" can be as anything as anything else. For example, index in an indirection table. Or all ROM objects could be packed to together, and with 32K heap, both of them can be referenced with 16-bit mp_obj_t. Or whatever.

your MP_OBJ_PTR_VALUE macro

You didn't explicitly requested that name to be used, so there's MP_OBJ_CAST, not MP_OBJ_PTR_VALUE.

macro must distinguish between RAM and ROM and extract the pointer accordingly

Yes, override macro for mp_obj_t which doesn't allow to reference all address space needs to do that, yes. But there're no such macros so far.

And you also will need to have separate constructors for ROM objects, eg: MP_OBJ_NEW_ROM_SMALL_INT, MP_OBJ_NEW_ROM_QSTR and MP_OBJ_NEW_ROM_PTR.

That's out of scope of this patch (and general proposed transformation).

So at this point it's not clear that the approach we have is general enough for realistic applications, at least those beyond something as simple as (ptr + 4).

Yes, as was discussed, this patch is just first step on the way which was thought about a year ago, but had no progress. This patch's aim is to get away from flat mess o'pointers to a more structured way, where "object handles" are differentiated from "object pointers" on source code level. The first are stored inside objects, the second are used to actually access objects in memory. This patch allows to handle some scenarios of when handle and pointer have different representation, not all of them. More advanced scenarios will require more patches. But it's pretty clear that any advancement in the area of memory management depends on differentiating handles and pointers, and that's exactly what this patch approaches.

pfalcon · 2015-03-25T07:27:28Z

Ok, hope all these arguments are convincing, merging. Macro naming can always be tweaked later.

dpgeorge · 2015-03-27T23:37:47Z

Well, linker limitation is not CPU architecture limitation,

Right, but linker limitation is a real-world limitation. Eg some archs allow ptr&4 as const data while others don't.

For example, index in an indirection table.

This is probably doable, but would need something like mechanism we're using for qstrs to make it work without help from the linker.

You didn't explicitly requested that name to be used, so there's MP_OBJ_CAST, not MP_OBJ_PTR_VALUE.

There was never anything decided, it was still an open discussion! And I get confused by your notation: to me MP_OBJ_CAST means cast<mp_obj>(...) when in fact your meaning is the opposite. Also the notion of "uncast" is confusing (similar to double negative) because you can cast from A to B, but going from B to A is also a cast.

And you also will need to have separate constructors for ROM objects, eg: MP_OBJ_NEW_ROM_SMALL_INT, MP_OBJ_NEW_ROM_QSTR and MP_OBJ_NEW_ROM_PTR.

That's out of scope of this patch (and general proposed transformation).

But you need to think about how to deal with this: string and dict/map objects have pointer to raw data, and ROM versions of these objects are created in the code, so how is that going to work?

But it's pretty clear that any advancement in the area of memory management depends on differentiating handles and pointers, and that's exactly what this patch approaches.

Agreed. But the ROM issue is just as important as the RAM issue IMHO.

pfalcon · 2015-03-27T23:59:15Z

There was never anything decided, it was still an open discussion!

Well, good news is that now changing that is just matter of blind search and replace. Whereas adding annotation is matter of pretty careful code review and not exactly trivial transformations. A week before this, I spent a lot of time reading GC papers (with emphasis on defragmenting heap). By now I forgot most of the details. If we had such annotations before, I could instead play with some ideas. So, I forgot details of those advanced GCs, but remembered that we need those annotations. But I could forgot that too if not proceed with it (and I wanted to play with gc a year ago, and had bunch of various ideas over that time, most forgotten.

And I get confused by your notation:

Logic is very simple: macros which deal with mp_obj_t have MP_OBJ_ prefix. Then "forward" operation is certainly converting mp_obj_t to concrete object. That's why it's called "cast". The opposite op is thus called "uncast". All that governed by requirement of being short.

But well, looking from outside, you certainly see some things clearer, so if you think longer, but more descriptive names would be better, ok, let's go for them.

pfalcon · 2015-03-28T00:07:15Z

But the ROM issue is just as important as the RAM issue IMHO.

Yes. but it's separate and orthogonal issues. As I said at the beginning, I don't look to provide a complete solution, I look towards making few steps in the right direction, with hundreds more required for some "complete solution".

dpgeorge · 2015-03-28T00:23:05Z

I look towards making few steps in the right direction

Ok! I'll also see what it takes to get a full solution. My aim is for 64-bit objects on 32-bit system.

Reorder status matrix and module list in shared-bindings for increase…

dpgeorge reviewed Mar 19, 2015
View reviewed changes

pfalcon referenced this pull request Mar 20, 2015

unix: Enable extra compiler warnings.

92496ab

To address issue #699.

pfalcon force-pushed the mp-obj-cast branch from e9ba28c to 89221c5 Compare March 20, 2015 22:11

dpgeorge reviewed Mar 20, 2015
View reviewed changes

pfalcon force-pushed the mp-obj-cast branch from 89221c5 to 24e4a64 Compare March 20, 2015 23:24

pfalcon force-pushed the mp-obj-cast branch from 24e4a64 to 689b654 Compare March 22, 2015 20:56

pfalcon closed this Mar 25, 2015

pfalcon deleted the mp-obj-cast branch March 25, 2015 07:30

tannewt added a commit to tannewt/circuitpython that referenced this pull request Sep 7, 2018

Merge pull request micropython#1166 from dfarning/clean-up-docs

06a610f

Reorder status matrix and module list in shared-bindings for increase…

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

py: Cast mp_obj_t to concrete types explicitly. #1166

py: Cast mp_obj_t to concrete types explicitly. #1166

pfalcon commented Mar 19, 2015

pfalcon commented Mar 19, 2015

dpgeorge commented Mar 19, 2015

dpgeorge Mar 19, 2015

pfalcon Mar 20, 2015

dpgeorge Mar 20, 2015

pfalcon Mar 20, 2015

dpgeorge Mar 20, 2015

stinos Mar 20, 2015

danicampora Mar 20, 2015

pfalcon Mar 20, 2015

dpgeorge Mar 20, 2015

pfalcon commented Mar 20, 2015

dpgeorge commented Mar 20, 2015

pfalcon commented Mar 20, 2015

coveralls commented Mar 20, 2015

pfalcon commented Mar 20, 2015

dpgeorge Mar 20, 2015

pfalcon Mar 20, 2015

coveralls commented Mar 20, 2015

pfalcon commented Mar 22, 2015

coveralls commented Mar 22, 2015

dpgeorge commented Mar 24, 2015

pfalcon commented Mar 24, 2015

pfalcon commented Mar 25, 2015

dpgeorge commented Mar 27, 2015

pfalcon commented Mar 27, 2015

pfalcon commented Mar 28, 2015

dpgeorge commented Mar 28, 2015

py: Cast mp_obj_t to concrete types explicitly. #1166

py: Cast mp_obj_t to concrete types explicitly. #1166

Conversation

pfalcon commented Mar 19, 2015

pfalcon commented Mar 19, 2015

dpgeorge commented Mar 19, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pfalcon commented Mar 20, 2015

dpgeorge commented Mar 20, 2015

pfalcon commented Mar 20, 2015

coveralls commented Mar 20, 2015

pfalcon commented Mar 20, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coveralls commented Mar 20, 2015

pfalcon commented Mar 22, 2015

coveralls commented Mar 22, 2015

dpgeorge commented Mar 24, 2015

pfalcon commented Mar 24, 2015

pfalcon commented Mar 25, 2015

dpgeorge commented Mar 27, 2015

pfalcon commented Mar 27, 2015

pfalcon commented Mar 28, 2015

dpgeorge commented Mar 28, 2015