Normally the reason to do a conservative GC is because it makes the VM implementation (and potentially FFI) simpler, not because it’s faster. Also for fun of course. (There are reasons a fully-conservative design might be faster, like not having to having a ton of scanning functions if you have a ton of disparate types, but in general my perception is that production GCs are normally moving to at least some degree, which a fully conservative one can’t be.)
I haven't measured the performance. I would like to. I'm especially interested in comparing it with the current version of the collector. It is now capable of heap compaction which will be the subject of a future article. I'm also interested in knowing how big a problem the conservative part of the collector is. The C stack depth has been minimized significantly since I got rid of the recursive evaluator. I don't think it's a significant problem but I could be wrong.
I need to implement the compiler's profiling functions in lone. Documentation on the matter seems scarce. Not even sure if those instrumentation APIs are stable. I got away with not implementing the ASAN and stack protector since they added flags that make the compiler emit trapping instructions instead of function calls. Profiling is a lot more complex.
Thanks for the feedback. That's due to a viewport configuration meta tag which I added recently in an attempt to make it more responsive in portrait mode. I just reverted it. Should work just like on desktop now.
TL;DR: Conservative collector. Not where I would have taken things, but valid.
Re forcing a register spill: assuming the GC is invoked via an ABI-compliant function call, you don’t actually need to save all the scalar registers manually, only the callee-save ones (as setjmp does). Alternatively, you can make the compiler do the spilling by inserting a function preserving no registers into the call chain—this is spelled __attribute__((preserve_none)) in sufficiently new Clang or GCC, but you can also accomplish this kind of thing on Watcom with #pragma aux, for example.
Re obtaining the top and bottom of the stack: in addition to __builtin_frame_address, there’s also the age-old option of declaring a local and looking at its address, as long as strict aliasing doesn’t bite you. And if you know you’re running on Linux or something sufficiently compatible (as seen from the reference to auxv), you can treat argv as the bottom of the stack for the main thread, as the startup stack on Linux is (IIRC) argv, then the argument strings, then the environment, then the auxiliary vector.
8 comments:
Good? Bad? Doesn't matter as long as you had fun.
Have you tested this GCs performance? Sometimes a baby GC can be fast enough.
Normally the reason to do a conservative GC is because it makes the VM implementation (and potentially FFI) simpler, not because it’s faster. Also for fun of course. (There are reasons a fully-conservative design might be faster, like not having to having a ton of scanning functions if you have a ton of disparate types, but in general my perception is that production GCs are normally moving to at least some degree, which a fully conservative one can’t be.)
Probably bad, right? I did have fun though.
I haven't measured the performance. I would like to. I'm especially interested in comparing it with the current version of the collector. It is now capable of heap compaction which will be the subject of a future article. I'm also interested in knowing how big a problem the conservative part of the collector is. The C stack depth has been minimized significantly since I got rid of the recursive evaluator. I don't think it's a significant problem but I could be wrong.
I need to implement the compiler's profiling functions in lone. Documentation on the matter seems scarce. Not even sure if those instrumentation APIs are stable. I got away with not implementing the ASAN and stack protector since they added flags that make the compiler emit trapping instructions instead of function calls. Profiling is a lot more complex.
I wanted to read this but I couldn’t because of all the allusions in the article that distracted me from the points the author was trying to make.
Yeah it totally grossed me out too, it's quite twisted. Author has serious issues.
On mobile all the text is about 3 times bigger than it needs to be as well making for an obnoxious amount of scrolling. Unreadable code examples.
Thanks for the feedback. That's due to a viewport configuration meta tag which I added recently in an attempt to make it more responsive in portrait mode. I just reverted it. Should work just like on desktop now.
TL;DR: Conservative collector. Not where I would have taken things, but valid.
Re forcing a register spill: assuming the GC is invoked via an ABI-compliant function call, you don’t actually need to save all the scalar registers manually, only the callee-save ones (as setjmp does). Alternatively, you can make the compiler do the spilling by inserting a function preserving no registers into the call chain—this is spelled __attribute__((preserve_none)) in sufficiently new Clang or GCC, but you can also accomplish this kind of thing on Watcom with #pragma aux, for example.
Re obtaining the top and bottom of the stack: in addition to __builtin_frame_address, there’s also the age-old option of declaring a local and looking at its address, as long as strict aliasing doesn’t bite you. And if you know you’re running on Linux or something sufficiently compatible (as seen from the reference to auxv), you can treat argv as the bottom of the stack for the main thread, as the startup stack on Linux is (IIRC) argv, then the argument strings, then the environment, then the auxiliary vector.