dobesv Fri 23 Mar 2012
As part of my delving into the idea of asynchronous processing in Fantom (and in general), I came to the conclusion that the primary goal of asynchronous processing on a server (like node.js) is to avoid allocating threads (and in particular the memory associated with their stacks) to operations that are waiting for I/O.
I think that the best implementation of a system that doesn't hold onto a system thread while waiting for I/O is in fact a stackless thread model, where the state of the thread is stored on the heap and only what is needed is kept. The drawback of that approach is that if you go all in, performance suffers.
What I've been working on, then, is a hybrid model where methods that may block for I/O can be run stacklessly in a new thread, but other methods called from the stackless ones can use a regular thread.
I'm calling it "green threads" for now, which seems to be the common term for application-managed threading.
Currently I have gotten so far as to process this:
    using greenthreads
    using concurrent

    const class TestClass
    {
      @Green
      Void greenMethod()
      {
        echo("Start")
        Actor.sleep(1sec)
        echo("End")
      }
    }
From this, a second method is generated in which Actor.sleep is replaced with asyncSleep. I also added a class GreenThread to create a thread, run the states, handle sleeping, I/O, etc., which is kind of long to post here. My current concept is to have a list of replacements for core methods that are used in the "Green" version of the method.
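The generated method itself is not shown in the thread. Purely as an illustration of the idea, a stackless version of greenMethod could be written by hand as a small state machine, sketched here in Java since the target runtime is the JVM. Every name below (GreenSketch, GreenMethodTask, runDemo) is hypothetical and not taken from the greenthreads pod, the scheduler call is only a stand-in for asyncSleep, and the delay is shortened from 1sec to 50ms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only: these names are not from the greenthreads pod.
class GreenSketch {

    /** Stackless form of greenMethod(): the resume point is a field on the heap. */
    static final class GreenMethodTask implements Runnable {
        private final ScheduledExecutorService scheduler;
        private final List<String> out;
        private final CountDownLatch done;
        private int state = 0; // which step runs next

        GreenMethodTask(ScheduledExecutorService scheduler, List<String> out, CountDownLatch done) {
            this.scheduler = scheduler;
            this.out = out;
            this.done = done;
        }

        @Override
        public void run() {
            switch (state) {
                case 0:
                    out.add("Start");
                    state = 1;
                    // Stand-in for asyncSleep: schedule the resume and return,
                    // so no thread is held while the timer runs.
                    scheduler.schedule(this, 50, TimeUnit.MILLISECONDS);
                    return;
                case 1:
                    out.add("End");
                    done.countDown();
                    return;
                default:
                    throw new IllegalStateException("unknown state " + state);
            }
        }
    }

    /** Runs the task on a single scheduler thread and returns the recorded messages. */
    static List<String> runDemo() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(1);
        try {
            scheduler.execute(new GreenMethodTask(scheduler, out, done));
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("task did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdown();
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The only state that survives the sleep is the int field and the object's references, which is the point of the stackless approach: nothing the method does not need is kept while it waits.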
Any thoughts on this approach?
brian Fri 23 Mar 2012
I personally find little appeal in trying to implement things like green threads and continuations in the compiler, when they really belong in the runtime. To me it almost seems better to find a better runtime, rather than force-fit this stuff into the JVM.
For example, given that Fantom already compiles to JavaScript, has anyone actually tried Fantom in node.js?
Another very interesting aspect of Fantom is that we have our own bytecode. If we were to do something like C# await, I'd actually see if we could tackle that as a special opcode and then delegate to the runtime to figure it out (much better than complicating the compiler).
And another really cool idea might be to build an fcode interpreter in Java or another language. It might not be fast enough for production, but it could be a really cool project to explore new async design patterns.
KevinKelley Fri 23 Mar 2012
Dang, and I was just starting to get my head around / like the idea of / using C#'s async/await as a source-level transformation...
Marking a method async means it runs on an actor, which expects to "return" the method's result as the argument to a continuation closure.
Marking a call to it as await means the rest of the block containing the call gets wrapped into a continuation closure that takes the declared return value of the async method as an argument.
It sounds like it makes sense. I'm thinking there are issues with mutable state, since the continuation block would need access to locals...
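To make the transformation described above concrete, here is a hand-written sketch of what the desugared form might look like, in Java with CompletableFuture standing in for the actor and its continuation. The names AwaitSketch, fetchLength and describe are made up for illustration, and the "imagined source" in the comment is not real Fantom or C# syntax.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of async/await as a source-level rewrite.
class AwaitSketch {

    // Plays the role of an "async" method: it runs on the pool and hands
    // its result to whatever continuation is attached to the future.
    static CompletableFuture<Integer> fetchLength(String text, ExecutorService pool) {
        return CompletableFuture.supplyAsync(text::length, pool);
    }

    // Imagined source:
    //     n := await fetchLength(text)
    //     return prefix + n
    // Desugared by hand: everything after the await becomes the closure body.
    static CompletableFuture<String> describe(String text, ExecutorService pool) {
        String prefix = "length="; // local captured by the continuation
        return fetchLength(text, pool).thenApply(n -> prefix + n);
    }

    static String runDemo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            return describe("fantom", pool).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints "length=6"
    }
}
```

The concern about locals shows up here too: Java only lets the lambda capture `prefix` because it is effectively final. A local that is reassigned after the await point would have to be moved into a heap object that both halves of the method share.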
Pushing it down to the runtimes is a problem because then you have to wait for the runtimes to implement it, or stop using that runtime. There's a lot of code running on the JVM.
We need something -- the system now, where callbacks get nested deeper and deeper, is this century's spaghetti code: not refactorable, not debuggable, unreadable.
dobesv Fri 23 Mar 2012
I agree this would ideally be built into the interpreter/VM/OS, but I'm dreaming of using this same model for JavaScript code too. Might be a bit crazy, though, I don't know.
Currently the advantage of adding this green-threads system on the server side would be to reduce the per-client memory usage of the server because there's a whole stack allocated per thread. In terms of raw I/O performance it might be slower (I did some reading and found some people who measured nio to run more slowly).
I personally had issues with a previous application where threads were adding a lot of memory usage to the application and causing some crashes. However, I suppose it is possible that I could have avoided this by configuring a smaller stack size using -Xss; I never measured how much stack space was truly required for the application. Who knows - a stack 1/10th the size might have done it!
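For reference, -Xss sets the default stack size for the JVM's threads, and java.lang.Thread also has a constructor that takes a per-thread stack size. The sketch below uses that constructor; the class and method names are made up, the 256 KiB figure is arbitrary, and the Javadoc describes the value as a hint that a platform may round up or ignore, so this shows the knob rather than a guaranteed saving.

```java
// Hypothetical sketch: requesting a smaller stack for one thread.
class StackSizeSketch {

    // Simple recursion so the thread actually uses some stack.
    static int depth(int n) {
        return n == 0 ? 0 : 1 + depth(n - 1);
    }

    /** Runs depth(recursionDepth) on a thread created with the requested stack size. */
    static int runWithStack(long stackBytes, int recursionDepth) {
        int[] result = new int[1];
        // Thread(ThreadGroup, Runnable, String, long stackSize): the size is a hint only.
        Thread t = new Thread(null, () -> result[0] = depth(recursionDepth), "small-stack", stackBytes);
        t.start();
        try {
            t.join(); // join makes the worker's write to result[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(runWithStack(256 * 1024, 100)); // prints 100
    }
}
```

Finding the smallest stack that still works for a real application would take measurement, for example by lowering the value until a StackOverflowError appears under a realistic load.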
A fully asynchronous system isn't really necessary - it just needs to be "asynchronous enough" that you're not holding threads and memory for really idle threads but you're also not bloating the program with all kinds of callbacks for no reason.
Anyone have good stories and benchmarks showing the benefits of releasing those threads while waiting for I/O, or is it all a guess at this point?
brian Fri 23 Mar 2012
I'm personally a bit suspicious of this whole async craze. In many cases synchronous threads are fine and can handle a huge volume of I/O. But definitely some use cases with many concurrent WebSocket messages etc. require async.
In some ways actors are like green threads. Millions of actors may be multiplexed onto a pool of X threads. It is just that unlike ad hoc threading we never have to save the state of the stack, because scheduling only switches threads on message processing boundaries. This is why I really like the actor model as a way to handle concurrent tasks without a lot of complexity.
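A rough sketch of that multiplexing, written in Java and not a description of how Fantom's concurrent pod is implemented: each actor below is just a mailbox plus some private state, and a pool thread is borrowed for one message at a time. ActorSketch, CounterActor and runDemo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; not how Fantom's actors are actually implemented.
class ActorSketch {

    static final class CounterActor {
        private final ExecutorService pool;
        private final CountDownLatch pending;
        private final Queue<Integer> mailbox = new ConcurrentLinkedQueue<>();
        private final AtomicBoolean scheduled = new AtomicBoolean(false);
        private long total = 0; // touched by one thread at a time, so no lock

        CounterActor(ExecutorService pool, CountDownLatch pending) {
            this.pool = pool;
            this.pending = pending;
        }

        void send(int msg) {
            mailbox.add(msg);
            trySchedule();
        }

        // At most one processOne() task per actor is queued or running.
        private void trySchedule() {
            if (scheduled.compareAndSet(false, true)) {
                pool.execute(this::processOne);
            }
        }

        // Handles exactly one message, then gives the thread back to the pool.
        // No stack needs saving: the switch happens on a message boundary.
        private void processOne() {
            Integer msg = mailbox.poll();
            if (msg != null) {
                total += msg;
                pending.countDown();
            }
            scheduled.set(false);
            if (!mailbox.isEmpty()) {
                trySchedule();
            }
        }

        long total() {
            return total;
        }
    }

    /** Sends 1..messagesPerActor to every actor and returns the sum of all totals. */
    static long runDemo(int actorCount, int threadCount, int messagesPerActor) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        CountDownLatch pending = new CountDownLatch(actorCount * messagesPerActor);
        List<CounterActor> actors = new ArrayList<>();
        for (int i = 0; i < actorCount; i++) {
            actors.add(new CounterActor(pool, pending));
        }
        try {
            for (int m = 1; m <= messagesPerActor; m++) {
                for (CounterActor a : actors) {
                    a.send(m);
                }
            }
            if (!pending.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        long sum = 0;
        for (CounterActor a : actors) {
            sum += a.total();
        }
        return sum;
    }

    public static void main(String[] args) {
        // 1000 actors share 4 threads; each receives 1..10, so each total is 55.
        System.out.println(runDemo(1000, 4, 10)); // prints 55000
    }
}
```

The per-actor cost here is a queue, a flag and a counter on the heap, with no thread stack held between messages.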
As Kevin was sort of hinting, I like the idea of how we can leverage the existing actor model to avoid some of the ugly, deep nesting that happens with async callbacks.
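qualidafial Fri 23 Mar 2012
A propos: "This is where I typically stab myself repeatedly in the ears with a fork until I stop hearing you." :D
KevinKelley Fri 23 Mar 2012
That's a beautiful rant. "What was that last part, again?"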
dobesv Sat 24 Mar 2012
brian:
Asynchronous programming using callbacks is definitely not a clear win. Actors are not any better. Manually converting a linear algorithm to use actors is just as (or even more) painful.
In my mind I would rather have millions of threads than millions of actors (or millions of callbacks registered). It's fine if that threaded code only shares immutable data between threads the way Actors work now. If they are green threads they can be multiplexed onto a smaller thread pool too...
dobesv Sat 24 Mar 2012
Anyway, I think what I'll do is build out enough async stuff to try and run some benchmarks. I think having some measurement of this would be better than the rumours and myths I've been working with so far ...
dobesv Mon 26 Mar 2012
OK, really producing a benchmark is too much for me to invest in right now. One thing I have done is a bunch of reading, and I think that except for the most demanding applications, thread-per-connection is fine, and thread-per-request (i.e. no thread for idle clients, like Jetty/Glassfish) is great.
It seems like the actual memory overhead of a thread on a 64-bit system is in the tens of KiB, maybe 40 KiB on my Windows 7 machine (measured approximately by creating ~200,000 threads and seeing a resident size of ~8 GB). So if you want to handle 1 million connections you might need 40 GB of RAM just for thread overhead. On a thread-per-request system like Jetty this would mean you actually have 1 million active HTTP requests being processed, so your database backend, CPUs, network, etc. had better have similar capacity.
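The estimate above is simple division and multiplication. Spelled out as a small Java sketch (the class and method names are made up, and the inputs are the approximate figures quoted in the post, treated as decimal GB):

```java
// Back-of-the-envelope check of the per-thread memory estimate.
class ThreadMemoryEstimate {

    /** Approximate overhead per thread, from a measured resident size. */
    static long perThreadBytes(long residentBytes, long threadCount) {
        return residentBytes / threadCount;
    }

    /** Memory needed for thread overhead alone at a given connection count. */
    static long totalBytes(long perThreadBytes, long connections) {
        return perThreadBytes * connections;
    }

    public static void main(String[] args) {
        long perThread = perThreadBytes(8_000_000_000L, 200_000L); // ~8 GB over ~200,000 threads
        long total = totalBytes(perThread, 1_000_000L);            // one thread per connection
        System.out.println(perThread); // prints 40000 (about 40 KB per thread)
        System.out.println(total);     // prints 40000000000 (about 40 GB)
    }
}
```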
When you do need huge numbers of simultaneously connected clients, the best way to scale might be to use an external pub/sub server of some sort and route the millions of connections through there. It could be written in C and optimized to hell while you write nice-looking linear code in your "normal" application server code.