#344 Bug: Concurrency problem in Thread

cgrinds Mon 25 Aug 2008

I've been writing a Fan based solution to the Santa Claus problem, which I plan on posting for comment soon. While writing a solution to this problem I ran into a serious threading problem. The gist of the problem is pretty simple: I have 2 actors communicating with each other via sendAsync, more or less ping/ponging messages between the two. The main thread joins on one of the actors to keep the program from exiting. The bug is one of the actors sends a message to the other actor but it doesn't receive it.

I was able to track down the problem and fix it. The issue is the use of notify in enqueue. As I'm sure you know, using notify instead of notifyAll can be dicey unless all waiting threads are waiting on the same condition. In this case they aren't - the main thread that joined isn't waiting on the same condition. In my case, the notify in enqueue was waking the main thread that was waiting on the state == RUNNING condition.

Here's a sample that recreates the problem.

class ThreadBug {
  static Void main() {
      ThreadBug().go
  }  
  Void go() {
    sole := Sole().start
    edna := sole.peer.start
    sole.sendAsync(2)
    sole.join
  }
}
const class Sole : Thread {
  const Peer peer
  new make() : super.make("sole") {
    peer = Peer("peer", this)
  }
  override Obj run() {
    loop |Obj msg->Obj| {
      echo("sole: received $msg")
      if(msg is Int)
        peer.sendAsync([this, msg].toImmutable)
      else {
        Int count := msg->get(1) as Int
        if(count <= 0) return this
        peer.sendAsync([this, count - 1].toImmutable)
      }
      return this
    }
    return null
  }
}

const class Peer : Thread {
  const Sole sole
  new make(Str name, Sole sole) : super.make(name) {
    this.sole = sole
  }
  override Obj run() {
    loop |Obj msg->Obj| {
      echo("$name: received $msg")
      Int count := msg->get(1) as Int
      if(count <= 0) return this
      sole.sendAsync([this, count - 1].toImmutable)
      return msg
    }
    return this
  }
}

The correct output should be:

sole: received 2
peer: received [sole, 2]
sole: received [peer, 1]
peer: received [sole, 0]

with the existing version of 1.0.31 the output instead is"

sole: received 2
peer: received [sole, 2]

My fix is to change the notify in enqueue() to notifyAll(). The notify in dequeue() should be changed too. I didn't check all the uses of notify() in Thread.java but suspect they should be changed too.

=== modified file 'Thread.java'
--- Thread.java	2008-07-27 08:19:25 +0000
+++ Thread.java	2008-08-25 03:17:48 +0000
@@ -725,7 +725,7 @@
     size--;
 
     // notify enqueue (if blocked on max queue)
-    notify();
+    notifyAll();
 
     // return message
     return m;
@@ -748,7 +748,7 @@
     if (size > peek) peek = size;
 
     // notify get thread
-    notify();
+    notifyAll();
   }

brian Mon 25 Aug 2008

Chris,

I agree completely with your evaluation of the problem - we definitely shouldn't be using notify with two different wait conditions. A case of me being a little too eager for optimization. I checked your patches in. Thanks a lot for taking the time to dig into it!

Login or Signup to reply.