#235 WideFinder-2 in Fan (version 2)

brian Wed 4 Jun 2008

My second Fan version of Tim Bray's Wide Finder 2 project splits the program into two threads: one reads lines, and the other does the totalization and reporting. The two threads communicate via Fan's built-in message passing; in this case the messages are the lines being read. This change adds 24 lines to the program.
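
For readers who don't know Fan, here is a rough Python analog of the reader/totalizer split. It is only a sketch of the structure, not a port: `queue.Queue` stands in for Fan's message passing, the names `totalizer` and `run` are made up for illustration, and the tally (counting the first token of each line) is a placeholder for the real totalization. CPython's global interpreter lock also means this shows the shape of the design rather than any speedup.

```python
import queue
import threading


def totalizer(inbox, results):
    """Consume lines until the None sentinel arrives, then publish the totals."""
    hits = {}
    while True:
        line = inbox.get()
        if line is None:  # sentinel: the reader has no more lines
            break
        first = line.split(" ")[0]  # placeholder tally: count by first token
        hits[first] = hits.get(first, 0) + 1
    results.update(hits)


def run(lines):
    """Reader side: send every line to the worker, then the end-of-input marker."""
    inbox = queue.Queue()
    results = {}
    worker = threading.Thread(target=totalizer, args=(inbox, results), name="totalizer")
    worker.start()
    for line in lines:
        inbox.put(line)
    inbox.put(None)  # same role as sending null in the Fan version
    worker.join()    # wait for the worker to finish before reading results
    return results


if __name__ == "__main__":
    print(run(["a x", "b y", "a z"]))
```

The reader never touches the totals; it only hands lines over and waits, which is the same division of labor as the two Fan threads.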

While a two-thread design won't take advantage of an 8-core machine, it happens to suit my dual-core PC (running XP) and keeps both cores humming. With Server HotSpot 1.6, this program processes the 100k dataset in 4.5 sec, the same speed as the original single-threaded program. What is really interesting is that when I drop back to Client HotSpot, the multi-threaded version runs faster: 4.0 sec versus 5.5 sec for the single-threaded version, a 27% reduction in run time. I have no idea why Client HotSpot exhibits better multi-threading performance than Server HotSpot. Seems pretty weird to me.
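
To spell out where the 27% comes from, here is the arithmetic as a small Python snippet using the two Client HotSpot timings quoted above. The figure is the relative reduction in elapsed time; expressed as a throughput ratio the same numbers give a somewhat larger-looking value, which the snippet also computes. The variable names are mine.

```python
single_threaded_sec = 5.5  # Client HotSpot, original program
multi_threaded_sec = 4.0   # Client HotSpot, two-thread version

# Fraction of the original run time that was saved.
reduction = (single_threaded_sec - multi_threaded_sec) / single_threaded_sec

# How many times faster the new version is.
speedup = single_threaded_sec / multi_threaded_sec

print(f"run time reduction: {reduction:.1%}")
print(f"speedup factor: {speedup:.3f}x")
```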

**
** WideFinder2 Version 2 - Two threads (reader, totalizer)
**
class WideFinder
{
  Void main()
  {
    t1 := Duration.now
    worker := Thread.make("totalizer", &workerRun).start
    Sys.args[0].toUri.toFile.eachLine |Str line|
    {
      worker.sendAsync(line)
    }
    worker.sendAsync(null)  // null message signals end of input
    worker.join
    t2 := Duration.now
    echo("Total: ${(t2-t1).toMillis}ms")
  }

  static Void workerRun(Thread t)
  {
    totalizer := Totalizer.make
    // handle each message in turn; a null message means the reader is done
    t.loop |Obj msg|
    {
      if (msg == null) { totalizer.reports; t.stop }
      else { totalizer.process(msg) }
    }
  }
}

class Totalizer
{
  Str:Int hits    := Str:Int[:] { def = 0 }
  Str:Int bytes   := Str:Int[:] { def = 0 }
  Str:Int s404s   := Str:Int[:] { def = 0 }
  Str:Int clients := Str:Int[:] { def = 0 }
  Str:Int refs    := Str:Int[:] { def = 0 }
  Regex re := Regex.fromStr(r"^/ongoing/When/\d\d\dx/\d\d\d\d/\d\d/\d\d/[^ .]+$")

  Void process(Str line)
  {
    // space-separated fields: 0=client, 5="GET, 6=URI, 8=status, 9=bytes, 10=referrer
    toks := line.split(" ")
    if (toks[5] != "\"GET") return
    client := toks[0]; u := toks[6]; status := toks[8]
    bytes := toks[9];  ref := toks[10]
    if (status == "200") record(client, u, bytes.toInt, ref)
    else if (status == "304") record(client, u, 0, ref)
    else if (status == "404") s404s[u]++
  }

  Void reports()
  {
    report("Top URIs by hit", hits)
    report("Top URIs by Megabytes", bytes, true)
    report("Top 404s", s404s)
    report("Top client addresses", clients)
    report("Top referrers", refs)
  }

  Void record(Str client, Str u, Int size, Str ref)
  {
    bytes[u] += size
    if (!re.matches(u)) return
    hits[u]++
    clients[client]++
    if (ref != "\"-\"" && !ref.contains("http://www.tbray.org/ongoing/"))
      refs[ref[1..-2]]++  // lose the quotes
  }

  Void report(Str label, Str:Int map, Bool isBytes := false)
  {
    // find top 10 (assumes the map has at least 10 entries; ties at the
    // threshold can yield more than 10 rows)
    threshold := map.values.sortr[9]
    top := map.findAll |Int v->Bool| { return v >= threshold }
    topKeys := top.keys.sortr |Str a, Str b->Int| { return top[a] <=> top[b] }

    echo(label)
    topKeys.each |Str key, Int index|
    {
      pkey := key.size > 60 ? key[0 .. 59] + "..." : key
      val  := top[key]
      if (isBytes) val /= 1024*1024
      echo("  ${val.toStr.justr(5)} $pkey")
    }
    echo("")
  }
}
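
As a cross-check of the parsing and top-10 logic, here is a rough Python sketch of the same two steps. It is not a port of the Fan program: the sample log line is invented for illustration (it only follows the field layout that `process` relies on), and the helper names `parse` and `top_n` are mine. Unlike the Fan code, it guards against short lines, non-numeric byte counts, and maps with fewer than `n` entries.

```python
import re
from collections import defaultdict

# Same pattern as the Fan version: article URIs under /ongoing/When/.
ARTICLE_RE = re.compile(r"^/ongoing/When/\d\d\dx/\d\d\d\d/\d\d/\d\d/[^ .]+$")


def parse(line):
    """Return (client, uri, status, size, referrer) for GET requests, else None."""
    toks = line.split(" ")
    if len(toks) < 11 or toks[5] != '"GET':  # length guard is an addition
        return None
    size = int(toks[9]) if toks[9].isdigit() else 0  # "-" becomes 0 (addition)
    return toks[0], toks[6], toks[8], size, toks[10].strip('"')


def top_n(counts, n=10):
    """Return (key, value) pairs whose value is at least the n-th largest value."""
    if not counts:
        return []
    ranked = sorted(counts.values(), reverse=True)
    threshold = ranked[min(n, len(ranked)) - 1]  # clamp for small maps
    top = {k: v for k, v in counts.items() if v >= threshold}
    return sorted(top.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical log line, made up for this example.
SAMPLE = ('203.0.113.7 - - [01/Oct/2006:06:33:45 -0700] '
          '"GET /ongoing/When/200x/2006/09/29/Dynamic-IDE HTTP/1.1" '
          '200 8500 "http://example.com/" "Mozilla/5.0"')

if __name__ == "__main__":
    hits = defaultdict(int)
    rec = parse(SAMPLE)
    if rec and rec[2] == "200" and ARTICLE_RE.match(rec[1]):
        hits[rec[1]] += 1
    for uri, count in top_n(hits):
        print(f"{count:>5} {uri}")
```

Like the threshold approach in `report`, `top_n` keeps every entry tied with the n-th value, so it can return more than `n` rows.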
