My second Fan version of Tim Bray's Wide Finder 2 project splits the program into two threads. One thread reads lines, and the other thread does the totalization and reporting. The two threads communicate via Fan's built-in message passing - in this case the messages are the lines being read. This improvement adds 24 lines to the program.
While not suitable for an 8-core machine, it happens to suit my dual-core PC (running XP) and keeps both cores humming. When run with Server HotSpot 1.6, this program processes the 100k dataset in 4.5 sec. That is the same speed as the original single-threaded program. What is really interesting is that when I drop back to Client HotSpot, this multi-threaded version runs faster, at 4.0 sec versus 5.5 sec for the single-threaded version - a 27% reduction in run time. I have no idea why Client HotSpot exhibits better multi-threading performance than Server HotSpot. Seems pretty weird to me.
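For reference, the 27% figure matches the relative reduction in run time under Client HotSpot, computed from the two timings quoted above. The same timings expressed as a speedup ratio come to roughly 1.4x. This is only arithmetic on the reported numbers, not an additional measurement:

```latex
\[
\frac{5.5 - 4.0}{5.5} \approx 0.27,
\qquad
\frac{5.5}{4.0} = 1.375
\]
```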
brian Wed 4 Jun 2008
**
** WideFinder2 Version 2 - Two threads (reader, totalizer)
**
class WideFinder
{
  Void main()
  {
    t1 := Duration.now
    worker := Thread.make("totalizer", &workerRun).start
    Sys.args[0].toUri.toFile.eachLine |Str line| { worker.sendAsync(line) }
    worker.sendAsync(null)
    worker.join
    t2 := Duration.now
    echo("Total: ${(t2-t1).toMillis}ms")
  }

  static Void workerRun(Thread t)
  {
    totalizer := Totalizer.make
    t.loop |Obj msg|
    {
      if (msg == null) { totalizer.reports; t.stop }
      else { totalizer.process(msg) }
    }
  }
}

class Totalizer
{
  Str:Int hits := Str:Int[:] { def = 0 }
  Str:Int bytes := Str:Int[:] { def = 0 }
  Str:Int s404s := Str:Int[:] { def = 0 }
  Str:Int clients := Str:Int[:] { def = 0 }
  Str:Int refs := Str:Int[:] { def = 0 }
  Regex re := Regex.fromStr(r"^/ongoing/When/\d\d\dx/\d\d\d\d/\d\d/\d\d/[^ .]+$")

  Void process(Str line)
  {
    toks := line.split(" ")
    if (toks[5] != "\"GET") return
    client := toks[0]; u := toks[6]; status := toks[8]
    bytes := toks[9]; ref := toks[10]
    if (status == "200") record(client, u, bytes.toInt, ref)
    else if (status == "304") record(client, u, 0, ref)
    else if (status == "404") s404s[u]++
  }

  Void reports()
  {
    report("Top URIs by hit", hits)
    report("Top URIs by Megabytes", bytes, true)
    report("Top 404s", s404s)
    report("Top client addresses", clients)
    report("Top referrers", refs)
  }

  Void record(Str client, Str u, Int size, Str ref)
  {
    bytes[u] += size
    if (!re.matches(u)) return
    hits[u]++
    clients[client]++
    if (ref != "\"-\"" && !ref.contains("http://www.tbray.org/ongoing/"))
      refs[ref[1..-2]]++ // lose the quotes
  }

  Void report(Str label, Str:Int map, Bool isBytes := false)
  {
    // find top 10
    threshold := map.values.sortr[9]
    top := map.findAll |Int v->Bool| { return v >= threshold }
    topKeys := top.keys.sortr |Str a, Str b->Int| { return top[a] <=> top[b] }
    echo(label)
    topKeys.each |Str key, Int index|
    {
      pkey := key.size > 60 ? key[0 .. 59] + "..." : key
      val := top[key]
      if (isBytes) val /= 1024*1024
      echo(" ${val.toStr.justr(5)} $pkey")
    }
    echo("")
  }
}