#1752 How to improve reading web pages?

Xan Fri 13 Jan 2012

Hi,

I'm new to the Fantom language. Yesterday I wrote code that simply prints the size of the web pages passed as arguments. The code uses WebClient. I think it's too slow compared with, for example, Scala (I use scalac).

I tested with this list of pages: {http://www.yahoo.com http://www.ubuntu.com http://www.fsf.org http://www.redhat.com}.

The results are that the Fantom code takes around 8 seconds and the Scala code takes about 6 seconds. Why is Scala faster than Fantom? How can I improve this?

The code:

fantom:

using web::WebClient

class HelloWorld {
  static Void main(Str[] args) {
    try {
      args.each {
        response := WebClient(`$it`).getStr.size
        echo("The size of " + it + " is " + response + " bytes.\n")
      }
    }
    catch (Err e) {
      echo(e)
    }
  }
}

scala:

import java.net.URL
import scala.io.Source

object Spider {
  def main(args: Array[String]): Unit = {
    if (args.length > 0) {
      for (arg <- args) {
        val u = new URL(arg)
        val in = Source.fromURL(u)
        print("Size of: " + arg + " is " + in.getLines().mkString.length + " bytes.\n")
      }
    }
    else
      println("No arguments")
  }
}

I use: time <command> <list of pages>

Thanks in advance, Xan.

brian Fri 13 Jan 2012

The results are that the Fantom code takes around 8 seconds and the Scala code takes about 6 seconds. Why is Scala faster than Fantom? How can I improve this?

It's sort of an odd test because in both cases the actual performance is all buried in low-level Java code, not the languages themselves. I would say performance in your case is going to be hugely influenced by internal buffer sizes. You might try switching to a Buf, or even pre-allocating the size of the Buf based on the Content-Length header.
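
As a minimal sketch of the Buf suggestion (not a benchmarked fix), the original loop could call getBuf instead of getStr. That reads the body as raw bytes without decoding it to a Str, so the reported size is a byte count rather than a character count. The class name SizeWithBuf is made up for illustration, and pre-sizing the buffer from Content-Length is left out here.

```fantom
using web::WebClient

// Hypothetical class name, used only for this sketch
class SizeWithBuf
{
  static Void main(Str[] args)
  {
    args.each |Str arg|
    {
      // getBuf reads the response body into an in-memory byte buffer
      // without decoding it to a Str, so size is a byte count
      size := WebClient(arg.toUri).getBuf.size
      echo("The size of $arg is $size bytes.")
    }
  }
}
```

Whether this is measurably faster than getStr would have to be checked with a timing taken inside the running VM.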

andy Fri 13 Jan 2012

@Xan - just FYI - indent your code samples two spaces to get the nice monospaced formatting you see elsewhere (see docs):

using web::WebClient

class HelloWorld {
  static Void main(Str[] args) {
    try {
      args.each {
        response := WebClient(`$it`).getStr.size
        echo("The size of " + it + " is " + response + " bytes.\n")
      }
    }
    catch (Err e) { echo(e) }
  }
}

go4 Sat 14 Jan 2012

I also have doubts about this. The draft proxy is very slow on Ajax calls. Do you do any buffering for ChunkInStream?

Xan Sat 14 Jan 2012

@andy: updated. Thanks for the comment on the code. I read the Fandoc Cheatsheet in the editor and I deduced I had to put ">" on every line.

@brian: can you explain this to me:

I would say performance in your case is going to be hugely influenced by internal buffer sizes

in other words? I'm a newbie to the JVM. What really happens internally? Why is Scala always faster than Fantom in my tests? Does Scala implicitly use a buffer in my code?

Xan Tue 17 Jan 2012

Anyone?

I want to read web pages very fast. How can I improve that code?

Thanks in advance, Xan.

brian Tue 17 Jan 2012

I want to read web pages very fast. How can I improve that code?

In the end, what matters is what you are going to do with the content. For example, if you are going to parse XML you would end up with something like this:

XParser(WebClient(`http://foo.com/`).getIn).parseDoc

Although XParser is really just for true XML and won't handle most HTML sites.

DanielFath Tue 17 Jan 2012

Yeah, Fantom needs a more relaxed parser. But if you value conformity to HTML5 over speed, you can use Jsoup via Java FFI. IIRC that's exactly what Tales does.

Xan Tue 17 Jan 2012

Okay, thanks. Could the return type (getStr, getBuf or getIn) also improve it?

Thanks, Xan.

Xan Thu 1 Mar 2012

Returning to my original question: what is the reason that Fantom is slower than Scala in this topic? I'm really surprised and I don't understand it. Can you explain why? I need high performance.

brian Thu 1 Mar 2012

Returning to my original question: what is the reason that Fantom is slower than Scala in this topic?

Your original times were 8sec vs 6sec - there isn't a lot of variation there. It's probably just different code paths, but in the end they are both sourced by java.io.InputStream. If you really are concerned about performance, I'd drop to as low-level as possible and use Java FFI directly.

More importantly if you are really concerned with performance, reading an entire stream into memory as a String is probably not what you want. For example, I assume you actually want to do something with this data like parse it, etc?
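
To illustrate handling the response without holding all of it in memory, here is a hedged sketch that streams the body through a small reusable buffer and only counts bytes. The class name StreamSize and the 4096-byte chunk size are arbitrary choices for illustration; a real program would hand each chunk to a parser or write it to a file instead of discarding it.

```fantom
using web::WebClient

// Hypothetical class name, used only for this sketch
class StreamSize
{
  static Void main(Str[] args)
  {
    chunkSize := 4096  // arbitrary chunk size chosen for the sketch
    args.each |Str arg|
    {
      c := WebClient(arg.toUri)
      try
      {
        in := c.getIn
        chunk := Buf(chunkSize)
        total := 0
        while (true)
        {
          // clear resets the buffer so each read reuses the same memory;
          // readBuf returns null once the end of the stream is reached
          n := in.readBuf(chunk.clear, chunkSize)
          if (n == null) break
          total += n
        }
        echo("The size of $arg is $total bytes.")
      }
      finally
      {
        c.close
      }
    }
  }
}
```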

alex_panchenko Thu 1 Mar 2012

@Xan I'm afraid you are measuring JVM startup time and internet connection bandwidth.

Out of curiosity, what do you get with time wget <args> ?

Xan Tue 6 Mar 2012

@alex_panchenko: I use darkhttpd to serve a local directory:

$ darkhttpd ./
darkhttpd/1.8, copyright (c) 2003-2011 Emil Mikulic.
listening on: http://0.0.0.0:8080/

Then, with time wget I get:

$ time wget localhost:8080
--2012-03-06 19:37:12--  http://localhost:8080/
Resolving localhost... ::1, 127.0.0.1
Connecting to localhost|::1|:8080... failed: Connection refused.
Connecting to localhost|127.0.0.1|:8080... connected.
HTTP request sent, awaiting response... 200 OK
Length: 595 [text/html]
Saving to: 'index.html'

100%[===================================================================================================================>] 595         --.-K/s   in 0s

2012-03-06 19:37:12 (16.3 MB/s) - 'index.html' saved [595/595]


real	0m0.012s
user	0m0.000s
sys	0m0.008s

So very quick: 0.012s.

With time fan myfile.fan I get:

$ time fan mget-versió-1.fan http://localhost:8080
The size of http://localhost:8080 is 595 bytes.


real	0m4.327s
user	0m4.744s
sys	0m0.248s

and with scala:

$ time scala Spider http://localhost:8080
Size of: http://localhost:8080 is 571 bytes.

real	0m1.123s
user	0m1.056s
sys	0m0.164s

So there is the evidence! Try it yourself.

Thanks to all of you for the feedback.

@brian: I only want to download the file; I then process it asynchronously in another process. This is one "module": simply download a URI from the web as fast as possible. Perhaps the VM is to blame for the slower behaviour?

brian Tue 6 Mar 2012

I think, as @alex_panchenko mentioned, what you are really measuring there is JVM startup for the Scala and Fantom runtimes, which probably dwarfs the actual time spent reading the file over HTTP. I would expect Fantom to have a longer startup time than Scala since we pre-load a lot of stuff.

If you want to measure actual download time, I would recommend:

  1. running through one WebClient access to ensure everything is loaded into memory
  2. measuring inside of Fantom itself

// do something to prep VM
t1 := Duration.now
// do something
t2 := Duration.now
echo((t2-t1).toLocale)
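
Put together, the two steps might look like the following sketch, which makes one untimed warm-up request per URI and then times a second request from inside Fantom. The class name TimedGet is hypothetical, and a single timed request is still a noisy measurement, so repeating it several times would give a better picture.

```fantom
using web::WebClient

// Hypothetical class name, used only for this sketch
class TimedGet
{
  static Void main(Str[] args)
  {
    args.each |Str arg|
    {
      uri := arg.toUri

      // Step 1: untimed request so the needed classes are loaded first
      WebClient(uri).getBuf()

      // Step 2: time a second request from inside Fantom
      t1 := Duration.now
      size := WebClient(uri).getBuf.size
      t2 := Duration.now

      elapsed := (t2 - t1).toLocale
      echo("$arg: $size bytes in $elapsed")
    }
  }
}
```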

Xan Tue 6 Mar 2012

Mmm... okay, I understand now. With the localhost test I am not measuring internet connection bandwidth, but I am measuring JVM startup.

Your trick is interesting, but I would have to adapt it to the asynchronous case if I use Actors, wouldn't I?

Thanks for everything,

alex_panchenko Wed 7 Mar 2012

There is also compilation time. Scala uses a trick to reduce it: it starts a compile server on the first execution, so subsequent scala invocations delegate compilation to the already initialized compiler. That's why the total time is smaller.

Xan Fri 9 Mar 2012

You are right: ~160 ms to fetch the localhost home page, and 60 ms and 21 ms with the Go and D languages, respectively (obviously compiled languages are faster than the JVM, but it's still great performance).

Thanks,
