#550 Localization enhancements

brian Sun 26 Apr 2009

Way back when, Andy and I talked about a painless way to use localized strings via interpolation. To make localization easy to both read and write, I propose an extension to interpolation which takes a localization key rather than a variable:

// string literal        =>  translates into
"<h1>$^title</h1>"       =>  "<h1>" + EnclosingType#.get("title") + "</h1>"
"<h1>$^pod::title</h1>"  =>  "<h1>" + Pod.find("pod").get("title") + "</h1>"

The abstract format is $^key for a localization string in the current pod or $^podName::key for a fully qualified localization string. I am using the caret just as an example; other options might be:

"<h1>$~key</h1>"
"<h1>$!key</h1>"
"<h1>$[key]</h1>"
"<h1>$=key</h1>"
"<h1>$@key</h1>"
"<h1>$:key</h1>"

I think the caret looks good up next to the dollar sign, but suggestions welcome.
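A minimal sketch of the $^key expansion, written in Python rather than Fan because the feature is only a proposal. It expands the references at runtime, whereas the proposal has the compiler rewrite the literal. The PROPS table, lookup and expand are hypothetical names, and the sketch assumes keys and pod names are plain identifiers.

```python
import re

# Hypothetical stand-in for the per-pod localization props files.
PROPS = {
    "myPod": {"title": "Welcome"},
    "otherPod": {"title": "Shared title"},
}

# Matches $^key or $^pod::key (assumes identifier-like names).
LOC_REF = re.compile(r"\$\^(?:(\w+)::)?(\w+)")

def lookup(pod, key):
    # Fall back to the fully qualified key when nothing is found.
    return PROPS.get(pod, {}).get(key, f"{pod}::{key}")

def expand(literal, current_pod):
    def repl(m):
        # An unqualified key resolves against the enclosing pod.
        pod = m.group(1) or current_pod
        return lookup(pod, m.group(2))
    return LOC_REF.sub(repl, literal)

print(expand("<h1>$^title</h1>", "myPod"))
print(expand("<h1>$^otherPod::title</h1>", "myPod"))
```

Any of the alternative sigils listed above would only change the first two characters of the pattern.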

Another feature which we don't currently support is the ability to insert variables into the localization string. I propose an extension to the existing APIs to pass an array of positional arguments:

// existing methods
Str? Locale.get(Str pod, Str key, Str? def := "key")
Str? Pod.get(Str key, Str? def := "key")
Str? Type.get(Str key, Str? def := "key")

// proposed signature:
Str? Locale.get(Str pod, Str key, Str? def := "key", Obj[]? args := null)
Str? Pod.get(Str key, Str? def := "key", Obj[]? args := null)
Str? Type.get(Str key, Str? def := "key", Obj[]? args := null)

For example:

// localization props file
cannotOpenFile=Cannot open file '$0' (error code $1)

// usage
filename := "somefile.txt"
errCode := 404
type.get("cannotOpenFile", null, [filename, errCode])

// outputs
Cannot open file 'somefile.txt' (error code 404)

I might switch around def/args. The way that signature works today, null might be a valid value for def, yet the default if the key is not found is to return the fully qualified key (which makes debugging easy).
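To make the positional-argument behavior concrete, here is a rough Python model of the proposed get. It is not the Fan API: PROPS and the _USE_KEY sentinel are hypothetical. The sketch assumes that placeholders with no matching argument are left untouched and that the default itself is not substituted.

```python
import re

# Hypothetical props contents, taken from the example above.
PROPS = {"cannotOpenFile": "Cannot open file '$0' (error code $1)"}

# Sentinel meaning "no default given, so fall back to the key itself".
_USE_KEY = object()

def get(key, default=_USE_KEY, args=None):
    template = PROPS.get(key)
    if template is None:
        # Not found: return the key unless the caller supplied a default
        # (which may legitimately be None).
        return key if default is _USE_KEY else default
    if args is None:
        return template

    def repl(m):
        i = int(m.group(1))
        # Leave $N as-is when there is no argument at that position.
        return str(args[i]) if i < len(args) else m.group(0)

    return re.sub(r"\$(\d+)", repl, template)

print(get("cannotOpenFile", args=["somefile.txt", 404]))
```

With keyword arguments the caller can skip def entirely, which is the awkwardness that swapping def and args would address in a purely positional signature.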

tactics Mon 27 Apr 2009

I don't have an opinion on this feature overall, but as long as you're messing around with string interp again, I want to bring up URL interp. The compiler currently disallows it, but throws an error suggesting it will eventually be available. I think it would be pretty nice to have.

(I will say I do like the $^ syntax for this proposal).

brian Mon 27 Apr 2009

I don't have an opinion on this feature overall, but as long as you're messing around with string interp again, I want to bring up URL interp.

Actually I was planning on adding that feature as part of this work (just forgot to mention it). It will work just like string interpolation:

uri := `somedir/${typeName}.fan`

andy Mon 27 Apr 2009

I'd vote for one of these:

"<h1>$^key</h1>"
"<h1>$~key</h1>"
"<h1>$!key</h1>"

jodastephen Mon 27 Apr 2009

What would you think about the idea of providing both the localized key and a default text form? This way, you don't have to break step while coding/reading to go to another file to see what the typical message is.

"<h1>$~{page.title:Home page}</h1>"

JohnDG Mon 27 Apr 2009

And one more feature on Stephen's suggestion:

name := "Bob"
"<h1>$~{page.title:Home page of ${name}}</h1>"

which is translated into:

name := "Bob"
type.get("page.title", "Home page of ${name}", [name])

JohnDG Mon 27 Apr 2009

uri := `somedir/${typeName}.fan`

Will this do URL escaping?

tactics Mon 27 Apr 2009

Will this do URL escaping?

By URL escaping, you mean this, right:

typeName := "foo/bar"
uri := `somedir/${typeName}.fan`
echo(uri)

Prints:

somedir/foo\/bar.fan

brian Mon 27 Apr 2009

Will this do URL escaping?

My thinking is that URI interpolation doesn't do any escaping, it is simply sugar for:

`a${b}c`  =>  ("a" + b + "c").toUri

Not sure if you are familiar with how I did URIs, but the normalized form used in Fan doesn't use the % encoding; we just backslash escape the general delimiters, for example if a pound sign is used in a filename instead of as the fragment identifier. Take a look at the sys::Uri class header for details.

What would you think about the idea of providing both the localized key and a default text form? This way, you don't have to break step while coding/reading to go to another file to see what the typical message is.

Are you thinking about this as a comment or as the actual default string value?

The problem here is that fundamentally you want to pull strings out-of-band so that they are easily translated into alternate languages. So from a storage perspective it is nice to have them not embedded in source code (even the defaults). But obviously you can dream of all sorts of tools to help automate the process.

Personally I think from a file storage perspective it makes sense to use indirection and actually store all the localized default strings in a separate file. But since Fan standardizes how this all works, IDEs could do a lot to automate the process and do in-place cross references.

tactics Mon 27 Apr 2009

With URI interpolation, I was thinking that an interpolated string should be escaped as a whole, while for an interpolated URI it might make sense to join the pieces together through Uri.plus.

So for example

file := "dir/file.txt"
`http://example.com/$file`

Would produce the (most likely incorrect) URL: http://example.com/dir\/file.txt

While

file := `dir/file.txt`
`http://example.com/$file`

Would be http://example.com/dir/file.txt
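The two readings of URI interpolation discussed so far can be contrasted in a small Python model. Everything here is a stand-in: the Uri class is not sys::Uri, the set of backslash-escaped delimiters is an assumption, and plain concatenation stands in for Uri.plus, which in Fan does more than append text.

```python
class Uri:
    """Minimal stand-in for a URI value; not the real sys::Uri."""
    def __init__(self, text):
        self.text = text

# Assumed set of delimiters that get backslash-escaped.
DELIMS = "/?#"

def escape(s):
    return "".join("\\" + c if c in DELIMS else c for c in s)

def interp_plain(prefix, value):
    # First reading: pure sugar for (prefix + value).toUri, no escaping.
    text = value.text if isinstance(value, Uri) else str(value)
    return Uri(prefix + text)

def interp_typed(prefix, value):
    # Second reading: escape interpolated strings, append URIs unchanged.
    if isinstance(value, Uri):
        return Uri(prefix + value.text)
    return Uri(prefix + escape(str(value)))

base = "http://example.com/"
print(interp_typed(base, "dir/file.txt").text)
print(interp_typed(base, Uri("dir/file.txt")).text)
print(interp_plain(base, "dir/file.txt").text)
```

Under the typed reading the result depends on the static type of the interpolated expression, which is the trade-off against the simpler "just concatenate" rule.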

jodastephen Mon 27 Apr 2009

Are you thinking about this as a comment or as the actual default string value?

No I mean the actual default value.

I've found that most of the time, the developers will end up writing the default language message, which will be in English for most English speaking developers. What the developers won't be responsible for is translating it to other languages.

As such, it's quite a burden to have to create and manage another file as a developer when you won't ever use it for the translations.

Now, obviously at some point a translator does come in and add in the foreign language text. To do this, there would need to be a tool to extract out the default text into a standard file format.

Note that this approach allows even the default English text to be overridden by the translator if desired. All in all, a powerful (and optional) approach.
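As a rough illustration of such an extraction tool, the Python sketch below scans source text for the $~{key:default} form and emits props lines. The function names are hypothetical. Rewriting ${name} references to positional $0, $1 placeholders is an assumption about what translators would receive, not something settled in this thread.

```python
import re

# $~{key:default}, where the default may contain ${name} references.
INLINE = re.compile(r"\$~\{([\w.]+):((?:\$\{\w+\}|[^{}])*)\}")

def extract_defaults(source):
    props = {}
    for key, default in INLINE.findall(source):
        names = []

        def to_positional(m):
            # Number each distinct variable in order of first appearance.
            if m.group(1) not in names:
                names.append(m.group(1))
            return "$" + str(names.index(m.group(1)))

        props[key] = re.sub(r"\$\{(\w+)\}", to_positional, default)
    return props

def to_props_text(props):
    # One key=value line per entry, sorted for stable output.
    return "\n".join(f"{k}={v}" for k, v in sorted(props.items()))

source = '''
echo("<h1>$~{page.title:Home page}</h1>")
echo("$~{err.fopen:Cannot open file ${name}}")
'''
print(to_props_text(extract_defaults(source)))
```

The same scan could run inside the compiler instead of as a separate tool; the output format would not change.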

brian Mon 27 Apr 2009

I've found that most of the time, the developers will end up writing the default language message, which will be in English for most English speaking developers.

I can see a lot of value in that. Two things which pop into my mind:

  • often I have common localized strings in another pod, so we still need the existing mechanism (then we'd have two ways to do it)
  • in the case of Flux, we use the locale files for much more than just text, for example:
    cut.name=Cut
    cut.icon=fan:/sys/pod/icons/x16/cut.png
    cut.accelerator=Ctrl+X

So I can definitely see the value in doing something like that: relying on the compiler to pull the strings out into locale/en.props. But it adds a bit of complexity and multiple ways to do the same thing, when my first thought is that it could be handled quite elegantly by the IDE.

So I'd like to hear what Andy has to say on the subject.

andy Mon 27 Apr 2009

@jodastephen

I'm sorta on the fence on that one, but my thoughts:

I agree it's a burden to require the use of an external file. If it was easier to "inline" localization, people might be more inclined to do things the right way upfront.

But I don't like the idea of having to crawl your source with some tool to find all those localization keys. I feel like having a formal definition of your keys up front in the en.props file is the right way to go.

So having said that, if we wanted to promote the latter, I don't think we should make it easy to do the former.

andy Mon 27 Apr 2009

relying on the compiler to pull the strings out into locale/en.props.

I hadn't thought about that - something like that could be interesting, and would address my gripe of having to use another tool to grep your code.

jodastephen Wed 29 Apr 2009

The reality is that we have no real way of knowing or measuring if embedded defaults would be a good idea or not. I know that for the environment I work in, being able to easily specify the default value would be useful, as developers don't code anything other than the default language/config. But we have lots of non-string config and some of that is rather complex.

As mentioned, this could be a feature of an advanced IDE, especially here where it becomes easy to identify what the keys are.

Maybe the broader concern I have here is that the shortcut only applies for string and URI interpolation. In the applications I build, I have lots of config, and getting access to that is a constant pain. One solution would be language level access to configuration, something like:

icon = URI^cut.icon
alt = Str^cut.alt
siteStartDate = Date^site.startDate

These would just call the type/pod methods to get the config. (There may need to be a mechanism to alter that lookup in the future, to support database-backed config for example)

Furthermore, there is probably a case to provide for optimisation of the conversion of config to any immutable object so it is only done once (as converting it each time, especially for dates is wasteful).
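A small Python sketch of what typed, convert-once config access could look like. CONFIG, PARSERS and config are hypothetical names, and the ISO date format for site.startDate is an assumption. Caching is safe here only because the converted values (strings, dates) are immutable.

```python
from datetime import date
from functools import lru_cache

# Hypothetical raw config, as it might appear in a props file.
CONFIG = {
    "cut.alt": "Cut",
    "site.startDate": "2009-04-26",
}

# One parser per supported target type.
PARSERS = {
    str: lambda s: s,
    date: date.fromisoformat,
}

@lru_cache(maxsize=None)
def config(kind, key):
    # Parsed at most once per (type, key); later calls hit the cache.
    return PARSERS[kind](CONFIG[key])

start = config(date, "site.startDate")
again = config(date, "site.startDate")
print(start, start is again)
```

A different backing store, such as a database, would replace the CONFIG lookup while leaving callers unchanged.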

So, given all of this, the question is how far does Fan the language want to go to integrate config with code? Most other languages leave this to libraries. But I do like the integrated approach if complete and done well.

brian Wed 29 Apr 2009

The reality is that we have no real way of knowing or measuring if embedded defaults would be a good idea or not.

I don't either, I can see value in doing it both ways. But since we have to do the external file no matter what, I suggest we do that. Then we can always enhance it to allow inline defaults in the future. I suspect tooling will solve the problem better than a language feature.

So, given all of this, the question is how far does Fan the language want to go to integrate config with code?

I think the answer is definitely yes. Fan isn't an academic language focused on features which make for good PhD papers, but rather features that aid those of us who actually build software systems for a living.

What we have today is:

  • sys::Sys.env: which is kind of like a super System.getProperties() which encapsulates environment variables, Java VM system properties, and lib/sys.props; I expect soon we will enhance sys.props to be "project based"
  • Locale.get, Type.get, Pod.get designed for localized string values (which can easily represent serialized objects too)

So I guess the question is where we might go from there?

JohnDG Wed 29 Apr 2009

I second "inline localization". Developers don't translate text, but they do often provide the English text. And in 95% of cases, that's all that's required, until when and if an application grows to the point where alternate translations are required (and when this does happen, it's usually a royal pain because no one wants to do localization the right way -- it takes too much time and is too painful).

Moreover, an additional advantage is that you could allow interpolation:

name := "foo.txt"
"<h1>$~{err.fopen:Cannot open file ${name}}</h1>"
err.fopen=Kann Datei nicht öffnen '$0'

(Or, if you had runtime access to field names, you could even allow ${name}).

Compiler pulling it into a localization file is one way to do it (one that would make it easy for translators to translate the text, because all the strings actually used in the application are right there), or the specified text could just be used as the default in case there is nothing in the props file.
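A sketch of the second option, where the inline text is only a fallback, modeled in Python. render and its parameters are hypothetical. The sketch assumes that positional placeholders in a translation map to the ${name} references of the inline default in order of appearance.

```python
import re

# $~{key:default}, where the default may contain ${name} references.
INLINE = re.compile(r"\$~\{([\w.]+):((?:\$\{\w+\}|[^{}])*)\}")
VAR = re.compile(r"\$\{(\w+)\}")

def render(literal, props, variables):
    def repl(m):
        key, default = m.group(1), m.group(2)
        # Positional args follow the order of ${name} in the default.
        args = [str(variables[n]) for n in VAR.findall(default)]
        translated = props.get(key)
        if translated is None:
            # Nothing in the props file: use the inline default.
            return VAR.sub(lambda v: str(variables[v.group(1)]), default)

        def positional(p):
            i = int(p.group(1))
            return args[i] if i < len(args) else p.group(0)

        return re.sub(r"\$(\d+)", positional, translated)

    return INLINE.sub(repl, literal)

literal = "<h1>$~{err.fopen:Cannot open file ${name}}</h1>"
variables = {"name": "foo.txt"}
print(render(literal, {}, variables))
print(render(literal, {"err.fopen": "Kann Datei nicht öffnen '$0'"}, variables))
```

With an empty props table the English default is used; once a translator supplies err.fopen, the same literal renders the translated text with the variable filled in.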
