
#1514 Global Pod Naming

brian Thu 28 Apr 2011

As part of the effort to establish a federation of pod repositories, we need to nail down the global namespace. This is really just one issue in the overall architecture, but many previous discussions have resulted in many different opinions and perspectives. So I want to pull this issue out into its own separate pre-discussion.

The fundamental issue is this: pod names define the root of the Fantom namespace, and therefore they must be globally unique. This in itself is pretty straightforward. The debates have been over:

how to scope pod names (.com address, project names, etc)
whether the pod name syntax should be expanded to include a separator char to indicate scope hierarchy

The current convention which is described in the Conventions doc says this:

Pod names are lower camel case and globally unique. You should prefix your pod names with something to avoid naming conflicts. For example a SourceForge or Google Code project name is a safe naming convention. Or prefix pods with an organization or domain name. If you own a ".com" domain, don't include the "com" in your pod names.

This is fairly sensible, but a little vague. More importantly, it isn't really being followed by the community. In fact, Andy and I are guilty of not following these conventions ourselves in our SkySpark product.

Andy and I have had several brainstorming sessions on this topic, and after going around and around on various options we always land back on the same basic design we are already using today.

To change the pod name syntax to make it explicitly hierarchical is just too much of a breaking change. There is now just a huge amount of infrastructure baked around current conventions of pod names as simple identifiers. All the tools like fan and fant take qnames, serialization uses qnames, and there is already quite a bit of code and other resources which make use of qnames.

Plus, given the extensive use of qnames, I really don't want to make core pod names like sys or fwt any longer than they already are. Having short qnames for core types is really quite nice.

But even without an explicit hierarchy separator, we do need to get serious about ensuring our pod names are globally unique. This can be done by using implied hierarchies with camel case prefixes. My proposal for pod naming is as follows:

core pods developed and deployed as part of Fantom itself continue to use their existing short names (sys, fwt, util, etc)
prefixes assigned to a specific organization or project own the namespace under that prefix; for example if you own the prefix "fooBar", then you own any pod name which is prefixed with "fooBar" using camel case such as "fooBarBaz", but not "fooBark"
the top-level domain names "com", "org", "net" etc are implicitly owned by the domain owner, for example "comAcme" is owned by whoever owns the "acme.com" domain
projects, products, and organizations can register their own prefixes with us on fantom.org to use shorter names such as "tales", "mustache", or "spectre". I've found many projects do have short, unique names but are not really associated with a meaningful DNS name
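The prefix rule above can be sketched as a small check. This is a minimal illustration in Python, assuming pod names are plain ASCII identifiers and that a prefix owns a name only when the character right after it starts a new camel-case word; `owns` and `domain_prefix` are hypothetical helpers, not part of Fantom or fantom.org.

```python
def owns(prefix: str, pod_name: str) -> bool:
    """True if `prefix` owns `pod_name` under the camel-case convention (hypothetical)."""
    if not prefix or not pod_name.startswith(prefix):
        return False
    if pod_name == prefix:
        return True
    # The character right after the prefix must begin a new camel-case word,
    # so "fooBar" owns "fooBarBaz" but not "fooBark".
    return pod_name[len(prefix)].isupper()


def domain_prefix(domain: str) -> str:
    """Map a DNS name such as "acme.com" to its implied prefix "comAcme" (hypothetical).

    Assumes every label is plain letters; a label like "my-site" would not
    produce a valid identifier.
    """
    labels = domain.lower().split(".")
    labels.reverse()  # top-level domain first, as in "comAcme"
    return labels[0] + "".join(label.capitalize() for label in labels[1:])
```

Under this sketch `owns("flux", "fluxion")` is false because "i" is lower case, which matches the intent that "fooBark" is not owned by "fooBar".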

So that is what we're thinking. Obviously we can't force you to name your pods any given way. But whatever conventions we decide upon will be enforced if you want to publish your pod on fantom.org. Feedback welcome.

tactics Thu 28 Apr 2011

In the scope of the code, we want the pod name to be as short as possible for easy typing.

In the scope of the community, we want the pod name to be as long as possible for uniqueness guarantees.

It's this tension that seems to be causing issues.

In regards to this issue, we can make a few useful parallels in programming languages.

Not every object in a program is important enough to get its own name. For unimportant objects, we have expressions, object literals, and anonymous functions. This relieves us of having to come up with an intelligent name when one isn't needed. (The absence of these would look like the SSA form used by some compilers, where every intermediate value gets its own name.)

Similarly, not every named variable is equal in importance. Global names are reserved for important objects. Their names must be chosen carefully to avoid collisions, so they end up much longer (TextEditorOptions, mysql_fetch_assoc, FandocIndexToHtmlGenerator). For less important objects, we can fall back on local names. They are non-unique, but they are scoped. They are short and don't require much attention when naming (for example, x, i, data, and val).

Similarly, with organizing code modules, not every module is particularly important. That is, it isn't useful outside of the context of the project it resides in or to the overall community.

For example, it makes a lot of sense to create a pod for your website's custom ORM library. But it wouldn't make sense to use orm as the universal name for it, since it would likely collide with another project's custom orm pod.

An important aspect that seems to be missing from pods is a notion of "scope". Pods provide only a single global scope to organize your code in. This means you have to choose between creating monolithic pods or littering the namespace. Important names (opengl, etc) also get mixed in with unimportant names (my lame, hacked-together orm pod, etc).

To restate my stance on the issue:

Allow dots in names
Encourage Java-looking pod names (fantom.fwt for FWT)
Rely heavily on sys::Env to resolve pod names in the context of the code (so we can all still just type fwt).

I'm not sure about the details, but the important points are:

Java (and .NET developers?) would feel comfortable with the dot syntax.
It encourages long, unique names.
It allows a hierarchical division of code.
Developers can still use short identifiers in context.

brian Thu 28 Apr 2011

It's this tension that seems to be causing issues.

Excellent observations. I guess I took it for granted that a pod name should be the same across the entire spectrum. But I think it is worth discussion.

Although we could use more complex names only for centralized repos, what really matters in the end is the qnames used by code, dependencies, serialized objects, etc. Qnames get embedded in fcode, string literals, serialized objects, dependency trees, everywhere! You can't really "mask" them with some sys::Env magic.

If we let Org-1 have a pod called "chart" and we let Org-2 have a pod called "chart", then those two pods could never work together in the same environment. And if serialized objects or dependencies escaped into the wild, then no one would really know which "chart" pod was used.

So I don't really see how that model could work. It seems to cause way more problems than it solves. Other comments?

msl Thu 28 Apr 2011

What's the smallest change that could work and be enforced for all users? I suspect the answer is to enforce it at the repository level and leave the code side as is.

Given underscores are valid for pod names (not that I'm a big fan (pun unintended) of them) - the repo could enforce msl_cron (<org>_<artifact>) naming and anyone who wants to use an artifact has it named as such. If I want to rename those pods locally for my own purposes then I get what I get when things blow up.
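A repository-side check for this `<org>_<artifact>` convention could be as simple as a regular expression. The sketch below is in Python and assumes each part is a lower-camel-case identifier with no further underscores; `split_org_artifact` is a hypothetical name, and the exact character rules would be up to the repository.

```python
import re
from typing import Tuple

# Assumed rule: "<org>_<artifact>", each part starting with a lower-case
# letter and containing only ASCII letters and digits.
ORG_ARTIFACT = re.compile(r"([a-z][A-Za-z0-9]*)_([a-z][A-Za-z0-9]*)")


def split_org_artifact(pod_name: str) -> Tuple[str, str]:
    """Return (org, artifact), or raise ValueError if the name breaks the rule."""
    match = ORG_ARTIFACT.fullmatch(pod_name)
    if match is None:
        raise ValueError(f"pod name {pod_name!r} must look like <org>_<artifact>")
    return match.group(1), match.group(2)
```

With this rule `split_org_artifact("msl_cron")` returns ("msl", "cron"), while "cron" or "msl_cron_web" would be rejected.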

My preference is also for a concise combination of organisation + artifact. Perhaps a first-in-best-dressed approach for which names are used by who (given the community here seems far from unruly, I don't think it'll be a problem). If there is any abuse, it should be pretty easy to identify and clean up.

How that works going forwards: no idea! It's not too difficult to think of someone claiming "oracle" (or even worse "joda") and having no affiliation with them and possible confusion (or even worse security issues) resulting from that.

Policing is always a pain - but then again, so is verification.

I guess it comes down to what the terms of use of the repository are. Is source required (negates security concerns), are contact details mandatory (puts people in touch with people to sort issues) and are names changeable (so if I do innocently claim "joda" I can shift sideways -- and what does this mean to users of the innocent "joda" package?).

brian Thu 28 Apr 2011

If I want to rename those pods locally for my own purposes then I get what I get when things blow up.

The key issue here is that a pod can't just be renamed. All the fcode, docs, and JS inside uses the pod name too. And if you have dependencies they will be using the original pod name too. Pods are like java packages - you can't just rename them and expect everything to continue to work.

I guess it comes down to what the terms of use of the repository are

I think this is a good point. We can have a convention the community uses, and you can choose not to adhere to it. But in the end, centralized repos like the one we will host here (and hopefully mirror elsewhere) will have some policing if needed. In general, though, I think people try to avoid project names that cause confusion, so while issues might come up, I would expect them to be rare.

ahhatem Fri 29 Apr 2011

I think we are a little over-complicating things here....

Fantom main pods keep their names...

New pods either have a unique and distinguishable name that is a brand on its own, like tales for example, or must use a prefix like fooBar_Chart.. and to be added to the repo, you must adhere to this... and the judgement of whether the name is actually unique enough to be a brand of its own, or needs prefixing with the company name, will be subjective....

Inside my code I will do whatever I want ... and whether that works or not that is my problem.... that is it!

The only discussion is whether to allow dots or just use underscores... I don't think it will make a huge difference any way... we just need to agree on something.

MoOm Fri 29 Apr 2011

I agree with ahhatem on this (and therefore with Brian). We need to keep things simple.

About a "pod-separator": personally, I don't really like the camel-case or the underscore notations. I understand that allowing dots may make the grammar a lot more complex. Can't we find another separator? I propose ":", though I'm not sure how much more complex it would make the grammar.

Examples:

using sys
using foo:chart

foo:chart::Chart //qname of type "Chart" from pod "foo:chart"

If we can't find a pod-separator that looks good enough and that doesn't make the grammar a nightmare, I'd prefer the camel-case instead of the underscore notation.
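To get a feel for the grammar question, here is a rough Python sketch of splitting a qname that uses ":" inside the pod name. It assumes at most one ":" in the pod part and that "::" always separates pod from type; `parse_qname` is a hypothetical helper, not how Fantom's compiler or serializer actually parses qnames.

```python
from typing import Optional, Tuple


def parse_qname(qname: str) -> Tuple[Optional[str], str, str]:
    """Split "foo:chart::Chart" into (scope, pod, type) = ("foo", "chart", "Chart")."""
    # Match the longer "::" token first so the single ":" stays inside the pod name.
    pod, sep, type_name = qname.partition("::")
    if not sep or not pod or not type_name:
        raise ValueError(f"not a qname: {qname!r}")
    # Anything before the last ":" in the pod part is treated as the scope.
    scope, colon, short_pod = pod.rpartition(":")
    return (scope if colon else None), short_pod, type_name
```

The sketch only works because "::" is matched before ":"; a real tokenizer would need the same care, which is the kind of grammar complexity this post is asking about.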

brian Fri 29 Apr 2011

Loving all the feedback!

I think we are a little over-complicating things here

I mostly agree...

First off, many of us in the Fantom community are a bit steeped in Java culture. And let's face it, the Java culture is a bit about "over engineering" things :-) Let's consider a simple example: the Java package name for the Lucene core is "org.apache.lucene".

Now let's consider another community, the RubyGems community. They call their Lucene wrapper gem simply "lucene". They have tens of thousands of gems published in a single flat namespace, most of them with short, simple names.

So I think simple project names are fine and we don't need to overcomplicate things. Simple names seem to work fine in very large communities like RubyGems, SourceForge, etc.

However, I also think that the idea of scoping sub-pods under a project is an important concept. We already have an example of this today with flux. There is a core pod called "flux" (which seems fine as a global name), and there are sub-pods under it for specific languages and functions. Today we have "fluxText" and I eventually plan to have "fluxAst", "fluxFan", "fluxJs", etc.

So my main point was that as the owner of the name "flux", I implicitly own names under "flux" such as "fluxText". I am not sure if that gets formalized into some actual check on the repository or not, but just a convention that everybody adheres to.

I think for practical purposes we may have a human approval step anyway to create a new pod on the repo (at least this one), so issues like this can naturally be handled if they arise. Need to think about that some more.

And let's consider the separator issue with the flux case. I like "fluxText" and think that makes a lot more sense than "flux.text" myself. So I personally think this whole idea of separator chars is part of the Java culture, which encourages really verbose package names.

alex_panchenko Fri 29 Apr 2011

Ruby, perl, etc - they have a separator, so name hierarchy is clear.

So my main point was that as the owner of the name "flux", I implicitly own names under "flux" such as "fluxText". I am not sure if that gets formalized into some actual check on the repository or not, but just a convention that everybody adheres to.

Can I take the name fluxion?

The camel case convention is fine in the source, but on the file system it should be hierarchic. What if we do the following change when looking for the pod file: fluxTest -> flux_test.pod ?
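The mapping suggested here could look roughly like the following Python sketch, which inserts an underscore before each upper-case letter and lower-cases the result. `pod_file_name` is a hypothetical helper, not part of Fantom's pod lookup.

```python
import re


def pod_file_name(pod_name: str) -> str:
    """Map a camel-case pod name to a snake-case file name, e.g. fluxTest -> flux_test.pod."""
    # Put "_" in front of every upper-case ASCII letter, then lower-case the whole name.
    return re.sub(r"([A-Z])", r"_\1", pod_name).lower() + ".pod"
```

One catch with this scheme: since pod names may already contain underscores, `pod_file_name("mslCron")` and `pod_file_name("msl_cron")` both produce "msl_cron.pod", so the file name alone would no longer identify the pod.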

rfeldman Fri 29 Apr 2011

The camel case does feel a bit too magical. As OO programmers, we're used to separators in our hierarchies, and I don't think that's just a Java thing - I have spent a lot of time writing production Perl code as well.

The idea of camel case implying hierarchy is just alien to me. Camel case has never implicitly held metadata, it's just a stylistic convention.

ahhatem Fri 29 Apr 2011

+1 for having a separator..

flux_whatever is owned by the flux owner... It is clearer this way.... and reduces the likelihood of problems.

So, whatever comes before the first separator has to be unique...

brian Fri 29 Apr 2011

Can I take the name fluxion?

Sure, that is a different project name than flux.

The camel case convention is fine in the source, but on the file system it should be hierarchic

Why in the world would you want to use two different formats for the same name?

The idea of camel case implying hierarchy is just alien to me

I think this gets down to the core point: the pod namespace is not a hierarchy. It is a flat, global namespace. The only thing we are really trying to figure out is how to manage that flat programmatic namespace. The idea of an organizational hierarchy cleanly detailed in a pod name using some separator character seems to have appeal. But from the software perspective, "fluxText" vs "flux.text" is just an opaque string identifier.

However, there is a hierarchy in Fantom of "pod::Type.slot", so even if we were to pick a separator it can't be dot. I would probably pick "-" for Lisp naming conventions. However, the tradeoff is that you can't have qnames in code (other than in using statements), plus there is extra complexity for anyone trying to parse a qname (like the serialization code). I hate the look of flux_text, which seems like C code, not Fantom code to me, so I'm not a fan of that approach.

I guess that is why I am not all that hung up on the actual pod name, because all the associated meta-data around it and community conventions are really how humans make sense of it all. I am having trouble seeing justification for complicating the grammar and Fantom type namespace for what is essentially a higher level organizational naming problem. Because as soon as we pick a separator which is not valid in an identifier, we have to come up with some translation/escape for Java code, JavaScript code, etc.

ahhatem Fri 29 Apr 2011

flux_text which seems like C code, not Fantom code to me, so not a fan of that approach.

I agree, but actually it looks like the only approach that satisfies the requirements with no complexity....

As much as I hate it..... I will +1 the underscore....

andy Fri 29 Apr 2011

I think this gets down to the core point: the pod namespace is not a hierarchy. It is a flat, global namespace.

An important point I think some may misunderstand.

Can I take the name fluxion?

This brings up an interesting issue tho. Can my "prefix" use camel case? I think we would have to restrict it to all-lower case - to avoid things like this:

fluxIon - fluxIon pod under flux prefix 
fluxion - new project prefix

For me this would be the only valid argument for adding a separator character.
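If prefixes were restricted to all lower case as suggested, the owning prefix could be read straight off a pod name as its leading run of lower-case letters. A minimal Python sketch under that assumption (`implied_prefix` is a hypothetical helper):

```python
import re


def implied_prefix(pod_name: str) -> str:
    """Return the leading run of lower-case ASCII letters, i.e. the owning prefix."""
    match = re.match(r"[a-z]+", pod_name)
    if match is None:
        raise ValueError(f"pod name {pod_name!r} must start with a lower-case letter")
    return match.group(0)
```

With this rule "fluxIon" resolves to "flux" and "fluxion" to "fluxion", so the two are distinct programmatically even if they look alike to a human. Note that it would also resolve a domain-style name such as "comAcmeWidgets" to "com", so it would need reconciling with the "comAcme" idea.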

DanielFath Fri 29 Apr 2011

If we go the separator route I agree with brian +1 to - separator.

MoOm Fri 29 Apr 2011

The more I think about it, the more I agree with brian that a separator makes no sense as there is no hierarchy between pods.

My choice goes to camel-case notations.

yachris Fri 29 Apr 2011

Is there a case where we'd get screwed by case-insensitive file systems, so "fluxAeon" would become "fluxaeon"?

brian Fri 29 Apr 2011

Is there a case where we'd get screwed by case-insensitive file systems, so "fluxAeon" would become "fluxaeon"?

Well I don't think we would ever allow two pods to differ only by case. And we actually already detect this condition in the JVM runtime (you have to use the proper capitalization):

Fantom Shell v1.0.58 ('?' for help)
fansh> Pod.find("Fwt")
sys::UnknownPodErr: Mismatch case: Fwt != fwt
  fan.sys.Pod.readFPod (Pod.java:164)
  fan.sys.Pod.doFind (Pod.java:65)
  fan.sys.Pod.find (Pod.java:45)

poltomb Fri 29 Apr 2011

You say that you will not allow pod names that are simply case changes of others, but what about case changes that produce completely different words:

Base word
- thesewereat
Option one
- theSewerEat
Option two
- theseWereAt

I know these are pretty pitiful examples, but with more complex names there may be more chance of conflicts.
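A case-folding check would catch all three of these, since they differ only in capitalization. Here is a sketch in Python of how a repository might group names that collide once case is ignored; `find_case_conflicts` is a hypothetical helper, and ASCII names are assumed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_case_conflicts(pod_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group pod names that become identical when compared case-insensitively."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in pod_names:
        groups[name.lower()].append(name)
    # Keep only the keys shared by more than one name.
    return {key: names for key, names in groups.items() if len(names) > 1}
```

A registry could then refuse a new name whose lower-cased form is already taken, which is the "no two pods differing only by case" rule mentioned earlier in the thread.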

go4 Sat 30 Apr 2011

It is a flat namespace in code, but in our minds it's not flat. fluxText, fluxEditor, and fluxIcon are considered one group.

I propose a 3-part pod name, in the spirit of a dotted version number format like "major.minor.build.patch":

"prefixName_projectName_moduleName"

In Fantom:

pod.name        => sidewalk_flux_text
pod.prefixName  => sidewalk
pod.projectName => flux
pod.moduleName  => text
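For illustration, the proposed accessors could behave like this Python sketch. `prefixName`, `projectName` and `moduleName` are proposals from this post, not existing slots on sys::Pod; the sketch simply assumes exactly three underscore-separated parts, and `parse_pod_name` is hypothetical.

```python
from typing import NamedTuple


class PodName(NamedTuple):
    """The three parts of a proposed "prefixName_projectName_moduleName" pod name."""
    prefix_name: str
    project_name: str
    module_name: str


def parse_pod_name(name: str) -> PodName:
    parts = name.split("_")
    # Require exactly three non-empty parts; anything else is rejected.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected prefixName_projectName_moduleName, got {name!r}")
    return PodName(*parts)
```

Requiring exactly three parts is a simplification; the proposal does not say how a project's main pod (with no module part) would be named.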

qualidafial Sat 30 Apr 2011

In Maven terms, the "prefix name" as you put it is the group ID.

lbertrand Sat 30 Apr 2011

Even if the pod names are a flat global space, we still, as developers, organize our code into hierarchical pods: main pod which is the one we expose and pod dependencies (libraries, internal project dependencies, ...).

I think it would be bad to lose this distinction, and there is clearly a link between the flux pod and the fluxText pod, ... This should be obvious from the name and on the repository.

For libraries, it is fine to leave them in the global space, as other projects can reuse them. And libraries can depend on other libraries.

But for internal project dependencies, I do not see why they should be exposed. One option is to include them in a bigger pod when deployed from the repository. Another option, which I prefer, would be to store them under the main project's repository entry as internal dependencies, so no clash can happen with another project.

The line between a library and an internal dependency can be quite blurry sometimes.

jodastephen Wed 4 May 2011

The camel case approach with no further meaning is something I've long criticised for being arcane, difficult to read and hiding meaningful data in string form (Fantom is generally pretty good at expanding meaningful data). I'm not fussed about the separator used, dot, dash, underscore, colon... just not camel case.

If you read the whole debate above, you'll see the tension between "a simple string" and "containing vital information about a hierarchy". It's no coincidence that other module systems have a level for the "organisation" or "group" - Maven most notably. It's an ownership level that provides a vital hook for checks and verification. It's also something enterprises find valuable when evaluating something like Fantom. Note that GitHub added an "organisation" concept after a while. (I also note that the discussions about fluxion and fluxIon being completely different emphasise the importance of the top level).

(To use a module system most effectively, I also would like to see that pod "a.b.c.d" could be restricted to be only used by "a.b" or lower. This would add a lot of power to scoping and access control. This is a side goal, but is impossible to add later if there is no formal separator for pod names)
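Such a restriction depends on comparing whole name segments, which is why a formal separator matters for it. A minimal Python sketch, assuming a hypothetical per-pod setting that names the scope allowed to use the pod (None meaning unrestricted); `may_use` is illustrative only.

```python
from typing import Optional


def may_use(user_pod: str, allowed_scope: Optional[str]) -> bool:
    """Return True if user_pod may depend on a pod restricted to allowed_scope."""
    if allowed_scope is None:
        return True  # no restriction declared
    # Allow the scope itself ("a.b") and anything below it ("a.b.c.d"),
    # comparing whole dot-separated segments so "a.bc" does not match "a.b".
    return user_pod == allowed_scope or user_pod.startswith(allowed_scope + ".")
```

The "a.bc" versus "a.b" case is the same kind of ambiguity as fluxion versus fluxIon, which plain camel case cannot express as cleanly.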

I don't believe that the reverse domain name "com.foo.bar" approach is necessary, as a single top level name is enough for most use cases, and rarely fought over. Within the top level, the number of levels used is up to the organisation.

However, I do want to strongly recommend promoting "projects" as the top level, not "organisations", "companies" or groups like Apache. Thus, "lucene" should be the top level, not "apache". That is because projects sometimes move locations, or company subdivisions get sold, making the project name easier to manage. It also aids the next point...

The real issue with longer names is being able to treat them in a shorter way. I'm proposing that a pod name MUST have a "project" name as a prefix, even for core pods:

fantom_fantom       // currently "sys"
fantom_email        // currently "email"
joda_time
mustache_mustache
lucene_lucene
lucene_tools_ast    // lucene is the project, tools_ast is the module

Thus, there should be one simple rule to allow a simpler, shorter form. If the "project" name and the remaining "module" name are the same, then the short form SHOULD be used. Thus serialization or code uses "mustache", not "mustache_mustache" (although the long form would not be an error).

fantom_fantom       // referred to in code as "fantom"
fantom_email        // referred to in code as "fantom_email"
joda_time           // referred to in code as "joda_time"
mustache_mustache   // referred to in code as "mustache"
lucene_lucene       // referred to in code as "lucene"
lucene_tools_ast    // referred to in code as "lucene_tools_ast"
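The short-form rule is mechanical enough to sketch. This Python version assumes the first underscore separates project from module, as in the examples above; `short_form` and `long_form` are hypothetical helpers.

```python
def short_form(pod_name: str) -> str:
    """Collapse "x_x" to "x"; every other name is returned unchanged."""
    project, sep, module = pod_name.partition("_")
    return project if sep and module == project else pod_name


def long_form(pod_name: str) -> str:
    """Expand a bare project name "x" back to "x_x"; names with a module stay as-is."""
    return pod_name if "_" in pod_name else f"{pod_name}_{pod_name}"
```

Since the long form is not an error, a resolver could normalize both spellings with `short_form` before lookup.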

In summary, please do the right thing and add another level. As above, it can be just a formal subdivision of the pod name, but it must as a minimum (a) not use camel case, and (b) be used as a filing system hierarchy when actually laying out the repo.

ahhatem Thu 5 May 2011

+1 to what jodastephen said...

except I don't see that fantom_fantom is necessary.... It looks like fantom or sys or even fan_sys would do the job...

Being able to programmatically separate different projects is certainly an asset that will probably help in the repo....