Blog Post

#1517 Pod Repo Design

brian Fri 29 Apr 2011

This post lays out the proposed design for centralized online pod repositories including:

basic concepts for pod repos
fanr tool for installing pods from a remote repo
new Fantom APIs for working with repos
REST API for communication with a repo over HTTP
custom install/uninstall scripts
documentation and search web UI

Basic Concepts

A pod repo is a repository of multiple Fantom pods, where each pod has one or more versions. In context we'll simply use the term repo. A pod file is uniquely keyed by its name and version: there can be only one pod file for a given pod name and version number. A repository is essentially a "database" of pod files we can query, download, or upload.

A repo can be implemented many ways, but for now we are primarily interested in two types of repos:

FileRepo: a repo local to the current machine hosted by the file system
HttpRepo: a repo on the network we access over the HTTP protocol

When we wish to discuss how to install pods from a repo, we'll use these terms:

repo: a single repo I use to query and install pods
env: my local Fantom environment

To keep the design simple, we'll assume a simple cardinality of one-to-one, env-to-repo for install operations. This means for any given fanr operation, we have exactly one repo and one env. The env will often be determined implicitly by the installation hosting the fanr tool. The repo will probably often be configured to some reasonable default, but could be specified on a per operation basis with a command line option. In the future, you could easily imagine something like a CompositeRepo implementation that joins together disparate repos into one view (but that is out of scope for this proposal).

These are the terms we will use for typical repo operations:

query: finding which new pods or new versions are available for installation
install: copy/download a pod from the repo to my env and ensure depends are met
uninstall: remove a pod from my env and ensure depends are maintained
publish: copy/upload a pod from my local env to the repo

These basic concepts are inspired mostly by how Mercurial works. At any one time, I am updating or committing my working directory with a single repo. But over the course of development I might pull/push between multiple repos, either locally on my file system or over a network protocol such as HTTP or SSH. I don't want this comparison to be taken too far, though, because the two problems are different enough that it's apples to oranges. In Mercurial we have repo-to-repo (pull/push) and repo-to-working (update/commit). But in our case we just have repo-to-env (install/publish):

[repo] --- install --> [env]
[repo] <-- publish --- [env]

There is a concept of pod repo-to-repo synchronization, but I think that use case mostly comes up when we talk about repository mirroring. This will be easily available with the APIs, but I am not going to discuss it in this proposal.

Fanr Tool

There will be a new repo pod in the Fantom core which implements the basic APIs and provides a new tool called fanr. This command line tool will provide access to common operations similar to a tool like Ruby gem. Here is what I am thinking for basic commands:

fanr env <query>            // query locally installed pods
fanr query <query>          // query remote repo
fanr new                    // query new versions of pods I am using
fanr patches                // query new patched versions of pods I am using
fanr versions <pod>         // list versions available of given pod
fanr install <pod>          // install latest version of pod 
fanr install <pod> -v <ver> // install explicit version of pod
fanr uninstall <pod>        // uninstall specified pod from env
fanr publish <pod>          // upload pod from env to repo

Whenever we need to query pods, we'll use a simple query language:

"*"         // list all pods
"foo"       // list specific pod
"foo*"      // list pods with some wildcard
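A minimal sketch of how these three query forms might be matched, written in Python purely for illustration (the real implementation would be Fantom). The function name match_pods and the sample pod names are hypothetical, and glob-style matching is only an assumption about what "some wildcard" means here.

```python
from fnmatch import fnmatchcase

def match_pods(pattern, pod_names):
    """Return the pod names matching a glob-style pattern, sorted by name."""
    return sorted(name for name in pod_names if fnmatchcase(name, pattern))

pods = ["foo", "fooBar", "web", "webmod"]  # made-up pod names

print(match_pods("*", pods))     # every pod
print(match_pods("foo", pods))   # one specific pod
print(match_pods("foo*", pods))  # pods whose name starts with "foo"
```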

I also plan on some type of simple tag/metadata query syntax too. Something like this:

"vendor:SkyFoundry"  // query pods tagged with vendor = SkyFoundry

But I need to give that aspect more thought. We have a very awesome tag/query model in SkySpark, but it's a bit much to apply to this problem.

Basic configuration such as the default repo and authentication credentials can be set in "etc/repo/config.props", and there will be command line options for overrides.

Installing to Env

At first I was thinking that we would have an explicit RepoEnv that could load pods straight from the repo database. But now I'm thinking it is better to keep Env and Repo orthogonal. That way you can continue to use PathEnv or you can create new custom Envs.

So when we install a new pod, it will go into the lib directory of Env.work. If you are using PathEnv, this means you can upgrade pods in your boot environment, since the work directory takes higher priority than boot.
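The priority rule can be illustrated with a small Python sketch (not Fantom, and not how PathEnv is actually implemented). The helper find_pod is hypothetical, and the "lib/fan" subdirectory is an assumption about where pod files sit under each directory on the path.

```python
from pathlib import Path
import tempfile

def find_pod(pod_name, env_path, lib_dir="lib/fan"):
    """Return the first matching pod file, searching env_path in priority order.

    env_path lists directories from highest to lowest priority, for example
    [work, boot]. lib_dir is an assumption about where pods live under each.
    """
    for home in env_path:
        candidate = Path(home) / lib_dir / f"{pod_name}.pod"
        if candidate.is_file():
            return candidate
    return None

# Demo with throwaway directories standing in for work and boot.
with tempfile.TemporaryDirectory() as tmp:
    work, boot = Path(tmp) / "work", Path(tmp) / "boot"
    for home, pods in ((work, ["web"]), (boot, ["web", "sys"])):
        (home / "lib" / "fan").mkdir(parents=True)
        for pod in pods:
            (home / "lib" / "fan" / f"{pod}.pod").touch()
    print(find_pod("web", [work, boot]))  # the copy under work wins
    print(find_pod("sys", [work, boot]))  # only present under boot
```

Because the search stops at the first hit, a newer pod installed into work shadows the copy in boot without touching the boot installation.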

Repo API

The new repo pod will define the core Fantom APIs for working with repositories and creating new, custom ones. Basic idea of what I am thinking:

const abstract class Repo
{
  ** Find and create Repo implementation for URI based on its
  ** scheme (pluggable via indexed props)
  static Repo makeForUri(Uri uri)

  ** List current version of all pods available
  PodSpec[] list()

  ** Find current version of all pods which match query
  PodSpec[] query(PodQuery q)

  ** Find all versions of the given pod
  PodSpec[] versions(Str podName)
}

const class FileRepo : Repo { new make(File dir) }

const class HttpRepo : Repo { new make(Uri uri) }

const class PodSpec
{
  Str name()
  Version version()
  Depend[] depends()
  Str summary()
  Str:Str meta()
}
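To make the intended semantics concrete, here is a rough in-memory model in Python (illustration only; the real API is the Fantom one above). MemRepo, publish, and list_current are hypothetical names; the sketch assumes "current" means the highest version number and that publishing an existing name/version pair is rejected.

```python
from dataclasses import dataclass

def parse_version(text):
    """Turn '1.0.10' into (1, 0, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

@dataclass(frozen=True)
class PodSpec:
    name: str
    version: str
    summary: str = ""

class MemRepo:
    """Toy repo: a 'database' of pod specs keyed by (name, version)."""

    def __init__(self):
        self._pods = {}

    def publish(self, spec):
        key = (spec.name, spec.version)
        if key in self._pods:
            raise ValueError(f"{spec.name} {spec.version} already exists")
        self._pods[key] = spec

    def versions(self, name):
        """All versions of one pod, oldest first."""
        found = [s for s in self._pods.values() if s.name == name]
        return sorted(found, key=lambda s: parse_version(s.version))

    def list_current(self):
        """Current (highest) version of every pod, ordered by pod name."""
        names = sorted({name for name, _ in self._pods})
        return [self.versions(name)[-1] for name in names]

repo = MemRepo()
for name, version in [("web", "1.0.9"), ("web", "1.0.10"), ("util", "1.0.0")]:
    repo.publish(PodSpec(name, version))
print([f"{s.name} {s.version}" for s in repo.list_current()])
```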

REST API

The HttpRepo will make use of a simple, standard REST API:

GET  {base}/query?{query}          // query specific pods
GET  {base}/pod/{name}             // list all versions of pod
GET  {base}/pod/{name}/{version}   // download specific pod version
POST {base}/pod                    // upload/publish pod

In the case where pod specs (name, version, summary, etc) are returned, we'll use a simple text format, probably JSON. Need to think about it a bit more.
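As a sketch of how a client might build these request URIs, in Python for illustration only: the base address "http://example.com/repo" and the helper names are made up, and percent-encoding the query string is an assumption, since the proposal does not say how queries are encoded.

```python
from urllib.parse import quote

def query_uri(base, query):
    """GET {base}/query?{query}; '*' is left unescaped for readability."""
    return f"{base}/query?{quote(query, safe='*')}"

def versions_uri(base, name):
    """GET {base}/pod/{name}"""
    return f"{base}/pod/{quote(name, safe='')}"

def download_uri(base, name, version):
    """GET {base}/pod/{name}/{version}"""
    return f"{versions_uri(base, name)}/{quote(version, safe='')}"

def publish_uri(base):
    """POST {base}/pod"""
    return f"{base}/pod"

base = "http://example.com/repo"  # hypothetical repo address
print(query_uri(base, "foo*"))
print(versions_uri(base, "web"))
print(download_uri(base, "web", "1.0.57"))
print(publish_uri(base))
```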

Install Scripts

I have deliberately tried to keep everything about this design strictly focused on pods. Pods are nice, simple, versionable packages. But since they are zip files they can be useful packages for distributing other things that a Fantom application might need.

To increase the flexibility of using pods for more sophisticated package management such as installing DLLs, Java jars, etc., I (tentatively) propose an advanced feature where each pod can declare a "repo/install.fan" and "repo/uninstall.fan" script bundled in its zip file. If fanr detects the relevant script, it will run it on install/uninstall to allow the pod to perform housekeeping activities. To keep things sane, we'll restrict these scripts to not have dependencies on the pods being installed, since that could severely complicate the life cycle.

Doc and Search Web UI

My main goal for pod repos (other than easy installation) is to be able to collect a database of pods and treat them as a logical whole when you want to search or review the docs. For example if browsing the fandocs for "tales" and it has a hyperlink to a class in "web", then you want it all to look seamless. And if you search the docs for some term, you want to be able to find it in any pod available in the repo.

Of course the problem with this whole idea is that for a given pod you might have multiple versions. Andy and I have talked about this quite a bit. I think in the future we might need to deal with versions in a more sophisticated way, but for the near term we can make a simple compromise: all search indexing and inter-pod hyperlinks will use the current (newest) version of the pod. Once you are looking at a given pod's docs, we can then let you "step back in time" and look at the fandoc/APIs for older versions.

We will be building the web doc and search interface for the fantom.org repo using the new version of our sidewalk web framework stack (which also includes the forum). Eventually this will be open source, but not immediately (it has some dependencies on our commercial software).

Next Steps

Just so everybody is aware of the timeline, I plan on starting this feature next week. So if you have feedback, suggestions, etc please share sooner rather than later. I look forward to what everybody thinks.

qualidafial Fri 29 Apr 2011

Looks good to me.

Since publishing the same artifact repeatedly is presumably idempotent, should that operation be a PUT instead of POST?

Also, for shared repos, how are we going to authenticate and control permissions for who gets to upload which pods?

tactics Fri 29 Apr 2011

Neat :)

brian Sat 30 Apr 2011

Since publishing the same artifact repeatedly is presumably idempotent, should that operation be a PUT instead of POST?

Interesting question actually, because the real issue is whether we should allow you to upload a pod version that is already there. In most cases I think the answer is no: if you need to change what is there, it must be a new version, because otherwise you have "two versions of the same version". However, pragmatically speaking, sometimes you upload something wrong and want to fix it and know it is pretty safe (I've done this on occasion with the fantom builds). So what I was thinking was something like this - you can PUT to rewrite a pod, but only if that version has been downloaded less than 10 times or was created less than 1hr ago (or something like that).
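The rule floated above could be sketched like this in Python (illustration only). The thresholds are the "something like that" numbers from the post, not settled values, and may_overwrite is a hypothetical helper name.

```python
from datetime import datetime, timedelta

MAX_DOWNLOADS = 10             # placeholder threshold from the discussion
MAX_AGE = timedelta(hours=1)   # placeholder threshold from the discussion

def may_overwrite(download_count, created_at, now):
    """Allow rewriting a published version while it is barely used or very new."""
    barely_used = download_count < MAX_DOWNLOADS
    very_new = (now - created_at) < MAX_AGE
    return barely_used or very_new

now = datetime(2011, 4, 30, 12, 0)
print(may_overwrite(3, now - timedelta(hours=5), now))      # few downloads
print(may_overwrite(50, now - timedelta(minutes=20), now))  # popular but new
print(may_overwrite(50, now - timedelta(hours=5), now))     # popular and old
```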

Also, for shared repos, how are we going to authenticate and control permissions for who gets to upload which pods?

I think we'll do a basic security mechanism for HTTP that all clients can guarantee, then how security works server side will be a repo specific decision.

We are actually going to run two repos:

For fantom.org we'll host the standard community repo, but probably require a manual review for a user to create a new pod. Then that user will have publish rights for new versions and can choose to grant other users publish permission.

We are also going to run a repo for our commercial product SkySpark which will have security permissions for both install and publish. In fact we'll be rolling this version out first to get a bit of experience.

DanielFath Sat 30 Apr 2011

I think the main question is how will this repo integrate with Maven. I haven't personally used Maven, but wasn't this entire idea based around Maven integration?

tcolar Sat 30 Apr 2011

So how does it deal with Java library jars used in a pod (if at all)? Could those go either with the pod and/or separately in the repo and be pulled in as a dependency? (Example: mysql jar, swt)

It just isn't practical to have those installed separately in the fantom runtime.

tcolar Sat 30 Apr 2011

Sorry I had missed the "Install script" section ... that would do I guess.

peter Sat 30 Apr 2011

Will there be some way of flagging and then querying the platform a pod is available for? Perhaps this information can be provided in the tag/metadata.

buntar Sat 30 Apr 2011

Hi, I'm new here and just came by, but something about fanr is a bit unintuitive imo.

The whole point of Pod Repo Design is to manage pods, right? Why call the main user interface fanr then? I find this irritating, because - if I get this right - the cool thing about pods is that you gain a robust, versioned entity and can leave the source level behind. Isn't the term fan somehow associated with the source code? Why not call this important tool something like rpod or podlib? regards ben

brian Mon 2 May 2011

wasn't this entire idea based around Maven integration?

It isn't part of my design criteria. To me that is an orthogonal issue. The fact that we have simple, versioned pods with flexible meta-data should make Maven integration possible.

Will there be some way of flagging and then querying the platform a pod is available

My immediate and primary goal is to ensure distribution of Fantom pods which are portable. Although we could easily come up with standardized meta keys for platform support, etc.

Why call the main user interface fanr then?

The fanr command matches all the other tools: fant, fanp, fansh. So I think that term works. What may be up for debate is what the overall feature is called. I've been using the generic term "repo" or "pod repo" to avoid trying to coin any clever name. If we had a clever name that might be cool.

DanielFath Mon 2 May 2011

If we had a clever name that might be cool.

RepoMan! Kidding of course. I like fanr: simple and to the point.

tactics Mon 2 May 2011

Especially with Linux/Mac users, it's not uncommon for users to discover commands through the use of tab-completion. Typing fan<TAB><TAB> shows you all the Fantom tools at once: fan, fant, fanp, fanr. They will see "fant", for example, and wonder, "what is this?" And a fant --help later, they have learned about a new tool.

qualidafial Tue 3 May 2011

So what I was thinking was something like this - you can PUT to rewrite a pod, but only if that version has been downloaded less than 10 times or was created less than 1hr ago (or something like that).

Another option is to allow POST only if the pod is not yet present, and use DELETE to remove the pod, under the constraints you outlined.

jodastephen Wed 4 May 2011

I'd prefer to use a plural for the "collection" level in the RESTful URL, plus I don't see "query" as being acceptable in a RESTful style URL:

GET  {base}/pods?{query}            // query specific pods
GET  {base}/pods/{name}             // list all versions of pod
GET  {base}/pods/{name}/{version}   // download specific pod version
POST {base}/pods                    // upload/publish pod

brian Wed 4 May 2011

I'd prefer to use a plural for the "collection" level in the RESTful URL

We use the singular form pretty much across the board. Consider this site: we use topic/xxx, not topics/xxx.

These URIs will probably change around a bit anyways though. I am working on a single query syntax that can be used across the board for all pod/version listings which might simplify a lot of things by moving all the complexity into the query grammar. Will post on a separate topic later today or tomorrow.

tonsky Thu 5 May 2011

Why not unite queries and pod retrieving by version? Imagine that a pod has the following versions released:

1.0.0
...
1.0.20

1.1.0

2.0.0
2.0.1
2.0.2

2.1.0
...
2.1.7

And queries will be resolved as follows:

/pod/?{name}           → latest version             (2.1.7)
/pod/?{name}==1.0.16   → exactly this version       (1.0.16)
/pod/?{name}==1.0      → latest of 1.0 version      (1.0.20)
/pod/?{name}>=2.1      → 2.1 or greater             (2.1.7)
/pod/?{name}>2.0       → anything greater than 2.0  (2.1.7)

Here /pod/ uri will always return body of one single version of a pod (latest meeting query criteria).

Exactly the same format can be used for querying repo for pods meta-info:

/pods/?{name}           → all versions
/pods/?{name}==1.0.16   → exactly this version
/pods/?{name}>2.0       → all versions greater than 2.0 (2.1.0 ... 2.1.7)

/pods/ uri will return json with meta-info for all versions meeting the criteria.

Query syntax matches the one used in pip. I think it’s pretty nice, but it will require URL encoding, so maybe we should replace it with something literal, like /pod/?{name}=ge1.6.0
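A small Python sketch of the resolution rules in this post (illustration only, not an agreed design). It assumes a constraint is compared only on as many version segments as it specifies, which is what makes ==1.0 pick 1.0.20 and >2.0 skip the 2.0.x releases in the tables above. The names select and latest are hypothetical.

```python
import re

def parse_version(text):
    """Turn '2.1.7' into (2, 1, 7)."""
    return tuple(int(part) for part in text.split("."))

def select(constraint, available):
    """Return all versions meeting the constraint, oldest first (the /pods/ case).

    constraint is '' (no restriction) or an operator plus version,
    e.g. '==1.0', '>=2.1', '>2.0'.
    """
    ordered = sorted(available, key=parse_version)
    if not constraint:
        return ordered
    found = re.fullmatch(r"(==|>=|>)(\d+(?:\.\d+)*)", constraint)
    if found is None:
        raise ValueError(f"bad constraint: {constraint!r}")
    op, wanted = found.group(1), parse_version(found.group(2))

    def meets(version):
        # Compare only the segments the constraint actually names.
        head = parse_version(version)[:len(wanted)]
        if op == "==":
            return head == wanted
        if op == ">=":
            return head >= wanted
        return head > wanted

    return [v for v in ordered if meets(v)]

def latest(constraint, available):
    """Return the single newest matching version (the /pod/ case), or None."""
    matches = select(constraint, available)
    return matches[-1] if matches else None

# The release history from the example above.
released = ([f"1.0.{i}" for i in range(21)] + ["1.1.0"]
            + [f"2.0.{i}" for i in range(3)] + [f"2.1.{i}" for i in range(8)])

for constraint in ["", "==1.0.16", "==1.0", ">=2.1", ">2.0"]:
    print(repr(constraint), "->", latest(constraint, released))
```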

And last, but not least, it would be marvelous if we were able to define a pod’s dependencies in exactly the same syntax:

depends = ["sys==1.0",
           "build",
           "spectre>=0.8"]

tonsky Thu 5 May 2011

Also, I’m not sure that we need to query pods using wildcards or tags — can somebody give me a case?

I can think about pods search on repo web-interface, but it must be full-featured search by pods’ names and descriptions.

brian Thu 5 May 2011

Why not unite queries and pod retrieving by version?

I am going to unite them into a single query syntax that will support options for querying by name, different versions matching a depend, or by meta data. Hope to post design later this morning.

Also, I’m not sure that we need to query pods using wildcards or tags — can somebody give me a case?

Whatever we might decide for pod naming conventions, I think the beginning of the pod name will always be a useful prefix to wildcard. Another use case is where you want to build a screen like Ruby gems where you can navigate all pods by their first letter, etc. I don't think it is strictly necessary, but since it is fairly easy to add into the query language, I think we should go ahead and do it.

yachris Thu 5 May 2011

One thing that the new(ish) Gemfile added to Rails was a matching operator (that I can't remember the syntax of offhand, sorry) which lets you say, "Any release of 2.0". So to tonsky's examples:

/pod/?{name}>=2.1      → 2.1 or greater             (2.1.7)
/pod/?{name}>2.0       → anything greater than 2.0  (2.1.7)

we could add (say)

/pod/?{name}>~2.0      → anything within the 2.0 release  (2.0.2)

That is, it picks the latest 2.0 release, but won't go up to 2.1 or beyond. The idea is that there may be breaking changes in 2.1, but we still want to pull in any bugfixes on the 2.0 release trail.

tonsky Thu 5 May 2011

@yachris, in my syntax it will simply be ==2.0 (not specifying bugfix version means that it should pick the latest one)

brian Thu 5 May 2011

My plan was to largely reuse Depend syntax, where 2.0 matches anything between 2.0.0 inclusive and 2.1 exclusive. If the version is unspecified it defaults to current.
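The matching rule described here can be read as a half-open interval. A minimal Python sketch, for illustration only: it covers just the simple "2.0" form, not the rest of the Depend syntax, and depend_range and depend_matches are hypothetical names.

```python
def parse_version(text):
    """Turn '2.0.2' into (2, 0, 2)."""
    return tuple(int(part) for part in text.split("."))

def depend_range(constraint):
    """Return (low, high) so that a version matches when low <= version < high."""
    low = parse_version(constraint)
    high = low[:-1] + (low[-1] + 1,)  # bump the last segment: 2.0 -> 2.1
    return low, high

def depend_matches(constraint, version):
    low, high = depend_range(constraint)
    return low <= parse_version(version) < high

for version in ["1.9.9", "2.0.0", "2.0.2", "2.1.0"]:
    print(version, depend_matches("2.0", version))
```

This gives the behaviour yachris asked for: bugfix releases on the 2.0 line match, while 2.1 and beyond do not.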