Making the "git" action in ick more useful

From: Lars Wirzenius <liw@liw.fi>
Date: Mon, 2 Jul 2018 18:24:48 +0300

   The current version of ick has a "git action", which works like this:
   
   * requires the project parameters git_url, git_ref, git_dir
   
   * is run on the worker host, not in a container, because it uses the
     worker host's ssh key to access the git server, if access is via
     ssh
   
   * if the directory named in git_dir does NOT exist, effectively runs
     "git clone $git_url $git_dir" (except without shell quoting issues)
   
   * if the git_dir directory DOES exist, runs, in that directory:
     "git remote update --prune"
   
   * the above commands do all the network operations; the user is then
     expected to run other commands, inside the container, to do anything
     else to access the right ref ("git checkout $git_ref" etc)
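
   The bullets above can be sketched in Python roughly as follows. This
   is a minimal illustration, not ick's actual code: the function name
   git_action_command is hypothetical, and the sketch only builds the
   argument list and working directory, leaving execution to the
   caller. Passing the arguments as a list is what avoids the shell
   quoting issues.

```python
import os


def git_action_command(git_url, git_dir):
    """Return (argv, cwd) for the network step of the git action.

    Hypothetical helper, not part of ick. argv is a list, so no shell
    is involved and no quoting is needed.
    """
    if not os.path.exists(git_dir):
        # First build: clone into git_dir, run from the current directory.
        return ["git", "clone", git_url, git_dir], None
    # Later builds: refresh all remotes inside the existing clone.
    return ["git", "remote", "update", "--prune"], git_dir
```

   The result could then be run with something like
   subprocess.check_call(argv, cwd=cwd).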
   
   
   This design means that only the parts that are absolutely necessary to
   run outside the container are done there, and the user has the
   flexibility to customise the rest. The goal is for ick to provide a
   default "prepare_workspace" pipeline that does all the things that
   typical users require, so each user is not forced to reinvent a
   triangular wheel, loosely attached.
   
   However, the above design isn't flexible enough for Daniel, who needs
   to access multiple git repositories, not just once, and build a tree
   of clones. I'll let Daniel tell us what his use-case is like. (I think
   I know, but it's better if he tells it himself. I can be a bit of a
   broken telephone.)
   
   Let the Brainst Orming begin. (See description of this product:
   https://www.amazon.de/dp/B00UWNJ9OI/)
   
   -- 
   I want to build worthwhile things that might last. --joeyh
From: Daniel Silverstone <dsilvers@digital-scurf.org>
Date: Tue, 3 Jul 2018 13:28:26 +0100

   On Mon, Jul 02, 2018 at 18:24:48 +0300, Lars Wirzenius wrote:
   > However, the above design isn't flexible enough for Daniel, who needs
   > to access multiple git repositories, not just once, and build a tree
   > of clones. I'll let Daniel tell us what his use-case is like. (I think
   > I know, but it's better if he tells it himself. I can be a bit of a
   > broken telephone.)
   
   So I have a couple of personal use-cases.  One is covered by the above
   and one isn't.  I have a third use-case which is so "out there"
   that I doubt we'll be able to cover it usefully, but I'll mention it
   just in case :-)
   
   So, my primary non-covered use case is for building packages for Gitano
   and its related projects.  To do this, two git repositories have to be
   checked out which are project-related, and one which is shared tooling
   for building the packages.
   
   I have many projects, luxio, lua-scrypt, clod, gall, supple, lace, tongue,
   and gitano, all of which can be built with essentially the same pipeline
   stages, so long as they are parameterised by the project name.  As such,
   my projects have a reponame parameter.  Given that, if we assumed that
   the git action could be parameterised in the pipelines:
   
   - action: git
     repo: "git://git.gitano.org.uk/{{ reponame }}.git"
     ref: "{{ refname }}"
     path: "{{ reponame }}"
   - action: git
     repo: "git://git.gitano.org.uk/{{ reponame }}/debian.git"
     ref: "{% if debianrefname %}{{ debianrefname }}{% else %}{{ refname }}{% endif %}"
     path: "{{ reponame }}/debian"
   - action: git
     repo: git://git.gitano.org.uk/gp-packaging-tools.git
     ref: master
     path: gp-packaging-tools
   
   would be approximately what I'd need for my git checkout pipeline.
   
   If "optional" parameters is not a thing, then the debianrefname parameter would
   just have to be provided at all times, defaulting to master as refname would.
   
   If, on the other hand, the ref was irrelevant because it was expected that
   there'd be a stage doing `git checkout ${refname}` or similar later, then I can
   handle the case of debianrefname being empty/not-provided in shell.
   
   My "out-there" use-case is what I call, in Gitano, system branches.  That works
   by having branches of the same name in any/all of the above repos, and then
   they're checked out together (along with master for any repo without the
   branch) and all built together for test purposes.  This project would not
   produce any debs because it purely exists to prove out the test suite on a
   system other than my development system, and I have tooling to produce the
   checkouts and build the code, providing that it can run somewhere with network
   access.  We'd need something like:
   
   - action: git
     repo: "git://git.gitano.org.uk/gitano.git"
     ref: "{{ sysbranchref }}"
     fallback-ref: master
     path: gitano
   
   Where fallback-ref is checked out if ref isn't available in the remote.
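
   The fallback rule could look something like this in Python. It is
   only a sketch: choose_ref is a made-up name, and the set of refs
   available in the remote is assumed to have been collected already
   (for example from the output of "git ls-remote --heads").

```python
def choose_ref(remote_refs, ref, fallback_ref):
    """Pick ref if the remote has it, otherwise fallback_ref.

    Hypothetical helper. remote_refs is any collection of branch names
    known to exist in the remote repository.
    """
    if ref in remote_refs:
        return ref
    if fallback_ref in remote_refs:
        return fallback_ref
    raise ValueError(
        f"neither {ref!r} nor {fallback_ref!r} exists in the remote"
    )
```

   Raising an error when neither ref exists is a design choice of the
   sketch; silently picking something else would hide a
   misconfiguration.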
   
   I don't know how much of this you, or others, would want in Ick, but there you
   have it.
   
   For me, the obviously desirable option is (a) that the git action can take
   repo, ref, and path directly in the pipeline stage, and (b) that we can use
   some kind of syntax to interpolate parameters into those strings (using jinja2
   seems obvious, though requiring that at the ick controller end might be a
   smidge iffy for maintaining the possibility of replacing the python
   implementation in the future).
   
   I look forward to your take on all the above,
   
   D.
   
   -- 
   Daniel Silverstone                         http://www.digital-scurf.org/
   PGP mail accepted and encouraged.            Key Id: 3CCE BABE 206C 3B69
   
   _______________________________________________
   ick-discuss mailing list
   ick-discuss@ick.liw.fi
   https://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/ick-discuss-ick.liw.fi
From: Lars Wirzenius <liw@liw.fi>
Date: Wed, 4 Jul 2018 10:29:14 +0300

   On Tue, Jul 03, 2018 at 01:28:26PM +0100, Daniel Silverstone wrote:
   > On Mon, Jul 02, 2018 at 18:24:48 +0300, Lars Wirzenius wrote:
   > > However, the above design isn't flexible enough for Daniel, who needs
   > > to access multiple git repositories, not just once, and build a tree
   > > of clones. I'll let Daniel tell us what his use-case is like. (I think
   > > I know, but it's better if he tells it himself. I can be a bit of a
   > > broken telephone.)
   >
   > [description of use cases omitted]
   
   > I have many projects, luxio, lua-scrypt, clod, gall, supple, lace, tongue,
   > and gitano, all of which can be built with essentially the same pipeline
   > stages, so long as they are parameterised by the project name.  As such,
   > my projects have a reponame parameter.  Given that, if we assumed that
   > the git action could be parameterised in the pipelines:
   > 
   > - action: git
   >   repo: "git://git.gitano.org.uk/{{ reponame }}.git"
   >   ref: "{{ refname }}"
   >   path: "{{ reponame }}"
   > - action: git
   >   repo: "git://git.gitano.org.uk/{{ reponame }}/debian.git"
   >   ref: "{% if debianrefname %}{{ debianrefname }}{% else %}{{ refname }}{% endif %}"
   >   path: "{{ reponame }}/debian"
   > - action: git
   >   repo: git://git.gitano.org.uk/gp-packaging-tools.git
   >   ref: master
   >   path: gp-packaging-tools
   > 
   > would be approximately what I'd need for my git checkout pipeline.
   > 
   > If "optional" parameters is not a thing, then the debianrefname parameter would
   > just have to be provided at all times, defaulting to master as refname would.
   > 
   > If, on the other hand, the ref was irrelevant because it was expected that
   > there'd be a stage doing `git checkout ${refname}` or similar later, then I can
   > handle the case of debianrefname being empty/not-provided in shell.
   > 
   > My "out-there" use-case is what I call, in Gitano, system branches.  That works
   > by having branches of the same name in any/all of the above repos, and then
   > they're checked out together (along with master for any repo without the
   > branch) and all built together for test purposes.  This project would not
   > produce any debs because it purely exists to prove out the test suite on a
   > system other than my development system, and I have tooling to produce the
   > checkouts and build the code, providing that it can run somewhere with network
   > access.  We'd need something like:
   > 
   > - action: git
   >   repo: "git://git.gitano.org.uk/gitano.git"
   >   ref: "{{ sysbranchref }}"
   >   fallback-ref: master
   >   path: gitano
   > 
   > Where fallback-ref is checked out if ref isn't available in the remote.
   > 
   > I don't know how much of this you, or others, would want in Ick, but there you
   > have it.
   > 
   > For me, the obviously desirable option is (a) that the git action can take
   > repo, ref, and path directly in the pipeline stage, and (b) that we can use
   > some kind of syntax to interpolate parameters into those strings (using jinja2
   > seems obvious, though requiring that at the ick controller end might be a
   > smidge iffy for maintaining the possibility of replacing the python
   > implementation in the future).
   > 
   > I look forward to your take on all the above,
   > 
   > D.
   > 
   
   Thanks, Daniel. That's illuminating and interesting.
   
   I'm going to rephrase what you wrote, to show how I understood it, and
   to suggest something.
   
   First, a comment on adding a templating language on top of YAML. I
   first encountered this with Ansible, and I've since used it in my own
   vmdb2 project. It's a powerful concept, but I find it to sometimes be
   problematic: the input files aren't actually YAML anymore, and they
   become harder to understand and harder to debug. Sometimes that's OK,
   the added power is worth it. However, I'd like to avoid that for ick,
   for the time being.
   
   (Also, as you say, using jinja2 for templating might become
   problematic if and when ick gets rewritten in another language.)
   
   Thus, I'll write out your examples without templating. They'll be more
   verbose, and more repetitive, but should be more obvious as to
   meaning.
   
   That also means I'm, at this time, not ready to have action attributes
   that vary from project to project. Perhaps that'd be a useful addition
   later on, but it seems to require some form of (templating) language
   on top of YAML.
   
   Your first use case seems to me to be like this:
   
   * all your projects live on the same git server, next to each other
   * your main repository for a component is foo.git, branch master
   * its Debian packaging is in foo-debian.git, branch master for unstable
   * additionally, you have some common build tooling in
     gp-packaging-tools.git
   
   I note that this has a fairly tight coupling between the repositories
   for the main code and the Debian packaging. For ick, I'd prefer to
   avoid mandating that: one use case might be that I want to package
   your code and I put the Debian packaging on my server, while your code
   remains on yours.
   
   My suggestion would be to change the git action to look for a project
   parameter "git" (or "sources"?), which would be a *list* of
   url/ref/dir tuples. For you, each project would have a parameter like
   this:
   
       - project: foo
         parameters:
           git:
           - url: git://git.gitano.org.uk/foo.git
             ref: master
             dir: src
           - url: git://git.gitano.org.uk/foo-debian.git
             ref: debian/unstable
             dir: src/debian
           - url: git://git.gitano.org.uk/gp-packaging-tools.git
             ref: master
             dir: gp-packaging-tools
   
   Which would result in a directory tree like this:
   
       /workspace
       /workspace/src
       /workspace/src/.git
       /workspace/src/debian
       /workspace/src/debian/.git
       /workspace/gp-packaging-tools
       /workspace/gp-packaging-tools/.git
   
   I admit the git parameter gets a bit repetitive. If that's enough of a
   problem, I'd be willing to make the git action understand the optional
   parameter "git_server" (or "git_base_url"):
   
       - project: foo
         parameters:
           git_server: git://git.gitano.org.uk
           git:
           - repo: foo.git
             ref: master
             dir: src
           - repo: foo-debian.git
             ref: debian/unstable
             dir: src/debian
           - repo: gp-packaging-tools.git
             ref: master
             dir: gp-packaging-tools
   
   In the tuples in the git parameter, url would still work and would be
   the full URL to the repository. Otherwise, the repo field would be
   appended to the git_server value.
   
   Also, we could make "master" be the default branch. Actually, that
   suggests to me that a mechanism for providing defaults might be
   useful. Maybe something like this:
   
       - project: foo
         parameters:
           git_defaults:
             server: git://git.gitano.org.uk
             ref: master
           git:
           - repo: foo.git
             dir: src
           - repo: foo-debian.git
             ref: debian/unstable
             dir: src/debian
           - repo: gp-packaging-tools.git
             dir: gp-packaging-tools
           - server: git://git.liw.fi
             repo: liw-stuff.git
   
   In this example, the url/ref/dir tuple is replaced with a
   server/repo/ref/dir tuple and the full repo URL is built by combining
   server and repo fields. (Though it might be helpful to provide a url
   field that can be used instead of server+repo.) See below for
   overriding values in triggers.
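
   As a rough Python sketch of how such defaults might be applied: the
   function name expand_git_parameter is hypothetical, and the
   precedence rules are an assumption (an explicit url field wins, then
   fields in the entry, then git_defaults).

```python
def expand_git_parameter(git_defaults, git):
    """Expand server/repo/ref/dir entries into url/ref/dir dicts.

    Hypothetical helper. Assumed precedence: an explicit "url" wins;
    otherwise fields in the entry override those in git_defaults.
    """
    expanded = []
    for entry in git:
        merged = {**git_defaults, **entry}
        url = merged.get("url")
        if url is None:
            # Build the URL from server + repo with a single slash.
            url = merged["server"].rstrip("/") + "/" + merged["repo"]
        expanded.append(
            {"url": url, "ref": merged.get("ref"), "dir": merged.get("dir")}
        )
    return expanded
```

   An entry without a dir field comes out with dir set to None here;
   what the git action should do in that case is left open.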
   
   Handling a list of dicts in shell with jq is not necessarily the kind
   of thing I enjoy doing. I'd do it in Python instead, where it's not as
   awkward.
   
   Your system branch use case is interesting. My initial suggestion for
   that would be to change ick to allow a trigger call, which starts a
   new build, to override project parameters for a specific build.
   Currently it's just a simple GET:
   
       GET /projects/foo/+trigger
   
   We could add another way to do triggers:
   
       POST /projects/foo/+trigger
       Content-Type: application/json
   
       {
           "parameters": {
               "git_defaults": {
                   "ref": "liw/fix"
               }
           }
       }
   
   This would set the "ref" field in the "git_defaults" parameter to
   "liw/fix". The "server" field would not be changed. The
   foo-debian.git entry would still use the debian/unstable branch.
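
   The override semantics described here amount to a recursive merge of
   dicts. A minimal Python sketch, assuming that dicts are merged key by
   key and everything else is replaced whole (apply_trigger_overrides
   is a hypothetical name):

```python
import copy


def apply_trigger_overrides(parameters, overrides):
    """Return a copy of parameters with overrides merged in.

    Hypothetical helper. Dicts are merged key by key, recursively;
    any other value (string, list, ...) is replaced as a whole.
    """
    merged = copy.deepcopy(parameters)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_trigger_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

   The project's stored parameters are left untouched, so the override
   only affects the one build that the trigger starts.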
   
   While this would not be as powerful as full jinja2 support, it would,
   I think, be a useful feature, entirely separate from templating.
   
   What I suggest here isn't really what you suggested, but what do you
   think? Would this be workable?
   
   -- 
   I want to build worthwhile things that might last. --joeyh
From: Daniel Silverstone <dsilvers@digital-scurf.org>
Date: Thu, 5 Jul 2018 11:35:31 +0100

   On Wed, Jul 04, 2018 at 10:29:14 +0300, Lars Wirzenius wrote:
   > Thanks, Daniel. That's illuminating and interesting.
   
   I'm glad :-)
   
   > I'm going to rephrase what you wrote, to show how I understood it, and
   > to suggest something.
   
   Fair, I'll respond inline here :-)
   
   > First, a comment on adding a templating language on top of YAML. I
   > first encountered this with Ansible, and I've since used it in my own
   > vmdb2 project. It's a powerful concept, but I find it to sometimes be
   > problematic: the input files aren't actually YAML anymore, and they
   > become harder to understand and harder to debug. Sometimes that's OK,
   > the added power is worth it. However, I'd like to avoid that for ick,
   > for the time being.
   > 
   > (Also, as you say, using jinja2 for templating might become
   > problematic if and when ick gets rewritten in another language.)
   
   Yes, absolutely.  At the "worst" we can come up with a convention by which
   icktool expands things before it sends them to the controller, but let's avoid
   templating languages for now.
   
   > Your first use case seems to me to be like this:
   > 
   > * all your projects live on the same git server, next to each other
   > * your main repository for a component is foo.git, branch master
   > * its Debian packaging is in foo-debian.git, branch master for unstable
   > * additionally, you have some common build tooling in
   >   gp-packaging-tools.git
   
   Yep (though foo/debian.git) :-)
   
   > I note that this has a fairly tight coupling between the repositories
   > for the main code and the Debian packaging. For ick, I'd prefer to
   > avoid mandating that: one use case might be that I want to package
   > your code and I put the Debian packaging on my server, while your code
   > remains on yours.
   
   Yes, the coupling is tight for the packages I have.
   
   > My suggestion would be to change the git action to look for a project
   > parameter "git" (or "sources"?), which would be a *list* of
   > url/ref/dir tuples. For you, each project would have a parameter like
   > this:
   > 
   >     - project: foo
   >       parameters:
   >         git:
   >         - url: git://git.gitano.org.uk/foo.git
   >           ref: master
   >           dir: src
   >         - url: git://git.gitano.org.uk/foo-debian.git
   >           ref: debian/unstable
   >           dir: src/debian
   >         - url: git://git.gitano.org.uk/gp-packaging-tools.git
   >           ref: master
   >           dir: gp-packaging-tools
   > 
   > Which would result in a directory tree like this:
   > 
   >     /workspace
   >     /workspace/src
   >     /workspace/src/.git
   >     /workspace/src/debian
   >     /workspace/src/debian/.git
   >     /workspace/gp-packaging-tools
   >     /workspace/gp-packaging-tools/.git
   
   That looks acceptable (presumably there'd need to be an action:git pipeline
   step somewhere too?)
   
   > I admit the git parameter gets a bit repetitive. If that's enough of a
   > problem, I'd be willing to make the git action understand the optional
   > parameter "git_server" (or "git_base_url"):
   > 
   >     - project: foo
   >       parameters:
   >         git_server: git://git.gitano.org.uk
   >         git:
   >         - repo: foo.git
   >           ref: master
   >           dir: src
   >         - repo: foo-debian.git
   >           ref: debian/unstable
   >           dir: src/debian
   >         - repo: gp-packaging-tools.git
   >           ref: master
   >           dir: gp-packaging-tools
   
   I'd call it repo_base and expect that repos which aren't absolute are urljoin'd
   onto it as you suggest.
   
   > Also, we could make "master" be the default branch. Actually, that
   > suggests to me that a mechanism for providing defaults might be
   > useful. Maybe something like this:
   [snip]
   
   Sounds good, though again I'd call it default_ref so that it's clear what its
   purpose is.
   
   > In this example, the url/ref/dir tuple is replaced with a
   > server/repo/ref/dir tuple and the full repo URL is built by combining
   > server and repo fields. (Though it might be helpful to provide a url
   > field that can be used instead of server+repo.) See below for
   > overriding values in triggers.
   
   As I suggested, I'd just go with "if repo is absolute, use it, otherwise
   urljoin" as a non-surprising behaviour.
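
   A small Python sketch of that rule, with repo_url as a hypothetical
   name and "absolute" taken to mean "has a URL scheme". The join is
   done by hand because, as far as I know, urllib.parse.urljoin does
   not resolve relative references against git:// URLs.

```python
from urllib.parse import urlsplit


def repo_url(repo_base, repo):
    """Return repo if it is an absolute URL, else join it to repo_base.

    Hypothetical helper. "Absolute" is taken to mean "has a URL scheme".
    """
    if urlsplit(repo).scheme:
        return repo
    # Joined by hand: urllib.parse.urljoin returns the relative part
    # unchanged for schemes such as git:// that are not in its
    # uses_relative list.
    return repo_base.rstrip("/") + "/" + repo.lstrip("/")
```

   With repo_base set to git://git.gitano.org.uk, a repo of
   foo/debian.git would be joined onto the base, while a full URL such
   as git://git.liw.fi/liw-stuff.git would be used as it is.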
   
   > Handling a list of dicts in shell with jq is not necessarily the kind
   > of thing I enjoy doing. I'd do it in Python instead, where it's not as
   > awkward.
   
   Fair.
   
   > Your system branch use case is interesting. My initial suggestion for
   > that would be to change ick to allow a trigger call, which starts a new
   > build, to override project parameters for a specific build. Currently
   > it's just a simple GET:
   > 
   >     GET /projects/foo/+trigger
   > 
   > We could add another way to do triggers:
   > 
   >     POST /projects/foo/+trigger
   >     Content-Type: application/json
   > 
   >     {
   >         "parameters": {
   >             "git_defaults": {
   >                 "ref": "liw/fix"
   >             }
   >         }
   >     }
   > 
   > This would set the "ref" field in the "git_defaults" parameter to
   > "liw/fix". The "server" field would not be changed. The
   > foo-debian.git entry would still use the debian/unstable branch.
   
   So long as you have a mechanism for removing values as well, that'd work quite
   well for the most part.  It still lacks the fallback for repos which lack the
   ref, sadly, but perhaps that can't be usefully managed generically?
   
   > While this would not be as powerful as full jinja2 support, it would,
   > I think, be a useful feature, entirely separate from templating.
   > 
   > What I suggest here isn't really what you suggested, but what do you
   > think? Would this be workable?
   
   For my primary use-case this would be perfectly fine.  For the system branch
   use-case it's still not quite enough to prevent a need for an additional
   pipeline step to handle the checkouts.
   
   I think, for now, we should go with the primary use-case stuff, and worry about
   the complex trigger solution in another iteration in the future.  Do you want
   to combine all of the stuff into a specification and then I'll review that so
   we can close this work item off?
   
   D.
   
   -- 
   Daniel Silverstone                         http://www.digital-scurf.org/
   PGP mail accepted and encouraged.            Key Id: 3CCE BABE 206C 3B69
   
From: Lars Wirzenius <liw@liw.fi>
Date: Sat, 7 Jul 2018 12:43:50 +0300

   On Thu, Jul 05, 2018 at 11:35:31AM +0100, Daniel Silverstone wrote:
   > I think, for now, we should go with the primary use-case stuff, and worry about
   > the complex trigger solution in another iteration in the future.  Do you want
   > to combine all of the stuff into a specification and then I'll review that so
   > we can close this work item off?
   
   I wrote up my understanding of the consensus and put it on the website:
   
   https://ick.liw.fi/blog/2018/07/07/ick_git_action_multiple_repository_support/
   
   Does that look OK? If it does, I'll start changing the worker
   manager's git action to implement that. After that, we should probably
   hone the additional "update checkout" step together, possibly based on
   http://git.liw.fi/liw-ci/tree/ci-prod.ick#n704 .
   
   At some point we should add a "standard library of ick pipelines" that
   comes with ick. That'll be exciting. At that time we should, for
   example, consider if pipeline is a good term for ick to use.
   
   -- 
   I want to build worthwhile things that might last. --joeyh