URL Parser - libvcs.url#

We all love urllib.parse, but what about VCS systems?

Completions and typing are also in demand, so what about all these factories? They make for good Python code, but how do we get editor support and the satisfaction of types snapping together?

If there were a type-friendly structure, such as our own abstract base class or a dataclass, that was also extensible to patterns and groupings, maybe we could strike a perfect balance.

If we could make it ready-to-go out of the box, but also have framework-like extensibility, it could satisfy the niche.

Validate and detect VCS URLs#

libvcs.url.git.GitURL.is_valid()

>>> from libvcs.url.git import GitURL

>>> GitURL.is_valid(url='https://github.com/vcs-python/libvcs.git')
True
>>> from libvcs.url.git import GitURL

>>> GitURL.is_valid(url='git@github.com:vcs-python/libvcs.git')
True

libvcs.url.hg.HgURL.is_valid()

>>> from libvcs.url.hg import HgURL

>>> HgURL.is_valid(url='https://hg.mozilla.org/mozilla-central/mozilla-central')
True
>>> from libvcs.url.hg import HgURL

>>> HgURL.is_valid(url='[email protected]:MyProject/project')
True

libvcs.url.svn.SvnURL.is_valid()

>>> from libvcs.url.svn import SvnURL

>>> SvnURL.is_valid(
... url='https://svn.project.org/project-central/project-central')
True
>>> from libvcs.url.svn import SvnURL

>>> SvnURL.is_valid(url='[email protected]:MyProject/project')
True

Parse VCS URLs#

Compare to urllib.parse.ParseResult
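
For reference, here is a small standard-library sketch of what urllib.parse returns for the same kinds of input. It only illustrates the comparison point and uses no libvcs API: a URL with a scheme splits cleanly, while an scp-style address ends up entirely in the path.

```python
from urllib.parse import urlparse

# A URL with an explicit scheme splits cleanly into components.
https_result = urlparse("https://github.com/vcs-python/libvcs.git")
print(https_result.scheme, https_result.hostname, https_result.path)

# An scp-style git address has no scheme, so urlparse cannot identify a
# user or host: the whole string lands in `path`.
scp_result = urlparse("git@github.com:vcs-python/libvcs.git")
print(repr(scp_result.scheme), scp_result.hostname, scp_result.path)
```

The second result carries no user, hostname, or suffix, which is the gap the VCS-aware structures below are meant to fill.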

libvcs.url.git.GitURL

>>> from libvcs.url.git import GitURL

>>> GitURL(url='git@github.com:vcs-python/libvcs.git')
GitURL(url=git@github.com:vcs-python/libvcs.git,
        user=git,
        hostname=github.com,
        path=vcs-python/libvcs,
        suffix=.git,
        rule=core-git-scp)

libvcs.url.hg.HgURL

>>> from libvcs.url.hg import HgURL

>>> HgURL(
...     url="http://hugin.hg.sourceforge.net:8000/hgroot/hugin/hugin")
HgURL(url=http://hugin.hg.sourceforge.net:8000/hgroot/hugin/hugin,
        scheme=http,
        hostname=hugin.hg.sourceforge.net,
        port=8000,
        path=hgroot/hugin/hugin,
        rule=core-hg)

libvcs.url.svn.SvnURL

>>> from libvcs.url.svn import SvnURL

>>> SvnURL(
...     url='svn+ssh://svn.debian.org/svn/aliothproj/path/in/project/repository')
SvnURL(url=svn+ssh://svn.debian.org/svn/aliothproj/path/in/project/repository,
       scheme=svn+ssh,
       hostname=svn.debian.org,
       path=svn/aliothproj/path/in/project/repository,
       rule=pip-url)

Export usable URLs#

pip knows what a certain URL string means, but git clone won’t.

e.g. pip install git+https://github.com/django/django.git@3.2 works great with pip.

$ pip install git+https://github.com/django/django.git@3.2
...
Successfully installed Django-3.2

but git clone can’t use that:

$ git clone git+https://github.com/django/django.git@3.2  # Fail
...
Cloning into 'django.git@3.2'...
git: 'remote-git+https' is not a git command. See 'git --help'.

It needs something like this:

$ git clone https://github.com/django/django.git --branch 3.2
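
The translation from the pip-style string to that command can be sketched with the standard library alone. The helper name pip_url_to_git_clone is hypothetical and is not part of libvcs; it assumes the revision, if any, follows the last "@" after the final "/".

```python
from typing import List, Optional


def pip_url_to_git_clone(pip_url: str) -> List[str]:
    """Turn 'git+<url>@<rev>' into git clone arguments (hypothetical helper)."""
    prefix = "git+"
    if not pip_url.startswith(prefix):
        raise ValueError(f"not a pip-style git URL: {pip_url!r}")
    remainder = pip_url[len(prefix):]

    # Treat '@' as a revision separator only when it comes after the last
    # '/', so a user@host authority part is left alone.
    last_slash = remainder.rfind("/")
    at = remainder.rfind("@")
    rev: Optional[str] = None
    url = remainder
    if at > last_slash:
        url, rev = remainder[:at], remainder[at + 1:]

    command = ["git", "clone", url]
    if rev:
        command += ["--branch", rev]
    return command


command = pip_url_to_git_clone("git+https://github.com/django/django.git@3.2")
print(" ".join(command))
# git clone https://github.com/django/django.git --branch 3.2
```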

But before we get there, we may not want a URL string yet, so parsing returns a structure first, e.g. GitURL.

  • Common result primitives across VCS, e.g. GitURL.

    Compare to a urllib.parse.ParseResult in urlparse

    This is where fun can happen, or you can just parse a URL.

  • Allow mutating / replacing parts of a parsed VCS URL (e.g. just the hostname)

  • Support common cases with popular VCS systems

  • Support extending parsing for users needing to do so
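
The goals above can be illustrated with a minimal dataclass sketch. SketchURL, SCP_PATTERN, and to_url here are illustrative stand-ins written for this example, not the libvcs classes; the sketch only assumes scp-style input such as user@host:path.git.

```python
import dataclasses
import re
from typing import Optional

# Hypothetical pattern for scp-style addresses such as user@host:path.git
SCP_PATTERN = re.compile(
    r"^(?P<user>[^@/:]+)@(?P<hostname>[^:/]+):(?P<path>.+?)(?P<suffix>\.git)?$"
)


@dataclasses.dataclass
class SketchURL:
    """Illustrative parsed result; not the libvcs implementation."""

    user: str
    hostname: str
    path: str
    suffix: Optional[str] = None

    @classmethod
    def parse(cls, url: str) -> "SketchURL":
        match = SCP_PATTERN.match(url)
        if match is None:
            raise ValueError(f"unrecognized URL: {url!r}")
        # Named groups map one-to-one onto the dataclass fields.
        return cls(**match.groupdict())

    def to_url(self) -> str:
        return f"{self.user}@{self.hostname}:{self.path}{self.suffix or ''}"


parsed = SketchURL.parse("git@github.com:vcs-python/libvcs.git")
# Replace just the hostname, leaving the original object untouched.
mirror = dataclasses.replace(parsed, hostname="gitlab.com")
print(mirror.to_url())  # git@gitlab.com:vcs-python/libvcs.git
```

Because the result is a plain dataclass, editors can complete its fields and type checkers can verify them, which is the kind of balance the introduction asks for.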

Scope#

Out of the box#

The ambition for this is to build extendable parsers for package-like URLs, e.g.

  • Vanilla VCS URLs

    • any URL supported by the VCS binary, e.g. git(1), svn(1), hg(1).

  • pip-style urls [1]

    • branches

    • tags

  • NPM-style urls[2]

    • branches

    • tags

Extendability#

Patterns can be registered. Similar behavior exists in urlparse (undocumented).

  • Any formats not covered by the stock matchers

  • Custom urls

    • For orgs on a hosting site such as GitHub or GitLab, e.g.:

      • python:mypy -> git@github.com:python/mypy.git

      • inkscape:inkscape -> git@gitlab.com:inkscape/inkscape.git

    • For out of domain trackers, e.g.

      Direct to site:

      Direct to site + org / group:

      • gnome:gedit -> git@gitlab.gnome.org:GNOME/gedit.git

      • openstack:openstack -> https://opendev.org/openstack/openstack.git

      • mozilla:central -> https://hg.mozilla.org/mozilla-central/
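
One way to picture such custom shorthands is a small registry that maps a prefix to a URL template. This is a sketch under stated assumptions: SHORTHANDS, register_shorthand, and expand are hypothetical names, not the registration API libvcs exposes.

```python
import re
from typing import Dict, Optional

# Hypothetical registry: shorthand prefix -> URL template with a {name} slot.
SHORTHANDS: Dict[str, str] = {}


def register_shorthand(prefix: str, template: str) -> None:
    SHORTHANDS[prefix] = template


def expand(shorthand: str) -> Optional[str]:
    """Expand 'prefix:name' using a registered template, else return None."""
    match = re.fullmatch(r"(?P<prefix>[\w-]+):(?P<name>[\w.-]+)", shorthand)
    if match is None or match["prefix"] not in SHORTHANDS:
        return None
    return SHORTHANDS[match["prefix"]].format(name=match["name"])


register_shorthand("python", "git@github.com:python/{name}.git")
register_shorthand("gnome", "git@gitlab.gnome.org:GNOME/{name}.git")

print(expand("python:mypy"))  # git@github.com:python/mypy.git
print(expand("gnome:gedit"))  # git@gitlab.gnome.org:GNOME/gedit.git
```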

From there, GitURL can be used downstream directly by other projects.

In our case, libvcs's own Commands (libvcs.cmd) and Sync (libvcs.sync), as well as vcspull configuration, will be able to detect and accept various URL patterns.

Matchers: Defaults#

When a match occurs, its defaults will fill in non-matched groups.
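
A minimal sketch of that idea, assuming a regex with optional named groups and a plain dict of defaults. PATTERN, DEFAULTS, and match_with_defaults are hypothetical names chosen for this example.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern: scheme and port are optional groups.
PATTERN = re.compile(
    r"^((?P<scheme>\w+)://)?(?P<hostname>[^/:]+)(:(?P<port>\d+))?/(?P<path>.+)$"
)
# Assumed defaults, applied only to groups the regex left unmatched.
DEFAULTS = {"scheme": "https", "port": "443"}


def match_with_defaults(url: str) -> Optional[Dict[str, Optional[str]]]:
    match = PATTERN.match(url)
    if match is None:
        return None
    # Unmatched optional groups come back as None; fill those from DEFAULTS.
    return {
        key: value if value is not None else DEFAULTS.get(key)
        for key, value in match.groupdict().items()
    }


bare = match_with_defaults("hg.mozilla.org/mozilla-central")
full = match_with_defaults("http://hugin.hg.sourceforge.net:8000/hgroot/hugin/hugin")
print(bare)
print(full)
```

Groups that did match keep their captured value, so the explicit http and 8000 in the second URL are not overwritten.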

Matchers: First wins#

When registering new matchers, those with higher weights are checked first. The first matcher whose regex groups match the URL is picked.
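
The ordering can be sketched as a sort by weight followed by a first-match scan. SketchMatcher and first_match are hypothetical names for illustration and do not reflect how libvcs stores its matchers.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchMatcher:
    """Illustrative matcher: a label, a compiled regex, and a weight."""

    label: str
    pattern: "re.Pattern[str]"
    weight: int = 0


def first_match(url: str, matchers: List[SketchMatcher]) -> Optional[str]:
    # Check heavier matchers first and stop at the first regex that matches.
    for matcher in sorted(matchers, key=lambda m: m.weight, reverse=True):
        if matcher.pattern.match(url):
            return matcher.label
    return None


matchers = [
    SketchMatcher("generic-scp", re.compile(r"^[^@]+@[^:]+:.+$")),
    SketchMatcher("github-scp", re.compile(r"^git@github\.com:.+$"), weight=10),
]

print(first_match("git@github.com:vcs-python/libvcs.git", matchers))  # github-scp
print(first_match("git@gitlab.com:inkscape/inkscape.git", matchers))  # generic-scp
```

Both patterns match the GitHub address, but the more specific matcher wins because its weight puts it earlier in the scan.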

Explore#