Framework: Add and extend URL parsers - libvcs.parse.base#

class libvcs.parse.base.Matcher#

Bases: libvcs._internal.dataclasses.SkipDefaultFieldsReprMixin

Structure for a matcher

description#

Human readable description

is_explicit = False#

Is the match unambiguous with other VCS systems? e.g. git+ prefix

label#

Computer readable name / ID

pattern#

Regex pattern

pattern_defaults#

Default values for fields the pattern does not capture, e.g. hostname or scheme
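To make these fields concrete, here is a minimal standalone sketch of a matcher-like structure. `SketchMatcher` and `captured_fields` are hypothetical names used only for illustration; they are not part of libvcs, and the sketch assumes the pattern is applied with `re.search`.

```python
import dataclasses
import re
from typing import Dict, Optional


@dataclasses.dataclass
class SketchMatcher:
    """Hypothetical stand-in for a matcher; not the libvcs class."""

    label: str  # computer-readable name / ID
    description: str  # human-readable description
    pattern: str  # regex with named groups
    pattern_defaults: Dict[str, str] = dataclasses.field(default_factory=dict)
    is_explicit: bool = False  # True when the match is unambiguous for one VCS


def captured_fields(matcher: SketchMatcher, url: str) -> Optional[Dict[str, str]]:
    """Return the named groups captured from url, or None when nothing matches."""
    match = re.search(matcher.pattern, url)
    if match is None:
        return None
    return match.groupdict()


gh_prefix = SketchMatcher(
    label="gh-prefix",
    description="Matches prefixes like github:org/repo",
    pattern=r"^github:(?P<path>.*)$",
    is_explicit=True,
)

fields = captured_fields(gh_prefix, "github:org/repo")
print(fields)
```

The named group in the pattern is what ends up as a field of the parsed URL; anything the pattern cannot capture has to come from defaults.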

class libvcs.parse.base.MatcherRegistry#

Bases: libvcs._internal.dataclasses.SkipDefaultFieldsReprMixin

Pattern matching and parsing capabilities for URL parsers, e.g. GitURL

register(cls)#

>>> import dataclasses
>>> from libvcs.parse.base import Matcher, MatcherRegistry
>>> from libvcs.parse.git import GitURL, GitBaseURL

GitBaseURL - the git(1) compliant parser - won’t accept a pip-style URL:

>>> GitBaseURL.is_valid(url="git+ssh://git@github.com/tony/AlgoXY.git")
False

GitURL - the “batteries-included” parser - can do it:

>>> GitURL.is_valid(url="git+ssh://git@github.com/tony/AlgoXY.git")
True

But what if you wanted to do github:org/repo?

>>> GitURL.is_valid(url="github:org/repo")
True

That actually works, but look, it’s caught in git’s standard SCP regex:

>>> GitURL(url="github:org/repo")
GitURL(url=github:org/repo,
   hostname=github,
   path=org/repo,
   matcher=core-git-scp)
>>> GitURL(url="github:org/repo").to_url()
'git@github:org/repo'

Eek. That won’t work; there is not much we can do with that URL.
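Why does a shorthand like github:org/repo get caught at all? A rough illustration, assuming a deliberately simplified scp-style pattern (`LOOSE_SCP` below is hypothetical and is not the regex libvcs ships): any text before the first colon is accepted as a hostname, so the shorthand parses just like a real scp-style address.

```python
import re

# Hypothetical, deliberately loose scp-style pattern; NOT the regex libvcs uses.
LOOSE_SCP = re.compile(
    r"^(?:(?P<user>[^@:/]+)@)?"  # optional "user@"
    r"(?P<hostname>[^@:/]+):"  # anything up to the first colon
    r"(?P<path>.+)$"  # the rest is treated as the path
)

shorthand = LOOSE_SCP.search("github:org/repo")
real_scp = LOOSE_SCP.search("git@github.com:tony/AlgoXY.git")

# Both match; the shorthand simply ends up with "github" as its hostname.
print(shorthand.groupdict())
print(real_scp.groupdict())
```

A pattern this permissive cannot tell a provider prefix from a hostname, which is why a more specific matcher is needed.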

We need something more specific so usable URLs can be generated. What do we do?

Extending matching capability:

>>> class GitHubPrefix(Matcher):
...     label = 'gh-prefix'
...     description ='Matches prefixes like github:org/repo'
...     pattern = r'^github:(?P<path>.*)$'
...     pattern_defaults = {
...         'hostname': 'github.com',
...         'scheme': 'https'
...     }
...     # We know it's git, not any other VCS
...     is_explicit = True
>>> @dataclasses.dataclass(repr=False)
... class GitHubURL(GitURL):
...    matchers: MatcherRegistry = MatcherRegistry(
...        _matchers={'github_prefix': GitHubPrefix}
...    )
>>> GitHubURL.is_valid(url='github:vcs-python/libvcs')
True
>>> GitHubURL.is_valid(url='github:vcs-python/libvcs', is_explicit=True)
True

Notice how pattern_defaults neatly fills the values for us.

>>> GitHubURL(url='github:vcs-python/libvcs')
GitHubURL(url=github:vcs-python/libvcs,
    scheme=https,
    hostname=github.com,
    path=vcs-python/libvcs,
    matcher=gh-prefix)
>>> GitHubURL(url='github:vcs-python/libvcs').to_url()
'https://github.com/vcs-python/libvcs'
>>> GitHubURL.is_valid(url='gitlab:vcs-python/libvcs')
False

GitHubURL sees this as invalid since it only has one matcher, GitHubPrefix.
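The way pattern_defaults fills in scheme and hostname can be approximated with the standard library alone. This is a sketch under stated assumptions, not libvcs's implementation: `parse_with_defaults` and `render` are hypothetical helpers, and the merge rule (captured groups override defaults) is an assumption.

```python
import re
from typing import Dict, Optional

PATTERN = r"^github:(?P<path>.*)$"
PATTERN_DEFAULTS = {"hostname": "github.com", "scheme": "https"}


def parse_with_defaults(url: str) -> Optional[Dict[str, str]]:
    """Merge regex captures over the defaults (hypothetical helper)."""
    match = re.search(PATTERN, url)
    if match is None:
        return None
    # Start from the defaults, then let captured groups win.
    fields = dict(PATTERN_DEFAULTS)
    fields.update({k: v for k, v in match.groupdict().items() if v is not None})
    return fields


def render(fields: Dict[str, str]) -> str:
    """Build a URL from the merged fields (assumed https-style layout)."""
    return "{scheme}://{hostname}/{path}".format(**fields)


parsed = parse_with_defaults("github:vcs-python/libvcs")
print(render(parsed))
```

Only `path` comes from the input string; `scheme` and `hostname` are supplied entirely by the defaults, which is what makes the generated URL usable.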

>>> GitURL.is_valid(url='gitlab:vcs-python/libvcs')
True

Same story: it gets caught by git(1)’s own liberal scp-style URL pattern:

>>> GitURL(url='gitlab:vcs-python/libvcs').matcher
'core-git-scp'
>>> class GitLabPrefix(Matcher):
...     label = 'gl-prefix'
...     description ='Matches prefixes like gitlab:org/repo'
...     pattern = r'^gitlab:(?P<path>.*)$'
...     pattern_defaults = {
...         'hostname': 'gitlab.com',
...         'scheme': 'https',
...         'suffix': '.git'
...     }

Option 1: Create a brand new URL class with its own matcher

>>> @dataclasses.dataclass(repr=False)
... class GitLabURL(GitURL):
...     matchers: MatcherRegistry = MatcherRegistry(
...         _matchers={'gitlab_prefix': GitLabPrefix}
...     )
>>> GitLabURL.is_valid(url='gitlab:vcs-python/libvcs')
True

Option 2 (global, everywhere): Add to the global GitURL:

>>> GitURL.is_valid(url='gitlab:vcs-python/libvcs')
True

Are we home free, though? Remember our issue with vague matches.

>>> GitURL(url='gitlab:vcs-python/libvcs').matcher
'core-git-scp'

Register:

>>> GitURL.matchers.register(GitLabPrefix)
>>> GitURL.is_valid(url='gitlab:vcs-python/libvcs')
True

Example: git URLs + pip-style git URLs:

This is already in GitURL via PIP_DEFAULT_MATCHERS. For the sake of showing how extensibility works, here is a recreation based on GitBaseURL:

>>> from libvcs.parse.git import GitBaseURL
>>> from libvcs.parse.git import DEFAULT_MATCHERS, PIP_DEFAULT_MATCHERS
>>> @dataclasses.dataclass(repr=False)
... class GitURLWithPip(GitBaseURL):
...    matchers: MatcherRegistry = MatcherRegistry(
...        _matchers={m.label: m for m in [*DEFAULT_MATCHERS, *PIP_DEFAULT_MATCHERS]}
...    )
>>> GitURLWithPip.is_valid(url="git+ssh://git@github.com/tony/AlgoXY.git")
True
>>> GitURLWithPip(url="git+ssh://git@github.com/tony/AlgoXY.git")
GitURLWithPip(url=git+ssh://git@github.com/tony/AlgoXY.git,
    scheme=git+ssh,
    user=git,
    hostname=github.com,
    path=tony/AlgoXY,
    suffix=.git,
    matcher=pip-url)

Parameters:

cls (Matcher) –

Return type:

None

unregister(label)#

Parameters:

label (str) –

Return type:

None

values()#

Return type:

dict_values[str, Matcher]
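The three registry methods above can be pictured as thin wrappers around a dictionary keyed by label. The sketch below is a standalone approximation: `SketchRegistry` and `SketchMatcher` are hypothetical, and ignoring a second registration under the same label is an assumption, not documented behavior.

```python
import dataclasses
from typing import Dict, ValuesView


@dataclasses.dataclass
class SketchMatcher:
    """Hypothetical matcher with just the fields the registry needs."""

    label: str
    pattern: str


@dataclasses.dataclass
class SketchRegistry:
    """Hypothetical registry; a dict of matchers keyed by label."""

    _matchers: Dict[str, SketchMatcher] = dataclasses.field(default_factory=dict)

    def register(self, matcher: SketchMatcher) -> None:
        # Assumption: the first registration for a label wins.
        if matcher.label not in self._matchers:
            self._matchers[matcher.label] = matcher

    def unregister(self, label: str) -> None:
        # Removing an unknown label is silently ignored.
        self._matchers.pop(label, None)

    def values(self) -> ValuesView[SketchMatcher]:
        return self._matchers.values()


registry = SketchRegistry()
registry.register(SketchMatcher("gl-prefix", r"^gitlab:(?P<path>.*)$"))
registry.register(SketchMatcher("gh-prefix", r"^github:(?P<path>.*)$"))
labels_after_register = [m.label for m in registry.values()]

registry.unregister("gl-prefix")
labels_after_unregister = [m.label for m in registry.values()]
```

Because the registry is shared by the URL class that owns it, registering on a class such as GitURL affects every later use of that class.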

class libvcs.parse.base.URLProtocol(url)#

Bases: Protocol

Common interface for VCS URL Parsers.

Parameters:

url (str) –

is_valid(url, is_explicit=None)#

Parameters:
  • url (str) –

  • is_explicit (Optional[bool]) –

Return type:

bool
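One plausible reading of the is_explicit parameter is as a filter over the available matchers before any pattern is tried. The following sketch illustrates that reading only; the function, the `SketchMatcher` tuple, and both patterns are hypothetical, and the filtering rule is an assumption rather than confirmed libvcs behavior.

```python
import re
from typing import NamedTuple, Optional, Sequence


class SketchMatcher(NamedTuple):
    """Hypothetical matcher record."""

    label: str
    pattern: str
    is_explicit: bool = False


MATCHERS = (
    SketchMatcher("loose-scp", r"^(?P<hostname>[^@:/]+):(?P<path>.+)$"),
    SketchMatcher("gl-prefix", r"^gitlab:(?P<path>.*)$", is_explicit=True),
)


def is_valid(
    url: str,
    is_explicit: Optional[bool] = None,
    matchers: Sequence[SketchMatcher] = MATCHERS,
) -> bool:
    """True if any matcher (optionally filtered by explicitness) matches url."""
    candidates = [
        m for m in matchers if is_explicit is None or m.is_explicit == is_explicit
    ]
    return any(re.search(m.pattern, url) for m in candidates)


print(is_valid("gitlab:vcs-python/libvcs", is_explicit=True))
print(is_valid("example:org/repo", is_explicit=True))
```

With is_explicit=None every matcher is consulted, so a loose pattern can still accept a URL that no explicit matcher recognizes.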

to_url()#

Return type:

str
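Since the interface is a Protocol, a parser can satisfy it structurally, without inheriting from anything. A minimal sketch using only `typing`: `SketchURLProtocol`, `HttpsURL`, and `normalize` are hypothetical names, and the validity rule in `HttpsURL` is invented purely for the example.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class SketchURLProtocol(Protocol):
    """Hypothetical mirror of the interface: is_valid() and to_url()."""

    def is_valid(self, url: str, is_explicit: Optional[bool] = None) -> bool: ...

    def to_url(self) -> str: ...


class HttpsURL:
    """Hypothetical parser; conforms by shape, not by inheritance."""

    def __init__(self, url: str) -> None:
        self.url = url

    @classmethod
    def is_valid(cls, url: str, is_explicit: Optional[bool] = None) -> bool:
        # Invented rule for the example: accept only https URLs.
        return url.startswith("https://")

    def to_url(self) -> str:
        return self.url


def normalize(parser: SketchURLProtocol) -> str:
    """Accept any object exposing the protocol's methods."""
    return parser.to_url()


parser = HttpsURL("https://github.com/vcs-python/libvcs")
print(normalize(parser))
```

Code written against the protocol, like `normalize` here, works with any parser that provides the same two methods.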