Framework: Add and extend URL parsers - libvcs.url.base

Foundational tools to detect, parse, and validate VCS URLs.

class libvcs.url.base.URLProtocol(url)

Bases: Protocol

Common interface for VCS URL Parsers.

Parameters:

url (str)

to_url()

Convert to a command-friendly URL for the VCS.

Return type:

str

classmethod is_valid(url, is_explicit=None)

Return True if URL is valid for this parser.

Return type:

bool

Parameters:
  • url (str)

  • is_explicit (bool | None)

_abc_impl = <_abc._abc_data object>
_is_protocol = True

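Because URLProtocol is a typing.Protocol, a parser satisfies it structurally: it only needs a url attribute, a to_url() method and an is_valid() classmethod, with no inheritance required. The sketch below mirrors the documented interface in a local protocol so it runs without libvcs; URLLike, HTTPSOnlyURL and describe are hypothetical names, and the toy parser accepts only https:// URLs.

```python
from __future__ import annotations

import re
from typing import ClassVar, Optional, Protocol


class URLLike(Protocol):
    """Hypothetical local mirror of the interface URLProtocol documents."""

    url: str

    def to_url(self) -> str: ...

    @classmethod
    def is_valid(cls, url: str, is_explicit: Optional[bool] = None) -> bool: ...


class HTTPSOnlyURL:
    """Toy parser (not part of libvcs): accepts only https:// URLs."""

    pattern: ClassVar[re.Pattern[str]] = re.compile(
        r"^https://(?P<hostname>[^/]+)/(?P<path>.+)$"
    )

    def __init__(self, url: str) -> None:
        self.url = url

    def to_url(self) -> str:
        # An https URL is already command friendly, so return it unchanged.
        return self.url

    @classmethod
    def is_valid(cls, url: str, is_explicit: Optional[bool] = None) -> bool:
        # This toy parser has no explicit/implicit distinction; is_explicit is ignored.
        return cls.pattern.match(url) is not None


def describe(parser: URLLike) -> str:
    # Type checkers accept HTTPSOnlyURL here because its shape matches URLLike.
    return parser.to_url()


print(HTTPSOnlyURL.is_valid("https://github.com/vcs-python/libvcs"))  # True
print(HTTPSOnlyURL.is_valid("github:vcs-python/libvcs"))  # False
```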
class libvcs.url.base.Rule(label, description, pattern, defaults=<factory>, is_explicit=False, weight=0)

Bases: SkipDefaultFieldsReprMixin

A Rule maps a URL pattern onto the fields of a parsed URL.

Parameters:
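label: str

Computer readable name / ID

description: str

Human readable description

pattern: Pattern[str]

Regex pattern; its named groups, e.g. (?P<path>...), become fields of the parsed URL

defaults: dict[str, str]

Values filled in for fields the pattern does not capture, e.g. hostname and scheme

is_explicit: bool = False

Is the match unambiguous with other VCS systems? e.g. git+ prefix

weight: int = 0

Higher is more likely to win when more than one rule matches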
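To make these fields concrete, here is a stdlib-only sketch of how a list of rules could turn a URL into fields: the pattern's named groups supply values, defaults fill only the gaps, and when several rules match, the highest weight wins. SimpleRule and parse are hypothetical names, the selection logic is an assumption for illustration rather than libvcs's own code, and the scp-style regex is a simplified stand-in for the real core-git-scp pattern.

```python
from __future__ import annotations

import dataclasses
import re
from typing import Optional


@dataclasses.dataclass
class SimpleRule:
    """Hypothetical stand-in carrying the same fields as Rule."""

    label: str
    description: str
    pattern: re.Pattern[str]
    defaults: dict[str, str] = dataclasses.field(default_factory=dict)
    is_explicit: bool = False
    weight: int = 0


def parse(url: str, rules: list[SimpleRule]) -> Optional[dict[str, str]]:
    """Return fields from the highest-weight matching rule, or None if none match."""
    matches = []
    for rule in rules:
        match = rule.pattern.match(url)
        if match is not None:
            matches.append((rule, match))
    if not matches:
        return None
    # max() keeps the first of several equal weights, so ties go to list order.
    rule, match = max(matches, key=lambda pair: pair[0].weight)
    # Named groups become fields; groups that did not participate are skipped.
    fields = {k: v for k, v in match.groupdict().items() if v is not None}
    # Defaults only fill gaps; they never override captured values.
    for key, value in rule.defaults.items():
        fields.setdefault(key, value)
    fields["rule"] = rule.label
    return fields


# Simplified scp-style rule: an assumption, not libvcs's real core-git-scp regex.
scp_rule = SimpleRule(
    label="core-git-scp",
    description="scp-like syntax, e.g. git@host:org/repo",
    pattern=re.compile(r"^(?:(?P<user>[^@/]+)@)?(?P<hostname>[^:/]+):(?P<path>.+)$"),
)
gh_rule = SimpleRule(
    label="gh-prefix",
    description="Matches prefixes like github:org/repo",
    pattern=re.compile(r"^github:(?P<path>.*)$"),
    defaults={"hostname": "github.com", "scheme": "https"},
    is_explicit=True,
    weight=50,
)

print(parse("github:vcs-python/libvcs", [scp_rule]))
# {'hostname': 'github', 'path': 'vcs-python/libvcs', 'rule': 'core-git-scp'}
print(parse("github:vcs-python/libvcs", [scp_rule, gh_rule]))
# {'path': 'vcs-python/libvcs', 'hostname': 'github.com', 'scheme': 'https', 'rule': 'gh-prefix'}
```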
class libvcs.url.base.RuleMap(_rule_map=<factory>)

Bases: SkipDefaultFieldsReprMixin

Pattern matching and parsing capabilities for URL parsers, e.g. GitURL.

Parameters:

_rule_map (dict[str, Rule])

_rule_map: dict[str, Rule]

register(cls)

Add a new URL rule.

Return type:

None

Parameters:

cls (Rule)

>>> import dataclasses
>>> from libvcs.url.base import Rule, RuleMap
>>> from libvcs.url.git import GitURL, GitBaseURL

GitBaseURL - the git(1) compliant parser - won’t accept a pip-style URL:

>>> GitBaseURL.is_valid(url="git+ssh://git@github.com/tony/AlgoXY.git")
False

GitURL - the “batteries-included” parser - can do it:

>>> GitURL.is_valid(url="git+ssh://git@github.com/tony/AlgoXY.git")
True

But what about a shorthand like github:org/repo?

>>> GitURL.is_valid(url="github:org/repo")
True

It is accepted, but only because git’s standard scp-style regex catches it:

>>> GitURL(url="github:org/repo")
GitURL(url=github:org/repo,
   hostname=github,
   path=org/repo,
   rule=core-git-scp)
>>> GitURL(url="github:org/repo").to_url()
'git@github:org/repo'

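Eek. That URL is unusable: it points at a host named github rather than github.com, so git clone git@github:org/repo fails unless the user has configured a rewrite for it (e.g. git’s insteadOf).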

We need something more specific so usable URLs can be generated. What do we do?

Extending matching capability:

>>> class GitHubPrefix(Rule):
...     label = 'gh-prefix'
...     description ='Matches prefixes like github:org/repo'
...     pattern = r'^github:(?P<path>.*)$'
...     defaults = {
...         'hostname': 'github.com',
...         'scheme': 'https'
...     }
...     # We know it's git, not any other VCS
...     is_explicit = True
...     weight = 50
>>> @dataclasses.dataclass(repr=False)
... class GitHubURL(GitURL):
...    rule_map = RuleMap(
...        _rule_map={'github_prefix': GitHubPrefix}
...    )
>>> GitHubURL.is_valid(url='github:vcs-python/libvcs')
True
>>> GitHubURL.is_valid(url='github:vcs-python/libvcs', is_explicit=True)
True
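One plausible reading of the is_explicit argument: when it is None every rule is eligible, otherwise only rules whose is_explicit equals the argument are tried. The sketch below encodes that reading with hypothetical MiniRule tuples and a simplified scp-style regex; it is an assumption for illustration, not the libvcs source.

```python
import re
from typing import NamedTuple, Optional


class MiniRule(NamedTuple):
    """Hypothetical stand-in: only the parts of Rule that is_valid needs."""

    label: str
    pattern: str
    is_explicit: bool = False


RULES = [
    # Simplified scp-style rule (assumption): matches almost any "x:y", so not explicit.
    MiniRule("core-git-scp", r"^(?:[^@/]+@)?[^:/]+:.+$"),
    # The github: prefix can only mean git, so it is marked explicit.
    MiniRule("gh-prefix", r"^github:.*$", is_explicit=True),
]


def is_valid(url: str, is_explicit: Optional[bool] = None) -> bool:
    """True if any eligible rule matches; None means every rule is eligible."""
    eligible = [r for r in RULES if is_explicit is None or r.is_explicit == is_explicit]
    return any(re.match(r.pattern, url) for r in eligible)


print(is_valid("gitlab:vcs-python/libvcs"))  # True, via the scp-style rule
print(is_valid("gitlab:vcs-python/libvcs", is_explicit=True))  # False
print(is_valid("github:vcs-python/libvcs", is_explicit=True))  # True
```

Under this reading, is_explicit=True is a cheap way to ask "does some rule claim this URL unambiguously?" without the catch-all scp rule answering for it.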

Notice how defaults fills in scheme and hostname, which the pattern does not capture:

>>> GitHubURL(url='github:vcs-python/libvcs')
GitHubURL(url=github:vcs-python/libvcs,
    scheme=https,
    hostname=github.com,
    path=vcs-python/libvcs,
    rule=gh-prefix)
>>> GitHubURL(url='github:vcs-python/libvcs').to_url()
'https://github.com/vcs-python/libvcs'
>>> GitHubURL.is_valid(url='gitlab:vcs-python/libvcs')
False

GitHubURL sees this as invalid since it only has one rule, GitHubPrefix.
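The two to_url() results above differ only in which fields were available: the scp match has no scheme, while gh-prefix receives scheme and hostname from its defaults. The function below is a hypothetical sketch of that rendering step, not libvcs’s to_url(); it assumes "git" as the default user for scp-style output, which is what the 'git@github:org/repo' result suggests.

```python
from typing import Mapping


def to_url(fields: Mapping[str, str]) -> str:
    """Hypothetical: render parsed URL fields back into a URL string."""
    suffix = fields.get("suffix", "")
    if "scheme" in fields:
        # Scheme-based form, e.g. https://github.com/org/repo
        host = fields["hostname"]
        if "user" in fields:
            host = f"{fields['user']}@{host}"
        return f"{fields['scheme']}://{host}/{fields['path']}{suffix}"
    # No scheme: scp-like form, assuming "git" as the default user.
    user = fields.get("user", "git")
    return f"{user}@{fields['hostname']}:{fields['path']}{suffix}"


print(to_url({"hostname": "github", "path": "org/repo"}))
# git@github:org/repo
print(to_url({"scheme": "https", "hostname": "github.com", "path": "vcs-python/libvcs"}))
# https://github.com/vcs-python/libvcs
```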

>>> GitURL.is_valid(url='gitlab:vcs-python/libvcs')
True

Same story: it is caught by git(1)’s liberal scp-style rule:

>>> GitURL(url='gitlab:vcs-python/libvcs').rule
'core-git-scp'
>>> class GitLabPrefix(Rule):
...     label = 'gl-prefix'
...     description ='Matches prefixes like gitlab:org/repo'
...     pattern = r'^gitlab:(?P<path>.*)$'
...     defaults = {
...         'hostname': 'gitlab.com',
...         'scheme': 'https',
...         'suffix': '.git'
...     }

Option 1 (scoped): create a new parser class that uses only this rule:

>>> @dataclasses.dataclass(repr=False)
... class GitLabURL(GitURL):
...     rule_map = RuleMap(
...         _rule_map={'gitlab_prefix': GitLabPrefix}
...     )
>>> GitLabURL.is_valid(url='gitlab:vcs-python/libvcs')
True

Option 2 (global, everywhere): add the rule to GitURL itself. At first glance, GitURL already accepts the URL:

>>> GitURL.is_valid(url='gitlab:vcs-python/libvcs')
True

Are we home free, though? Not yet: the match still comes from the vague scp-style rule.

>>> GitURL(url='gitlab:vcs-python/libvcs').rule
'core-git-scp'

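Register GitLabPrefix on GitURL. This modifies GitURL.rule_map in place, so the rule applies everywhere GitURL is used: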

>>> GitURL.rule_map.register(GitLabPrefix)
>>> GitURL.is_valid(url='gitlab:vcs-python/libvcs')
True
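Because register() changes a shared rule map, the new rule stays in effect for the rest of the process. When that is unwanted (in a test, for example), pairing register() with unregister(label) in a context manager keeps the change scoped. This sketch relies only on those two documented method names; temporarily_registered, DictRuleMap and GitLabPrefixStub are hypothetical, and passing GitURL.rule_map instead of the toy map is an untested assumption.

```python
from __future__ import annotations

import contextlib
from typing import Any, Iterator, Protocol


class SupportsRegister(Protocol):
    def register(self, cls: Any) -> None: ...

    def unregister(self, label: str) -> None: ...


@contextlib.contextmanager
def temporarily_registered(rule_map: SupportsRegister, rule: Any) -> Iterator[None]:
    """Register rule for the duration of a with block, then remove it by label."""
    rule_map.register(rule)
    try:
        yield
    finally:
        # Runs even if the block raises, so the shared map is always cleaned up.
        rule_map.unregister(rule.label)


class DictRuleMap:
    """Toy map for demonstration (hypothetical), keyed on rule labels."""

    def __init__(self) -> None:
        self.rules: dict[str, Any] = {}

    def register(self, cls: Any) -> None:
        self.rules[cls.label] = cls

    def unregister(self, label: str) -> None:
        del self.rules[label]


class GitLabPrefixStub:
    label = "gl-prefix"


shared = DictRuleMap()
with temporarily_registered(shared, GitLabPrefixStub):
    inside = list(shared.rules)  # ['gl-prefix']
after = list(shared.rules)  # []
```

One caveat of this design: if a rule with the same label was registered before entering the block, exiting removes it rather than restoring the earlier rule.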

Example: git URLs + pip-style git URLs:

This is already in GitURL via PIP_DEFAULT_RULES. For the sake of showing how extensibility works, here is a recreation based on GitBaseURL:

>>> from libvcs.url.git import GitBaseURL
>>> from libvcs.url.git import DEFAULT_RULES, PIP_DEFAULT_RULES
>>> @dataclasses.dataclass(repr=False)
... class GitURLWithPip(GitBaseURL):
...    rule_map = RuleMap(
...        _rule_map={m.label: m for m in [*DEFAULT_RULES, *PIP_DEFAULT_RULES]}
...    )
>>> GitURLWithPip.is_valid(url="git+ssh://git@github.com/tony/AlgoXY.git")
True
>>> GitURLWithPip(url="git+ssh://git@github.com/tony/AlgoXY.git")
GitURLWithPip(url=git+ssh://git@github.com/tony/AlgoXY.git,
    scheme=git+ssh,
    user=git,
    hostname=github.com,
    path=tony/AlgoXY,
    suffix=.git,
    rule=pip-url)
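The rule_map above is built with a dict comprehension keyed on each rule’s label, so if two rule lists share a label, the rule listed later replaces the earlier one while keeping the earlier key position. A minimal stdlib illustration, with hypothetical labels and plain NamedTuples standing in for Rule:

```python
from typing import NamedTuple


class FakeRule(NamedTuple):
    """Hypothetical stand-in for Rule: just a label and a weight."""

    label: str
    weight: int


BASE_RULES = [FakeRule("scp", 0), FakeRule("https", 10)]
EXTRA_RULES = [FakeRule("pip", 20), FakeRule("https", 30)]  # "https" clashes on purpose

# Same shape as the rule_map construction above: later entries win on duplicate labels.
rule_map = {r.label: r for r in [*BASE_RULES, *EXTRA_RULES]}

print(list(rule_map))  # ['scp', 'https', 'pip']
print(rule_map["https"].weight)  # 30
```

Keep labels unique across the lists you combine unless overriding a rule is the intent.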

unregister(label)

Remove a URL rule.

Return type:

None

Parameters:

label (str)

values()

Return list of URL rules.

Return type:

dict_values[str, Rule]
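The dict_values return type suggests RuleMap stores rules in a dict keyed by label and hands back a view of it. Assuming that, the stand-in below uses the same three method names; since a dict_values view is live, a view taken before register() or unregister() reflects later changes. MiniRuleMap and the stub rule classes are hypothetical, and raising KeyError for an unknown label is this sketch’s behavior, not a documented libvcs one.

```python
from __future__ import annotations

import dataclasses
from collections.abc import ValuesView
from typing import Any


@dataclasses.dataclass
class MiniRuleMap:
    """Hypothetical stand-in for RuleMap: rules stored by their label."""

    _rule_map: dict[str, Any] = dataclasses.field(default_factory=dict)

    def register(self, cls: Any) -> None:
        # Keyed on label, so re-registering a label replaces the old rule.
        self._rule_map[cls.label] = cls

    def unregister(self, label: str) -> None:
        # In this sketch, an unknown label raises KeyError.
        del self._rule_map[label]

    def values(self) -> ValuesView[Any]:
        # A live view of the underlying dict, not a copy.
        return self._rule_map.values()


class GhPrefixStub:
    label = "gh-prefix"


class GlPrefixStub:
    label = "gl-prefix"


rules = MiniRuleMap()
view = rules.values()  # taken before any rule is registered
rules.register(GhPrefixStub)
rules.register(GlPrefixStub)
print([r.label for r in view])  # ['gh-prefix', 'gl-prefix']
```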