How to convert an integer to the shortest url-safe string in Python?

Question

This answer is similar in spirit to Douglas Leeder’s, with the following changes:

It doesn’t use actual Base64, so there’s no padding characters

Instead of converting the number first to a byte-string (base 256), it converts it directly to base 64, which has the advantage of letting you represent negative numbers using a sign character.

import string
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + \
           string.digits + '-_'
ALPHABET_REVERSE = dict((c, i) for (i, c) in enumerate(ALPHABET))
BASE = len(ALPHABET)
SIGN_CHARACTER = '$'

def num_encode(n):
    if n < 0:
        return SIGN_CHARACTER + num_encode(-n)
    s = []
    while True:
        n, r = divmod(n, BASE)
        s.append(ALPHABET[r])
        if n == 0: break
    return ''.join(reversed(s))

def num_decode(s):
    if s[0] == SIGN_CHARACTER:
        return -num_decode(s[1:])
    n = 0
    for c in s:
        n = n * BASE + ALPHABET_REVERSE[c]
    return n

    >>> num_encode(0)
    'A'
    >>> num_encode(64)
    'BA'
    >>> num_encode(-(64**5-1))
    '$_____'

A few side notes:

You could (marginally) increase the human-readibility of the base-64 numbers by putting string.digits first in the alphabet (and making the sign character ‘-‘); I chose the order that I did based on Python’s urlsafe_b64encode.
If you’re encoding a lot of negative numbers, you could increase the efficiency by using a sign bit or one’s/two’s complement instead of a sign character.
You should be able to easily adapt this code to different bases by changing the alphabet, either to restrict it to only alphanumeric characters or to add additional “URL-safe” characters.
I would recommend against using a representation other than base 10 in URIs in most cases—it adds complexity and makes debugging harder without significant savings compared to the overhead of HTTP—unless you’re going for something TinyURL-esque.

Leave a Comment Cancel reply