Reading quoted/escaped arguments correctly from a string

A Few Introductory Words

If at all possible, don’t use shell-quoted strings as an input format.

  • It’s hard to parse consistently: Different shells have different extensions, and different non-shell implementations implement different subsets (see the deltas between shlex and xargs below).
  • It’s hard to programmatically generate. ksh and bash have printf '%q', which will generate a shell-quoted string with contents of an arbitrary variable, but no equivalent exists to this in the POSIX sh standard.
  • It’s easy to parse badly. Many folks consuming this format use eval, which has substantial security concerns.

NUL-delimited streams are a far better practice, as they can accurately represent any possible shell array or argument list with no ambiguity whatsoever.


xargs, with bashisms

If you’re getting your argument list from a human-generated input source using shell quoting, you might consider using xargs to parse it. Consider:

array=( )
while IFS= read -r -d ''; do
  array+=( "$REPLY" )
done < <(xargs printf '%s\0' <<<"$ARGS")

swap "${array[@]}"

…will put the parsed content of $ARGS into the array array. If you wanted to read from a file instead, substitute <filename for <<<"$ARGS".


xargs, POSIX-compliant

If you’re trying to write code compliant with POSIX sh, this gets trickier. (I’m going to assume file input here for reduced complexity):

# This does not work with entries containing literal newlines; you need bash for that.
run_with_args() {
  while IFS= read -r entry; do
    set -- "$@" "$entry"
  done
  "$@"
}
xargs printf '%s\n' <argfile | run_with_args ./swap

These approaches are safer than running xargs ./swap <argfile inasmuch as it will throw an error if there are more or longer arguments than can be accommodated, rather than running excess arguments as separate commands.


Python shlex — rather than xargs — with bashisms

If you need more accurate POSIX sh parsing than xargs implements, consider using the Python shlex module instead:

shlex_split() {
  python -c '
import shlex, sys
for item in shlex.split(sys.stdin.read()):
    sys.stdout.write(item + "\0")
'
}
while IFS= read -r -d ''; do
  array+=( "$REPLY" )
done < <(shlex_split <<<"$ARGS")

Leave a Comment