executable tar archives

So an internet person said something about making tarballs executable, in the same way as e.g. AppImages, and I got Inspired.

There's prior art for this, of course, in the form of AppImage (which works by mounting the file as a filesystem) and shar (which generates a shell script that when executed creates a bunch of files).

These are intended to be completely serious tools that people actually use, which is unacceptable for my purposes.

first pass: a shell script with a tarball attached

We'll start with this shell script, main, as the application we want to package:

#!/bin/bash
echo "hi I'm a tarball"
echo "argv: $*"

$ ./main hello world
hi I'm a tarball
argv: hello world

We can just pack this up in a tarball, then have the end user extract it and execute it. We can automate that by prefixing it with a shell script:

#!/bin/bash
tempdir=$(mktemp -d)
sed '1,/^exit/d' <$0 |tar -x -C $tempdir
$tempdir/main $*
rm -rf $tempdir
exit

The sed command on line 3 has the effect of skipping every line up to, and including, exit on line 6 (which is there both as a convenient "start of archive" marker and to ensure that we don't go off trying to execute the archive as shell commands after the wrapped application exits). We then pass the remaining file contents (the archive) to tar -x.

Note that, since the file suddenly switches from text to binary at this point, there must be exactly one newline at the end to make it work correctly (zero would cause sed to cut off the start of the archive thinking it's part of the exit line, and two or more would mean n - 1 of them get left in before the archive):

$ hexdump header -C -s 0x60
00000060  24 74 65 6d 70 64 69 72  0a 65 78 69 74 0a        |$tempdir.exit.|
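To see the newline sensitivity in isolation, here's a rough sketch that runs the same sed expression over three tiny stand-in files. The try helper, the file names, and the PAYLOAD marker are all made up for illustration; in the real thing a tar archive sits where PAYLOAD is.

```shell
#!/bin/sh
# Sketch: how the number of newlines after "exit" changes what sed passes on.
# PAYLOAD is a stand-in for the first bytes of the tar archive.
set -eu
work=$(mktemp -d)

# try NAME HEADER-TEXT: glue HEADER-TEXT and the payload together,
# then strip the header the same way the wrapper script does.
try() {
    printf '%b' "$2" >"$work/$1"
    printf 'PAYLOAD' >>"$work/$1"
    sed '1,/^exit/d' <"$work/$1"
}

zero=$(try zero 'echo hi\nexit')      # payload lands on the "exit" line and is deleted
one=$(try one 'echo hi\nexit\n')      # payload survives intact
two=$(try two 'echo hi\nexit\n\n')    # one stray newline is left in front of the payload

printf 'zero newlines: [%s]\n' "$zero"
printf 'one newline:   [%s]\n' "$one"
printf 'two newlines:  [%s]\n' "$two"
rm -rf "$work"
```

Only the middle case hands the payload over unchanged, which is the case the real header has to hit.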

We can then just concatenate that with an archive and execute it:

$ cp header executable.tar
$ tar -c main >>executable.tar
$ chmod +x executable.tar
$ ./executable.tar hello world
hi I'm a tarball
argv: hello world
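For anyone following along at home, the whole first pass fits in one throwaway script. This is a sketch rather than the exact files used above: it quotes its variables and spells out -f - for tar (so the header comes out a few bytes longer), and it assumes the temp directory allows executing files.

```shell
#!/bin/sh
# Sketch of the first pass, end to end, inside a scratch directory.
set -eu
work=$(mktemp -d)
cd "$work"

# The application to package.
cat >main <<'EOF'
#!/bin/sh
echo "hi I'm a tarball"
echo "argv: $*"
EOF
chmod +x main

# The self-extracting prefix; the heredoc ends it with exactly one newline.
cat >header <<'EOF'
#!/bin/sh
tempdir=$(mktemp -d)
sed '1,/^exit/d' <"$0" | tar -xf - -C "$tempdir"
"$tempdir/main" "$@"
rm -rf "$tempdir"
exit
EOF

# Prefix first, archive bytes straight after it.
cp header executable.tar
tar -cf - main >>executable.tar
chmod +x executable.tar

output=$(./executable.tar hello world)
printf '%s\n' "$output"
```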

but that's not a tarball

The disadvantage is that tar no longer understands this archive on its own, so it must be executed to extract it:

$ tar -xf executable.tar
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

So, can we do better? The tar file format has no archive-wide header, only per-file headers; i.e. the first thing in the archive is a file header. If we can construct a tar file header that is also a valid shell script, both methods should work.

A file header is 500 bytes, padded with zeros to a 512-byte block. The first 100 bytes are a file name and the last 155 are a prefix (prepended to the name to make 255 total characters). Additionally, there's a 100-character linkname at offset 157, used for symlinks.

struct posix_header
{                              /* byte offset */
  char name[100];               /*   0 */
  char mode[8];                 /* 100 */
  char uid[8];                  /* 108 */
  char gid[8];                  /* 116 */
  char size[12];                /* 124 */
  char mtime[12];               /* 136 */
  char chksum[8];               /* 148 */ /* sum of all header bytes, with this field set to ASCII spaces, then 6-char octal + null + space */
  char typeflag;                /* 156 */
  char linkname[100];           /* 157 */
  char magic[6];                /* 257 */ /* for GNU tar, version+magic = "ustar  \0" */
  char version[2];              /* 263 */
  char uname[32];               /* 265 */
  char gname[32];               /* 297 */
  char devmajor[8];             /* 329 */
  char devminor[8];             /* 337 */
  char prefix[155];             /* 345 */
                                /* 500 */
};

(All these fields are ASCII, even the numbers, which are written in octal; all arrays but version are null-terminated. It's a weird format.)
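To poke at these offsets on a real archive, a small dd-based sketch is enough. The field helper and the greeting file are names invented here, and it assumes a tar that writes ustar-style headers with the file's own header first, as GNU tar does by default.

```shell
#!/bin/sh
# Sketch: read a few header fields of the first entry by byte offset.
set -eu
work=$(mktemp -d)
cd "$work"

printf 'hello\n' >greeting          # 6 bytes of content
tar -cf sample.tar greeting

# field OFFSET LENGTH: print one field with NUL and space padding stripped.
field() {
    dd if=sample.tar bs=1 skip="$1" count="$2" 2>/dev/null | tr -d '\000 '
}

name=$(field 0 100)
size_octal=$(field 124 12)
typeflag=$(field 156 1)
magic=$(field 257 5)
size=$(printf '%d' "0$size_octal")  # leading 0 makes printf read it as octal

printf 'name=%s size=%s typeflag=%s magic=%s\n' "$name" "$size" "$typeflag" "$magic"
```

The name really is the very first thing in the file, which is what makes the next trick possible.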

Our existing header script from above is 110 bytes. We can reduce it to under 100 by using a shorter variable name for $tempdir. We also now know exactly how many bytes to skip (512, because tar headers are 512-byte aligned), so we don't need to do the sed thing:

#!/bin/sh
t=$(mktemp -d)
tail -c+513 $0|tar -x -C $t
$t/main $*
rm -rf $t
exit

This is 79 bytes, which can fit in the name field of a placeholder entry:

00000000  23 21 2f 62 69 6e 2f 73  68 0a 74 3d 24 28 6d 6b  |#!/bin/sh.t=$(mk|
00000010  74 65 6d 70 20 2d 64 29  0a 74 61 69 6c 20 2d 63  |temp -d).tail -c|
00000020  2b 35 31 33 20 24 30 7c  74 61 72 20 2d 78 20 2d  |+513 $0|tar -x -|
00000030  43 20 24 74 0a 24 74 2f  6d 61 69 6e 20 24 2a 0a  |C $t.$t/main $*.|
00000040  72 6d 20 2d 72 66 20 24  74 0a 65 78 69 74 0a 00  |rm -rf $t.exit..|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 00 00 30 30 30 30  36 34 34 00 30 31 37 37  |....0000644.0177|
00000070  37 37 36 00 30 31 37 37  37 37 36 00 30 30 30 30  |776.0177776.0000|
00000080  30 30 30 30 30 30 30 00  30 30 30 30 30 30 30 30  |0000000.00000000|
00000090  30 30 30 00 30 32 31 30  36 32 00 20 56 00 00 00  |000.021062. V...|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000100  00 75 73 74 61 72 20 20  00 00 00 00 00 00 00 00  |.ustar  ........|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000120  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000130  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000150  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000160  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000170  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000180  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000190  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The file type (offset 0x9C) is set to 'V', a GNU extension (volume header, normally set with -V) that has the side effect of being skipped during extraction without causing an error (source). (GNU tar prints it in tar -tvf output; BSD tar silently ignores it.)
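For the curious, one way to produce that 512-byte block without a hex editor is sketched below. It's not necessarily how the header above was made: the put helper and the file names are invented, and the field values are simply copied from the hex dump. The only computed part is the checksum, which follows the rule in the struct comment (sum every byte with the chksum field filled with spaces, then store six octal digits, a NUL, and a space).

```shell
#!/bin/sh
# Sketch: assemble the 512-byte placeholder header with dd and printf.
set -eu
work=$(mktemp -d)
cd "$work"

# The 79-byte script that goes in the name field.
cat >script <<'EOF'
#!/bin/sh
t=$(mktemp -d)
tail -c+513 $0|tar -x -C $t
$t/main $*
rm -rf $t
exit
EOF

# Start from one all-zero block, then overwrite fields in place.
dd if=/dev/zero of=hdr bs=512 count=1 2>/dev/null

# put OFFSET STRING: write STRING at byte OFFSET without truncating hdr.
put() {
    printf '%s' "$2" | dd of=hdr bs=1 seek="$1" conv=notrunc 2>/dev/null
}

dd if=script of=hdr conv=notrunc 2>/dev/null  # name (offset 0)
put 100 0000644                               # mode
put 108 0177776                               # uid 65534
put 116 0177776                               # gid 65534
put 124 00000000000                           # size 0
put 136 00000000000                           # mtime 0
put 148 '        '                            # chksum: eight spaces for now
put 156 V                                     # typeflag: GNU volume header
put 257 'ustar  '                             # GNU magic + version

# Sum all 512 bytes as unsigned values, then format as six octal digits.
sum=$(od -An -v -tu1 hdr | awk '{ for (i = 1; i <= NF; i++) s += $i } END { print s }')
chksum=$(printf '%06o' "$sum")
put 148 "$chksum"
# Byte 154 must be NUL; byte 155 keeps its space.
dd if=/dev/zero of=hdr bs=1 seek=154 count=1 conv=notrunc 2>/dev/null

# Everything after the block is payload, which is what tail -c+513 selects.
printf 'PAYLOAD' | cat hdr - >combined
rest=$(tail -c +513 combined)

echo "checksum: $chksum"
```

With the fields as in the dump, the checksum should come out as 021062, the same value that sits at offset 0x94. The last step also shows why the script says +513 and not +512: tail -c +N starts output at byte N, counting from 1.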

And then we can prepend this to any tar file to make it executable:

$ tar -cf normal.tar main
$ cat executable-tar-header normal.tar >executable.tar
$ chmod +x executable.tar

$ file executable.tar
executable.tar: POSIX tar archive (GNU)

$ tar -tvf executable.tar
Vrw-r--r-- 65534/65534       0 1969-12-31 19:00 #!/bin/sh\nt=$(mktemp -d)\ntail -c+513 $0|tar -x -C $t\n$t/main $*\nrm -rf $t\nexit\n--Volume Header--
-rwxr-xr-x emily/emily      50 2022-04-04 15:33 main

$ ./executable.tar hello world
hi I'm a tarball
argv: hello world

I take no responsibility for whatever horrifying thing you do with this information.

But here's an example file, containing:

Vrw-r--r-- 65534/65534       0 1969-12-31 19:00 #!/bin/sh\nt=$(mktemp -d)\ntail -c+513 $0|tar -x -C $t\n$t/main $*\nrm -rf $t\nexit\n--Volume Header--
-rw-r--r-- emily/emily      79 2022-04-04 17:38 header
-rw-r--r-- emily/emily     512 2022-04-04 18:07 executable-tar-header
-rwxr-xr-x emily/emily      50 2022-04-04 15:33 main
-rw-r--r-- emily/emily   10240 2022-04-04 18:16 normal.tar
-rw-r--r-- emily/emily   10752 2022-04-04 18:23 executable.tar