Bryan

Embedding Binary Files in Shell Scripts

Preface

When building Linux/Unix installation packages, besides producing standard packages for the various distributions, we may also want to provide a shell script that installs the program, reducing installation to two steps: download the script, then run it.

Most installation scripts download the required resources from the internet at run time. This keeps the script small and ensures that the latest version is always installed, but it also means the downloaded "installation package" is really just an "installer" and cannot be used offline.

This article introduces a production-validated approach to dynamically embedding binary files in the installation script.

Due to certain constraints, this article focuses on the principles and cannot provide a complete code solution at this time. Thank you for your understanding.

Note also that the code below was written from the principles discussed in this article. Although the principles have been validated in production, this code itself has not gone through rigorous production testing, so please report any bugs you find.

Script Composition

The generated script consists of two parts: head + embed-bin. embed-bin is appended without modification, while head is a dynamically generated shell script that extracts embed-bin from the script file itself and runs it.
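The layout itself is plain concatenation. A minimal sketch, assuming the header text is already rendered and the payload fits in memory (`assemble` is a hypothetical helper): the payload's start offset is simply the header's byte length, and the header must end with `exit 0` so the shell never tries to execute the binary bytes that follow.

```go
package main

import (
	"bytes"
	"fmt"
)

// assemble concatenates a rendered shell header and an unmodified payload.
// It returns the combined script and the byte offset where the payload begins.
func assemble(header string, payload []byte) ([]byte, int) {
	var buf bytes.Buffer
	buf.WriteString(header)
	offset := buf.Len() // the payload starts right after the last header byte
	buf.Write(payload)  // embed-bin is appended verbatim
	return buf.Bytes(), offset
}

func main() {
	// The header ends with "exit 0" so the shell stops before the binary data.
	header := "#!/bin/sh\necho 'extracting...'\nexit 0\n## --- DATA --- ##\n"
	script, offset := assemble(header, []byte{0x1f, 0x8b, 0x08})
	fmt.Printf("payload offset: %d, total size: %d\n", offset, len(script))
}
```

In the real builder the header is first written with placeholder offsets and rewritten once the payload size is known; this sketch skips that step.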

Although the head script is dynamically generated, it is shown here as a Go template, which keeps it easy to maintain.

#!/bin/sh
#
# NAME:     {{ .Name }}
# PLATFORM: {{ .Platform }}
# DIGEST:   {{ .MD5 }}, @LINES@

THIS_DIR=$(DIRNAME=$(dirname "$0"); cd "$DIRNAME"; pwd)
THIS_FILE=$(basename "$0")
THIS_PATH="$THIS_DIR/$THIS_FILE"
{{ if .ArchiveOptions -}}
EXTRACT={{ if .AutoExtract }}1{{ else }}0{{ end }}
FORCE_EXTRACT=0
PREFIX="{{ .DefaultPrefix }}"
EXECUTE={{ if .AutoExecute }}1{{ else }}0{{ end }}
{{- end }}

USAGE="{{ .Opts.Usage }}"

while getopts ":h{{ .Opts.FlagNames }}" flag; do
    case "$flag" in
    h)
        printf "%s" "$USAGE"
        exit 2
        ;;
    {{- range .Opts.All }}
    {{ .Name }})
        {{ range .Action.DoIfSet }}{{ . }}
        {{ end }};;{{ end }}
    :)
        printf "ERROR: option '-%s' requires an argument, please try -h\\n" "$OPTARG" >&2
        exit 1
        ;;
    *)
        printf "ERROR: did not recognize option '-%s', please try -h\\n" "$OPTARG" >&2
        exit 1
        ;;
    esac
done

# Verify MD5
printf "%s\\n" "Verifying file..."
MD5=$(tail -n +@LINES@ "$THIS_PATH" | md5sum)
if ! echo "$MD5" | grep {{ .MD5 }} >/dev/null; then
    printf "ERROR: md5sum mismatch of tar archive\\n" >&2
    printf "expected: {{ .MD5 }}\\n" >&2
    printf "     got: %s\\n" "$MD5" >&2
    exit 3
fi

{{ if .ArchiveOptions -}}
if [ -z "$PREFIX" ]; then
    PREFIX=$(mktemp -d -p "$(pwd)")
fi

if [ "$EXTRACT" = "1" ]; then
    if [ "$FORCE_EXTRACT" = "1" ] || [ ! -f "$PREFIX/.extract-done" ] || [ "$(cat "$PREFIX/.extract-done")" != "{{ .MD5}}" ]; then
        printf "Extracting archive to %s ...\\n" "$PREFIX"
        mkdir -p "$PREFIX"

        {
            dd if="$THIS_PATH" bs=1            skip=@ARCHIVE_FIRST_OFFSET@ count=@ARCHIVE_FIRST_BYTES@  2>/dev/null
            dd if="$THIS_PATH" bs=@BLOCK_SIZE@ skip=@ARCHIVE_BLOCK_OFFSET@ count=@ARCHIVE_BLOCKS_COUNT@ 2>/dev/null
            dd if="$THIS_PATH" bs=1            skip=@ARCHIVE_LAST_OFFSET@  count=@ARCHIVE_LAST_BYTES@   2>/dev/null
        } | tar zxf - -C "$PREFIX" || {
            printf "ERROR: failed to extract archive\\n" >&2
            exit 4
        }

        printf "%s" "{{ .MD5 }}" > "$PREFIX/.extract-done"
    else
        printf "Archive has already been extracted to %s\\n" "$PREFIX"
    fi
fi

if [ "$EXECUTE" = "1" ]; then
    echo "Run Command:" {{ .Command }}
    cd "$PREFIX" && {{ .Command }}
fi
{{- end }}

exit 0
## --- DATA --- ##

This script template contains two kinds of placeholders, {{ XX }} and @XX@, which are rendered in two passes: first all {{ XX }} variables are rendered by the template engine, then the remaining @XX@ placeholders are filled in. The first pass has no special requirements. The second pass must keep the header's line count unchanged, and the byte-offset placeholders must also keep its byte length unchanged, because the line number and offsets being filled in are measured against the header itself.
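To see why the second pass must be length-preserving: the byte offsets written into the header are counted from the start of the file, so if filling them in changed the header's size, every offset would shift. A minimal sketch of a fixed-width fill, assuming numeric values padded with spaces (the helper name `fillPlaceholder` and the `@OFFSET@` placeholder are illustrative):

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// fillPlaceholder overwrites every occurrence of placeholder in data with the
// decimal form of value, right-padded with spaces to the placeholder's width,
// so the byte length (and line count) of data never changes.
func fillPlaceholder(data []byte, placeholder string, value int64) error {
	s := strconv.FormatInt(value, 10)
	if len(s) > len(placeholder) {
		return fmt.Errorf("value %s does not fit in placeholder %s", s, placeholder)
	}
	padded := append([]byte(s), bytes.Repeat([]byte(" "), len(placeholder)-len(s))...)

	old := []byte(placeholder)
	for start := 0; ; {
		i := bytes.Index(data[start:], old)
		if i < 0 {
			return nil
		}
		start += i
		start += copy(data[start:], padded) // in-place overwrite, same width
	}
}

func main() {
	header := []byte("dd if=\"$0\" bs=1 skip=@OFFSET@ count=16\n")
	before := len(header)
	if err := fillPlaceholder(header, "@OFFSET@", 4096); err != nil {
		panic(err)
	}
	fmt.Printf("%q: %d -> %d bytes\n", header, before, len(header))
}
```

Trailing spaces are harmless in the shell, since they only separate words on the dd command line.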

The script treats embed-bin as a gzip-compressed tar archive (it is extracted with tar zxf), mainly because the data we embed internally can be large (hundreds of megabytes, sometimes more than 1 GB). If you only need a small script, you can remove the compression-related code.

The script also performs an MD5 check before doing anything else, mainly to catch incomplete downloads. However, since embed-bin is already a gzip archive, whose trailer carries a CRC-32 checksum, a truncated or corrupted archive is already detected during extraction, so the check can be removed to speed up installation. We keep it internally partly because what we embed is not always a single compressed archive and may involve multiple files, and partly because it gives clearer error messages.

The script also accepts options and lets some defaults be configured, because in certain cases an individual step may fail, and rerunning every step can be time-consuming. In practice, adjust the options to your needs.

The options are generated through the template engine, mainly for maintainability. If you prefer to write them directly in the script, modify the relevant parts accordingly.

Rendering the Script

Without further ado, here is the code.

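package binbundler

import (
	"bytes"
	"crypto/md5"
	_ "embed" // enables the //go:embed directive below
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"strconv"
	"strings"
	"text/template"
)

// blockSize is the dd block size used for the bulk of the payload.
// Assumed value: the article does not state the one used in production.
const blockSize int64 = 4096

//go:embed "header.sh.tmpl"
var headerTemplate string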

type headerOptions struct {
	Name     string
	Platform string // rendered into the "# PLATFORM:" header comment
	MD5      string

	Opts *Opts

	*ArchiveOptions
}

type ArchiveOptions struct {
	DefaultPrefix string
	AutoExtract   bool
	AutoExecute   bool
	Command       string // Use $PREFIX to reference prefix

	Filename string // For builder use, will not be included in the final file
}

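// QuotedCommand returns Command wrapped in POSIX single quotes so the shell
// does not expand it (a stand-in for the author's internal shells.Quote helper).
func (o *ArchiveOptions) QuotedCommand() string {
	return "'" + strings.ReplaceAll(o.Command, "'", `'\''`) + "'"
}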

func renderHeaders(o *headerOptions) ([]byte, error) {
	t := template.New("")

	tt, err := t.Parse(headerTemplate)
	if err != nil {
		return nil, fmt.Errorf("invalid template: %w", err)
	}

	b := bytes.Buffer{}

	err = tt.Execute(&b, o)
	if err != nil {
		return nil, err
	}

	return b.Bytes(), nil
}

func getHeaders(o *headerOptions) ([]byte, error) {
	tmpl, err := renderHeaders(o)
	if err != nil {
		return nil, err
	}

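	// Ensure the header ends with a newline so the payload starts on a fresh
	// line; @LINES@ is that line's 1-based number (used by tail -n +N).
	if !bytes.HasSuffix(tmpl, []byte("\n")) {
		tmpl = append(tmpl, '\n')
	}
	lines := bytes.Count(tmpl, []byte("\n")) + 1

	// @LINES@ may change the byte length here; that is fine because the
	// header's length is only measured after this step.
	tmpl = bytes.ReplaceAll(tmpl, []byte("@LINES@"), []byte(strconv.Itoa(lines)))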

	replaceAndFillSpace(tmpl, "@BLOCK_SIZE@", blockSize)

	return tmpl, nil
}

func replaceAndFillSpace(data []byte, old string, new int64) {
	oldBytes := []byte(old)
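	newString := strconv.FormatInt(new, 10)
	if len(newString) > len(old) {
		// The header length must not change, so the value has to fit in the placeholder.
		panic(fmt.Sprintf("value %s does not fit in placeholder %s", newString, old))
	}

	// Pad with spaces so the replacement is exactly len(old) bytes long.
	newWithExtraSpace := append([]byte(newString), bytes.Repeat([]byte{' '}, len(old)-len(newString))...)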

	// Apply replacements to buffer.
	start := 0
	for {
		i := bytes.Index(data[start:], oldBytes)
		if i == -1 {
			return // stop
		}

		start += i
		start += copy(data[start:], newWithExtraSpace)
	}
}

type Opts struct {
	All []*Opt
}

func (opts *Opts) FlagNames() string {
	b := strings.Builder{}
	for _, opt := range opts.All {
		b.WriteString(opt.Name)
		if len(opt.Arg) != 0 {
			b.WriteString(":")
		}
	}

	return b.String()
}

func (opts *Opts) Usage() string {
	b := strings.Builder{}

	b.WriteString("Usage: $0 [options]\n\n")

	all := make([][2]string, 0, 1+len(opts.All))

	nameLen := 2

	all = append(all, [2]string{"-h", "Print this help message and exit"})

	for _, opt := range opts.All {
		bb := strings.Builder{}
		bb.WriteString("-")
		bb.WriteString(opt.Name)

		if opt.Arg != "" {
			bb.WriteString(" [")
			bb.WriteString(opt.Arg)
			bb.WriteString("]")
		}

		name := bb.String()

		if len(name) > nameLen {
			nameLen = len(name)
		}

		all = append(all, [2]string{name, opt.Help})
	}

	for _, a := range all {
		b.WriteString(a[0])
		b.WriteString(strings.Repeat(" ", nameLen-len(a[0])))
		b.WriteString("\t")
		b.WriteString(a[1])
		b.WriteString("\n")
	}

	return b.String()
}

type Opt struct {
	Name   string
	Arg    string
	Help   string
	Action OptAction
}

type OptAction interface {
	DoIfSet() []string
}

type DoAndExitAction struct {
	Do       []string
	ExitCode int
}

func (a *DoAndExitAction) DoIfSet() []string {
	r := append([]string{}, a.Do...)
	r = append(r, "exit "+strconv.Itoa(a.ExitCode))
	return r
}

type DoAndContinueAction struct {
	Do []string
}

func (a *DoAndContinueAction) DoIfSet() []string {
	return a.Do
}

func SimpleSetEnvAction(envName string, envValue interface{}) *DoAndContinueAction {
	return &DoAndContinueAction{
		Do: []string{fmt.Sprintf("%s=%v", envName, envValue)},
	}
}

type Builder struct {
	Name string

	ArchiveOptions *ArchiveOptions
}

func openAndWrite(filename string, w io.Writer) (int64, error) {
	f, err := os.Open(filename)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	return io.Copy(w, f)
}

func fillAndSetHeader(prefix, filename string, f io.Writer, headers []byte, offset int64) (int64, error) {

	fileLength, err := openAndWrite(filename, f)
	if err != nil {
		return 0, fmt.Errorf("cannot append data for %s: %w", prefix, err)
	}

	// Segment 1 (dd bs=1): from the payload start up to the next block
	// boundary, but never more than the payload itself.
	firstOffset := offset
	firstBytes := blockSize - (firstOffset % blockSize)
	if firstBytes > fileLength {
		firstBytes = fileLength
	}
	replaceAndFillSpace(headers, fmt.Sprintf("@%s_FIRST_OFFSET@", prefix), firstOffset)
	replaceAndFillSpace(headers, fmt.Sprintf("@%s_FIRST_BYTES@", prefix), firstBytes)

	// Segment 2 (dd bs=blockSize): whole blocks starting at that boundary.
	copy2Start := firstOffset + firstBytes
	copy2Skip := copy2Start / blockSize
	copy2Blocks := (fileLength - firstBytes) / blockSize
	replaceAndFillSpace(headers, fmt.Sprintf("@%s_BLOCK_OFFSET@", prefix), copy2Skip)
	replaceAndFillSpace(headers, fmt.Sprintf("@%s_BLOCKS_COUNT@", prefix), copy2Blocks)

	// Segment 3 (dd bs=1): the remaining tail, shorter than one block.
	copy3Start := copy2Start + copy2Blocks*blockSize
	copy3Size := fileLength - firstBytes - copy2Blocks*blockSize
	replaceAndFillSpace(headers, fmt.Sprintf("@%s_LAST_OFFSET@", prefix), copy3Start)
	replaceAndFillSpace(headers, fmt.Sprintf("@%s_LAST_BYTES@", prefix), copy3Size)

	return fileLength, nil
}

func (b *Builder) Build(saveTo string) error {
	header := &headerOptions{
		Name:           b.Name,
		ArchiveOptions: b.ArchiveOptions,
		Opts:           &Opts{},
	}

	fileMD5 := md5.New()

	var dataSize int64

	if header.ArchiveOptions != nil {
		if header.ArchiveOptions.AutoExtract {
			header.Opts.All = append(header.Opts.All, &Opt{
				Name:   "E",
				Help:   "Do not extract archive",
				Action: SimpleSetEnvAction("EXTRACT", 0),
			})
		} else {
			header.Opts.All = append(header.Opts.All, &Opt{
				Name:   "e",
				Help:   "Also extract archive",
				Action: SimpleSetEnvAction("EXTRACT", 1),
			})
		}

		header.Opts.All = append(header.Opts.All, &Opt{
			Name:   "f",
			Help:   "Force extract archive",
			Action: SimpleSetEnvAction("FORCE_EXTRACT", 1),
		})

		prefixOpt := &Opt{
			Name: "d",
			Arg:  "DIR",
			Help: "Extract to directory",
			Action: &DoAndContinueAction{
				Do: []string{`PREFIX="${OPTARG}"`},
			},
		}
		if header.ArchiveOptions.DefaultPrefix != "" {
			prefixOpt.Help += fmt.Sprintf(" (default: %s)", header.ArchiveOptions.DefaultPrefix)
		}

		header.Opts.All = append(header.Opts.All, prefixOpt)

		if header.ArchiveOptions.Command != "" {
			if header.ArchiveOptions.AutoExecute {
				header.Opts.All = append(header.Opts.All, &Opt{
					Name:   "X",
					Help:   "Do not execute command",
					Action: SimpleSetEnvAction("EXECUTE", 0),
				})
			} else {
				header.Opts.All = append(header.Opts.All, &Opt{
					Name:   "x",
					Help:   "Also execute the command",
					Action: SimpleSetEnvAction("EXECUTE", 1),
				})
			}
		}

		n, err := openAndWrite(header.ArchiveOptions.Filename, fileMD5)
		if err != nil {
			return fmt.Errorf("failed to read archive file to get md5: %w", err)
		}
		dataSize += n
	}

	_ = dataSize

	header.MD5 = hex.EncodeToString(fileMD5.Sum(nil))

	headers, err := getHeaders(header)
	if err != nil {
		return fmt.Errorf("failed to get headers: %w", err)
	}

	f, err := os.OpenFile(saveTo, os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0755)
	if err != nil {
		return fmt.Errorf("failed to write file: %w", err)
	}
	defer f.Close()

	// write header
	headersLen, err := f.Write(headers)
	if err != nil {
		return fmt.Errorf("failed to write headers: %w", err)
	}

	currentOffset := int64(headersLen)

	//  embed archive
	if header.ArchiveOptions != nil {
		n, err := fillAndSetHeader("ARCHIVE", header.ArchiveOptions.Filename, f, headers, currentOffset)
		if err != nil {
			return fmt.Errorf("failed to embed installer: %w", err)
		}
		currentOffset += n
	}

	_ = currentOffset

	// rewrite headers
	_, err = f.Seek(0, io.SeekStart)
	if err != nil {
		return fmt.Errorf("failed to seek file: %w", err)
	}
	newHeadersLen, err := f.Write(headers)
	if err != nil {
		return fmt.Errorf("failed to rewrite headers: %w", err)
	}
	if headersLen != newHeadersLen {
		return fmt.Errorf("headers unexpectedly changed length after rewrite")
	}

	return nil
}

Usage is as follows:

b := &binbundler.Builder{
    Name: name,
    ArchiveOptions: &binbundler.ArchiveOptions{
        DefaultPrefix: "/path/to/extract",
        AutoExtract:   true,
        AutoExecute:   true,
        Command:       "bash $PREFIX/install.sh", // A simple install can run directly; complex steps can go in a separate script
        Filename:      "/path/to/embed",          // must be a .tar.gz, since the header extracts it with tar zxf
    },
}
err := b.Build("/path/to/script-save-to.sh")

Build renders the template variables into the header, writes it, appends the archive, computes the dd offsets, and then rewrites the header in place with those offsets filled in.
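The offset arithmetic can be checked in isolation. A minimal sketch, assuming the header length and payload length are already known (`segment` and `splitSegments` are hypothetical names): it follows the same three-part split as the dd commands in the header (bs=1 up to the first block boundary, whole blocks with the large block size, then bs=1 for the remaining tail) and clamps the first segment so that payloads smaller than one block are also handled.

```go
package main

import "fmt"

// segment describes one dd invocation: skip Skip blocks of size BS,
// then copy Count blocks of size BS.
type segment struct {
	BS, Skip, Count int64
}

// splitSegments computes the three dd reads that extract a payload of
// length n starting at byte offset off, using block size bs for the bulk copy.
func splitSegments(off, n, bs int64) [3]segment {
	head := bs - off%bs // bytes up to the next block boundary
	if head > n {
		head = n // payload ends before the boundary
	}
	mid := (n - head) / bs // whole blocks in the middle
	tailStart := off + head + mid*bs
	tail := n - head - mid*bs // fewer than bs bytes remain
	return [3]segment{
		{BS: 1, Skip: off, Count: head},
		{BS: bs, Skip: (off + head) / bs, Count: mid},
		{BS: 1, Skip: tailStart, Count: tail},
	}
}

func main() {
	// Example: a 2,000-byte header followed by a 10,000-byte payload, bs=4096.
	for _, s := range splitSegments(2000, 10000, 4096) {
		fmt.Printf("dd bs=%d skip=%d count=%d\n", s.BS, s.Skip, s.Count)
	}
}
```

For that example the three reads are 2,096 single bytes starting at byte 2,000, one 4,096-byte block at block 1 (byte 4,096), and the final 3,808 bytes starting at byte 8,192, which together cover exactly the 10,000 payload bytes.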

Postscript

This article mainly presents an idea: extracting with dd at precomputed offsets, and generating the option handling dynamically to control the execution flow. Compared with the common approach found online of locating the binary content with grep or similar tools, it avoids scanning the file at run time and is easier to maintain.

Building on this, much more is possible (dependency checks, installing multiple files, and so on); feel free to experiment.
