MediaGoblin in Docker

Since version 0.14.0, MediaGoblin natively supports Docker. We push release versions of MediaGoblin as Docker images to the project’s Docker Hub account, making it easy for anyone to spin up a new instance with no prerequisite other than the Docker runtime itself.

You can start a single standalone container using the official images published to Docker Hub. For real deployments, however, we recommend a multi-container stack using, e.g., Docker Compose.

This page documents how to do either of those things.

Quickstart

A standalone container in charge of both serving and processing content can simply be started with

docker run --interactive --tty \
  --publish=6543:6543 --volume=/PATH/TO/YOUR/DATA:/srv \
  mediagoblin/mediagoblin:0.14.0.dev

This will download the official image from Docker Hub, and create a container running Mediagoblin. It will be accessible at http://localhost:6543.

The --publish option (or -p for short) makes the container’s port 6543 available to the host.

The --volume option (-v for short) mounts a path from the local filesystem (/PATH/TO/YOUR/DATA, in this example) into the container. This is where MediaGoblin will store all its data. The directory can be empty initially, or have been previously initialised.

The --interactive --tty (or -it) options are not strictly needed, but they should allow you to terminate the running process by sending it a Ctrl+C, rather than having to use docker kill.

Note

See further down in this section for more details on data persistence.

On first run of the container, the administrator’s password will be autogenerated, and shown (once, and only once) in the log output.

===============================================================================
NEW ADMINISTRATOR ACCOUNT CREATED

ADMIN_USER=admin
ADMIN_PASSWORD=<AUTOGENERATED PASSWORD>
ADMIN_EMAIL=admin@example.com

===============================================================================
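
If the password scrolls past before you note it down, the same lines can usually be recovered from the container logs, for example:

docker logs <CONTAINER NAME OR ID> 2>&1 | grep -A 4 'NEW ADMINISTRATOR ACCOUNT'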

Note

See further down in this section to learn how to choose or change the admin’s username, password or email.

First run

If all goes well, you should see the following output on first run.

usermod: no changes
Creating missing configuration file paste.ini ...
Creating missing configuration file mediagoblin.ini ...
Creating empty database mediagoblin.db ...
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 52bf0ccbedc1, initial revision
INFO  [alembic.runtime.migration] Running upgrade 52bf0ccbedc1 -> a98c1a320e88, Image media type initial migration
INFO  [alembic.runtime.migration] Running upgrade 52bf0ccbedc1 -> 101510e3a713, #5382 Removes graveyard items from collections
INFO  [alembic.runtime.migration] Running upgrade 101510e3a713 -> 8429e33fdf7, Remove the Graveyard objects from CommentNotification objects
INFO  [alembic.runtime.migration] Running upgrade 8429e33fdf7 -> 4066b9f8b84a, use_comment_link_ids_notifications
INFO  [alembic.runtime.migration] Running upgrade 4066b9f8b84a -> 3145accb8fe3, remove tombstone comment wrappers
INFO  [alembic.runtime.migration] Running upgrade 3145accb8fe3 -> 228916769bd2, ensure Report.object_id is nullable
INFO  [alembic.runtime.migration] Running upgrade 228916769bd2 -> cc3651803714, add main transcoding progress column to MediaEntry
INFO  [alembic.runtime.migration] Running upgrade 228916769bd2 -> afd3d1da5e29, Subtitle plugin initial migration
Laying foundations for __main__:
   + Laying foundations for Privilege table
Cannot link theme... no theme set
Linked asset directory for plugin "coreplugin_basic_auth":
  /opt/mediagoblin/lib/python3.11/site-packages/mediagoblin/plugins/basic_auth/static
to:
  /srv/user_dev/plugin_static/coreplugin_basic_auth
Creating admin user ...
User created (and email marked as verified).
The user admin is now an admin.

===============================================================================
NEW ADMINISTRATOR ACCOUNT CREATED

ADMIN_USER=admin
ADMIN_PASSWORD=<AUTOGENERATED PASSWORD>
ADMIN_EMAIL=admin@example.com

===============================================================================
Running /opt/mediagoblin/lazyserver.sh -c ./paste.ini --server-name=broadcast ...
Using paster config: ./paste.ini
Using paster from $PATH
+ export CELERY_ALWAYS_EAGER=true
+ paster serve ./paste.ini --server-name=broadcast --reload
Starting subprocess with file monitor
2024-07-14 08:09:30,760 INFO    [mediagoblin.app] GNU MediaGoblin 0.14.0.dev main server starting
2024-07-14 08:09:31,054 INFO    [mediagoblin.app] Setting up plugins.
2024-07-14 08:09:31,054 INFO    [mediagoblin.init.plugins] Importing plugin module: mediagoblin.plugins.geolocation
2024-07-14 08:09:31,054 INFO    [mediagoblin.init.plugins] Importing plugin module: mediagoblin.plugins.processing_info
2024-07-14 08:09:31,054 INFO    [mediagoblin.init.plugins] Importing plugin module: mediagoblin.plugins.basic_auth
2024-07-14 08:09:31,054 INFO    [mediagoblin.init.plugins] Importing plugin module: mediagoblin.media_types.image
2024-07-14 08:09:31,114 INFO    [mediagoblin.init.celery] Setting celery configuration from object "mediagoblin.init.celery.dummy_settings_module"
Starting server in PID 58.
2024-07-14 08:09:31,122 INFO    [waitress] Serving on http://0.0.0.0:6543

It will be terser on subsequent runs, because configuration and databases already exist, and data migrations aren’t necessary (unless upgrading to a new version of the container).

You can confirm that the container is running happily with the docker ps command, which will show the running containers, ports and health status (if configured).

CONTAINER ID   IMAGE                                          COMMAND                  CREATED          STATUS                    PORTS                                       NAMES
541710f616d5   mediagoblin/mediagoblin:0.14.0.dev                         "/opt/mediagoblin/en…"   37 seconds ago   Up 36 seconds (healthy)   0.0.0.0:6543->6543/tcp, :::6543->6543/tcp   vibrant_germain

At this point, you should be able to point your browser to http://localhost:6543 and be greeted by the MediaGoblin landing page.
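
You can also check from the command line that the instance responds, for example with curl (the same check the container’s health check performs):

curl -sf http://localhost:6543 >/dev/null && echo "MediaGoblin is up"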

Data persistence

Data in a Docker image is read-only. Changes made in a live container persist only for the lifetime of that container, and are lost once it is destroyed. This includes all changes made in /srv, where the MediaGoblin data resides. This is obviously not desirable for media storage.

Docker has support for various types of storage mechanisms for this purpose. We saw in the previous section how to start the container in such a way that a local path is bind-mounted onto /srv.

-v /PATH/TO/YOUR/DATA:/srv

This means that any data written by the containers will be written to /PATH/TO/YOUR/DATA in the host filesystem. As it is outside of Docker’s control, this data will persist even if the Mediagoblin container is destroyed.

A new container instance can then be restarted with the same bind-mount volume option, and will resume serving the data transparently. This is useful for backups, as well as an upgrade path between subsequent versions of MediaGoblin without losing data.
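
As a sketch of such an upgrade, stop the old container and start one from a newer image on the same data directory (the new image tag below is only a placeholder):

docker stop <CONTAINER NAME OR ID>
docker run -it -p 6543:6543 -v /PATH/TO/YOUR/DATA:/srv \
  mediagoblin/mediagoblin:<NEW VERSION>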

Starting with an empty data directory, the container will create the configuration and the database on first run. You can confirm it with ls /PATH/TO/YOUR/DATA outside of the container.

$ ls /PATH/TO/YOUR/DATA
mediagoblin  mediagoblin.db  mediagoblin.ini  paste.ini  user_dev

You can also make manual changes to the data if needed.
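
For example, a minimal sketch of tweaking a setting: edit the bind-mounted configuration from the host, then restart the container so it re-reads the file.

$EDITOR /PATH/TO/YOUR/DATA/mediagoblin.ini
docker restart <CONTAINER NAME OR ID>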

Warning

The argument to the --volume option must be an absolute path, otherwise it will be interpreted as the name of a Docker volume.

Using a Docker volume is another way to ensure data persistence across container recreation. Rather than writing data out into the specified host filesystem, Docker will manage the volume (volume-name, in the following example) internally.

-v volume-name:/srv

While this offers the same data-persistence benefits, the data should be managed with the docker volume command, and it may not be as straightforward to access and back up without more Docker knowledge. This approach is therefore only recommended for users already familiar with it.
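
As a sketch of what such management looks like, assuming a volume named volume-name, you can inspect it and archive its contents through a throwaway helper container (the debian image here is just an example):

docker volume inspect volume-name
docker run --rm -v volume-name:/srv -v "$PWD":/backup debian \
  tar czf /backup/mediagoblin-data.tar.gz -C /srv .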

Administrator account

A default administrator account is created by the entrypoint script. The login is admin, and the password is automatically generated if unspecified. The details of the admin account are output in the logs the very first time a new instance is initialised.

You can override both of these values on the first run by passing them via the environment.

docker run -p 6543:6543 -v /PATH/TO/YOUR/DATA:/srv \
   -e ADMIN_USER=myadmin -e ADMIN_PASSWORD=generateme \
   mediagoblin/mediagoblin:0.14.0.dev

Note

If the ADMIN_PASSWORD is set to generateme (the default), it will be auto-generated on first run, i.e., when no database exists in the data directory yet. The generated password will be output, once, in the startup logs.

Alternatively, you can change the current admin password at any time by using the gmg tool.

docker run --rm -v /PATH/TO/YOUR/DATA:/srv \
   mediagoblin/mediagoblin:0.14.0.dev \
   gmg changepw admin <GOOD STRONG PASSWORD>

You can, of course, use gmg in this way for any other task you would generally perform in non-containerised environments.
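
For instance, a sketch of creating an additional user with the gmg adduser subcommand; it should prompt for any missing details, but do check gmg --help inside the container for the exact subcommands and options:

docker run --rm -it -v /PATH/TO/YOUR/DATA:/srv \
   mediagoblin/mediagoblin:0.14.0.dev \
   gmg adduser --username alice --email alice@example.com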

Configuring plugins

By default, no plugins are enabled in the example configuration file. As for non-containerised deployments of MediaGoblin, plugins can be enabled by adding the relevant sections to the mediagoblin.ini configuration file.

However, plugins can be preconfigured when a new containerised environment is initialised, by passing a configuration snippet for the [plugins] section, with embedded newlines, via the PLUGINS environment variable.

docker run --interactive --tty \
  -p 6543:6543 -v /PATH/TO/YOUR/DATA:/srv \
  -e PLUGINS='[[mediagoblin.media_types.audio]]\n[[mediagoblin.media_types.video]]\navailable_resolutions = 144p,240p\n' \
  mediagoblin/mediagoblin:0.14.0.dev
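
After the first initialisation, you can check the rendered result in the bind-mounted configuration file, for example:

grep -A 3 '^\[plugins\]' /PATH/TO/YOUR/DATA/mediagoblin.ini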

This mechanism is only active on the first initialisation of an empty data directory. It can, however, be forced by setting the FORCE_RECONFIG environment variable to true.

... -e FORCE_RECONFIG=true ...

Warning

Force-reconfiguration has not been thoroughly tested, and may not behave flawlessly.

Docker Compose stack

Docker Compose lets you encode more details about how to run a container, such as volumes, ports and environment variables, in a configuration file instead of on the command line. It also allows spinning up more than one container at a time, and sets up the necessary network environment so they can communicate with each other.

Multiple configuration files can be used at the same time, to selectively configure various aspects of the desired stack. MediaGoblin takes this approach, providing a basic docker-compose.yml that contains the shared options.

Note

Historically, docker-compose was a command separate from docker itself, but its functionality has since been merged and extended. This guide therefore uses the docker compose subcommand.

Standalone service

Before delving into multi-container stacks, you can have a look at the standalone docker-compose.standalone.yml, which does little more than the docker commands in the previous section. There are, however, two noteworthy differences.

version: '3'

services:
  lazyserver:
    image: mediagoblin/mediagoblin:0.14.0.dev
    ports:
      - "6543:6543"
    healthcheck:
      test: [ "CMD", "curl", "-sf", "http://localhost:6543" ]
      timeout: 30s
      interval: 10s
      retries: 5
    volumes:
      - mediagoblin-data:/srv:rw
    env_file: docker-compose.env

volumes:
  mediagoblin-data:
    driver_opts:
      # Present the local ./data directory as a named volume
      type: none
      o: bind
      device: ${PWD}/data

First, in the service’s volumes section, a named Docker volume, mediagoblin-data, is mounted at /srv. As discussed before, the volume will be reused every time the stack is brought up. At the end of the file, in the top-level volumes section, additional parameters are provided so the mediagoblin-data volume is actually backed by a bind mount onto the data subdirectory of the directory the stack was started from.

Second, it uses an env_file, which is a convenient way to pass a number of environment variables to the container. Those can include ADMIN_PASSWORD or PLUGINS, as discussed previously.
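
As a minimal sketch, a docker-compose.env file could look like this (the variable names are those discussed earlier; the values are placeholders):

ADMIN_USER=admin
ADMIN_PASSWORD=generateme
ADMIN_EMAIL=admin@example.com
# PLUGINS and FORCE_RECONFIG can also be set here if needed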

These changes will be carried over through the next few sections.

Note

docker compose uses the file docker-compose.yml by default, which we’ll discuss later. To use the standalone variant, pass the -f option.

docker compose -f docker-compose.standalone.yml up

Note

By default, docker compose will keep hold of the terminal and output the application logs. To regain use of the terminal, you can add the -d (detach) flag at the end of this command. To follow the logs afterwards, use docker compose logs -f.

As before, this will make the Mediagoblin instance available at http://localhost:6543/. You can log in as the admin, and upload a file before moving on to the next section.

You can shut the container down with

docker compose -f docker-compose.standalone.yml down

Multi-container stack

The previous section was a light introduction to docker-compose.yml files, but didn’t achieve much. We can now move on to defining more than one service in the stack: separate Paste and Celery containers, with a side of RabbitMQ and Nginx.

The basic docker-compose.yml file does just that.

version: '3'

services:
  paste:
    image: mediagoblin/mediagoblin:0.14.0.dev
    # build:
    #   context: .
    #   dockerfile: Dockerfile-debian-12-sqlite
    #   args:
    #     build_doc: no
    #     run_tests: no
    depends_on:
      rabbitmq:
        condition: service_started
    volumes:
      - mediagoblin-data:/srv:rw
    ports:
      - "6543:6543"
    env_file: docker-compose.env
    environment:
      - 'CELERY_ALWAYS_EAGER=false'
      - 'BROKER_URL=amqp://rabbitmq:5672'

    # XXX need host = 0.0.0.0 for server:main, or a way to select
    # server-name=broadcast
    # command: /opt/mediagoblin/bin/gmg -cf /srv/mediagoblin.ini serve /srv/paste.ini
    command: /opt/mediagoblin/bin/paster serve /srv/paste.ini --server-name=broadcast
    healthcheck:
      test: [ "CMD", "curl", "-sf", "http://localhost:6543" ]

  celery:
    image: mediagoblin/mediagoblin:0.14.0.dev
    depends_on:
      rabbitmq:
        condition: service_healthy
      paste:
        condition: service_started
    volumes:
      - mediagoblin-data:/srv:rw
    env_file: docker-compose.env
    environment:
      - 'CELERY_CONFIG_MODULE=mediagoblin.init.celery.from_celery'
      - 'MEDIAGOBLIN_CONFIG=/srv/mediagoblin.ini'
      - 'SKIP_MIGRATE=true'
      - 'BROKER_URL=amqp://rabbitmq:5672'
    # command: /opt/mediagoblin/bin/gmg -cf /srv/mediagoblin.ini celery
    command: /opt/mediagoblin/bin/celery worker
    healthcheck:
      test: [ "CMD", "/opt/mediagoblin/bin/celery", "inspect", "ping" ]

  rabbitmq:
    image: rabbitmq
    expose:
      - "5672"
    healthcheck:
      test: [ "CMD", "rabbitmq-diagnostics", "-q", "ping" ]

volumes:
  mediagoblin-data:
    driver_opts:
      # Present the local ./data directory as a named volume
      type: none
      o: bind
      device: ${PWD}/data

It is fairly similar to the standalone setup, except it defines all three services. Both paste and celery are essentially the same, except for the command that is executed. Some additional environment variables are set in the environment section, most notably where to find RabbitMQ. The healthcheck of the Celery container is also adjusted to remain useful.

One last service is started, based on the official RabbitMQ image, to support communication between the two containers, and some start-up order rules are defined via the depends_on sections.

As this configuration is in the default docker-compose.yml file, starting the stack up is straightforward.

docker compose up

As before, this stack uses the mediagoblin-data named volume, which is mounted in both Paste and Celery containers. If you started a fresh lazyserver before, and uploaded some test data, you should still be able to access it now.
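
You can verify that all three services are up, and passing their health checks, with

docker compose ps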

Nginx reverse-proxy

When running a non-test instance, it is not recommended to expose the application straight to the public internet. Instead, it is good practice to put a reverse-proxy in between, to handle the fine details of the HTTP protocol. Nginx tends to be a good choice.

As discussed in the deployment documentation, the Nginx configuration needs to be adjusted to work best with MediaGoblin. For ease of use, we build and publish a pre-configured Nginx image to Docker Hub alongside the MediaGoblin one.

You can extend your Compose stack from docker-compose.yml by also including the Nginx service defined in docker-compose.nginx.yml.

docker compose -f docker-compose.yml -f docker-compose.nginx.yml up

For simplicity in your own deployment, you can include all services in a single file.
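
Assuming the Nginx service publishes the standard HTTP port 80 (check docker-compose.nginx.yml for the actual mapping), you can confirm the proxy is answering with, for example:

curl -sI http://localhost/ | head -n 1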

Note

As the nginx container is added via an override, the paste container continues to expose its own port to the rest of the system.

Third-party cloud providers

Containerisation offers a new way to run MediaGoblin on third-party hosts. However, as Docker support is still very recent, we haven’t yet explored the various cloud providers and deployment methods.

If you’ve had success with this type of deployment, please consider contributing your experience to the documentation!

Dockerised Build

It is possible (and perhaps even preferred) to build Mediagoblin within a container.

This will create a Docker image suitable to run on its own (using lazyserver), or as part of a Docker Compose stack with separate containers for Paste, Celery, and RabbitMQ, as well as the optional pre-configured Nginx.

Core container

Unlike a local build, the only dependency required by a Docker build is the docker tool itself. When present, the configure script will prefer this approach (unless --without-docker is explicitly passed).
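
To force a traditional local build even when Docker is present, that flag can be passed explicitly:

./configure --without-docker && make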

The steps to perform a build nonetheless follow the familiar incantation.

./configure && make

This will create a build stage with the necessary build dependencies, such as bower and -dev packages, create a final image containing the built package, and run the tests within a container started from that image.

The name of the image will be mediagoblin/mediagoblin:<VERSION>, where <VERSION> is set from configure.ac, e.g., mediagoblin/mediagoblin:0.14.0.dev.

When building this way, the dependencies for most plugins (media types and core plugins) are included. Two notable exceptions are support for documents other than PDFs, and for STL files: their dependencies (unoconv and blender, respectively) were deemed too large to include by default.

While the make-based build is the simplest, it is possible to build custom containers, with a preferred set of dependencies, directly with docker build. Detailing this process is beyond the scope of this chapter; however, you can have a look at the Dockerfile to see which build arguments (ARG, configurable via --build-arg) are supported.
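
As a sketch, reusing the Dockerfile name and build arguments shown in the commented build section of docker-compose.yml above (the custom tag is only a placeholder):

docker build . -f Dockerfile-debian-12-sqlite \
  -t mediagoblin/mediagoblin:custom \
  --build-arg build_doc=no --build-arg run_tests=no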

Python wheel and documentation

It is also possible to build the Python wheel and the documentation out of the image, with

make dist
# and
make docs

respectively.

Note

While the wheel builds successfully, it is still a work in progress and has not been tested yet.

Preconfigured Nginx image

As part of the Docker-based build process, a dedicated Dockerfile.nginx is also created. This allows us to build the pre-configured Nginx Docker image that gets pushed to Docker Hub.

docker build -f Dockerfile.nginx . -t mediagoblin/nginx:0.14.0.dev