Superset 101

What is Superset?

Apache Superset is an open-source, enterprise-ready business intelligence and data exploration platform. Originally developed at Airbnb, it was later donated to the Apache Software Foundation. Superset provides a feature-rich set of data visualization tools, an interface for SQL queries, and advanced features for data analytics and statistics.

Why Superset?

Superset unlike other BI tools, comes with a truly overwhelming level of customization. A similar tool like Metabase provides everything needed for data intelligence out of the box in a user friendly package, Superset gives that and everything needed for hyper optimized, use case specific setups.

This of course comes with a couple of drawbacks, chief of which is the complexity in the setup.

How to set up?

Local set up is not as easy as I would like.

What is required for a very minimal setup is a docker-compose file and a setup-script.

The following is the content of the docker-compose file.

version: '3'
services:
  postgres:
	image: postgres:15
	environment:
	  POSTGRES_USER: myuser
	  POSTGRES_PASSWORD: mypassword
	  POSTGRES_DB: mydatabase
	ports:
	  - "5432:5432"
	volumes:
	  - ./postgres_data:/var/lib/postgresql/data
	  - ./init:/docker-entrypoint-initdb.d
	  - ./data:/data
	
	healthcheck:
	  test: ["CMD-SHELL", "pg_isready -U myuser -d mydatabase"]
	  interval: 10s
	  timeout: 5s
	  retries: 5
	  
  superset:
    init: true
    build:
      context: ./superset
      dockerfile: Dockerfile
      container_name: superset
    volumes:
      - ./superset_data:/app/superset_home
    environment:
      - DATABASE_DB=superset
      - DATABASE_HOST=db
      - DATABASE_PASSWORD=superset
      - DATABASE_USER=superset
      - DATABASE_PORT=5432
    ports:
      - '8080:8088'

  setup:
    image: docker:dind
    container_name: superset_setup
    privileged: true
    depends_on:
      superset:
        condition: service_healthy
    volumes:
      - ./setup.sh:/setup.sh
      - /var/run/docker.sock:/var/run/docker.sock
    command: sh -c "chmod +x /setup.sh && /setup.sh"

volumes:
  postgres_data:
  superset_data:

With this we get an instance of Superset with a local volume for persistence.

This is not a simple setup, so let's walk through it.

The Postgres instance contains:

The most interesting part here is the volumes.

The /var/lib/postgresql/data volume contains the database, this allows the db to be stored outside of the container, allowing it to persist between docker runs.

The ./init:/docker-entrypoint-initdb.d volume is linked to a local init folder, which contains a series of sql files. These run on init, allowing us to add stuff like tables and users automatically. You can see the exact files here.

The ./data:/data volume is also connected to a local data folder, this is where we can store data files, like csv files. These can be used by the init scripts to automatically load data into the database.

For the purposes of this example I used some public domain datasets from kaggle, specifically the MoMa collection inventory.

The Superset instance contains:

The build step is needed because the Superset images isn't ready to run on it's own, it requires a bit of tweaking before we can run it locally. This is done by running a small script inside of the container before running it.

Place these locally in a folder named superset in the project root.

The dockerfile.

FROM apache/superset:3.0.2

USER root
COPY --chmod=777 superset_config.py /app/
ENV SUPERSET_CONFIG_PATH /app/superset_config.py
USER superset

ENTRYPOINT [ "/usr/bin/run-server.sh" ]

The python script, superset_config.py.

ENABLE_PROXY_FIX = True
SECRET_KEY = "MyVerySecretKey"
PREVENT_UNSAFE_DB_CONNECTIONS = False
TALISMAN_ENABLED = False

The setup container contains:

The docker-in-docker, a way to run scripts inside of the docker environment, not recommended for production deployments. You can read a more comprehensive explanation here.

The setup script is the following:

# Wait for Superset to be ready
sleep 10

docker exec superset superset fab create-admin \
--username admin \
--firstname Superset \
--lastname Admin \
--email admin@localhost.com \
--password secret

docker exec superset superset db upgrade &&
docker exec superset superset init

echo "Done"

After you copy the docker compose file, run:

docker compose up -d

And once it has finished starting, go to http://localhost:3000/.

How to connect to Database?

Let's start by connecting to the database. We only need to do this once, the connection will persist with the local setup.

We do this by clicking the "+" button on the top right corner and selecting "Data > Connect a database".
Connect to Database

Once the "Connect a database" popups select the database type, in this case PostgreSQL. After enter the credentials from the docker-compose file.
Database Connection Details

How to create Dashboard?

To create a dashboard, to do this we need to navigate to the dashboard tab and click the "+ Dashboard" button.
Create Dashboard

The new dashboard will looks like this.
New Dashboard Screen
Now that the dashboard exists we can fill it with charts. But before we can do that we need to create a dataset.

How to create a dataset?

Datasets are essentially views that get used as sources for analytics and visualizations.

To create a dataset navigate to the "Datasets" tab and click "+ Dataset".
Create Dataset

Once on the dataset creating page, enter the details for the 'artists' table.
Create Artists Dataset
Afterwards, press the "Create Dataset and Create Chart" button on the bottom right.

How to create charts?

Creating a chart is likewise pretty simple. Especially since the last section left us on the chart creating page.
Create New Chart

This first one will be super simple, we will only be looking at the total number of artists. For this we need the "Big Number" chart type and press "Create Chart".
Create Big Number Chart

The next screen is the Chart building screen, this is where we can edit every parameter. For this specific case we only need to chose the metric.
Big Number Chart Details

The just press 'Save' and choose to add it directly to the dashboard we created earlier.
Save Chart Screen

And there we have out first chart.
Dashboard with Big Number

Now let's add two more charts, gender ratio and nationality.

Gender Ratio

To do this just go to create a chart, select the artist table and select the "Pie Chart" chart type and input the following parameters.
Gender Ratio

Nationality

Nationality is going to be a bit more tricky, in the artists dataset the nationality is represented by a demonym, which can't be used to assign values to a map type chart. Luckily we have a demonyms table in the database we can use.
Once we create a national_demonyms dataset we can merge it with the artists dataset to create a new dataset.
To do this, first navigate to the "SQL Lab".
SQL Lab Tab

And then enter this query into the editor.
Custom Dataset

Once this is done, click "Save dataset" and you will be taken to the chart creating screen, here pick "World Map" chary type and enter the following parameters.
World Map Chart

How to share charts?

After following the previous steps you should be left with a dashboard that looks something like this.
Completed Dashboard

Wonderfull, isn't it.

Now we want to share out wonderful dashboard with our colleagues and customers. We can do that simply by using the share button.
Share permalink to dashboard
This will give up a link to a page where the dashboard can be viewed.

Unfortunately since we are running this on our local machine, only people on out local network will be able to access it. And to enable that you will need to find the local ip address and substitute it for the localhost in the share link.

You can get the local ip address with the following commands based on your OS:

# Windows 
ipconfig | findstr IPv4 

# Linux 
hostname -I 

# Mac 
ipconfig getifaddr en0

The result will looks something like this:

http://193.169.10.11:8080/public/dashboard/79974b67-273d-6723-816d-3640ff0dcffb

If you correctly substituted the ip address you should able to open the dashboard on your phone.

To enable it to be shared over the internet, you will need to deploy it to the cloud and set up the network to allow for sharing. But that is outside of the scope of this tutorial.