Configuring the Adapters and Seeds

Seeds and adapters are contracts defined by the Core and implememented by the cssXX.dasboards_store project. If you don't implement them, the ETL will fail. Please refer to the Marts documentation to know which seeds and adapters are required by each one.

Both the human_resources and the education_services marts use adapters and seeds to configure the ETL. As explained in the "Architecture" page, both are used to provided Schoolboards-specific business rules to the ETL. Seeds are csv files used to create dimensions later used to filter or map data. Adapters are SQL files with arbitrary SQL used to create dimensions or facts. We tend to prefer seeds over adapters as they are easier to maintain and to understand. But sometimes, you just need to write some SQL like it's the 80's again.

Seeds

Implementing a seed

The Core provide you with the seed's definition. It's up to you to populate-it. The seed definition is located in the core.dashboards_store/seeds/**/*/schema.yml project, in the data folder.

To implement a seed, you have to create a csv, with columns as per the seed's definition. The csv file must be located in the cssXX.dashboards_store/seeds/**/*/data folder. The csv file must be named as per the seed's definition. The definition file gives a lot of information about the expected types of the seed's column, as well as a description of each field to help you populating it. Seeds schould be put under the same path as the one used by the seeds's definition yaml.

Once populated, you can load in the database and test the seed with the command :

dbt build --full-refresh --select <the name of the seed>

Exemple :

Let's say I want to implement a seed named exemple_seed and defined by core.dashboards_store/seeds/marts/exemple/schema.yml. The content of the schema.yml file is :

seeds:
  - name: exemple_seed
    description: >
      This is the exemples seed. 
      This seed is a completely useless seed. It consists of a set any integer you can think of.
    # The fields' description
    columns:
      - name: seq_value
        description: The random integer
        tests:
          - not_null  # Null values are not allowed
          - unique  # The integer must be unique
    # The seed's data type
    config:
      column_types:
        value: int

I first create an empty csv and save-it under cssXX.dashboards_store/seeds/marts/exemple/exemple_seed.csv. Note that the seed is saved in the cssXX project folder, not in the core project. The name of the file is the one indicated in the second line of the seed's definition (exemple_seed). From the seed definition, I know that my implementation schould have only one column named seq_value filled with a set of unique, not null, integers. I can now populate the csv file with the following content :

seq_value
1
2
3
4
6
7
9

I then load, and test, my seed in the database. A table will be automatically created and populated with the content of the csv file. The table will be named exemple_seed. To do so, I just enter the following command :

dbt build --full-refresh --select exemple_seed

Adatpters

Mandatory adapters

Theese adapters must be populated for the ETL to work.

The next table show the required adapters's definition and the path where the sql schould be put.

Implementing an adapter

The Core can only provide you with the adapter's definition only. The concrete implementation is usually quite specific to your schoolboard.

Adatpers are described in the adapters.yml files. Thoose files are located in the core.dashboards_store/adapters/**/*/adapters.yml project.

To implement an adapter, you have to create a sql file, with arbitrary SQL. The sql file schould be saved under in cssXX.dashboards_store/models/**/*/<adapter_name>.sql folder. The exact path file's path schould be the one from the definition yml suffixed by the name of the adapter. The sql file must be named as per the adapter's definition.

The definition file gives a lot of information about the expected types of the seed's column, as well as a description of each field to help you populating it. Please refer to the definition file to get the column specification and description.

We try to add a dbt analysis as an exemple of how to implement an adapter. When available, the analysis is saved under core.dashboards_store/analysis/**/*/<adapter_name>.sql where **/*/<adapter_name>.sql is the path from the adapter's definition file. The analysis is named as per the adapter's definition.

Exemple :

Let's say I have to implement the stg_ele_prescolaiore adapter. This adapter is defined in core.dashboard_store/models/marts/education_services/adapters.yml. The adapter's definition is :

# core.dashboards_store/models/marts/education_services/adapters.yml
sources:
  - name: populations
    description: >
      The populations identify groups of students with consistent profiles, such as regular students, students in special education or students in general adult education.
    schema: "{{ target.schema }}_educ_serv_staging"
    tables:
      - name: stg_ele_prescolaire
        description: >
          Identification of the population in 4-year-old kindergarten / passe partout / 5-year-old kindergarten.
        columns:
          - name: code_perm
            description: The specific unique identifier for a student
          - name: id_eco
            description: The specific unique identifier for a school by year
          - name: annee
            description: The years the student was active
models:
  - name: stg_ele_prescolaire
    config:
      schema: "educ_serv_staging"
      <<: *x-common-tags
    description: >
      Identification of the population in 4-year-old kindergarten / passe partout / 5-year-old kindergarten.
    tests:
      - resolution:
          combination_of_columns:
            - code_perm
            - id_eco
            - annee
  1. I first create a file under cssXX.dashboards_store/models/marts/education_services/stg_ele_prescolaire.sql. Note that the file is created in cssXX with the same name as the one in the definition file.
  2. I then check for the definition :
    1. From the definition (source key), I know that the adapter schould have the three columns : code_perm, id_eco, annee
    2. From the definition (models key), I know the table schould pass a resolution test on the three columns, meaning that my table schouln't have any duplicate on the three columns.
  3. I then check under the core.dashboards_store/analysis/marts/education_services/stg_ele_prescolaire.sql file to see if there's an analysis available. In this case, there's one. I copy-paste the analysis in my sql file, to bootstrap the adapter's implementation.
-- cssXX.dashboards_store/models/marts/education_services/stg_ele_prescolaire.sql
select distinct ele.code_perm, eco.id_eco, eco.annee
from {{ ref("i_gpm_e_dan") }} as eledan
left join {{ ref("i_gpm_t_eco") }} as eco on eledan.id_eco = eco.id_eco
left join
    {{ ref("i_gpm_e_ele") }} as ele on eledan.fiche = ele.fiche
    /*WHERE 
    eledan.statut_don_an = 'A' AND (
        (
            eledan.ordre_ens = '1'
            AND eledan.grp_rep IN ('MA4','MA5','M41','M42')
        )
        OR (
            eledan.ordre_ens = '2'
            AND (eledan.grp_rep NOT LIKE '9%' OR eledan.grp_rep IS NULL)
            )
    )*/
  1. I then fine tune (or completely rewrite) the code I have just copy past to match my Schoolboard's reality.