Conventions

Getting Started

Conventions and development guidelines

Committing

sqlfmt

We use sqlfmt as an SQL formatter. Whether we love it or not, we have to use it, so everybody's code will look the same, making collaboration easier. Some linting rules might be controversial, but at least they are consistent and explicit.

Before committing, make sure to run the following command

sqlfmt .

... You might want to rerun a dbt build after formatting everything. sqlfmt shouldn't break anything, but we are better safe than sorry.

Commit message

Commit messages use the Commitizen convention.

Please make sure all of your commit messages start with a type. The following types are available:

feat : A new feature
fix : A bug fix
docs : Documentation only changes
style : Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
refactor : A code change that neither fixes a bug nor adds a feature
perf : A code change that improves performance
test : Adding missing or correcting existing tests
chore : Changes to the build process or auxiliary tools and libraries such as documentation generation
revert : Reverts a previous commit

Folder structure and convention

All the SQL / Python code lives in the models folder.
All dashboards and reports live in the reporting folder.

About the models folder

The models folder is organized as follows:

.
└── models/
    ├── interfaces/
    │   └── database_spame/
    │       └── mart_foobar
    ├── marts/
    │   └── mart_foobar
    └── dashboards/
        ├── spam/
        │   ├── features/
        │   │   └── fact_absences.sql
        │   └── pbi_tables/
        │      └── fact_absences.sql
        └── egg/
            ...

Where :

dashboards: each subfolder of models/dashboards should be named after the corresponding dashboard it belongs to (one dashboard, one folder containing its SQL code).
interfaces: contains mappings to the interface tables. Each interface table can be overridden to add custom connection logic to the underlying database.
marts: contains the marts. A mart is a collection of tables reused/shared across dashboards.

Integration test and the nightly build

The nightly build is an automated check on the repo happening at the end of each day.

When introducing a new NON-OPTIONAL seed in the core.dashboards_store repo, you must add it into core.dashboards_store/nightly/dbt/seeds folder, so the next nightly build won't fail because of a missing seed. The seed must be populated with data from CSSVDC, as the CSSVDC is used as a target database for the integration tests. You can also run the integration suite locally. Please, refer to the core.dashboards_store/nightly/README.md file for more details.

Marts

`educ_serv`

This mart gathers all the data related to the education service.

Populations

Populations are sets of students used as a filter by various dashboards. You can refer to the analyses/marts/educ_serv/staging/populations folder and use the population template to build/define your populations.

The following populations are mandatory (cf adapters) and should be defined :

stg_ele_prescolaire
stg_ele_primaire_reg
stg_ele_primaire_adapt
stg_ele_secondaire_reg
stg_ele_secondaire_adapt

The integrator can add new populations by overriding the custom_fgj_populations.sql model. To do so:

Create a new file in cssXX.dashboards_store/models/marts/educ_serv/staging/populations named custom_fgj_populations.sql
Your custom_fgj_populations model should be implemented as a union of your own custom populations.
Disable the core's placeholder in the cssXX.dashboards_store:

# cssXX.dashboards_store/dbt_project.yml

models:
  core_dashboards_store:
    marts:
      educ_serv:
        staging:
          populations:
            custom_fgj_populations:
              +enabled: false

Developers: when creating a new dashboard using the population mechanism, you must register its tag in the marts/educ_serv/adapters.yml file so it triggers the population computation.

Human resources

This mart gathers all the data related to the human resources department.

Populating the marts seed

This dashboard requires the specification of the seeds in the human_resources mart.

The seed must be populated in cssXX.dashboards_store/seeds/marts/human_resources/ and as per the definition of the core.dashboards_store/seeds/marts/human_resources/schema.yml mart.

Please refer to the core.dashboards_store/seeds/marts/human_resources/schema.yml mart documentation to get the concrete implementation.

Do not forget to refresh your seeds with the dbt seed --select tag:human_resources --full-refresh command.

Exposing the freshness of the data into the dashboard

The core provides a mechanism to expose the freshness of the data in the dashboard. This mechanism is called the stamper and can be enabled and used through macros.

Enabling the `stamper`

Must be done ONCE in your cssXX.dashboards_store/dbt_project.yml.

The stamper is a table collecting metadata about your ETL runs. To enable data collection, you first enable it in your dbt_project.yml by adding the following two hooks:

# cssXX.dashboards_store/dbt_project.yml
# Hooks
on-run-start:
  - "{{ core_dashboards_store.init_metadata_table() }}"
on-run-end:
  - "{{ core_dashboards_store.purge_metadata_table() }}"

`stamping` my new dashboard

A good practice is to only stamp the reporting tables.

Only successful runs will be stamped. This means that taking the MIN(run_ended_at) will give you the last time your ETL run was successful. This is the worst-case freshness scenario.

To add a stamp to your dashboard, you can either:

Add the following post_hook into your model :

# model.sql
{{ config(
    post_hook='{{ core_dashboards_store.stamp_model("my_dashboard") }}',
) }}

Stamp multiple models at once by adding the hook directly into the core/dbt_project

models:
  core_dashboards_store:
    dashboards:
      my_dashboard:
        +tags: ["my_dashboard"]
        +schema: dashboard_my_dashboard
        pbi_tables:
          +post_hook: ["{{ core_dashboards_store.stamp_model('my_dashboard') }}"]

The second option is preferred if all of your report models are under a common folder.

Using the `stamper` in your dashboard

In Power BI, you can easily fetch the last run of your ETL by filtering on the argument provided to the stamp_model macro.

Variable conventions

Don't use spaces in your variable names
Please, stick to a snake_case naming convention
- Use the interface tables to set the right names for the variables.
Reserved keywords should be written in caps.

dbt_project.yml conventions

The dbt_project.yml has to be updated every time a new dashboard is added to core.dashboards_store.

Each dashboard (i.e. each subfolder of the models folder) should be given a tag and a schema. The following example shows the minimal lines to add to the dbt_project.yml file to add dummy_dashboard to core.dashboards_store. The rationale behind this convention is to ease filtering information in either the database or the documentation.

models: # Already here, for reference only
  core_dashboards_store: # Already here, for reference only
    +enabled: False # Already here, for reference only
    dummy:
      +tags: ["dummy"]
      +schema: "dummy"

Naming conventions

Use snake_case to name your variables.

Table conventions

Use snake_case naming conventions.
Use a prefix to indicate the high-level objective of the table. The following table prefixing conventions should be used.

Table type	Description	Prefix	Example
fact	Contains tables of facts	fact_	fact_eleve
dimension	A map between an arbitrary ID and a friendly name	dim_	dim_subject_category
bridge	A mapping of primary-ish keys between systems	bridge_	NA
base	A base table is a skeleton table used to build fact tables	base_	NA
staging	A staging table is a by-product of the construction of a fact table. The table kind of acts as a fact table, but is not queried by itself. Staging tables are generally combined together or joined on a base table to create a fact table	stg_	stg_droppers_raw
interface	The almighty. Interfaces are tables mapping to the raw data from the operational system. It's basically a select clause followed by a list of the fields used in the downstream tasks. Those tables can be overridden in the inherited package to map the CSS requirements. Please, only add the columns you need.	target_	target_perseverance
reporting	Reporting table. Used as an eye-catcher to easily detect the tables we need to plug the dashboard on.	rprt_	rprt_emp65_ann_bdgt

How to

Conflicting table names between projects

Some table names are quite generic (spine, dim_school) and can be used in various contexts without referring to the same underlying table. To avoid confusion, please use the following pattern to disambiguate the tables.

Example

Let's consider the following two dashboards: employees_absences and dummy.

.
└── core.store/
    ├── reporting/
    │   └── employees_absences.pbit
    └── models/
        ├── employees_absences/
        │   └── fact_absences.sql
        └── dummy/
            └── fact_absences.sql

models:
  +employees_absences:
    +schema: "employees_absences"
    +tags: ["employees_absences"]
  +dummy:
    +schema: "dummy"
    +tags: ["dummy"]

The two dashboards are using a table named fact_absences but the SQL code is not the same, so I need both of these tables. Unfortunately, dbt will raise an error as each name has to be unique.

An easy fix is to rename one of the fact_absences tables. Let's say we rename models/dummy/fact_absences to dummy_fact_absences. dbt will now be able to compile the code and any downstream task of the dummy project could refer to the table using {{ ref('dummy_fact_absence') }}.

The issue with this fix is that dbt will output a dummy_fact_absences table in the dummy schema. This is redundant, as the conflict happens between schemas, and not within the dummy schema. Fortunately, we can override the output name in the SQL code by adding the following line into dummy_fact_absences.sql.

{{ config(alias='fact_absences') }}

dbt error message

In such a case, dbt would output an error message similar to the following one.

Compilation Error
  dbt found two models with the name "<foobar>".

  Since these resources have the same name, dbt will be unable to find the correct resource
  when looking for ref("foobar").

  To fix this, change the name of one of these resources:
  - model.core_dashboards_store.removeme (models/prospectif_cdp/features/foobar.sql)
  - model.core_dashboards_store.removeme (models/emp_conge/feature/foobar.sql)

Pattern

The generic pattern to solve the between-project-conflicting-tables-names issue is the following:

Prefix your table with the (unique) friendly name of your dashboard.
- The friendly name should be short. Maybe 3-to-10 letters. dummy could become dmy
Add a dbt directive into your table code to output the table under its original name by setting the alias property

Edit this pageorReport an issue

Getting Started

TODO

Components

Discover every component you can use in your content.