Testing and Comparing Your Model
SDEverywhere includes extensive QA (quality assurance) packages and tools that are collectively known as "model-check". The model-check tool can run as you develop your model, either locally on your machine or in the cloud in a continuous integration environment (or both).
With model-check, there are two kinds of tests:
- Checks are objective tests of a model's behavior.
  - These are "objective" in the sense that they always provide a yes/no or right/wrong answer.
  - Check tests are good for verifying that a model conforms to some expectations or ground truths.
  - They can help catch bugs and unintentional changes that might otherwise go undetected.
  - Here are a few examples of useful checks defined for En-ROADS (but there are countless other examples that will vary from model to model); the sketch after this list shows how the first of these might be written in YAML:
    - Stocks should never be negative
    - The population variable values should be within +/- 5% of the historical population data for the years 1900-2025
    - The population variable values should be between 8 billion and 12 billion for all defined input scenarios
    - The temperature variable values should always be lower with input scenario X (e.g., with a carbon tax) than with input scenario Y (e.g., a baseline scenario with no carbon tax)
- Comparisons are subjective tests of the behavior of two versions of the same model.
  - These are "subjective" in the sense that they don't usually provide a right/wrong answer and are subject to interpretation by the modelers.
  - Comparison tests are good for making sense of how a change to the model impacts the output values of that model under a wide variety of input scenarios.
  - Comparison tests allow for exercising a model under many different scenarios in a short amount of time.
  - The model-check report orders the results so that the most significant changes are at the top, and the results are color-coded to help you see at a glance which outputs have changed the most compared to the base/reference/previous version of the model.
  - Here are a few examples of useful comparisons defined for En-ROADS (and as with check tests, there are countless other examples depending on your model):
    - Baseline scenario (all inputs at default)
    - All inputs at their {min,max}imum values (all at once)
    - All main sliders at the {min,max}imum values (all at once)
    - Each individual input at its {min,max}imum value (while others are at default)
    - Low, medium, and high carbon price (for testing values between "min" and "max")
    - Fossil fuel phase out (multiple "reduce new infrastructure" sliders set together)
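To make the first of those checks concrete, here is a minimal sketch of how it might be written; the `All Stocks` dataset group name is hypothetical, and the full syntax for scenarios, datasets, and predicates is described in the subsections below.

```yaml
# Minimal sketch: stocks should never be negative.
# "All Stocks" is a hypothetical dataset group name; replace it with your
# own output variables or a dataset group defined for your project.
- describe: Stock Variables
  tests:
    - it: should never be negative for any input scenario
      scenarios:
        - preset: matrix
      datasets:
        - group: All Stocks
      predicates:
        - gte: 0
```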
Both checks and comparisons are typically defined in text files in YAML format, though it is possible to define them in JSON format or in TypeScript/JavaScript code if needed.
YAML files are designed to be read and edited by a human, but note that indentation is significant, so you need to be careful. We recommend using VS Code to edit these files and installing the YAML extension. The YAML files that are provided in the Quick Start templates are set up with a reference to the schema at the top (which the YAML extension uses) so that you will get some syntax highlighting (and red squiggles to indicate when the syntax is incorrect).
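For reference, the schema comment at the top of a checks file typically looks something like the following; the path shown here is only a placeholder, so copy the real schema reference from the file generated by the Quick Start template rather than from this sketch.

```yaml
# yaml-language-server: $schema=../path/to/check.schema.json
# (The path above is a placeholder; use the schema reference that appears at
# the top of the checks.yaml file generated by the Quick Start template.)
```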
If you follow the Quick Start instructions, the generated template will include sample `checks.yaml` and `comparisons.yaml` files to get you started. Refer to the Creating a Web Application page for an overview of where these files reside in the recommended project structure.
Read the following two subsections for more details on how to define checks and comparisons.
The following is an example of a group of check tests, taken from the SIR example project.
```yaml
- describe: Population Variables
  tests:
    - it: should be between 0 and 10000 for all input scenarios
      scenarios:
        - preset: matrix
      datasets:
        - name: Infectious Population I
        - name: Recovered Population R
        - name: Susceptible Population S
      predicates:
        - gte: 0
          lte: 10000
```
- The "describe" and "it" naming convention comes from unit testing frameworks in the software development world. This convention encourages naming tests in natural language that describes how the model should behave. For example, the test above is basically saying "population variables should be within a certain range across all input scenarios".
- A group of tests starts with a `describe` field. This is used to group related tests together.
- A `describe` group should contain one or more items in the `tests` field.
- Each test starts with an `it` field that describes the expected behavior in plain language. The text usually begins with "should" (for example, this variable "should always be positive" or "should be close to historical values").
- Each test includes 3 essential parts: `scenarios`, `datasets`, and `predicates`.
- You are not limited to a single `describe` group or a single `yaml` file. You can put multiple `describe` groups in a single file, or you can spread out and define many `yaml` files under your `checks` folder (for example, you can have `population.yaml` and `temperature.yaml` and more).
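As a sketch of that last point, a single checks file can contain more than one `describe` group; the variable names and values below are hypothetical.

```yaml
# Two independent groups of checks in the same yaml file.
# "Population" and "Temperature" are hypothetical output variable names.
- describe: Population Variables
  tests:
    - it: should always be positive
      scenarios:
        - preset: matrix
      datasets:
        - name: Population
      predicates:
        - gt: 0

- describe: Temperature Variables
  tests:
    - it: should be approximately 1.1 in the year 2020
      scenarios:
        - preset: matrix
      datasets:
        - name: Temperature
      predicates:
        - approx: 1.1
          tolerance: .05
          time: 2020
```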
The `scenarios` field should contain one or more input scenarios for which the expectations hold true.
Click to reveal examples
- A single scenario that includes a single input at a specific value:

  ```yaml
  scenarios:
    - with: Input A
      at: 50
  ```

- A single scenario that includes a single input at its defined extreme (minimum or maximum) value:

  ```yaml
  scenarios:
    - with: Input A
      at: max
  ```

- A single scenario that includes multiple input values set at the same time:

  ```yaml
  scenarios:
    - with:
        - input: Input A
          at: 50
        - input: Input B
          at: 20
  ```

- Multiple (distinct) scenarios that have the same expected behavior:

  ```yaml
  scenarios:
    - with: Input A
      at: max
    - with: Input B
      at: max
  ```

- A special "matrix" preset that will execute the test once for each input variable at its minimum, and again at its maximum:

  ```yaml
  scenarios:
    - preset: matrix
  ```
The `datasets` field should contain one or more datasets (output variables or external datasets) for which the expectations hold true.
Click to reveal examples
- A single dataset referenced by name:

  ```yaml
  datasets:
    - name: Output X
  ```

- Multiple datasets referenced by name (one model output and one external dataset):

  ```yaml
  datasets:
    - name: Output X
    - name: Historical Y
      source: HistoricalData
  ```

- Multiple datasets in a predefined group:

  ```yaml
  datasets:
    - group: Key Outputs
  ```
The `predicates` field should contain one or more predicates, i.e., the behavior you expect to be true for the given scenario/dataset combinations.
Click to reveal examples
- A predicate that says "greater than 0":

  ```yaml
  predicates:
    - gt: 0
  ```

- A predicate that says "greater than 10 and less than 20 in the year 1900":

  ```yaml
  predicates:
    - gt: 10
      lt: 20
      time: 1900
  ```

- A predicate that says "approximately 5 in the years between 1900 and 2000":

  ```yaml
  predicates:
    - approx: 5
      tolerance: .01
      time: [1900, 2000]
  ```

- A predicate that says "approximately 5 for the year 2000 and beyond":

  ```yaml
  predicates:
    - approx: 5
      tolerance: .01
      time:
        after_incl: 2000
  ```

- A predicate that says "within the historical data bounds for all years up to and including the year 2000":

  ```yaml
  predicates:
    - gte:
        dataset:
          name: Historical X confidence lower bound
      lte:
        dataset:
          name: Historical X confidence upper bound
      time:
        before_incl: 2000
  ```
For more examples of different kinds of check tests (including various predicates, combinations of inputs, time ranges, etc.), refer to the `checks.yaml` file in the `sample-check-tests` example.
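Putting the three parts together, a fuller check test might look like the following sketch; the input, output, and historical dataset names are illustrative rather than taken from a real project.

```yaml
# Sketch of a complete test combining a scenario, a dataset, and a
# predicate that compares against historical confidence bounds.
# All names below are illustrative.
- describe: Output X
  tests:
    - it: should stay within the historical confidence bounds through 2000 when Input A is at its maximum
      scenarios:
        - with: Input A
          at: max
      datasets:
        - name: Output X
      predicates:
        - gte:
            dataset:
              name: Historical X confidence lower bound
          lte:
            dataset:
              name: Historical X confidence upper bound
          time:
            before_incl: 2000
```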
The following is a screenshot of the "Checks" tab in a sample model-check report, which shows two expanded test results, one that is failing (note the red X's) and one that is passing (note the green checkmarks).

The following is an example of a comparison scenario definition, taken from the SIR example project.
```yaml
- scenario:
    title: Custom scenario
    subtitle: with avg duration=4 and contact rate=2
    with:
      - input: Average Duration of Illness d
        at: 4
      - input: Initial contact rate
        at: 2
```
- A `comparisons.yaml` file will typically have at minimum one or more `scenario` definitions, but you can also have `scenario_group`, `graph_group`, and `view_group` definitions in the same file.
- You are not limited to a single `yaml` file to hold your comparisons. You can put multiple definitions in a single file, and you can spread out and define many `yaml` files under your `comparisons` folder (for example, you can have `renewables.yaml` and `economy.yaml` and more).
A `scenario` definition represents an input scenario for which each output variable for the two models will be compared.
The format of a `scenario` is similar to that of a check test (see above), except that it can contain:
- a `title` and `subtitle` (for keeping similar scenarios grouped together in the model-check report)
- an optional `id` (that allows for the scenario to be referenced in a `scenario_group` or `view_group` definition); see the sketch after the examples below
Click to reveal examples
- A scenario that includes a single input at a specific value:

  ```yaml
  - scenario:
      title: Input A
      subtitle: at medium growth
      with: Input A
      at: 50
  ```

- A scenario that includes a single input at its defined extreme (minimum or maximum) value:

  ```yaml
  - scenario:
      title: Input A
      subtitle: at maximum
      with: Input A
      at: max
  ```

- A scenario that includes multiple input values set at the same time:

  ```yaml
  - scenario:
      title: Inputs A+B
      subtitle: at medium growth
      with:
        - input: Input A
          at: 50
        - input: Input B
          at: 20
  ```

- A "baseline" scenario that sets all inputs to their default values:

  ```yaml
  - scenario:
      title: All inputs
      subtitle: at default
      with_inputs: all
      at: default
  ```

- A special "matrix" preset that will generate comparisons for each input variable at its minimum, and again at its maximum:

  ```yaml
  - scenario:
      preset: matrix
  ```
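As noted above, a scenario can also carry an optional `id` so that it can be referenced from a `scenario_group` or `view_group` definition. Here is a hedged sketch, assuming the `id` field sits alongside `title` and `subtitle`; the id value itself is arbitrary.

```yaml
# A baseline scenario with an explicit id so it can be referenced from
# scenario_group or view_group definitions. The id value is arbitrary.
- scenario:
    id: baseline
    title: All inputs
    subtitle: at default
    with_inputs: all
    at: default
```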
TODO: This section on scenario groups is under construction. See "More Examples" below for a link to an example of scenario groups.
TODO: This section on view groups is under construction. See "More Examples" below for a link to an example of view groups.
For more examples of different kinds of comparison definitions (including different ways to define scenarios, scenario groups, views, etc.), refer to the `comparisons.yaml` file in the `sample-check-tests` example.
The model-check report includes two separate tabs for viewing comparisons.
The "Comparisons by scenario" tab summary view lists all the input scenarios that were compared:
Clicking on a scenario will take you to a detail view that shows graphs of all output variables under that input scenario:
The "Comparisons by dataset" tab summary view lists all the datasets (output variables and external datasets) that were compared:
Clicking on a dataset will take you to a detail view that shows graphs of that dataset for each tested input scenario:
Every model-check report includes a table summarizing the size and run time (speed) of the two versions of your generated model being compared.
For example:
If you click on the blue and red "heat map" to the right side of that table, it will open a performance testing page:
Click on the "Run" button a few times to get a sense of how the run times compare for the two versions of the model. (Note that it's currently a somewhat hidden feature, so the UI is not fully polished.)
The heat map display is useful for seeing the average time and distribution of outlying samples. To ensure consistent results, it is recommended to run performance tests when your computer is "quiet" (idle).