Contributing

This Python package is developed in the context of the Getting The Data Right project.

The package is the central place to store and maintain the classification files used in the Bonsai database. These files change as the database develops, and most of the work happens under the directory src/classifications/data (e.g., adding or revising csv files and revising the metadata.yaml). To keep track of this development, we use a version number v.<major>.<minor>.<patch>, which corresponds to the tag of the repository. Thus, each new release of the classification has a tag.

To update the classification, two general cases may occur:

  1. Adding new codes to an existing tree_bonsai csv table (disaggregation). In this case, add a new row for each new code to the corresponding tree_bonsai table and set its parent_code to the code that you want to disaggregate. Please follow the naming convention. If the aim is to disaggregate an existing code, add at least two new codes. New codes added to a tree_bonsai table must also be added to the existing concordance tables!

  2. Adding a new table (e.g. a tree table that represents an external classification schema; or a concordance table that maps the Bonsai classification schema to an external schema)
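The disaggregation rule in case 1 can be checked before opening a merge request. The following is a minimal sketch, not part of the package: the function name and the example codes are hypothetical, and it assumes that rows are dictionaries with the code and parent_code fields and that a parent which already has children is being extended rather than disaggregated.

```python
from collections import Counter


def check_disaggregation(existing_rows, new_rows):
    """Return {parent_code: problem} for new rows that break the rule above."""
    known_codes = {row["code"] for row in existing_rows}
    known_codes |= {row["code"] for row in new_rows}
    # Assumption: a parent that already has children is only being extended.
    parents_with_children = {row["parent_code"] for row in existing_rows}
    new_children = Counter(row["parent_code"] for row in new_rows)

    problems = {}
    for parent, count in new_children.items():
        if parent not in known_codes:
            problems[parent] = "parent_code does not exist in the tree"
        elif parent not in parents_with_children and count < 2:
            problems[parent] = "disaggregation needs at least two new codes"
    return problems


# Hypothetical rows, for illustration only.
existing = [
    {"code": "fi_1", "parent_code": ""},
    {"code": "fi_2", "parent_code": "fi_1"},
]
new = [{"code": "fi_3", "parent_code": "fi_2"}]

print(check_disaggregation(existing, new))
```

Here fi_2 is a leaf that receives only one new child, so it is reported. The check does not cover the concordance tables, which still need to be updated by hand.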

In both cases, you need to install the package, preferably in editable mode in a separate Python environment. Please also run tox and pre-commit locally to check that everything works before you create a merge request.

Note

If you do not want to install the package but would still like to have changes implemented, please reach out. You can also create an issue and describe your ideas (or contribute to an existing discussion).

Installation

To get a local copy of the repository, run the following on the command line, using SSH:

git clone git@gitlab.com:bonsamurais/bonsai/clean/classifications.git

Or using https:

git clone https://gitlab.com/bonsamurais/bonsai/clean/classifications.git

Create a Python environment and activate it (if you use conda, install pip via conda install pip). Then cd to the repository folder classifications/ and type in your console:

pip install -e .

After that you can continue working on a specific branch.

Note

To execute locally, you also need to install tox and graphviz.

Testing

To make sure that the datapackages are valid, run tox (tests are implemented to verify this). Please do not create a merge request while the tests are failing.

Make sure that added files are also mentioned in the existing resources.csv of the corresponding folder (this allows for automated tests).
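A quick local check for forgotten entries could look like the sketch below. It is an illustration only: the function name is hypothetical, and the column name "path" is an assumption about the layout of resources.csv that should be adjusted to the real header.

```python
import csv
import tempfile
from pathlib import Path


def unlisted_csv_files(folder, column="path"):
    """Return csv files in `folder` that resources.csv does not mention.

    `column` is a hypothetical column name; adjust it to the real header.
    """
    folder = Path(folder)
    with open(folder / "resources.csv", newline="", encoding="utf-8") as handle:
        listed = {Path(row[column]).name for row in csv.DictReader(handle)}
    present = {path.name for path in folder.glob("*.csv")} - {"resources.csv"}
    return sorted(present - listed)


# Demonstration on a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    (tmp_path / "resources.csv").write_text(
        "name,path\ntree_a,tree_a.csv\n", encoding="utf-8"
    )
    (tmp_path / "tree_a.csv").write_text(
        "code,parent_code,name,level,comment\n", encoding="utf-8"
    )
    (tmp_path / "conc_a_b.csv").write_text(
        "tree_a,tree_b,comment,skos_uri\n", encoding="utf-8"
    )
    missing = unlisted_csv_files(tmp_path)

print(missing)
```

In the demonstration, conc_a_b.csv exists in the folder but is not listed, so it is the only file reported.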

Required table headers

The tree_, conc_ and dim_ csv files (currently) require the following headers.

conc_<classification_A classification_B>:

  • tree_<classification_A>

  • tree_<classification_B>

  • comment

  • skos_uri

Note

Only map codes that belong to the most detailed levels of each category. In principle it would be possible to map codes for all levels. However, this complicates things unnecessarily, since we are only interested in the most detailed representation.

tree_<classification_A>:

  • code

  • parent_code

  • name

  • level

  • comment

dim_<something>:

  • code

  • name

  • description
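The header requirements above can be expressed as a small check. This is a sketch under assumptions, not the test that tox runs: the function name is hypothetical, the table kind is inferred from the file name prefix, and for conc_ files it only verifies that exactly two tree_ columns are present without checking which classifications they name.

```python
def missing_headers(file_name, header_row):
    """Return the required headers that are absent from `header_row`."""
    headers = [header.strip() for header in header_row]
    if file_name.startswith("tree_"):
        required = ["code", "parent_code", "name", "level", "comment"]
    elif file_name.startswith("dim_"):
        required = ["code", "name", "description"]
    elif file_name.startswith("conc_"):
        required = ["comment", "skos_uri"]
    else:
        return []  # not one of the three table kinds described above

    missing = [header for header in required if header not in headers]
    if file_name.startswith("conc_"):
        # A concordance maps two classifications, so expect two tree_ columns.
        tree_columns = [h for h in headers if h.startswith("tree_")]
        if len(tree_columns) != 2:
            missing.append("two tree_<classification> columns")
    return missing


# Hypothetical file name and header row.
print(missing_headers("tree_x.csv", ["code", "parent_code", "name"]))
```

An empty list means that all required headers were found.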

Automated generation

The package includes scripts to add/fill columns of existing csv files.

  • run python src/classifications/_level.py to add the corresponding level of codes in tree_ files (the fields for code and parent_code must already exist)

  • run python src/classifications/_mapping_type.py to add the corresponding comment and skos_uri of codes in conc_ files (the fields tree_<classification_A> and tree_<classification_B> must already exist)
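To illustrate why code and parent_code are sufficient to derive level, the sketch below walks each code up to its root and counts the steps. It is not the implementation of _level.py: the function name is hypothetical, and it assumes that root codes have an empty parent_code and that roots sit at level 0.

```python
def compute_levels(rows, root_level=0):
    """Map each code to its depth in the tree (assumed: roots have no parent)."""
    parent_of = {row["code"]: row["parent_code"] for row in rows}
    levels = {}
    for code in parent_of:
        chain = []  # codes on the way up whose level is still unknown
        current = code
        while current not in levels:
            if current in chain:
                raise ValueError(f"cycle detected at code {current!r}")
            chain.append(current)
            parent = parent_of[current]
            if not parent:
                # Reached a root: its level is known immediately.
                levels[current] = root_level
                chain.pop()
                break
            if parent not in parent_of:
                raise ValueError(
                    f"unknown parent_code {parent!r} for code {current!r}"
                )
            current = parent
        # `current` now has a known level; assign levels back down the chain.
        level = levels[current]
        for child in reversed(chain):
            level += 1
            levels[child] = level
    return levels


# Hypothetical three-level tree.
rows = [
    {"code": "c", "parent_code": "b"},
    {"code": "a", "parent_code": ""},
    {"code": "b", "parent_code": "a"},
]
print(compute_levels(rows))
```

The rows do not need to be sorted, because each code is resolved by following its parents until a code with a known level is reached.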

Naming convention (column “name”)

For the names of the objects in the name column, we follow a specific convention, which is based on Weidema et al. 2013.

  • lower case

  • singular (e.g. barley grain, not “barley grains”)

  • the simplest form of an activity is a production, which is added after the product (e.g. lime production)

  • the term construction is used for activities that have buildings, transport infrastructure, factories and facilities as their product outputs (e.g. bridge construction)

  • if the activity has multiple products, the activity can instead be named after the nature of the process, e.g. air separation, cryogenic with the products oxygen, nitrogen and argon

  • when an activity is described in terms of the process of converting a raw material to a product, the order process, raw material, detail of process is preferred, e.g. leaching of spodumene with sulphuric acid

  • the ending “-ing” is preserved for services

  • for infrastructure, the name factory or facility is preferred to “plant”, except in traditional combinations such as power plant

  • treatment activities are named treatment of <material>, <nature or output of the treatment>

  • market activities start with market for

  • market activities, production mixes, supply mixes, export and re-export activities have the same products as inputs and outputs, e.g. market for barley grain has barley grain as input and barley grain as output

  • activity datasets with the term operation as part of their name signify activities that use specific infrastructures, e.g. mine operation as opposed to mine construction

  • product names begin with the most generic form of the product that is generally recognized as a product, e.g. cement, blast furnace slag rather than “blast furnace slag cement”, but avoiding artificial names, e.g. not “fertiliser, nitrogen” but nitrogen fertiliser.

  • indication of the production route or specific product characteristics are only included if this is part of the marketable product properties, i.e. if there is a market or market niche where the production route or property is a part of the obligatory product properties. For example, the product straw is named as such, not with separate names for “barley straw” and “wheat straw”, since the market for straw does not distinguish between these two products

  • for dissolved chemicals, the traditional nomenclature of the chemical industry is to indicate the active substance and then add the water separately, so that e.g. 1 kg of sodium hydroxide, in 50% solution state, measured as 100% NaOH, refers to the production of 2 kg NaOH solution with a water content of 50%, i.e., 1 kg pure NaOH plus 1 kg pure H2O

  • treatment activities provide services to other activities to treat their material outputs, in particular wastes. Since the service and the input are intimately linked, the service output is named by the treated material. Thus, the activity treatment of blast furnace gas has as its determining (reference) product treating blast furnace gas

  • the name for a chemical element or a compound is the same for all environmental compartments; the list of compartments is the same as in ecoinvent Table 9.1
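A few of the rules above are mechanical enough to be screened automatically. The sketch below is a hypothetical helper, not part of the package, and it only flags candidates for manual review: it cannot judge singular versus plural, and legitimate exceptions such as chemical formulas will also be flagged by the lower-case check.

```python
import re


def name_warnings(name):
    """Return a list of convention hints for one entry of the name column."""
    warnings = []
    if name != name.strip():
        warnings.append("leading or trailing whitespace")
    if name != name.lower():
        warnings.append("contains upper-case letters")
    lowered = name.lower()
    # "plant" is only preferred in traditional combinations like "power plant".
    if re.search(r"\bplant\b", lowered) and "power plant" not in lowered:
        warnings.append('prefer "factory" or "facility" to "plant"')
    return warnings


for candidate in ["barley grain", "Cement plant", "power plant"]:
    print(candidate, name_warnings(candidate))
```

The remaining rules, such as the word order for treatment and market activities, depend on the type of activity and are left to the contributor.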

Code convention (column “code”)

The codes shall use prefixes followed by numbers: <prefix>_<number>.

flowobject               prefix
industry_product         fi
material_for_treatment   ft
market_product           fm
government_product       fg
household_product        fhp
needs_satisfaction       fhc
emission                 fe
direct_physical_change   fp
natural_resource         fn
economic_flow            fe
social_flow              fs

activitytype                               prefix
industry_activity                          ai
government_activity                        ag
treatment_activity                         at
non_profit_institution_serving_household   anp
household_production                       ahp
household_consumption                      ahc
market_activity                            am
natural_activity                           ana
auxiliary_production_activity              aa
schange_in_stock_activity                  ast

Note

Currently, we use the Exiobase code convention (A_, C_, M_). This needs to be revised later.
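Once the <prefix>_<number> convention is adopted, codes could be validated with a regular expression. The sketch below is an assumption about how such a check might look: the dictionary and function names are hypothetical, the prefixes are copied from the tables in this section (including fe, which is listed for both emission and economic_flow), and codes that still follow the Exiobase convention will be rejected.

```python
import re

# Prefixes as listed in the flowobject table of this section.
FLOWOBJECT_PREFIXES = {
    "industry_product": "fi",
    "material_for_treatment": "ft",
    "market_product": "fm",
    "government_product": "fg",
    "household_product": "fhp",
    "needs_satisfaction": "fhc",
    "emission": "fe",
    "direct_physical_change": "fp",
    "natural_resource": "fn",
    "economic_flow": "fe",
    "social_flow": "fs",
}

# Prefixes as listed in the activitytype table of this section.
ACTIVITYTYPE_PREFIXES = {
    "industry_activity": "ai",
    "government_activity": "ag",
    "treatment_activity": "at",
    "non_profit_institution_serving_household": "anp",
    "household_production": "ahp",
    "household_consumption": "ahc",
    "market_activity": "am",
    "natural_activity": "ana",
    "auxiliary_production_activity": "aa",
    "schange_in_stock_activity": "ast",
}

CODE_PATTERN = re.compile(r"(?P<prefix>[a-z]+)_(?P<number>\d+)")


def is_valid_code(code, prefixes):
    """Check that `code` looks like <prefix>_<number> with a known prefix."""
    match = CODE_PATTERN.fullmatch(code)
    return bool(match) and match.group("prefix") in prefixes.values()


print(is_valid_code("fi_12", FLOWOBJECT_PREFIXES))
print(is_valid_code("A_Wheat", ACTIVITYTYPE_PREFIXES))
```

Because fe appears twice, a code with that prefix cannot be traced back to a single flowobject from the prefix alone.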