Contributing
This Python package is developed in the context of the Getting The Data Right
project.
The package is the central place to store and maintain the classification files that are used in the Bonsai database.
These classification files change during the development of the database and the main development will be under the directory src/classifications/data
(e.g., by adding/revising csv files and revising the metadata.yaml).
To keep track of this development, we use a version number v.<major>.<minor>.<patch>
, which also correponds with the tag
of the repository.
Thus, each new release of the classification
has a tag
.
To update the classification, two general cases may occur:
Adding new codes to an existing
tree_bonsai
csv table (disaggregation). In this case, add a new row to the correspondingtree_bonsai
table with for the new code and point to theparent_code
that you want to disaggregate. Please follow the naming convention. There should be at least two new codes added, if the aim is disaggregating an existing code. The new codes added to atree_bonsai
should be also added to the existing concordance tables!Adding a new table (e.g. a tree table that represents an external classification schema; or a concordance table that maps the Bonsai classification schema to an external schema)
In both cases, you need to install the package. You should install the package in editable mode in a separate Python environment. Furthermore, please use tox and pre-commit already locally to check if things work properly. If ready, create a merge request.
Note
If you do not want to install, but still to want to have changes implemented, please reach out. You can also create an issue and describe your ideas (or contribute to an existing discussion).
Installation
To install a local version of the repository execute in the command line using ssh:
git clone git@gitlab.com:bonsamurais/bonsai/clean/classifications.git
Or using https:
git clone https://gitlab.com/bonsamurais/bonsai/clean/classifications.git
Create a python environment and activate it (if you use conda, install pip via conda install pip
).
Then, cd
to the folder of the repository classifications/
; and type in your console:
pip install -e .
After that you can continue working on a specific branch.
Testing
To make sure that the datapackages are valid, run tox
(there are tests implemented to make this sure).
Please create no merge-request in case the tests are failing.
Make sure that added files are also mentioned in the existing resources.csv
of the corresponding folder (this allows for automated tests).
Required table headers
The tree_
, conc_
and dim_
csv files (currently) require the following headers.
conc_<classification_A classification_B>
:
tree_<classification_A> |
tree_<classification_B> |
comment |
skos_uri |
---|---|---|---|
… |
… |
… |
… |
Note
Only map codes that belong to the most detailed levels of each category. In principle it would be possible to map codes for all levels. However, this complicates things unnecessarily, since we are only interested in the most detailed representation.
tree_<classification_A>
:
code |
parent_code |
name |
level |
comment |
---|---|---|---|---|
… |
… |
… |
… |
… |
dim_<something>
:
code |
name |
description |
---|---|---|
… |
… |
… |
Automated generation
The package includes srcipts to add/fill columns of existing csv files.
run
python src/classifications/_level.py
to add the correspondinglevel
of codes intree_
files (the fields forcode
andparent_code
must already exist)run
python src/classifications/_mapping_type.py
to add the correspondingcomment
andskos_uri
of codes inconc_
files (the filedstree_<classification_A>
andtree_<classification_B>
must already exist)
Naming convention (column “name”)
For the names of the objects in the name
column, we follow a specific convention, which is based on Weidema et al. 2013.
lower case
singular (e.g.
barley grain
, not “barley grains”)the simplest form of an activity isa production; which is added after the product (e.g.
lime producion
)the term construction is used for activities that have buildings, transport infrastructure, factories and facilities as their product outputs (e.g.
bridge construction
)if the activity has multiple products, the activity can instead be named after the nature of the process, e.g.
air separation, cryogenic
with the productsoxygen
,nitrogen
andargon
when an activity is described in terms of the process of converting a raw material to a product, the order
process
,raw material
,detail of process
is preferred, e.g.leaching of spodumene with sulphuric acid
the ending “-ing” is preserved for services
for infrastructure, the name
factory
orfacility
is preferred to “plant”, except in traditional combinations such aspower plant
treatment activities are named
treatment of <material>, <nature or output of the treatment>
market activities start with
market for
market activities, production mixes, supply mixes, export and re-export activities have the same products as inputs and outputs, e.g.
market for barley grain
hasbarley grain
as input andbarley grain
as outputactivity datasets with the term
operation
as part of their name signifies activities that use specific infrastructures, e.g.mine operation
as opposed tomine construction
product names begin with the most generic form of the product that is generally recognized as a product, e.g.
cement, blast furnace slag
rather than “blast furnace slag cement”, but avoiding artificial names, e.g. not “fertiliser, nitrogen” butnitrogen fertiliser
.indication of the production route or specific product characteristics are only included if this is part of the marketable product properties, i.e. if there is a market or market niche where the production route or property is a part of the obligatory product properties. For example, the product
straw
is named as such, not with separate names for “barley straw” and “wheat straw”, since the market for straw does not distinguish between these two productsfor dissolved chemicals, the traditional nomenclature of the chemical industry is to indicate the active substance and then add the water separately, so that e.g. 1 kg of
sodium hydroxide, in 50% solution state, measured as 100% NaOH
, refers to the production of 2 kg NaOH solution with a water content of 50%, i.e., 1 kg pure NaOH plus 1 kg pure H2Otreatment activities provide services to other activities to treat their material outputs, in particular wastes. Since the service and the input are intimately linked, the service output is named by the treated material. Thus, the activity
treatment of blast furnace gas
has as its determining (reference) producttreating blast furnace gas
the name for a chemical element or a compound is the same for all environmental compartments, the list of compartments is the same as in ecoinvent Table 9.1
Code convention (columns “code”)
The codes shall use prefixes followed by numbers: <prefix>_<number>
.
flowobject |
prefix |
---|---|
industry_product |
fi |
material_for_treatment |
ft |
market_product |
fm |
government_product |
fg |
household_product |
fhp |
needs_satisfaction |
fhc |
emission |
fe |
direct_physical_change |
fp |
natural_resource |
fn |
economic_flow |
fe |
social_flow |
fs |
activitytype |
prefix |
---|---|
industry_activity |
ai |
government_activity |
ag |
treatment_activity |
at |
non_profit_institution_serving_household |
anp |
household_production |
ahp |
household_consumption |
ahc |
market_activity |
am |
natural_activity |
ana |
auxiliary_production_activity |
aa |
schange_in_stock_activity |
ast |
Note
Currently, we use Exiobase code convention (A_
, C_
, M_
). This needs to be revised later.