Uncertainty representations in KIDA
How are rate coefficient uncertainties defined in KIDA?
 Type (default=logn): describes the statistical distribution representing the uncertainty. The implemented distributions are Normal (norm), Uniform (unif), Lognormal (logn) and Loguniform (logu);
 F_{0} (no default): an uncertainty parameter whose meaning and units depend on the Type (see Table 1);
 g (default=0; units: Kelvin): used to parametrize a possible temperature-dependence of the uncertainty.
Type | Distribution | F_{0} meaning | F_{0} unit
-----|--------------|---------------|-----------
norm | Normal | stdev^{*}: Pr(k_{0} - F_{0} ≤ k ≤ k_{0} + F_{0}) ≅ 68 % | same as the rate coefficient
unif | Uniform | half range: Pr(k_{0} - F_{0} ≤ k ≤ k_{0} + F_{0}) = 100 % | same as the rate coefficient
logn | Lognormal | geometric stdev^{*}: Pr(k_{0} ⁄ F_{0} ≤ k ≤ k_{0}*F_{0}) ≅ 68 % | no units
logu | Loguniform | geometric half range: Pr(k_{0} ⁄ F_{0} ≤ k ≤ k_{0}*F_{0}) = 100 % | no units
^{*} stdev = standard deviation
Table 1: The types of distributions and uncertainty factors used in KIDA.
How to choose a Type of distribution and the value of the uncertainty factor when submitting data to KIDA?
When you submit data to KIDA as an experimentalist or an expert, you should provide the information needed to fill in the uncertainty-related fields.
The appropriate uncertainty representation depends on the available knowledge:
a rate coefficient k_{0} and its standard (1σ) uncertainty Δk:
 if you suspect that the relative uncertainty might exceed 20% at some temperature, particularly for low-T extrapolation, then use a Lognormal distribution: Type=logn, F_{0} = exp(Δk ⁄ k_{0}) (≅ 1 + Δk ⁄ k_{0}); this is the preferred method.
 if the relative uncertainty is small (Δk ⁄ k_{0} ≤ 0.2) over the whole temperature range (including a possible temperature extrapolation), you might use a Normal distribution: Type=norm, F_{0} = Δk (but you might as well conform to the preferred Lognormal model, as described above!);

a rate coefficient k_{0} known "up to a multiplicative factor X":
use a Lognormal distribution
Type=logn, F _{0} = X. 
the lower (k_{1}) and upper (k_{2}) limits of the rate coefficient:
 if k_{2} ⁄ k_{1} ≤ 10 (over the whole range of temperature), you can use a Uniform distribution: Type=unif, k_{0} = (k_{2} + k_{1})/2, F_{0} = (k_{2} - k_{1})/2.
 otherwise, you should use a Loguniform distribution: Type=logu, k_{0} = (k_{1} * k_{2})^{1⁄2}, F_{0} = (k_{2} ⁄ k_{1})^{1⁄2}, so that k_{1} = k_{0} ⁄ F_{0} and k_{2} = k_{0}*F_{0}.
For more details on these distributions, see the Appendix. Keep in mind also that "The assignment of these uncertainties is a subjective assessment of the evaluators. They are not determined by a rigorous, statistical analysis of the database, which is generally too limited to permit such an analysis. Rather, the uncertainties are based on a knowledge of the techniques, the difficulties of the experimental measurements, the potential for systematic errors, and the number of studies conducted and their agreement or lack thereof." [IUPAC01]
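The selection rules of this section can be sketched in Python as follows. This is only an illustration: the function names and the returned (Type, k0, F0) tuples are hypothetical and are not part of any KIDA tool. For the Loguniform case, F0 is taken as (k2/k1)^(1/2) so that k1 = k0/F0 and k2 = k0*F0, as stated in the Appendix.

```python
import math

def from_standard_uncertainty(k0, dk, prefer_lognormal=True):
    """k0 with a 1-sigma uncertainty dk -> (Type, k0, F0)."""
    rel = dk / k0
    # Lognormal is the preferred model, and the one to use when
    # the relative uncertainty may exceed 20%.
    if prefer_lognormal or rel > 0.2:
        return ("logn", k0, math.exp(rel))
    return ("norm", k0, dk)

def from_factor(k0, x):
    """k0 known up to a multiplicative factor x -> Lognormal."""
    return ("logn", k0, x)

def from_limits(k1, k2):
    """Lower (k1) and upper (k2) limits -> Uniform or Loguniform."""
    if k2 / k1 <= 10:
        return ("unif", (k1 + k2) / 2, (k2 - k1) / 2)
    return ("logu", math.sqrt(k1 * k2), math.sqrt(k2 / k1))
```

For instance, `from_limits(1e-11, 1e-9)` selects the Loguniform type because the limits span two orders of magnitude.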
Temperature-dependence of the uncertainty factor F(T)
The temperature-dependence of the uncertainty factor is described by the following function
F(T) = F_{0} * exp(g * |1/T - 1/T_{0}|)
where g is a positive parameter (in Kelvin), and T_{0} is a reference temperature (in KIDA the value of T_{0} is fixed at 300 K).
This expression is used to model a monotonic increase of the uncertainty as the temperature moves away from the reference temperature T_{0}. This is commonly the case when rate constants measured at room temperature and above are extrapolated to low T [Hébrard09].
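As a minimal sketch, the expression F(T) = F_{0}*exp(g*|1/T - 1/T_{0}|) can be evaluated as below. The function name is hypothetical; only T_{0} = 300 K comes from the text.

```python
import math

T0 = 300.0  # reference temperature fixed in KIDA (K)

def uncertainty_factor(T, F0, g=0.0, T0=T0):
    """F(T) = F0 * exp(g * |1/T - 1/T0|), with T, T0 and g in Kelvin."""
    return F0 * math.exp(g * abs(1.0 / T - 1.0 / T0))
```

With g = 0 (the default) the uncertainty factor is constant; with g > 0 it grows as T moves away from T_{0} in either direction.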
How to estimate g for submission to KIDA?
The simplest method is to use two estimates of the uncertainty factor at two different temperatures, F_{0} ≡ F(T_{0}) and F_{1} ≡ F(T_{1}). Inversion of the equation for F(T) gives g = ln(F_{1} ⁄ F_{0}) ⁄ |1/T_{1} - 1/T_{0}|.
A more accurate but more complex method is to use the variance/covariance matrix of the parameters of the Kooij/Arrhenius expression, from which standard uncertainty propagation by combination of variances gives the temperature-dependent uncertainty u_{ln k}(T) ≡ ln F(T). This curve can then be fitted by the F_{0}*exp(g*|1/T - 1/T_{0}|) expression to estimate F_{0} and g [Hébrard09, Nagy11]. Please contact the KIDA team if you need help to implement this procedure.
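Both approaches can be illustrated with a short sketch. The two-point estimate inverts F(T) = F_{0}*exp(g*|1/T - 1/T_{0}|) directly. For the curve fit, an unweighted least-squares line through ln F versus |1/T - 1/T_{0}| is used here; that choice is an assumption, since the fitting procedure is not specified above. Function names are hypothetical.

```python
import math

def estimate_g(F0, F1, T1, T0=300.0):
    """Two-point estimate: g = ln(F1/F0) / |1/T1 - 1/T0|."""
    if T1 == T0:
        raise ValueError("T1 must differ from T0")
    return math.log(F1 / F0) / abs(1.0 / T1 - 1.0 / T0)

def fit_F0_g(temps, factors, T0=300.0):
    """Least-squares fit of ln F(T) = ln F0 + g * |1/T - 1/T0|."""
    xs = [abs(1.0 / T - 1.0 / T0) for T in temps]
    ys = [math.log(F) for F in factors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    g = sxy / sxx
    return math.exp(my - g * mx), g
```

The input to `fit_F0_g` would be the curve F(T) = exp(u_{ln k}(T)) sampled at a few temperatures.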
How to deal with branching ratios?
The case of partial rate constants for multi-pathway reactions is intricate because, in order to define a correct uncertainty representation, one has to link the reported values to their experimental origin:
 when measured directly, the partial rate constants can be treated as independent variables and their respective uncertainties can be represented by one of the implemented distributions, as described above;
 when they are obtained by the product of a global reaction rate coefficient k and a set of branching ratios b_{i}, the partial rate constants k_{i} can no longer be considered as independent variables. At the moment, KIDA does not manage correlations between partial rate constants, but we discuss below the best methods to handle the situation.
The accurate method
For an accurate and reliable uncertainty propagation, the best solution is to generate Monte Carlo samples of correlated branching ratios to be used for uncertainty propagation. It is therefore necessary to design an unbiased probability density function that accounts for the available data and for the correlation pattern of branching ratios [Carrasco07a, Plessis10, Pernot11]. Please contact the KIDA team if you wish to implement this procedure.
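To give an idea of what such sampling looks like, the sketch below draws branching ratios from a Dirichlet distribution, which guarantees positive ratios that sum to one and are therefore correlated. The Dirichlet choice, the alpha values and the function names are assumptions made for illustration; the density appropriate to a given data set has to be designed as discussed in the cited references.

```python
import math
import random

def sample_branching_ratios(alpha, rng):
    """One Dirichlet(alpha) draw, built from normalized gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [x / total for x in gammas]

def sample_partial_rates(k0, F, alpha, rng):
    """Lognormal draw of the global rate times one set of branching ratios."""
    k = math.exp(math.log(k0) + math.log(F) * rng.gauss(0.0, 1.0))
    return [k * b for b in sample_branching_ratios(alpha, rng)]

rng = random.Random(1)
alpha = [60.0, 30.0, 10.0]  # hypothetical: mean ratios 0.6 / 0.3 / 0.1
ratio_samples = [sample_branching_ratios(alpha, rng) for _ in range(5000)]
```

Each sample of partial rate constants sums to the sampled global rate, so the correlation between pathways is preserved by construction.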
The "least worse" method
Here we explain how to calculate the uncertainty factor for a lognormal representation of the partial rate constants, using the uncertainties on a global rate constant k and a set of branching ratios b_{i}. The uncertainty on the product k_{i} = k*b_{i} is obtained by standard propagation of variances
(σ(k_{i}) ⁄ k_{i})^{2} = (σ(k) ⁄ k)^{2} + (σ(b_{i}) ⁄ b_{i})^{2}
from which one derives the uncertainty factor F_{i} attached to the partial rate constant k_{i} as F_{i} = exp(σ(k_{i}) / k_{i}).
Again, remain aware that this method ignores the statistical correlations between the partial rate constants.
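A minimal sketch of this calculation follows. It assumes first-order propagation of variances for the product k*b_{i}, with k and b_{i} treated as uncorrelated; the function name is hypothetical.

```python
import math

def partial_rate_uncertainty_factor(k, sigma_k, b, sigma_b):
    """F_i = exp(sigma(k_i)/k_i) for k_i = k*b_i."""
    # Relative variances add for a product of uncorrelated variables.
    rel_var = (sigma_k / k) ** 2 + (sigma_b / b) ** 2
    return math.exp(math.sqrt(rel_var))
```

For example, a 30% relative uncertainty on k combined with a 40% relative uncertainty on b_{i} gives a 50% relative uncertainty on k_{i}, hence F_{i} = exp(0.5).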
Why and how to use rate coefficient uncertainties in models?
Using uncertainty information in chemical modeling is vital, notably for extreme environments targeted by KIDA, where most reaction rate coefficients are poorly known, either estimated or extrapolated. Uncertainty management has two goals:
 Uncertainty Propagation (UP): to estimate the precision of the model outputs; and
 Sensitivity Analysis (SA): to identify key reactions, i.e. those contributing notably to the uncertainty of model outputs and for which better experimental or theoretical estimations are needed [Dobrijevic10].
The simplest way to implement these methods is through Monte Carlo sampling [Thompson91, Dobrijevic98, Wakelam05, Carrasco07, Wakelam10]. Random draws for a rate coefficient at any temperature are generated using the formulae in Table 2.
Distribution | Formula
-------------|--------
Normal | k(T) = k_{0}(T) + F(T)*N(0,1)
Uniform | k(T) = k_{0}(T) + F(T)*(U(0,1) - 0.5)*2
Lognormal | k(T) = exp( ln k_{0}(T) + ln F(T)*N(0,1) )
Loguniform | k(T) = exp( ln k_{0}(T) + ln F(T)*(U(0,1) - 0.5)*2 )
* U(0,1) is a standard uniform random number generator (between 0 and 1)
* N(0,1) is a standard normal/Gaussian random number generator (centered at 0; variance 1)
Table 2: Generating random samples.
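The formulae of Table 2 translate directly into code. The sketch below uses Python's standard `random` module; the function name `draw_rate` and its argument layout are hypothetical, and k0 and F stand for k_{0}(T) and F(T) already evaluated at the temperature of interest.

```python
import math
import random

def draw_rate(dist, k0, F, rng):
    """One random draw of k(T) for the given KIDA distribution Type."""
    if dist == "norm":
        return k0 + F * rng.gauss(0.0, 1.0)
    if dist == "unif":
        return k0 + F * (rng.random() - 0.5) * 2.0
    if dist == "logn":
        return math.exp(math.log(k0) + math.log(F) * rng.gauss(0.0, 1.0))
    if dist == "logu":
        return math.exp(math.log(k0) + math.log(F) * (rng.random() - 0.5) * 2.0)
    raise ValueError(f"unknown distribution type: {dist!r}")

rng = random.Random(0)  # seeded for reproducibility
sample = [draw_rate("logn", 1.0e-10, 2.0, rng) for _ in range(5)]
```

Note that the "norm" branch can return negative values when F is not small compared with k0.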
For UP, the code is run for N random draws of the m rate constants of the chemical scheme { k_{i} ^{(j)}; i = 1, m; j = 1, N} (all parameters vary simultaneously), and the N sets of outputs are stored for statistical analysis (mean value, uncertainty factor, input/output correlation...).
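The UP loop can be sketched as follows. The function `toy_model` is a hypothetical stand-in for the chemical code (here simply the ratio of two rate constants), all rate constants are assumed lognormal, and the output uncertainty factor is summarized as exp of the standard deviation of ln(output), which is one possible convention.

```python
import math
import random
import statistics

def toy_model(k):
    """Hypothetical model output: ratio of a formation to a destruction rate."""
    return k[0] / k[1]

def propagate(k0, F, n_draws, rng):
    """Run the model for n_draws simultaneous lognormal draws of all rates."""
    inputs, outputs = [], []
    for _ in range(n_draws):
        k = [math.exp(math.log(k0i) + math.log(Fi) * rng.gauss(0.0, 1.0))
             for k0i, Fi in zip(k0, F)]
        inputs.append(k)
        outputs.append(toy_model(k))
    return inputs, outputs

rng = random.Random(42)
inputs, outputs = propagate([1.0e-10, 1.0e-9], [2.0, 1.5], 2000, rng)

log_out = [math.log(y) for y in outputs]
median_out = math.exp(statistics.mean(log_out))  # geometric mean of the output
F_out = math.exp(statistics.stdev(log_out))      # output uncertainty factor
```

The stored `inputs` and `outputs` can then be reused for input/output correlation analysis.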
A convenient way to perform SA is to calculate the correlation coefficients between the inputs (k_{i}) and outputs of the model. Large correlation coefficients reveal strong influences of inputs on outputs [Dobrijevic10]. Variations of this method include using rank correlation coefficients, or the logarithm of inputs and/or outputs [Helton06,Saltelli04].
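A minimal sketch of such a correlation-based SA is given below, using the logarithm of inputs and outputs. The toy model (a ratio of two rate constants) and the uncertainty factors are hypothetical; the Pearson coefficient is written out so that the block needs only the standard library.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)
ln_k0 = [math.log(1.0e-10), math.log(1.0e-9)]
ln_F = [math.log(2.0), math.log(1.1)]  # reaction 1 is much more uncertain

ln_k = [[m + s * rng.gauss(0.0, 1.0) for m, s in zip(ln_k0, ln_F)]
        for _ in range(2000)]
ln_y = [k[0] - k[1] for k in ln_k]  # toy model: y = k1 / k2

# One correlation coefficient per input rate constant.
r = [pearson([k[i] for k in ln_k], ln_y) for i in range(2)]
```

Here reaction 1 dominates the output uncertainty, so its coefficient is close to 1 while that of reaction 2 is small and negative.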
Important
In reaction networks, where multiple uncertain rate coefficients are managed simultaneously, care has to be taken that the same random number is used for the whole temperature range of a single reaction.
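The point can be made concrete with a short sketch for a lognormal rate coefficient: a single standard normal number z is drawn per reaction and per Monte Carlo run, then reused at every temperature. The reference rate law `k0_of_T` and the parameter values are hypothetical.

```python
import math
import random

T0 = 300.0  # KIDA reference temperature (K)

def uncertainty_factor(T, F0, g):
    return F0 * math.exp(g * abs(1.0 / T - 1.0 / T0))

def k0_of_T(T):
    """Hypothetical reference rate law, for illustration only."""
    return 1.0e-10 * (T / 300.0) ** 0.5

def sample_rate_curve(temps, F0, g, rng):
    """One Monte Carlo realization of k(T) over the whole temperature grid."""
    z = rng.gauss(0.0, 1.0)  # drawn once, shared by all temperatures
    curve = [math.exp(math.log(k0_of_T(T))
                      + math.log(uncertainty_factor(T, F0, g)) * z)
             for T in temps]
    return curve, z

rng = random.Random(7)
temps = [10.0, 50.0, 100.0, 300.0]
curve, z = sample_rate_curve(temps, F0=2.0, g=20.0, rng=rng)
```

Drawing a new z at each temperature would instead produce a jagged, unphysical k(T) curve within a single run.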
Appendix
Although the lognormal distribution is the preferred uncertainty representation in KIDA, provision has been made for alternative representations. These are briefly presented here.
The Lognormal distribution (default)
The default approach to specify uncertainty for reaction rate coefficients is to use a Lognormal (logn) distribution characterized by a multiplicative uncertainty factor F_{0} defining a "1σ" confidence interval around the reference value k_{0} [JPL06], i.e.
Pr(k_{0} ⁄ F_{0} ≤ k ≤ k_{0} * F_{0}) ≅ 68 %
Pr(k_{0} ⁄ F_{0}^{2} ≤ k ≤ k_{0} * F_{0}^{2}) ≅ 95 %
Pr(k_{0} ⁄ F_{0}^{3} ≤ k ≤ k_{0} * F_{0}^{3}) ≅ 99 %
...
Notes
 In the KIDA data sheets, uncertainty might appear as Δlog k (the decimal logarithm is used), from which F_{0} is readily obtained as F_{0} = 10 ^{Δlog k}.
 When a relative uncertainty Δk ⁄ k_{0} is available, one can get a quick estimate of the uncertainty factor as F_{0} ≅ 1 + Δk ⁄ k_{0}, but it is more accurate to use F_{0} = exp(Δk ⁄ k_{0}), deriving from the relations (valid for not too large relative uncertainty) ln F_{0} = Δ(ln k) = Δk ⁄ k_{0}.
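These conversions are simple enough to write down directly. The helper names below are hypothetical.

```python
import math

def F0_from_dlog10(dlogk):
    """F0 = 10**(delta log10 k), for uncertainties given as decimal logs."""
    return 10.0 ** dlogk

def F0_from_relative(rel):
    """F0 = exp(dk/k0); close to 1 + dk/k0 for small relative uncertainty."""
    return math.exp(rel)

def confidence_interval(k0, F0, n_sigma=1):
    """Lognormal 'n-sigma' interval [k0 / F0**n, k0 * F0**n]."""
    return k0 / F0 ** n_sigma, k0 * F0 ** n_sigma
```

For a relative uncertainty of 0.2 the two estimates 1 + Δk ⁄ k_{0} and exp(Δk ⁄ k_{0}) differ by about 2%, and the gap widens quickly for larger uncertainties.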
The Normal distribution
The uncertainty factor has the meaning of a standard deviation, and the uncertainty model is normal additive, i.e. Pr(k_{0} - F_{0} ≤ k ≤ k_{0} + F_{0}) ≅ 68 % (see Table 1).
The normal distribution has to be used with care for positive variables such as rate constants. For small relative uncertainties (F_{0} ⁄ k_{0} << 0.2), there is generally no problem, but the probability of getting negative values of k with a normal uncertainty distribution is Pr(k ≤ 0) ≅ 0.5*erfc(0.7*k_{0} ⁄ F_{0}). This probability increases with F_{0} ⁄ k_{0} (between 0 and 50%) as shown in Table 3.
F_{0} ⁄ k_{0} | 0.2 | 0.5 | 1 | 2 | 5 | 10
--------------|-----|-----|---|---|---|---
Pr(k≤0) | 2E-7 | 0.02 | 0.16 | 0.31 | 0.42 | 0.46
Table 3: Probability of getting negative values of rate constants when using a normal distribution, as a function of relative uncertainty.
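The probability of drawing a negative rate constant can be evaluated with the standard library. The sketch uses the exact normal tail, 0.5*erfc(k_{0} ⁄ (F_{0}*2^{1⁄2})); the factor 0.7 quoted in the text approximates 1 ⁄ 2^{1⁄2}. The function name is hypothetical.

```python
import math

def prob_negative(rel_unc):
    """Pr(k <= 0) for k ~ Normal(k0, F0), with rel_unc = F0 / k0."""
    return 0.5 * math.erfc(1.0 / (rel_unc * math.sqrt(2.0)))

for rel_unc in (0.2, 0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"F0/k0 = {rel_unc:4.1f}   Pr(k<=0) = {prob_negative(rel_unc):.2g}")
```

The probability tends to 50% as F_{0} ⁄ k_{0} grows, which is why the Normal model is only suitable for small relative uncertainties.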
The Uniform distribution
This representation is used when the rate coefficient is defined by extreme values, k_{1} < k_{2}, with no recommended value in the interval. It can be used when the interval spans less than one order of magnitude (k_{2} ⁄ k_{1} ≤ 10). The limits are related to the KIDA parameters by k_{1} = k_{0} - F_{0} and k_{2} = k_{0} + F_{0}.
The Loguniform distribution
This representation is used when the rate coefficient is defined by extreme values, k_{1} < k_{2}, with no recommended value in the interval. The loguniform distribution is to be preferred to the uniform distribution when the interval spans more than one order of magnitude (k_{2} ⁄ k_{1} ≥ 10). It avoids overweighting the larger values. The reference value is the geometric mean of the limits, k_{0} = (k_{1} * k_{2})^{1⁄2}, and the uncertainty factor is designed to cover the whole interval: F_{0} = (k_{2} ⁄ k_{1})^{1⁄2}. The limits are recovered from the KIDA parameters by k_{1} = k_{0} ⁄ F_{0} and k_{2} = k_{0}*F_{0}.
Bibliographic References
 [Carrasco07] Carrasco, N. et al. (2007) Planet. Space Sci. 55:141–157. doi:10.1016/j.pss.2006.06.004
 [Carrasco07a] Carrasco, N. & Pernot, P. (2007) J. Phys. Chem. A 111:3507–3512. doi:10.1021/jp067306y
 [Carrasco08] Carrasco, N. et al. (2008) Planet. Space Sci. 56:1644–1657. doi:10.1016/j.pss.2008.04.007
 [Dobrijevic98] Dobrijevic, M. & Parisot, J. (1998) Planet. Space Sci. 46:491–505.
 [Dobrijevic10] Dobrijevic, M. et al. (2010) Adv. Space Res. 45:77–91. doi:10.1016/j.asr.2009.06.005
 [Hébrard06] Hébrard, E. et al. (2006) J. Photochem. Photobiol. A 7:211–230.
 [Hébrard09] Hébrard, E. et al. (2009) J. Phys. Chem. A 113:11227–11237. doi:10.1021/jp905524e
 [Helton06] Helton, J.C. et al. (2006) Rel. Eng. Sys. Safety 91:1175–1209. doi:10.1016/j.ress.2005.11.017
 [IUPAC01] IUPAC: Subcommittee for Gas Kinetic Data Evaluation (2001) Guide to the datasheets.
 [JPL06] Sander, S.P. et al. (2006) JPL Publication 06-2.
 [Nagy11] Nagy, T. & Turányi, T. (2011) Int. J. Chem. Kin. 43:359–378. doi:10.1002/kin.20551
 [Pernot11] Pernot, P. et al. (2011) J. Phys.: Conf. Ser. 300:012027. doi:10.1088/1742-6596/300/1/012027
 [Plessis10] Plessis, S. et al. (2010) J. Chem. Phys. 133:134411. doi:10.1063/1.3479907
 [Saltelli04] Saltelli, A. et al. (2004) Chem. Rev. 105:2811–2828. doi:10.1021/cr040659d
 [Thompson91] Thompson, A. & Stewart, R. (1991) J. Geophys. Res. 96:13089–13108; Stewart, R. & Thompson, A. (1996) J. Geophys. Res. 101:20935–20964.
 [Wakelam05] Wakelam, V. et al. (2005) A&A 444:883–891.
 [Wakelam10] Wakelam, V. et al. (2010) Space Sci. Rev. 156:13–72. doi:10.1007/s11214-010-9712-5