NUSAP: Numeral Unit Spread Assessment Pedigree

 

NUSAP is a notational system proposed by Funtowicz and Ravetz (1990), which aims to provide an analysis and diagnosis of uncertainty in science for policy. It captures both quantitative and qualitative dimensions of uncertainty and enables one to display these in a standardized and self-explanatory way. It promotes criticism by clients and users of all sorts, expert and lay, and thereby supports extended peer review processes.
The basic idea is to qualify quantities using the five qualifiers of the NUSAP acronym: Numeral, Unit, Spread, Assessment, and Pedigree. By adding expert judgment of reliability (Assessment) and systematic multi-criteria evaluation of the production process of numbers (Pedigree), NUSAP has extended the statistical approach to uncertainty (inexactness) with the methodological (unreliability) and epistemological (ignorance) dimensions. By providing a separate qualification for each dimension of uncertainty, it enables flexibility in their expression. By means of NUSAP, nuances of meaning about quantities can be conveyed concisely and clearly, to a degree that is quite impossible with statistical methods only.
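As a minimal illustration of the notation, a NUSAP expression can be represented as a record with one field per qualifier. The sketch below is hypothetical: the structure, field types, and example values are ours for illustration, not prescribed by NUSAP.

```python
from dataclasses import dataclass

@dataclass
class NusapExpression:
    """One NUSAP-qualified quantity (illustrative structure, not prescribed by NUSAP)."""
    numeral: str        # e.g. "1.3", or a general quantity such as "a million"
    unit: str           # may carry extra information, e.g. "EUR (2005)"
    spread: str         # e.g. "+/- 30%" or "factor of 2"
    assessment: str     # e.g. "optimistic", "pessimistic", or a significance level
    pedigree: tuple     # one score per pedigree criterion, 0 (weak) to 4 (strong)

# Hypothetical example: a NUSAP-qualified emission estimate
voc_emissions = NusapExpression("1.3", "Mton/yr", "+/- 30%", "pessimistic", (3, 2, 3, 1))
```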
 
We will discuss the five qualifiers. The first is Numeral; this will usually be an ordinary number, but when appropriate it can be a more general quantity, such as the expression "a million" (which is not the same as the number lying between 999,999 and 1,000,001). Second comes Unit, which may be of the conventional sort, but which may also contain extra information, such as the date at which the unit is evaluated (most commonly with money). The middle category is Spread, which generalizes from the "random error" of experiments or the "variance" of statistics. Although Spread is usually conveyed by a number (either ±, % or "factor of"), it is not an ordinary quantity, for its own inexactness is not of the same sort as that of measurements. Methods to address Spread can be statistical data analysis, sensitivity analysis or Monte Carlo analysis, possibly in combination with expert elicitation.
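As a sketch of how the Spread qualifier might be quantified with Monte Carlo analysis: the model and the input distributions below are invented for illustration; in practice they would come from data analysis or expert elicitation.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

# Hypothetical model: emission = activity * emission_factor, with assumed
# input distributions (illustrative values only).
activity = rng.normal(loc=500.0, scale=25.0, size=N)                  # kton/yr
emission_factor = rng.lognormal(mean=np.log(2.0), sigma=0.3, size=N)  # kg/ton

emission = activity * emission_factor  # ton/yr

mean = emission.mean()
half_width = 1.96 * emission.std()  # approximate 95% interval half-width
print(f"Spread: {mean:.0f} +/- {half_width:.0f} ton/yr "
      f"({100 * half_width / mean:.0f}%)")
```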
 
The remaining two qualifiers constitute the more qualitative side of the NUSAP expression. Assessment expresses qualitative judgments about the information. In the case of statistical tests, this might be the significance level; in the case of numerical estimates for policy purposes, it might be the qualifier "optimistic" or "pessimistic". In some experimental fields, information is given with two ± terms, of which the first is the spread, or random error, and the second is the "systematic error", which must be estimated on the basis of the history of the measurement, and which corresponds to our Assessment. It might be thought that the "systematic error" must always be less than the "experimental error", or else the stated "error bar" would be meaningless or misleading. But the "systematic error" can be well estimated only in retrospect, and then it can give surprises.
 
Finally there is P for Pedigree, which conveys an evaluative account of the production process of information and indicates different aspects of the underpinning of the numbers and the scientific status of the knowledge used. Pedigree is expressed by means of a set of pedigree criteria to assess these different aspects. Assessment of pedigree involves qualitative expert judgment. To minimize arbitrariness and subjectivity in measuring strength, a pedigree matrix is used to code qualitative expert judgments for each criterion into a discrete numeral scale from 0 (weak) to 4 (strong), with linguistic descriptions (modes) of each level on the scale. Each special sort of information has its own aspects that are key to its pedigree, so different pedigree matrices using different pedigree criteria can be used to qualify different sorts of information. Table 1 gives an example of a pedigree matrix for emission monitoring data. An overview of pedigree matrices found in the literature is given in the pedigree matrices section of http://www.nusap.net. Risbey et al. (2001) document a method to draft pedigree scores by means of expert elicitation. Examples of questionnaires used for eliciting pedigree scores can be found at http://www.nusap.net.
 

 

Table 1 Pedigree matrix for emission monitoring data (Risbey et al., 2001; adapted from Ellis et al., 2000a, 2000b).
 
| Score | Proxy representation | Empirical basis | Methodological rigour | Validation |
|-------|----------------------|-----------------|------------------------|------------|
| 4 | An exact measure of the desired quantity | Controlled experiments and large sample direct measurements | Best available practice in well-established discipline | Compared with independent measurements of the same variable over long domain |
| 3 | Good fit or measure | Historical/field data; uncontrolled experiments; small sample direct measurements | Reliable method common within established discipline; best available practice in immature discipline | Compared with independent measurements of closely related variable over shorter period |
| 2 | Well correlated but not measuring the same thing | Modelled/derived data; indirect measurements | Acceptable method but limited consensus on reliability | Measurements not independent; proxy variable; limited domain |
| 1 | Weak correlation but commonalities in measure | Educated guesses; indirect approximations; rule-of-thumb estimates | Preliminary methods, unknown reliability | Weak and very indirect validation |
| 0 | Not correlated and not clearly related | Crude speculation | No discernible rigour | No validation performed |
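For use in scoring tools, a pedigree matrix such as Table 1 can be encoded as a simple lookup structure. A minimal sketch in Python, with mode descriptions abbreviated from the table:

```python
# Pedigree matrix for emission monitoring data, abbreviated from Table 1.
# The index into each list is the score: 0 (weak) ... 4 (strong).
PEDIGREE_MATRIX = {
    "proxy representation": [
        "not correlated and not clearly related",
        "weak correlation but commonalities in measure",
        "well correlated but not measuring the same thing",
        "good fit or measure",
        "exact measure of the desired quantity",
    ],
    "empirical basis": [
        "crude speculation",
        "educated guesses; rule-of-thumb estimates",
        "modelled/derived data; indirect measurements",
        "historical/field data; small sample direct measurements",
        "controlled experiments; large sample direct measurements",
    ],
    "methodological rigour": [
        "no discernible rigour",
        "preliminary methods, unknown reliability",
        "acceptable method, limited consensus on reliability",
        "reliable method common within established discipline",
        "best available practice in well-established discipline",
    ],
    "validation": [
        "no validation performed",
        "weak and very indirect validation",
        "measurements not independent; limited domain",
        "compared with independent measurements of closely related variable",
        "compared with independent measurements of the same variable",
    ],
}

def describe(criterion: str, score: int) -> str:
    """Return the linguistic mode for a criterion at a given score."""
    return PEDIGREE_MATRIX[criterion][score]

print(describe("empirical basis", 2))  # -> modelled/derived data; indirect measurements
```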
 
We will briefly elaborate on the four criteria in this example pedigree matrix.
 
Proxy representation
Sometimes it is not possible to measure directly the thing we are interested in, or to represent it by a parameter, so some form of proxy measure is used. Proxy representation refers to how good or close the quantity that we measure or model is to the actual quantity we seek to represent. Think of first-order approximations, oversimplifications, idealizations, gaps in aggregation levels, differences in definitions, non-representativeness, and incompleteness issues.
 
Empirical basis
Empirical basis typically refers to the degree to which direct observations, measurements and statistics are used to estimate the parameter. Sometimes directly observed data are not available and the parameter or variable is estimated based on partial measurements or calculated from other quantities. Parameters or variables determined by such indirect methods have a weaker empirical basis and will generally score lower than those based on direct observations.
 
Methodological rigour
Some method will be used to collect, check, and revise the data used for making parameter or variable estimates. Methodological quality refers to the norms for methodological rigour in this process applied by peers in the relevant disciplines. Well-established and respected methods for measuring and processing the data would score high on this metric, while untested or unreliable methods would tend to score lower.
 
Validation
This metric refers to the degree to which one has been able to cross-check the data and assumptions used to produce the numeral of the parameter against independent sources. In many cases, independent data for the same parameter over the same time period are not available and other data sets must be used for validation. This may require a compromise in the length or overlap of the data sets, or may require use of a related, but different, proxy variable for indirect validation, or perhaps use of data that has been aggregated on different scales. The more indirect or incomplete the validation, the lower it will score on this metric.
 
 
Visualizing pedigree scores
In general, pedigree scores will be established using expert judgements from more than one expert. Two ways of visualizing the results of a pedigree analysis are discussed here: radar diagrams and kite diagrams (Risbey et al., 2001; Van der Sluijs et al., 2001a). An example of both representations is given in Figure 1.
 

 

Figure 1 Example of representations of the same results by radar diagram and kite diagram (Van der Sluijs et al., 2001a)

Both representations use polygons with one axis for each criterion, having 0 in the center of the polygon and 4 on each corner point of the polygon. In the radar diagrams a colored line connecting the scores represents the scoring of each expert, whereas a black line represents the average scores.

The kite diagrams follow a traffic light analogy. The minimum scores in each group for each pedigree criterion span the green kite; the maximum scores span the amber kite. The remaining area is red. The width of the amber band represents expert disagreement on the pedigree scores. In some cases the size of the green area is strongly influenced by a single deviating low score given by one of the experts. In those cases a light green kite shows what the green kite would look like if that outlier were omitted. Note that the algorithm for calculating the light green kite evaluates outliers per pedigree criterion, so the outliers defining the light green area need not all come from the same expert.
A web-tool to produce kite diagrams is available from http://www.nusap.net.
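A minimal sketch of the kite computation and a matplotlib rendering, following the traffic-light rules described above (the expert scores are invented, and the web-tool mentioned implements its own version):

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative pedigree scores (rows: experts, columns: criteria), not real data.
criteria = ["proxy", "empirical basis", "method", "validation"]
scores = np.array([
    [3, 2, 3, 2],
    [2, 2, 3, 1],
    [3, 1, 2, 2],
    [0, 2, 3, 2],   # contains one deviating low score on "proxy"
])

green = scores.min(axis=0)                 # spans the green kite
amber = scores.max(axis=0)                 # spans the amber kite
light_green = np.sort(scores, axis=0)[1]   # per-criterion minimum with the outlier dropped

angles = np.linspace(0, 2 * np.pi, len(criteria), endpoint=False)

def close(values):
    """Repeat the first point so the polygon closes."""
    return np.append(values, values[0])

ax = plt.subplot(polar=True)
ax.fill(close(angles), close(np.full(len(criteria), 4)), color="red", alpha=0.3)
ax.fill(close(angles), close(amber), color="orange", alpha=0.5)
ax.fill(close(angles), close(light_green), color="palegreen", alpha=0.7)
ax.fill(close(angles), close(green), color="green", alpha=0.8)
ax.set_xticks(angles)
ax.set_xticklabels(criteria)
ax.set_ylim(0, 4)
plt.show()
```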
 
The kite diagrams can be interpreted as follows: the green colored area reflects the (apparent minimal consensus) strength of the underpinning of each parameter. The greener the diagram, the stronger the underpinning. The amber colored zone shows the range of expert disagreement on that underpinning. The remaining area is red: the more red you see, the weaker the underpinning (all according to the assessment by the group of experts represented in the diagram).
A kite diagram captures the information from all experts in the group without the need to average expert opinion. Averaging expert opinion is a controversial issue in elicitation methodologies. A second advantage is that it provides a fast and intuitive overview of parameter strength, preserving key aspects of the underlying information.
 
Propagation of pedigree in calculations
Ellis et al. (2000b) have developed a pedigree calculator to assess the propagation of pedigree in a calculation, in order to establish pedigree scores for quantities calculated from other quantities. For more information we refer to http://www.esapubs.org/archive/appl/A010/006/default.htm
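As a rough sketch of the idea, one could propagate pedigree with a weakest-link rule. This rule is ours for illustration only and is not the actual scheme of the Ellis et al. calculator:

```python
def propagate_pedigree(pedigrees):
    """Combine input pedigrees criterion by criterion, taking the minimum:
    a derived quantity is assumed to be no stronger than its weakest input.
    Illustrative rule only; see the Ellis et al. calculator for their scheme."""
    return tuple(min(scores) for scores in zip(*pedigrees))

# Quantity C is calculated from quantities A and B:
pedigree_a = (3, 2, 3, 2)
pedigree_b = (4, 3, 2, 1)
print(propagate_pedigree([pedigree_a, pedigree_b]))  # -> (3, 2, 2, 1)
```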

 

Diagnostic Diagram
The method chosen to address the Spread qualifier (typically sensitivity analysis or Monte Carlo analysis) provides for each input quantity a quantitative metric for uncertainty contribution (or sensitivity), for instance the relative contribution to the variance in a given model output. The pedigree scores can be aggregated (by dividing the sum of the scores on the pedigree criteria by the sum of the maximum attainable scores) to produce a metric for parameter strength. These two independent metrics can be combined in a NUSAP Diagnostic Diagram. The Diagnostic Diagram is based on the notion that neither spread alone nor strength alone is a sufficient measure of quality. Model output can be robust even if the strength of a given parameter is low, provided that the model outcome is not critically influenced by the spread in that parameter; in that situation our ignorance of the true value of the parameter has no immediate consequences, because it has a negligible effect on calculated model outputs. Conversely, model output can be robust against the spread in a parameter even if that parameter's relative contribution to the total spread in the model output is high, provided that the parameter strength is also high. In the latter case, the uncertainty in the model outcome adequately reflects the inherent, irreducible uncertainty in the system represented by the model; in other words, the uncertainty is then a property of the modelled system and does not stem from imperfect knowledge of that system. Mapping model parameters in the diagnostic diagram thus reveals the weakest critical links in the knowledge base of the model with respect to the model outcome assessed, and helps in setting priorities for model improvement.
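To make the aggregation concrete, here is a minimal sketch of the strength metric and the diagnostic diagram. The parameter names, pedigree scores, and variance contributions are invented for illustration:

```python
import matplotlib.pyplot as plt

MAX_SCORE = 4  # maximum attainable score per pedigree criterion

def strength(pedigree):
    """Aggregate pedigree scores into a 0-1 strength metric:
    sum of scores divided by the maximum attainable sum."""
    return sum(pedigree) / (MAX_SCORE * len(pedigree))

# Invented example: parameter -> (pedigree scores, relative contribution to
# output variance, e.g. from a Monte Carlo / sensitivity analysis).
params = {
    "emission factor": ((1, 1, 2, 1), 0.55),
    "activity data":   ((4, 3, 4, 3), 0.30),
    "oxidation rate":  ((2, 2, 3, 2), 0.05),
}

# Parameters plotting at low strength and high variance contribution are the
# weak critical links in the knowledge base (here, "emission factor").
for name, (pedigree, sensitivity) in params.items():
    s = strength(pedigree)
    plt.scatter(s, sensitivity)
    plt.annotate(name, (s, sensitivity))
plt.xlabel("parameter strength (aggregated pedigree)")
plt.ylabel("relative contribution to output variance")
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.show()
```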
 
 
Sorts and locations of uncertainty addressed by NUSAP

The different qualifiers in the NUSAP system address different sorts of uncertainty. The Spread qualifier addresses statistical uncertainty (inexactness) in quantities, typically in input data and parameters. The Assessment qualifier typically addresses unreliability. The Pedigree qualifier further qualifies the knowledge base in such a way that it explores the border with ignorance, by providing detailed insight into specific weaknesses in the knowledge base that underpins a given quantity.
 
Most of the pedigree assessments in the literature so far have addressed uncertainties located in "inputs" and "parameters", thereby focusing on the internal strength of the knowledge base. More recently, Corral (2000) in his Ph.D. thesis extended the pedigree scheme to also address uncertainties located in the "socio-political context", focusing on the external strength of the knowledge base (its relations with the world outside the science). Criteria that Corral used to assess the pedigree of the processes of knowledge utilization and the institutional context of the analysts were, inter alia: accessibility, terminology, completeness, source of information, verification, colleague consensus, extended peer acceptance, legitimation, experience, and flexibility. NUSAP can also be used to assess issues of value-ladenness (see the entry “A method for critical review of assumptions in model-based assessments” in this tool catalogue).
 
What resources are required to use the tool?
Resources required for assessing the Spread qualifier depend on the method chosen (some form of sensitivity analysis or Monte Carlo analysis, usually in combination with expert elicitation, will be needed).
 
For the assessment of Pedigree, many resources (pedigree matrices, pedigree calculator, kite diagram maker, elicitation protocol and questionnaires) are freely available from http://www.nusap.net. Basic interviewing skills and awareness of motivational bias that may occur in any expert elicitation are required. See also the section on expert elicitation in this toolbox.
 
If one uses an expert workshop, basic skills for facilitating structured group discussions are needed. In addition, skills are needed to arrive at a balanced composition of the workshop audience to minimize biases.
 
Time required per expert elicitation in a one to one interview depends on the number of parameters and the complexity of the case. It may typically vary between 1 and 5 hours. A substantial amount of time may be needed for a good preparation of the interviews.
 
The recommended length for a NUSAP expert elicitation workshop is between one and one and a half days.
 
Strengths and limitations
 
Typical strengths of NUSAP are:
  • NUSAP identifies the different sorts of uncertainty in quantitative information and enables them to be displayed in a standardized and self-explanatory way. Providers and users of quantitative information then have a clear and transparent assessment of its uncertainties.
  • NUSAP fosters an enhanced appreciation of the issue of quality in information. It thereby enables a more effective criticism of quantitative information by providers, clients, and also users of all sorts, expert and lay.
  • NUSAP provides a useful means to focus research efforts on the potentially most problematic parameters by identifying those that are critical for the quality of the information.
  • It is flexible in its use and can be used at different levels of comprehensiveness: from a 'back of the envelope' sketch based on self-elicitation to a comprehensive and sophisticated procedure involving structured, informed, in-depth group discussions in a parameter-by-parameter format, covering each pedigree criterion, combined with a full-blown Monte Carlo assessment.
  • The diagnostic diagram provides a convenient way in which to view each of the key parameters in terms of two crucial attributes. One is their relative contribution to the sensitivity of the output, and the other is their strength. When viewed in combination on the diagram, they provide indications of which parameters are the most critical for the quality of the result. 
Typical weaknesses of NUSAP are:
  • The method is relatively new, with a limited (but growing) number of practitioners. There is as yet no system of quality assurance in its applications, nor settled guidelines for good practice.
  • The scoring of pedigree criteria is to a certain degree subjective. Subjectivity can partly be remedied by the design of unambiguous pedigree matrices and by involving multiple experts in the scoring. The choice of experts to do the scoring is also a potential source of bias.
  • The method is applicable only to simple calculations with small numbers of parameters. But it may be questioned whether complicated calculations with many parameters are capable of effective uncertainty analysis by any means. 
 
Guidance on the application
  • For guidance on the application of NUSAP we refer to http://www.nusap.net
  • When eliciting pedigree scores, always ask for motivation for the score given and document the motivation along with the pedigree scores.
  • Expert disagreement on pedigree scores for a parameter can be an indication of epistemic uncertainty about that parameter. Find out whether there are different paradigms or competing schools of thought on that parameter.

Pitfalls

Typical pitfalls of NUSAP are:
  • Misinterpreting low pedigree scores as indicating low-quality science. In relation to whole disciplines, this amounts to 'physics-envy'. Quality in science depends not on removing uncertainty but on managing it.
  • Misinterpreting pedigree scores as an evaluation of individual items of information, with low scores indicating bad research. The pedigree analysis is of the characteristic limits of knowledge of areas of inquiry. The quality of individual items of information depends crucially on the craftsmanship of the work, requiring a closer analysis, which the pedigree does not undertake.
  • Motivational bias towards high pedigrees in (self-)elicitation, especially in the case of numbers where one or one's institute was involved in the knowledge production. This pitfall is avoided by the use of trained facilitators in an open process for the construction and assignment of pedigrees.
  • Falsely thinking that pedigree and spread are correlated: In principle these are independent dimensions.
References
Key references:
Funtowicz, S.O. and Ravetz, J.R., 1990. Uncertainty and Quality in Science for Policy. Dordrecht: Kluwer.
 
J.P. van der Sluijs, M. Craye, S. Funtowicz, P. Kloprogge, J. Ravetz, and J. Risbey (2005), Combining Quantitative and Qualitative Measures of Uncertainty in Model based Environmental Assessment: the NUSAP System, Risk Analysis, 25 (2). p. 481-492
 
Websites:
http://www.nusap.net — a website devoted to the further development and dissemination of the NUSAP method, with direct access to tutorials, tools, papers and the like.
 
Papers with a focus on methodological aspects
R. Costanza, S.O. Funtowicz and J.R. Ravetz, Assessing and communicating data quality in policy-relevant research. Environmental Management, 16, 1992, pp. 121-131.
 
Corral Quintana, Serafín A. 2000. Una metodología integrada de exploración y comprensión de los procesos de elaboración de políticas públicas [An integrated methodology for exploring and understanding public policy-making processes]. Ph.D. thesis, University of La Laguna.
 
J.S. Risbey, J.P. van der Sluijs and J. Ravetz, 2001. Protocol for Assessment of Uncertainty and Strength of Emission Data, Department of Science Technology and Society, Utrecht University, report nr. E-2001-10, 22 pp.
 
J.P. van der Sluijs, Tuning NUSAP for its use in Integrated Model Quality Assurance the Case of Climate Change, Report in commission of European Commission Directorate General CCR, Joint Research Centre, Institute for Systems, Informatics and Safety, Ispra, Italy (contract no. 13970 – 1998 – 05 F1EI ISP NL), Department of Science Technology and Society, Utrecht University, Utrecht, March 1999. 36 pp.
 
J.P. van der Sluijs, M. Craye, S. Funtowicz, P. Kloprogge, J. Ravetz, and J. Risbey (2005), Experiences with the NUSAP system for multidimensional uncertainty assessment in Model based Foresight Studies, Water science and technology, 52 (6), 133–144.
 
Case studies:
Erle C. Ellis, Rong Gang Li, Lin Zhang Yang and Xu Cheng. 2000a. Long-term Change in Village-Scale Ecosystems in China Using Landscape and Statistical Methods. Ecological Applications 10:1057-1073.
 
Erle C. Ellis, Rong Gang Li, Lin Zhang Yang and Xu Cheng. 2000b. Long-term Change in Village-Scale Ecosystems in China Using Landscape and Statistical Methods. Ecological Applications 10:1057-1073. Supplement 1: Data Quality Pedigree Calculator. Ecological Archives A010-006-S1. (http://www.esapubs.org/archive/appl/A010/006/default.htm)
 
R. van Gijlswijk, P. Coenen, T. Pulles and J.P. van der Sluijs, Uncertainty assessment of NOx, SO2 and NH3 emissions in the Netherlands, 2004. TNO and Copernicus Institute Research Report (available from www.nusap.net).
 
ORNL and RFF. 1994. Estimating Fuel Cycle Externalities: Analytical Methods and Issues, Report 2, prepared by Oak Ridge National Laboratory and Resources for the Future for the U.S. Department of Energy.
 
M. Hongisto, 1997. Assessment of External Costs of Power Production, A Commensurable Approach? Paper presented at Total Cost Assessment – Recent Developments and Industrial Applications, Invitational Expert Seminar, Nauvo, Finland, June 15-17.
 
J.P. van der Sluijs, J.S. Risbey and J. Ravetz, 2001, Uncertainty Assessment of VOC emissions from Paint in the Netherlands (available from www.nusap.net).
 
Jeroen P. van der Sluijs, Jose Potting, James Risbey, Detlef van Vuuren, Bert de Vries, Arthur Beusen, Peter Heuberger, Serafin Corral Quintana, Silvio Funtowicz, Penny Kloprogge, David Nuijten, Arthur Petersen, Jerry Ravetz. Uncertainty assessment of the IMAGE/TIMER B1 CO2 emissions scenario, using the NUSAP method. Dutch National Research Program on Climate Change, Bilthoven, 2002, 225 pp.
 
M. Craye, J.P. van der Sluijs and S. Funtowicz (2005), A reflexive approach to dealing with uncertainties in environmental health risk science and policy, International Journal for Risk Assessment and Management, 5 (2), p. 216-236 [Case: Health risks of waste incinerator]
 
 
M. Craye, E. Laes, J. van der Sluijs (2009). Re-negotiating the Role of External Cost Calculations in the Belgian Nuclear and Sustainable Energy Debate. In: A. Pereira Guimaraes and S. Funtowicz. Science for Policy, Oxford University Press, pp 272-290.
 
I. Boone, Y. Van der Stede, J. Dewulf, W. Messens, M. Aerts, G. Daube, K. Mintiens (2010) NUSAP: a method to evaluate the quality of assumptions in quantitative microbial risk assessment, Journal of Risk Research, 13 (3), 337-352.
 

Ciroth, A (2009) Cost data quality considerations for eco-efficiency measures, Ecological Economics 68(6) 1583-1590.

 

Weidema, B.P., 1998. Multi-user test of the data quality matrix for product life cycle inventory data. Int. J. LCA 3 (5), 259–265.

Weidema, B.P., Wesnæs, M.S., 1996. Data quality management for life cycle inventories — an example of using data quality indicators. J. Clean. Prod. 4, 167–174.

Frischknecht, R., et al., 2005. The ecoinvent database: overview and methodological framework. Int. J. LCA 10 (1), 3–9.

 

Other references:
J. van der Sluijs, P. Kloprogge, J. Risbey, and J. Ravetz, Towards a Synthesis of Qualitative and Quantitative Uncertainty Assessment: Applications of the Numeral, Unit, Spread, Assessment, Pedigree (NUSAP) System. in: International Workshop on Uncertainty, Sensitivity and Parameter Estimation for Multimedia Environmental Modeling, (proceedings) Interagency Steering Committee on Multi Media Environmental Modeling, August 19-21 2003, Rockville MD, USA. p.81-86. (Available from www.nusap.net)
 
J.P. van der Sluijs, Anchoring Amid Uncertainty, on the Management of Uncertainties in Risk Assessment of Anthropogenic Climate Change, PhD thesis, Utrecht University, 1997.