
What is Modeling? Is Climate Modeling Valid?

Revised draft version, August 27, 2011




What is modeling, anyhow? Good question. Generally, in the physical sciences and elsewhere, "modeling" is a term with a specific meaning. Here is a simple definition.

Modeling is a procedure for numerical fitting and interpolation of existing sets of observational data by means of continuous functions having a collection of adjustable parameters. 

Models cannot predict anything in a causal sense.

The central aim of modeling is to provide a simplified analytic function or set of functions that match discrete data points and interpolate between them.  Models, therefore, do not predict anything in a causal sense.  Models simply generate sets of numbers that may be compared to sets of observations.

 In this discussion, we view data as a collection of discrete points embedded in an abstract continuum parameter space. Independent variables might include time, physical location, incident solar radiation flux, etc. Dependent variables are variables that can be identified with data. Examples of dependent variables are local temperatures, or the non-thermodynamic quantity "global average temperature" we hear about.  


A model is simply a function that maps independent variables to sets of numbers that may be compared to sets of observations, i.e. data sets.  


The model generates output. Model output consists of sets of values of the dependent variables. We say the observational data is "modeled" by the sets of numbers the model generates.


Models of physical systems need not contain any physics.
Instead they contain hidden variables and adjustable parameters.

Besides independent variables, models contain a set of hidden variables. Hidden variables are usually of two kinds: fixed parameters and adjustable parameters. They are used to formulate the functions that generate the output variables of the model.


Fixed parameters  come from underlying laws of physics or other solidly trusted sources. Their values are taken as given. 


Adjustable parameters are hidden variables whose values can be specified  arbitrarily.   For a given set of specified values of the adjustable parameters, a specific model is obtained. Different models are easily obtained by changing the values of the adjustable parameter set.   
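
To make the distinction concrete, here is a minimal Python sketch of my own; the Stefan-Boltzmann example, the function name, and the parameter values are purely illustrative and not taken from any particular climate model. The constant SIGMA plays the role of a fixed parameter from trusted physics, while the emissivity is an adjustable parameter the modeler is free to tune.

```python
# Minimal sketch: fixed vs. adjustable parameters in a toy model.

SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W m^-2 K^-4 (fixed parameter)

def radiated_flux(T, emissivity):
    """Model output: thermal flux radiated by a surface at temperature T (K).

    T          -- independent variable
    emissivity -- adjustable parameter; each choice gives a different model
    """
    return emissivity * SIGMA * T**4

# Two different models of the same system, differing only in the value of
# the adjustable parameter.
print(radiated_flux(288.0, emissivity=1.00))   # ideal blackbody
print(radiated_flux(288.0, emissivity=0.95))   # slightly gray surface
```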


Notice there is no requirement that models obey the laws of physics. Rather, models are sets of functions that generate numbers that may be compared to observational data sets.


Modelers try to optimize their models by judicious choice of the set of adjustable parameters, by removing unnecessary adjustable parameters, etc. 
How do we know when the model is optimized?  One way is to validate it by comparison to a data set.


What is a validated model?

To validate a model the modeler first needs a data set of observations to model. This data set is necessarily a pre-existing set of observational data. This data set is sometimes called the base data set.  


Here's how the validation process goes....


To validate the model, the modeler goes through a tweaking process where various values of the adjustable parameters are tested, and model outputs are compared to the base data set. The comparison is usually made quantitative by some "goodness of fit" measure. Goodness of fit is a number or set of numbers that measures how well the model output emulates the actual observed data. For example, the sum of squared differences between model variables and the base data set could be a goodness-of-fit parameter. The smaller, the better. This fitting procedure is usually done numerically, but can be done by eye in simple cases.
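
For concreteness, here is a sketch of that fitting loop in Python. The base data set is an illustrative toy set of numbers of my own invention, and the model is a simple two-parameter straight line; the goodness of fit is the sum of squared residuals, minimized over the adjustable parameters.

```python
import numpy as np
from scipy.optimize import minimize

# Toy base data set (illustrative values, not real measurements).
t_obs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # independent variable
y_obs = np.array([1.1, 2.9, 5.2, 6.8, 9.1])   # dependent variable (observed)

def model(t, a, b):
    """Two adjustable parameters: offset a and slope b."""
    return a + b * t

def goodness_of_fit(params):
    """Sum of squared differences between model output and the base data."""
    a, b = params
    return np.sum((model(t_obs, a, b) - y_obs) ** 2)

# "Tweaking": search the adjustable-parameter space for the smallest misfit.
result = minimize(goodness_of_fit, x0=[0.0, 1.0])
print("best-fit (a, b):", result.x)
print("residual sum of squares:", result.fun)
```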

So far, we have model output that is restricted to be "close to" existing data, because that data is what we are trying to fit. Such models are very useful for data analysis. It is nice to have continuous curves that fit discrete data points. If nothing else, it helps us visually examine data sets, spot trends, and gain intuition about the data. All great stuff.


Notice that the fit is carried out over a restricted range of the independent variables, namely the range spanned by the base data set. That's where the existing data is.

If you are given values of the population of California for each census year, you will have a time-dependent data set. However, you will have no data for the year 2040, so it is not possible to fit the model to the year 2040, and hence not possible to validate the model for that year. To make progress, we would fit the population model, say a straight line, to the existing data, with the range of the time variable restricted to the existing data.
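
Here is that census example as a Python sketch. The population figures are rounded, approximate values used purely for illustration; the point is that the straight-line model can only be checked over the years where data exist, while the 2040 value is an unvalidated extrapolation.

```python
import numpy as np

# Approximate California census populations (millions), for illustration only.
years = np.array([1980.0, 1990.0, 2000.0, 2010.0])
population = np.array([23.7, 29.8, 33.9, 37.3])

# Fit a straight line over the range of the existing data.
slope, intercept = np.polyfit(years, population, deg=1)

# Interpolation inside the validated range can be compared to observations.
print("model for 1995:", slope * 1995 + intercept)

# Extrapolation to 2040 lies outside the validated range; there is no data
# against which to validate it.
print("model for 2040 (unvalidated):", slope * 2040 + intercept)
```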

Once a satisfactory set of values for the adjustable parameters has been found, the model may be considered validated within the range of the data set. Models are not considered valid outside their range of validation. 


When models are used for extrapolation, the extrapolation must be re-validated as new data becomes available. In this way, past extrapolations can be invalidated and identified as such.


Model differential equations and pseudo-causality.


Modelers often spice up the mix by invoking sets of model differential equations that may be solved numerically to propagate the model into the future. Thus models may contain time-dependent differential equations whose derivatives emulate causal behavior.

Such model equations may have some physics in them, but inevitably they leave out important physical processes. Hence, they are not truly causal, because they do not obey the causality of the underlying laws of physics. Such time-dependent models may be termed pseudo-causal, to distinguish them from the fully causal laws of physics. More on causality later.
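
As a sketch of what such a model differential equation looks like in practice, here is a toy Python example of my own: a two-parameter logistic growth equation stepped forward in time. The derivative gives the model a causal look and feel, but the equation itself is a fitted form with adjustable parameters, not the full physics of any real system.

```python
# Sketch of a "pseudo-causal" model ODE: logistic growth
#   dy/dt = r * y * (1 - y / K)
# stepped forward with a simple Euler scheme.  r and K are adjustable
# parameters; nothing guarantees they capture all the relevant physics.

r, K = 0.3, 100.0          # adjustable parameters (illustrative values)
dt, steps = 0.1, 500
y = 5.0                    # assumed initial condition

for _ in range(steps):
    y += dt * r * y * (1.0 - y / K)   # propagate the model into the "future"

print("model state after", dt * steps, "time units:", y)
```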


Numerical models that solve truncated sets of fluid equations, such as General Circulation Models (GCMs), are examples of pseudo-causal models. Extrapolations of GCMs are not guaranteed to agree with future observations. Quite the opposite: all extrapolations must eventually diverge from future observations. These models are only approximately causal.


GCMs and other models require the same disclaimer as stock brokers:
   "Past performance is not a guarantee of future accuracy."

Can models provide "too good" a fit to the base data?


If a model has enough adjustable parameters it can fit any data set with great accuracy, e.g. John von Neumann's elephant. Excessively large sets of adjustable parameters produce deceptively pretty-looking data plots. In fact, it is considered bad practice to fit the data with too many parameters. Over-parameterized models have many problems: they tend to have more unreliable extrapolations, derivatives that fluctuate between data points, and rapidly growing instabilities.


Paradoxically, models that produce impressive agreement with base data sets tend to fail badly in extrapolation.


If the fit to the base data set is too good, it probably means the modeler has used too many adjustable parameters. A good modeler will find a minimal set of basis functions and a minimal set of adjustable parameters that skillfully fit the base data set to a reasonable accuracy, and so minimize the amount of arbitrariness in the model. This will also tend to slow the rate of divergence upon extrapolation.
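
Here is a toy Python demonstration of the over-parameterization problem. The synthetic "base data" below is a noisy linear trend of my own construction; both a two-parameter and a ten-parameter polynomial fit it, but the over-parameterized fit typically runs away far more dramatically once extrapolated beyond the validated range.

```python
import numpy as np

# Synthetic base data: a noisy linear trend (illustrative only).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 11)
y = 2.0 + 0.5 * t + rng.normal(0.0, 0.3, t.size)

modest = np.polyfit(t, y, deg=1)    # two adjustable parameters
bloated = np.polyfit(t, y, deg=9)   # ten adjustable parameters ("the elephant")

# Both fit the base data; compare their behavior outside the validated range.
t_future = 20.0
print("2-parameter model at t = 20:", np.polyval(modest, t_future))
print("10-parameter model at t = 20:", np.polyval(bloated, t_future))
```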


What are the basis functions of models?


Models make use of a set of basis functions. For example, the functions X, X^2, X^3, X^4, ... are divergent functions used in polynomial fits (polynomial regressions). The problem is that such functions tend to ±infinity in the limit of large values of the independent variable X, and do so more rapidly for higher powers of X. The basis functions are unbounded, and extrapolations always diverge.


One approach is to choose bounded functions for the basis set. The periodic functions {C, sin(X), cos(X), sin(2X), cos(2X), ...}, where C is the constant function, are an example of a bounded basis set. At least extrapolations of bounded functions will not diverge to infinity. Comforting.
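
Here is a Python sketch of a fit using such a bounded basis set; the data are synthetic and illustrative. The point is that every basis function stays between -1 and 1, so the fitted model remains bounded no matter how far it is extrapolated.

```python
import numpy as np

# Synthetic "observations" built for illustration.
x = np.linspace(0.0, 2.0 * np.pi, 25)
y = 1.0 + 0.8 * np.sin(x) - 0.3 * np.cos(2.0 * x)

# Design matrix whose columns are the bounded basis functions
# {C, sin(X), cos(X), sin(2X), cos(2X)}.
basis = np.column_stack([
    np.ones_like(x),
    np.sin(x), np.cos(x),
    np.sin(2.0 * x), np.cos(2.0 * x),
])

coeffs, *_ = np.linalg.lstsq(basis, y, rcond=None)
print("fitted coefficients:", coeffs)

# Extrapolation far outside the data range stays bounded.
x_far = 100.0
phi = np.array([1.0, np.sin(x_far), np.cos(x_far),
                np.sin(2.0 * x_far), np.cos(2.0 * x_far)])
print("model at X = 100 (bounded):", phi @ coeffs)
```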


Periodic phenomena make modelers look good.


 Many natural phenomena are periodic or approximately periodic. If a time series data set repeats itself on a regular basis then it can be modeled accurately with a small collection of periodic functions, sines and cosines. We do not have to solve the orbital dynamics equations in real time to predict with great accuracy that the sun will come up tomorrow.  


Complex systems may also display quasi-periodic behavior. So-called non-linear phenomena may repeat with a slowly changing frequency and amplitude.  Simple periodic models tend to do very well in extrapolation over multiple periods into the future. Moreover, periodic models do not diverge upon extrapolation. They simply assert that the future is going to be a repeat of the past. 
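
As a sketch of this, the Python snippet below fits a simple four-parameter periodic model to a synthetic, roughly annual cycle (my own illustrative signal, not real climate data) and then extrapolates it several periods into the future. The extrapolation stays bounded and simply repeats the fitted cycle.

```python
import numpy as np
from scipy.optimize import curve_fit

def periodic_model(t, amp, freq, phase, offset):
    """Simple periodic model with four adjustable parameters."""
    return offset + amp * np.sin(2.0 * np.pi * freq * t + phase)

# Five "years" of synthetic, roughly annual data (illustrative only).
t = np.linspace(0.0, 5.0, 200)
y = 10.0 + 3.0 * np.sin(2.0 * np.pi * t + 0.4)

params, _ = curve_fit(periodic_model, t, y, p0=[1.0, 1.0, 0.0, 0.0])

# Extrapolate many periods ahead: the model asserts the future repeats the past.
print("fitted (amp, freq, phase, offset):", params)
print("model at t = 12:", periodic_model(12.0, *params))
```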


When models extrapolate non-periodically, it's a red flag. Extrapolations of aperiodic (i.e. non-periodic) models are much more likely to be invalid, as discussed here.

Extrapolation of models is inherently unreliable.

What about extrapolation? Often, modelers are asked to extrapolate their models beyond the validated range of the independent variables, into the unknown future or elsewhere. These extrapolations are notoriously unreliable for several reasons, among them: (1) models do not obey causality; (2) they may not properly conserve invariants of the underlying physical system; (3) they are often mathematically unstable and exhibit divergent behavior in the limit of large values of the independent variables; and (4) the non-linear regression fits used in climate modeling are especially prone to instability. Such models would inevitably “predict” catastrophic values of the dependent variables as an artifact of their instability.


Of course, no actual predicting is going on in such models, merely extrapolation of the model beyond its validated domain.

Simulations obey causality, models do not.

If a model were to include the real physics of the complete system, it would be a simulation, not a model.  Simulations obey causality.  Simulations usually consist of sets of time dependent coupled partial differential equations, PDEs, subject to realistic boundary conditions. Simulations are numerically solvable rigorous formulations of underlying physical laws. 


Simulations are often used to examine the evolution of temperature in fluid systems.  If the temperature is non-uniform the system is far from true thermodynamic equilibrium.  However, fluids very often satisfy the requirements for local thermodynamic equilibrium. This simply means that a  local temperature can be defined in the medium. The temperature then is a scalar field that varies continuously with location and time. Such systems will exhibit thermal transport, a characteristic of atmospheres and oceans.
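
As a tiny illustration of what a causal simulation looks like (as opposed to a fitted model), here is a Python sketch that steps the 1-D heat equation forward in time with an explicit finite-difference scheme. The grid, diffusivity, and initial hot spot are arbitrary illustrative choices; the point is that the temperature field at each step is computed from the field at the previous step via the governing equation, not fitted to data.

```python
import numpy as np

# Explicit finite-difference solution of the 1-D heat equation
#   dT/dt = D * d^2T/dx^2
# with fixed-temperature boundaries.  All numbers are illustrative.

nx, nt = 51, 500
dx, dt = 0.02, 1.0e-4
D = 0.5                                   # thermal diffusivity (arbitrary units)
assert D * dt / dx**2 <= 0.5              # stability condition for the scheme

T = np.zeros(nx)
T[nx // 2] = 100.0                        # initial hot spot at the midpoint

for _ in range(nt):
    lap = (T[2:] - 2.0 * T[1:-1] + T[:-2]) / dx**2   # discrete Laplacian
    T[1:-1] += dt * D * lap                          # causal time step
    T[0] = T[-1] = 0.0                               # boundary conditions

print("temperature every 10th grid point:", np.round(T[::10], 3))
```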


Subtle aspects of causality in physics lie beyond the scope of this discussion. But it's very interesting so... a few highlights.


In practice, most simulation codes solve formulations of the fluid equations and related field equations of classical physics.  In these cases the simple classical definition of causality is obeyed. 


Quantum mechanics experts know that quantum mechanical systems have a probabilistic nature. When quantum effects are important, some aspects of classical causality are lost. However, even in quantum systems, the fundamental probability amplitudes, or wave functions, themselves obey differential equations that "propagate" these functions forward in time in a causal manner. Roughly speaking, the wave functions evolve continuously and causally in time, such that the statistical properties of quantum systems (expectation values of single- and multi-particle observables) revert to classical causality in the limit of "large quantum numbers."


Even classical systems can exhibit stochastic or chaotic behavior in some situations, for example the so-called butterfly effect. The task of simulating many-particle systems subject to stochastic or chaotic behavior is challenging. However, for the important case of many-particle systems having sufficiently many degrees of freedom, chaotic effects often tend to be washed out by other effects. Perhaps this is an oversimplification.
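
The butterfly effect is easy to demonstrate numerically. The sketch below integrates the classic Lorenz system (with its standard textbook parameters) for two trajectories whose initial conditions differ by one part in 10^8; after a modest integration time the trajectories are no longer close. The crude fixed-step Euler integrator is my own simplification, adequate only for illustration.

```python
import numpy as np

def lorenz_step(state, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One crude Euler step of the Lorenz system (textbook parameters)."""
    x, y, z = state
    deriv = np.array([sigma * (y - x),
                      x * (rho - z) - y,
                      x * y - beta * z])
    return state + dt * deriv

dt, steps = 0.005, 4000
a = np.array([1.0, 1.0, 1.0])
b = np.array([1.0, 1.0, 1.0 + 1.0e-8])    # perturbed by one part in 10^8

for _ in range(steps):
    a = lorenz_step(a, dt)
    b = lorenz_step(b, dt)

print("separation after", dt * steps, "time units:", np.linalg.norm(a - b))
```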


A related and absolutely fascinating phenomenon of continuous fluid systems is the possibility of self-organization. The microscopic behavior of self-organizing systems can conspire to generate large-scale organized flows. The jet stream in the earth's atmosphere is an example of such an organized flow, sometimes called a zonal flow. The jet stream is a vast high-speed wind current in the upper atmosphere that can persist and move around as an organized entity. The color bands in Jupiter's atmosphere and the Great Red Spot appear to be such zonal flows. Simulating the formation and evolution of such large-scale organized flows is a challenging problem addressed in various atmospheric and oceanic simulation codes. Amazing stuff.


Now we are getting into specialized stuff that is way beyond the scope of this brief discussion. For more on this, consult the extensive popular literature.  


Now let's summarize our conclusions about models,  modeling, and the inherent unreliability of extrapolation. 


Summary and Conclusions about Models.


In most fields of physics, models are considered useful tools for data analysis, but their known limitations and range of validity are widely appreciated. There are just too many ways for extrapolations of models to go wrong. 


Models do not obey causality nor can they properly "predict" anything in the causal sense. Models provide sets of numbers that can be compared to sets of observational data. 


Models are not simulations. A model may contain none of the physics, or some of the physics, but never all of the physics of the system.


Extrapolation of a model inevitably takes the model outside its validated domain. When extrapolation is necessary, it must be done conservatively and cautiously. Further, extrapolations must be validated against new data as it becomes available. Conservative extrapolations are more likely to be validated by future observations.


Is the methodology of climate modeling inherently unreliable?

Because of the inherent limitations of models in general, we question the methodology of climate modeling. 


It seems incorrect to give weight to inherently unreliable extrapolations of climate models. Especially troubling are extrapolations of such models beyond the known range of their mathematical validity.

Of course, most everyone in the hard sciences knows all of this. So my question might be reformulated as: 


Why are extrapolations of climate models given weight, when the methodology is known to be inherently unreliable in extrapolation? 


Models are not infallible and climate models are not infallible.  They are  known to be unreliable when extrapolated beyond their validated range. 

Maybe that's enough for the moment. Responses welcome. A little dialog is good, but let's keep it on the top two levels of the Graham hierarchy.

4 comments:

  1. Very good post! The insights from this article express in many ways my own thoughts about the reliability of extrapolation of current climate models into the year 2100!
    In my opinion, current climate models
    - contain too little, or too poorly understood, physics (e.g. "clouds"), and
    - rest on a base data set (climate history) that is too small and too inaccurate for longer extrapolation into the future.

    Even with extensive physics in the models, nonlinearities, positive feedbacks, the complex 3D surface of the Earth (air, land, sea, ice), and the aforementioned "too many adjustable parameters" make forecasts very unreliable.

    BTW, the same is true for economic and financial models. If I could reliably forecast bond prices, I would be a rich man :=)

  2. Ah... excellent comment. I've been distracted by (or absorbed in) other projects over the past few months, and have not posted here lately. TY for a thoughtful comment. More comments in the recent article in News New Mexico: http://newmexico.watchdog.org/15128/mit-scientist-disputes-man-made-global-warming-in-sandia-labs-presentation/comment-page-1/#comment-9056

    Here's my comment on the above article:

    Excellent article that cuts against the grain of popular mythology, bad science, political posturing, name calling, and general nastiness associated with the theory of Human Activity Caused Global Warming (HAGWa). I have faith that this period of ascendancy of bad science will not endure, and one day HAGWa will join Cold Fusion, N-Rays, and Face-on-Mars on the blooper reel of bad science.

    A few critical notes on the credibility of Climate Science:
    The field of "Climate Science" is not a particularly well established academic discipline. Its methods are derivative. They originated in other fields including physics, fluid dynamics, thermodynamics, geophysics, meteorology, and oceanography to mention a few. Further, Climate Science as a scientific discipline has no significant record of scientific accomplishment. Extravagant claims of its practitioners and promoters are deserving of skeptical criticism and rigorous cross-checking. They do not rise to the level of confidence needed for economic policy making. All this is evident to any good scientist.

    A few words about climate modeling used by the IPCC:
    We observe that climate models do not obey physical causality. They are based on model equations and parameter fitting. Such models are well known as excellent tools for data analysis in many fields of science. However, they are very bad at prediction. Typically, predictions from non-causal models exhibit secular or exponential divergence from new data. The widely viewed, and scary "climate catastrophe" graphs are typical of such non-causal models. They fit existing data impressively, but fail on extrapolation. Predictions of climate models of the 1990s have already diverged from new climate data. Given their inherent unreliability and evident divergence from new data, skepticism is justified. Extravagant claims of the accuracy and infallibility of climate models are just hot air.

    More on the subject of climate modeling here: http://syntheticinformation.blogspot.com/p/what-is-modeling-is-climate-modeling.html

    We are living in a sad chapter in the history of science.
    TY for all the great comments and discussion from good scientists in NM.

  3. While I agree that climate predictions are not sufficient for policy decisions, I think you misunderstand the way in which models (actually, simulations) are used in the AGW context.

    GCM type climate models (i.e. finite element simulations) are not used to "predict" the climate based on simulations starting from the current state. Rather, they are used to attempt to diagnose the sensitivity of the system by running simulations in the expected future atmosphere.

    Thus one would not take a GCM and simulate 50 years in order to "predict" the climate 50 years in the future. Rather, the GCM would be run for assumed realistic starting conditions but with the CO2 level expected 50 years in advance. Thus the model is a physical simulation, and it simulates the weather in an atmosphere with higher CO2, rather than simulating the weather through a continuous time period 50 years long.

    Such a model provides much more confidence than a continuous simulation (which, due to chaos and other effects, would have zero confidence). The problems I see with the approach include:

    -deducing valid initial conditions, given that they arise from an atmosphere in which CO2 has been rising

    -properly simulating the effect of higher CO2

    -lack of calibration. The model cannot be run against past data sets for comparison

    -lack of measurement of longer range effects. Even if the model accurately simulates the weather at initialization, it cannot simulate it with much reliability (even in ensembles) beyond a few weeks. Hence it cannot pick up the effects of longer term cycles and teleconnections, which are very important in both weather and climate.

    -adequacy of parameterization. All GCMs use a lot of parameterization as a substitute for highly detailed physical simulations. For example, at some level, topographic variations are not simulated (they are sub-cell scale) but estimated (using parameterization). Since the parameters cannot be tested in a higher-CO2 environment, they have to be arrived at by secondary processes, such as the use of fine-scale, smaller-area simulations (as separate experiments) to try to validate them.

    BTW, parameterization in this sense is not simply plugging in "parameters" as one might expect from the terminology. It may also refer to the use of non-physical models (ranging from a single number to complex equations) to feed the grid scale process.

  4. Ah... interesting detailed comment.
    It is understood, and mentioned in the article above, that modeling is a great tool for the physical sciences. The activity you are describing was covered briefly in the above article. Part of that process includes sensitivity studies to adjustable parameters like transport coefficients, boundary and initial conditions, and all the rest. This may be a source of important insights into the complex physical system if the model is good enough. This is what you are describing.
    I am familiar with the use of time independent steady state flow/transport solutions for coupled sets of model PDEs. In my overview article above, I did not include material of this level of detail. TY for taking the time and effort to review the more detailed features of numerical climate models. Good stuff.
    A few thoughts and observations:
    In experimental physics one can check models and parameter sensitivities against controlled laboratory experiments. A correspondence can be established (or not) between the model and the real physical system. Climate modeling does not have this kind of testing against controlled experiments. Of course, we can’t restart the fluid system of the earth with new initial or boundary conditions and cross check that the code is getting it right. This is obvious but non-trivial.
    Here are some related thoughts and observations:
    Of course, one must construct the best possible sets of coupled model PDEs and account for elaborate realistic boundary/initial conditions. This, along with all the other goodies that light up the lives of computational fluid dynamics guys.
    One of the problems with numerical solutions of PDEs for fluid systems is that small-scale turbulence is not calculated rigorously because of the large (finite) grid size. This, among all the other physics and numerical detail that is left out of such coupled PDE models, leaves them in the model category rather than that of a strictly causal simulation. However, it is understood that one must work with the tools at hand. The study of such models is telling us about the model, but not necessarily about the complex real physical system being modeled. What about the transport coefficients? Such complex models contain a large collection of transport coefficients, each of which is itself dependent on various parameters and variables. Further, various physical and biophysical processes introduce the need for following reacting multi-species flows, as is well known. We rapidly become awash in complex details, all of which have some degree of importance in the model.
    My own experience with this kind of modeling in the field of plasma physics was that we often had to go to molecular dynamics calculations where the motion of individual molecules became important. When we interacted with experimental physicists, a large degree of (usually tolerant) skepticism set the tone. Our beautiful graphic renderings often did not agree with real world data points supplied at great effort by our friends in the lab. We learn to be humble and occasionally triumphant.

    Ultimately, modeling informs us about the model, but not necessarily the real world.
