A Genetic Programming Framework To Measure Complexity


    A measure γ that may characterize the complexity of sequential data is introduced. Complexity is meant here in the sense that an observed sequence is difficult to predict. Measuring this complexity is important.
    Genetic programming (GP) is used to obtain γ. GP searches for a best-fit model by randomly assembling thousands of equations that mimic regression models, evaluating their fitness, and reporting the fittest one. Because the equations are randomly assembled, they are accidental mathematical fits that cannot be expected to be meaningful. The formulation of γ rests on the assumption that GP can successfully reproduce the dynamics of simple processes but fails for very complex ones (such as white noise).
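The search described above can be illustrated with a minimal Python sketch. It is not TSGP: it only assembles random equations from two lagged values and a few constants, scores each by MSE, and reports the fittest, and it omits the breeding steps a full GP run would include. The function names, the operator set, and the terminal set are all assumptions made for illustration.

```python
import operator
import random

# Hypothetical building blocks (assumptions, not taken from TSGP).
OPERATORS = [operator.add, operator.sub, operator.mul]
TERMINALS = ["lag1", "lag2", 0.5, 1.0, 2.0, 4.0]


def random_equation(rng, depth=3):
    """Randomly assemble an equation as a nested tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (
        rng.choice(OPERATORS),
        random_equation(rng, depth - 1),
        random_equation(rng, depth - 1),
    )


def evaluate(equation, lag1, lag2):
    """Evaluate an equation given the two previous values of the series."""
    if isinstance(equation, tuple):
        op, left, right = equation
        return op(evaluate(left, lag1, lag2), evaluate(right, lag1, lag2))
    if equation == "lag1":
        return lag1
    if equation == "lag2":
        return lag2
    return equation  # a numeric constant


def mse(equation, series):
    """Mean square error of one-step-ahead fitted values."""
    errors = [
        (series[t] - evaluate(equation, series[t - 1], series[t - 2])) ** 2
        for t in range(2, len(series))
    ]
    return sum(errors) / len(errors)


def fittest_equation(series, n_equations=2000, seed=0):
    """Assemble n_equations random equations and report the one with lowest MSE."""
    rng = random.Random(seed)
    candidates = [random_equation(rng) for _ in range(n_equations)]
    return min(candidates, key=lambda eq: mse(eq, series))
```

Calling `fittest_equation(series)` on a list of floats returns the best nested-tuple equation found; because the equations are assembled at random, the result is an accidental fit rather than an interpretable model.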
    A GP program that fits time series data (TSGP) is employed to demonstrate how γ can be used. (TSGP is available at compumetrica.com.) TSGP computes the mean square error (MSE) for each equation it assembles. Hypothetically, if a sequence of pseudo-random data (considered complex) is scrambled, then MSE_observed ≈ MSE_scrambled, whereas MSE_observed < MSE_scrambled if the sequence is deterministic. Heuristically, the ratio MSE_observed/MSE_scrambled approaches 1 for complex data and approaches 0 for totally predictable data. Obtaining a large number of independent ratios MSE_observed/MSE_scrambled for the same data set is useful in testing the hypothesis γ = 1, γ = 0, or any 0 ≤ γ ≤ 1. Since the ratios are independent, the following test statistic applies:

        \[ t = \frac{\frac{1}{n}\sum_{i=1}^{n} (g_i - \gamma_i)}{s_g \, n^{-1/2}}, \]

with n − 1 degrees of freedom, where i = 1, …, n, g_i is the sample estimate of γ_i, and s_g is the standard deviation of the g_i.
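As a hedged illustration, the test statistic above can be computed in a few lines of Python. The sketch assumes that a single hypothesized value applies to every ratio and that the standard deviation is the sample standard deviation (n − 1 denominator); neither detail is stated in the text, and the function and parameter names are hypothetical.

```python
import math
import statistics


def ratio_t_statistic(ratios, hypothesized_value):
    """Return (t, degrees_of_freedom) for a list of independent MSE ratios.

    Assumptions (not from the source): one hypothesized value for all ratios,
    and the sample standard deviation with an n - 1 denominator.
    """
    n = len(ratios)
    if n < 2:
        raise ValueError("at least two ratios are needed")
    mean_deviation = sum(g - hypothesized_value for g in ratios) / n
    s_g = statistics.stdev(ratios)
    if s_g == 0:
        raise ValueError("ratios have zero variance; t is undefined")
    # Dividing by s_g * n**(-1/2) is the same as dividing by s_g / sqrt(n).
    t = mean_deviation / (s_g / math.sqrt(n))
    return t, n - 1
```

A value of t that is large in absolute terms, judged against a t distribution with n − 1 degrees of freedom, would lead to rejecting the hypothesized value.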
    Empirical testing of γ suggested that it can provide reasonable hints about the complexity of simulated data whose underlying data-generating process is known. The tested sequences were pseudo-random, nonlinear-chaotic, and nonlinear-stochastic.
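The behavior of the ratio of observed to scrambled MSE can be sketched in Python without a GP engine. The example below swaps in a leave-one-out nearest-neighbour predictor for the GP fit, which is an assumption made purely to keep the sketch short, and it uses the logistic map as a stand-in chaotic sequence. All function names are hypothetical, and nothing here reproduces the experiments reported in the abstract.

```python
import random


def one_step_mse(series):
    """MSE of a leave-one-out nearest-neighbour one-step predictor.

    This predictor is a stand-in for a GP fit (an assumption, not TSGP).
    """
    pairs = [(series[t - 1], series[t]) for t in range(1, len(series))]
    total = 0.0
    for i, (previous, actual) in enumerate(pairs):
        # Find the other time step whose previous value is closest to this one.
        j = min(
            (k for k in range(len(pairs)) if k != i),
            key=lambda k: abs(pairs[k][0] - previous),
        )
        total += (actual - pairs[j][1]) ** 2
    return total / len(pairs)


def mse_ratio(series, rng):
    """Ratio of the MSE on the observed order to the MSE on a scrambled copy."""
    scrambled = list(series)
    rng.shuffle(scrambled)
    return one_step_mse(series) / one_step_mse(scrambled)


def logistic_map(n, x0=0.3):
    """Generate n values of the chaotic logistic map x -> 4x(1 - x)."""
    values = [x0]
    for _ in range(n - 1):
        values.append(4.0 * values[-1] * (1.0 - values[-1]))
    return values
```

Under these assumptions, `mse_ratio(logistic_map(300), random.Random(1))` should come out close to 0, while a pseudo-random sequence should give a ratio near 1. Repeating the call with different shuffles yields a sample of ratios.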



Using Genetic Programming to Forecast US Residential Electrical Energy


    An integrated statistical and genetic programming modeling algorithm is proposed to forecast the electrical energy used by the US residential sector. Generally, the amount of electricity used over a short period (an hour) is referred to as “demand” and is measured in watts (W). The amount used over a longer period (say, one month) is referred to as “energy” and is typically measured in kilowatt-hours (1 kWh = 1,000 watt-hours). Accurate forecasting of electrical energy provides reliable forecasts of demand, thus reducing the risk of brownouts.
    Regression models are typically used to estimate energy equations. In such a model, annual electrical energy may, for example, be assumed to be a function of the price of electricity, per capita income, heating degree days (a variable that measures the severity of winter cold), and cooling degree days (a variable that measures the severity of summer heat). Such a model provides estimates of the policy parameters that utilities and government regulatory agencies rely upon in energy planning and policy formulation. Estimates of consumers’ responsiveness to changes in electricity price (price elasticity) and in income (income elasticity) are invaluable decision-making information that a regression provides. However, accurate forecasts of the dependent variable (energy) are conditional upon accurate forecasts of the explanatory variables.
    Genetic programming (GP) is employed here to forecast the explanatory variables. GP, a heuristic search technique, is an iterative algorithm that randomly assembles regression-like equations. These equations are used to breed fitter ones until one is reached that happens to produce a reasonable forecast. Such an equation, however, may not be helpful in planning or policy making. Integrating the two techniques therefore has the advantage of yielding the needed policy and planning parameters while securing accurate forecasts.
    To evaluate the efficacy of the GP forecasts, they are compared with forecasts obtained from ARIMA models.
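The regression half of the integrated approach can be sketched as follows. The abstract does not state a functional form, so the log-log specification used here is an assumption, chosen because its slope coefficients can be read directly as elasticities. The function names are hypothetical, NumPy is assumed to be available, and the forecasts of the explanatory variables (which the abstract obtains from GP) are simply passed in as numbers.

```python
import numpy as np


def fit_log_log(energy, price, income, hdd, cdd):
    """Ordinary least squares of ln(energy) on the logs of the regressors.

    Returns [intercept, price elasticity, income elasticity,
    HDD coefficient, CDD coefficient]. The log-log form is an assumption.
    """
    design = np.column_stack(
        [
            np.ones(len(energy)),
            np.log(price),
            np.log(income),
            np.log(hdd),
            np.log(cdd),
        ]
    )
    beta, *_ = np.linalg.lstsq(design, np.log(energy), rcond=None)
    return beta


def forecast_energy(beta, price, income, hdd, cdd):
    """Energy forecast conditional on forecasts of the explanatory variables."""
    x = np.array([1.0, np.log(price), np.log(income), np.log(hdd), np.log(cdd)])
    return float(np.exp(x @ beta))
```

Here `beta[1]` and `beta[2]` play the role of the price and income elasticities, and `forecast_energy` makes the dependence explicit: the energy forecast is only as good as the forecasts of price, income, and degree days fed into it.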


M.A. Kaboudan
University of Redlands
School of Business
http://newton.uor.edu/facultyfolder/mahmoud_kaboudan
Mahmoud_Kaboudan @ Redlands.edu